Re: [sqlalchemy] sqlalchemy generated query returns fewer objects than rows

2016-06-16 Thread Damian Dimmich
Hi Michael,

Thanks for the speedy reply!

That makes sense.

Ok - so in this case, how would we actually get sqlalchemy to return N 
distinct 'content' items, rather than N content-plus-joined-table sets?  

All the results have full primary keys, so its more likely they are being 
de-duped.

Thank you,
Damian

On Tuesday, June 7, 2016 at 6:34:02 PM UTC+4, Mike Bayer wrote:
>
>
>
> On 06/07/2016 03:34 AM, Damian Dimmich wrote: 
> > Hi, 
> > 
> > We've been staring at a problem for some time, where an sqlalchemy 
> > generated query we run returns less objects when run with .all() than 
> > the number given to us with a .count(). 
> > 
> > The same, generated query which we get in our logs/from print(query) 
> > returns the same number of rows as the count() when run through 
> > DBSession.execute("..."). 
> > 
> > This happens with both sqlalchemy 1.0.12 and 1.0.13 
> > 
>
>
> the Query.all() method, assuming you are querying for complete 
> instances, will first off deduplicate a redundant instance, and secondly 
> if a row does not contain a full primary key, no instance will be 
> generated for that row.   The difference in row count is probably due to 
> one of these factors. 
>
>
>
> > 
> > So - running the below by hand using execute in the DBSession gives us 
> > the correct row count: 
> > 
> > SELECT content.id AS content_id, content.title AS 
> content_title, 
> > content.subtitle AS content_subtitle, content.teaser AS content_teaser, 
> > content.published_date AS content_published_date, content.published AS 
> > content_published, content.created_date AS content_created_date, 
> > content.microscription AS content_microscription, 
> > content.product_category_id AS content_product_category_id, 
> > content.discipline_type_id AS content_discipline_type_id, 
> > content.research_approach_id AS content_research_approach_id, 
> > content.weighting_action_id AS content_weighting_action_id, 
> > content.weighting_id AS content_weighting_id, content.channel_id AS 
> > content_channel_id, content.provider_id AS content_provider_id, 
> > content.thumb_url AS content_thumb_url, content.pdf AS content_pdf, 
> > content.meta AS content_meta, content.guide_price AS 
> > content_guide_price, content.currency_id AS content_currency_id, 
> > content.fade_pricing AS content_fade_pricing, content.min_fade_price AS 
> > content_min_fade_price, content.is_research AS content_is_research, 
> > content.non_material AS content_non_material, content.subscription_only 
> > AS content_subscription_only FROM content 
> > LEFT OUTER JOIN (content_sector_ass AS content_sector_ass_1 
> > JOIN sectors AS sectors_1 ON sectors_1.id = 
> > content_sector_ass_1.sector_id) ON content.id = 
> > content_sector_ass_1.content_id 
> > LEFT OUTER JOIN (content_asset_class_ass AS 
> > content_asset_class_ass_1 JOIN asset_classes AS asset_classes_1 ON 
> > asset_classes_1.id = content_asset_class_ass_1.asset_class_id) ON 
> > content.id = content_asset_class_ass_1.content_id 
> > LEFT OUTER JOIN (content_specialty_ass AS 
> > content_specialty_ass_1 JOIN specialties AS specialties_1 ON 
> > specialties_1.id = content_specialty_ass_1.specialty_id) ON content.id 
> = 
> > content_specialty_ass_1.content_id 
> > WHERE 
> > content.published = true 
> > AND content.published_date <= '2016-06-07' 
> > AND ( 
> > asset_classes_1.name = 'Equity' 
> > OR 
> > specialties_1.name = 'FundamentalGeneralEquity' 
> > ) 
> > AND sectors_1.name = 'Consumer Discretionary' 
> > ORDER BY content.published_date DESC, content.created_date DESC 
> > 
> > return 1663 rows. 
> > 
> > whereas running the sqlalchemy generated query that returns "Content" 
> > objects: 
> > 
> > 2016-06-07 11:24:29,477 INFO  [sqlalchemy.engine.base.Engine][waitress] 
> > 
> > SELECT content.id AS content_id, content.title AS content_title, 
> > content.subtitle AS content_subtitle, content.teaser AS content_teaser, 
> > content.published_date AS content_published_date, content.published AS 
> > content_published, content.created_date AS content_created_date, 
> > content.microscription AS content_microscription, 
> > content.product_category_id AS content_product_category_id, 
> > content.discipline_type_id AS content_discipline_type_id, 
> > content.research_approach_id AS content_research_approach_id, 
> > content.weighting_action_id AS content_weighting_action_id, 
> > content.weighting_id AS content_weighting_id, content.channel_id AS 
> > content_channel_id, content.provider_id AS content_provider_id, 
> > content.thumb_url AS content_thumb_url, content.pdf AS content_pdf, 
> > content.meta AS content_meta, content.guide_price AS 
> > content_guide_price, content.currency_id AS content_currency_id, 
> > content.fade_pricing AS content_fade_pricing, content.min_fade_price AS 
> > content_min_fade_price, 

Re: [sqlalchemy] sqlalchemy generated query returns fewer objects than rows

2016-06-07 Thread Mike Bayer



On 06/07/2016 03:34 AM, Damian Dimmich wrote:

Hi,

We've been staring at a problem for some time, where an sqlalchemy
generated query we run returns less objects when run with .all() than
the number given to us with a .count().

The same, generated query which we get in our logs/from print(query)
returns the same number of rows as the count() when run through
DBSession.execute("...").

This happens with both sqlalchemy 1.0.12 and 1.0.13




the Query.all() method, assuming you are querying for complete 
instances, will first off deduplicate a redundant instance, and secondly 
if a row does not contain a full primary key, no instance will be 
generated for that row.   The difference in row count is probably due to 
one of these factors.






So - running the below by hand using execute in the DBSession gives us
the correct row count:

SELECT content.id AS content_id, content.title AS content_title,
content.subtitle AS content_subtitle, content.teaser AS content_teaser,
content.published_date AS content_published_date, content.published AS
content_published, content.created_date AS content_created_date,
content.microscription AS content_microscription,
content.product_category_id AS content_product_category_id,
content.discipline_type_id AS content_discipline_type_id,
content.research_approach_id AS content_research_approach_id,
content.weighting_action_id AS content_weighting_action_id,
content.weighting_id AS content_weighting_id, content.channel_id AS
content_channel_id, content.provider_id AS content_provider_id,
content.thumb_url AS content_thumb_url, content.pdf AS content_pdf,
content.meta AS content_meta, content.guide_price AS
content_guide_price, content.currency_id AS content_currency_id,
content.fade_pricing AS content_fade_pricing, content.min_fade_price AS
content_min_fade_price, content.is_research AS content_is_research,
content.non_material AS content_non_material, content.subscription_only
AS content_subscription_only FROM content
LEFT OUTER JOIN (content_sector_ass AS content_sector_ass_1
JOIN sectors AS sectors_1 ON sectors_1.id =
content_sector_ass_1.sector_id) ON content.id =
content_sector_ass_1.content_id
LEFT OUTER JOIN (content_asset_class_ass AS
content_asset_class_ass_1 JOIN asset_classes AS asset_classes_1 ON
asset_classes_1.id = content_asset_class_ass_1.asset_class_id) ON
content.id = content_asset_class_ass_1.content_id
LEFT OUTER JOIN (content_specialty_ass AS
content_specialty_ass_1 JOIN specialties AS specialties_1 ON
specialties_1.id = content_specialty_ass_1.specialty_id) ON content.id =
content_specialty_ass_1.content_id
WHERE
content.published = true
AND content.published_date <= '2016-06-07'
AND (
asset_classes_1.name = 'Equity'
OR
specialties_1.name = 'FundamentalGeneralEquity'
)
AND sectors_1.name = 'Consumer Discretionary'
ORDER BY content.published_date DESC, content.created_date DESC

return 1663 rows.

whereas running the sqlalchemy generated query that returns "Content"
objects:

2016-06-07 11:24:29,477 INFO  [sqlalchemy.engine.base.Engine][waitress]

SELECT content.id AS content_id, content.title AS content_title,
content.subtitle AS content_subtitle, content.teaser AS content_teaser,
content.published_date AS content_published_date, content.published AS
content_published, content.created_date AS content_created_date,
content.microscription AS content_microscription,
content.product_category_id AS content_product_category_id,
content.discipline_type_id AS content_discipline_type_id,
content.research_approach_id AS content_research_approach_id,
content.weighting_action_id AS content_weighting_action_id,
content.weighting_id AS content_weighting_id, content.channel_id AS
content_channel_id, content.provider_id AS content_provider_id,
content.thumb_url AS content_thumb_url, content.pdf AS content_pdf,
content.meta AS content_meta, content.guide_price AS
content_guide_price, content.currency_id AS content_currency_id,
content.fade_pricing AS content_fade_pricing, content.min_fade_price AS
content_min_fade_price, content.is_research AS content_is_research,
content.non_material AS content_non_material, content.subscription_only
AS content_subscription_only
FROM content

LEFT OUTER JOIN (content_sector_ass AS content_sector_ass_1 JOIN
sectors AS sectors_1 ON sectors_1.id = content_sector_ass_1.sector_id)
ON content.id = content_sector_ass_1.content_id

LEFT OUTER JOIN (content_asset_class_ass AS
content_asset_class_ass_1 JOIN asset_classes AS asset_classes_1 ON
asset_classes_1.id = content_asset_class_ass_1.asset_class_id) ON
content.id = content_asset_class_ass_1.content_id

LEFT OUTER JOIN (content_specialty_ass AS content_specialty_ass_1
JOIN specialties AS specialties_1 ON specialties_1.id =
content_specialty_ass_1.specialty_id) ON content.id =
content_specialty_ass_1.content_id


[sqlalchemy] sqlalchemy generated query returns fewer objects than rows

2016-06-07 Thread Damian Dimmich
Hi,

We've been staring at a problem for some time, where an sqlalchemy
generated query we run returns less objects when run with .all() than
the number given to us with a .count(). 

The same, generated query which we get in our logs/from print(query)
returns the same number of rows as the count() when run through
DBSession.execute("...").

This happens with both sqlalchemy 1.0.12 and 1.0.13


So - running the below by hand using execute in the DBSession gives us
the correct row count:

SELECT content.id AS content_id, content.title AS content_title,
content.subtitle AS content_subtitle, content.teaser AS content_teaser,
content.published_date AS content_published_date, content.published AS
content_published, content.created_date AS content_created_date,
content.microscription AS content_microscription,
content.product_category_id AS content_product_category_id,
content.discipline_type_id AS content_discipline_type_id,
content.research_approach_id AS content_research_approach_id,
content.weighting_action_id AS content_weighting_action_id,
content.weighting_id AS content_weighting_id, content.channel_id AS
content_channel_id, content.provider_id AS content_provider_id,
content.thumb_url AS content_thumb_url, content.pdf AS content_pdf,
content.meta AS content_meta, content.guide_price AS
content_guide_price, content.currency_id AS content_currency_id,
content.fade_pricing AS content_fade_pricing, content.min_fade_price AS
content_min_fade_price, content.is_research AS content_is_research,
content.non_material AS content_non_material, content.subscription_only
AS content_subscription_only FROM content
LEFT OUTER JOIN (content_sector_ass AS content_sector_ass_1
JOIN sectors AS sectors_1 ON sectors_1.id =
content_sector_ass_1.sector_id) ON content.id =
content_sector_ass_1.content_id
LEFT OUTER JOIN (content_asset_class_ass AS
content_asset_class_ass_1 JOIN asset_classes AS asset_classes_1 ON
asset_classes_1.id = content_asset_class_ass_1.asset_class_id) ON
content.id = content_asset_class_ass_1.content_id
LEFT OUTER JOIN (content_specialty_ass AS
content_specialty_ass_1 JOIN specialties AS specialties_1 ON
specialties_1.id = content_specialty_ass_1.specialty_id) ON content.id =
content_specialty_ass_1.content_id
WHERE
content.published = true
AND content.published_date <= '2016-06-07'
AND (
asset_classes_1.name = 'Equity'
OR
specialties_1.name = 'FundamentalGeneralEquity'
)
AND sectors_1.name = 'Consumer Discretionary'
ORDER BY content.published_date DESC, content.created_date DESC

return 1663 rows.

whereas running the sqlalchemy generated query that returns "Content"
objects:

2016-06-07 11:24:29,477 INFO  [sqlalchemy.engine.base.Engine][waitress]

SELECT content.id AS content_id, content.title AS content_title,
content.subtitle AS content_subtitle, content.teaser AS content_teaser,
content.published_date AS content_published_date, content.published AS
content_published, content.created_date AS content_created_date,
content.microscription AS content_microscription,
content.product_category_id AS content_product_category_id,
content.discipline_type_id AS content_discipline_type_id,
content.research_approach_id AS content_research_approach_id,
content.weighting_action_id AS content_weighting_action_id,
content.weighting_id AS content_weighting_id, content.channel_id AS
content_channel_id, content.provider_id AS content_provider_id,
content.thumb_url AS content_thumb_url, content.pdf AS content_pdf,
content.meta AS content_meta, content.guide_price AS
content_guide_price, content.currency_id AS content_currency_id,
content.fade_pricing AS content_fade_pricing, content.min_fade_price AS
content_min_fade_price, content.is_research AS content_is_research,
content.non_material AS content_non_material, content.subscription_only
AS content_subscription_only
FROM content

LEFT OUTER JOIN (content_sector_ass AS content_sector_ass_1 JOIN
sectors AS sectors_1 ON sectors_1.id = content_sector_ass_1.sector_id)
ON content.id = content_sector_ass_1.content_id

LEFT OUTER JOIN (content_asset_class_ass AS
content_asset_class_ass_1 JOIN asset_classes AS asset_classes_1 ON
asset_classes_1.id = content_asset_class_ass_1.asset_class_id) ON
content.id = content_asset_class_ass_1.content_id

LEFT OUTER JOIN (content_specialty_ass AS content_specialty_ass_1
JOIN specialties AS specialties_1 ON specialties_1.id =
content_specialty_ass_1.specialty_id) ON content.id =
content_specialty_ass_1.content_id

WHERE

content.published = true

AND content.published_date <= %(published_date_1)s

AND (

 asset_classes_1.name = %(name_1)s

OR specialties_1.name = %(name_2)s

)

AND sectors_1.name = %(name_3)s ORDER BY content.published_date
DESC, content.created_date DESC

2016-06-07 11:24:29,478 INFO