Re: [sqlalchemy] DISTINCT with LIMIT problem

2013-05-31 Thread Kent
Thanks very much!  I got it to work apparently fine using from_self(). 
I didn't seem to need anything special for eager loads to continue to 
function... were you only expecting I'd have troubles with eager loads 
if I used subquery()?


On 5/30/2013 6:29 PM, Michael Bayer wrote:


On May 30, 2013, at 6:06 PM, Kent jkentbo...@gmail.com wrote:


Thank you, I'll try that, but quick concern:  I specifically skipped trying to use 
.subquery() because the docs say Eager JOIN generation within the query is 
disabled.

Doesn't that mean I won't get my joinedload() results from the inner query?

Or does that refer to the outer query having eager join disabled?



if you want to eager load also from that subquery, you need to sitck it into an 
aliased:


MySubqClass = aliased(MyClass, subq)

query(x, MySubqClass).options(joinedload(MySubqClass.foobar))







On 5/30/2013 5:54 PM, Michael Bayer wrote:


On May 30, 2013, at 5:19 PM, Kent jkentbo...@gmail.com
mailto:jkentbo...@gmail.com wrote:



Solution A:

Group by all columns (yielding the same effect as distinct), but
which makes the window analytical function process *after* the group
by and yields the correct count (17 instead of 72):


are all those columns indexed?  even if they are, crappy query...



OR Solution B:

Put the count(*) over () in an outer select, like this:

select count(*) over () as recordcount, anon.*
from (
  select distinct tablea.colx, tableb.coly
  from tablea, tableb
) as anon
limit 100


this is very much how SQLAlchemy wants you to do it.



Either solution yields the correct answer I believe, but I'm having
difficulty translating the SQL that I know will work into sqlalchemy land.


For Solution B, I don't know how to wrap my query in an outer select
(similar to the LIMIT implementation for Oracle) in a way that will
still allow sqlalchemy to extract rows into instances:


from_self() can do this (probably use add_column() for the window
function), or subquery() should work very well.   subq = q.subquery(); q
= query(func.count('*').over().label(..), subq); .

send me a quick example if that's not working and I'll work out the query.

--
You received this message because you are subscribed to a topic in the
Google Groups sqlalchemy group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/sqlalchemy/_U28GXXR6sg/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to
sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.






--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] DISTINCT with LIMIT problem

2013-05-31 Thread Charlie Clark

Am 30.05.2013, 23:19 Uhr, schrieb Kent jkentbo...@gmail.com:


For example, a query may look like this:



select distinct
  count(*) over () as recordcount, tablea.colx, tableb.coly
from tablea, tableb
where 
limit 100



This doesn't *quite* work because the analytical window function count(*)
over() is applied *before* the distinct, so the count returns the wrong
number (a Cartesian effect, returning 72 instead of 17, in this example).


Why are you generating Cartesian products? DISTINCT is designed to work on  
denormalised result sets, ie. those which can contain duplicates. Can't  
you avoid this with a join between your tables?


Charlie
--
Charlie Clark
Managing Director
Clark Consulting  Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226

--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] DISTINCT with LIMIT problem

2013-05-31 Thread Kent
I allow the user to join with other tables for the purpose of filtering 
(even though the joined tables won't be selected).  Cartesian is 
probably the wrong term for the effect, but in the end, I get duplicate 
rows.  I could get rid of the need for distinct by extensively using 
EXISTS clauses instead of joins; this is true.


But when several tables are chained to together with joins, I expect 
using EXISTS to become less manageable and to perform poorer.  (I could 
be wrong on both accounts.)


For example, our interface may allow the query of Employee records.  But 
the user might join with the EmailAddress table to strictly filter results.


Employee records:
idname
  
1 kent
2 charlie

EmailAddress records:
empid   address
===  
1k...@mymail.goo
1k...@mymail.goo
1k...@gmail.de
2char...@gmail.de

session.query(Employee).join(EmailAddress).filter(EmailAddress.contains('@'))

Remember, we are only selecting emp.id, emp.name (but joining with 
another table).


So without DISTINCT:

idname
  
1 kent
1 kent
1 kent
2 charlie

With DISTINCT:

idname
  
1 kent
2 charlie


Like I say, using EXISTS would remove the need for DISTINCT, but I 
haven't gone down that path...



On 5/31/2013 8:41 AM, Charlie Clark wrote:

Am 30.05.2013, 23:19 Uhr, schrieb Kent jkentbo...@gmail.com:


For example, a query may look like this:



select distinct
  count(*) over () as recordcount, tablea.colx, tableb.coly
from tablea, tableb
where 
limit 100



This doesn't *quite* work because the analytical window function count(*)
over() is applied *before* the distinct, so the count returns the wrong
number (a Cartesian effect, returning 72 instead of 17, in this example).


Why are you generating Cartesian products? DISTINCT is designed to work
on denormalised result sets, ie. those which can contain duplicates.
Can't you avoid this with a join between your tables?

Charlie


--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




[sqlalchemy] DISTINCT with LIMIT problem

2013-05-30 Thread Kent
We use func.count().over() in order to help support result pagination.   
When attempting to limit the result set, I have found that if other tables 
are being joined (for the where clause, but not selected), then I need to 
add DISTINCT to the query or else the Cartesian result of my query messes 
up LIMIT.  (There are, say, 72 rows returned, where this only represents 17 
distinct records, for example.)

For example, a query may look like this:

select distinct 
  count(*) over () as recordcount, tablea.colx, tableb.coly
from tablea, tableb
where 
limit 100

This doesn't *quite* work because the analytical window function count(*) 
over() is applied *before* the distinct, so the count returns the wrong 
number (a Cartesian effect, returning 72 instead of 17, in this example).  

I have two potential solutions: 

Solution A:

Group by all columns (yielding the same effect as distinct), but which 
makes the window analytical function process *after* the group by and 
yields the correct count (17 instead of 72):

select count(*) over () as recordcount, tablea.colx, tableb.coly
from tablea, tableb
where ...
group by tablea.colx, tableb.coly *[all columns]*
limit 100

OR Solution B:

Put the count(*) over () in an outer select, like this:

select count(*) over () as recordcount, anon.* 
from (
  select distinct tablea.colx, tableb.coly
  from tablea, tableb
) as anon
limit 100

Either solution yields the correct answer I believe, but I'm having 
difficulty translating the SQL that I know will work into sqlalchemy land.

For Solution A, in the case of wanting to group by, I don't know how to get 
the full list of all selected columns to add to the group_by in such a way 
that even joinedload() will be included in the group by:

q = Session.query(class).join(joins).filter(...).option(joinedload(...))

q = q.group_by(* ??? How to tell sqlalchemy to group by all selected 
columns, even those which will be join loaded ???* )  

q = q.add_column(func.count().over().label('recordcount'))

For Solution B, I don't know how to wrap my query in an outer select 
(similar to the LIMIT implementation for Oracle) in a way that will still 
allow sqlalchemy to extract rows into instances:

This renders the correct SQL, I think:
qry = Session.query(qry.with_labels().statement, 
func.count().over().label('recordcount'))  

But I'm using SQL statement here so sqlalchemy won't translate result rows 
into object instances.

Can you point me in the right direction for one of these 2 solutions, 
please?

Many thanks,
Kent

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] DISTINCT with LIMIT problem

2013-05-30 Thread Michael Bayer

On May 30, 2013, at 5:19 PM, Kent jkentbo...@gmail.com wrote:

 
 Solution A:
 
 Group by all columns (yielding the same effect as distinct), but which 
 makes the window analytical function process after the group by and yields 
 the correct count (17 instead of 72):

are all those columns indexed?  even if they are, crappy query...

 
 OR Solution B:
 
 Put the count(*) over () in an outer select, like this:
 
 select count(*) over () as recordcount, anon.* 
 from (
   select distinct tablea.colx, tableb.coly
   from tablea, tableb
 ) as anon
 limit 100

this is very much how SQLAlchemy wants you to do it.

 
 Either solution yields the correct answer I believe, but I'm having 
 difficulty translating the SQL that I know will work into sqlalchemy land.
 
 
 For Solution B, I don't know how to wrap my query in an outer select (similar 
 to the LIMIT implementation for Oracle) in a way that will still allow 
 sqlalchemy to extract rows into instances:

from_self() can do this (probably use add_column() for the window function), or 
subquery() should work very well.   subq = q.subquery(); q = 
query(func.count('*').over().label(..), subq); .

send me a quick example if that's not working and I'll work out the query.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] DISTINCT with LIMIT problem

2013-05-30 Thread Kent
Thank you, I'll try that, but quick concern:  I specifically skipped 
trying to use .subquery() because the docs say Eager JOIN generation 
within the query is disabled.


Doesn't that mean I won't get my joinedload() results from the inner query?

Or does that refer to the outer query having eager join disabled?


On 5/30/2013 5:54 PM, Michael Bayer wrote:


On May 30, 2013, at 5:19 PM, Kent jkentbo...@gmail.com
mailto:jkentbo...@gmail.com wrote:



Solution A:

Group by all columns (yielding the same effect as distinct), but
which makes the window analytical function process *after* the group
by and yields the correct count (17 instead of 72):


are all those columns indexed?  even if they are, crappy query...



OR Solution B:

Put the count(*) over () in an outer select, like this:

select count(*) over () as recordcount, anon.*
from (
  select distinct tablea.colx, tableb.coly
  from tablea, tableb
) as anon
limit 100


this is very much how SQLAlchemy wants you to do it.



Either solution yields the correct answer I believe, but I'm having
difficulty translating the SQL that I know will work into sqlalchemy land.


For Solution B, I don't know how to wrap my query in an outer select
(similar to the LIMIT implementation for Oracle) in a way that will
still allow sqlalchemy to extract rows into instances:


from_self() can do this (probably use add_column() for the window
function), or subquery() should work very well.   subq = q.subquery(); q
= query(func.count('*').over().label(..), subq); .

send me a quick example if that's not working and I'll work out the query.

--
You received this message because you are subscribed to a topic in the
Google Groups sqlalchemy group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/sqlalchemy/_U28GXXR6sg/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to
sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] DISTINCT with LIMIT problem

2013-05-30 Thread Michael Bayer

On May 30, 2013, at 6:06 PM, Kent jkentbo...@gmail.com wrote:

 Thank you, I'll try that, but quick concern:  I specifically skipped trying 
 to use .subquery() because the docs say Eager JOIN generation within the 
 query is disabled.
 
 Doesn't that mean I won't get my joinedload() results from the inner query?
 
 Or does that refer to the outer query having eager join disabled?


if you want to eager load also from that subquery, you need to sitck it into an 
aliased:


MySubqClass = aliased(MyClass, subq)

query(x, MySubqClass).options(joinedload(MySubqClass.foobar))




 
 
 On 5/30/2013 5:54 PM, Michael Bayer wrote:
 
 On May 30, 2013, at 5:19 PM, Kent jkentbo...@gmail.com
 mailto:jkentbo...@gmail.com wrote:
 
 
 Solution A:
 
 Group by all columns (yielding the same effect as distinct), but
 which makes the window analytical function process *after* the group
 by and yields the correct count (17 instead of 72):
 
 are all those columns indexed?  even if they are, crappy query...
 
 
 OR Solution B:
 
 Put the count(*) over () in an outer select, like this:
 
 select count(*) over () as recordcount, anon.*
 from (
  select distinct tablea.colx, tableb.coly
  from tablea, tableb
 ) as anon
 limit 100
 
 this is very much how SQLAlchemy wants you to do it.
 
 
 Either solution yields the correct answer I believe, but I'm having
 difficulty translating the SQL that I know will work into sqlalchemy land.
 
 
 For Solution B, I don't know how to wrap my query in an outer select
 (similar to the LIMIT implementation for Oracle) in a way that will
 still allow sqlalchemy to extract rows into instances:
 
 from_self() can do this (probably use add_column() for the window
 function), or subquery() should work very well.   subq = q.subquery(); q
 = query(func.count('*').over().label(..), subq); .
 
 send me a quick example if that's not working and I'll work out the query.
 
 --
 You received this message because you are subscribed to a topic in the
 Google Groups sqlalchemy group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/sqlalchemy/_U28GXXR6sg/unsubscribe?hl=en.
 To unsubscribe from this group and all its topics, send an email to
 sqlalchemy+unsubscr...@googlegroups.com.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
 For more options, visit https://groups.google.com/groups/opt_out.
 
 
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to sqlalchemy+unsubscr...@googlegroups.com.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
 For more options, visit https://groups.google.com/groups/opt_out.
 
 

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.