Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-20 Thread Martijn van Oosterhout
Ok, so this is what I have for today. It works, and handles all kinds of 
corner cases and yet it's not quite what I want. It does everything as a 
joinedload. It's much easier to use now though.

You can do things like:

q = Session.query(Animal)

for animal in yielded_load(q, (joinedload(Animal.owner).joinedload(Human.family),
                               joinedload(Animal.species).joinedload(Species.phylum))):
    do_something(animal)

It says joinedload() but it doesn't actually pay attention to that; it just 
uses it to determine the path. It would be really nice to be able to 
specify that some things should be fetched using subqueryload(), but that 
would require unpacking/manipulating the Load objects and I don't think 
there's a supported interface for that. Additionally, it would be nice if 
it could notice that paths share a common prefix and only fetch those once. 
Still, for the amount of code it's pretty effective.

from itertools import groupby, islice
from sqlalchemy.orm import attributes, Load, aliased
from sqlalchemy import tuple_


def yielded_load(query, load_options, N=1000):
    # Note: query must return only a single object (for now anyway)
    main_query = query.yield_per(N)

    main_res = iter(main_query)

    while True:
        # Fetch block of results from query
        objs = list(islice(main_res, N))

        if not objs:
            break

        for load_option in load_options:
            # Get path of attributes to follow
            path = load_option.path
            pk = path[0].prop.parent.primary_key

            # Generate query that joins against original table
            child_q = main_query.session.query().order_by(*pk)

            for i, attr in enumerate(path):
                if i == 0:
                    # For the first relationship we add the target and the pkey columns
                    # Note: add_columns() doesn't work here? with_entities() does
                    next_table = attr.prop.target
                    child_q = child_q.join(next_table, attr)
                    child_q = child_q.with_entities(attr.prop.mapper).add_columns(*pk)
                    if attr.prop.order_by:
                        child_q = child_q.order_by(*attr.prop.order_by)
                    opts = Load(attr.prop.mapper)
                else:
                    # For all relationships after the first we can use contains_eager
                    # Note: The aliasing is to handle cases where the relationships loop
                    next_table = aliased(attr.prop.target)
                    child_q = child_q.join(next_table, attr, isouter=True)
                    opts = opts.contains_eager(attr, alias=next_table)

            child_q = child_q.options(opts)

            keys = [[getattr(obj, col.key) for col in pk] for obj in objs]

            child_q = child_q.filter(tuple_(*pk).in_(keys))

            # Here we use the fact that the first column is the target object
            collections = dict((k, [r[0] for r in v]) for k, v in groupby(
                child_q,
                lambda x: tuple([getattr(x, c.key) for c in pk])
            ))

            for obj in objs:
                # We can traverse many-to-one and one-to-many
                if path[0].prop.uselist:
                    attributes.set_committed_value(
                        obj,
                        path[0].key,
                        collections.get(
                            tuple(getattr(obj, c.key) for c in pk),
                            ())
                    )
                else:
                    attributes.set_committed_value(
                        obj,
                        path[0].key,
                        collections.get(
                            tuple(getattr(obj, c.key) for c in pk),
                            [None])[0]
                    )

        for obj in objs:
            yield obj
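Incidentally, the composite-key filter used above, tuple_(*pk).in_(keys), can be exercised in isolation. Here's a minimal sketch with a hypothetical two-column primary key (the table and column names are made up for illustration; row-value IN needs backend support, e.g. PostgreSQL or a reasonably recent SQLite):

```python
from sqlalchemy import (Column, Integer, MetaData, Table, create_engine,
                        select, tuple_)

metadata = MetaData()
# Hypothetical table with a composite primary key (a, b)
t = Table(
    "t", metadata,
    Column("a", Integer, primary_key=True),
    Column("b", Integer, primary_key=True),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(t.insert(), [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 1}])

    # Filter on the composite key the same way the recipe does with tuple_()
    keys = [(1, 2), (2, 1)]
    rows = conn.execute(
        select(t.c.a, t.c.b)
        .where(tuple_(t.c.a, t.c.b).in_(keys))
        .order_by(t.c.a, t.c.b)
    ).fetchall()

result = [tuple(r) for r in rows]
```

Only the two rows whose (a, b) pair appears in keys come back, which is exactly the per-block parent-key filter the recipe relies on.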


-- 
Martijn

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-20 Thread Martijn van Oosterhout
On 19 July 2016 at 23:22, Mike Bayer  wrote:

>
>
> On 07/19/2016 05:20 PM, Martijn van Oosterhout wrote:
>
>>
>>
>> Thanks. On the way home though I had a thought: wouldn't it be simpler
>> to run the original query with yield_from(), and then after each block
>> run the query with a filter on the primary keys returned, and add all
>> the joinedload/subqueryload/etc options to this query, run it and rely
>> on the identity map to fix it for the objects returned the first time.
>> Or is that something we cannot rely on?
>>
>
> it works for the loading you're doing, where the primary keys of what's
> been fetched are fed into the subsequent query.  But it doesn't work for
> current subquery loading, which does not make use of those identifiers, nor
> for joined loading, which does an OUTER JOIN onto the original query at once
> (doing "joinedload" as a separate query is essentially what subquery
> loading already is).
>
>
Ah, good point. Pity. I like the whole generative interface for the
joinedload/subqueryload/etc and would have liked to reuse that machinery
somehow. Given I'm trying to eager load several levels of relationships,
it'd be nice to automate that somehow...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-19 Thread Mike Bayer



On 07/19/2016 05:20 PM, Martijn van Oosterhout wrote:


On 19 July 2016 at 18:42, Mike Bayer wrote:



On 07/19/2016 12:33 PM, Martijn van Oosterhout wrote:

On Sunday, July 17, 2016 at 8:47:11 AM UTC+2, Martijn van
Oosterhout wrote:

But in any case, even this improves performance greatly.


this is adequate for a new recipe if you are interested in adding
it. Also I think sqlalchemy-utils provides a feature along these
lines but I'm not sure how well it works or handles those harder
cases like chained relationships.


Thanks. On the way home though I had a thought: wouldn't it be simpler
to run the original query with yield_from(), and then after each block
run the query with a filter on the primary keys returned, and add all
the joinedload/subqueryload/etc options to this query, run it and rely
on the identity map to fix it for the objects returned the first time.
Or is that something we cannot rely on?


it works for the loading you're doing, where the primary keys of what's 
been fetched are fed into the subsequent query.  But it doesn't work for 
current subquery loading, which does not make use of those identifiers, 
nor for joined loading, which does an OUTER JOIN onto the original query 
at once (doing "joinedload" as a separate query is essentially what 
subquery loading already is).






Have a nice day,
--
Martijn van Oosterhout  http://svana.org/kleptog/





Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-19 Thread Martijn van Oosterhout
On 19 July 2016 at 18:42, Mike Bayer  wrote:

>
>
> On 07/19/2016 12:33 PM, Martijn van Oosterhout wrote:
>
>> On Sunday, July 17, 2016 at 8:47:11 AM UTC+2, Martijn van Oosterhout
>> wrote:
>>
>> But in any case, even this improves performance greatly.
>>
>
> this is adequate for a new recipe if you are interested in adding it. Also
> I think sqlalchemy-utils provides a feature along these lines but I'm not
> sure how well it works or handles those harder cases like chained
> relationships.
>
>
Thanks. On the way home though I had a thought: wouldn't it be simpler to
run the original query with yield_from(), and then after each block run the
query with a filter on the primary keys returned, and add all the
joinedload/subqueryload/etc options to this query, run it and rely on the
identity map to fix it for the objects returned the first time. Or is that
something we cannot rely on?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-19 Thread Mike Bayer



On 07/19/2016 12:33 PM, Martijn van Oosterhout wrote:

On Sunday, July 17, 2016 at 8:47:11 AM UTC+2, Martijn van Oosterhout wrote:


I'll play a bit and see what I can get to work. Thanks again.


So, I have a chance to play and got something that actually works quite
nicely, see below. Two things:

- I switched to referencing the primary key of the original object
directly, because some of our relationships are a bit more complex.
- Chained relationships don't work.

But in any case, even this improves performance greatly.


this is adequate for a new recipe if you are interested in adding it. 
Also I think sqlalchemy-utils provides a feature along these lines but 
I'm not sure how well it works or handles those harder cases like 
chained relationships.







from itertools import groupby, islice
from sqlalchemy.orm import attributes, object_session
from sqlalchemy import tuple_


def yielded_load(query, attrs, N=1000):
    # Note: query must return only a single object (for now anyway)
    main_query = query.yield_per(N)

    main_res = iter(main_query)

    while True:
        # Fetch block of results from query
        objs = list(islice(main_res, N))

        if not objs:
            break

        for attr in attrs:
            target = attr.prop.mapper
            pk = attr.prop.parent.primary_key

            # Generate query that joins against original table
            child_q = object_session(objs[0]).query(target, *pk).order_by(*pk)
            if attr.prop.order_by:
                child_q = child_q.order_by(*attr.prop.order_by)

            keys = [[getattr(obj, col.key) for col in pk] for obj in objs]

            child_q = child_q.join(attr).filter(tuple_(*pk).in_(keys))

            collections = dict((k, [r[0] for r in v]) for k, v in groupby(
                child_q,
                lambda x: tuple([getattr(x, c.key) for c in pk])
            ))

            for obj in objs:
                attributes.set_committed_value(
                    obj,
                    attr.key,
                    collections.get(
                        tuple(getattr(obj, c.key) for c in pk),
                        ())
                )

        for obj in objs:
            yield obj


--
Martijn





Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-19 Thread Martijn van Oosterhout
On Sunday, July 17, 2016 at 8:47:11 AM UTC+2, Martijn van Oosterhout wrote:
>
>
> I'll play a bit and see what I can get to work. Thanks again.
>
>
So, I have a chance to play and got something that actually works quite 
nicely, see below. Two things:

- I switched to referencing the primary key of the original object 
directly, because some of our relationships are a bit more complex.
- Chained relationships don't work.

But in any case, even this improves performance greatly.


from itertools import groupby, islice
from sqlalchemy.orm import attributes, object_session
from sqlalchemy import tuple_


def yielded_load(query, attrs, N=1000):
    # Note: query must return only a single object (for now anyway)
    main_query = query.yield_per(N)

    main_res = iter(main_query)

    while True:
        # Fetch block of results from query
        objs = list(islice(main_res, N))

        if not objs:
            break

        for attr in attrs:
            target = attr.prop.mapper
            pk = attr.prop.parent.primary_key

            # Generate query that joins against original table
            child_q = object_session(objs[0]).query(target, *pk).order_by(*pk)
            if attr.prop.order_by:
                child_q = child_q.order_by(*attr.prop.order_by)

            keys = [[getattr(obj, col.key) for col in pk] for obj in objs]

            child_q = child_q.join(attr).filter(tuple_(*pk).in_(keys))

            collections = dict((k, [r[0] for r in v]) for k, v in groupby(
                child_q,
                lambda x: tuple([getattr(x, c.key) for c in pk])
            ))

            for obj in objs:
                attributes.set_committed_value(
                    obj,
                    attr.key,
                    collections.get(
                        tuple(getattr(obj, c.key) for c in pk),
                        ())
                )

        for obj in objs:
            yield obj


 
-- 
Martijn



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-17 Thread Martijn van Oosterhout
On 15 Jul 2016 23:57, "Mike Bayer"  wrote:
>
>
>
> OK, but the IN loading you're doing, that's good too, I'd like to add a
loader which uses that someday, and you're right, if we did, we could make
it work with yield_per too due to the nature of it.  If you want to use
that approach take a look at
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading.

Wow, that's another really nice example. Those usage examples really need
some more prominence in the google search results.

The windowed approach has the advantage of simpler queries, but it
requires scanning the table twice (although the first scan might be cheaper).
The IN() approach is perhaps more general, but the queries would be much
longer, so there's more traffic to the server. It would be nice if it could
all be made to Just Work.
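For reference, the core idea of that windowed recipe can be sketched roughly like this: use row_number() to pick every Nth primary key as a window boundary, then page the main query by key range. This is a simplification of the actual recipe, and the names (items, pk_ranges) are illustrative only:

```python
from sqlalchemy import (Column, Integer, MetaData, Table, create_engine,
                        func, select)

metadata = MetaData()
# Hypothetical stand-in for the "bigtable" being paged over
items = Table("items", metadata, Column("id", Integer, primary_key=True))


def pk_ranges(conn, pk_col, window):
    """Yield (lower, upper) bounds covering the table in chunks of
    `window` rows: lower <= pk < upper, with upper None for the final,
    open-ended range."""
    # Every row whose row_number is 1 modulo `window` starts a new window
    rn = func.row_number().over(order_by=pk_col).label("rn")
    sub = select(pk_col.label("pk"), rn).subquery()
    bounds = [row.pk for row in conn.execute(
        select(sub.c.pk).where(sub.c.rn % window == 1).order_by(sub.c.pk))]
    for i, lower in enumerate(bounds):
        upper = bounds[i + 1] if i + 1 < len(bounds) else None
        yield lower, upper


engine = create_engine("sqlite://")
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(items.insert(), [{"id": i} for i in range(1, 11)])
    ranges = list(pk_ranges(conn, items.c.id, window=3))
```

Each (lower, upper) pair then becomes a WHERE clause on the main query, and eager-load options can be applied to it normally since no cursor is held open across yields.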

I'll play a bit and see what I can get to work. Thanks again.

Martijn



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Mike Bayer



On 07/15/2016 05:14 PM, Martijn van Oosterhout wrote:


On 15 July 2016 at 18:46, Mike Bayer wrote:


Here's the problem that can't be solved:

1. fetch rows 0-1000

2. start loading Foo objects:

 Foo(1) ->  eagerload Foo.related with 400 members
 Foo(2) ->  eagerload Foo.related with 500 members
 Foo(3) -> eagerload Foo.related with the first 100 of 250 members

3. yield.  That is, stop reading more rows.  Send out Foo(1),
Foo(2), Foo(3).  The application now can go nuts with these.  It can
read them, write them, iterate through .related.

Right there, we're broken.  Foo(3) is out in the world with less
than half of its ".related" collection - they are still sitting on
the cursor!


Ok, I think I see what you're getting at, but I don't think that's a
problem here. It looks like you have a query which joins all the
relationships in one go. That's not what I'm doing; I'm relying on the
fact that I can interleave the queries. At the SQL level it would look like so:

DECLARE my_cursor CURSOR FOR SELECT * FROM bigtable;
FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (1,2,3,4,5,)
SELECT * FROM related_table2 WHERE bigtable_id IN (1,2,3,4,5,)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (1,2,3,4,5,)

FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (11,12,13,14,15,)
SELECT * FROM related_table2 WHERE bigtable_id IN (11,12,13,14,15,)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (11,12,13,14,15,)

<... repeat ...>
(this may be a postgresql specific feature, not sure).

Yes, this may mean that some of the related objects may be fetched
multiple times, but this is not a problem for me (the whole thing is
read-only anyway). What I need however is for the secondary queries to
populate the relationships in the original BigTable objects.

Vastly simpler is to use the recommended window functions to do
pagination of any scale with no chance of complex failures.


A bit of googling suggests you are referring to this:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
which indeed looks very neat and much easier than what I was thinking.
It doesn't say explicitly, but it looks like it will work transparently
with eager loading. It basically does the above, but skips the cursor
and replaces it with queries on ranges of the primary key (which is
shorter and probably more efficient to boot).


OK, but the IN loading you're doing, that's good too, I'd like to add a 
loader which uses that someday, and you're right, if we did, we could 
make it work with yield_per too due to the nature of it.  If you want to 
use that approach take a look at 
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading.




Thanks for the tip!
--
Martijn van Oosterhout  http://svana.org/kleptog/





Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Martijn van Oosterhout
On 15 July 2016 at 18:46, Mike Bayer  wrote:


> Here's the problem that can't be solved:
>
> 1. fetch rows 0-1000
>
> 2. start loading Foo objects:
>
>  Foo(1) ->  eagerload Foo.related with 400 members
>  Foo(2) ->  eagerload Foo.related with 500 members
>  Foo(3) -> eagerload Foo.related with the first 100 of 250 members
>
> 3. yield.  That is, stop reading more rows.  Send out Foo(1), Foo(2),
> Foo(3).  The application now can go nuts with these.  It can read them,
> write them, iterate through .related.
>
> Right there, we're broken.  Foo(3) is out in the world with less than half
> of its ".related" collection - they are still sitting on the cursor!
>
>
Ok, I think I see what you're getting at, but I don't think that's a problem
here. It looks like you have a query which joins all the relationships in
one go. That's not what I'm doing; I'm relying on the fact that I can
interleave the queries. At the SQL level it would look like so:

DECLARE my_cursor CURSOR FOR SELECT * FROM bigtable;
FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (1,2,3,4,5,)
SELECT * FROM related_table2 WHERE bigtable_id IN (1,2,3,4,5,)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (1,2,3,4,5,)

FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (11,12,13,14,15,)
SELECT * FROM related_table2 WHERE bigtable_id IN (11,12,13,14,15,)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (11,12,13,14,15,)

<... repeat ...>
(this may be a postgresql specific feature, not sure).

Yes, this may mean that some of the related objects may be fetched multiple
times, but this is not a problem for me (the whole thing is read-only
anyway). What I need however is for the secondary queries to populate the
relationships in the original BigTable objects.

Vastly simpler is to use the recommended window functions to do pagination
> of any scale with no chance of complex failures.
>
>
A bit of googling suggests you are referring to this:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
which indeed looks very neat and much easier than what I was thinking. It
doesn't say explicitly, but it looks like it will work transparently with
eager loading. It basically does the above, but skips the cursor and
replaces it with queries on ranges of the primary key (which is shorter and
probably more efficient to boot).

Thanks for the tip!
-- 
Martijn van Oosterhout  http://svana.org/kleptog/



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Mike Bayer



On 07/15/2016 12:08 PM, Martijn van Oosterhout wrote:

Ok, so SQLAlchemy has this nice feature where you can eager load
relationships to significantly reduce the number of queries during
processing.

On the other hand, to reduce memory usage you can use yield_per() (on
Postgres) to significantly reduce the memory usage by not loading the
entire database in memory at once.

For very good reasons mentioned in the documentation you can't use both
of these in the same query, yet that is kind of my goal. What I'd like
to achieve, for a given query which goes over a big table:

while not end of resultset:
   take 1000 results
   eagerload all the relationships
   process them



Here's the problem that can't be solved:

1. fetch rows 0-1000

2. start loading Foo objects:

 Foo(1) ->  eagerload Foo.related with 400 members
 Foo(2) ->  eagerload Foo.related with 500 members
 Foo(3) -> eagerload Foo.related with the first 100 of 250 members

3. yield.  That is, stop reading more rows.  Send out Foo(1), Foo(2), 
Foo(3).  The application now can go nuts with these.  It can read them, 
write them, iterate through .related.


Right there, we're broken.  Foo(3) is out in the world with less than 
half of its ".related" collection - they are still sitting on the cursor!


So let's try: don't actually yield Foo(3) until we've seen Foo(4), or 
the result has ended. That is, let's try to be clever and look at the 
next row to see that we're definitely done with Foo(3). This is quite 
complicated to do, but I did manage to make it work this way when I 
tried.


This is more insidious, because it will actually work most of the time. 
 However, it won't work when we don't actually get objects in order 
like that.  If Foo.related is a many-to-many, we could see a Bar in one 
yield batch, then see it all over again in another batch later.  What if 
the user application did all kinds of things to that Bar() in the first 
place, now we're seeing it again, but the identity map has already been 
garbage collected.  Now that Bar() is stale *before it ever left its 
result set*, not to mention if that Bar() has more eagerloaded 
collections, they're all wrong too!  It's a disaster.


Vastly simpler is to use the recommended window functions to do 
pagination of any scale with no chance of complex failures.









Now, the eager loading part is posing difficulties (or I'm not reading
the documentation carefully enough). I found the
attributes.set_committed_value() function which solves half the problem,
but I still need to generate the actual query to return the necessary
objects. So perhaps (pseudo-code):

def eagerload_for_set(object_list, relationship):
    ids = set(o.get(relationship.left_id) for o in object_list)
    lookup = Query(relationship.right_table).filter_by(
        relationship.right_column.in_(ids)).all()
    for o in object_list:
        o.set_committed_value(o, relationship.left,
                              lookup[relationship.left_id])

Before I go diving into the SQLAlchemy to make the above actually work,
does it seem reasonable? Are there are handy utils somewhere that might
help?

Thanks for any ideas,

Have a nice day,

--
Martijn van Oosterhout







[sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Martijn van Oosterhout
Ok, so SQLAlchemy has this nice feature where you can eager load 
relationships to significantly reduce the number of queries during 
processing.

On the other hand, to reduce memory usage you can use yield_per() (on 
Postgres) to significantly reduce the memory usage by not loading the 
entire database in memory at once.

For very good reasons mentioned in the documentation you can't use both of 
these in the same query, yet that is kind of my goal. What I'd like to 
achieve, for a given query which goes over a big table:

while not end of resultset:
   take 1000 results
   eagerload all the relationships
   process them

Now, the eager loading part is posing difficulties (or I'm not reading the 
documentation carefully enough). I found the 
attributes.set_committed_value() function which solves half the problem, 
but I still need to generate the actual query to return the necessary 
objects. So perhaps (pseudo-code):

def eagerload_for_set(object_list, relationship):
    ids = set(o.get(relationship.left_id) for o in object_list)
    lookup = Query(relationship.right_table).filter_by(
        relationship.right_column.in_(ids)).all()
    for o in object_list:
        o.set_committed_value(o, relationship.left,
                              lookup[relationship.left_id])

Before I go diving into the SQLAlchemy to make the above actually work, 
does it seem reasonable? Are there are handy utils somewhere that might 
help?
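For what it's worth, the set_committed_value() half of the idea can be demonstrated in isolation: load the parents alone, fetch the children in a second query, and attach them so that no lazy load ever fires. A minimal sketch with hypothetical Parent/Child models (assumes SQLAlchemy 1.4+ for the sqlalchemy.orm.declarative_base import):

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import (Session, attributes, declarative_base,
                            relationship)

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Parent(id=1, children=[Child(id=10), Child(id=11)]))
    session.commit()

# Fresh session: load the parent alone, fetch its children with a second
# query, then attach them. Accessing parent.children afterwards emits no
# lazy-load SQL because the collection is marked as already loaded.
with Session(engine) as session:
    parent = session.query(Parent).one()
    kids = (session.query(Child)
            .filter(Child.parent_id == parent.id)
            .order_by(Child.id).all())
    attributes.set_committed_value(parent, "children", kids)
    child_ids = [c.id for c in parent.children]
```

The remaining work, as noted above, is generating that second query from the relationship itself rather than writing it by hand.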

Thanks for any ideas,

Have a nice day,

-- 
Martijn van Oosterhout


