subject:"\[sqlalchemy\] \[Q\]\[SA 0.7.9\] yield_per\(\) vs. joinedload\(\) and subqueryload\(\)"

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-05 Thread Ladislav Lenart

Hello.

I have one more question about my approach to WindowedRangeQuery:

 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery recipe.
* I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q and
 execute it in a loop.

Suppose I have a complex query (without options() / order_by()) to obtain the
objects I want. I use this query to calculate window intervals. I don't have to
use it again to fetch the objects, because I already have their id intervals. Am
I right?


Thank you,

Ladislav Lenart


On 4.6.2013 19:15, Ladislav Lenart wrote:
 On 4.6.2013 18:49, Michael Bayer wrote:

 On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote:


 Absolutely, you should do whatever you have to in order to get the range
 you want, in fact the recipe even says this...

 Ok. What I want to do is basically the following:
 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery 
 recipe.
* I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q and
 execute it in a loop.

 Can this strategy work?

 there's not a publicly supported feature to reset the options right now so 
 you'd probably need to apply them after you get your window ranging query.   
You can probably remove the effect of eager loaders if you were to just 
 assign a blank _attributes dictionary to the new query though (just no long 
 term guarantees).

 other than that, I use the windowing concept extensively and it works very 
 well.
 
 Ok, I will try it.
 
 Thank you very much for your invaluable insights,
 
 Ladislav Lenart

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-05 Thread Ladislav Lenart

Hello.

I've tried an experiment to verify that yield_per() with subqueryload() behaves
as badly as you described, but according to my practical observation, it issues
ONE subqueryload() and everything behaves as I would expect. It emits two
SELECTs, one to fetch the objects and the second one to fetch the related data
and that's about it, see below. So back to square one, what is wrong with it?


Base = declarative_base()

class Contact(Base):
# Just a sketch, I hope you get the picture.
id,
name,
phones = relationship(Phone) # many

class Phone(Base):
# Just a sketch, I hope you get the picture.
id,
number,
contact = relationship(Contact) # one

# Setup engine with echo set to True.

phones = ['123456789', '987654321', '555777999']
for i in range(1, 11):
c = Contact(name=u' '.join([u'Contact', unicode(i)]))
session.add(c)
session.add_all(Phone(contact=c, number=e) for e in phones)
session.flush()
session.expunge_all()

q = session.query(Contact).options(subqueryload(Contact.phones))
for each in q.yield_per(2):
print each.last_name
for e in each.phones:
print e

The output is like this:
SA info about all the inserts after session.flush().
SA info about select for contacts.
SA info about select for their phones.
Contact 10
123456789
987654321
555777999
Contact 9
123456789
987654321
555777999
Contact 8
123456789
987654321
555777999
Contact 7
123456789
987654321
555777999
Contact 6
123456789
987654321
555777999
Contact 5
123456789
987654321
555777999
Contact 4
123456789
987654321
555777999
Contact 3
123456789
987654321
555777999
Contact 2
123456789
987654321
555777999
Contact 1
123456789
987654321
555777999


Thank you,

Ladislav Lenart


On 5.6.2013 11:26, Ladislav Lenart wrote:
 Hello.
 
 I have one more question about my approach to WindowedRangeQuery:
 
 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery recipe.
* I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q and
 execute it in a loop.
 
 Suppose I have a complex query (without options() / order_by()) to obtain the
 objects I want. I use this query to calculate window intervals. I don't have 
 to
 use it again to fetch the objects, because I already have their id intervals. 
 Am
 I right?
 
 
 Thank you,
 
 Ladislav Lenart
 
 
 On 4.6.2013 19:15, Ladislav Lenart wrote:
 On 4.6.2013 18:49, Michael Bayer wrote:

 On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote:


 Absolutely, you should do whatever you have to in order to get the range
 you want, in fact the recipe even says this...

 Ok. What I want to do is basically the following:
 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery 
 recipe.
* I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q 
 and
 execute it in a loop.

 Can this strategy work?

 there's not a publicly supported feature to reset the options right now 
 so you'd probably need to apply them after you get your window ranging 
 query.  You can probably remove the effect of eager loaders if you were 
 to just assign a blank _attributes dictionary to the new query though (just 
 no long term guarantees).

 other than that, I use the windowing concept extensively and it works very 
 well.

 Ok, I will try it.

 Thank you very much for your invaluable insights,

 Ladislav Lenart

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-05 Thread Ladislav Lenart

Hello.

One more note. I've just tried the below experiment with joinedload() instead of
subqueryload() and that does NOT work just as you expected. One contact is
returned several times and the first occurrences have incomplete phones.

However my experiments with subqueryload() suggest that it works just fine with
yield_per(). Can you elaborate on that one please?


Thank you,

Ladislav Lenart


On 5.6.2013 14:03, Ladislav Lenart wrote:
 Hello.
 
 I've tried an experiment to verify that yield_per() with subqueryload() 
 behaves
 as badly as you described, but according to my practical observation, it 
 issues
 ONE subqueryload() and everything behaves as I would expect. It emits two
 SELECTs, one to fetch the objects and the second one to fetch the related data
 and that's about it, see below. So back to square one, what is wrong with it?
 
 
 Base = declarative_base()
 
 class Contact(Base):
 # Just a sketch, I hope you get the picture.
 id,
 name,
 phones = relationship(Phone) # many
 
 class Phone(Base):
 # Just a sketch, I hope you get the picture.
 id,
 number,
 contact = relationship(Contact) # one
 
 # Setup engine with echo set to True.
 
 phones = ['123456789', '987654321', '555777999']
 for i in range(1, 11):
 c = Contact(name=u' '.join([u'Contact', unicode(i)]))
 session.add(c)
 session.add_all(Phone(contact=c, number=e) for e in phones)
 session.flush()
 session.expunge_all()
 
 q = session.query(Contact).options(subqueryload(Contact.phones))
 for each in q.yield_per(2):
 print each.last_name
 for e in each.phones:
 print e
 
 The output is like this:
 SA info about all the inserts after session.flush().
 SA info about select for contacts.
 SA info about select for their phones.
 Contact 10
 123456789
 987654321
 555777999
 Contact 9
 123456789
 987654321
 555777999
 Contact 8
 123456789
 987654321
 555777999
 Contact 7
 123456789
 987654321
 555777999
 Contact 6
 123456789
 987654321
 555777999
 Contact 5
 123456789
 987654321
 555777999
 Contact 4
 123456789
 987654321
 555777999
 Contact 3
 123456789
 987654321
 555777999
 Contact 2
 123456789
 987654321
 555777999
 Contact 1
 123456789
 987654321
 555777999
 
 
 Thank you,
 
 Ladislav Lenart
 
 
 On 5.6.2013 11:26, Ladislav Lenart wrote:
 Hello.

 I have one more question about my approach to WindowedRangeQuery:

 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery 
 recipe.
* I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q and
 execute it in a loop.

 Suppose I have a complex query (without options() / order_by()) to obtain the
 objects I want. I use this query to calculate window intervals. I don't have 
 to
 use it again to fetch the objects, because I already have their id 
 intervals. Am
 I right?


 Thank you,

 Ladislav Lenart


 On 4.6.2013 19:15, Ladislav Lenart wrote:
 On 4.6.2013 18:49, Michael Bayer wrote:

 On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote:


 Absolutely, you should do whatever you have to in order to get the range
 you want, in fact the recipe even says this...

 Ok. What I want to do is basically the following:
 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery 
 recipe.
* I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q 
 and
 execute it in a loop.

 Can this strategy work?

 there's not a publicly supported feature to reset the options right now 
 so you'd probably need to apply them after you get your window ranging 
 query.  You can probably remove the effect of eager loaders if you 
 were to just assign a blank _attributes dictionary to the new query though 
 (just no long term guarantees).

 other than that, I use the windowing concept extensively and it works very 
 well.

 Ok, I will try it.

 Thank you very much for your invaluable insights,

 Ladislav Lenart

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-05 Thread Michael Bayer



the Query emitted by subqueryload does not use yield_per.  so if your total 
result set is 1000 rows, and the total rows represented by all the collections 
is 1,  the first time that query is emitted, 1 rows will be fully 
loaded and processed into memory at once.  This would occur typically somewhere 
in the first few rows of your 50 -row yield_per batch.   So the intent of 
yield_per, which is to conserve memory and upfront loading overhead, would be 
defeated entirely by this.

The subqueryload is emitting once if I recall correctly because I probably at 
some point have the query result being memoized in the query context to prevent 
it from being emitted many times in a yield_per scenario.


On Jun 5, 2013, at 10:20 AM, Ladislav Lenart lenart...@volny.cz wrote:

 Hello.
 
 One more note. I've just tried the below experiment with joinedload() instead 
 of
 subqueryload() and that does NOT work just as you expected. One contact is
 returned several times and the first occurrences have incomplete phones.
 
 However my experiments with subqueryload() suggest that it works just fine 
 with
 yield_per(). Can you elaborate on that one please?
 
 
 Thank you,
 
 Ladislav Lenart
 
 
 On 5.6.2013 14:03, Ladislav Lenart wrote:
 Hello.
 
 I've tried an experiment to verify that yield_per() with subqueryload() 
 behaves
 as badly as you described, but according to my practical observation, it 
 issues
 ONE subqueryload() and everything behaves as I would expect. It emits two
 SELECTs, one to fetch the objects and the second one to fetch the related 
 data
 and that's about it, see below. So back to square one, what is wrong with it?
 
 
 Base = declarative_base()
 
 class Contact(Base):
# Just a sketch, I hope you get the picture.
id,
name,
phones = relationship(Phone) # many
 
 class Phone(Base):
# Just a sketch, I hope you get the picture.
id,
number,
contact = relationship(Contact) # one
 
 # Setup engine with echo set to True.
 
 phones = ['123456789', '987654321', '555777999']
 for i in range(1, 11):
c = Contact(name=u' '.join([u'Contact', unicode(i)]))
session.add(c)
session.add_all(Phone(contact=c, number=e) for e in phones)
 session.flush()
 session.expunge_all()
 
 q = session.query(Contact).options(subqueryload(Contact.phones))
 for each in q.yield_per(2):
print each.last_name
for e in each.phones:
print e
 
 The output is like this:
 SA info about all the inserts after session.flush().
 SA info about select for contacts.
 SA info about select for their phones.
 Contact 10
 123456789
 987654321
 555777999
 Contact 9
 123456789
 987654321
 555777999
 Contact 8
 123456789
 987654321
 555777999
 Contact 7
 123456789
 987654321
 555777999
 Contact 6
 123456789
 987654321
 555777999
 Contact 5
 123456789
 987654321
 555777999
 Contact 4
 123456789
 987654321
 555777999
 Contact 3
 123456789
 987654321
 555777999
 Contact 2
 123456789
 987654321
 555777999
 Contact 1
 123456789
 987654321
 555777999
 
 
 Thank you,
 
 Ladislav Lenart
 
 
 On 5.6.2013 11:26, Ladislav Lenart wrote:
 Hello.
 
 I have one more question about my approach to WindowedRangeQuery:
 
 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
 * It has no options.
   * Can I reset the options with q = q.options(None)?
 * It has select with the over by magic taken from WindowedRangeQuery 
 recipe.
   * I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q 
 and
 execute it in a loop.
 
 Suppose I have a complex query (without options() / order_by()) to obtain 
 the
 objects I want. I use this query to calculate window intervals. I don't 
 have to
 use it again to fetch the objects, because I already have their id 
 intervals. Am
 I right?
 
 
 Thank you,
 
 Ladislav Lenart
 
 
 On 4.6.2013 19:15, Ladislav Lenart wrote:
 On 4.6.2013 18:49, Michael Bayer wrote:
 
 On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote:
 
 
 Absolutely, you should do whatever you have to in order to get the range
 you want, in fact the recipe even says this...
 
 Ok. What I want to do is basically the following:
 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
 * It has no options.
   * Can I reset the options with q = q.options(None)?
 * It has select with the over by magic taken from WindowedRangeQuery 
 recipe.
   * I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q 
 and
 execute it in a loop.
 
 Can this strategy work?
 
 there's not a publicly supported feature to reset the options right now 
 so you'd probably need to apply them after you get your window ranging 
 query.  You can probably remove the effect of eager loaders

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-05 Thread Ladislav Lenart

Ok, so yield_per() is useless when subqueryload() is used.

Thank you,

Ladislav Lenart


On 5.6.2013 16:27, Michael Bayer wrote:
 the Query emitted by subqueryload does not use yield_per.  so if your total 
 result set is 1000 rows, and the total rows represented by all the 
 collections is 1,  the first time that query is emitted, 1 rows will 
 be fully loaded and processed into memory at once.  This would occur 
 typically somewhere in the first few rows of your 50 -row yield_per batch.   
 So the intent of yield_per, which is to conserve memory and upfront loading 
 overhead, would be defeated entirely by this.
 
 The subqueryload is emitting once if I recall correctly because I probably at 
 some point have the query result being memoized in the query context to 
 prevent it from being emitted many times in a yield_per scenario.
 
 
 On Jun 5, 2013, at 10:20 AM, Ladislav Lenart lenart...@volny.cz wrote:
 
 Hello.

 One more note. I've just tried the below experiment with joinedload() 
 instead of
 subqueryload() and that does NOT work just as you expected. One contact is
 returned several times and the first occurrences have incomplete phones.

 However my experiments with subqueryload() suggest that it works just fine 
 with
 yield_per(). Can you elaborate on that one please?


 Thank you,

 Ladislav Lenart


 On 5.6.2013 14:03, Ladislav Lenart wrote:
 Hello.

 I've tried an experiment to verify that yield_per() with subqueryload() 
 behaves
 as badly as you described, but according to my practical observation, it 
 issues
 ONE subqueryload() and everything behaves as I would expect. It emits two
 SELECTs, one to fetch the objects and the second one to fetch the related 
 data
 and that's about it, see below. So back to square one, what is wrong with 
 it?


 Base = declarative_base()

 class Contact(Base):
# Just a sketch, I hope you get the picture.
id,
name,
phones = relationship(Phone) # many

 class Phone(Base):
# Just a sketch, I hope you get the picture.
id,
number,
contact = relationship(Contact) # one

 # Setup engine with echo set to True.

 phones = ['123456789', '987654321', '555777999']
 for i in range(1, 11):
c = Contact(name=u' '.join([u'Contact', unicode(i)]))
session.add(c)
session.add_all(Phone(contact=c, number=e) for e in phones)
 session.flush()
 session.expunge_all()

 q = session.query(Contact).options(subqueryload(Contact.phones))
 for each in q.yield_per(2):
print each.last_name
for e in each.phones:
print e

 The output is like this:
 SA info about all the inserts after session.flush().
 SA info about select for contacts.
 SA info about select for their phones.
 Contact 10
 123456789
 987654321
 555777999
 Contact 9
 123456789
 987654321
 555777999
 Contact 8
 123456789
 987654321
 555777999
 Contact 7
 123456789
 987654321
 555777999
 Contact 6
 123456789
 987654321
 555777999
 Contact 5
 123456789
 987654321
 555777999
 Contact 4
 123456789
 987654321
 555777999
 Contact 3
 123456789
 987654321
 555777999
 Contact 2
 123456789
 987654321
 555777999
 Contact 1
 123456789
 987654321
 555777999


 Thank you,

 Ladislav Lenart


 On 5.6.2013 11:26, Ladislav Lenart wrote:
 Hello.

 I have one more question about my approach to WindowedRangeQuery:

 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
 * It has no options.
   * Can I reset the options with q = q.options(None)?
 * It has select with the over by magic taken from WindowedRangeQuery 
 recipe.
   * I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q 
 and
 execute it in a loop.

 Suppose I have a complex query (without options() / order_by()) to obtain 
 the
 objects I want. I use this query to calculate window intervals. I don't 
 have to
 use it again to fetch the objects, because I already have their id 
 intervals. Am
 I right?


 Thank you,

 Ladislav Lenart


 On 4.6.2013 19:15, Ladislav Lenart wrote:
 On 4.6.2013 18:49, Michael Bayer wrote:

 On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote:


 Absolutely, you should do whatever you have to in order to get the 
 range
 you want, in fact the recipe even says this...

 Ok. What I want to do is basically the following:
 * Suppose a complex query q with join(...), filter(...) and 
 options(...).
 * I need to create q2 from q such that:
 * It has no options.
   * Can I reset the options with q = q.options(None)?
 * It has select with the over by magic taken from WindowedRangeQuery 
 recipe.
   * I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original 
 q and
 execute it in a loop.

 Can this strategy work?

 there's not a publicly supported feature to reset the options right 
 now so you'd probably need

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-04 Thread Michael Bayer


On Jun 4, 2013, at 8:18 AM, Ladislav Lenart lenart...@volny.cz wrote:

 Hello.
 
 I have a hard time to understand the following comment for Query.yield_per():

   Yield only ``count`` rows at a time.
 
WARNING: use this method with caution; if the same instance is present
in more than one batch of rows, end-user changes to attributes will be
overwritten.
 
In particular, it's usually impossible to use this setting with
eagerly loaded collections (i.e. any lazy='joined' or 'subquery')
since those collections will be cleared for a new load when
encountered in a subsequent result batch.   In the case of 'subquery'
loading, the full result for all rows is fetched which generally
defeats the purpose of :meth:`~sqlalchemy.orm.query.Query.yield_per`.
 
Also note that many DBAPIs do not stream results, pre-buffering
all rows before making them available, including mysql-python and
psycopg2.  :meth:`~sqlalchemy.orm.query.Query.yield_per` will also
set the ``stream_results`` execution
option to ``True``, which currently is only understood by psycopg2
and causes server side cursors to be used.
 
 Suppose I have a code like this:
 
q = session.query(cls).filter(...)
q = q.options(
joinedload(cls.foo),
   subqueryload(cls.bars),
)
for each in q.yield_per(50):
# access each.foo or each.bars
 
 Does it work? Is so, why? If not, why?

assuming cls.foo is a many-to-one, it will produce the correct result, but will 
be far worse in terms of memory and performance, as the subqueryload() call 
will be invoked for each distinct batch of 50 rows, across the *full* result 
set.  So if your result has 1000 rows, and the number of bars total is 1, 
you will load 1 additional rows for *each* yield per of 50.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-04 Thread Ladislav Lenart

Hello.

 assuming cls.foo is a many-to-one, it will produce the correct result,
 but will be far worse in terms of memory and performance, as the
 subqueryload() call will be invoked for each distinct batch of 50 rows,
 across the *full* result set.  So if your result has 1000 rows, and the
 number of bars total is 1, you will load 1 additional rows for
 *each* yield per of 50.

Ok, I think I get it. Is there a way to make it all work without the performance
penalty of subqueryload? For example, what will happen if I replace it with
joinedload(cls.bars)?

Ladislav Lenart


On 4.6.2013 16:19, Michael Bayer wrote:
 
 On Jun 4, 2013, at 8:18 AM, Ladislav Lenart lenart...@volny.cz wrote:
 
 Hello.

 I have a hard time to understand the following comment for Query.yield_per():
 
  Yield only ``count`` rows at a time.

WARNING: use this method with caution; if the same instance is present
in more than one batch of rows, end-user changes to attributes will be
overwritten.

In particular, it's usually impossible to use this setting with
eagerly loaded collections (i.e. any lazy='joined' or 'subquery')
since those collections will be cleared for a new load when
encountered in a subsequent result batch.   In the case of 'subquery'
loading, the full result for all rows is fetched which generally
defeats the purpose of :meth:`~sqlalchemy.orm.query.Query.yield_per`.

Also note that many DBAPIs do not stream results, pre-buffering
all rows before making them available, including mysql-python and
psycopg2.  :meth:`~sqlalchemy.orm.query.Query.yield_per` will also
set the ``stream_results`` execution
option to ``True``, which currently is only understood by psycopg2
and causes server side cursors to be used.

 Suppose I have a code like this:

q = session.query(cls).filter(...)
q = q.options(
joinedload(cls.foo),
  subqueryload(cls.bars),
)
for each in q.yield_per(50):
# access each.foo or each.bars

 Does it work? Is so, why? If not, why?
 
 assuming cls.foo is a many-to-one, it will produce the correct result, but 
 will be far worse in terms of memory and performance, as the subqueryload() 
 call will be invoked for each distinct batch of 50 rows, across the *full* 
 result set.  So if your result has 1000 rows, and the number of bars total 
 is 1, you will load 1 additional rows for *each* yield per of 50.
 


-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-04 Thread Michael Bayer


On Jun 4, 2013, at 10:45 AM, Ladislav Lenart lenart...@volny.cz wrote:

 Hello.
 
 assuming cls.foo is a many-to-one, it will produce the correct result,
 but will be far worse in terms of memory and performance, as the
 subqueryload() call will be invoked for each distinct batch of 50 rows,
 across the *full* result set.  So if your result has 1000 rows, and the
 number of bars total is 1, you will load 1 additional rows for
 *each* yield per of 50.
 
 Ok, I think I get it. Is there a way to make it all work without the 
 performance
 penalty of subqueryload? For example, what will happen if I replace it with
 joinedload(cls.bars)?

You will then get the wrong results.   The docstring tries to explain this - a 
joinedload uses a JOIN.  For each cls instance, there are many rows, one for 
each bar.  If you cut off the results in the middle of populating that 
collection, the collection is incomplete, you'll see the wrong collection on 
your cls.bars.  On the next load, cls.bars will be wiped out and populated with 
the remaining bar objects.

Don't use yield_per.  Use windowing instead, see 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery.




-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-04 Thread Ladislav Lenart

Hello.

 You will then get the wrong results.   The docstring tries to explain this -
 a joinedload uses a JOIN.  For each cls instance, there are many rows, one
 for each bar.  If you cut off the results in the middle of populating that
 collection, the collection is incomplete, you'll see the wrong collection on
 your cls.bars.  On the next load, cls.bars will be wiped out and populated
 with the remaining bar objects.

Ok, I think I understand this too.

I've tried WindowedRangeQuery. It looked promising at first but it is (much)
slower than yield_per() with all its quirks, at least for my usecase. If I
understand the WindowedRangeQuery recipe, it does a full scan of the target
table first to read all the ids and calculate the bounds of all the windows. I
don't want to it like this. I am working with relatively large datasets but it
is still far less than all rows in the table. Something like 10-5 rows from
a table with 1-2 million rows. The windowed query iterates over many completely
empty windows.

Can I modify the recipe so it preserves the filtering and creates windows only
for the interesting subset of the table?

Thank you,

Ladislav Lenart


On 4.6.2013 17:18, Michael Bayer wrote:
 
 On Jun 4, 2013, at 10:45 AM, Ladislav Lenart lenart...@volny.cz wrote:
 
 Hello.

 assuming cls.foo is a many-to-one, it will produce the correct result,
 but will be far worse in terms of memory and performance, as the
 subqueryload() call will be invoked for each distinct batch of 50 rows,
 across the *full* result set.  So if your result has 1000 rows, and the
 number of bars total is 1, you will load 1 additional rows for
 *each* yield per of 50.

 Ok, I think I get it. Is there a way to make it all work without the 
 performance
 penalty of subqueryload? For example, what will happen if I replace it with
 joinedload(cls.bars)?
 
 You will then get the wrong results.   The docstring tries to explain this - 
 a joinedload uses a JOIN.  For each cls instance, there are many rows, one 
 for each bar.  If you cut off the results in the middle of populating that 
 collection, the collection is incomplete, you'll see the wrong collection on 
 your cls.bars.  On the next load, cls.bars will be wiped out and populated 
 with the remaining bar objects.
 
 Don't use yield_per.  Use windowing instead, see 
 http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-04 Thread Michael Bayer


On Jun 4, 2013, at 11:41 AM, Ladislav Lenart lenart...@volny.cz wrote:

 Hello.
 
 You will then get the wrong results.   The docstring tries to explain this -
 a joinedload uses a JOIN.  For each cls instance, there are many rows, one
 for each bar.  If you cut off the results in the middle of populating that
 collection, the collection is incomplete, you'll see the wrong collection on
 your cls.bars.  On the next load, cls.bars will be wiped out and populated
 with the remaining bar objects.
 
 Ok, I think I understand this too.
 
 I've tried WindowedRangeQuery. It looked promising at first but it is (much)
 slower than yield_per() with all its quirks, at least for my usecase.

OK, but with yield_per() you want to use eagerloading also, so yield_per() not 
fast enough either, I guess


 If I
 understand the WindowedRangeQuery recipe, it does a full scan of the target
 table first to read all the ids and calculate the bounds of all the windows. I
 don't want to it like this. I am working with relatively large datasets but it
 is still far less than all rows in the table. Something like 10-5 rows 
 from
 a table with 1-2 million rows. The windowed query iterates over many 
 completely
 empty windows.
 
 Can I modify the recipe so it preserves the filtering and creates windows only
 for the interesting subset of the table?

Absolutely, you should do whatever you have to in order to get the range you 
want, in fact the recipe even says this:

 Enhance this yourself !  Add a where argument
so that windows of just a subset of rows can
be computed.

if your situation is even simpler than that, such as just querying from PKs 
50-1000, you could just make up your own integer ranges within those two 
endpoints.   


-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-04 Thread Ladislav Lenart

Hello.

 OK, but with yield_per() you want to use eagerloading also, so yield_per()
 not fast enough either, I guess

No. I use yield_per() on complex queries with join(), filter() and both
joinedload() and subqueryload(). It is possible that they sometimes returns
wrong results because of yield_per(). I am not sure about that, but it is
definitely much faster than the original WindowedRangeQuery recipe. I can only
speculate that postgres caches subqueryload results...


 Absolutely, you should do whatever you have to in order to get the range
 you want, in fact the recipe even says this...

Ok. What I want to do is basically the following:
* Suppose a complex query q with join(...), filter(...) and options(...).
* I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery recipe.
* I know I can use with_entities() for this.
* I will use q2 to obtain the window ranges.
* I will iterate over the window ranges and apply each to the original q and
execute it in a loop.

Can this strategy work?


Thank you,

Ladislav Lenart


On 4.6.2013 17:57, Michael Bayer wrote:
 
 On Jun 4, 2013, at 11:41 AM, Ladislav Lenart lenart...@volny.cz wrote:
 
 Hello.

 You will then get the wrong results.   The docstring tries to explain this -
 a joinedload uses a JOIN.  For each cls instance, there are many rows, one
 for each bar.  If you cut off the results in the middle of populating that
 collection, the collection is incomplete, you'll see the wrong collection on
 your cls.bars.  On the next load, cls.bars will be wiped out and populated
 with the remaining bar objects.

 Ok, I think I understand this too.

 I've tried WindowedRangeQuery. It looked promising at first but it is (much)
 slower than yield_per() with all its quirks, at least for my usecase.
 
 OK, but with yield_per() you want to use eagerloading also, so yield_per() 
 not fast enough either, I guess
 
 
 If I
 understand the WindowedRangeQuery recipe, it does a full scan of the target
 table first to read all the ids and calculate the bounds of all the windows. 
 I
 don't want to it like this. I am working with relatively large datasets but 
 it
 is still far less than all rows in the table. Something like 10-5 rows 
 from
 a table with 1-2 million rows. The windowed query iterates over many 
 completely
 empty windows.

 Can I modify the recipe so it preserves the filtering and creates windows 
 only
 for the interesting subset of the table?
 
 Absolutely, you should do whatever you have to in order to get the range you 
 want, in fact the recipe even says this:
 
  Enhance this yourself !  Add a where argument
 so that windows of just a subset of rows can
 be computed.
 
 if your situation is even simpler than that, such as just querying from PKs 
 50-1000, you could just make up your own integer ranges within those two 
 endpoints.   

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-04 Thread Michael Bayer


On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote:

 
 Absolutely, you should do whatever you have to in order to get the range
 you want, in fact the recipe even says this...
 
 Ok. What I want to do is basically the following:
 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery recipe.
* I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q and
 execute it in a loop.
 
 Can this strategy work?

there's not a publicly supported feature to reset the options right now so 
you'd probably need to apply them after you get your window ranging query.  
You can probably remove the effect of eager loaders if you were to just assign 
a blank _attributes dictionary to the new query though (just no long term 
guarantees).

other than that, I use the windowing concept extensively and it works very well.



-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

2013-06-04 Thread Ladislav Lenart

On 4.6.2013 18:49, Michael Bayer wrote:
 
 On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote:
 

 Absolutely, you should do whatever you have to in order to get the range
 you want, in fact the recipe even says this...

 Ok. What I want to do is basically the following:
 * Suppose a complex query q with join(...), filter(...) and options(...).
 * I need to create q2 from q such that:
  * It has no options.
* Can I reset the options with q = q.options(None)?
  * It has select with the over by magic taken from WindowedRangeQuery recipe.
* I know I can use with_entities() for this.
 * I will use q2 to obtain the window ranges.
 * I will iterate over the window ranges and apply each to the original q and
 execute it in a loop.

 Can this strategy work?
 
 there's not a publicly supported feature to reset the options right now so 
 you'd probably need to apply them after you get your window ranging query.
   You can probably remove the effect of eager loaders if you were to just 
 assign a blank _attributes dictionary to the new query though (just no long 
 term guarantees).
 
 other than that, I use the windowing concept extensively and it works very 
 well.

Ok, I will try it.

Thank you very much for your invaluable insights,

Ladislav Lenart

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()

13 matches

Site Navigation

Mail list logo

Footer information