Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
Hello. I have one more question about my approach to WindowedRangeQuery: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Suppose I have a complex query (without options() / order_by()) to obtain the objects I want. I use this query to calculate window intervals. I don't have to use it again to fetch the objects, because I already have their id intervals. Am I right? Thank you, Ladislav Lenart On 4.6.2013 19:15, Ladislav Lenart wrote: On 4.6.2013 18:49, Michael Bayer wrote: On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote: Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this... Ok. What I want to do is basically the following: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Can this strategy work? there's not a publicly supported feature to reset the options right now so you'd probably need to apply them after you get your window ranging query. You can probably remove the effect of eager loaders if you were to just assign a blank _attributes dictionary to the new query though (just no long term guarantees). other than that, I use the windowing concept extensively and it works very well. Ok, I will try it. Thank you very much for your invaluable insights, Ladislav Lenart -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
Hello. I've tried an experiment to verify that yield_per() with subqueryload() behaves as badly as you described, but according to my practical observation, it issues ONE subqueryload() and everything behaves as I would expect. It emits two SELECTs, one to fetch the objects and the second one to fetch the related data and that's about it, see below. So back to square one, what is wrong with it? Base = declarative_base() class Contact(Base): # Just a sketch, I hope you get the picture. id, name, phones = relationship(Phone) # many class Phone(Base): # Just a sketch, I hope you get the picture. id, number, contact = relationship(Contact) # one # Setup engine with echo set to True. phones = ['123456789', '987654321', '555777999'] for i in range(1, 11): c = Contact(name=u' '.join([u'Contact', unicode(i)])) session.add(c) session.add_all(Phone(contact=c, number=e) for e in phones) session.flush() session.expunge_all() q = session.query(Contact).options(subqueryload(Contact.phones)) for each in q.yield_per(2): print each.last_name for e in each.phones: print e The output is like this: SA info about all the inserts after session.flush(). SA info about select for contacts. SA info about select for their phones. Contact 10 123456789 987654321 555777999 Contact 9 123456789 987654321 555777999 Contact 8 123456789 987654321 555777999 Contact 7 123456789 987654321 555777999 Contact 6 123456789 987654321 555777999 Contact 5 123456789 987654321 555777999 Contact 4 123456789 987654321 555777999 Contact 3 123456789 987654321 555777999 Contact 2 123456789 987654321 555777999 Contact 1 123456789 987654321 555777999 Thank you, Ladislav Lenart On 5.6.2013 11:26, Ladislav Lenart wrote: Hello. I have one more question about my approach to WindowedRangeQuery: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Suppose I have a complex query (without options() / order_by()) to obtain the objects I want. I use this query to calculate window intervals. I don't have to use it again to fetch the objects, because I already have their id intervals. Am I right? Thank you, Ladislav Lenart On 4.6.2013 19:15, Ladislav Lenart wrote: On 4.6.2013 18:49, Michael Bayer wrote: On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote: Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this... Ok. What I want to do is basically the following: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Can this strategy work? there's not a publicly supported feature to reset the options right now so you'd probably need to apply them after you get your window ranging query. You can probably remove the effect of eager loaders if you were to just assign a blank _attributes dictionary to the new query though (just no long term guarantees). other than that, I use the windowing concept extensively and it works very well. Ok, I will try it. Thank you very much for your invaluable insights, Ladislav Lenart -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
Hello. One more note. I've just tried the below experiment with joinedload() instead of subqueryload() and that does NOT work just as you expected. One contact is returned several times and the first occurrences have incomplete phones. However my experiments with subqueryload() suggest that it works just fine with yield_per(). Can you elaborate on that one please? Thank you, Ladislav Lenart On 5.6.2013 14:03, Ladislav Lenart wrote: Hello. I've tried an experiment to verify that yield_per() with subqueryload() behaves as badly as you described, but according to my practical observation, it issues ONE subqueryload() and everything behaves as I would expect. It emits two SELECTs, one to fetch the objects and the second one to fetch the related data and that's about it, see below. So back to square one, what is wrong with it? Base = declarative_base() class Contact(Base): # Just a sketch, I hope you get the picture. id, name, phones = relationship(Phone) # many class Phone(Base): # Just a sketch, I hope you get the picture. id, number, contact = relationship(Contact) # one # Setup engine with echo set to True. phones = ['123456789', '987654321', '555777999'] for i in range(1, 11): c = Contact(name=u' '.join([u'Contact', unicode(i)])) session.add(c) session.add_all(Phone(contact=c, number=e) for e in phones) session.flush() session.expunge_all() q = session.query(Contact).options(subqueryload(Contact.phones)) for each in q.yield_per(2): print each.last_name for e in each.phones: print e The output is like this: SA info about all the inserts after session.flush(). SA info about select for contacts. SA info about select for their phones. Contact 10 123456789 987654321 555777999 Contact 9 123456789 987654321 555777999 Contact 8 123456789 987654321 555777999 Contact 7 123456789 987654321 555777999 Contact 6 123456789 987654321 555777999 Contact 5 123456789 987654321 555777999 Contact 4 123456789 987654321 555777999 Contact 3 123456789 987654321 555777999 Contact 2 123456789 987654321 555777999 Contact 1 123456789 987654321 555777999 Thank you, Ladislav Lenart On 5.6.2013 11:26, Ladislav Lenart wrote: Hello. I have one more question about my approach to WindowedRangeQuery: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Suppose I have a complex query (without options() / order_by()) to obtain the objects I want. I use this query to calculate window intervals. I don't have to use it again to fetch the objects, because I already have their id intervals. Am I right? Thank you, Ladislav Lenart On 4.6.2013 19:15, Ladislav Lenart wrote: On 4.6.2013 18:49, Michael Bayer wrote: On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote: Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this... Ok. What I want to do is basically the following: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Can this strategy work? there's not a publicly supported feature to reset the options right now so you'd probably need to apply them after you get your window ranging query. You can probably remove the effect of eager loaders if you were to just assign a blank _attributes dictionary to the new query though (just no long term guarantees). other than that, I use the windowing concept extensively and it works very well. Ok, I will try it. Thank you very much for your invaluable insights, Ladislav Lenart -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
the Query emitted by subqueryload does not use yield_per. so if your total result set is 1000 rows, and the total rows represented by all the collections is 1, the first time that query is emitted, 1 rows will be fully loaded and processed into memory at once. This would occur typically somewhere in the first few rows of your 50 -row yield_per batch. So the intent of yield_per, which is to conserve memory and upfront loading overhead, would be defeated entirely by this. The subqueryload is emitting once if I recall correctly because I probably at some point have the query result being memoized in the query context to prevent it from being emitted many times in a yield_per scenario. On Jun 5, 2013, at 10:20 AM, Ladislav Lenart lenart...@volny.cz wrote: Hello. One more note. I've just tried the below experiment with joinedload() instead of subqueryload() and that does NOT work just as you expected. One contact is returned several times and the first occurrences have incomplete phones. However my experiments with subqueryload() suggest that it works just fine with yield_per(). Can you elaborate on that one please? Thank you, Ladislav Lenart On 5.6.2013 14:03, Ladislav Lenart wrote: Hello. I've tried an experiment to verify that yield_per() with subqueryload() behaves as badly as you described, but according to my practical observation, it issues ONE subqueryload() and everything behaves as I would expect. It emits two SELECTs, one to fetch the objects and the second one to fetch the related data and that's about it, see below. So back to square one, what is wrong with it? Base = declarative_base() class Contact(Base): # Just a sketch, I hope you get the picture. id, name, phones = relationship(Phone) # many class Phone(Base): # Just a sketch, I hope you get the picture. id, number, contact = relationship(Contact) # one # Setup engine with echo set to True. phones = ['123456789', '987654321', '555777999'] for i in range(1, 11): c = Contact(name=u' '.join([u'Contact', unicode(i)])) session.add(c) session.add_all(Phone(contact=c, number=e) for e in phones) session.flush() session.expunge_all() q = session.query(Contact).options(subqueryload(Contact.phones)) for each in q.yield_per(2): print each.last_name for e in each.phones: print e The output is like this: SA info about all the inserts after session.flush(). SA info about select for contacts. SA info about select for their phones. Contact 10 123456789 987654321 555777999 Contact 9 123456789 987654321 555777999 Contact 8 123456789 987654321 555777999 Contact 7 123456789 987654321 555777999 Contact 6 123456789 987654321 555777999 Contact 5 123456789 987654321 555777999 Contact 4 123456789 987654321 555777999 Contact 3 123456789 987654321 555777999 Contact 2 123456789 987654321 555777999 Contact 1 123456789 987654321 555777999 Thank you, Ladislav Lenart On 5.6.2013 11:26, Ladislav Lenart wrote: Hello. I have one more question about my approach to WindowedRangeQuery: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Suppose I have a complex query (without options() / order_by()) to obtain the objects I want. I use this query to calculate window intervals. I don't have to use it again to fetch the objects, because I already have their id intervals. Am I right? Thank you, Ladislav Lenart On 4.6.2013 19:15, Ladislav Lenart wrote: On 4.6.2013 18:49, Michael Bayer wrote: On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote: Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this... Ok. What I want to do is basically the following: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Can this strategy work? there's not a publicly supported feature to reset the options right now so you'd probably need to apply them after you get your window ranging query. You can probably remove the effect of eager loaders
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
Ok, so yield_per() is useless when subqueryload() is used. Thank you, Ladislav Lenart On 5.6.2013 16:27, Michael Bayer wrote: the Query emitted by subqueryload does not use yield_per. so if your total result set is 1000 rows, and the total rows represented by all the collections is 1, the first time that query is emitted, 1 rows will be fully loaded and processed into memory at once. This would occur typically somewhere in the first few rows of your 50 -row yield_per batch. So the intent of yield_per, which is to conserve memory and upfront loading overhead, would be defeated entirely by this. The subqueryload is emitting once if I recall correctly because I probably at some point have the query result being memoized in the query context to prevent it from being emitted many times in a yield_per scenario. On Jun 5, 2013, at 10:20 AM, Ladislav Lenart lenart...@volny.cz wrote: Hello. One more note. I've just tried the below experiment with joinedload() instead of subqueryload() and that does NOT work just as you expected. One contact is returned several times and the first occurrences have incomplete phones. However my experiments with subqueryload() suggest that it works just fine with yield_per(). Can you elaborate on that one please? Thank you, Ladislav Lenart On 5.6.2013 14:03, Ladislav Lenart wrote: Hello. I've tried an experiment to verify that yield_per() with subqueryload() behaves as badly as you described, but according to my practical observation, it issues ONE subqueryload() and everything behaves as I would expect. It emits two SELECTs, one to fetch the objects and the second one to fetch the related data and that's about it, see below. So back to square one, what is wrong with it? Base = declarative_base() class Contact(Base): # Just a sketch, I hope you get the picture. id, name, phones = relationship(Phone) # many class Phone(Base): # Just a sketch, I hope you get the picture. id, number, contact = relationship(Contact) # one # Setup engine with echo set to True. phones = ['123456789', '987654321', '555777999'] for i in range(1, 11): c = Contact(name=u' '.join([u'Contact', unicode(i)])) session.add(c) session.add_all(Phone(contact=c, number=e) for e in phones) session.flush() session.expunge_all() q = session.query(Contact).options(subqueryload(Contact.phones)) for each in q.yield_per(2): print each.last_name for e in each.phones: print e The output is like this: SA info about all the inserts after session.flush(). SA info about select for contacts. SA info about select for their phones. Contact 10 123456789 987654321 555777999 Contact 9 123456789 987654321 555777999 Contact 8 123456789 987654321 555777999 Contact 7 123456789 987654321 555777999 Contact 6 123456789 987654321 555777999 Contact 5 123456789 987654321 555777999 Contact 4 123456789 987654321 555777999 Contact 3 123456789 987654321 555777999 Contact 2 123456789 987654321 555777999 Contact 1 123456789 987654321 555777999 Thank you, Ladislav Lenart On 5.6.2013 11:26, Ladislav Lenart wrote: Hello. I have one more question about my approach to WindowedRangeQuery: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Suppose I have a complex query (without options() / order_by()) to obtain the objects I want. I use this query to calculate window intervals. I don't have to use it again to fetch the objects, because I already have their id intervals. Am I right? Thank you, Ladislav Lenart On 4.6.2013 19:15, Ladislav Lenart wrote: On 4.6.2013 18:49, Michael Bayer wrote: On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote: Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this... Ok. What I want to do is basically the following: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Can this strategy work? there's not a publicly supported feature to reset the options right now so you'd probably need
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
On Jun 4, 2013, at 8:18 AM, Ladislav Lenart lenart...@volny.cz wrote: Hello. I have a hard time to understand the following comment for Query.yield_per(): Yield only ``count`` rows at a time. WARNING: use this method with caution; if the same instance is present in more than one batch of rows, end-user changes to attributes will be overwritten. In particular, it's usually impossible to use this setting with eagerly loaded collections (i.e. any lazy='joined' or 'subquery') since those collections will be cleared for a new load when encountered in a subsequent result batch. In the case of 'subquery' loading, the full result for all rows is fetched which generally defeats the purpose of :meth:`~sqlalchemy.orm.query.Query.yield_per`. Also note that many DBAPIs do not stream results, pre-buffering all rows before making them available, including mysql-python and psycopg2. :meth:`~sqlalchemy.orm.query.Query.yield_per` will also set the ``stream_results`` execution option to ``True``, which currently is only understood by psycopg2 and causes server side cursors to be used. Suppose I have a code like this: q = session.query(cls).filter(...) q = q.options( joinedload(cls.foo), subqueryload(cls.bars), ) for each in q.yield_per(50): # access each.foo or each.bars Does it work? Is so, why? If not, why? assuming cls.foo is a many-to-one, it will produce the correct result, but will be far worse in terms of memory and performance, as the subqueryload() call will be invoked for each distinct batch of 50 rows, across the *full* result set. So if your result has 1000 rows, and the number of bars total is 1, you will load 1 additional rows for *each* yield per of 50. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
Hello. assuming cls.foo is a many-to-one, it will produce the correct result, but will be far worse in terms of memory and performance, as the subqueryload() call will be invoked for each distinct batch of 50 rows, across the *full* result set. So if your result has 1000 rows, and the number of bars total is 1, you will load 1 additional rows for *each* yield per of 50. Ok, I think I get it. Is there a way to make it all work without the performance penalty of subqueryload? For example, what will happen if I replace it with joinedload(cls.bars)? Ladislav Lenart On 4.6.2013 16:19, Michael Bayer wrote: On Jun 4, 2013, at 8:18 AM, Ladislav Lenart lenart...@volny.cz wrote: Hello. I have a hard time to understand the following comment for Query.yield_per(): Yield only ``count`` rows at a time. WARNING: use this method with caution; if the same instance is present in more than one batch of rows, end-user changes to attributes will be overwritten. In particular, it's usually impossible to use this setting with eagerly loaded collections (i.e. any lazy='joined' or 'subquery') since those collections will be cleared for a new load when encountered in a subsequent result batch. In the case of 'subquery' loading, the full result for all rows is fetched which generally defeats the purpose of :meth:`~sqlalchemy.orm.query.Query.yield_per`. Also note that many DBAPIs do not stream results, pre-buffering all rows before making them available, including mysql-python and psycopg2. :meth:`~sqlalchemy.orm.query.Query.yield_per` will also set the ``stream_results`` execution option to ``True``, which currently is only understood by psycopg2 and causes server side cursors to be used. Suppose I have a code like this: q = session.query(cls).filter(...) q = q.options( joinedload(cls.foo), subqueryload(cls.bars), ) for each in q.yield_per(50): # access each.foo or each.bars Does it work? Is so, why? If not, why? assuming cls.foo is a many-to-one, it will produce the correct result, but will be far worse in terms of memory and performance, as the subqueryload() call will be invoked for each distinct batch of 50 rows, across the *full* result set. So if your result has 1000 rows, and the number of bars total is 1, you will load 1 additional rows for *each* yield per of 50. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
On Jun 4, 2013, at 10:45 AM, Ladislav Lenart lenart...@volny.cz wrote: Hello. assuming cls.foo is a many-to-one, it will produce the correct result, but will be far worse in terms of memory and performance, as the subqueryload() call will be invoked for each distinct batch of 50 rows, across the *full* result set. So if your result has 1000 rows, and the number of bars total is 1, you will load 1 additional rows for *each* yield per of 50. Ok, I think I get it. Is there a way to make it all work without the performance penalty of subqueryload? For example, what will happen if I replace it with joinedload(cls.bars)? You will then get the wrong results. The docstring tries to explain this - a joinedload uses a JOIN. For each cls instance, there are many rows, one for each bar. If you cut off the results in the middle of populating that collection, the collection is incomplete, you'll see the wrong collection on your cls.bars. On the next load, cls.bars will be wiped out and populated with the remaining bar objects. Don't use yield_per. Use windowing instead, see http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
Hello. You will then get the wrong results. The docstring tries to explain this - a joinedload uses a JOIN. For each cls instance, there are many rows, one for each bar. If you cut off the results in the middle of populating that collection, the collection is incomplete, you'll see the wrong collection on your cls.bars. On the next load, cls.bars will be wiped out and populated with the remaining bar objects. Ok, I think I understand this too. I've tried WindowedRangeQuery. It looked promising at first but it is (much) slower than yield_per() with all its quirks, at least for my usecase. If I understand the WindowedRangeQuery recipe, it does a full scan of the target table first to read all the ids and calculate the bounds of all the windows. I don't want to it like this. I am working with relatively large datasets but it is still far less than all rows in the table. Something like 10-5 rows from a table with 1-2 million rows. The windowed query iterates over many completely empty windows. Can I modify the recipe so it preserves the filtering and creates windows only for the interesting subset of the table? Thank you, Ladislav Lenart On 4.6.2013 17:18, Michael Bayer wrote: On Jun 4, 2013, at 10:45 AM, Ladislav Lenart lenart...@volny.cz wrote: Hello. assuming cls.foo is a many-to-one, it will produce the correct result, but will be far worse in terms of memory and performance, as the subqueryload() call will be invoked for each distinct batch of 50 rows, across the *full* result set. So if your result has 1000 rows, and the number of bars total is 1, you will load 1 additional rows for *each* yield per of 50. Ok, I think I get it. Is there a way to make it all work without the performance penalty of subqueryload? For example, what will happen if I replace it with joinedload(cls.bars)? You will then get the wrong results. The docstring tries to explain this - a joinedload uses a JOIN. For each cls instance, there are many rows, one for each bar. If you cut off the results in the middle of populating that collection, the collection is incomplete, you'll see the wrong collection on your cls.bars. On the next load, cls.bars will be wiped out and populated with the remaining bar objects. Don't use yield_per. Use windowing instead, see http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
On Jun 4, 2013, at 11:41 AM, Ladislav Lenart lenart...@volny.cz wrote: Hello. You will then get the wrong results. The docstring tries to explain this - a joinedload uses a JOIN. For each cls instance, there are many rows, one for each bar. If you cut off the results in the middle of populating that collection, the collection is incomplete, you'll see the wrong collection on your cls.bars. On the next load, cls.bars will be wiped out and populated with the remaining bar objects. Ok, I think I understand this too. I've tried WindowedRangeQuery. It looked promising at first but it is (much) slower than yield_per() with all its quirks, at least for my usecase. OK, but with yield_per() you want to use eagerloading also, so yield_per() not fast enough either, I guess If I understand the WindowedRangeQuery recipe, it does a full scan of the target table first to read all the ids and calculate the bounds of all the windows. I don't want to it like this. I am working with relatively large datasets but it is still far less than all rows in the table. Something like 10-5 rows from a table with 1-2 million rows. The windowed query iterates over many completely empty windows. Can I modify the recipe so it preserves the filtering and creates windows only for the interesting subset of the table? Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this: Enhance this yourself ! Add a where argument so that windows of just a subset of rows can be computed. if your situation is even simpler than that, such as just querying from PKs 50-1000, you could just make up your own integer ranges within those two endpoints. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
Hello. OK, but with yield_per() you want to use eagerloading also, so yield_per() not fast enough either, I guess No. I use yield_per() on complex queries with join(), filter() and both joinedload() and subqueryload(). It is possible that they sometimes returns wrong results because of yield_per(). I am not sure about that, but it is definitely much faster than the original WindowedRangeQuery recipe. I can only speculate that postgres caches subqueryload results... Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this... Ok. What I want to do is basically the following: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Can this strategy work? Thank you, Ladislav Lenart On 4.6.2013 17:57, Michael Bayer wrote: On Jun 4, 2013, at 11:41 AM, Ladislav Lenart lenart...@volny.cz wrote: Hello. You will then get the wrong results. The docstring tries to explain this - a joinedload uses a JOIN. For each cls instance, there are many rows, one for each bar. If you cut off the results in the middle of populating that collection, the collection is incomplete, you'll see the wrong collection on your cls.bars. On the next load, cls.bars will be wiped out and populated with the remaining bar objects. Ok, I think I understand this too. I've tried WindowedRangeQuery. It looked promising at first but it is (much) slower than yield_per() with all its quirks, at least for my usecase. OK, but with yield_per() you want to use eagerloading also, so yield_per() not fast enough either, I guess If I understand the WindowedRangeQuery recipe, it does a full scan of the target table first to read all the ids and calculate the bounds of all the windows. I don't want to it like this. I am working with relatively large datasets but it is still far less than all rows in the table. Something like 10-5 rows from a table with 1-2 million rows. The windowed query iterates over many completely empty windows. Can I modify the recipe so it preserves the filtering and creates windows only for the interesting subset of the table? Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this: Enhance this yourself ! Add a where argument so that windows of just a subset of rows can be computed. if your situation is even simpler than that, such as just querying from PKs 50-1000, you could just make up your own integer ranges within those two endpoints. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote: Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this... Ok. What I want to do is basically the following: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Can this strategy work? there's not a publicly supported feature to reset the options right now so you'd probably need to apply them after you get your window ranging query. You can probably remove the effect of eager loaders if you were to just assign a blank _attributes dictionary to the new query though (just no long term guarantees). other than that, I use the windowing concept extensively and it works very well. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
Re: [sqlalchemy] [Q][SA 0.7.9] yield_per() vs. joinedload() and subqueryload()
On 4.6.2013 18:49, Michael Bayer wrote: On Jun 4, 2013, at 12:28 PM, Ladislav Lenart lenart...@volny.cz wrote: Absolutely, you should do whatever you have to in order to get the range you want, in fact the recipe even says this... Ok. What I want to do is basically the following: * Suppose a complex query q with join(...), filter(...) and options(...). * I need to create q2 from q such that: * It has no options. * Can I reset the options with q = q.options(None)? * It has select with the over by magic taken from WindowedRangeQuery recipe. * I know I can use with_entities() for this. * I will use q2 to obtain the window ranges. * I will iterate over the window ranges and apply each to the original q and execute it in a loop. Can this strategy work? there's not a publicly supported feature to reset the options right now so you'd probably need to apply them after you get your window ranging query. You can probably remove the effect of eager loaders if you were to just assign a blank _attributes dictionary to the new query though (just no long term guarantees). other than that, I use the windowing concept extensively and it works very well. Ok, I will try it. Thank you very much for your invaluable insights, Ladislav Lenart -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.