Re: Querysets with "only()" and "defer()" slower than without?
Hi Anssi, creating the queryset first and reusing it later is a nice idea and I'll add it to my tests, but unfortunately it only works when you can share the queryset between DB requests, which is impossible when the server handles multiple simultaneous requests. For example, if you show some profile data via AJAX calls, you get tons of calls to very simple views doing 1-2 queries each. And that's what I am trying to test here: 50 000 iterations minimize the per-query time and memory measurement error. I am not going to use such queries in a real application.

On Aug 6, 7:29 pm, Anssi Kaariainen wrote:
> On Aug 6, 12:09 am, Jacob Kaplan-Moss wrote:
>
> > If you're benchmarking this against a small dataset and an in-memory
> > database like SQLite I'd fully expect to see the defer/only benchmark
> > to be slower. That's because every time a QS is chained it needs to be
> > copied, which is a relatively expensive operation. In a setup with
> > small data, the time spent in Python is going to outweigh the time
> > spent running the query in the database and sending the data over the
> > wire.
>
> A retest with:
> for pk in xrange(1,5):
>     user = User.objects.only("power_level").get(pk = pk)
>     d = user.power_level
>
> replaced with:
> qs = User.objects.only("power_level")
> for pk in xrange(1, 5):
>     user = qs.get(pk=pk)
>     d = user.power_level
>
> # repeat for all tests.
>
> Should remove the effect of query building. (btw why not "inplace()"
> function for query set if you know you won't be using the middle steps
> in chained query building?)
>
> I have been looking into object creation speed when loading many
> objects simultaneously, for my particular case running the query as
> values_list took something like 5ms (sorry, I don't remember the
> values and I am not at work at the moment), 50ms when constructing
> instances (1500 objects in my case). The problem seems to be that for
> every object created we go and check which fields the model has, which
> of them come from args, which are deferred and which have default
> values. It would probably be much faster to check the field
> classifications once, and then bulk create the objects using the
> classifications.
>
> I did a small test for building 1500 objects in this style (only
> object building, not from query set), and the result was something
> like 50% faster object building. And yes, signals were sent for every
> object created + default() was called if it was callable for the
> field. The iterator in query makes it hard to use bulk loading style
> however...
>
> The same problem is present in row iteration when constructing objects
> from the rows returned from the DB. For each row we check which select
> related things we have etc. This could be checked once before starting
> to iterate through the rows and then use the calculated values each
> time when building the object from the row.
>
> - Anssi

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-develop...@googlegroups.com.
To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
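The chaining cost discussed in this exchange can be illustrated without Django. The sketch below is only a toy model: `ToyQuerySet`, its `_clone()` counter and the assumption that both `only()` and `get()` work on a copy are made up for illustration and are not Django's actual `QuerySet` implementation. It is written for Python 3 and counts how many copies each loop style triggers.

```python
import copy


class ToyQuerySet:
    """Hypothetical stand-in for a chainable queryset; not Django's QuerySet."""

    clones = 0  # class-wide counter of copies made while chaining

    def __init__(self, rows, fields=None):
        self.rows = rows      # list of dicts acting as the "table"
        self.fields = fields  # None means "load every field"

    def _clone(self):
        # Chaining never mutates the receiver; it copies the query state first.
        ToyQuerySet.clones += 1
        return ToyQuerySet(self.rows, copy.deepcopy(self.fields))

    def only(self, *fields):
        clone = self._clone()
        clone.fields = ["pk"] + list(fields)
        return clone

    def get(self, pk):
        clone = self._clone()  # assumption: lookups also work on a copy
        row = next(r for r in clone.rows if r["pk"] == pk)
        if clone.fields is None:
            return dict(row)
        return {name: row[name] for name in clone.fields}


def run_chained(base, pks):
    """Rebuild the only() queryset inside the loop."""
    ToyQuerySet.clones = 0
    values = [base.only("power_level").get(pk)["power_level"] for pk in pks]
    return values, ToyQuerySet.clones


def run_hoisted(base, pks):
    """Build the only() queryset once and reuse it."""
    ToyQuerySet.clones = 0
    qs = base.only("power_level")
    values = [qs.get(pk)["power_level"] for pk in pks]
    return values, ToyQuerySet.clones


rows = [{"pk": pk, "name": "user%d" % pk, "power_level": pk * 10}
        for pk in range(1, 5)]
base = ToyQuerySet(rows)
chained_values, chained_clones = run_chained(base, range(1, 5))
hoisted_values, hoisted_clones = run_hoisted(base, range(1, 5))
print(chained_clones, hoisted_clones)
```

Under these assumptions, hoisting removes one copy per iteration but not the copy made by the lookup itself, which matches the point that a per-request view cannot avoid all of the chaining cost.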
Re: Querysets with "only()" and "defer()" slower than without?
On Aug 6, 12:09 am, Jacob Kaplan-Moss wrote:
> If you're benchmarking this against a small dataset and an in-memory
> database like SQLite I'd fully expect to see the defer/only benchmark
> to be slower. That's because every time a QS is chained it needs to be
> copied, which is a relatively expensive operation. In a setup with
> small data, the time spent in Python is going to outweigh the time
> spent running the query in the database and sending the data over the
> wire.

A retest with:

    for pk in xrange(1,5):
        user = User.objects.only("power_level").get(pk = pk)
        d = user.power_level

replaced with:

    qs = User.objects.only("power_level")
    for pk in xrange(1, 5):
        user = qs.get(pk=pk)
        d = user.power_level

    # repeat for all tests.

Should remove the effect of query building. (btw why not an "inplace()" function for query sets if you know you won't be using the middle steps in chained query building?)

I have been looking into object creation speed when loading many objects simultaneously. For my particular case, running the query as values_list took something like 5ms (sorry, I don't remember the exact values and I am not at work at the moment), and 50ms when constructing instances (1500 objects in my case). The problem seems to be that for every object created we go and check which fields the model has, which of them come from args, which are deferred and which have default values. It would probably be much faster to check the field classifications once, and then bulk create the objects using the classifications.

I did a small test for building 1500 objects in this style (only object building, not from a query set), and the result was something like 50% faster object building. And yes, signals were sent for every object created + default() was called if it was callable for the field. The iterator in query makes it hard to use bulk loading style however...

The same problem is present in row iteration when constructing objects from the rows returned from the DB. For each row we check which select related things we have etc. This could be checked once before starting to iterate through the rows, and the calculated values could then be reused each time an object is built from a row.

 - Anssi
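The "classify the fields once, then bulk create" idea can be sketched in plain Python 3. Everything named here (`Field`, `FIELDS`, `Obj`, `build_per_object`, `build_bulk`) is hypothetical and far simpler than Django's model machinery; deferred fields and signals are left out, and the sketch makes no claim about the size of any speedup.

```python
from collections import namedtuple

# Hypothetical field description; Django's real Field class is far richer.
Field = namedtuple("Field", ["name", "default"])

FIELDS = [
    Field("id", None),
    Field("name", ""),
    Field("power_level", 0),
    Field("info", lambda: "n/a"),  # callable default
]


class Obj:
    """Bare attribute container standing in for a model instance."""


def resolve_default(field):
    return field.default() if callable(field.default) else field.default


def build_per_object(loaded_names, row):
    """Classify every field again for each object (the pattern criticised above)."""
    obj = Obj()
    values = dict(zip(loaded_names, row))
    for field in FIELDS:
        if field.name in values:
            setattr(obj, field.name, values[field.name])
        else:
            setattr(obj, field.name, resolve_default(field))
    return obj


def build_bulk(loaded_names, rows):
    """Classify the fields once, then reuse the classification for every row."""
    loaded = set(loaded_names)
    positions = list(enumerate(loaded_names))                # (column index, name)
    defaulted = [f for f in FIELDS if f.name not in loaded]  # computed once
    objects = []
    for row in rows:
        obj = Obj()
        for index, name in positions:
            setattr(obj, name, row[index])
        for field in defaulted:
            # default() is still called once per object, as in the description
            setattr(obj, field.name, resolve_default(field))
        objects.append(obj)
    return objects


rows = [(pk, pk * 10) for pk in range(1, 1501)]
names = ["id", "power_level"]
slow = [build_per_object(names, row) for row in rows]
fast = build_bulk(names, rows)
```

Both functions produce objects with identical attributes; the only difference is where the per-field bookkeeping happens.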
Re: Querysets with "only()" and "defer()" slower than without?
Thanks Jacob, I will continue testing and report back if anything new comes up on this issue.

On Aug 6, 3:50 am, Jacob Kaplan-Moss wrote:
> On Thu, Aug 5, 2010 at 6:14 PM, OverKrik wrote:
> > Hi Jeremy, I will release all my code after finishing the test suite -
> > I think, in about 2 weeks.
>
> I'm looking forward to seeing it. I agree that the results are
> counter-intuitive; seems there's *something* going on here that
> shouldn't be happening. I'd expect, given your scenario, that the
> only/defer versions would be faster, so the fact that they're not
> means that the situation's more complicated than my mental model.
>
> I'm going to be trying to reproduce your results locally, both using a
> clone of your setup and also against some of the huge datasets I have
> access to. Let's keep on this: even if there's not a bug per se we
> still need to understand the implications of what only/defer are doing
> so we can accurately represent the pros/cons of using 'em.
>
> Thanks for bringing this up!
>
> Jacob
Re: Querysets with "only()" and "defer()" slower than without?
On Thu, Aug 5, 2010 at 6:14 PM, OverKrik wrote:
> Hi Jeremy, I will release all my code after finishing the test suite -
> I think, in about 2 weeks.

I'm looking forward to seeing it. I agree that the results are counter-intuitive; seems there's *something* going on here that shouldn't be happening. I'd expect, given your scenario, that the only/defer versions would be faster, so the fact that they're not means that the situation's more complicated than my mental model.

I'm going to be trying to reproduce your results locally, both using a clone of your setup and also against some of the huge datasets I have access to. Let's keep on this: even if there's not a bug per se we still need to understand the implications of what only/defer are doing so we can accurately represent the pros/cons of using 'em.

Thanks for bringing this up!

Jacob
Re: Querysets with "only()" and "defer()" slower than without?
Hi Jeremy, I will release all my code after finishing the test suite, I think in about 2 weeks.

On Aug 6, 2:59 am, Jeremy Dunck wrote:
> On Thu, Aug 5, 2010 at 4:32 PM, OverKrik wrote:
> > I am performing every test 10 times, excluding one fastest and one
> > slowest result, restarting db every time and performing 10 000 requests
> > to warm db before measuring execution time.
> > Just in case, I've tried running tests in only-full-only-full and
> > defer-full-defer-full patterns and got same results.
>
> This sounds like a pretty good test. Can you attach the code? I'm
> sure it's not pretty, but I've been meaning to work on benchmarks for
> a long time and it'd be a shame to not reuse your effort.
Re: Querysets with "only()" and "defer()" slower than without?
On Thu, Aug 5, 2010 at 4:32 PM, OverKrik wrote:
> I am performing every test 10 times, excluding one fastest and one
> slowest result, restarting db every time and performing 10 000 requests
> to warm db before measuring execution time.
> Just in case, I've tried running tests in only-full-only-full and
> defer-full-defer-full patterns and got same results.

This sounds like a pretty good test. Can you attach the code? I'm sure it's not pretty, but I've been meaning to work on benchmarks for a long time and it'd be a shame to not reuse your effort.
Re: Querysets with "only()" and "defer()" slower than without?
1.
    users = User.objects.only("power_level")[:50]
    for user in users.iterator():
        d = user.power_level

2.
    users = User.objects.all()[:50]
    for user in users.iterator():
        d = user.power_level

1. ~24 sec
2. ~28 sec

This one looks correct. But I am a bit confused: does this mean that only() and defer() should not be used in single-item pk queries?

On Aug 6, 1:34 am, Alex Gaynor wrote:
> On Thu, Aug 5, 2010 at 5:32 PM, OverKrik wrote:
> > I am performing every test 10 times, excluding one fastest and one
> > slowest result, restarting db every time and performing 10 000 requests
> > to warm db before measuring execution time.
> > Just in case, I've tried running tests in only-full-only-full and
> > defer-full-defer-full patterns and got same results.
>
> > On Aug 6, 1:18 am, Dennis Kaarsemaker wrote:
> >> On Thu, 2010-08-05 at 16:09 -0500, Jacob Kaplan-Moss wrote:
>
> >> > - What database engine are you using?
> >> > - Where's the database being stored (same server? other server?
> >> > in-memory?)
> >> > - How much data is in the database?
> >> > - How big is that "info" field on an average model?
>
> >> - Were OS/database level caches equally hot or cold?
> >> --
> >> Dennis K.
>
> >> They've gone to plaid!
>
> Can you try comparing:
>
> Model.objects.only("field1")[:15]
>
> vs.
>
> Model.objects.all()[:15]
>
> Instead of looping and doing individual queries?
>
> Alex
>
> --
> "I disapprove of what you say, but I will defend to the death your
> right to say it." -- Voltaire
> "The people's good is the highest law." -- Cicero
> "Code can always be simpler than you think, but never as simple as you
> want" -- Me
Re: Querysets with "only()" and "defer()" slower than without?
On Thu, Aug 5, 2010 at 5:32 PM, OverKrik wrote:
> I am performing every test 10 times, excluding one fastest and one
> slowest result, restarting db every time and performing 10 000 requests
> to warm db before measuring execution time.
> Just in case, I've tried running tests in only-full-only-full and
> defer-full-defer-full patterns and got same results.
>
> On Aug 6, 1:18 am, Dennis Kaarsemaker wrote:
>> On Thu, 2010-08-05 at 16:09 -0500, Jacob Kaplan-Moss wrote:
>>
>> > - What database engine are you using?
>> > - Where's the database being stored (same server? other server?
>> > in-memory?)
>> > - How much data is in the database?
>> > - How big is that "info" field on an average model?
>>
>> - Were OS/database level caches equally hot or cold?
>> --
>> Dennis K.
>>
>> They've gone to plaid!

Can you try comparing:

    Model.objects.only("field1")[:15]

vs.

    Model.objects.all()[:15]

Instead of looping and doing individual queries?

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you want" -- Me
Re: Querysets with "only()" and "defer()" slower than without?
I am performing every test 10 times, excluding the one fastest and the one slowest result, restarting the db every time and performing 10 000 requests to warm the db before measuring execution time. Just in case, I've tried running the tests in only-full-only-full and defer-full-defer-full patterns and got the same results.

On Aug 6, 1:18 am, Dennis Kaarsemaker wrote:
> On Thu, 2010-08-05 at 16:09 -0500, Jacob Kaplan-Moss wrote:
>
> > - What database engine are you using?
> > - Where's the database being stored (same server? other server?
> > in-memory?)
> > - How much data is in the database?
> > - How big is that "info" field on an average model?
>
> - Were OS/database level caches equally hot or cold?
> --
> Dennis K.
>
> They've gone to plaid!
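The measurement procedure described in this message (repeat each test, drop one fastest and one slowest result, warm up first) could be sketched roughly as follows in Python 3. The helper names `trimmed_mean` and `benchmark` are made up for illustration, the timed callable is a placeholder rather than a real query, and restarting the database between runs is outside the scope of the sketch.

```python
import time


def trimmed_mean(samples):
    """Average the samples after dropping one fastest and one slowest result."""
    if len(samples) < 3:
        raise ValueError("need at least three samples to drop two")
    kept = sorted(samples)[1:-1]
    return sum(kept) / len(kept)


def benchmark(func, runs=10, warmup=0):
    """Call func `warmup` times untimed, then time `runs` calls."""
    for _ in range(warmup):
        func()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return trimmed_mean(samples)


# Placeholder workload; a real run would issue the queries under test here.
result = benchmark(lambda: sum(range(1000)), runs=10, warmup=100)
print("trimmed mean: %.6f s" % result)
```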
Re: Querysets with "only()" and "defer()" slower than without?
Hi Jacob, thanks for the reply and sorry for not giving enough information in the original post. I was thinking that this issue could only be related to the Python part of the benchmark, as everything looked OK with the queries. Just in case, I've tested the query generated by the only/defer queryset with a raw SQL benchmark and compared it to a query selecting every field:

1.
    for pk in xrange(1,5):
        cursor.execute("SELECT `tests_user`.`id`, `tests_user`.`name`, `tests_user`.`email`, `tests_user`.`age`, `tests_user`.`power_level`, `tests_user`.`info` FROM `tests_user` WHERE `tests_user`.`id` = %(pk)s", {'pk' : pk})
        row = cursor.fetchone()
        d = row[4]

2.
    for pk in xrange(1,5):
        cursor.execute("SELECT `tests_user`.`id`,`tests_user`.`power_level` FROM `tests_user` WHERE `tests_user`.`id` = %(pk)s", {'pk' : pk})
        row = cursor.fetchone()
        d = row[1]

1. ~27 sec
2. ~21 sec

This is the expected result (asking for fewer fields = better performance).

I am using a MySQL database on the same PC, with the MyISAM engine. The "tests_user" table has 921,000 entries, and every info field holds 400 randomly generated ASCII characters.

On Aug 6, 1:09 am, Jacob Kaplan-Moss wrote:
> On Thu, Aug 5, 2010 at 3:44 PM, OverKrik wrote:
> > Hi, I am testing performance of three querysets
>
> Good! We need as many benchmarks as we can get our hands on.
>
> > I was expecting first two querysets to be faster, but for some reason
> > it takes about ~105sec to finish (3) and ~130sec for (1) and (2)
> > I've checked queries generated by both querysets and can see that I am
> > not doing any extra requests to DB, and that (1) and (2) generates
> > correct SQL which includes only pk and power_level fields. I have
> > DEBUG=False when running tests.
> > Can this be a bug?
>
> Perhaps, but there's not enough information yet to know for sure.
> Anytime you're doing database performance testing, the particulars of
> the database and data set matter a *huge* deal. Before I could draw
> any conclusions from your data I'd want to know:
>
> - What database engine are you using?
> - Where's the database being stored (same server? other server? in-memory?)
> - How much data is in the database?
> - How big is that "info" field on an average model?
>
> If you're benchmarking this against a small dataset and an in-memory
> database like SQLite I'd fully expect to see the defer/only benchmark
> to be slower. That's because every time a QS is chained it needs to be
> copied, which is a relatively expensive operation. In a setup with
> small data, the time spent in Python is going to outweigh the time
> spent running the query in the database and sending the data over the
> wire.
>
> On the other hand, if you have a million User objects with an
> average of 1K in the info field and you're running against a remote
> database -- situations that defer/only were specifically designed to
> optimize -- I'd be *very* worried if defer/only was slower.
>
> Make sense?
>
> Jacob
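For readers who want to try the raw SQL comparison without a MySQL server, here is a minimal Python 3 sketch that swaps in the standard-library `sqlite3` module with an in-memory database. The table layout mirrors the column names in the queries above, but the row count, the generated data and the helper `run` are assumptions made for illustration, so the timings it prints say nothing about the ~27 sec and ~21 sec figures measured on MySQL.

```python
import sqlite3
import time

# In-memory SQLite stands in for the MySQL/MyISAM setup described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tests_user ("
    "id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "age INTEGER, power_level INTEGER, info TEXT)"
)
conn.executemany(
    "INSERT INTO tests_user VALUES (?, ?, ?, ?, ?, ?)",
    [(pk, "user%d" % pk, "u%d@example.com" % pk, 30, pk * 10, "x" * 400)
     for pk in range(1, 1001)],  # made-up data, far smaller than 921,000 rows
)

FULL = ("SELECT id, name, email, age, power_level, info "
        "FROM tests_user WHERE id = :pk")
NARROW = "SELECT id, power_level FROM tests_user WHERE id = :pk"


def run(sql, column, pks):
    """Fetch one row per pk and return (power levels, elapsed seconds)."""
    cursor = conn.cursor()
    start = time.perf_counter()
    levels = []
    for pk in pks:
        cursor.execute(sql, {"pk": pk})
        levels.append(cursor.fetchone()[column])
    return levels, time.perf_counter() - start


pks = range(1, 1001)
full_levels, full_seconds = run(FULL, 4, pks)
narrow_levels, narrow_seconds = run(NARROW, 1, pks)
print("full: %.4fs  narrow: %.4fs" % (full_seconds, narrow_seconds))
```

Both loops read the same `power_level` values; only the number of columns transferred per row differs.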
Re: Querysets with "only()" and "defer()" slower than without?
On Thu, 2010-08-05 at 16:09 -0500, Jacob Kaplan-Moss wrote:
> - What database engine are you using?
> - Where's the database being stored (same server? other server?
> in-memory?)
> - How much data is in the database?
> - How big is that "info" field on an average model?

- Were OS/database level caches equally hot or cold?
--
Dennis K.

They've gone to plaid!
Re: Querysets with "only()" and "defer()" slower than without?
On Thu, Aug 5, 2010 at 3:44 PM, OverKrik wrote:
> Hi, I am testing performance of three querysets

Good! We need as many benchmarks as we can get our hands on.

> I was expecting first two querysets to be faster, but for some reason
> it takes about ~105sec to finish (3) and ~130sec for (1) and (2)
> I've checked queries generated by both querysets and can see that I am
> not doing any extra requests to DB, and that (1) and (2) generates
> correct SQL which includes only pk and power_level fields. I have
> DEBUG=False when running tests.
> Can this be a bug?

Perhaps, but there's not enough information yet to know for sure. Anytime you're doing database performance testing, the particulars of the database and data set matter a *huge* deal. Before I could draw any conclusions from your data I'd want to know:

- What database engine are you using?
- Where's the database being stored (same server? other server? in-memory?)
- How much data is in the database?
- How big is that "info" field on an average model?

If you're benchmarking this against a small dataset and an in-memory database like SQLite I'd fully expect to see the defer/only benchmark to be slower. That's because every time a QS is chained it needs to be copied, which is a relatively expensive operation. In a setup with small data, the time spent in Python is going to outweigh the time spent running the query in the database and sending the data over the wire.

On the other hand, if you have a million User objects with an average of 1K in the info field and you're running against a remote database -- situations that defer/only were specifically designed to optimize -- I'd be *very* worried if defer/only was slower.

Make sense?

Jacob