Re: Querysets with "only()" and "defer()" slower than without?

2010-08-08 Thread OverKrik
Hi Anssi,
creating the queryset first and reusing it later is a nice idea and I'll
add it to my tests, but unfortunately it only works when you can share
the queryset between db requests, which is impossible with multiple
simultaneous requests to the server. For example, if you show
some profile data via AJAX calls, you get tons of calls to very simple
views doing 1-2 queries each. And that's what I am trying to test here:
the 50 000 iterations are there to minimize per-query time and memory
measurement error; I am not going to use such queries in a real
application.

On Aug 6, 7:29 pm, Anssi Kaariainen  wrote:
> On Aug 6, 12:09 am, Jacob Kaplan-Moss  wrote:
>
> > If you're benchmarking this against a small dataset and an in-memory
> > database like SQLite I'd fully expect to see the defer/only benchmark
> > to be slower. That's because every time a QS is chained it needs to be
> > copied, which is a relatively expensive operation. In a setup with
> > small data, the time spent in Python is going to outweigh the time
> > spent running the query in the database and sending the data over the
> > wire.
>
> A retest with:
> for pk in xrange(1, 5):
>     user = User.objects.only("power_level").get(pk=pk)
>     d = user.power_level
>
> replaced with:
> qs = User.objects.only("power_level")
> for pk in xrange(1, 5):
>     user = qs.get(pk=pk)
>     d = user.power_level
>
> # repeat for all tests.
>
> Should remove the effect of query building. (btw why not an "inplace()"
> function for querysets if you know you won't be using the middle steps
> in chained query building?)
>
> I have been looking into object creation speed when loading many
> objects simultaneously, for my particular case running the query as
> values_list took something like 5ms (sorry, I don't remember the
> values and I am not at work at the moment), 50ms when constructing
> instances (1500 objects in my case). The problem seems to be that for
> every object created we go and check which fields the model has, which
> of them come from args, which are deferred and which have default
> values. It would probably be much faster to check the field
> classifications once, and then bulk create the objects using the
> classifications.
>
> I did a small test for building 1500 objects in this style (only
> object building, not from query set), and the result was something
> like 50% faster object building. And yes, signals were sent for every
> object created + default() was called if it was callable for the
> field. The iterator in query makes it hard to use bulk loading style
> however...
>
> The same problem is present in row iteration when constructing objects
> from the rows returned from the DB. For each row we check which select
> related things we have etc. This could be checked once before starting
> to iterate through the rows and then use the calculated values each
> time when building the object from the row.
>
> - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-develop...@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Querysets with "only()" and "defer()" slower than without?

2010-08-06 Thread Anssi Kaariainen

On Aug 6, 12:09 am, Jacob Kaplan-Moss  wrote:
> If you're benchmarking this against a small dataset and an in-memory
> database like SQLite I'd fully expect to see the defer/only benchmark
> to be slower. That's because every time a QS is chained it needs to be
> copied, which is a relatively expensive operation. In a setup with
> small data, the time spent in Python is going to outweigh the time
> spent running the query in the database and sending the data over the
> wire.

A retest with:
for pk in xrange(1, 5):
    user = User.objects.only("power_level").get(pk=pk)
    d = user.power_level

replaced with:
qs = User.objects.only("power_level")
for pk in xrange(1, 5):
    user = qs.get(pk=pk)
    d = user.power_level

# repeat for all tests.

Should remove the effect of query building. (btw why not an "inplace()"
function for querysets if you know you won't be using the middle steps
in chained query building?)

I have been looking into object creation speed when loading many
objects simultaneously, for my particular case running the query as
values_list took something like 5ms (sorry, I don't remember the
values and I am not at work at the moment), 50ms when constructing
instances (1500 objects in my case). The problem seems to be that for
every object created we go and check which fields the model has, which
of them come from args, which are deferred and which have default
values. It would probably be much faster to check the field
classifications once, and then bulk create the objects using the
classifications.
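
The "classify once, then build in bulk" idea can be sketched in plain Python. This is only a toy model under stated assumptions: FIELDS, Record, build_naive and make_builder are hypothetical names made up for illustration, and nothing here is Django's actual model-instantiation code. It only shows the shape of the optimization, moving the per-field decisions out of the per-object loop.

```python
# Toy model only: FIELDS and Record are hypothetical, not Django internals.
# Each entry is (field name, default); a default may be a plain value or a
# zero-argument callable that produces a fresh value.
FIELDS = [
    ("id", None),
    ("name", ""),
    ("power_level", 0),
    ("tags", list),
]


class Record(object):
    """Stand-in for a model instance."""


def build_naive(row_names, row):
    """Re-derive the field classification for every single object."""
    obj = Record()
    values = dict(zip(row_names, row))
    for name, default in FIELDS:
        if name in values:
            setattr(obj, name, values[name])
        else:
            setattr(obj, name, default() if callable(default) else default)
    return obj


def make_builder(row_names):
    """Classify the fields once and return a cheap per-row builder."""
    from_row = [(n, row_names.index(n)) for n, _ in FIELDS if n in row_names]
    missing = [(n, d) for n, d in FIELDS if n not in row_names]
    constants = [(n, d) for n, d in missing if not callable(d)]
    factories = [(n, d) for n, d in missing if callable(d)]

    def build(row):
        obj = Record()
        for name, position in from_row:
            setattr(obj, name, row[position])
        for name, value in constants:
            setattr(obj, name, value)
        for name, factory in factories:
            # Callable defaults are still invoked once per object.
            setattr(obj, name, factory())
        return obj

    return build
```

Usage would be something like build = make_builder(["id", "power_level"]) followed by objects = [build(row) for row in rows]. Callable defaults still run per object, which matches the behaviour described for the real test; signals are not modelled at all.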

I did a small test for building 1500 objects in this style (only
object building, not from query set), and the result was something
like 50% faster object building. And yes, signals were sent for every
object created + default() was called if it was callable for the
field. The iterator in query makes it hard to use bulk loading style
however...

The same problem is present in row iteration when constructing objects
from the rows returned from the DB. For each row we check which select
related things we have etc. This could be checked once before starting
to iterate through the rows and then use the calculated values each
time when building the object from the row.

- Anssi




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-06 Thread OverKrik
Thanks Jacob, I will continue testing and report back if anything new
comes up on this issue.

On Aug 6, 3:50 am, Jacob Kaplan-Moss  wrote:
> On Thu, Aug 5, 2010 at 6:14 PM, OverKrik  wrote:
> > Hi Jeremy, I will release all my code after finishing the test suite -
> > I think, in about 2 weeks.
>
> I'm looking forward to seeing it. I agree that the results are
> counter-intuitive; seems there's *something* going on here that
> shouldn't be happening. I'd expect, given your scenario, that the
> only/defer versions would be faster, so the fact that they're not
> means that the situation's more complicated than my mental model.
>
> I'm going to be trying to reproduce your results locally, both using a
> clone of your setup and also against some of the huge datasets I have
> access to. Let's keep on this: even if there's not a bug per se we
> still need to understand the implications of what only/defer are doing
> so we can accurately represent the pros/cons of using 'em.
>
> Thanks for bringing this up!
>
> Jacob




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread Jacob Kaplan-Moss
On Thu, Aug 5, 2010 at 6:14 PM, OverKrik  wrote:
> Hi Jeremy, I will release all my code after finishing the test suite -
> I think, in about 2 weeks.

I'm looking forward to seeing it. I agree that the results are
counter-intuitive; seems there's *something* going on here that
shouldn't be happening. I'd expect, given your scenario, that the
only/defer versions would be faster, so the fact that they're not
means that the situation's more complicated than my mental model.

I'm going to be trying to reproduce your results locally, both using a
clone of your setup and also against some of the huge datasets I have
access to. Let's keep on this: even if there's not a bug per se we
still need to understand the implications of what only/defer are doing
so we can accurately represent the pros/cons of using 'em.

Thanks for bringing this up!

Jacob




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread OverKrik
Hi Jeremy, I will release all my code after finishing the test suite -
I think, in about 2 weeks.

On Aug 6, 2:59 am, Jeremy Dunck  wrote:
> On Thu, Aug 5, 2010 at 4:32 PM, OverKrik  wrote:
> > I am performing every test 10 times, excluding the one fastest and one
> > slowest result, restarting the db every time and performing 10 000 requests
> > to warm the db before measuring execution time.
> > Just in case, I've tried running the tests in only-full-only-full and
> > defer-full-defer-full patterns and got the same results.
>
> This sounds like a pretty good test.  Can you attach the code?  I'm
> sure it's not pretty, but I've been meaning to work on benchmarks for
> a long time and it'd be a shame to not reuse your effort.




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread Jeremy Dunck
On Thu, Aug 5, 2010 at 4:32 PM, OverKrik  wrote:
> I am performing every test 10 times, excluding the one fastest and one
> slowest result, restarting the db every time and performing 10 000 requests
> to warm the db before measuring execution time.
> Just in case, I've tried running the tests in only-full-only-full and
> defer-full-defer-full patterns and got the same results.

This sounds like a pretty good test.  Can you attach the code?  I'm
sure it's not pretty, but I've been meaning to work on benchmarks for
a long time and it'd be a shame to not reuse your effort.




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread OverKrik
1.
users = User.objects.only("power_level")[:50]
for user in users.iterator():
    d = user.power_level

2.
users = User.objects.all()[:50]
for user in users.iterator():
    d = user.power_level

1. ~24 sec
2. ~28 sec

This one looks correct.
But I am a bit confused, does this mean that only() and defer() should
not be used in single-item pk queries?

On Aug 6, 1:34 am, Alex Gaynor  wrote:
> On Thu, Aug 5, 2010 at 5:32 PM, OverKrik  wrote:
> > I am performing every test 10 times, excluding the one fastest and one
> > slowest result, restarting the db every time and performing 10 000 requests
> > to warm the db before measuring execution time.
> > Just in case, I've tried running the tests in only-full-only-full and
> > defer-full-defer-full patterns and got the same results.
>
> > On Aug 6, 1:18 am, Dennis Kaarsemaker  wrote:
> >> On do, 2010-08-05 at 16:09 -0500, Jacob Kaplan-Moss wrote:
>
> >> > - What database engine are you using?
> >> > - Where's the database being stored (same server? other server?
> >> > in-memory?)
> >> > - How much data is in the database?
> >> > - How big is that "info" field on an average model?
>
> >> - Were OS/database level caches equally hot or cold?
> >> --
> >> Dennis K.
>
> >> They've gone to plaid!
>
>
> Can you try comparing:
>
> Model.objects.only("field1")[:15]
>
> vs.
>
> Model.objects.all()[:15]
>
> Instead of looping and doing individual queries?
>
> Alex
>
> --
> "I disapprove of what you say, but I will defend to the death your
> right to say it." -- Voltaire
> "The people's good is the highest law." -- Cicero
> "Code can always be simpler than you think, but never as simple as you
> want" -- Me




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread Alex Gaynor
On Thu, Aug 5, 2010 at 5:32 PM, OverKrik  wrote:
> I am performing every test 10 times, excluding the one fastest and one
> slowest result, restarting the db every time and performing 10 000 requests
> to warm the db before measuring execution time.
> Just in case, I've tried running the tests in only-full-only-full and
> defer-full-defer-full patterns and got the same results.
>
> On Aug 6, 1:18 am, Dennis Kaarsemaker  wrote:
>> On do, 2010-08-05 at 16:09 -0500, Jacob Kaplan-Moss wrote:
>>
>> > - What database engine are you using?
>> > - Where's the database being stored (same server? other server?
>> > in-memory?)
>> > - How much data is in the database?
>> > - How big is that "info" field on an average model?
>>
>> - Were OS/database level caches equally hot or cold?
>> --
>> Dennis K.
>>
>> They've gone to plaid!
>
>
>

Can you try comparing:

Model.objects.only("field1")[:15]

vs.

Model.objects.all()[:15]

Instead of looping and doing individual queries?

Alex

-- 
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread OverKrik
I am performing every test 10 times, excluding the one fastest and one
slowest result, restarting the db every time and performing 10 000 requests
to warm the db before measuring execution time.
Just in case, I've tried running the tests in only-full-only-full and
defer-full-defer-full patterns and got the same results.
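
The measurement procedure described above (warm up, repeat, drop the fastest and slowest run, average the rest) could look roughly like the sketch below. The helper names trimmed_mean and benchmark are made up for illustration and are not the actual test suite; restarting the database between runs is not modelled, and func stands for whatever callable runs one batch of queries.

```python
import time


def trimmed_mean(values):
    """Average the values after dropping the single smallest and largest."""
    ordered = sorted(values)
    if len(ordered) < 3:
        raise ValueError("need at least 3 measurements to trim both ends")
    kept = ordered[1:-1]
    return sum(kept) / float(len(kept))


def benchmark(func, repeats=10, warmup=0):
    """Time func `repeats` times after `warmup` untimed calls.

    Returns the trimmed mean of the wall-clock timings in seconds.
    """
    for _ in range(warmup):
        func()  # warm caches; these calls are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return trimmed_mean(timings)
```

Dropping one value from each end guards against a single outlier in either direction, but with only 10 runs it is still worth looking at the spread of the remaining 8 timings and not just their mean.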

On Aug 6, 1:18 am, Dennis Kaarsemaker  wrote:
> On do, 2010-08-05 at 16:09 -0500, Jacob Kaplan-Moss wrote:
>
> > - What database engine are you using?
> > - Where's the database being stored (same server? other server?
> > in-memory?)
> > - How much data is in the database?
> > - How big is that "info" field on an average model?
>
> - Were OS/database level caches equally hot or cold?
> --
> Dennis K.
>
> They've gone to plaid!




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread OverKrik
Hi Jacob, thanks for the reply, and sorry for not giving enough info in
the original post. I was thinking that this issue could only be related
to the Python part of the bench, as everything looked OK with the queries.
Just in case, I've tested the query generated by the only/defer queryset
using a raw SQL bench and compared it to the query with every field:

1.
for pk in xrange(1, 5):
    cursor.execute(
        "SELECT `tests_user`.`id`, `tests_user`.`name`, "
        "`tests_user`.`email`, `tests_user`.`age`, "
        "`tests_user`.`power_level`, `tests_user`.`info` "
        "FROM `tests_user` WHERE `tests_user`.`id` = %(pk)s",
        {'pk': pk})
    row = cursor.fetchone()
    d = row[4]  # power_level
2.
for pk in xrange(1, 5):
    cursor.execute(
        "SELECT `tests_user`.`id`, `tests_user`.`power_level` "
        "FROM `tests_user` WHERE `tests_user`.`id` = %(pk)s",
        {'pk': pk})
    row = cursor.fetchone()
    d = row[1]  # power_level

1. ~27 sec
2. ~21 sec
This is the expected result (asking for fewer fields = better performance).

I am using a MySQL database on the same PC, with the MyISAM engine.
The "tests_user" table has 921,000 entries, and every info field holds
400 randomly generated ASCII characters.
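
For anyone who wants to try the raw-SQL comparison without a MySQL server, here is a self-contained sketch of the same two loops. Note the assumptions: it swaps in Python's built-in sqlite3 module with an in-memory database, the table layout is guessed from the column names in the queries above, and only 4 rows are loaded. Timings from it say nothing about the MySQL/MyISAM numbers reported here; it only shows the structure of the bench.

```python
import sqlite3
import time

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tests_user ("
    "id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "age INTEGER, power_level INTEGER, info TEXT)"
)
conn.executemany(
    "INSERT INTO tests_user (id, name, email, age, power_level, info) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(pk, "user%d" % pk, "user%d@example.com" % pk, 30, pk * 10, "x" * 400)
     for pk in range(1, 5)],
)

FULL = ("SELECT id, name, email, age, power_level, info "
        "FROM tests_user WHERE id = :pk")
NARROW = "SELECT id, power_level FROM tests_user WHERE id = :pk"


def fetch_power_levels(sql, column):
    """Run one single-row query per pk and collect power_level."""
    cursor = conn.cursor()
    result = []
    for pk in range(1, 5):
        cursor.execute(sql, {"pk": pk})
        row = cursor.fetchone()
        result.append(row[column])
    return result


def timed(sql, column, iterations=1000):
    """Wall-clock seconds for `iterations` passes over the 4 lookups."""
    start = time.perf_counter()
    for _ in range(iterations):
        fetch_power_levels(sql, column)
    return time.perf_counter() - start
```

Calling timed(FULL, 4) and timed(NARROW, 1) gives the two numbers to compare; power_level sits at index 4 in the full row and index 1 in the narrow one, as in the loops above.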


On Aug 6, 1:09 am, Jacob Kaplan-Moss  wrote:
> On Thu, Aug 5, 2010 at 3:44 PM, OverKrik  wrote:
> > Hi, I am testing performance of three querysets
>
> Good! We need as many benchmarks as we can get our hands on.
>
> > I was expecting first two querysets to be faster, but for some reason
> > it takes about ~105sec to finish (3) and ~130sec for (1) and (2)
> > I've checked queries generated by both querysets and can see that I am
> > not doing any extra requests to DB, and that (1) and (2) generates
> > correct SQL which includes only pk and power_level fields. I have
> > DEBUG=False when running tests.
> > Can this be a bug?
>
> Perhaps, but there's not enough information yet to know for sure.
> Anytime you're doing database performance testing, the particulars of
> the database and data set matter a *huge* deal. Before I could draw
> any conclusions from your data I'd want to know:
>
> - What database engine are you using?
> - Where's the database being stored (same server? other server? in-memory?)
> - How much data is in the database?
> - How big is that "info" field on an average model?
>
> If you're benchmarking this against a small dataset and an in-memory
> database like SQLite I'd fully expect to see the defer/only benchmark
> to be slower. That's because every time a QS is chained it needs to be
> copied, which is a relatively expensive operation. In a setup with
> small data, the time spent in Python is going to outweigh the time
> spent running the query in the database and sending the data over the
> wire.
>
> On the other hand, if you have a million User objects with an
> average of 1K in the info field and you're running against a remote
> database -- situations that defer/only were specifically designed to
> optimize -- I'd be *very* worried if defer/only was slower.
>
> Make sense?
>
> Jacob




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread Dennis Kaarsemaker
On do, 2010-08-05 at 16:09 -0500, Jacob Kaplan-Moss wrote:

> - What database engine are you using?
> - Where's the database being stored (same server? other server?
> in-memory?)
> - How much data is in the database?
> - How big is that "info" field on an average model? 

- Were OS/database level caches equally hot or cold?
-- 
Dennis K.

They've gone to plaid!




Re: Querysets with "only()" and "defer()" slower than without?

2010-08-05 Thread Jacob Kaplan-Moss
On Thu, Aug 5, 2010 at 3:44 PM, OverKrik  wrote:
> Hi, I am testing performance of three querysets

Good! We need as many benchmarks as we can get our hands on.

> I was expecting first two querysets to be faster, but for some reason
> it takes about ~105sec to finish (3) and ~130sec for (1) and (2)
> I've checked queries generated by both querysets and can see that I am
> not doing any extra requests to DB, and that (1) and (2) generates
> correct SQL which includes only pk and power_level fields. I have
> DEBUG=False when running tests.
> Can this be a bug?

Perhaps, but there's not enough information yet to know for sure.
Anytime you're doing database performance testing, the particulars of
the database and data set matter a *huge* deal. Before I could draw
any conclusions from your data I'd want to know:

- What database engine are you using?
- Where's the database being stored (same server? other server? in-memory?)
- How much data is in the database?
- How big is that "info" field on an average model?

If you're benchmarking this against a small dataset and an in-memory
database like SQLite I'd fully expect to see the defer/only benchmark
to be slower. That's because every time a QS is chained it needs to be
copied, which is a relatively expensive operation. In a setup with
small data, the time spent in Python is going to outweigh the time
spent running the query in the database and sending the data over the
wire.
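
To make the copying cost concrete, here is a toy model of a chainable query builder. ToyQuerySet is a hypothetical class written for this illustration and is not Django's QuerySet; the only property it shares with the real thing is the one described above, namely that every chaining call returns a copy. It counts the copies so the per-lookup overhead is visible.

```python
import copy


class ToyQuerySet(object):
    """Hypothetical stand-in for a chainable query builder."""

    clones = 0  # class-wide counter of copies made

    def __init__(self, table):
        self.table = table
        self.fields = None
        self.filters = {}

    def _clone(self):
        ToyQuerySet.clones += 1
        return copy.deepcopy(self)

    def only(self, *fields):
        clone = self._clone()
        clone.fields = list(fields)
        return clone

    def filter(self, **kwargs):
        clone = self._clone()
        clone.filters.update(kwargs)
        return clone

    def as_sql(self):
        cols = ", ".join(self.fields) if self.fields else "*"
        where = " AND ".join(
            "%s = %r" % item for item in sorted(self.filters.items()))
        suffix = " WHERE " + where if where else ""
        return "SELECT %s FROM %s%s" % (cols, self.table, suffix)


def chained_per_lookup(pks):
    """Rebuild the narrowed queryset inside the loop: 2 copies per pk."""
    base = ToyQuerySet("tests_user")
    return [base.only("id", "power_level").filter(id=pk).as_sql()
            for pk in pks]


def hoisted(pks):
    """Narrow once outside the loop: 1 copy up front, then 1 per pk."""
    narrowed = ToyQuerySet("tests_user").only("id", "power_level")
    return [narrowed.filter(id=pk).as_sql() for pk in pks]
```

Both functions produce the same SQL strings; for 4 lookups the first makes 8 copies and the second 5. When the query itself is cheap, that difference in pure-Python work is the kind of overhead that can dominate.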

On the other hand, if you have a million User objects with an
average of 1K in the info field and you're running against a remote
database -- situations that defer/only were specifically designed to
optimize -- I'd be *very* worried if defer/only was slower.

Make sense?

Jacob
