Caching performance regression between django 1.0 and django 1.1.1

2009-10-19 Thread mt

Hi All,
I recently upgraded a high traffic site from django 1.0 to django
1.1.1 and noticed that the load on the server went through the roof,
so I had to revert to django 1.0.
I've done some testing and noticed that the caching behaviour between
1. 0 and 1.1.1 has changed.
Basically I was caching expensive database queries using memcached, in
django 1.0 reading from the cache is a fast operation however in
django 1.1.1 reading from cache causes all fields with a callable as
default to be called.
In the case where that callable is an expensive operation the
performance is severely affected.
I've created patches showing this regression for the branches
http://code.djangoproject.com/svn/django/branches/releases/1.0.X
and
http://code.djangoproject.com/svn/django/branches/releases/1.1.X
which shows the default callable being called on cache read.

My questions are:
1. Is it wrong to store querysets in the cache?
2. Should I log a bug for this in the django tracker and upload my
test case patches?
3. Was this change made on purpose to fulfill other features?

Thanks
Michael
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Caching performance regression between django 1.0 and django 1.1.1

2009-10-19 Thread mt

Hi All,
I recently upgraded a high traffic site from django 1.0 to django
1.1.1 and noticed that the load on the server went through the roof,
so I had to revert to django 1.0.
I've done some testing and noticed that the caching behaviour between
1. 0 and 1.1.1 has changed.
Basically I was caching expensive database queries using memcached, in
django 1.0 reading from the cache is a fast operation however in
django 1.1.1 reading from cache causes all fields with a callable as
default to be called.
In the case where that callable is an expensive operation the
performance is severely affected.
I've created patches showing this regression for the branches
http://code.djangoproject.com/svn/django/branches/releases/1.0.X
and
http://code.djangoproject.com/svn/django/branches/releases/1.1.X
which shows the default callable being called on cache read.

My questions are:
1. Is it wrong to store querysets in the cache?
2. Should I log a bug for this in the django tracker and upload my
test case patches?
3. Was this change made on purpose to fulfill other features?

Thanks
Michael
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Caching performance regression between django 1.0 and django 1.1.1

2009-10-19 Thread Russell Keith-Magee

On Mon, Oct 19, 2009 at 6:52 PM, mt  wrote:
>
> Hi All,
> I recently upgraded a high traffic site from django 1.0 to django
> 1.1.1 and noticed that the load on the server went through the roof,
> so I had to revert to django 1.0.
> I've done some testing and noticed that the caching behaviour between
> 1. 0 and 1.1.1 has changed.
> Basically I was caching expensive database queries using memcached, in
> django 1.0 reading from the cache is a fast operation however in
> django 1.1.1 reading from cache causes all fields with a callable as
> default to be called.
> In the case where that callable is an expensive operation the
> performance is severely affected.
> I've created patches showing this regression for the branches
> http://code.djangoproject.com/svn/django/branches/releases/1.0.X
> and
> http://code.djangoproject.com/svn/django/branches/releases/1.1.X
> which shows the default callable being called on cache read.
>
> My questions are:
> 1. Is it wrong to store querysets in the cache?

Strictly, a queryset doesn't contain *any* data - it's just a
programatic representation of a SQL query. So, putting a queryset in
the cache doesn't inherently mean that anything is being stored.
However, if the queryset has been evaluated, there might be something
in the queryset's internal result cache, and *that* value will be
returned when the value is retrieved from the cache.

> 2. Should I log a bug for this in the django tracker and upload my
> test case patches?

Certainly.

> 3. Was this change made on purpose to fulfill other features?

I can't think of any obvious reason that you would see a regression of
that sort - the core infrastructure for queries didn't change between
v1.0 and v1.1. There were some features added (such as aggregates),
and some cleanup of the internals of value_list() and values()
querysets, but those changes shouldn't have affected caching
behaviour.

Yours,
Russ Magee %-)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Caching performance regression between django 1.0 and django 1.1.1

2009-10-19 Thread mt



On Oct 19, 1:08 pm, Russell Keith-Magee 
wrote:
> On Mon, Oct 19, 2009 at 6:52 PM, mt  wrote:
>
> > Hi All,
> > I recently upgraded a high traffic site from django 1.0 to django
> > 1.1.1 and noticed that the load on the server went through the roof,
> > so I had to revert to django 1.0.
> > I've done some testing and noticed that the caching behaviour between
> > 1. 0 and 1.1.1 has changed.
> > Basically I was caching expensive database queries using memcached, in
> > django 1.0 reading from the cache is a fast operation however in
> > django 1.1.1 reading from cache causes all fields with a callable as
> > default to be called.
> > In the case where that callable is an expensive operation the
> > performance is severely affected.
> > I've created patches showing this regression for the branches
> >http://code.djangoproject.com/svn/django/branches/releases/1.0.X
> > and
> >http://code.djangoproject.com/svn/django/branches/releases/1.1.X
> > which shows the default callable being called on cache read.
>
> > My questions are:
> > 1. Is it wrong to store querysets in the cache?
>
> Strictly, a queryset doesn't contain *any* data - it's just a
> programatic representation of a SQL query. So, putting a queryset in
> the cache doesn't inherently mean that anything is being stored.
> However, if the queryset has been evaluated, there might be something
> in the queryset's internal result cache, and *that* value will be
> returned when the value is retrieved from the cache.
>
> > 2. Should I log a bug for this in the django tracker and upload my
> > test case patches?
>
> Certainly.
Done: http://code.djangoproject.com/ticket/12057
Thanks
Michael

>
> > 3. Was this change made on purpose to fulfill other features?
>
> I can't think of any obvious reason that you would see a regression of
> that sort - the core infrastructure for queries didn't change between
> v1.0 and v1.1. There were some features added (such as aggregates),
> and some cleanup of the internals of value_list() and values()
> querysets, but those changes shouldn't have affected caching
> behaviour.
>
> Yours,
> Russ Magee %-)
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Caching performance regression between django 1.0 and django 1.1.1

2009-10-22 Thread mt



On Oct 19, 1:08 pm, Russell Keith-Magee 
wrote:
> On Mon, Oct 19, 2009 at 6:52 PM, mt  wrote:
>
> > Hi All,
> > I recently upgraded a high traffic site from django 1.0 to django
> > 1.1.1 and noticed that the load on the server went through the roof,
> > so I had to revert to django 1.0.
> > I've done some testing and noticed that the caching behaviour between
> > 1. 0 and 1.1.1 has changed.
> > Basically I was caching expensive database queries using memcached, in
> > django 1.0 reading from the cache is a fast operation however in
> > django 1.1.1 reading from cache causes all fields with a callable as
> > default to be called.
> > In the case where that callable is an expensive operation the
> > performance is severely affected.
> > I've created patches showing this regression for the branches
> >http://code.djangoproject.com/svn/django/branches/releases/1.0.X
> > and
> >http://code.djangoproject.com/svn/django/branches/releases/1.1.X
> > which shows the default callable being called on cache read.
>
> > My questions are:
> > 1. Is it wrong to store querysets in the cache?
>
> Strictly, a queryset doesn't contain *any* data - it's just a
> programatic representation of a SQL query. So, putting a queryset in
> the cache doesn't inherently mean that anything is being stored.
> However, if the queryset has been evaluated, there might be something
> in the queryset's internal result cache, and *that* value will be
> returned when the value is retrieved from the cache.
>
> > 2. Should I log a bug for this in the django tracker and upload my
> > test case patches?
>
> Certainly.
Hi Russell,
I've found the cause of the performance regression and have suggested
a patch on ticket 12057 (http://code.djangoproject.com/ticket/12057)
The patch runs without causing any other tests to fail in the django
tree.
Do you think this patch could have other side effects or is it OK?
Thanks
Michael

>
> > 3. Was this change made on purpose to fulfill other features?
>
> I can't think of any obvious reason that you would see a regression of
> that sort - the core infrastructure for queries didn't change between
> v1.0 and v1.1. There were some features added (such as aggregates),
> and some cleanup of the internals of value_list() and values()
> querysets, but those changes shouldn't have affected caching
> behaviour.
>
> Yours,
> Russ Magee %-)
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Caching performance regression between django 1.0 and django 1.1.1

2009-10-22 Thread Russell Keith-Magee

On Thu, Oct 22, 2009 at 4:38 PM, mt  wrote:
>
>
>
> On Oct 19, 1:08 pm, Russell Keith-Magee 
> wrote:
>> On Mon, Oct 19, 2009 at 6:52 PM, mt  wrote:
>>
>> > Hi All,
>> > I recently upgraded a high traffic site from django 1.0 to django
>> > 1.1.1 and noticed that the load on the server went through the roof,
>> > so I had to revert to django 1.0.
>> > I've done some testing and noticed that the caching behaviour between
>> > 1. 0 and 1.1.1 has changed.
>> > Basically I was caching expensive database queries using memcached, in
>> > django 1.0 reading from the cache is a fast operation however in
>> > django 1.1.1 reading from cache causes all fields with a callable as
>> > default to be called.
>> > In the case where that callable is an expensive operation the
>> > performance is severely affected.
>> > I've created patches showing this regression for the branches
>> >http://code.djangoproject.com/svn/django/branches/releases/1.0.X
>> > and
>> >http://code.djangoproject.com/svn/django/branches/releases/1.1.X
>> > which shows the default callable being called on cache read.
>>
>> > My questions are:
>> > 1. Is it wrong to store querysets in the cache?
>>
>> Strictly, a queryset doesn't contain *any* data - it's just a
>> programatic representation of a SQL query. So, putting a queryset in
>> the cache doesn't inherently mean that anything is being stored.
>> However, if the queryset has been evaluated, there might be something
>> in the queryset's internal result cache, and *that* value will be
>> returned when the value is retrieved from the cache.
>>
>> > 2. Should I log a bug for this in the django tracker and upload my
>> > test case patches?
>>
>> Certainly.
> Hi Russell,
> I've found the cause of the performance regression and have suggested
> a patch on ticket 12057 (http://code.djangoproject.com/ticket/12057)
> The patch runs without causing any other tests to fail in the django
> tree.
> Do you think this patch could have other side effects or is it OK?

First off - the test case is awesome - that helps a lot.

As for the fix: there's a much easier solution - on line 356 of
django/db/models/base.py, replace the current implementation:

return (self.__class__, (), data)

with an implementation that essentially reverts to the underlying
implementation of __reduce__:

return super(Model, self).__reduce__()

You can see the underlying problem if you inspect super().__reduce__,
and compare it to the current return value.

As a result, I suspect that an analogous problem will exist for any
model with deferred fields. I'll need to confirm this before I commit
the fix. If you can spare any time to look into this, I'd be much
obliged.

Yours,
Russ Magee %-)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Caching performance regression between django 1.0 and django 1.1.1

2009-10-23 Thread mt



On Oct 22, 1:46 pm, Russell Keith-Magee 
wrote:
> On Thu, Oct 22, 2009 at 4:38 PM, mt  wrote:
>
> > On Oct 19, 1:08 pm, Russell Keith-Magee 
> > wrote:
> >> On Mon, Oct 19, 2009 at 6:52 PM, mt  wrote:
>
> >> > Hi All,
> >> > I recently upgraded a high traffic site from django 1.0 to django
> >> > 1.1.1 and noticed that the load on the server went through the roof,
> >> > so I had to revert to django 1.0.
> >> > I've done some testing and noticed that the caching behaviour between
> >> > 1. 0 and 1.1.1 has changed.
> >> > Basically I was caching expensive database queries using memcached, in
> >> > django 1.0 reading from the cache is a fast operation however in
> >> > django 1.1.1 reading from cache causes all fields with a callable as
> >> > default to be called.
> >> > In the case where that callable is an expensive operation the
> >> > performance is severely affected.
> >> > I've created patches showing this regression for the branches
> >> >http://code.djangoproject.com/svn/django/branches/releases/1.0.X
> >> > and
> >> >http://code.djangoproject.com/svn/django/branches/releases/1.1.X
> >> > which shows the default callable being called on cache read.
>
> >> > My questions are:
> >> > 1. Is it wrong to store querysets in the cache?
>
> >> Strictly, a queryset doesn't contain *any* data - it's just a
> >> programatic representation of a SQL query. So, putting a queryset in
> >> the cache doesn't inherently mean that anything is being stored.
> >> However, if the queryset has been evaluated, there might be something
> >> in the queryset's internal result cache, and *that* value will be
> >> returned when the value is retrieved from the cache.
>
> >> > 2. Should I log a bug for this in the django tracker and upload my
> >> > test case patches?
>
> >> Certainly.
> > Hi Russell,
> > I've found the cause of the performance regression and have suggested
> > a patch on ticket 12057 (http://code.djangoproject.com/ticket/12057)
> > The patch runs without causing any other tests to fail in the django
> > tree.
> > Do you think this patch could have other side effects or is it OK?
>
> First off - the test case is awesome - that helps a lot.
>
> As for the fix: there's a much easier solution - on line 356 of
> django/db/models/base.py, replace the current implementation:
>
> return (self.__class__, (), data)
>
> with an implementation that essentially reverts to the underlying
> implementation of __reduce__:
>
> return super(Model, self).__reduce__()
>
> You can see the underlying problem if you inspect super().__reduce__,
> and compare it to the current return value.
>
> As a result, I suspect that an analogous problem will exist for any
> model with deferred fields. I'll need to confirm this before I commit
> the fix. If you can spare any time to look into this, I'd be much
> obliged.
Hi Russell,
Thanks for taking the time to look at this.
Your solution is much more elegant and the test case passes using your
change to base.py.
With regard to models with deferred fields your suspicions are
correct.
I added a new test case for models with deferred fields (http://
code.djangoproject.com/attachment/ticket/12057/
prevent__init__on_pickle_load.patch) to test querysets that use
defer.
This test currently fails when I:
cd django (root directory that contains tests folder)
echo "DATABASE_ENGINE = 'sqlite3'" > settings.py
PYTHONPATH=`pwd`:$PYTHONPATH ./tests/runtests.py --settings=settings -
v1 cache

Thanks
Michael
>
> Yours,
> Russ Magee %-)
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Caching performance regression between django 1.0 and django 1.1.1

2009-10-27 Thread mt



On Oct 22, 12:46 pm, Russell Keith-Magee 
wrote:
> On Thu, Oct 22, 2009 at 4:38 PM, mt  wrote:
>
> > On Oct 19, 1:08 pm, Russell Keith-Magee 
> > wrote:
> >> On Mon, Oct 19, 2009 at 6:52 PM, mt  wrote:
>
> >> > Hi All,
> >> > I recently upgraded a high traffic site from django 1.0 to django
> >> > 1.1.1 and noticed that the load on the server went through the roof,
> >> > so I had to revert to django 1.0.
> >> > I've done some testing and noticed that the caching behaviour between
> >> > 1. 0 and 1.1.1 has changed.
> >> > Basically I was caching expensive database queries using memcached, in
> >> > django 1.0 reading from the cache is a fast operation however in
> >> > django 1.1.1 reading from cache causes all fields with a callable as
> >> > default to be called.
> >> > In the case where that callable is an expensive operation the
> >> > performance is severely affected.
> >> > I've created patches showing this regression for the branches
> >> >http://code.djangoproject.com/svn/django/branches/releases/1.0.X
> >> > and
> >> >http://code.djangoproject.com/svn/django/branches/releases/1.1.X
> >> > which shows the default callable being called on cache read.
>
> >> > My questions are:
> >> > 1. Is it wrong to store querysets in the cache?
>
> >> Strictly, a queryset doesn't contain *any* data - it's just a
> >> programatic representation of a SQL query. So, putting a queryset in
> >> the cache doesn't inherently mean that anything is being stored.
> >> However, if the queryset has been evaluated, there might be something
> >> in the queryset's internal result cache, and *that* value will be
> >> returned when the value is retrieved from the cache.
>
> >> > 2. Should I log a bug for this in the django tracker and upload my
> >> > test case patches?
>
> >> Certainly.
> > Hi Russell,
> > I've found the cause of the performance regression and have suggested
> > a patch on ticket 12057 (http://code.djangoproject.com/ticket/12057)
> > The patch runs without causing any other tests to fail in the django
> > tree.
> > Do you think this patch could have other side effects or is it OK?
>
> First off - the test case is awesome - that helps a lot.
>
> As for the fix: there's a much easier solution - on line 356 of
> django/db/models/base.py, replace the current implementation:
>
> return (self.__class__, (), data)
>
> with an implementation that essentially reverts to the underlying
> implementation of __reduce__:
>
> return super(Model, self).__reduce__()
>
> You can see the underlying problem if you inspect super().__reduce__,
> and compare it to the current return value.
>
> As a result, I suspect that an analogous problem will exist for any
> model with deferred fields. I'll need to confirm this before I commit
> the fix. If you can spare any time to look into this, I'd be much
> obliged.
Hi Russell,
I've spent some more time delving in to this (http://
code.djangoproject.com/ticket/12057)
and the situation seems to be that the patch we have made resolves the
problem with unpickling of model instances with callables as default.
However I've come across another possible bug.
Pickling of a model instance with deferred fields causes a model
instances __init__function to be called and therefore any fields with
a callable as default.
This looks like it's done by design (http://code.djangoproject.com/
browser/django/trunk/django/db/models/query.py?rev=11669#L241)
If this is by design can you let me know and I will modify the cache
write test case to pass and maybe we can commit the patch?
Thanks
Michael

>
> Yours,
> Russ Magee %-)
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---