Re: pre_init/post_init vs. performance

2012-07-28 Thread Anssi Kääriäinen
On 27 heinä, 18:58, Jeremy Dunck  wrote:
> Can I get a review of the branch?  25% performance improvement of
> Model.__init__ in stock django seems worth landing. :)

I did some work to add new benchmarks to djangobench. In my tests the
speedup for the init hooks approach was nearly 60% when both pre/post
init signals were in use in the project. Currently this happens when
there are at least one ImageField and one GenericForeignKey used in
the project. The test is fetching 500 objects from DB by
the .iterator() method.

We should do something to this issue. If the init hooks approach is
the way to go needs to be decided. The caching of signal receivers
gives the same speedup.

Some more info in ticket #18627.

 - Anssi

PS: for another possible nice speedup see ticket 
https://code.djangoproject.com/ticket/16759
which is about removing the use of __deepcopy__ from queryset cloning.
There is potential for major speedups using that approach. Just one
example: model.save() +50% speedup.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: pre_init/post_init vs. performance

2012-07-27 Thread Jeremy Dunck
Can I get a review of the branch?  25% performance improvement of
Model.__init__ in stock django seems worth landing. :)


On Wed, Jul 18, 2012 at 3:48 AM, Jeremy Dunck  wrote:
> On Sun, Jul 15, 2012 at 11:07 AM, Jeremy Dunck  wrote:
>>
>> That is indeed what I meant, but I think Anssi's probably right that
>> this belongs in contribute_to_class instead of a new adhoc thing in
>> ModelBase.__new__.
>>
>> I'll take a swing at making this change.
>>
>> Opened related ticket: https://code.djangoproject.com/ticket/18627
>
>
> OK, I have a very rough cut w/ all tests passing here:
> https://github.com/jdunck/django/tree/18627_pre_init
>
> I poked in a flag, sys.JDUNCK_NEW, which can be turned on/off for old
> and new implementation - this is poked in at django.__init__
>
> To test performance, using tests/test_sqlite modified with:
>
> INSTALLED_APPS = [
> 'regressiontests.model_fields'
> ]
>
> and
> DATABASES['default']['name'] = ':memory:'
>
> Testing the pre_init hook:
>
> timeit.Timer("""
> for i in names:
> models.Person(name=i)
> """, """
> from django.core.management import call_command
> from regressiontests.model_fields import models
> call_command('syncdb')
> names = [str(i) for i in range(1000)]
> """).timeit(1000)
>
> Under the existing signal usage: 24.609718084335327
> Under the new approach avoiding signal usage: 18.074234008789062
>
> I'd expect post_init to be a smaller gain since the destination
> function has more overhead, but still a gain.
>
> Note that the pre_init and post_init signals are still fired -- it's
> just that django as-shipped won't use any listeners on them, so the
> Model.__init__ performance improves.
>
> In some cases, object construction currently is larger than query time
> - a simple 'select x from y limit 1000' would have similar overhead to
> this.  Also, note that this gain would be seen on all models in any
> project using GFK or ImageField but no other pre_init or post_init
> hooks - not just those models which have the fields.  With this
> switch, only the models with the hooking fields have the related
> overhead.
>
> Feedback welcome.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: pre_init/post_init vs. performance

2012-07-18 Thread Jeremy Dunck
On Sun, Jul 15, 2012 at 11:07 AM, Jeremy Dunck  wrote:
>
> That is indeed what I meant, but I think Anssi's probably right that
> this belongs in contribute_to_class instead of a new adhoc thing in
> ModelBase.__new__.
>
> I'll take a swing at making this change.
>
> Opened related ticket: https://code.djangoproject.com/ticket/18627


OK, I have a very rough cut w/ all tests passing here:
https://github.com/jdunck/django/tree/18627_pre_init

I poked in a flag, sys.JDUNCK_NEW, which can be turned on/off for old
and new implementation - this is poked in at django.__init__

To test performance, using tests/test_sqlite modified with:

INSTALLED_APPS = [
'regressiontests.model_fields'
]

and
DATABASES['default']['name'] = ':memory:'

Testing the pre_init hook:

timeit.Timer("""
for i in names:
models.Person(name=i)
""", """
from django.core.management import call_command
from regressiontests.model_fields import models
call_command('syncdb')
names = [str(i) for i in range(1000)]
""").timeit(1000)

Under the existing signal usage: 24.609718084335327
Under the new approach avoiding signal usage: 18.074234008789062

I'd expect post_init to be a smaller gain since the destination
function has more overhead, but still a gain.

Note that the pre_init and post_init signals are still fired -- it's
just that django as-shipped won't use any listeners on them, so the
Model.__init__ performance improves.

In some cases, object construction currently is larger than query time
- a simple 'select x from y limit 1000' would have similar overhead to
this.  Also, note that this gain would be seen on all models in any
project using GFK or ImageField but no other pre_init or post_init
hooks - not just those models which have the fields.  With this
switch, only the models with the hooking fields have the related
overhead.

Feedback welcome.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: pre_init/post_init vs. performance

2012-07-15 Thread Jeremy Dunck
On Sun, Jul 15, 2012 at 9:51 AM, Andy McCurdy  wrote:
>
> On Jul 14, 2012, at 6:37 PM, Russell Keith-Magee wrote:
>
>>
>> My only concern is:
>>
>>> We could store which fields have these hooks upon ModelBase.__new__
>>> construction and so skip most fields and overhead in __init__.
>>
>> I'm not sure if I'm misreading your intentions here, but just to be
>> sure -- I'd like to avoid explicitly naming certain field types as
>> "special" in ModelBase if at all possible. If we explicitly name
>> certain fields in ModelBase, then it means that third-parties with the
>> same requirement won't be able to exploit the same API entry point. In
>> the case of GenericForeignKey specifically, it also means that we
>> would be introducing a dependency (even if it was name-only) on
>> contrib into core -- which is something we've been trying to move away
>> from.
>
>
>
> I think Jeremy means doing something like this in ModelBase.__new__:
>
>
> meta.fields_needing_init = [field for field in fields if hasattr(field, 
> 'model_pre_init')]
>
>
> And in ModelBase.__init__:
>
>
> for field in self._meta.fields_needing_init:
> field.model_pre_init(*args, **kwargs)
>
>
> This way, in ModelBase.__init__, only the fields that actually need to be 
> initialized have this method called.
>

That is indeed what I meant, but I think Anssi's probably right that
this belongs in contribute_to_class instead of a new adhoc thing in
ModelBase.__new__.

I'll take a swing at making this change.

Opened related ticket: https://code.djangoproject.com/ticket/18627

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: pre_init/post_init vs. performance

2012-07-15 Thread Andy McCurdy

On Jul 14, 2012, at 6:37 PM, Russell Keith-Magee wrote:

> 
> My only concern is:
> 
>> We could store which fields have these hooks upon ModelBase.__new__
>> construction and so skip most fields and overhead in __init__.
> 
> I'm not sure if I'm misreading your intentions here, but just to be
> sure -- I'd like to avoid explicitly naming certain field types as
> "special" in ModelBase if at all possible. If we explicitly name
> certain fields in ModelBase, then it means that third-parties with the
> same requirement won't be able to exploit the same API entry point. In
> the case of GenericForeignKey specifically, it also means that we
> would be introducing a dependency (even if it was name-only) on
> contrib into core -- which is something we've been trying to move away
> from.



I think Jeremy means doing something like this in ModelBase.__new__:


meta.fields_needing_init = [field for field in fields if hasattr(field, 
'model_pre_init')]


And in ModelBase.__init__:


for field in self._meta.fields_needing_init:
field.model_pre_init(*args, **kwargs)


This way, in ModelBase.__init__, only the fields that actually need to be 
initialized have this method called.


-andy

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: pre_init/post_init vs. performance

2012-07-14 Thread Russell Keith-Magee
On Sat, Jul 14, 2012 at 10:57 AM, Jeremy Dunck  wrote:
> I was poking around in our (Votizen's) use of signals and thinking
> about making some tooling so that signal usage was a bit more
> transparent.
>
> In doing so, I noticed that GenericForeignKey hooks the model pre_init
> signal.  It does that because GFK needs a chance to munge kwargs from
> the GFK field name to the content_id and object_id fields.
>
> Signal dispatch with zero receivers is much much faster than with 1 or
> more receivers, and pre_init is fired quite a lot.  Pretty much every
> decent sized project I've worked on has used GFK, so I imagine this is
> a large sap on performance overall.
>
> Similarly, I just noticed that ImageField does a similar thing,
> setting dimension values in post_init.
>
> I think we could get a significant performance win if, instead of
> using pre_init/post_init here, we changed Model.__init__ to call a
> per-field hook that could munge the Model.__init__ kwargs.
>
> SomeField.model_pre_init(instance, args, kwargs)
>   or maybe for symmetry with contribute_to_class:
> SomeField.contribute_to_instance(instance, args, kwargs)

+1 to this idea.

My only concern is:

> We could store which fields have these hooks upon ModelBase.__new__
> construction and so skip most fields and overhead in __init__.

I'm not sure if I'm misreading your intentions here, but just to be
sure -- I'd like to avoid explicitly naming certain field types as
"special" in ModelBase if at all possible. If we explicitly name
certain fields in ModelBase, then it means that third-parties with the
same requirement won't be able to exploit the same API entry point. In
the case of GenericForeignKey specifically, it also means that we
would be introducing a dependency (even if it was name-only) on
contrib into core -- which is something we've been trying to move away
from.

Russ %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: pre_init/post_init vs. performance

2012-07-14 Thread Anssi Kääriäinen
On 14 heinä, 05:57, Jeremy Dunck  wrote:
> I was poking around in our (Votizen's) use of signals and thinking
> about making some tooling so that signal usage was a bit more
> transparent.
>
> In doing so, I noticed that GenericForeignKey hooks the model pre_init
> signal.  It does that because GFK needs a chance to munge kwargs from
> the GFK field name to the content_id and object_id fields.
>
> Signal dispatch with zero receivers is much much faster than with 1 or
> more receivers, and pre_init is fired quite a lot.  Pretty much every
> decent sized project I've worked on has used GFK, so I imagine this is
> a large sap on performance overall.
>
> Similarly, I just noticed that ImageField does a similar thing,
> setting dimension values in post_init.
>
> I think we could get a significant performance win if, instead of
> using pre_init/post_init here, we changed Model.__init__ to call a
> per-field hook that could munge the Model.__init__ kwargs.
>
> SomeField.model_pre_init(instance, args, kwargs)
>   or maybe for symmetry with contribute_to_class:
> SomeField.contribute_to_instance(instance, args, kwargs)
>
> We could store which fields have these hooks upon ModelBase.__new__
> construction and so skip most fields and overhead in __init__.
>
> I'm happy to make a pull request but checking here to see if there's
> any pushback.

A big +1 from me.

Maybe the fields could explicitly add methods to pre/post init hooks
in contribute_to_class instead of method autodiscovery?

The signal receiver caching approach is another alternative (https://
code.djangoproject.com/ticket/16679). The downside of that approach is
that it complicates the already complex signals framework, so for that
reason it seems better to get rid of the post/pre init signals.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.