@Tim, sorry about that, I did a search before I posted, but it looks like 
it slipped away somehow.

@Anssi, thank you for your response.
I really haven't think about threat safety, but maybe its because in our 
specific usage it's close to impossible to happen.

What do you think about this:
1) Put everything into a transaction
2) Before update, count the queryset.
3) Fire the pre_update with the queryset, and the count.
4) Do the actual update -> get the updated rows
5) Assert the updated rows is equal to the queryset.count().
6) Revert the transaction if there is a difference.

No pk fetching, this is left to the be implemented by the listener if he 
needs it.

On Friday, March 31, 2017 at 8:01:45 AM UTC+3, Anssi Kääriäinen wrote:
>
> The problem with passing the queryset is that it's possible that some 
> object is added to or removed from the queryset between the pre_update and 
> actual update execution. To avoid this the execution should go somewhere 
> along the lines of:
>    1) if there is pre_update or post_update do stages 2-5, if not, update 
> as with current code
>    2) fetch primary keys for models to be updated to a set, with 
> .select_for_update() applied
>    3) fire pre_update, give the primary key set as argument
>    4) do the update against a queryset with .filter(pk__in=pk_set)
>    5) fire post_update with primary key set as argument
>
> This way the pre and post update signals will execute against a fixed set 
> of instances. The bad part is that this can significantly slow down the 
> .update() call, but to get actually safe signals, I don't see a way around 
> that.
>
>  - Anssi
>
> On Friday, March 31, 2017 at 3:51:34 AM UTC+3, Tim Graham wrote:
>>
>> There's an accepted ticket about adding pre_update and post_update 
>> signals: https://code.djangoproject.com/ticket/21461. From a quick 
>> glance, I think this is what you're proposing.
>>
>> On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote:
>>>
>>> Consider the following piece of code:
>>>
>>> @receiver(pre_save, sender=MyModel)
>>> def my_handler(sender, **kwargs):
>>>     instance = kwargs['instance']
>>>     if instance.verified:
>>>         do_something(instance)
>>>     else:
>>>         do_else(instance)
>>>
>>>
>>>
>>> Its good, because it keeps `MyModel` decoupled from `do_something`
>>>
>>> But there is one flaw. If we do:
>>>
>>> MyModel.objects.filter(verified=False).update(verified=True)
>>>
>>> we are screwed, `do_something` is not executed, and our models get 
>>> out-of-sync.
>>>
>>> If we try to get smart and manually fire the pre_save signal for each 
>>> instance, we are gonna have a hard time.
>>> Its gonna be slow.
>>> And its gonna be memory inefficient.
>>>
>>> We already experienced it in our app.
>>>
>>> So our new approach is like this:
>>>
>>>     pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"
>>> ])
>>>     post_bulk_update = Signal(providing_args=["update_kwargs",])
>>>
>>>     pre_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>>     post_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>>
>>>
>>>     class MyModelQuerySet(models.QuerySet):
>>>         def update(self, **kwargs):
>>>             pre_bulk_update.send(sender=self.model, queryset=self, 
>>> update_kwargs=kwargs)
>>>             res = super(MyModelQuerySet, self).update(**kwargs)
>>>             # The queryset will be altered after the update call
>>>             # so no reason to send it.
>>>             post_bulk_update.send(sender=self.model, update_kwargs=
>>> kwargs)
>>>             return res
>>>
>>>         def bulk_create(self, objs, batch_size=None):
>>>             pre_bulk_create.send(sender=self.model, objs=objs, 
>>> batch_size=batch_size)
>>>             res = super(MyModelQuerySet, self).bulk_create(objs, 
>>> batch_size)
>>>             post_bulk_create.send(sender=self.model, objs=objs, 
>>> batch_size=batch_size)
>>>             return res
>>>
>>>
>>>     class MyModel(models.Model):
>>>         #...
>>>         objects = MyModelQuerySet.as_manager()
>>>
>>>
>>>
>>> This gives us a nice interface to handle all kind of changes regarding 
>>> `MyModel`
>>> Our example usage looks like this:
>>>
>>>
>>> @receiver(pre_save, sender=MyModel)
>>> def my_handler(sender, **kwargs):
>>>     instance = kwargs['instance']
>>>     if instance.verified:
>>>         do_something(instance)
>>>     else:
>>>         do_else(instance)
>>>
>>>
>>> @receiver(pre_bulk_update, sender=MyModel)
>>> def my_bulk_update_handler(sender, **kwargs):
>>>     update_kwargs = kwargs['update_kwargs']
>>>     if 'verified' not in update_kwargs:
>>>         # no change im interested in
>>>         # no need to take any action
>>>         return
>>>
>>>     queryset = kwargs['queryset']
>>>     pks_to_be_updated = queryset.values_list('pk', flat=True)
>>>     if update_kwargs['verified']:
>>>         do_something_bulk_update_implementation(pks_to_be_updated)
>>>     else:
>>>         bulk_update_do_else_implementation(pks_to_be_updated)
>>>
>>>
>>> @receiver(pre_bulk_create, sender=MyModel)
>>> def my_bulk_create_handler(sender, **kwargs):
>>>     objs = kwargs['objs']
>>>     group_1 = []
>>>     group_2 = []
>>>     for obj in objs:
>>>         if obj.verified:
>>>             group_1.append(obj)
>>>         else:
>>>             group_2.append(obj)
>>>
>>>     if group_1:
>>>         do_something_bulk_create_implementation(group_1)
>>>     if group_2:
>>>         bulk_create_do_else_implementation(group_2)
>>>
>>>
>>>
>>> I think this turns out to be a very clean approach.
>>> It help us use the most optimal strategy to handle the change.
>>> So I'm sharing this with the community to check your feedback.
>>> I believe if this gets into the Django Internals, it can be a very 
>>> powerful tool.
>>> It will lose power as a 3rd party app.
>>>
>>
On Friday, March 31, 2017 at 8:01:45 AM UTC+3, Anssi Kääriäinen wrote:
>
> The problem with passing the queryset is that it's possible that some 
> object is added to or removed from the queryset between the pre_update and 
> actual update execution. To avoid this the execution should go somewhere 
> along the lines of:
>    1) if there is pre_update or post_update do stages 2-5, if not, update 
> as with current code
>    2) fetch primary keys for models to be updated to a set, with 
> .select_for_update() applied
>    3) fire pre_update, give the primary key set as argument
>    4) do the update against a queryset with .filter(pk__in=pk_set)
>    5) fire post_update with primary key set as argument
>
> This way the pre and post update signals will execute against a fixed set 
> of instances. The bad part is that this can significantly slow down the 
> .update() call, but to get actually safe signals, I don't see a way around 
> that.
>
>  - Anssi
>
> On Friday, March 31, 2017 at 3:51:34 AM UTC+3, Tim Graham wrote:
>>
>> There's an accepted ticket about adding pre_update and post_update 
>> signals: https://code.djangoproject.com/ticket/21461. From a quick 
>> glance, I think this is what you're proposing.
>>
>> On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote:
>>>
>>> Consider the following piece of code:
>>>
>>> @receiver(pre_save, sender=MyModel)
>>> def my_handler(sender, **kwargs):
>>>     instance = kwargs['instance']
>>>     if instance.verified:
>>>         do_something(instance)
>>>     else:
>>>         do_else(instance)
>>>
>>>
>>>
>>> Its good, because it keeps `MyModel` decoupled from `do_something`
>>>
>>> But there is one flaw. If we do:
>>>
>>> MyModel.objects.filter(verified=False).update(verified=True)
>>>
>>> we are screwed, `do_something` is not executed, and our models get 
>>> out-of-sync.
>>>
>>> If we try to get smart and manually fire the pre_save signal for each 
>>> instance, we are gonna have a hard time.
>>> Its gonna be slow.
>>> And its gonna be memory inefficient.
>>>
>>> We already experienced it in our app.
>>>
>>> So our new approach is like this:
>>>
>>>     pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"
>>> ])
>>>     post_bulk_update = Signal(providing_args=["update_kwargs",])
>>>
>>>     pre_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>>     post_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>>
>>>
>>>     class MyModelQuerySet(models.QuerySet):
>>>         def update(self, **kwargs):
>>>             pre_bulk_update.send(sender=self.model, queryset=self, 
>>> update_kwargs=kwargs)
>>>             res = super(MyModelQuerySet, self).update(**kwargs)
>>>             # The queryset will be altered after the update call
>>>             # so no reason to send it.
>>>             post_bulk_update.send(sender=self.model, update_kwargs=
>>> kwargs)
>>>             return res
>>>
>>>         def bulk_create(self, objs, batch_size=None):
>>>             pre_bulk_create.send(sender=self.model, objs=objs, 
>>> batch_size=batch_size)
>>>             res = super(MyModelQuerySet, self).bulk_create(objs, 
>>> batch_size)
>>>             post_bulk_create.send(sender=self.model, objs=objs, 
>>> batch_size=batch_size)
>>>             return res
>>>
>>>
>>>     class MyModel(models.Model):
>>>         #...
>>>         objects = MyModelQuerySet.as_manager()
>>>
>>>
>>>
>>> This gives us a nice interface to handle all kind of changes regarding 
>>> `MyModel`
>>> Our example usage looks like this:
>>>
>>>
>>> @receiver(pre_save, sender=MyModel)
>>> def my_handler(sender, **kwargs):
>>>     instance = kwargs['instance']
>>>     if instance.verified:
>>>         do_something(instance)
>>>     else:
>>>         do_else(instance)
>>>
>>>
>>> @receiver(pre_bulk_update, sender=MyModel)
>>> def my_bulk_update_handler(sender, **kwargs):
>>>     update_kwargs = kwargs['update_kwargs']
>>>     if 'verified' not in update_kwargs:
>>>         # no change im interested in
>>>         # no need to take any action
>>>         return
>>>
>>>     queryset = kwargs['queryset']
>>>     pks_to_be_updated = queryset.values_list('pk', flat=True)
>>>     if update_kwargs['verified']:
>>>         do_something_bulk_update_implementation(pks_to_be_updated)
>>>     else:
>>>         bulk_update_do_else_implementation(pks_to_be_updated)
>>>
>>>
>>> @receiver(pre_bulk_create, sender=MyModel)
>>> def my_bulk_create_handler(sender, **kwargs):
>>>     objs = kwargs['objs']
>>>     group_1 = []
>>>     group_2 = []
>>>     for obj in objs:
>>>         if obj.verified:
>>>             group_1.append(obj)
>>>         else:
>>>             group_2.append(obj)
>>>
>>>     if group_1:
>>>         do_something_bulk_create_implementation(group_1)
>>>     if group_2:
>>>         bulk_create_do_else_implementation(group_2)
>>>
>>>
>>>
>>> I think this turns out to be a very clean approach.
>>> It help us use the most optimal strategy to handle the change.
>>> So I'm sharing this with the community to check your feedback.
>>> I believe if this gets into the Django Internals, it can be a very 
>>> powerful tool.
>>> It will lose power as a 3rd party app.
>>>
>>
On Friday, March 31, 2017 at 8:01:45 AM UTC+3, Anssi Kääriäinen wrote:
>
> The problem with passing the queryset is that it's possible that some 
> object is added to or removed from the queryset between the pre_update and 
> actual update execution. To avoid this the execution should go somewhere 
> along the lines of:
>    1) if there is pre_update or post_update do stages 2-5, if not, update 
> as with current code
>    2) fetch primary keys for models to be updated to a set, with 
> .select_for_update() applied
>    3) fire pre_update, give the primary key set as argument
>    4) do the update against a queryset with .filter(pk__in=pk_set)
>    5) fire post_update with primary key set as argument
>
> This way the pre and post update signals will execute against a fixed set 
> of instances. The bad part is that this can significantly slow down the 
> .update() call, but to get actually safe signals, I don't see a way around 
> that.
>
>  - Anssi
>
> On Friday, March 31, 2017 at 3:51:34 AM UTC+3, Tim Graham wrote:
>>
>> There's an accepted ticket about adding pre_update and post_update 
>> signals: https://code.djangoproject.com/ticket/21461. From a quick 
>> glance, I think this is what you're proposing.
>>
>> On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote:
>>>
>>> Consider the following piece of code:
>>>
>>> @receiver(pre_save, sender=MyModel)
>>> def my_handler(sender, **kwargs):
>>>     instance = kwargs['instance']
>>>     if instance.verified:
>>>         do_something(instance)
>>>     else:
>>>         do_else(instance)
>>>
>>>
>>>
>>> Its good, because it keeps `MyModel` decoupled from `do_something`
>>>
>>> But there is one flaw. If we do:
>>>
>>> MyModel.objects.filter(verified=False).update(verified=True)
>>>
>>> we are screwed, `do_something` is not executed, and our models get 
>>> out-of-sync.
>>>
>>> If we try to get smart and manually fire the pre_save signal for each 
>>> instance, we are gonna have a hard time.
>>> Its gonna be slow.
>>> And its gonna be memory inefficient.
>>>
>>> We already experienced it in our app.
>>>
>>> So our new approach is like this:
>>>
>>>     pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"
>>> ])
>>>     post_bulk_update = Signal(providing_args=["update_kwargs",])
>>>
>>>     pre_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>>     post_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>>
>>>
>>>     class MyModelQuerySet(models.QuerySet):
>>>         def update(self, **kwargs):
>>>             pre_bulk_update.send(sender=self.model, queryset=self, 
>>> update_kwargs=kwargs)
>>>             res = super(MyModelQuerySet, self).update(**kwargs)
>>>             # The queryset will be altered after the update call
>>>             # so no reason to send it.
>>>             post_bulk_update.send(sender=self.model, update_kwargs=
>>> kwargs)
>>>             return res
>>>
>>>         def bulk_create(self, objs, batch_size=None):
>>>             pre_bulk_create.send(sender=self.model, objs=objs, 
>>> batch_size=batch_size)
>>>             res = super(MyModelQuerySet, self).bulk_create(objs, 
>>> batch_size)
>>>             post_bulk_create.send(sender=self.model, objs=objs, 
>>> batch_size=batch_size)
>>>             return res
>>>
>>>
>>>     class MyModel(models.Model):
>>>         #...
>>>         objects = MyModelQuerySet.as_manager()
>>>
>>>
>>>
>>> This gives us a nice interface to handle all kind of changes regarding 
>>> `MyModel`
>>> Our example usage looks like this:
>>>
>>>
>>> @receiver(pre_save, sender=MyModel)
>>> def my_handler(sender, **kwargs):
>>>     instance = kwargs['instance']
>>>     if instance.verified:
>>>         do_something(instance)
>>>     else:
>>>         do_else(instance)
>>>
>>>
>>> @receiver(pre_bulk_update, sender=MyModel)
>>> def my_bulk_update_handler(sender, **kwargs):
>>>     update_kwargs = kwargs['update_kwargs']
>>>     if 'verified' not in update_kwargs:
>>>         # no change im interested in
>>>         # no need to take any action
>>>         return
>>>
>>>     queryset = kwargs['queryset']
>>>     pks_to_be_updated = queryset.values_list('pk', flat=True)
>>>     if update_kwargs['verified']:
>>>         do_something_bulk_update_implementation(pks_to_be_updated)
>>>     else:
>>>         bulk_update_do_else_implementation(pks_to_be_updated)
>>>
>>>
>>> @receiver(pre_bulk_create, sender=MyModel)
>>> def my_bulk_create_handler(sender, **kwargs):
>>>     objs = kwargs['objs']
>>>     group_1 = []
>>>     group_2 = []
>>>     for obj in objs:
>>>         if obj.verified:
>>>             group_1.append(obj)
>>>         else:
>>>             group_2.append(obj)
>>>
>>>     if group_1:
>>>         do_something_bulk_create_implementation(group_1)
>>>     if group_2:
>>>         bulk_create_do_else_implementation(group_2)
>>>
>>>
>>>
>>> I think this turns out to be a very clean approach.
>>> It help us use the most optimal strategy to handle the change.
>>> So I'm sharing this with the community to check your feedback.
>>> I believe if this gets into the Django Internals, it can be a very 
>>> powerful tool.
>>> It will lose power as a 3rd party app.
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/cfb0f943-9aca-4322-9cc3-7c7e4d8bee40%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to