Re: Adding signals to bulk update/create operations
@Anssi, Perhaps we could use your approach above, except using rowids instead of primary keys? Many SQL flavors support retrieving the rowids of the created/updated rows within the same query as a bulk create/update (e.g. OUTPUT or RETURNING). Greg On Thursday, April 6, 2017 at 12:53:55 AM UTC-5, Anssi Kääriäinen wrote: > > On Friday, March 31, 2017 at 10:50:54 AM UTC+3, Todor Velichkov wrote: >> >> @Anssi, thank you for your response. >> I really haven't think about threat safety, but maybe its because in our >> specific usage it's close to impossible to happen. >> >> What do you think about this: >> 1) Put everything into a transaction >> 2) Before update, count the queryset. >> 3) Fire the pre_update with the queryset, and the count. >> 4) Do the actual update -> get the updated rows >> 5) Assert the updated rows is equal to the queryset.count(). >> 6) Revert the transaction if there is a difference. >> >> No pk fetching, this is left to the be implemented by the listener if he >> needs it. >> > > I'm afraid this won't work well enough. It does work for many use cases, > but in cases where concurrency is an issue, you'd get some loud failures, > and worse, you could get cases where things work silently wrong (one item > removed, one added concurrently). These cases would be near-impossible to > debug. > > The reason why we don't have update signals is that it's hard to find a > solution that allows users to run actions when an instance is updated, > works correctly in concurrent cases and with large amount of objects and > doesn't turn bulk update in to a loop of save() calls. > > Maybe the best approach is to look for a solution where users can > customise update behaviour for remote models if they so wish. Then you > could write your own update signals solution, one that doesn't need to care > about all of the above constraints. > > - Anssi > -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/08303c25-96a8-4aa0-b325-ef5dc91ed8d1%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Adding signals to bulk update/create operations
On Friday, March 31, 2017 at 10:50:54 AM UTC+3, Todor Velichkov wrote: > > @Anssi, thank you for your response. > I really haven't think about threat safety, but maybe its because in our > specific usage it's close to impossible to happen. > > What do you think about this: > 1) Put everything into a transaction > 2) Before update, count the queryset. > 3) Fire the pre_update with the queryset, and the count. > 4) Do the actual update -> get the updated rows > 5) Assert the updated rows is equal to the queryset.count(). > 6) Revert the transaction if there is a difference. > > No pk fetching, this is left to the be implemented by the listener if he > needs it. > I'm afraid this won't work well enough. It does work for many use cases, but in cases where concurrency is an issue, you'd get some loud failures, and worse, you could get cases where things work silently wrong (one item removed, one added concurrently). These cases would be near-impossible to debug. The reason why we don't have update signals is that it's hard to find a solution that allows users to run actions when an instance is updated, works correctly in concurrent cases and with large amount of objects and doesn't turn bulk update in to a loop of save() calls. Maybe the best approach is to look for a solution where users can customise update behaviour for remote models if they so wish. Then you could write your own update signals solution, one that doesn't need to care about all of the above constraints. - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/fd81d210-a2d6-4e7a-9eb0-d3f8f078a032%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Adding signals to bulk update/create operations
@Tim, sorry about that, I did a search before I posted, but it looks like it slipped away somehow. @Anssi, thank you for your response. I really haven't think about threat safety, but maybe its because in our specific usage it's close to impossible to happen. What do you think about this: 1) Put everything into a transaction 2) Before update, count the queryset. 3) Fire the pre_update with the queryset, and the count. 4) Do the actual update -> get the updated rows 5) Assert the updated rows is equal to the queryset.count(). 6) Revert the transaction if there is a difference. No pk fetching, this is left to the be implemented by the listener if he needs it. On Friday, March 31, 2017 at 8:01:45 AM UTC+3, Anssi Kääriäinen wrote: > > The problem with passing the queryset is that it's possible that some > object is added to or removed from the queryset between the pre_update and > actual update execution. To avoid this the execution should go somewhere > along the lines of: >1) if there is pre_update or post_update do stages 2-5, if not, update > as with current code >2) fetch primary keys for models to be updated to a set, with > .select_for_update() applied >3) fire pre_update, give the primary key set as argument >4) do the update against a queryset with .filter(pk__in=pk_set) >5) fire post_update with primary key set as argument > > This way the pre and post update signals will execute against a fixed set > of instances. The bad part is that this can significantly slow down the > .update() call, but to get actually safe signals, I don't see a way around > that. > > - Anssi > > On Friday, March 31, 2017 at 3:51:34 AM UTC+3, Tim Graham wrote: >> >> There's an accepted ticket about adding pre_update and post_update >> signals: https://code.djangoproject.com/ticket/21461. From a quick >> glance, I think this is what you're proposing. >> >> On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote: >>> >>> Consider the following piece of code: >>> >>> @receiver(pre_save, sender=MyModel) >>> def my_handler(sender, **kwargs): >>> instance = kwargs['instance'] >>> if instance.verified: >>> do_something(instance) >>> else: >>> do_else(instance) >>> >>> >>> >>> Its good, because it keeps `MyModel` decoupled from `do_something` >>> >>> But there is one flaw. If we do: >>> >>> MyModel.objects.filter(verified=False).update(verified=True) >>> >>> we are screwed, `do_something` is not executed, and our models get >>> out-of-sync. >>> >>> If we try to get smart and manually fire the pre_save signal for each >>> instance, we are gonna have a hard time. >>> Its gonna be slow. >>> And its gonna be memory inefficient. >>> >>> We already experienced it in our app. >>> >>> So our new approach is like this: >>> >>> pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs" >>> ]) >>> post_bulk_update = Signal(providing_args=["update_kwargs",]) >>> >>> pre_bulk_create = Signal(providing_args=["objs", "batch_size"]) >>> post_bulk_create = Signal(providing_args=["objs", "batch_size"]) >>> >>> >>> class MyModelQuerySet(models.QuerySet): >>> def update(self, **kwargs): >>> pre_bulk_update.send(sender=self.model, queryset=self, >>> update_kwargs=kwargs) >>> res = super(MyModelQuerySet, self).update(**kwargs) >>> # The queryset will be altered after the update call >>> # so no reason to send it. >>> post_bulk_update.send(sender=self.model, update_kwargs= >>> kwargs) >>> return res >>> >>> def bulk_create(self, objs, batch_size=None): >>> pre_bulk_create.send(sender=self.model, objs=objs, >>> batch_size=batch_size) >>> res = super(MyModelQuerySet, self).bulk_create(objs, >>> batch_size) >>> post_bulk_create.send(sender=self.model, objs=objs, >>> batch_size=batch_size) >>> return res >>> >>> >>> class MyModel(models.Model): >>> #... >>> objects = MyModelQuerySet.as_manager() >>> >>> >>> >>> This gives us a nice interface to handle all kind of changes regarding >>> `MyModel` >>> Our example usage looks like this: >>> >>> >>> @receiver(pre_save, sender=MyModel) >>> def my_handler(sender, **kwargs): >>> instance = kwargs['instance'] >>> if instance.verified: >>> do_something(instance) >>> else: >>> do_else(instance) >>> >>> >>> @receiver(pre_bulk_update, sender=MyModel) >>> def my_bulk_update_handler(sender, **kwargs): >>> update_kwargs = kwargs['update_kwargs'] >>> if 'verified' not in update_kwargs: >>> # no change im interested in >>> # no need to take any action >>> return >>> >>> queryset = kwargs['queryset'] >>> pks_to_be_updated = queryset.values_list('pk', flat=True) >>> if update_kwargs['verified']: >>> do_something_bulk_update_implementation(pks_to_be_updated) >>> else: >>>
Re: Adding signals to bulk update/create operations
The problem with passing the queryset is that it's possible that some object is added to or removed from the queryset between the pre_update and actual update execution. To avoid this the execution should go somewhere along the lines of: 1) if there is pre_update or post_update do stages 2-5, if not, update as with current code 2) fetch primary keys for models to be updated to a set, with .select_for_update() applied 3) fire pre_update, give the primary key set as argument 4) do the update against a queryset with .filter(pk__in=pk_set) 5) fire post_update with primary key set as argument This way the pre and post update signals will execute against a fixed set of instances. The bad part is that this can significantly slow down the .update() call, but to get actually safe signals, I don't see a way around that. - Anssi On Friday, March 31, 2017 at 3:51:34 AM UTC+3, Tim Graham wrote: > > There's an accepted ticket about adding pre_update and post_update > signals: https://code.djangoproject.com/ticket/21461. From a quick > glance, I think this is what you're proposing. > > On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote: >> >> Consider the following piece of code: >> >> @receiver(pre_save, sender=MyModel) >> def my_handler(sender, **kwargs): >> instance = kwargs['instance'] >> if instance.verified: >> do_something(instance) >> else: >> do_else(instance) >> >> >> >> Its good, because it keeps `MyModel` decoupled from `do_something` >> >> But there is one flaw. If we do: >> >> MyModel.objects.filter(verified=False).update(verified=True) >> >> we are screwed, `do_something` is not executed, and our models get >> out-of-sync. >> >> If we try to get smart and manually fire the pre_save signal for each >> instance, we are gonna have a hard time. >> Its gonna be slow. >> And its gonna be memory inefficient. >> >> We already experienced it in our app. >> >> So our new approach is like this: >> >> pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs" >> ]) >> post_bulk_update = Signal(providing_args=["update_kwargs",]) >> >> pre_bulk_create = Signal(providing_args=["objs", "batch_size"]) >> post_bulk_create = Signal(providing_args=["objs", "batch_size"]) >> >> >> class MyModelQuerySet(models.QuerySet): >> def update(self, **kwargs): >> pre_bulk_update.send(sender=self.model, queryset=self, >> update_kwargs=kwargs) >> res = super(MyModelQuerySet, self).update(**kwargs) >> # The queryset will be altered after the update call >> # so no reason to send it. >> post_bulk_update.send(sender=self.model, update_kwargs=kwargs >> ) >> return res >> >> def bulk_create(self, objs, batch_size=None): >> pre_bulk_create.send(sender=self.model, objs=objs, batch_size >> =batch_size) >> res = super(MyModelQuerySet, self).bulk_create(objs, >> batch_size) >> post_bulk_create.send(sender=self.model, objs=objs, >> batch_size=batch_size) >> return res >> >> >> class MyModel(models.Model): >> #... >> objects = MyModelQuerySet.as_manager() >> >> >> >> This gives us a nice interface to handle all kind of changes regarding >> `MyModel` >> Our example usage looks like this: >> >> >> @receiver(pre_save, sender=MyModel) >> def my_handler(sender, **kwargs): >> instance = kwargs['instance'] >> if instance.verified: >> do_something(instance) >> else: >> do_else(instance) >> >> >> @receiver(pre_bulk_update, sender=MyModel) >> def my_bulk_update_handler(sender, **kwargs): >> update_kwargs = kwargs['update_kwargs'] >> if 'verified' not in update_kwargs: >> # no change im interested in >> # no need to take any action >> return >> >> queryset = kwargs['queryset'] >> pks_to_be_updated = queryset.values_list('pk', flat=True) >> if update_kwargs['verified']: >> do_something_bulk_update_implementation(pks_to_be_updated) >> else: >> bulk_update_do_else_implementation(pks_to_be_updated) >> >> >> @receiver(pre_bulk_create, sender=MyModel) >> def my_bulk_create_handler(sender, **kwargs): >> objs = kwargs['objs'] >> group_1 = [] >> group_2 = [] >> for obj in objs: >> if obj.verified: >> group_1.append(obj) >> else: >> group_2.append(obj) >> >> if group_1: >> do_something_bulk_create_implementation(group_1) >> if group_2: >> bulk_create_do_else_implementation(group_2) >> >> >> >> I think this turns out to be a very clean approach. >> It help us use the most optimal strategy to handle the change. >> So I'm sharing this with the community to check your feedback. >> I believe if this gets into the Django Internals, it can be a very >> powerful tool. >> It will lose power as a 3rd party app. >> > -- You received this message
Re: Adding signals to bulk update/create operations
There's an accepted ticket about adding pre_update and post_update signals: https://code.djangoproject.com/ticket/21461. From a quick glance, I think this is what you're proposing. On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote: > > Consider the following piece of code: > > @receiver(pre_save, sender=MyModel) > def my_handler(sender, **kwargs): > instance = kwargs['instance'] > if instance.verified: > do_something(instance) > else: > do_else(instance) > > > > Its good, because it keeps `MyModel` decoupled from `do_something` > > But there is one flaw. If we do: > > MyModel.objects.filter(verified=False).update(verified=True) > > we are screwed, `do_something` is not executed, and our models get > out-of-sync. > > If we try to get smart and manually fire the pre_save signal for each > instance, we are gonna have a hard time. > Its gonna be slow. > And its gonna be memory inefficient. > > We already experienced it in our app. > > So our new approach is like this: > > pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"]) > post_bulk_update = Signal(providing_args=["update_kwargs",]) > > pre_bulk_create = Signal(providing_args=["objs", "batch_size"]) > post_bulk_create = Signal(providing_args=["objs", "batch_size"]) > > > class MyModelQuerySet(models.QuerySet): > def update(self, **kwargs): > pre_bulk_update.send(sender=self.model, queryset=self, > update_kwargs=kwargs) > res = super(MyModelQuerySet, self).update(**kwargs) > # The queryset will be altered after the update call > # so no reason to send it. > post_bulk_update.send(sender=self.model, update_kwargs=kwargs) > return res > > def bulk_create(self, objs, batch_size=None): > pre_bulk_create.send(sender=self.model, objs=objs, batch_size= > batch_size) > res = super(MyModelQuerySet, self).bulk_create(objs, > batch_size) > post_bulk_create.send(sender=self.model, objs=objs, batch_size > =batch_size) > return res > > > class MyModel(models.Model): > #... > objects = MyModelQuerySet.as_manager() > > > > This gives us a nice interface to handle all kind of changes regarding > `MyModel` > Our example usage looks like this: > > > @receiver(pre_save, sender=MyModel) > def my_handler(sender, **kwargs): > instance = kwargs['instance'] > if instance.verified: > do_something(instance) > else: > do_else(instance) > > > @receiver(pre_bulk_update, sender=MyModel) > def my_bulk_update_handler(sender, **kwargs): > update_kwargs = kwargs['update_kwargs'] > if 'verified' not in update_kwargs: > # no change im interested in > # no need to take any action > return > > queryset = kwargs['queryset'] > pks_to_be_updated = queryset.values_list('pk', flat=True) > if update_kwargs['verified']: > do_something_bulk_update_implementation(pks_to_be_updated) > else: > bulk_update_do_else_implementation(pks_to_be_updated) > > > @receiver(pre_bulk_create, sender=MyModel) > def my_bulk_create_handler(sender, **kwargs): > objs = kwargs['objs'] > group_1 = [] > group_2 = [] > for obj in objs: > if obj.verified: > group_1.append(obj) > else: > group_2.append(obj) > > if group_1: > do_something_bulk_create_implementation(group_1) > if group_2: > bulk_create_do_else_implementation(group_2) > > > > I think this turns out to be a very clean approach. > It help us use the most optimal strategy to handle the change. > So I'm sharing this with the community to check your feedback. > I believe if this gets into the Django Internals, it can be a very > powerful tool. > It will lose power as a 3rd party app. > -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/81796184-84b8-4954-900a-2bf9ce965fe8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Adding signals to bulk update/create operations
Consider the following piece of code: @receiver(pre_save, sender=MyModel) def my_handler(sender, **kwargs): instance = kwargs['instance'] if instance.verified: do_something(instance) else: do_else(instance) Its good, because it keeps `MyModel` decoupled from `do_something` But there is one flaw. If we do: MyModel.objects.filter(verified=False).update(verified=True) we are screwed, `do_something` is not executed, and our models get out-of-sync. If we try to get smart and manually fire the pre_save signal for each instance, we are gonna have a hard time. Its gonna be slow. And its gonna be memory inefficient. We already experienced it in our app. So our new approach is like this: pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"]) post_bulk_update = Signal(providing_args=["update_kwargs",]) pre_bulk_create = Signal(providing_args=["objs", "batch_size"]) post_bulk_create = Signal(providing_args=["objs", "batch_size"]) class MyModelQuerySet(models.QuerySet): def update(self, **kwargs): pre_bulk_update.send(sender=self.model, queryset=self, update_kwargs=kwargs) res = super(MyModelQuerySet, self).update(**kwargs) # The queryset will be altered after the update call # so no reason to send it. post_bulk_update.send(sender=self.model, update_kwargs=kwargs) return res def bulk_create(self, objs, batch_size=None): pre_bulk_create.send(sender=self.model, objs=objs, batch_size= batch_size) res = super(MyModelQuerySet, self).bulk_create(objs, batch_size) post_bulk_create.send(sender=self.model, objs=objs, batch_size= batch_size) return res class MyModel(models.Model): #... objects = MyModelQuerySet.as_manager() This gives us a nice interface to handle all kind of changes regarding `MyModel` Our example usage looks like this: @receiver(pre_save, sender=MyModel) def my_handler(sender, **kwargs): instance = kwargs['instance'] if instance.verified: do_something(instance) else: do_else(instance) @receiver(pre_bulk_update, sender=MyModel) def my_bulk_update_handler(sender, **kwargs): update_kwargs = kwargs['update_kwargs'] if 'verified' not in update_kwargs: # no change im interested in # no need to take any action return queryset = kwargs['queryset'] pks_to_be_updated = queryset.values_list('pk', flat=True) if update_kwargs['verified']: do_something_bulk_update_implementation(pks_to_be_updated) else: bulk_update_do_else_implementation(pks_to_be_updated) @receiver(pre_bulk_create, sender=MyModel) def my_bulk_create_handler(sender, **kwargs): objs = kwargs['objs'] group_1 = [] group_2 = [] for obj in objs: if obj.verified: group_1.append(obj) else: group_2.append(obj) if group_1: do_something_bulk_create_implementation(group_1) if group_2: bulk_create_do_else_implementation(group_2) I think this turns out to be a very clean approach. It help us use the most optimal strategy to handle the change. So I'm sharing this with the community to check your feedback. I believe if this gets into the Django Internals, it can be a very powerful tool. It will lose power as a 3rd party app. -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/94257dee-e88c-41ae-ab39-c8535d8e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.