Re: Adding signals to bulk update/create operations

2017-11-05 Thread gregshipssoftware
@Anssi, perhaps we could use your approach above, except using rowids 
instead of primary keys? Many SQL flavors support retrieving the rowids of 
the created/updated rows within the same query as a bulk create/update 
(e.g. SQL Server's OUTPUT or PostgreSQL's RETURNING).
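As a rough illustration (table and column names are hypothetical, and the 
exact syntax is vendor-specific), the PostgreSQL flavor would look like:

```sql
-- Update a filtered set and get the affected ids back in one round trip.
-- RETURNING is PostgreSQL syntax; SQL Server uses an OUTPUT clause instead.
UPDATE mymodel
   SET verified = TRUE
 WHERE verified = FALSE
RETURNING id;
```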

Greg

On Thursday, April 6, 2017 at 12:53:55 AM UTC-5, Anssi Kääriäinen wrote:
>
> On Friday, March 31, 2017 at 10:50:54 AM UTC+3, Todor Velichkov wrote:
>>
>> @Anssi, thank you for your response.
>> I really haven't thought about thread safety, but maybe that's because in 
>> our specific usage it's close to impossible to happen.
>>
>> What do you think about this:
>> 1) Put everything into a transaction.
>> 2) Before the update, count the queryset.
>> 3) Fire pre_update with the queryset and the count.
>> 4) Do the actual update -> get the number of updated rows.
>> 5) Assert that the number of updated rows equals the count from step 2.
>> 6) Roll back the transaction if there is a difference.
>>
>> No pk fetching; this is left to be implemented by the listener if needed.
>>
>
> I'm afraid this won't work well enough. It does work for many use cases, 
> but in cases where concurrency is an issue, you'd get some loud failures, 
> and worse, you could get cases where things go silently wrong (one item 
> removed, one added concurrently). These cases would be near-impossible to 
> debug.
>
> The reason why we don't have update signals is that it's hard to find a 
> solution that allows users to run actions when an instance is updated, 
> works correctly in concurrent cases and with large numbers of objects, and 
> doesn't turn bulk update into a loop of save() calls.
>
> Maybe the best approach is to look for a solution where users can 
> customise update behaviour for remote models if they so wish. Then you 
> could write your own update signals solution, one that doesn't need to care 
> about all of the above constraints.
>
>  - Anssi
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/08303c25-96a8-4aa0-b325-ef5dc91ed8d1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Adding signals to bulk update/create operations

2017-04-05 Thread Anssi Kääriäinen
On Friday, March 31, 2017 at 10:50:54 AM UTC+3, Todor Velichkov wrote:
>
> @Anssi, thank you for your response.
> I really haven't thought about thread safety, but maybe that's because in 
> our specific usage it's close to impossible to happen.
>
> What do you think about this:
> 1) Put everything into a transaction.
> 2) Before the update, count the queryset.
> 3) Fire pre_update with the queryset and the count.
> 4) Do the actual update -> get the number of updated rows.
> 5) Assert that the number of updated rows equals the count from step 2.
> 6) Roll back the transaction if there is a difference.
>
> No pk fetching; this is left to be implemented by the listener if needed.
>

I'm afraid this won't work well enough. It does work for many use cases, 
but in cases where concurrency is an issue, you'd get some loud failures, 
and worse, you could get cases where things go silently wrong (one item 
removed, one added concurrently). These cases would be near-impossible to 
debug.

The reason why we don't have update signals is that it's hard to find a 
solution that allows users to run actions when an instance is updated, 
works correctly in concurrent cases and with large numbers of objects, and 
doesn't turn bulk update into a loop of save() calls.

Maybe the best approach is to look for a solution where users can customise 
update behaviour for remote models if they so wish. Then you could write 
your own update signals solution, one that doesn't need to care about all 
of the above constraints.

 - Anssi



Re: Adding signals to bulk update/create operations

2017-03-31 Thread Todor Velichkov
@Tim, sorry about that. I did a search before I posted, but it looks like 
I missed it somehow.

@Anssi, thank you for your response.
I really haven't thought about thread safety, but maybe that's because in 
our specific usage it's close to impossible to happen.

What do you think about this:
1) Put everything into a transaction.
2) Before the update, count the queryset.
3) Fire pre_update with the queryset and the count.
4) Do the actual update -> get the number of updated rows.
5) Assert that the number of updated rows equals the count from step 2.
6) Roll back the transaction if there is a difference.

No pk fetching; this is left to be implemented by the listener if needed.
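For illustration, the steps above can be sketched at the SQL level with 
plain sqlite3 (not Django code; the table, column, and hook names are made 
up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, verified INTEGER)")
conn.executemany("INSERT INTO mymodel (verified) VALUES (?)", [(0,), (0,), (1,)])

# 1) everything runs inside one transaction (sqlite3 opens one implicitly)
# 2) before the update, count the rows the filter matches
(expected,) = conn.execute(
    "SELECT COUNT(*) FROM mymodel WHERE verified = 0"
).fetchone()

# 3) a pre_update signal would fire here with the queryset and the count

# 4) do the actual update and get the number of updated rows
updated = conn.execute(
    "UPDATE mymodel SET verified = 1 WHERE verified = 0"
).rowcount

# 5) compare against the count from step 2
# 6) roll back if a concurrent writer changed the matched set in between
if updated == expected:
    conn.commit()
else:
    conn.rollback()
```

In Django terms, steps 1 and 6 would be a `transaction.atomic()` block with 
a forced rollback, and step 4 is the return value of `QuerySet.update()`.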

On Friday, March 31, 2017 at 8:01:45 AM UTC+3, Anssi Kääriäinen wrote:
>
> The problem with passing the queryset is that it's possible that some 
> object is added to or removed from the queryset between the pre_update 
> signal and the actual update execution. To avoid this, the execution 
> should go somewhere along the lines of:
>    1) if there is a pre_update or post_update receiver, do stages 2-5; 
> if not, update as with the current code
>    2) fetch the primary keys of the models to be updated into a set, 
> with .select_for_update() applied
>    3) fire pre_update with the primary key set as argument
>    4) do the update against a queryset with .filter(pk__in=pk_set)
>    5) fire post_update with the primary key set as argument
>
> This way the pre and post update signals will execute against a fixed set 
> of instances. The bad part is that this can significantly slow down the 
> .update() call, but to get actually safe signals, I don't see a way around 
> that.
>
>  - Anssi
>
> On Friday, March 31, 2017 at 3:51:34 AM UTC+3, Tim Graham wrote:
>>
>> There's an accepted ticket about adding pre_update and post_update 
>> signals: https://code.djangoproject.com/ticket/21461. From a quick 
>> glance, I think this is what you're proposing.
>>
>> On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote:
>>>
>>> Consider the following piece of code:
>>>
>>> @receiver(pre_save, sender=MyModel)
>>> def my_handler(sender, **kwargs):
>>>     instance = kwargs['instance']
>>>     if instance.verified:
>>>         do_something(instance)
>>>     else:
>>>         do_else(instance)
>>>
>>>
>>>
>>> It's good because it keeps `MyModel` decoupled from `do_something`.
>>>
>>> But there is one flaw. If we do:
>>>
>>> MyModel.objects.filter(verified=False).update(verified=True)
>>>
>>> we are screwed: `do_something` is not executed, and our models get 
>>> out of sync.
>>>
>>> If we try to get smart and manually fire the pre_save signal for each 
>>> instance, we are going to have a hard time: it is going to be slow, and 
>>> it is going to be memory-inefficient.
>>>
>>> We already experienced it in our app.
>>>
>>> So our new approach is like this:
>>>
>>> pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"])
>>> post_bulk_update = Signal(providing_args=["update_kwargs"])
>>>
>>> pre_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>> post_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>>
>>>
>>> class MyModelQuerySet(models.QuerySet):
>>>     def update(self, **kwargs):
>>>         pre_bulk_update.send(sender=self.model, queryset=self,
>>>                              update_kwargs=kwargs)
>>>         res = super(MyModelQuerySet, self).update(**kwargs)
>>>         # The queryset will be altered after the update call,
>>>         # so there is no reason to send it.
>>>         post_bulk_update.send(sender=self.model, update_kwargs=kwargs)
>>>         return res
>>>
>>>     def bulk_create(self, objs, batch_size=None):
>>>         pre_bulk_create.send(sender=self.model, objs=objs,
>>>                              batch_size=batch_size)
>>>         res = super(MyModelQuerySet, self).bulk_create(objs, batch_size)
>>>         post_bulk_create.send(sender=self.model, objs=objs,
>>>                               batch_size=batch_size)
>>>         return res
>>>
>>>
>>> class MyModel(models.Model):
>>>     # ...
>>>     objects = MyModelQuerySet.as_manager()
>>>
>>>
>>>
>>> This gives us a nice interface to handle all kinds of changes regarding 
>>> `MyModel`. Our example usage looks like this:
>>>
>>>
>>> @receiver(pre_save, sender=MyModel)
>>> def my_handler(sender, **kwargs):
>>>     instance = kwargs['instance']
>>>     if instance.verified:
>>>         do_something(instance)
>>>     else:
>>>         do_else(instance)
>>>
>>>
>>> @receiver(pre_bulk_update, sender=MyModel)
>>> def my_bulk_update_handler(sender, **kwargs):
>>>     update_kwargs = kwargs['update_kwargs']
>>>     if 'verified' not in update_kwargs:
>>>         # No change I'm interested in,
>>>         # so no need to take any action.
>>>         return
>>>
>>>     queryset = kwargs['queryset']
>>>     pks_to_be_updated = queryset.values_list('pk', flat=True)
>>>     if update_kwargs['verified']:
>>>         do_something_bulk_update_implementation(pks_to_be_updated)
>>>     else:
>>>

Re: Adding signals to bulk update/create operations

2017-03-30 Thread Anssi Kääriäinen
The problem with passing the queryset is that it's possible that some 
object is added to or removed from the queryset between the pre_update 
signal and the actual update execution. To avoid this, the execution 
should go somewhere along the lines of:
   1) if there is a pre_update or post_update receiver, do stages 2-5; 
if not, update as with the current code
   2) fetch the primary keys of the models to be updated into a set, 
with .select_for_update() applied
   3) fire pre_update with the primary key set as argument
   4) do the update against a queryset with .filter(pk__in=pk_set)
   5) fire post_update with the primary key set as argument

This way the pre and post update signals will execute against a fixed set 
of instances. The bad part is that this can significantly slow down the 
.update() call, but to get actually safe signals, I don't see a way around 
that.
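A rough sketch of that flow, again at the SQL level with plain sqlite3 
(SQLite has no SELECT ... FOR UPDATE, so the locking from stage 2 is only 
indicated in a comment; all names here are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, verified INTEGER)")
conn.executemany("INSERT INTO mymodel (verified) VALUES (?)", [(0,), (0,), (1,)])

def pre_update(pk_set):
    # stand-in for the pre_update signal
    print("about to update:", sorted(pk_set))

def post_update(pk_set):
    # stand-in for the post_update signal
    print("updated:", sorted(pk_set))

# 2) pin the primary keys the filter currently matches
#    (in Django this SELECT would carry FOR UPDATE to lock the rows)
pk_set = {pk for (pk,) in conn.execute(
    "SELECT id FROM mymodel WHERE verified = 0"
)}

# 3) fire pre_update with the fixed primary key set
pre_update(pk_set)

# 4) update against the pinned set instead of the original filter
placeholders = ", ".join("?" * len(pk_set))
conn.execute(
    "UPDATE mymodel SET verified = 1 WHERE id IN (%s)" % placeholders,
    tuple(pk_set),
)

# 5) fire post_update with the same set
post_update(pk_set)
conn.commit()
```

The pinning SELECT and the IN-list filter are exactly the extra work that 
slows the .update() call down.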

 - Anssi

On Friday, March 31, 2017 at 3:51:34 AM UTC+3, Tim Graham wrote:
>
> There's an accepted ticket about adding pre_update and post_update 
> signals: https://code.djangoproject.com/ticket/21461. From a quick 
> glance, I think this is what you're proposing.
>
> On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote:
>>
>> Consider the following piece of code:
>>
>> @receiver(pre_save, sender=MyModel)
>> def my_handler(sender, **kwargs):
>>     instance = kwargs['instance']
>>     if instance.verified:
>>         do_something(instance)
>>     else:
>>         do_else(instance)
>>
>>
>>
>> It's good because it keeps `MyModel` decoupled from `do_something`.
>>
>> But there is one flaw. If we do:
>>
>> MyModel.objects.filter(verified=False).update(verified=True)
>>
>> we are screwed: `do_something` is not executed, and our models get 
>> out of sync.
>>
>> If we try to get smart and manually fire the pre_save signal for each 
>> instance, we are going to have a hard time: it is going to be slow, and 
>> it is going to be memory-inefficient.
>>
>> We already experienced it in our app.
>>
>> So our new approach is like this:
>>
>> pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"])
>> post_bulk_update = Signal(providing_args=["update_kwargs"])
>>
>> pre_bulk_create = Signal(providing_args=["objs", "batch_size"])
>> post_bulk_create = Signal(providing_args=["objs", "batch_size"])
>>
>>
>> class MyModelQuerySet(models.QuerySet):
>>     def update(self, **kwargs):
>>         pre_bulk_update.send(sender=self.model, queryset=self,
>>                              update_kwargs=kwargs)
>>         res = super(MyModelQuerySet, self).update(**kwargs)
>>         # The queryset will be altered after the update call,
>>         # so there is no reason to send it.
>>         post_bulk_update.send(sender=self.model, update_kwargs=kwargs)
>>         return res
>>
>>     def bulk_create(self, objs, batch_size=None):
>>         pre_bulk_create.send(sender=self.model, objs=objs,
>>                              batch_size=batch_size)
>>         res = super(MyModelQuerySet, self).bulk_create(objs, batch_size)
>>         post_bulk_create.send(sender=self.model, objs=objs,
>>                               batch_size=batch_size)
>>         return res
>>
>>
>> class MyModel(models.Model):
>>     # ...
>>     objects = MyModelQuerySet.as_manager()
>>
>>
>>
>> This gives us a nice interface to handle all kinds of changes regarding 
>> `MyModel`. Our example usage looks like this:
>>
>>
>> @receiver(pre_save, sender=MyModel)
>> def my_handler(sender, **kwargs):
>>     instance = kwargs['instance']
>>     if instance.verified:
>>         do_something(instance)
>>     else:
>>         do_else(instance)
>>
>>
>> @receiver(pre_bulk_update, sender=MyModel)
>> def my_bulk_update_handler(sender, **kwargs):
>>     update_kwargs = kwargs['update_kwargs']
>>     if 'verified' not in update_kwargs:
>>         # No change I'm interested in,
>>         # so no need to take any action.
>>         return
>>
>>     queryset = kwargs['queryset']
>>     pks_to_be_updated = queryset.values_list('pk', flat=True)
>>     if update_kwargs['verified']:
>>         do_something_bulk_update_implementation(pks_to_be_updated)
>>     else:
>>         bulk_update_do_else_implementation(pks_to_be_updated)
>>
>>
>> @receiver(pre_bulk_create, sender=MyModel)
>> def my_bulk_create_handler(sender, **kwargs):
>>     objs = kwargs['objs']
>>     group_1 = []
>>     group_2 = []
>>     for obj in objs:
>>         if obj.verified:
>>             group_1.append(obj)
>>         else:
>>             group_2.append(obj)
>>
>>     if group_1:
>>         do_something_bulk_create_implementation(group_1)
>>     if group_2:
>>         bulk_create_do_else_implementation(group_2)
>>
>>
>>
>> I think this turns out to be a very clean approach.
>> It helps us use the most suitable strategy to handle each change.
>> So I'm sharing this with the community to get your feedback.
>> I believe that if this gets into Django internals, it can be a very 
>> powerful tool; it would lose power as a third-party app.
>>
>


Re: Adding signals to bulk update/create operations

2017-03-30 Thread Tim Graham
There's an accepted ticket about adding pre_update and post_update signals: 
https://code.djangoproject.com/ticket/21461. From a quick glance, I think 
this is what you're proposing.

On Thursday, March 30, 2017 at 4:28:00 PM UTC-4, Todor Velichkov wrote:
>
> Consider the following piece of code:
>
> @receiver(pre_save, sender=MyModel)
> def my_handler(sender, **kwargs):
>     instance = kwargs['instance']
>     if instance.verified:
>         do_something(instance)
>     else:
>         do_else(instance)
>
>
>
> It's good because it keeps `MyModel` decoupled from `do_something`.
>
> But there is one flaw. If we do:
>
> MyModel.objects.filter(verified=False).update(verified=True)
>
> we are screwed: `do_something` is not executed, and our models get 
> out of sync.
>
> If we try to get smart and manually fire the pre_save signal for each 
> instance, we are going to have a hard time: it is going to be slow, and 
> it is going to be memory-inefficient.
>
> We already experienced it in our app.
>
> So our new approach is like this:
>
> pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"])
> post_bulk_update = Signal(providing_args=["update_kwargs"])
>
> pre_bulk_create = Signal(providing_args=["objs", "batch_size"])
> post_bulk_create = Signal(providing_args=["objs", "batch_size"])
>
>
> class MyModelQuerySet(models.QuerySet):
>     def update(self, **kwargs):
>         pre_bulk_update.send(sender=self.model, queryset=self,
>                              update_kwargs=kwargs)
>         res = super(MyModelQuerySet, self).update(**kwargs)
>         # The queryset will be altered after the update call,
>         # so there is no reason to send it.
>         post_bulk_update.send(sender=self.model, update_kwargs=kwargs)
>         return res
>
>     def bulk_create(self, objs, batch_size=None):
>         pre_bulk_create.send(sender=self.model, objs=objs,
>                              batch_size=batch_size)
>         res = super(MyModelQuerySet, self).bulk_create(objs, batch_size)
>         post_bulk_create.send(sender=self.model, objs=objs,
>                               batch_size=batch_size)
>         return res
>
>
> class MyModel(models.Model):
>     # ...
>     objects = MyModelQuerySet.as_manager()
>
>
>
> This gives us a nice interface to handle all kinds of changes regarding 
> `MyModel`. Our example usage looks like this:
>
>
> @receiver(pre_save, sender=MyModel)
> def my_handler(sender, **kwargs):
>     instance = kwargs['instance']
>     if instance.verified:
>         do_something(instance)
>     else:
>         do_else(instance)
>
>
> @receiver(pre_bulk_update, sender=MyModel)
> def my_bulk_update_handler(sender, **kwargs):
>     update_kwargs = kwargs['update_kwargs']
>     if 'verified' not in update_kwargs:
>         # No change I'm interested in,
>         # so no need to take any action.
>         return
>
>     queryset = kwargs['queryset']
>     pks_to_be_updated = queryset.values_list('pk', flat=True)
>     if update_kwargs['verified']:
>         do_something_bulk_update_implementation(pks_to_be_updated)
>     else:
>         bulk_update_do_else_implementation(pks_to_be_updated)
>
>
> @receiver(pre_bulk_create, sender=MyModel)
> def my_bulk_create_handler(sender, **kwargs):
>     objs = kwargs['objs']
>     group_1 = []
>     group_2 = []
>     for obj in objs:
>         if obj.verified:
>             group_1.append(obj)
>         else:
>             group_2.append(obj)
>
>     if group_1:
>         do_something_bulk_create_implementation(group_1)
>     if group_2:
>         bulk_create_do_else_implementation(group_2)
>
>
>
> I think this turns out to be a very clean approach.
> It helps us use the most suitable strategy to handle each change.
> So I'm sharing this with the community to get your feedback.
> I believe that if this gets into Django internals, it can be a very 
> powerful tool; it would lose power as a third-party app.
>



Adding signals to bulk update/create operations

2017-03-30 Thread Todor Velichkov
Consider the following piece of code:

@receiver(pre_save, sender=MyModel)
def my_handler(sender, **kwargs):
    instance = kwargs['instance']
    if instance.verified:
        do_something(instance)
    else:
        do_else(instance)



It's good because it keeps `MyModel` decoupled from `do_something`.

But there is one flaw. If we do:

MyModel.objects.filter(verified=False).update(verified=True)

we are screwed: `do_something` is not executed, and our models get 
out of sync.

If we try to get smart and manually fire the pre_save signal for each 
instance, we are going to have a hard time: it is going to be slow, and 
it is going to be memory-inefficient.

We already experienced it in our app.

So our new approach is like this:

pre_bulk_update = Signal(providing_args=["queryset", "update_kwargs"])
post_bulk_update = Signal(providing_args=["update_kwargs"])

pre_bulk_create = Signal(providing_args=["objs", "batch_size"])
post_bulk_create = Signal(providing_args=["objs", "batch_size"])


class MyModelQuerySet(models.QuerySet):
    def update(self, **kwargs):
        pre_bulk_update.send(sender=self.model, queryset=self,
                             update_kwargs=kwargs)
        res = super(MyModelQuerySet, self).update(**kwargs)
        # The queryset will be altered after the update call,
        # so there is no reason to send it.
        post_bulk_update.send(sender=self.model, update_kwargs=kwargs)
        return res

    def bulk_create(self, objs, batch_size=None):
        pre_bulk_create.send(sender=self.model, objs=objs,
                             batch_size=batch_size)
        res = super(MyModelQuerySet, self).bulk_create(objs, batch_size)
        post_bulk_create.send(sender=self.model, objs=objs,
                              batch_size=batch_size)
        return res


class MyModel(models.Model):
    # ...
    objects = MyModelQuerySet.as_manager()



This gives us a nice interface to handle all kinds of changes regarding 
`MyModel`. Our example usage looks like this:


@receiver(pre_save, sender=MyModel)
def my_handler(sender, **kwargs):
    instance = kwargs['instance']
    if instance.verified:
        do_something(instance)
    else:
        do_else(instance)


@receiver(pre_bulk_update, sender=MyModel)
def my_bulk_update_handler(sender, **kwargs):
    update_kwargs = kwargs['update_kwargs']
    if 'verified' not in update_kwargs:
        # No change I'm interested in,
        # so no need to take any action.
        return

    queryset = kwargs['queryset']
    pks_to_be_updated = queryset.values_list('pk', flat=True)
    if update_kwargs['verified']:
        do_something_bulk_update_implementation(pks_to_be_updated)
    else:
        bulk_update_do_else_implementation(pks_to_be_updated)


@receiver(pre_bulk_create, sender=MyModel)
def my_bulk_create_handler(sender, **kwargs):
    objs = kwargs['objs']
    group_1 = []
    group_2 = []
    for obj in objs:
        if obj.verified:
            group_1.append(obj)
        else:
            group_2.append(obj)

    if group_1:
        do_something_bulk_create_implementation(group_1)
    if group_2:
        bulk_create_do_else_implementation(group_2)



I think this turns out to be a very clean approach.
It helps us use the most suitable strategy to handle each change.
So I'm sharing this with the community to get your feedback.
I believe that if this gets into Django internals, it can be a very 
powerful tool; it would lose power as a third-party app.
