I've run into a problem with get_or_create() under concurrent access to 
the DB, and I have looked through the list archives for advice.  I 
found some earlier discussions of others' problems, but no 
acceptable solution was ever implemented.  I have another proposed 
solution that I thought I should throw out there to see if anyone likes 
it better.  First, let me restate the problem:

Two threads/processes/servers (let's call them P1 and P2) try to 
create the same unique object concurrently:
1. P1 calls get_or_create(), which tries to get the item (it doesn't exist)
2. P2 calls get_or_create(), which tries to get the same item (it 
doesn't exist)
3. P1's get_or_create() tries to create the item (this works and returns 
the item)
4. P2's get_or_create() tries to create the item.  One of two things 
happens:
    a. a second item is created with the same parameters if this doesn't 
violate a UNIQUE constraint
    b. the second create fails (because of a UNIQUE constraint) and 
raises an exception

In the case of 4a, a future get() or get_or_create() call will fail an 
assertion because multiple objects are returned.  In the case of 4b, the 
caller will need to catch the exception and (since the exception 
probably means there was a concurrent create) most likely try to get the 
object again.
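
To make the two outcomes concrete, here is a minimal sketch that uses 
Python's standard sqlite3 module rather than Django's ORM.  The table 
names and the "name" column are hypothetical, and P1 and P2 are only 
simulated by replaying their steps on one connection in the interleaved 
order listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one without and one with a UNIQUE constraint.
conn.execute("CREATE TABLE item_plain (name TEXT)")
conn.execute("CREATE TABLE item_unique (name TEXT UNIQUE)")


def exists(table, name):
    """The 'get' half: report whether a matching row is present."""
    row = conn.execute(
        f"SELECT 1 FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    return row is not None


def simulate_race(table, name):
    """Replay steps 1-4: P1 and P2 both look first, then both create."""
    p1_found = exists(table, name)  # step 1: P1 finds nothing
    p2_found = exists(table, name)  # step 2: P2 finds nothing
    assert not p1_found and not p2_found

    # Step 3: P1's create succeeds.
    conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    try:
        # Step 4: P2's create.
        conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
        outcome = "duplicate created"  # case 4a
    except sqlite3.IntegrityError:
        outcome = "IntegrityError"  # case 4b

    count = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE name = ?", (name,)
    ).fetchone()[0]
    return outcome, count


result_4a = simulate_race("item_plain", "widget")
result_4b = simulate_race("item_unique", "widget")
```

Without the constraint the table ends up with two matching rows (4a); 
with it, the second insert is rejected and a single row remains (4b).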

Previous proposals to address this issue involved adding either a thread 
lock or a DB table lock around the get_or_create() call.  Both of these 
are unacceptable.  The thread lock does nothing to prevent the problem 
when using multiple front-end servers, and the DB lock is just plain bad 
for performance.

It seems reasonable to require that the model be designed with 
unique_together=(...) on the fields that are used in the get_or_create() 
call.  This will allow the DB to prevent duplicates from being created.  
Thus the only code change needed to make get_or_create() always return 
the correct object is to call get() again in the event of an exception 
from create().

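Pseudo-code
-----------
def get_or_create(**kwargs):
    try:
        # Common case: the object already exists.
        obj = get(**kwargs)
    except DoesNotExist:
        try:
            obj = create(**kwargs)
        except IntegrityError:
            # Lost the race to a concurrent create, so the object
            # must exist now; fetch it.
            obj = get(**kwargs)
    return obj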
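
For anyone who wants to try the idea outside Django, here is a rough, 
runnable sketch of the same get/create/get pattern on top of the 
standard sqlite3 module.  It is not Django's implementation: the "item" 
table, the DoesNotExist class, and the before_create hook (used only to 
imitate a competing writer sneaking in between the get and the create) 
are all made up for illustration.

```python
import sqlite3


class DoesNotExist(Exception):
    """Raised by get() when no row matches (hypothetical name)."""


def make_db():
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint is what lets the DB reject duplicates.
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    return conn


def get(conn, name):
    row = conn.execute(
        "SELECT id, name FROM item WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise DoesNotExist(name)
    return row


def create(conn, name):
    cur = conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
    return (cur.lastrowid, name)


def get_or_create(conn, name, before_create=None):
    try:
        # Common case: the row already exists.
        return get(conn, name)
    except DoesNotExist:
        pass
    if before_create is not None:
        before_create()  # illustration only: simulate a concurrent create
    try:
        return create(conn, name)
    except sqlite3.IntegrityError:
        # Someone else created it first; the row must exist now.
        return get(conn, name)


conn = make_db()
first = get_or_create(conn, "widget")   # not found, so created
second = get_or_create(conn, "widget")  # found by the first get()
raced = get_or_create(
    conn, "gadget", before_create=lambda: create(conn, "gadget")
)                                       # create fails, second get() wins
total = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

In the "raced" call the insert is rejected by the UNIQUE constraint and 
the fallback get() returns the row written by the simulated competitor, 
so the caller never sees the exception.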

This solution is based on the following assumptions:

1. We always want get_or_create() to return the object we're looking for.
2. MOST of the time the object will exist, so calling get() first gives 
the best performance.
3. Occasionally the object will not exist and may be created 
concurrently by multiple threads/processes/servers.  In this case the 
second get() is no more expensive than the get() the caller would have 
to make anyway when handling the exception.

This solution has no performance penalty in the "normal" case and takes 
full advantage of the DB's data integrity enforcement.

If this solution is favorable, I'll create a ticket with the patch and 
tests.

Travis


