I've run into a problem with get_or_create() under concurrent access to the DB, and I have looked through the list archives for advice. I found some earlier discussions of others' problems, but no acceptable solution was ever implemented. I have another proposed solution that I thought I should throw out there to see if anyone likes it better. First, let me restate the problem:
Two threads/processes/servers (let's call them P1 and P2) need to concurrently create a unique object:

1. P1 calls get_or_create(), which tries to get the item (it doesn't exist).
2. P2 calls get_or_create(), which tries to get the same item (it doesn't exist).
3. P1's get_or_create() tries to create the item (this works and returns the item).
4. P2's get_or_create() tries to create the item. One of two things happens:
   a. a second item is created with the same parameters, if this doesn't violate a UNIQUE constraint
   b. the second create fails (because of a UNIQUE constraint) and raises an exception

In the case of 4a, a future get() or get_or_create() call will fail an assertion because multiple objects are returned. In the case of 4b, the caller will need to catch the exception and (since the exception probably means there was a concurrent create) most likely try to get the object again.

Previous proposals to address this issue involved adding either a thread lock or a DB table lock around the get_or_create() call. Both of these are unacceptable. The thread lock does nothing to prevent the problem when using multiple front-end servers, and the DB lock is just plain bad for performance.

It seems reasonable to require that the model be designed with unique_together=(...) on the fields that are used in the get_or_create() call. This will allow the DB to prevent duplicates from being created. Thus the only code change needed to make get_or_create() always return the correct object is to call get() again in the event of an exception from create().

Pseudo-code
-------------
    def get_or_create(**kwargs):
        try:
            obj = get(**kwargs)
        except:
            # not found: try to create it
            try:
                obj = create(**kwargs)
            except:
                # create failed, most likely a concurrent create: get it again
                obj = get(**kwargs)
        return obj

This solution is based on the following assumptions:

1. We always want get_or_create() to return the object we're looking for.
2. MOST of the time the object will exist, so calling get() first gives the best performance.
3. Occasionally the object will not exist and may be created concurrently by multiple threads/processes/servers. In this case the second get() is no more expensive than the get() the caller would have to make anyway when handling the exception.

This solution has no performance penalty in the "normal" case and takes full advantage of the DB's data integrity enforcement.

If this solution is favorable, I'll create a ticket with the patch and tests.

Travis
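
For anyone who wants to try the idea outside of Django, here is a minimal runnable sketch of the get / create / get-again pattern from the pseudo-code. It is not the proposed patch: it uses Python's standard sqlite3 module in place of the ORM, and the `item` table, the `make_db()` helper and the `before_create` hook are hypothetical names made up for illustration. The hook only exists so that a "concurrent" create (P1) can be simulated deterministically between P2's get and create.

```python
import sqlite3


def make_db():
    """Hypothetical helper: in-memory DB with a UNIQUE constraint on name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    return conn


def get(conn, name):
    """Return the id of the row with this name, or None if there is none."""
    row = conn.execute("SELECT id FROM item WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def create(conn, name):
    """Insert a row; the UNIQUE constraint rejects a duplicate name."""
    cur = conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def get_or_create(conn, name, before_create=None):
    """Return (id, created). Falls back to a second get() if create() fails."""
    obj = get(conn, name)
    if obj is not None:
        return obj, False
    if before_create is not None:
        before_create()  # illustration only: a concurrent writer slips in here
    try:
        return create(conn, name), True
    except sqlite3.IntegrityError:
        # Someone else created the row first; discard our failed insert
        # and fetch the row that now exists.
        conn.rollback()
        return get(conn, name), False


if __name__ == "__main__":
    conn = make_db()
    print(get_or_create(conn, "widget"))
    # Simulate P1 creating the same object between P2's get and create.
    print(get_or_create(conn, "gadget",
                        before_create=lambda: create(conn, "gadget")))
```

In the simulated race the second call does not raise and does not create a duplicate; it returns the row that the other writer inserted. This relies entirely on the UNIQUE constraint being present, which is the same requirement as unique_together in the proposal.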