On Thu, 2007-03-29 at 05:29 +0000, John Penix wrote:
> I think I saw a get_or_create race condition today from concurrent
> runs of our data uploader that uses the model API. Ouch. The docs
> have several references to the api calls being atomic - now I'm
> thinking get_or_create is an exception. And I'm guessing lots of
> other people already know this.

You're right. The get_or_create() call isn't atomic, because it isn't a single database call and we cannot assume that the database layer has transactions (not every backend does). The window of opportunity for a problem, between the failed lookup and the subsequent create, is about five Python instructions.

> So, assuming it's not atomic (by default) is there a way to make it
> safe other than using the django middleware layer to get
> transactions? Like a flag... or a db schema tweak....

You could put a unique_together attribute in your model (part of the Meta class). Include in it the columns that determine when one instance differs from the next. Django's manage.py translates unique_together into database table constraints, so the database will refuse to create a second row with the same values in those columns.

I suspect if you do this, you will see IntegrityError raised out of get_or_create() when a conflict occurs, so be prepared to handle that. There is a ticket waiting to be fixed to make IntegrityError database-neutral, because at the moment you have to catch MySQLdb.IntegrityError or whatever your backend raises. I'm going to do a run through those sorts of tickets on the weekend, so that little item will be smoothed over then.

So my answer to your question ends here. However, if you really care about the gory details of why this isn't trivial, a little bit of data modelling theory...

The root problem here is that this is actually a difficult problem at the database level, too. By default, Django uses a surrogate primary key for models (the automatically generated id value), so there's no constraint present about what constitutes a "unique" item. The fact that you present the same fields to be saved more than once doesn't really carry any information about whether they are the same or different. The get_or_create() utility method makes the assumption that "same fields implies same object", but that's not enforced by the database table constraints.

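As a rough illustration of the "unique constraint plus handle IntegrityError" idea, here is a minimal sketch at the database level. It deliberately uses Python's standard sqlite3 module rather than Django's model API, so it is not what get_or_create() does internally; the person table, its columns, and the get_or_create helper below are hypothetical names made up for this example.

```python
import sqlite3

SELECT_SQL = "SELECT id FROM person WHERE first_name = ? AND last_name = ?"
INSERT_SQL = "INSERT INTO person (first_name, last_name) VALUES (?, ?)"


def get_or_create(conn, first_name, last_name):
    """Return (row_id, created), relying on the UNIQUE constraint for safety."""
    row = conn.execute(SELECT_SQL, (first_name, last_name)).fetchone()
    if row is not None:
        return row[0], False
    try:
        # "with conn" commits on success and rolls back if an error is raised.
        with conn:
            cur = conn.execute(INSERT_SQL, (first_name, last_name))
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Someone else inserted the same key between our SELECT and INSERT,
        # so the row must exist now: fetch it instead of failing.
        row = conn.execute(SELECT_SQL, (first_name, last_name)).fetchone()
        return row[0], False


conn = sqlite3.connect(":memory:")
# The surrogate id is still the primary key; the UNIQUE clause is the
# database-level statement of what makes two rows "the same".
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL,"
    " UNIQUE (first_name, last_name))"
)

first = get_or_create(conn, "John", "Penix")
second = get_or_create(conn, "John", "Penix")
```

The second call finds the existing row instead of creating a duplicate. If two writers race past the lookup at the same time, the constraint makes one INSERT fail, and the except branch turns that failure into an ordinary "get".
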
If that assumption were truly correct, the model should technically have a Meta.unique_together attribute specifying every field in the model. That would translate into a database constraint as well, and attempts to create multiple objects with the same fields would raise an IntegrityError in get_or_create(). The problem is that it's a bit heavy-handed (a potentially large constraint for the database to check each time) and it's overkill in the sense that often a much smaller set of fields determines uniqueness.

The way to solve this as taught in Database Theory 101 is to have a genuine primary key in your data model: something that you can point to and say "this is what makes it unique" and then have that constraint enforced at the database level. There are two problems that make this a little tricky in Django as it is today.

The first one is that we don't have proper validation for custom primary keys available. You should be able to say "check that this primary key field is unique", and not only should we give you back a True or False answer, but in the True case it should *remain* true until you save the model. So Django needs to actually make a temporary save of the model. That's a little tricky to implement, but not impossible. I've been putting a lot of thought into that recently, because it crops up in a number of different disguises. We'll have that one solved before 1.0 and hopefully long before then.

The other problem we would have to solve for truly generically correct support at the database level would be multi-column primary keys. That is not too difficult to do. The real stumbling block is that there's no good way that anybody has come up with to use such models in the admin interface. We use the primary key as part of the URL in the admin interface and primary keys can contain arbitrary characters.
So you can't just concatenate the two keys (there is no way to tell where one ends and the other starts), and you can't use a special marker, because that marker could occur in either or both keys, so you'd have to escape every possible occurrence of it, reducing the readability of the URLs quite dramatically in the normal case (unless the marker were a truly weird character).

If somebody can come up with a URL addressing scheme for multi-column (more than one and not just two) primary keys that is backwards compatible with our current scheme, the rest is not too painful. It seems like a small item, but it's trickier than it looks.

Regards,
Malcolm