Thanks for diving on this grenade, Alex! FWIW, I agree with all of your assessments. Just in case I am mistaken, I summarize them as smaller updates > logical clocks > wall clocks (due to imprecision and skew).
Given the small size of your patch [4], I'd say lets try to land that. It is nicer to solve this problem with software rather than with db schema if that is possible. On Thu, Sep 25, 2014 at 9:21 AM, Alexander Tivelkov <ativel...@mirantis.com> wrote: > Hi folks! > > There is a serious issue [0] in the v2 API of Glance which may lead to > race conditions during the concurrent updates of Images' metadata. > It can be fixed in a number of ways, but we need to have some solution > soon, as we are approaching rc1 release, and the race in image updates > looks like a serious problem which has to be fixed in J, imho. > > A quick description of the problem: > When the image-update is called (PUT /v2/images/%image_id%/) we get the > image from the repository, which fetches a record from the DB and forms its > content into an Image Domain Object ([1]), which is then modified (has its > attributes updated) and passed through all the layers of our domain model. > This object is not managed by the SQLAlchemy's session, so the > modifications of its attributes are not tracked anywhere. > When all the processing is done and the updated object is passed back to > the DB repository, it serializes all the attributes of the image into a > dict ([2]) and then this dict is used to create an UPDATE query for the > database. > As this serialization includes all the attribute of the object (rather > then only the modified ones), the update query updates all the columns of > the appropriate database row, putting there the values which were > originally fetched when the processing began. This may obviously overwrite > the values which could be written there by some other concurent request. > > There are two possible solutions to fix this problem. > First, known as the optimistic concurrency control, checks if the > appropriate database row was modified between the data fetching and data > updates. In case of such modification the update operation reports a > "conflict" and fails (and may be retried based on the updated data if > needed). Modification detection is usually based on the timstamps, i.e. the > query updates the row in database only if the timestamp there matches the > timestamp of initially fetched data. > I've introduced this approach in this patch [3], however it has a major > flaw: I used the 'updated_at' attribute as a timestamp, and this attribute > is mapped to a DateTime-typed column. In many RDBMS's (including > MySql<5.6.4) this column stores values with per-second precision and does > not store fractions of seconds. So, even if patch [3] is merged the race > conditions may still occur if there are many updates happening at the same > moment of time. > A better approach would be to add a new column with int (or longint) type > to store millisecond-based (or even microsecond-based) timestamps instead > of (or additionally to) date-time based updated_at. But data model > modification will require to add new migration etc, which is a major step > and I don't know if we want to make it so close to the release. > > The second solution is to keep track of the changed attributes and > properties for the image and do not include the unchanged ones into the > UPDATE query, so nothing gets overwritten. This dramatically reduces the > threat of races, as the updates of different properties do not interfere > with each other. Also this is a usefull change regardless of the race > itself: being able to differentiate between changed and unchanged > attributes may have its own value for other purposes; the DB performance > will also be better when updating just the needed fields instead of all of > them. > I've submitted a patch with this approach as well [4], but it still breaks > some unittests and I am working to fix them right now. > > So, we need to decide which of these approaches (or their combination) to > take: we may stick with optimistic locking on timestamp (and then decide if > we are ok with a per-second timestamps or we need to add a new column), > choose to track state of attributes or combine them together. So, could you > folks please review patches [3] and [4] and come up with some ideas on them? > > Also, probably we should consider targeting [0] to juno-rc1 milestone to > make sure that this bug is fixed in J. Do you guys think it is possible at > this stage? > > Thanks! > > > [0] https://bugs.launchpad.net/glance/+bug/1371728 > [1] > https://github.com/openstack/glance/blob/master/glance/db/__init__.py#L74 > [2] > https://github.com/openstack/glance/blob/master/glance/db/__init__.py#L169 > [3] https://review.openstack.org/#/c/122814/ > [4] https://review.openstack.org/#/c/123722/ > > -- > Regards, > Alexander Tivelkov > > _______________________________________________ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > >
_______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev