On 04/25/2017 02:47 PM, Brian Bouterse wrote: > There are a few use cases and behaviors that I think only an asynchronous > update/delete for importers, > publishers, and repositories will satisfy. Thanks to @asmacdo for covering > some of these also. > > * As a plugin writer, I am in control when my code is running. The data layer > is not expected to change other > than what my code is doing or what I've asked platform to do for me. This > same reasoning applies to unit > associations too (as a similar example of this principle). With unit > associations for example, I don't think > we want new units showing up as part of a repo halfway through the publish > operation. > > * As a plugin writer, I can modify and save the importer without being > concerned that I am overwriting data > the user just updated. This is a write-write race concern. We could say in > documentation "don't do that", but > they may have good reasons to and I don't see a good reason why they can't > call save() during a sync() for > example. Consider that they are defining fields on these subclassed importers > and they want to write data into > them at sync-time.
This applies to previous use case as well. Even with update/delete being asynchronous, I don't think there is any guarantee as to what the data will be when the task actually runs in a multi-user application. There is a race condition on the task queue. Other users' updates and syncs can be queue in any order around what this user is trying to do. If this is a requirement, we may need to include a snapshot of the data in the task. > > * A user should be able to submit a [sync], [modify the importer], [sync] and > only have the update apply to > the second sync. Synchronous updates make it difficult for a user to > determine the impact an update has when > applied while sync code is running for example. > > * If users want the ability to apply an importer update apply immediately > then can cancel any running syncs > and their update will apply immediately then, so we're not limiting them in > any way. > > * Deleting data while code is expected to use that data is running will > likely create negative user > experiences. This is a rephrasing of the "don't change the data later while > plugin code is running" point from > earlier but with a twist. If the publisher is deleted while the plugin > writer's publish() is running, if > plugin code tries to read or save, the task will fail at some random point > with a traceback, which is not a > good experience. Contrast that against what you would get with an > asynchronous design which would allow a > clean PulpException to be emitted by the publish() task just before calling > the publish() code. Having the > PulpException which checks in exactly one place will be a better user > experience with a nice message instead > of a deep traceback and easier to maintain (in one place). Seems this can be handled cleanly/gracefully for synchronous delete as well. The plugin should protect updates with try/catch and raise a PulpException when the DB exception is raised. The sync would terminate immediately and the task would report that the importer had been deleted which seems reasonable. > > Thank you to everyone who is participating in this convo. We don't have > agreement, but we do have a good > dialogue going! What are you thoughts about ^? > > -Brian > > On Fri, Apr 21, 2017 at 2:16 PM, Austin Macdonald <aus...@redhat.com > <mailto:aus...@redhat.com>> wrote: > > I think we have arrived at a consensus around these points: > > * Django's ModelForm validations that are used during `full_clean` is an > anti-pattern in > DjangoRestFramework and should not be used > * Validation, by our definition, will be done by the serializers in the > API layer > * The Task layer should run business logic checks at the time of use > > We have not arrived at a consensus on whether Updates/Deletes for > Importers/Publishers will be synchronous > or asynchronous. I've tried to think through each way, and what it would > need to work. > > *Synchronous Update and Delete: > * > _Validation Heuristic_ > The serializers validate in the API layer only, and it is desirable to > have that validation occur at the > time of the write. > * > * > _Concurrent Writes:_ > Let's assume synchronous Updates and Deletes. There is a possibility of > concurrent writes because > sync/publish tasks write to the importer/publisher tables respectively > and these tables can also be > written by the Update API at any time. If there is a collision the last > write wins. > > Workaround: > Concurrent writes may not be a big deal because they should be not affect > each other. In platform, sync > and publish only write to the last_sync/last_publish fields, which should > never be written by the update > API. If the writes are restricted to only the affected fields, then there > would be no clobbering. As long > as plugin writers' implementations of sync and publish do not write to > the same fields allowed in an API > update, this concurrency is safe. We would obviously need to be clearly > documented for plugin writers. > > _Dealing With Broken Configurations_ > For example, while a sync task is WAITING, the user updates an importer > in a way that breaks sync. The > importer configuration should be read only once at the start of a sync. > If this configuration has been > deleted or is not syncable, business logic checks catch this and fail > immediately. Even if the > configuration is changed during the sync task, the sync task will > continue with the original > configuration. The sync task will update last_sync, but will not write to > other fields. I don't see a > problem here, as long as this is documented for the plugin writer. > > _Summary_: > I don't see any concurrency problems here. The downside of this design is > that sync and publish, > (implemented by plugin writers), should not write to the same fields as > API updates. > > * > * > *Asynchronous Update and Delete* > > _Validation Heuristic_ > Assuming async updates, the data comes into the API layer where it is > validated by the serializers. The > update reserves a task (reservation is based on repository) and waits to > be performed. It is usually > undesirable to separate validation from use. There could be concern that > data validated in the API layer > could become unacceptable by the time the Task layer actually performs > the write. > > Practically, this separation may not be an issue. Valid data becoming > invalid involves validations that > are dependent on state. We can split these validations into two > categories, uniqueness validations and > multi-field validations. Fortunately, the Django ORM will enforce > uniqueness validation, so the database > is protected, but we might have to handle write failures. Multi-field > validations are probably not > validations by our definition, but are business logic, which could be > checked in the Task layer. > _ > _ > _Concurrent write prevention_ > Leveraging the reserved resource system, concurrent writes could not be a > problem because writes could > never be concurrent at all. Syncs, Updates, Deletes, Publishes all make > reservations based on the > repository (this is an assumption based on what we do in Pulp 2), so each > would run in the order that they > were called. > > _User Controlled Order_ > A user can issue a [sync][update][sync] and be confident that the first > sync will use the original > importer configuration, and the second sync will use the updated > configuration. > > _Uncertain State_ > The configuration of an importer is unpredictable if there is a waiting > update task from another user. > > > > > > > _______________________________________________ > Pulp-dev mailing list > Pulp-dev@redhat.com <mailto:Pulp-dev@redhat.com> > https://www.redhat.com/mailman/listinfo/pulp-dev > <https://www.redhat.com/mailman/listinfo/pulp-dev> > > > > > _______________________________________________ > Pulp-dev mailing list > Pulp-dev@redhat.com > https://www.redhat.com/mailman/listinfo/pulp-dev >
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev