On 04/25/2017 02:47 PM, Brian Bouterse wrote:
> There are a few use cases and behaviors that I think only an asynchronous 
> update/delete for importers,
> publishers, and repositories will satisfy. Thanks to @asmacdo for covering 
> some of these also.
> 
> * As a plugin writer, I am in control when my code is running. The data layer 
> is not expected to change other
> than what my code is doing or what I've asked platform to do for me. This 
> same reasoning applies to unit
> associations too (as a similar example of this principle). With unit 
> associations for example, I don't think
> we want new units showing up as part of a repo halfway through the publish 
> operation.
> 
> * As a plugin writer, I can modify and save the importer without being 
> concerned that I am overwriting data
> the user just updated. This is a write-write race concern. We could say in 
> documentation "don't do that", but
> they may have good reasons to and I don't see a good reason why they can't 
> call save() during a sync() for
> example. Consider that they are defining fields on these subclassed importers 
> and they want to write data into
> them at sync-time.

This applies to previous use case as well.

Even with update/delete being asynchronous, I don't think there is any 
guarantee as to what the data will be
when the task actually runs in a multi-user application.  There is a race 
condition on the task queue.  Other
users' updates and syncs can be queue in any order around what this user is 
trying to do.  If this is a
requirement, we may need to include a snapshot of the data in the task.

> 
> * A user should be able to submit a [sync], [modify the importer], [sync] and 
> only have the update apply to
> the second sync. Synchronous updates make it difficult for a user to 
> determine the impact an update has when
> applied while sync code is running for example.
> 
> * If users want the ability to apply an importer update apply immediately 
> then can cancel any running syncs
> and their update will apply immediately then, so we're not limiting them in 
> any way.
> 
> * Deleting data while code is expected to use that data is running will 
> likely create negative user
> experiences. This is a rephrasing of the "don't change the data later while 
> plugin code is running" point from
> earlier but with a twist. If the publisher is deleted while the plugin 
> writer's publish() is running, if
> plugin code tries to read or save, the task will fail at some random point 
> with a traceback, which is not a
> good experience. Contrast that against what you would get with an 
> asynchronous design which would allow a
> clean PulpException to be emitted by the publish() task just before calling 
> the publish() code. Having the
> PulpException which checks in exactly one place will be a better user 
> experience with a nice message instead
> of a deep traceback and easier to maintain (in one place).

Seems this can be handled cleanly/gracefully for synchronous delete as well.  
The plugin should protect
updates with try/catch and raise a PulpException when the DB exception is 
raised.  The sync would terminate
immediately and the task would report that the importer had been deleted which 
seems reasonable.

> 
> Thank you to everyone who is participating in this convo. We don't have 
> agreement, but we do have a good
> dialogue going! What are you thoughts about ^?
> 
> -Brian
> 
> On Fri, Apr 21, 2017 at 2:16 PM, Austin Macdonald <aus...@redhat.com 
> <mailto:aus...@redhat.com>> wrote:
> 
>     I think we have arrived at a consensus around these points:
> 
>     * Django's ModelForm validations that are used during `full_clean` is an 
> anti-pattern in
>     DjangoRestFramework and should not be used
>     * Validation, by our definition, will be done by the serializers in the 
> API layer
>     * The Task layer should run business logic checks at the time of use
> 
>     We have not arrived at a consensus on whether Updates/Deletes for 
> Importers/Publishers will be synchronous
>     or asynchronous. I've tried to think through each way, and what it would 
> need to work.
> 
>     *Synchronous Update and Delete:
>     *
>     _Validation Heuristic_
>     The serializers validate in the API layer only, and it is desirable to 
> have that validation occur at the
>     time of the write. 
>     *
>     *
>     _Concurrent Writes:_
>     Let's assume synchronous Updates and Deletes. There is a possibility of 
> concurrent writes because
>     sync/publish tasks write to the importer/publisher tables respectively 
> and these tables can also be
>     written by the Update API at any time. If there is a collision the last 
> write wins. 
> 
>     Workaround:
>     Concurrent writes may not be a big deal because they should be not affect 
> each other. In platform, sync
>     and publish only write to the last_sync/last_publish fields, which should 
> never be written by the update
>     API. If the writes are restricted to only the affected fields, then there 
> would be no clobbering. As long
>     as plugin writers' implementations of sync and publish do not write to 
> the same fields allowed in an API
>     update, this concurrency is safe. We would obviously need to be clearly 
> documented for plugin writers.
> 
>     _Dealing With Broken Configurations_
>     For example, while a sync task is WAITING, the user updates an importer 
> in a way that breaks sync. The
>     importer configuration should be read only once at the start of a sync. 
> If this configuration has been
>     deleted or is not syncable, business logic checks catch this and fail 
> immediately. Even if the
>     configuration is changed during the sync task, the sync task will 
> continue with the original
>     configuration. The sync task will update last_sync, but will not write to 
> other fields. I don't see a
>     problem here, as long as this is documented for the plugin writer.
> 
>     _Summary_:
>     I don't see any concurrency problems here. The downside of this design is 
> that sync and publish,
>     (implemented by plugin writers), should not write to the same fields as 
> API updates.
> 
>     *
>     *
>     *Asynchronous Update and Delete*
> 
>     _Validation Heuristic_
>     Assuming async updates, the data comes into the API layer where it is 
> validated by the serializers. The
>     update reserves a task (reservation is based on repository) and waits to 
> be performed. It is usually
>     undesirable to separate validation from use. There could be concern that 
> data validated in the API layer
>     could become unacceptable by the time the Task layer actually performs 
> the write.
> 
>     Practically, this separation may not be an issue. Valid data becoming 
> invalid involves validations that
>     are dependent on state. We can split these validations into two 
> categories, uniqueness validations and
>     multi-field validations. Fortunately, the Django ORM will enforce 
> uniqueness validation, so the database
>     is protected, but we might have to handle write failures. Multi-field 
> validations are probably not
>     validations by our definition, but are business logic, which could be 
> checked in the Task layer.
>     _
>     _
>     _Concurrent write prevention_
>     Leveraging the reserved resource system, concurrent writes could not be a 
> problem because writes could
>     never be concurrent at all. Syncs, Updates, Deletes, Publishes all make 
> reservations based on the
>     repository (this is an assumption based on what we do in Pulp 2), so each 
> would run in the order that they
>     were called. 
> 
>     _User Controlled Order_
>     A user can issue a [sync][update][sync] and be confident that the first 
> sync will use the original
>     importer configuration, and the second sync will use the updated 
> configuration. 
> 
>     _Uncertain State_
>     The configuration of an importer is unpredictable if there is a waiting 
> update task from another user.
> 
> 
> 
> 
> 
> 
>     _______________________________________________
>     Pulp-dev mailing list
>     Pulp-dev@redhat.com <mailto:Pulp-dev@redhat.com>
>     https://www.redhat.com/mailman/listinfo/pulp-dev 
> <https://www.redhat.com/mailman/listinfo/pulp-dev>
> 
> 
> 
> 
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
> 

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev

Reply via email to