Re: [Pulp-dev] 3.0 Validation update

Jeff Ortel Mon, 17 Apr 2017 12:10:07 -0700


On 04/14/2017 10:23 AM, Brian Bouterse wrote:
> Did we ever decide on if delete and modify operations on an importer would be 
> synchronous API operations or
> tasking system tasks?
> 
> From what I'm reading, there isn't a MVP need (and maybe ever) to verify the 
> feed is set at the sync API call
> and in the task. We definitely need to do that business logic, i.e. verify 
> the feed != None, in the task for
> the MVP. If we don't need to do it in two places now, then I think doing it 
> procedurally in the sync task here
> [0] would be good enough. That means we would not add it to the model as a 
> method. I think this business logic
> in the sync task would be:
> 
> ```
> if importer.feed is None:
>     raise ValueError("The 'feed' attribute on the importer cannot be none 
> when syncing")  # <--- could should
> probably be a descendent of PulpException since it's a predictable/common 
> error
> ```


+1

> 
> [0]: 
> https://github.com/pulp/pulp/pull/2991/files#diff-b9ddc3416cf3909687419960d59316bbR30
> 
> -Brian
> 
> 
> 
> On Wed, Apr 12, 2017 at 10:11 AM, Jeff Ortel <jor...@redhat.com 
> <mailto:jor...@redhat.com>> wrote:
> 
> 
> 
>     On 04/11/2017 04:08 PM, Michael Hrivnak wrote:
>     > I like thinking about this as business logic. Data may be valid, but it 
> may not be usable in a particular context.
>     >
>     > To help figure out where such logic should live, it may help to think 
> about where the check is most important.
>     > I've described that time as "at the time of use" earlier in this 
> discussion (maybe just on IRC). With sync as
>     > an example, a workflow will load an importer's config from the 
> database, check it for problems, and then
>     > immediately use the values it just inspected. This is the critical 
> moment where it must gracefully handle
>     > unusable data. This check ensures correct behavior and avoids an 
> unhandled exception or crash.
> 
>     +1
> 
>     >
>     > We can and should also check for problems at earlier opportunities, 
> such as at the time a user tries to queue
>     > a sync. This improves the user experience, but it is not required for 
> correct and safe operation.
> 
>     The data can change between the time when the task is queued and when it 
> is executed.  This adds complexity
>     and the value added is not deterministic.  -1 for a check here.
> 
>     >
>     > Given that, I think it makes sense to put the check close to the data. 
> A method on the model seems reasonable.
>     > In terms of polluting the model with business logic, it isn't that 
> different from defining a custom query set
>     > on the model, which django encourages.
> 
>     Agreed, but only for data integrity. Each field should be validated.  
> And, the record (cross field) should be
>     validated.  We don't want to store things in the DB that are invalid.  
> Seems like the django ORM already
>     provides for this, right?  But, the model should not have methods to 
> validate itself for a particular
>     operation (usage).  For example, ensure_syncable() that validates the 
> feed URL would not be appropriate.  This
>     should be checked at run-time by the object doing the sync.  Perhaps a 
> double dispatch model in
>     plugin.Importer could do:
> 
>     def synchronize():
>         # do validation and preparation.
>         return self._synchronize()
> 
>     def _synchronize():
>         # overridden by plugins.
>         raise NotImplementedError()
> 
> 
>     >
>     > As a slight tangent, some applications take this sort of checking even 
> further. An admirable approach in REST
>     > API design, which may not be a good idea for us at this time but is 
> interesting to note, is to make a behavior
>     > such as "sync" only available via a link accessed via a known name in 
> an object's representation. That's a
>     > mouthful, so here's an example:
>     >
>     > {
>     >   "id": "foo",
>     >   "feed": "http://cdn.redhat.com/stuff/";,
>     >   "_links":
>     >   {
>     >     "self": "http://here.com/importers/foo";,
>     >     "sync": "http://here.com/importers/foo/sync 
> <http://here.com/importers/foo/sync>"
>     >   }
>     > }
>     >
>     > Consider that the link for starting a sync is not part of the published 
> API, except that it must be obtained
>     > from this representation. There are two advantages here.
>     >
>     > The main advantage I'm pointing out is that when the server creates 
> this representation of an Importer, it
>     > would only include the "sync" link if the current state of the object 
> would allow for a sync. If there were no
>     > feed, there would be no sync link, and thus the client would be unable 
> to even try starting one. So this is a
>     > third opportunity to check whether the object's state is suitable for a 
> sync. It even allows the client to
>     > show or hide a "sync" button without having to re-implement the 
> business logic that's already present on the
>     > server side. Neat, huh?
> 
>     Interesting.
> 
>     However, the duration of how long the 'sync' link is valid is not 
> deterministic.  1 ms after the link is
>     included in the returned document, a DB commit could set it to NULL.  The 
> link could be invalid by the time
>     the user even receives the REST reply.
> 
>     >
>     > Another advantage to this kind of approach is a smaller API surface 
> area. We could theoretically change the
>     > sync URL schema at any time. We could even move it to a new, separate 
> service. We'd still need to document how
>     > to use it, but it's actual location can change. In practice I don't 
> think this aspect is all that valuable
>     > unless you are 100% bought in to this design. But it's fun to think 
> about.
>     >
>     > And to re-state, this kind of thing may not be worth our time to 
> actually do right now, and I'm not proposing
>     > it. I don't know to what extent DRF would make this easy. But I wanted 
> to bring it up for interest's sake as
>     > yet another place in the workflow, even closer to the end user than the 
> other two we've discussed, where
>     > applications have an opportunity to utilize context checking of data.
>     >
>     > On Tue, Apr 11, 2017 at 3:49 PM, Austin Macdonald <amacd...@redhat.com 
> <mailto:amacd...@redhat.com> <mailto:amacd...@redhat.com
>     <mailto:amacd...@redhat.com>>> wrote:
>     >
>     >     Where should business logic live? As an example, I want to consider 
> the sync task [0] and the need to
>     >     ensure that an importer is syncable. For now, let's say that an 
> importer is syncable if it has a feed_url.
>     >
>     >     Since the call to sync an importer comes from the API, the earliest 
> we can check that the configuration is
>     >     syncable is in a not-yet-written SyncView. Checking syncability 
> here is ideal because it allows us to
>     >     immediately return a meaningful http response instead of happily 
> returning information about a Task that
>     >     we already know is doomed to fail.
>     >
>     >     Performing the check in the API layer is not enough. We have 
> discussed edge cases that lead to an
>     >     importer's feed_url being removed while the sync task is in the 
> WAITING state. To make an assertion to the
>     >     plugin that the feed_url exists, we have to check syncability again 
> when the Task moves into an active state.
>     >
>     >     My thinking is that we should put this business logic on the model.
>     >
>     >     Admittedly, it is not a clean fit with the separation of concerns 
> philosophy but we have already violated
>     >     the concept by putting the sync method on the model. If sync is on 
> the model, it seems like
>     >     ensure_syncable should be too.
>     >
>     >     If we write the platform API layer to use this kind of business 
> logic, then the plugins can add double
>     >     checking business logic without modifying the API and Task layers.
>     >
>     >
>     >     [0]: https://pulp.plan.io/issues/2399 
> <https://pulp.plan.io/issues/2399>
>     <https://pulp.plan.io/issues/2399 <https://pulp.plan.io/issues/2399>>
>     >
>     >     On Fri, Apr 7, 2017 at 2:14 PM, Sean Myers <sean.my...@redhat.com 
> <mailto:sean.my...@redhat.com>
>     <mailto:sean.my...@redhat.com <mailto:sean.my...@redhat.com>>> wrote:
>     >
>     >         On 04/07/2017 12:08 PM, Brian Bouterse wrote:
>     >         > == questions ==
>     >         > * Where should ^ terms be documented?
>     >
>     >         I'm not really sure, but recommend the wiki as a good starting 
> point for
>     >         putting information that we should probably "officially" 
> document *somewhere*,
>     >         but at the moment we aren't quite sure where.
>     >
>     >         https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes
>     <https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes>
>     >         <https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes
>     <https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes>>
>     >
>     >         > * Take the case of a sync which has overrides provided? This 
> isn't in the
>     >         > MVP, but in the future it could be. In that case, does the 
> serializer
>     >         > associated with the importer validate the database data with 
> the overrides
>     >         > "added" on top of it?
>     >
>     >         My question here is "validate against what?". It makes sense to 
> validate
>     >         against database data, but as long as the overrides aren't 
> themselves stored
>     >         in the database, what does this really stop?
>     >
>     >         For example, what prevents two simultaneous syncs of repos 
> using overrides
>     >         that would trigger a constraint violation if they were saved, 
> but don't do
>     >         this because we don't save the overrides?
>     >
>     >         > * For plugin writers writing a serializer for a subclassed 
> Importer, do
>     >         > they also need to express validations for fields defined on 
> the base
>     >         > Importer?
>     >
>     >         It depends on the validation. If it's just validating an 
> individual field,
>     >         no. If it's validation of a combination of values in multiple 
> fields, and
>     >         one of those fields in this case was defined in a subclass, the 
> subclass
>     >         will need to add the proper validation support.
>     >
>     >         > * The database still rejects data that doesn't adhere to the 
> data layer
>     >         > definition right? That occurs even without the DRF serializer 
> correct?
>     >
>     >         Again, this depends. For example, attempting to store an int in 
> a charfield
>     >         will work, because Django will coerce that int to string on 
> save. Attempting
>     >         to store a string in an IntegerField will fail, though, because 
> Django is
>     >         not able to coerce str to int prior to saving. Generally, 
> though, your
>     >         understanding is correct. Anything that the database can't 
> handle will be
>     >         rejected.
>     >
>     >         > * In cases where data is created in the backend, do we need 
> to validate
>     >         > that as a general practice? If we do, do we call the DRF 
> serializer
>     >         > regularly in the backend or just let the database reject 
> "bad" data at the
>     >         > db level?
>     >
>     >         As a general practice, I don't think so. Specifically, though, 
> when we're
>     >         passing data around, like when a bit of platform code is taking 
> incoming
>     >         plugin data and passing it into some standard workflow that 
> platform provides
>     >         (like running sync on an importer, say) I think it's going to 
> be a good idea
>     >         and an all-around gentlemenly thing to do to validate that data 
> in some way
>     >         that appropriate to the process/workflow being invoked.
>     >
>     >         I'm concerned about finding the balance between making things 
> user-friendly
>     >         for plugin writers and having our checking code that provides 
> that user-
>     >         friendly-ness itself be difficult to maintain and end up being 
> pulp-developer-
>     >         unfriendly.
>     >
>     >
>     >         _______________________________________________
>     >         Pulp-dev mailing list
>     >         Pulp-dev@redhat.com <mailto:Pulp-dev@redhat.com> 
> <mailto:Pulp-dev@redhat.com
>     <mailto:Pulp-dev@redhat.com>>
>     >         https://www.redhat.com/mailman/listinfo/pulp-dev
>     <https://www.redhat.com/mailman/listinfo/pulp-dev> 
> <https://www.redhat.com/mailman/listinfo/pulp-dev
>     <https://www.redhat.com/mailman/listinfo/pulp-dev>>
>     >
>     >
>     >
>     >     _______________________________________________
>     >     Pulp-dev mailing list
>     >     Pulp-dev@redhat.com <mailto:Pulp-dev@redhat.com> 
> <mailto:Pulp-dev@redhat.com
>     <mailto:Pulp-dev@redhat.com>>
>     >     https://www.redhat.com/mailman/listinfo/pulp-dev 
> <https://www.redhat.com/mailman/listinfo/pulp-dev>
>     <https://www.redhat.com/mailman/listinfo/pulp-dev 
> <https://www.redhat.com/mailman/listinfo/pulp-dev>>
>     >
>     >
>     >
>     >
>     > --
>     >
>     > Michael Hrivnak
>     >
>     > Principal Software Engineer, RHCE
>     >
>     > Red Hat
>     >
>     >
>     >
>     > _______________________________________________
>     > Pulp-dev mailing list
>     > Pulp-dev@redhat.com <mailto:Pulp-dev@redhat.com>
>     > https://www.redhat.com/mailman/listinfo/pulp-dev 
> <https://www.redhat.com/mailman/listinfo/pulp-dev>
>     >
> 
> 
>     _______________________________________________
>     Pulp-dev mailing list
>     Pulp-dev@redhat.com <mailto:Pulp-dev@redhat.com>
>     https://www.redhat.com/mailman/listinfo/pulp-dev 
> <https://www.redhat.com/mailman/listinfo/pulp-dev>
> 
>

signature.asc
Description: OpenPGP digital signature

_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev

Re: [Pulp-dev] 3.0 Validation update

Reply via email to