Re: [Pulp-dev] 3.0 Validation update

Jeff Ortel Mon, 17 Apr 2017 12:09:38 -0700


On 04/17/2017 09:50 AM, Michael Hrivnak wrote:
> That MVP scoping sounds reasonable.
> 
> I'm not sure we decided 100% on whether those operations need to be tasks, 
> but I think everyone seems to be
> leaning in the direction that update does not. But it partly will come down 
> to a plugin API question. If
> updates can happen any time, then the plugin should load its config exactly 
> once and not re-read it during the
> task. The plugin API should encourage that workflow, if not enforce it.


The Importer instance is fetched from the DB and its attributes are the 
/config/.  So, would only be fetched
once from the DB for any task.

> 
> Delete could be a little more challenging, because of the potential desire 
> for an importer or publisher to
> want to update attributes on it, like the last time a sync succeeded. Maybe 
> we can manage to avoid the need
> for a plugin to write to its config, which also seems like a very reasonable 
> goal, but that again could be
> part of the plugin API discussion. Although thinking about relationships that 
> are likely to exist in the
> database, having an importer or publisher disappear while a corresponding 
> task is running sounds like it could
> cause a lot of potential problems.

What kind of problems?

> 
> On Fri, Apr 14, 2017 at 11:23 AM, Brian Bouterse <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     Did we ever decide on if delete and modify operations on an importer 
> would be synchronous API operations
>     or tasking system tasks?
> 
>     From what I'm reading, there isn't a MVP need (and maybe ever) to verify 
> the feed is set at the sync API
>     call and in the task. We definitely need to do that business logic, i.e. 
> verify the feed != None, in the
>     task for the MVP. If we don't need to do it in two places now, then I 
> think doing it procedurally in the
>     sync task here [0] would be good enough. That means we would not add it 
> to the model as a method. I think
>     this business logic in the sync task would be:
> 
>     ```
>     if importer.feed is None:
>         raise ValueError("The 'feed' attribute on the importer cannot be none 
> when syncing")  # <--- could
>     should probably be a descendent of PulpException since it's a 
> predictable/common error
>     ```
> 
>     [0]: 
> https://github.com/pulp/pulp/pull/2991/files#diff-b9ddc3416cf3909687419960d59316bbR30
>     
> <https://github.com/pulp/pulp/pull/2991/files#diff-b9ddc3416cf3909687419960d59316bbR30>
> 
>     -Brian
> 
> 
> 
>     On Wed, Apr 12, 2017 at 10:11 AM, Jeff Ortel <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> 
> 
>         On 04/11/2017 04:08 PM, Michael Hrivnak wrote:
>         > I like thinking about this as business logic. Data may be valid, 
> but it may not be usable in a particular context.
>         >
>         > To help figure out where such logic should live, it may help to 
> think about where the check is most important.
>         > I've described that time as "at the time of use" earlier in this 
> discussion (maybe just on IRC). With sync as
>         > an example, a workflow will load an importer's config from the 
> database, check it for problems, and then
>         > immediately use the values it just inspected. This is the critical 
> moment where it must gracefully handle
>         > unusable data. This check ensures correct behavior and avoids an 
> unhandled exception or crash.
> 
>         +1
> 
>         >
>         > We can and should also check for problems at earlier opportunities, 
> such as at the time a user tries to queue
>         > a sync. This improves the user experience, but it is not required 
> for correct and safe operation.
> 
>         The data can change between the time when the task is queued and when 
> it is executed.  This adds
>         complexity
>         and the value added is not deterministic.  -1 for a check here.
> 
>         >
>         > Given that, I think it makes sense to put the check close to the 
> data. A method on the model seems reasonable.
>         > In terms of polluting the model with business logic, it isn't that 
> different from defining a custom query set
>         > on the model, which django encourages.
> 
>         Agreed, but only for data integrity. Each field should be validated.  
> And, the record (cross field)
>         should be
>         validated.  We don't want to store things in the DB that are invalid. 
>  Seems like the django ORM already
>         provides for this, right?  But, the model should not have methods to 
> validate itself for a particular
>         operation (usage).  For example, ensure_syncable() that validates the 
> feed URL would not be
>         appropriate.  This
>         should be checked at run-time by the object doing the sync.  Perhaps 
> a double dispatch model in
>         plugin.Importer could do:
> 
>         def synchronize():
>             # do validation and preparation.
>             return self._synchronize()
> 
>         def _synchronize():
>             # overridden by plugins.
>             raise NotImplementedError()
> 
> 
>         >
>         > As a slight tangent, some applications take this sort of checking 
> even further. An admirable approach in REST
>         > API design, which may not be a good idea for us at this time but is 
> interesting to note, is to make a behavior
>         > such as "sync" only available via a link accessed via a known name 
> in an object's representation. That's a
>         > mouthful, so here's an example:
>         >
>         > {
>         >   "id": "foo",
>         >   "feed": "http://cdn.redhat.com/stuff/";,
>         >   "_links":
>         >   {
>         >     "self": "http://here.com/importers/foo";,
>         >     "sync": "http://here.com/importers/foo/sync 
> <http://here.com/importers/foo/sync>"
>         >   }
>         > }
>         >
>         > Consider that the link for starting a sync is not part of the 
> published API, except that it must be obtained
>         > from this representation. There are two advantages here.
>         >
>         > The main advantage I'm pointing out is that when the server creates 
> this representation of an Importer, it
>         > would only include the "sync" link if the current state of the 
> object would allow for a sync. If there were no
>         > feed, there would be no sync link, and thus the client would be 
> unable to even try starting one. So this is a
>         > third opportunity to check whether the object's state is suitable 
> for a sync. It even allows the client to
>         > show or hide a "sync" button without having to re-implement the 
> business logic that's already present on the
>         > server side. Neat, huh?
> 
>         Interesting.
> 
>         However, the duration of how long the 'sync' link is valid is not 
> deterministic.  1 ms after the link is
>         included in the returned document, a DB commit could set it to NULL.  
> The link could be invalid by the
>         time
>         the user even receives the REST reply.
> 
>         >
>         > Another advantage to this kind of approach is a smaller API surface 
> area. We could theoretically change the
>         > sync URL schema at any time. We could even move it to a new, 
> separate service. We'd still need to document how
>         > to use it, but it's actual location can change. In practice I don't 
> think this aspect is all that valuable
>         > unless you are 100% bought in to this design. But it's fun to think 
> about.
>         >
>         > And to re-state, this kind of thing may not be worth our time to 
> actually do right now, and I'm not proposing
>         > it. I don't know to what extent DRF would make this easy. But I 
> wanted to bring it up for interest's sake as
>         > yet another place in the workflow, even closer to the end user than 
> the other two we've discussed, where
>         > applications have an opportunity to utilize context checking of 
> data.
>         >
>         > On Tue, Apr 11, 2017 at 3:49 PM, Austin Macdonald 
> <[email protected] <mailto:[email protected]> <mailto:[email protected]
>         <mailto:[email protected]>>> wrote:
>         >
>         >     Where should business logic live? As an example, I want to 
> consider the sync task [0] and the need to
>         >     ensure that an importer is syncable. For now, let's say that an 
> importer is syncable if it has a feed_url.
>         >
>         >     Since the call to sync an importer comes from the API, the 
> earliest we can check that the configuration is
>         >     syncable is in a not-yet-written SyncView. Checking syncability 
> here is ideal because it allows us to
>         >     immediately return a meaningful http response instead of 
> happily returning information about a Task that
>         >     we already know is doomed to fail.
>         >
>         >     Performing the check in the API layer is not enough. We have 
> discussed edge cases that lead to an
>         >     importer's feed_url being removed while the sync task is in the 
> WAITING state. To make an assertion to the
>         >     plugin that the feed_url exists, we have to check syncability 
> again when the Task moves into an active state.
>         >
>         >     My thinking is that we should put this business logic on the 
> model.
>         >
>         >     Admittedly, it is not a clean fit with the separation of 
> concerns philosophy but we have already violated
>         >     the concept by putting the sync method on the model. If sync is 
> on the model, it seems like
>         >     ensure_syncable should be too.
>         >
>         >     If we write the platform API layer to use this kind of business 
> logic, then the plugins can add double
>         >     checking business logic without modifying the API and Task 
> layers.
>         >
>         >
>         >     [0]: https://pulp.plan.io/issues/2399 
> <https://pulp.plan.io/issues/2399>
>         <https://pulp.plan.io/issues/2399 <https://pulp.plan.io/issues/2399>>
>         >
>         >     On Fri, Apr 7, 2017 at 2:14 PM, Sean Myers 
> <[email protected] <mailto:[email protected]>
>         <mailto:[email protected] <mailto:[email protected]>>> wrote:
>         >
>         >         On 04/07/2017 12:08 PM, Brian Bouterse wrote:
>         >         > == questions ==
>         >         > * Where should ^ terms be documented?
>         >
>         >         I'm not really sure, but recommend the wiki as a good 
> starting point for
>         >         putting information that we should probably "officially" 
> document *somewhere*,
>         >         but at the moment we aren't quite sure where.
>         >
>         >         
> https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes
>         <https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes>
>         >         
> <https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes
>         <https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes>>
>         >
>         >         > * Take the case of a sync which has overrides provided? 
> This isn't in the
>         >         > MVP, but in the future it could be. In that case, does 
> the serializer
>         >         > associated with the importer validate the database data 
> with the overrides
>         >         > "added" on top of it?
>         >
>         >         My question here is "validate against what?". It makes 
> sense to validate
>         >         against database data, but as long as the overrides aren't 
> themselves stored
>         >         in the database, what does this really stop?
>         >
>         >         For example, what prevents two simultaneous syncs of repos 
> using overrides
>         >         that would trigger a constraint violation if they were 
> saved, but don't do
>         >         this because we don't save the overrides?
>         >
>         >         > * For plugin writers writing a serializer for a 
> subclassed Importer, do
>         >         > they also need to express validations for fields defined 
> on the base
>         >         > Importer?
>         >
>         >         It depends on the validation. If it's just validating an 
> individual field,
>         >         no. If it's validation of a combination of values in 
> multiple fields, and
>         >         one of those fields in this case was defined in a subclass, 
> the subclass
>         >         will need to add the proper validation support.
>         >
>         >         > * The database still rejects data that doesn't adhere to 
> the data layer
>         >         > definition right? That occurs even without the DRF 
> serializer correct?
>         >
>         >         Again, this depends. For example, attempting to store an 
> int in a charfield
>         >         will work, because Django will coerce that int to string on 
> save. Attempting
>         >         to store a string in an IntegerField will fail, though, 
> because Django is
>         >         not able to coerce str to int prior to saving. Generally, 
> though, your
>         >         understanding is correct. Anything that the database can't 
> handle will be
>         >         rejected.
>         >
>         >         > * In cases where data is created in the backend, do we 
> need to validate
>         >         > that as a general practice? If we do, do we call the DRF 
> serializer
>         >         > regularly in the backend or just let the database reject 
> "bad" data at the
>         >         > db level?
>         >
>         >         As a general practice, I don't think so. Specifically, 
> though, when we're
>         >         passing data around, like when a bit of platform code is 
> taking incoming
>         >         plugin data and passing it into some standard workflow that 
> platform provides
>         >         (like running sync on an importer, say) I think it's going 
> to be a good idea
>         >         and an all-around gentlemenly thing to do to validate that 
> data in some way
>         >         that appropriate to the process/workflow being invoked.
>         >
>         >         I'm concerned about finding the balance between making 
> things user-friendly
>         >         for plugin writers and having our checking code that 
> provides that user-
>         >         friendly-ness itself be difficult to maintain and end up 
> being pulp-developer-
>         >         unfriendly.
>         >
>         >
>         >         _______________________________________________
>         >         Pulp-dev mailing list
>         >         [email protected] <mailto:[email protected]> 
> <mailto:[email protected]
>         <mailto:[email protected]>>
>         >         https://www.redhat.com/mailman/listinfo/pulp-dev
>         <https://www.redhat.com/mailman/listinfo/pulp-dev> 
> <https://www.redhat.com/mailman/listinfo/pulp-dev
>         <https://www.redhat.com/mailman/listinfo/pulp-dev>>
>         >
>         >
>         >
>         >     _______________________________________________
>         >     Pulp-dev mailing list
>         >     [email protected] <mailto:[email protected]> 
> <mailto:[email protected]
>         <mailto:[email protected]>>
>         >     https://www.redhat.com/mailman/listinfo/pulp-dev
>         <https://www.redhat.com/mailman/listinfo/pulp-dev> 
> <https://www.redhat.com/mailman/listinfo/pulp-dev
>         <https://www.redhat.com/mailman/listinfo/pulp-dev>>
>         >
>         >
>         >
>         >
>         > --
>         >
>         > Michael Hrivnak
>         >
>         > Principal Software Engineer, RHCE
>         >
>         > Red Hat
>         >
>         >
>         >
>         > _______________________________________________
>         > Pulp-dev mailing list
>         > [email protected] <mailto:[email protected]>
>         > https://www.redhat.com/mailman/listinfo/pulp-dev 
> <https://www.redhat.com/mailman/listinfo/pulp-dev>
>         >
> 
> 
>         _______________________________________________
>         Pulp-dev mailing list
>         [email protected] <mailto:[email protected]>
>         https://www.redhat.com/mailman/listinfo/pulp-dev 
> <https://www.redhat.com/mailman/listinfo/pulp-dev>
> 
> 
> 
>     _______________________________________________
>     Pulp-dev mailing list
>     [email protected] <mailto:[email protected]>
>     https://www.redhat.com/mailman/listinfo/pulp-dev 
> <https://www.redhat.com/mailman/listinfo/pulp-dev>
> 
> 
> 
> 
> -- 
> 
> Michael Hrivnak
> 
> Principal Software Engineer, RHCE 
> 
> Red Hat
>

signature.asc
Description: OpenPGP digital signature

_______________________________________________
Pulp-dev mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-dev

Re: [Pulp-dev] 3.0 Validation update

Reply via email to