On 04/11/2017 04:08 PM, Michael Hrivnak wrote:
> I like thinking about this as business logic. Data may be valid, but it may 
> not be usable in a particular context.
> 
> To help figure out where such logic should live, it may help to think about
> when the check matters most. I've described that moment as "at the time of
> use" earlier in this discussion (maybe just on IRC). With sync as
> an example, a workflow will load an importer's config from the database, 
> check it for problems, and then
> immediately use the values it just inspected. This is the critical moment 
> where it must gracefully handle
> unusable data. This check ensures correct behavior and avoids an unhandled 
> exception or crash.

+1
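
For example, a rough sketch of a time-of-use check (Importer and SyncError
here are hypothetical stand-ins, not actual Pulp API):

def sync_task(importer_id):
    # Load the config and immediately inspect it before use.
    importer = Importer.objects.get(pk=importer_id)
    if not importer.feed_url:
        # Unusable in this context: fail gracefully rather than crash
        # with an unhandled exception mid-sync.
        raise SyncError('importer %s has no feed_url' % importer_id)
    importer.synchronize()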

> 
> We can and should also check for problems at earlier opportunities, such as 
> at the time a user tries to queue
> a sync. This improves the user experience, but it is not required for correct 
> and safe operation.

The data can change between the time the task is queued and the time it is
executed, so checking at queue time adds complexity without a guaranteed
benefit.  -1 for a check here.

> 
> Given that, I think it makes sense to put the check close to the data. A 
> method on the model seems reasonable.
> In terms of polluting the model with business logic, it isn't that different 
> from defining a custom queryset
> on the model, which Django encourages.

Agreed, but only for data integrity. Each field should be validated, and the
record should be validated across fields.  We don't want to store invalid
things in the DB.  Seems like the Django ORM already provides for this,
right?  But the model should not have methods to validate itself for a
particular operation (usage).  For example, an ensure_syncable() that
validates the feed URL would not be appropriate.  That should be checked at
run time by the object doing the sync.  Perhaps a double dispatch pattern in
plugin.Importer could do:

def synchronize(self):
    # Do validation and preparation, then delegate to the plugin.
    return self._synchronize()

def _synchronize(self):
    # Overridden by plugins.
    raise NotImplementedError()
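
A plugin would then override only the private hook, along these lines
(FileImporter and fetch_metadata are hypothetical names, just to sketch it):

class FileImporter(Importer):

    def _synchronize(self):
        # Plugin-specific sync work; synchronize() on the base class has
        # already validated and prepared the configuration.
        fetch_metadata(self.feed_url)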


> 
> As a slight tangent, some applications take this sort of checking even 
> further. An admirable approach in REST
> API design, which may not be a good idea for us at this time but is 
> interesting to note, is to make a behavior
> such as "sync" available only via a link found under a known name in an 
> object's representation. That's a
> mouthful, so here's an example:
> 
> {
>   "id": "foo",
>   "feed": "http://cdn.redhat.com/stuff/",
>   "_links":
>   {
>     "self": "http://here.com/importers/foo",
>     "sync": "http://here.com/importers/foo/sync"
>   }
> }
> 
> Consider that the link for starting a sync is not part of the published API;
> it must be obtained from this representation. There are two advantages here.
> 
> The main advantage I'm pointing out is that when the server creates this 
> representation of an Importer, it
> would only include the "sync" link if the current state of the object would 
> allow for a sync. If there were no
> feed, there would be no sync link, and thus the client would be unable to 
> even try starting one. So this is a
> third opportunity to check whether the object's state is suitable for a sync. 
> It even allows the client to
> show or hide a "sync" button without having to re-implement the business 
> logic that's already present on the
> server side. Neat, huh?

Interesting.

However, how long the 'sync' link remains valid is not deterministic.  One
millisecond after the link is included in the returned document, a DB commit
could set the feed to NULL.  The link could be invalid by the time the user
even receives the REST reply.
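
That said, building the conditional link is straightforward in DRF.  A rough
sketch (the Importer model and the view names are hypothetical):

from rest_framework import serializers
from rest_framework.reverse import reverse

class ImporterSerializer(serializers.ModelSerializer):
    _links = serializers.SerializerMethodField()

    class Meta:
        model = Importer
        fields = ('id', 'feed', '_links')

    def get__links(self, obj):
        request = self.context.get('request')
        links = {'self': reverse('importer-detail', args=[obj.pk],
                                 request=request)}
        # Only advertise 'sync' when the current state allows a sync.
        if obj.feed:
            links['sync'] = reverse('importer-sync', args=[obj.pk],
                                    request=request)
        return links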

> 
> Another advantage to this kind of approach is a smaller API surface area. We 
> could theoretically change the
> sync URL schema at any time. We could even move it to a new, separate 
> service. We'd still need to document how
> to use it, but its actual location can change. In practice I don't think 
> this aspect is all that valuable
> unless you are 100% bought in to this design. But it's fun to think about.
> 
> And to re-state, this kind of thing may not be worth our time to actually do 
> right now, and I'm not proposing
> it. I don't know to what extent DRF would make this easy. But I wanted to 
> bring it up for interest's sake as
> yet another place in the workflow, even closer to the end user than the other 
> two we've discussed, where
> applications have an opportunity to utilize context checking of data.
> 
> On Tue, Apr 11, 2017 at 3:49 PM, Austin Macdonald <amacd...@redhat.com> wrote:
> 
>     Where should business logic live? As an example, I want to consider the 
> sync task [0] and the need to
>     ensure that an importer is syncable. For now, let's say that an importer 
> is syncable if it has a feed_url. 
> 
>     Since the call to sync an importer comes from the API, the earliest we 
> can check that the configuration is
>     syncable is in a not-yet-written SyncView. Checking syncability here is 
> ideal because it allows us to
>     immediately return a meaningful http response instead of happily 
> returning information about a Task that
>     we already know is doomed to fail. 
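
As an aside, that not-yet-written SyncView might look roughly like this (a
sketch only; the Importer lookup and the celery-style task call are
hypothetical):

from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

class SyncView(APIView):

    def post(self, request, pk):
        importer = Importer.objects.get(pk=pk)
        # Fail fast with a meaningful response instead of queuing a task
        # we already know is doomed.
        if not importer.feed_url:
            return Response({'detail': 'importer has no feed_url'},
                            status=status.HTTP_400_BAD_REQUEST)
        task = sync.apply_async(args=[pk])
        return Response({'task_id': task.id},
                        status=status.HTTP_202_ACCEPTED)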
> 
>     Performing the check in the API layer is not enough. We have discussed 
> edge cases that lead to an
>     importer's feed_url being removed while the sync task is in the WAITING 
> state. To make an assertion to the
>     plugin that the feed_url exists, we have to check syncability again when 
> the Task moves into an active state.
> 
>     My thinking is that we should put this business logic on the model.
> 
>     Admittedly, it is not a clean fit with the separation of concerns 
> philosophy, but we have already violated
>     the concept by putting the sync method on the model. If sync is on the 
> model, it seems like
>     ensure_syncable should be too. 
> 
>     If we write the platform API layer to use this kind of business logic, 
> then the plugins can add double
>     checking business logic without modifying the API and Task layers.
> 
> 
>     [0]: https://pulp.plan.io/issues/2399
> 
>     On Fri, Apr 7, 2017 at 2:14 PM, Sean Myers <sean.my...@redhat.com> wrote:
> 
>         On 04/07/2017 12:08 PM, Brian Bouterse wrote:
>         > == questions ==
>         > * Where should ^ terms be documented?
> 
>         I'm not really sure, but I recommend the wiki as a good starting point 
> for
>         putting information that we should probably "officially" document 
> *somewhere*,
>         but at the moment we aren't quite sure where.
> 
>         https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes
> 
>         > * Take the case of a sync which has overrides provided? This isn't 
> in the
>         > MVP, but in the future it could be. In that case, does the 
> serializer
>         > associated with the importer validate the database data with the 
> overrides
>         > "added" on top of it?
> 
>         My question here is "validate against what?". It makes sense to 
> validate
>         against database data, but as long as the overrides aren't themselves 
> stored
>         in the database, what does this really stop?
> 
>         For example, what prevents two simultaneous syncs of repos using 
> overrides
>         that would trigger a constraint violation if they were saved, but 
> don't do
>         this because we don't save the overrides?
> 
>         > * For plugin writers writing a serializer for a subclassed 
> Importer, do
>         > they also need to express validations for fields defined on the base
>         > Importer?
> 
>         It depends on the validation. If it's just validating an individual 
> field,
>         no. If it's validation of a combination of values in multiple fields, 
> and
>         one of those fields in this case was defined in a subclass, the 
> subclass
>         will need to add the proper validation support.
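
A sketch of what that could look like (FileImporterSerializer, FileImporter,
and the field names are hypothetical):

from rest_framework import serializers

class FileImporterSerializer(ImporterSerializer):

    class Meta(ImporterSerializer.Meta):
        model = FileImporter

    def validate(self, data):
        data = super().validate(data)
        # Cross-field check involving a field defined on the subclass.
        if data.get('validate_downloads') and not data.get('feed_url'):
            raise serializers.ValidationError(
                'validate_downloads requires a feed_url')
        return data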
> 
>         > * The database still rejects data that doesn't adhere to the data 
> layer
>         > definition right? That occurs even without the DRF serializer 
> correct?
> 
>         Again, this depends. For example, attempting to store an int in a 
> CharField
>         will work, because Django will coerce that int to string on save. 
> Attempting
>         to store a string in an IntegerField will fail, though, because 
> Django is
>         not able to coerce str to int prior to saving. Generally, though, your
>         understanding is correct. Anything that the database can't handle 
> will be
>         rejected.
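
Concretely (Thing is a hypothetical model with a CharField 'name' and an
IntegerField 'count'):

Thing.objects.create(name=42)       # works: 42 is coerced to the string "42"
Thing.objects.create(count='oops')  # fails: ValueError, str is not coerced to int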
> 
>         > * In cases where data is created in the backend, do we need to 
> validate
>         > that as a general practice? If we do, do we call the DRF serializer
>         > regularly in the backend or just let the database reject "bad" data 
> at the
>         > db level?
> 
>         As a general practice, I don't think so. Specifically, though, when 
> we're
>         passing data around, like when a bit of platform code is taking 
> incoming
>         plugin data and passing it into some standard workflow that platform 
> provides
>         (like running sync on an importer, say), I think it's going to be a 
> good idea
>         and an all-around gentlemanly thing to do to validate that data in 
> some way
>         that is appropriate to the process/workflow being invoked.
> 
>         I'm concerned about finding the balance between making things
>         user-friendly for plugin writers and having the checking code that
>         provides that user-friendliness itself become difficult to maintain
>         and end up pulp-developer-unfriendly.
> 
> 
> 
> -- 
> 
> Michael Hrivnak
> 
> Principal Software Engineer, RHCE 
> 
> Red Hat
