Re: Model-validation: call for discussions

Malcolm Tredinnick Mon, 19 Jan 2009 03:43:27 -0800

As I understand it, this is primarily about avoiding duplicate
validation of some pieces of data when it's not necessary, right? So
it's really only applicable to the ModelForm case?

This is a pretty good summary of the situation, although I suspect
there's a fairly easy solution at hand, which I've outlined below.

On Mon, 2009-01-19 at 02:47 -0800, mrts wrote:

[...]
> 
> The main focus is on fields.
> 
> Form fields
> ===========

This all sounds like a description of the current situation, which is
fine.

> Model fields
> ============
> 
> Ditto for model fields, omitting the first step: format validation,
> coercion to Python and value validation is generally required.

Agreed.

> 
> If, however, we want to more radically depart from current
> behaviour, then it is possible and indeed reasonable to assume that
> the values assigned to model fields are already the expected Python
> types, as it is the application developer’s responsibility to assure
> that. This would get rid of the format validation and coercion step
> (these are already handled by the form layer). to_python() is
> currently used in model fields for controlling the conversion

As a practical matter, that ship has already sailed. Lots of code
assigns things like strings to datetime fields. I've always thought that
was an unfortunate decision, but it's hardly a showstopper. Model
validators will need to be able to convert from reasonable input
variations to the right Python object. It's not unreasonable to expect,
however, that after validation/cleaning has occurred on a model, all the
attributes contain things of the right type (assuming the model is
valid). That means that subsequent code working with, e.g., a datetime
attribute on a valid model doesn't have to worry that
mymodel.foo.strftime() will raise an exception because "foo" happens to
be a string, not a datetime.
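
A minimal sketch of that guarantee, in plain Python rather than Django
code. The names to_datetime, Entry and validate(), the ValidationError
stand-in and the single accepted string format are all assumptions made
for illustration; none of them come from the branch under discussion.

```python
from datetime import datetime


class ValidationError(Exception):
    """Stand-in exception for this sketch (not Django's class)."""


def to_datetime(value):
    """Coerce reasonable input variations to a datetime."""
    # Fast path: the value is already the right type, nothing to do.
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        try:
            # Assumed input format, chosen only for this example.
            return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            raise ValidationError("Enter a valid date/time: %r" % value)
    raise ValidationError("Cannot convert %s to a datetime" % type(value).__name__)


class Entry:
    """Hypothetical model-like object with one datetime attribute."""

    def __init__(self, foo):
        self.foo = foo  # may be a string at this point

    def validate(self):
        # If this returns without raising, self.foo is a datetime.
        self.foo = to_datetime(self.foo)


entry = Entry("2009-01-19 03:43:27")
entry.validate()
print(entry.foo.strftime("%Y"))  # safe: foo is a datetime now
```

Note the isinstance() fast path: when the value is already a datetime,
a second validation pass costs only that one check.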

> Interaction between form and model fields
> =========================================
> 
> Forms and models should be orthogonal and oblivious of each other.
> Modelforms serves as the glue and mapping between the two, being
> aware of the internals of both.
> 
> A modelform should have the following responsibilites in validation
> context:
> 
>  * clean the form (performs format validation, coercion to a Python
>    type and value validation),

>  * assign the form field values to the associated model fields,
> 
>  * inform the model fields that basic validation has already been done
> to
>    avoid duplicated validation and call any additional validation
>    methods.

That's one approach, but the alternative I've sketched below is another
one. Short version: when the form field validation would match what the
model field is going to do anyway, don't do anything at the form level.
The model field validation is about to be called anyway.

> Thus, the modelform should be able to invoke only the additional
> and custom model field validators, *skipping the default coercers
> and validators* that can be assumed to be the same in similar form
> and model field classes (e.g. an IntegerField in forms and
> IntegerField in models).

That's a slightly flawed assumption, as it won't necessarily be true.
Particularly in the case of custom form field overrides on a model form.
There could be any number of variations, from the range of data
permitted to the types.

[Aside: It's certainly reasonable to think about the duplicated effort,
but I wouldn't worry about it too much. It's not a huge amount of
duplicated computation, since the case when something is already of the
right type and passes normalisation and validation tends to be quite
quick (an if-check or two and then moving on). So if we don't have a
fantastically neat solution for this initially in the communication
between ModelForms and Models, the world won't stop spinning. That's
only one use-case in the much larger scheme of things (again, I'm not
dismissing it, but let's not get too hung up on this case). That being
said... ]

The solution here might not be too difficult and doesn't require any
communication from the form to the model about what it's already done.

To wit, when a formfield is constructed automatically from the model
field, if the default formfield doesn't do any validation/normalisation
beyond what the normal model field does, the formfield() method on the
Field subclass returns a formfield with no validators. Thus, when the
form cleaning process is run, all the validation for that field is done
by the model. Thus form fields returned for model fields could well be
slightly modified (in that their validator list is different) from what
you get directly from using forms.fields.*.

That dovetails nicely with the requirements of custom form fields. When
a custom form field is created, the form field validation won't be
skipped (unless the creator wants to remove any validation from the form
field), so any differences between the default model field validation
and the custom form field validation aren't overlooked: they both get
run (which is the correct behaviour).
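
A rough sketch of how that could look, again in plain Python. Every
class and function here (FormField, SmallIntegerFormField,
ModelIntegerField, is_integer, clean_via_modelform) is a hypothetical
stand-in, and the formfield() signature is simplified; this is not the
real Django API, only an illustration of the idea under those
assumptions.

```python
def is_integer(value):
    """Hypothetical validator shared by the model and form fields."""
    try:
        int(value)
    except (TypeError, ValueError):
        raise ValueError("Enter a whole number.")


class FormField:
    """Minimal stand-in for a form field: runs a list of validators."""

    def __init__(self, validators=()):
        self.validators = list(validators)

    def clean(self, value):
        for validator in self.validators:
            validator(value)
        return value


class SmallIntegerFormField(FormField):
    """A custom override that adds a rule the model field lacks."""

    def __init__(self, max_value=10):
        def at_most(value):
            if int(value) > max_value:
                raise ValueError("Ensure this value is at most %d." % max_value)

        super().__init__(validators=[is_integer, at_most])


class ModelIntegerField:
    """Stand-in model field that owns the default validation."""

    validators = [is_integer]

    def formfield(self, form_class=None):
        if form_class is None:
            # The default form field would only repeat the model's
            # checks, so hand back one with an empty validator list.
            return FormField(validators=[])
        # A custom form field keeps whatever validators it defines.
        return form_class()

    def clean(self, value):
        for validator in self.validators:
            validator(value)
        return int(value)


def clean_via_modelform(model_field, form_field, raw):
    """Form cleaning first, then the model validation that runs anyway."""
    return model_field.clean(form_field.clean(raw))


model_field = ModelIntegerField()
default_field = model_field.formfield()
custom_field = model_field.formfield(SmallIntegerFormField)
print(clean_via_modelform(model_field, default_field, "42"))
```

With the default form field, "42" is validated once, by the model field.
With the custom field, the same input fails at the form level because
of the extra maximum-value rule, and a value that passes there is
checked again by the model field.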

> General validation principles
> =============================
> 
> The extended conclusion of the sections above is as follows.
> 
> Double validation should never happen.

It doesn't hurt if it does and sometimes both form and models will need
to run validation on the same piece of data (e.g. the custom form field
case).

> And now something completely different
> ======================================
> 
> "Every problem in computer science can be solved by
> another level of indirection."
>  --- source unknown

"... and now you (often) have two problems" -- Malcolm (précising a few
hundred other people). :-)

> 
> Both form and model fields need similar functionality that is not
> well served by just duplicating validation and coercion functions
> in them.

The two data flows are similar, but not identical. I'd tend to shy away
from class-ifying this if we don't have to. That is, prefer direct
functions over yet another class, simply because there's not really a
class structure there -- a need to poke at class internals -- and it's
not going to be enough functions to require namespacing.

Attempts to abstract away similar-but-not-the-same algorithms can often
lead to more difficult code than writing the same thing out directly in
the two places. It tends to show up in the number of parameters you pass
into the abstract algorithm and/or the complexity of the return types.
And, in this case, it's only two places, not fourteen or anything like
that. So whether this is worth it tends to be implementation specific.

> Action plan for 1.1
> ===================
> 
> The mixin approach is not in scope for 1.1. We go with what we
> already have, factoring the bits in current model fields' to_python
> to a separate django.utils.typeconverters library
> (see 
> http://github.com/mrts/honza-django/commit/a8239b063591acc367add0a01785181a91a37971
> for the first steps in that direction) and using both
> django.core.validators and django.utils.typeconverters throughout
> model and form fields.

As a style issue, there's a bit of an arbitrary split going on there.
Type conversion is really part of what we're calling "validation" (since
the type coercion is part of the normalisation part of the process,
which we tend to lump into validation). I'd lump things a bit closer
together in the source. There's not really an external use-case scenario
for the converters and, even within Django, they don't have a lot of
users outside of the validators (we can't do a lot of sharing with the
database-specific backend converters, since they do tend to be
backend-specific).

Maybe (and looking at my notes, I might upgrade that to probably, given
the number of times I've mentioned it), we need to make a directory for
django.core.validators instead of a single file. That gives us room to
split out converters from things that work with standard types. But
definitely try to keep them close together in the source, rather than
putting them in django.utils, which can sometimes be a dumping ground
for stuff that better belongs elsewhere.

> I'm not sure about Honza's plans in regard avoiding duplicate
> validation, hopefully he comments on that himself.

> Also, let me remind that model and form objects have not been
> discussed in this post (and it's already more than 150 lines long),
> only fields.

I don't think those two situations are particularly tricky, are they?

Validation on forms shouldn't change from what it is now. Partly because
it Just Works pretty logically, but primarily because we should do the
hard work so that users don't have to and don't need to rewrite their
code.

We do need a multi-field validation method on models, similar to
Form.clean(). There's currently a Model.clean() method in Honza's code,
which I've been reviewing, and I've got a note to probably change the
name a bit there so that the clean() name can be reserved for people
wanting to write multi-field validation, keeping things in parallel with
the way Forms work. That's a naming issue, not a substantive one,
however. As you've noted, the algorithm for forms and models isn't that
different.
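
To make the parallel with Form.clean() concrete, here is a small sketch
in plain Python. The driver name validate(), the validate_fields()
step, the Booking class and the ValidationError stand-in are
assumptions for illustration only; as noted above, the final method
names are still open.

```python
class ValidationError(Exception):
    """Stand-in exception for this sketch (not Django's class)."""


class Model:
    """Hypothetical base class showing the order of the two steps."""

    def validate_fields(self):
        """Per-field validation; subclasses fill this in."""

    def clean(self):
        """Hook reserved for multi-field validation; no-op by default."""

    def validate(self):
        # Single-field checks run first, so clean() can rely on each
        # attribute already being individually valid.
        self.validate_fields()
        self.clean()


class Booking(Model):
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def validate_fields(self):
        for name in ("start", "end"):
            if not isinstance(getattr(self, name), int):
                raise ValidationError("%s must be an integer" % name)

    def clean(self):
        # A rule that involves more than one field.
        if self.end <= self.start:
            raise ValidationError("end must be after start")


Booking(1, 5).validate()  # passes both steps

try:
    Booking(5, 1).validate()
except ValidationError as exc:
    print(exc)
```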

Anyway, nice write-up. Be interesting to see what others think.

Regards,
Malcolm
