Re: Model-level validation

'Barry Johnson' via Django developers (Contributions to Django itself) Fri, 07 Oct 2022 07:53:44 -0700

I agree with James in several ways.   Our large Django application does 
rather extensive validation of data -- but I would argue strongly against 
embedding that validation in the base instance.save() logic.

(I would not argue against Django including a "ValidatingModel", derived 
from Model, that automatically runs defined validations as part of save(). 
Then developers could choose which base they'd like to subclass when 
designing their objects.  Of course, anyone could simply create their own 
"ValidatingModel" class and derive everything from that class.)
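
As a rough sketch of that opt-in base class: the code below uses plain-Python 
stand-ins (BaseModel and ValidationError here are not Django's classes) so it 
runs without Django installed.  In a real project the base would be 
django.db.models.Model and the validation hook would be full_clean().  The 
Contact class and its email field are hypothetical.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""


class BaseModel:
    """Stand-in for django.db.models.Model: save() never validates."""

    saved = False

    def full_clean(self):
        # Subclasses override this to raise ValidationError on bad data.
        pass

    def save(self):
        # A real model would write to the database here.
        self.saved = True


class ValidatingModel(BaseModel):
    """Opt-in base class: run validation first, then save."""

    def save(self):
        self.full_clean()
        super().save()


class Contact(ValidatingModel):  # hypothetical example model
    def __init__(self, email):
        self.email = email

    def full_clean(self):
        if "@" not in self.email:
            raise ValidationError("email must contain '@'")
```

Classes that derive from BaseModel directly keep the unvalidated save(), so 
the choice stays with the developer of each model.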

Reason 1 is that business logic validation often requires access to 
multiple model instances -- for example, uniqueness across a set of objects 
about to be updated.  (e.g., "Only one person can be marked as the primary 
contact").  Or internal consistency:   "If this record is of this type, 
then it cannot have any children of that type".  Or even referential 
integrity in some cases:  "The incoming data has a code that serves as the 
primary key in some other table.  Make sure that primary key exists."
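
To illustrate why such a rule needs the whole set rather than one instance, 
here is a minimal sketch of the "only one primary contact" check.  The dict 
keys account_id and is_primary are assumptions made up for the example, not 
anything from our application.

```python
from collections import Counter


def validate_contacts(contacts):
    """Return error messages for a set of contacts about to be saved.

    Each contact is a dict with the hypothetical keys 'account_id' and
    'is_primary'. The rule cannot be checked by looking at one contact alone.
    """
    # Count how many contacts per account are flagged as primary.
    primaries = Counter(c["account_id"] for c in contacts if c["is_primary"])
    return [
        f"account {account_id}: {count} primary contacts, expected at most 1"
        for account_id, count in sorted(primaries.items())
        if count > 1
    ]
```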

Yes, you can encode all of those cross-instance validations into an 
instance-level check, but that brings us to the second point: 
performance.  A number of validations are best served by operating on 
sets or lists of instances at a time.  Again, consider a referential 
integrity validation:  If I'm about to bulk_create 5000 instances, but 
need to confirm that the "xyz code" is valid for all of them, then I 
should run one query against the "xyz table" for all of the codes 
referenced within the 5000 items, instead of doing 5000 individual 
lookups within that table.  Yes, one can maintain and access caches of 
known-valid things, but those are awkward to manage from within the 
Model layer.
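
A minimal sketch of that set-based check, using the standard-library sqlite3 
module so it runs on its own.  The table name xyz and column name code are 
placeholders taken from the wording above, and chunk_size is an assumption to 
stay under the database's bound-parameter limit.  With the Django ORM the 
same idea would presumably be a single filter(code__in=...) query combined 
with values_list("code", flat=True).

```python
import sqlite3


def missing_codes(conn, codes, chunk_size=500):
    """Return the referenced codes that do not exist in the xyz table.

    Issues one query per chunk of distinct codes instead of one per record.
    """
    wanted = sorted(set(codes))  # 5000 records usually share far fewer codes
    found = set()
    for start in range(0, len(wanted), chunk_size):
        chunk = wanted[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT code FROM xyz WHERE code IN ({placeholders})", chunk
        )
        found.update(row[0] for row in rows)
    return set(wanted) - found
```

An empty result means every incoming record references a valid code, and the 
bulk_create can proceed without any per-instance lookups.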

It's particularly difficult to write performant validations within the 
model when you're using .only() or .defer().  Unless the validation logic 
can detect that certain fields haven't been loaded from the database, 
it will trigger extra queries to retrieve those values solely for the 
purpose of validating that they are still correct (even though you 
aren't changing them).

Also on the performance front, there are times when removing the extra 
layer of validation is necessary and appropriate.  With well-tested code, 
once the incoming data has been validated and the 
transformation/operational logic is considered fully tested and accurate, 
then avoiding a second validation on the outbound data can result in a 
significant performance improvement.  If you're dealing with millions or 
billions of records at a time (as we do during data conversions), then 
those significant performance improvements are worthwhile.

Finally, Django supports the queryset .update() method.  Validations 
that run within the model instance won't even HAVE instances when using 
.update() -- the queryset (or manager) would need to figure out how to do 
the necessary validation (and if it's a multi-field validation, good luck!). 
There are also cases where the use of raw SQL is appropriate, and one 
obviously cannot lean on instance-level validation in that case.

Validation is indeed important -- but testing the validity of data belongs 
in the business logic layer, not in the database model layer.  Agreed, 
some types of validations can easily be encoded into the database model, 
but then you find yourself writing two layers of validation ("one simple, 
the other more sophisticated"), and that makes things even more 
complex.  We do indeed use the model-level validations for single-field 
validations, but we invoke those validations from our business logic at 
the proper time, not while we're saving the data to the database.

baj
------------
Barry Johnson
Epicor

On Thursday, October 6, 2022 at 2:47:19 AM UTC-5 James Bennett wrote:

> I see a lot of people mentioning that other ORMs do validation, but not 
> picking up on a key difference:
>
> Many ORMs are designed as standalone packages. For example, in Python 
> SQLAlchemy is a standalone DB/ORM package, and other languages have similar 
> popular ORMs.
>
> But Django's ORM isn't standalone. It's tightly integrated into Django, 
> and Django is a web framework. And once you focus *specifically* on the web 
> framework use case, suddenly things start going differently.
>
> For example: data on the web is "stringly-typed" (effectively, since HTTP 
> doesn't really have data types) and comes in via HTML's form mechanism or 
> other string-y formats like JSON or XML payloads. So you need not just data 
> *validation*, but data *conversion* which works for the web use case.
>
> And since the web use case inevitably involves supporting forms/payloads 
> that don't persist to a relational data store -- think of, for example, a 
> contact form that sends an email, or forms that store their results 
> client-side for things like language or theme preferences -- you inevitably 
> end up needing to do data conversion and validation *independently of the 
> ORM*.
>
> And at that point, you have to start asking tough questions about whether 
> it's worth having *two* conversion and validation layers, just because 
> "every other ORM has this, so we have to put one in the ORM".
>
> Which basically is where Django is. Yes, there are utilities to do your 
> data conversion and validation in the ORM layer if you want to. But Django 
> is, first and foremost, a web framework, which needs to support the web use 
> case I've described above, and so its primary conversion/validation layer 
> can never be the ORM.
>
> Personally, I wish model-level validation had never been added even as an 
> option, because in a web framework like Django it's conceptually the wrong 
> place to put the validation logic. Though that battle was lost many years 
> ago, I'd be *strongly* against trying to expand it or start forcing the ORM 
> to default to doing validation work that, in Django, properly belongs to 
> the forms layer (or to serializers if you use DRF).
>
> So: Django ships with ModelForm, which does the hard work of auto-deriving 
> as much validation logic as possible from your model definition so you 
> don't have to repeat it. DRF ships with ModelSerializer, which does the 
> same thing for its validation/conversion layer. I would strongly urge 
> people to use them. Trying to force all that validation back into the model 
> layer misses the bigger picture of what Django is and how it works.
>

