I agree with James in several ways. Our large Django application does rather extensive validation of data -- but I would argue strongly against embedding that validation in the base instance.save() logic.
(I would not argue against Django including a "ValidatingModel", derived from Model, that automatically runs defined validations as part of save(). Then developers could choose which base they'd like to subclass when designing their objects. Of course, anyone could simply create their own "ValidatingModel" class and derive everything from that class.)

Reason 1 is that business logic validation often requires access to multiple model instances -- for example, uniqueness across a set of objects about to be updated (e.g., "Only one person can be marked as the primary contact"). Or internal consistency: "If this record is of this type, then it cannot have any children of that type". Or even referential integrity in some cases: "The incoming data has a code that serves as the primary key in some other table. Make sure that primary key exists."

Yes, you can encode all of those cross-instance validations into an instance-level check, but that brings us to the second point: Performance. There are a number of types of validations that are best served by operating on sets or lists of instances at a time. Again, consider a referential integrity validation: If I'm about to bulk_create 5000 instances, but need to confirm that the "xyz code" is valid for all of them, then I should run a single query that selects from the "xyz table" all of the codes that are referenced within the 5000 items... instead of doing 5000 individual lookups within that table.

Yes, one can maintain and access caches of known-valid things, but those are awkward to manage from within the Model layer.

It's particularly difficult to write performant validations within the model when you're using .only() or .defer(). Unless the validation logic is able to detect that certain fields haven't been loaded from the database, it will trigger extra queries to retrieve those values solely for the purpose of validating that they are still correct (even though you aren't changing them).
Also on the performance front, there are times when removing the extra layer of validation is necessary and appropriate. With well-tested code, once the incoming data has been validated and the transformation/operational logic is considered fully tested and accurate, avoiding a second validation on the outbound data can result in a significant performance improvement. If you're dealing with millions or billions of records at a time (as we do during data conversions), then those performance improvements are worthwhile.

Finally, Django supports the queryset .update() method. Again, validations that run within the model instance won't even HAVE instances when using .update() -- the queryset manager would need to figure out how to do the necessary validation (and if it's a multi-field validation, good luck!). There are also cases where the use of raw SQL is appropriate, and one obviously cannot lean on instance-level validation in that case.

Validation is indeed important -- but testing the validity of data belongs in the business logic layer, not in the database model layer. Agreed that some types of validations can easily be encoded into the database model, but then you find yourself writing two layers of validation ("one simple, the other more sophisticated")... and that makes things even more complex.

We do indeed use the model-level validations for single-field validations... but we invoke those validations from our business logic at the proper time, not while we're saving the data to the database.

baj
------------
Barry Johnson
Epicor

On Thursday, October 6, 2022 at 2:47:19 AM UTC-5 James Bennett wrote:

> I see a lot of people mentioning that other ORMs do validation, but not picking up on a key difference:
>
> Many ORMs are designed as standalone packages. For example, in Python SQLAlchemy is a standalone DB/ORM package, and other languages have similar popular ORMs.
>
> But Django's ORM isn't standalone. It's tightly integrated into Django, and Django is a web framework. And once you focus *specifically* on the web framework use case, suddenly things start going differently.
>
> For example: data on the web is "stringly-typed" (effectively, since HTTP doesn't really have data types) and comes in via HTML's form mechanism or other string-y formats like JSON or XML payloads. So you need not just data *validation*, but data *conversion* which works for the web use case.
>
> And since the web use case inevitably involves supporting forms/payloads that don't persist to a relational data store -- think of, for example, a contact form that sends an email, or forms that store their results client-side for things like language or theme preferences -- you inevitably end up needing to do data conversion and validation *independently of the ORM*.
>
> And at that point, you have to start asking tough questions about whether it's worth having *two* conversion and validation layers, just because "every other ORM has this, so we have to put one in the ORM".
>
> Which basically is where Django is. Yes, there are utilities to do your data conversion and validation in the ORM layer if you want to. But Django is, first and foremost, a web framework, which needs to support the web use case I've described above, and so its primary conversion/validation layer can never be the ORM.
>
> Personally, I wish model-level validation had never been added even as an option, because in a web framework like Django it's conceptually the wrong place to put the validation logic. Though that battle was lost many years ago, I'd be *strongly* against trying to expand it or start forcing the ORM to default to doing validation work that, in Django, properly belongs to the forms layer (or to serializers if you use DRF).
>
> So: Django ships with ModelForm, which does the hard work of auto-deriving as much validation logic as possible from your model definition so you don't have to repeat it. DRF ships with ModelSerializer, which does the same thing for its validation/conversion layer. I would strongly urge people to use them. Trying to force all that validation back into the model layer misses the bigger picture of what Django is and how it works.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/65af21e8-a420-4e86-98da-c39e59258002n%40googlegroups.com.