Re: GSoC Meta refactor: Bikeshedding time!!

Marc Tamlyn Wed, 20 Aug 2014 01:47:19 -0700

I'd say ArrayField is a straight up data field at the moment. It stores 0-1
lists of data. It's no different to CommaSeparatedIntegerField (seriously,
why does that exist...)


*If* PG gets the relevant update that will allow `integer[] references`
(i.e. ArrayField(ForeignKey)) then this would be different, and would be
more like a m2m field.

There is an argument that it's 0-N anyway, but in the implementation both
within Django and in the database I don't think the distinction is useful
at this point, from an ORM point of view in any case. From a forms point of
view it's quite different.


On 20 August 2014 09:19, Russell Keith-Magee <russ...@keith-magee.com>
wrote:

>
> On Mon, Aug 18, 2014 at 6:03 PM, Anssi Kääriäinen
> <anssi.kaariai...@thl.fi> wrote:
>
>> On Monday, August 18, 2014 7:45:17 AM UTC+3, Russell Keith-Magee wrote:
>>>
>>> I understand what you're driving at here, and I've had similar thoughts
>>> over the course of the SoC. The catch is that this makes the API for
>>> get_fields() fairly complicated.
>>>
>>> If every field fits into one specific type, then get_fields() just
>>> requires a single boolean flag (do I include fields of type X) for each
>>> field type. We can also easily add new field types by adding new booleans
>>> to the API.
>>>
>>> However, if a field fits into multiple categories, then it's impossible
>>> (or, at least, exceedingly complicated) to make a single call to
>>> get_fields() that will specify all your field requirements. "Get me all
>>> non-virtual data fields" requires "virtual=False, data=True, m2m=False",
>>> but "Get all virtual data fields that represent m2ms" requires
>>> "virtual=True, data=False, m2m=True". You can't pass in both sets of
>>> arguments at the same time, so you either have to make multiple calls to
>>> get_fields(), or you have to invent some sort of query syntax for
>>> get_fields() that allows union queries.
>>>
>>> Plus, at the end of the day, get_fields() is abstracted behind highly
>>> cached and optimised properties for key lookups. These properties are
>>> effectively a cached call to get_fields() with a specific set of arguments
>>> - so even if get_fields() doesn't expose a "one category per field"
>>> requirement, the API will require, at some level, names that have clear
>>> (and preferably non-overlapping) membership.
>>>
>>
>> If fields are in multiple categories then users will want to do the full
>> range of set operations on the categories. Encoding that into the API
>> doesn't sound promising.
>>
>>
>>>> I don't think users actually want to get fields based on the suggested
>>>> categorization. I feel we get an easier to use and more flexible API if we
>>>> have higher level categories and allow fields to match multiple categories.
>>>> As a practical example if I want all relation fields, that is going to be
>>>> hard using the suggested API. Getting all relation fields is a more
>>>> realistic use case than getting related virtual objects.
>>>>
>>>
>>> Quite probably true. As a point of interest, the current (as in, 1.6)
>>> API actually doesn't differentiate between category (a) "pure data" and
>>> category (b) "relating data (i.e., FK)" fields - if you ask for "data
>>> fields" you get pure data *and* foreign keys. So, at least as far as
>>> Django's own usage is concerned, you're correct in saying that the taxonomy
>>> I've described isn't fully required.
>>>
>>> Daniel's survey of internal usage reveals that there are three use cases
>>> for getting a list of fields in Django's internal API:
>>>
>>>  * Get all data and m2m fields (i.e., categories  a, b, and d). This is
>>> effectively "all fields on *this* model"
>>>
>>>  * Get all data, m2m, related objects, related m2m, and virtual fields
>>> (i.e., categories a, b, d, f, g, h, i - excluding c and e because Django
>>> doesn't currently have any fields of this type). This is "all fields on
>>> this model, or related to this model"
>>>
>>>  * Get all m2m fields (i.e., category d)
>>>
>>> So - at the very least, we need names to describe those three groups. My
>>> intention with describing a richer taxonomy is to try and give names to
>>> other groupings of interest.
>>>
>>>> If we want all fields to match one and only one category,
>>>> then we need to redefine the categories to make sure ForeignKeys as virtual
>>>> fields are possible, and that more esoteric custom join based fields fit
>>>> into the categorization.
>>>>
>>>
>>> Agreed - that's why I threw this out there for discussion :-)
>>>
>>> Properties like "data", "virtual", "external", "related", "relating" -
>>> these are high level concepts describing the way a field manifests.
>>> However, that doesn't mean we need to expose these properties as part of
>>> the formal API.
>>>
>>> Part of the underlying problem here -- let's say we roll out Django 1.7
>>> with some version of this API, and in 1.8, foreign key fields change to
>>> become virtual. That effectively becomes backwards incompatible for queries
>>> that are sensitive to a "virtual" flag; but it doesn't change the
>>> underlying need to identify that a field is a foreign key. We need to
>>> capture the latter use case, but not necessarily the former.
>>>
>>
>> Could we go with a minimal API for get_fields()? Instead of having
>> categorization on the get_fields() API, we could provide field flags for
>> the categories. With field flags it is straightforward to filter the return
>> list of get_fields(). As an example, fetching those fields which are
>> relations but which aren't virtual: [f for f in get_fields() if
>> f.relational and not f.virtual]. If this path is taken, then I am not sure
>> how minimal the get_fields() API should be. We likely need flags at least
>> for whether the field is defined on the local, a parent, or some remote model.
>>
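
A minimal sketch of the flag-based filtering suggested above, in plain
Python rather than Django. The Field class and its relational, virtual and
local attributes are hypothetical stand-ins for illustration, not an
existing API, and the example fields are made up:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a model field; not Django's Field class.
    name: str
    relational: bool = False  # points at another model
    virtual: bool = False     # has no column of its own
    local: bool = True        # defined on this model rather than a parent


def get_fields():
    # Minimal API: return every field and let the caller filter by flag.
    return [
        Field("title"),
        Field("author", relational=True),
        Field("content_object", relational=True, virtual=True),
        Field("created", local=False),
    ]


# Relations that are not virtual, as in the comprehension quoted above.
concrete_relations = [f for f in get_fields() if f.relational and not f.virtual]
names = [f.name for f in concrete_relations]

# The same pattern covers the "where is it defined" flag.
local_names = [f.name for f in get_fields() if f.local]
```

With flags on the field, any combination (union, intersection, negation)
is an ordinary boolean expression in the caller, so get_fields() itself
does not need a query syntax.
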
>> As for changing ForeignKey to virtual field plus concrete field
>> representation - I just realized this will be backwards incompatible no
>> matter what we do regarding categorization. An all-fields including
>> get_fields() call will return separate author (virtual) and author_id
>> (concrete) fields after the split. I am not sure what we can do about this.
>> It would be very unfortunate if we can't refactor the way ForeignKeys work
>> due to the meta API. Any ideas how we can avoid the backwards compatibility
>> trap?
>>
>
> I think Daniel and I might have come up with a way to meet both these
> requirements - a minimalist API for get_fields, with at least some
> protection against the known incoming backwards compatibility issue.
>
> The summary so far: it appears that a complex taxonomy isn't especially
> helpful - firstly, because any complex taxonomy is going to have edge cases
> that are hard to categorize, but also because a complex taxonomy leads to a
> much more complex internal API that is going to be prone to backwards
> compatibility problems.
>
> So - instead of worrying about 'virtual' and other properties like that,
> let's look at why the _meta API is fundamentally used - to get a list of
> fields that need to be handled in data processing. This primarily means
> forms, but other forms of serialisation are also included. In these use
> cases, there are always going to be per-field differences (even a CharField
> and an IntegerField require *slightly* different handling), so we won't
> focus on internal representations, storage mechanisms, or anything like
> that. Instead, let's focus on cardinality - a field represents some sort of
> data that has a cardinality with the object on which it is stored. If
> something has cardinality 1, you can display a single field. If it's
> cardinality N, you need to display a list, or some sort of inline.
>
> This results in 3 categories that are mutually exclusive:
>
> a) "Data fields": Fields of cardinality 0-1:
>
>  * A CharField stores 0 or 1 strings (0 is the case of a nullable field).
>
>  * An IntegerField stores 0 or 1 integers.
>
>  * A FileField stores 0 or 1 file paths.
>
>  * An ImageField stores 0 or 1 file paths - although in being modified, it
> might modify some other fields.
>
>  * A ForeignKey stores 0 or 1 references to another object.
>
>  * A GenericForeignKey stores 0 or 1 references to another object.
>
>  * A notional "DocumentField" on a NoSQL store references 0 or 1 external
> documents.
>
> b) "ManyToMany Fields": Fields that are locally defined that represent a
> cardinality 0-N relationship with another object:
>
>  * Many to Many fields store 0-N references to a second model.
>
> c) "Related Objects": Fields that represent a cardinality 0-N relationship
> with this object, but aren't locally defined:
>
>  * The 'related' side of a ForeignKey
>
>  * The 'related' side of a ManyToMany
>
>  * A GenericRelation representing the reverse side of a GenericForeignKey
>
> These three types are mutually exclusive - you either have cardinality 1
> *or* cardinality N, not both; and you're either locally defined on this
> object or you're not. I can't think of an example of "cardinality 1 data
> that isn't defined on this object", but it would fit into this taxonomy if
> it were needed; I also can't think of a field definition that would span
> models.
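
As a rough illustration of why the three categories cannot overlap, the
classification can be written as a function of two booleans. The names
Category and classify are hypothetical, and the fourth combination
(cardinality 1, not locally defined) is left unnamed here because the
taxonomy above does not name it either:

```python
from enum import Enum


class Category(Enum):
    DATA = "data"
    MANY_TO_MANY = "many_to_many"
    RELATED = "related"


def classify(many, local):
    """Map (cardinality, where the field is defined) to one category.

    many  -- True for cardinality 0-N, False for cardinality 0-1
    local -- True if the field is defined on this model
    """
    if not many:
        if local:
            return Category.DATA
        # Assumption: no category is named for this case in the proposal.
        raise ValueError("cardinality 0-1 but not locally defined")
    return Category.MANY_TO_MANY if local else Category.RELATED


examples = {
    "CharField": classify(many=False, local=True),
    "ForeignKey": classify(many=False, local=True),
    "ManyToManyField": classify(many=True, local=True),
    "reverse side of a ForeignKey": classify(many=True, local=False),
}
```

Each (many, local) pair maps to at most one category, which is the
mutual exclusivity described above.
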
>
> In addition to this basic classification, a field can be marked as
> "hidden". The immediate use for this is to hide the related_name='+' case
> of a FK or M2M. Looking forward, it would be used to mask fields that
> exist, but aren't intended to be user visible - for example, in the
> potential future case where a ForeignKey is split in two, or a Composite
> Key, there would be a "hidden" integer field (or fields) storing the actual
> data, and a virtual (but non-hidden) field that is the public API for
> manipulating the relationship. This would also be backwards compatible,
> because the "visible" field list hasn't changed.
>
> Fields are also tracked according to their parentage; this is used by
> tools interacting with inheritance relationships to know which fields are
> actually on this model, and which are inherited from a base class.
>
> This yields the following formal API for _meta:
>
>  * get_fields(data, many_to_many, related, include_hidden, include_parents)
>
>  * @property data_fields (=> get_fields(data=True, many_to_many=False,
> related=False, include_hidden=False, include_parents=True))
>
>  * @property many_to_many_fields (=> get_fields(data=False,
> many_to_many=True, related=False, include_hidden=False,
> include_parents=True))
>
>  * @property related_objects (=> get_fields(data=False,
> many_to_many=False, related=True, include_hidden=False,
> include_parents=True))
>
> Does this sound any more sane as an API?
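
To make the proposed signatures concrete, here is a toy version in plain
Python. Only the argument and property names come from the proposal
above; the Options and Field classes, the "kind" attribute, the default
argument values and the sample fields are assumptions for illustration,
and no caching is attempted:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in; kind is "data", "many_to_many" or "related".
    name: str
    kind: str
    hidden: bool = False
    inherited: bool = False  # True if it comes from a parent model


class Options:
    """Toy _meta object; defaults below are assumed, not from the proposal."""

    def __init__(self, fields):
        self._fields = list(fields)

    def get_fields(self, data=True, many_to_many=True, related=True,
                   include_hidden=False, include_parents=True):
        flags = (("data", data), ("many_to_many", many_to_many),
                 ("related", related))
        wanted = {kind for kind, flag in flags if flag}
        return [
            f for f in self._fields
            if f.kind in wanted
            and (include_hidden or not f.hidden)
            and (include_parents or not f.inherited)
        ]

    @property
    def data_fields(self):
        return self.get_fields(data=True, many_to_many=False, related=False,
                               include_hidden=False, include_parents=True)

    @property
    def many_to_many_fields(self):
        return self.get_fields(data=False, many_to_many=True, related=False,
                               include_hidden=False, include_parents=True)

    @property
    def related_objects(self):
        return self.get_fields(data=False, many_to_many=False, related=True,
                               include_hidden=False, include_parents=True)


meta = Options([
    Field("title", "data"),
    Field("author", "data"),
    Field("tags", "many_to_many"),
    Field("comment_set", "related"),
    Field("audit_entries", "related", hidden=True),
])
data_names = [f.name for f in meta.data_fields]
m2m_names = [f.name for f in meta.many_to_many_fields]
related_names = [f.name for f in meta.related_objects]
```

Because every field has exactly one kind, the three properties partition
the visible fields, and a union is just a get_fields() call with more
than one flag set to True.
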
>
> My one lingering question is whether the "many_to_many" name/category is
> too explicit. I can conceive how an ArrayField could be considered a data
> field (it stores 0-1 arrays of data), or a "many_to_many" field (because it
> stores 0-N instances of some data). This all hinges on whether the
> definition for that field category is that it is a relationship with
> another *model*, or if it's just cardinality N data. It's trivial to call
> it a Data field and just leave it at that, but I'm wondering if there might
> be benefit in broadening the definition of "many_to_many".
>
> Russ %-)
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CAJxq84_OcibE72RKB9T60BJW9AtY8_YYhmhM5dXH36TtW3KsYw%40mail.gmail.com
> <https://groups.google.com/d/msgid/django-developers/CAJxq84_OcibE72RKB9T60BJW9AtY8_YYhmhM5dXH36TtW3KsYw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

