Re: GSoC Meta refactor: Bikeshedding time!!

Russell Keith-Magee Sun, 17 Aug 2014 21:45:31 -0700

Hi Anssi,

On Sun, Aug 17, 2014 at 3:06 AM, Anssi Kääriäinen <anssi.kaariai...@thl.fi>
wrote:

> On Saturday, August 16, 2014 4:38:30 AM UTC+3, Russell Keith-Magee wrote:
>>
>> b) "Relating" data fields - This means ForeignKey. Fields that manifest
>> as a single column, but represent a relation to another model.
>>
>
> This definition will not work once multicolumn foreign keys are
> introduced, especially not with the name foreign_key_fields. Either
> relating data fields would contain fields that have more than a single
> backing column, or foreign_key_fields would not contain all foreign key
> fields.
>
> Michal Petrucha's work on virtual fields aims to make ForeignKeys virtual
> fields: they have one or more backing pure data fields, and the relation
> itself is handled by a virtual field. His work shows that this approach
> works well. The patch was already close to committable during 1.7
> development, but as it didn't play well with migrations we had to defer
> it. The point here is that I expect we will want to make ForeignKeys
> virtual fields soonish. This doesn't play well with the categorization.
>

Interesting.

>> d) "Relating" external fields - This means ManyToMany fields. Fields that
>> are manifested as an external table, but represent a relation to a
>> different model.
>>
>
> Should we define this category as m2m fields? Calling it
> many_to_many_fields, but defining it as including all external storage
> fields seems a bit problematic.
>

That's exactly what I proposed in my formal naming scheme. I was being
deliberately abstract in the descriptions, but in practice, I agree (d) is
"many_to_many_fields", and the API should say so, unless someone can think
of a good reason why it shouldn't (such as a "relating external field"
that isn't an m2m relation).

>> So - firstly, we need a sanity check. Does this taxonomy capture all field
>> types that you can think of? Are there any interpretations of composite
>> fields, or any other esoteric field type (existing or imagined) that don't
>> fit in this taxonomy?
>>
>
> It seems the proposed API puts each field in one, and only one, category.
> Maybe it would be better to have a categorization where fields can fall
> into multiple categories? The categories could be data, virtual,
> relation, reverse_relation and m2m.
>
> For example, an m2m field would be virtual, related or reverse_related,
> and of course m2m. In the future a foreign key would create a backing
> data field. The foreign key itself would be a virtual relation field. The
> reverse side of the foreign key would be virtual and reverse_related.
> GenericForeignKey would also be a virtual related field (with two backing
> data fields).
>

I understand what you're driving at here, and I've had similar thoughts
over the course of the SoC. The catch is that this makes the API for
get_fields() fairly complicated.

If every field fits into one specific type, then get_fields() just requires
a single boolean flag (do I include fields of type X) for each field type.
We can also easily add new field types by adding new booleans to the API.
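A minimal sketch of the single-category model, under stated assumptions: the `Field` class, the category names and this `get_fields()` signature are hypothetical, chosen for illustration, and are not Django's actual Options API. Each field carries exactly one category, and each category gets exactly one boolean flag.

```python
from dataclasses import dataclass

# Hypothetical single-category taxonomy (illustrative names, not Django's API).
CATEGORIES = ("data", "foreign_key", "many_to_many", "related_objects")


@dataclass(frozen=True)
class Field:
    name: str
    category: str  # exactly one entry from CATEGORIES


def get_fields(fields, **include):
    """Return fields whose single category was requested with a True flag.

    One boolean per category; supporting a new category only means
    accepting one more keyword argument.
    """
    unknown = set(include) - set(CATEGORIES)
    if unknown:
        raise TypeError(f"unknown categories: {sorted(unknown)}")
    return [f for f in fields if include.get(f.category, False)]


FIELDS = [
    Field("title", "data"),
    Field("author", "foreign_key"),
    Field("tags", "many_to_many"),
    Field("comment_set", "related_objects"),
]

# "All fields on this model": data, FK and m2m, in a single call.
local = get_fields(FIELDS, data=True, foreign_key=True, many_to_many=True)
print([f.name for f in local])  # ['title', 'author', 'tags']
```

Because membership never overlaps, any combination of categories is just a set of flags, and one call always suffices.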

However, if a field fits into multiple categories, then it's impossible
(or, at least, exceedingly complicated) to make a single call to
get_fields() that will specify all your field requirements. "Get me all
non-virtual data fields" requires "virtual=False, data=True, m2m=False",
but "Get all virtual fields that represent m2ms" requires
"virtual=True, data=False, m2m=True". You can't pass in both sets of
arguments at the same time, so you either have to make multiple calls to
get_fields(), or you have to invent some sort of query syntax for
get_fields() that allows union queries.
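A minimal sketch of the multi-category alternative, assuming each field carries a set of tags and that `get_fields()` ANDs its flags together (True means the tag must be present, False means it must be absent). The tag names follow the categories suggested in this thread; the `Field` class, the field names and the function itself are hypothetical. It uses the two flag combinations from the paragraph above to show why their union needs two calls.

```python
from dataclasses import dataclass

# Hypothetical multi-category model: each field may carry several tags.


@dataclass(frozen=True)
class Field:
    name: str
    tags: frozenset


def get_fields(fields, **flags):
    """Keep fields matching every flag (flags are ANDed together).

    One call can therefore only describe a single intersection of tags.
    """
    return [
        f for f in fields
        if all((tag in f.tags) == wanted for tag, wanted in flags.items())
    ]


FIELDS = [
    Field("title", frozenset({"data"})),
    Field("author_id", frozenset({"data"})),  # backing column of a virtual FK
    Field("author", frozenset({"virtual", "relation"})),
    Field("tags", frozenset({"virtual", "relation", "m2m"})),
]

# Each query is expressible on its own...
non_virtual_data = get_fields(FIELDS, virtual=False, data=True, m2m=False)
virtual_m2m = get_fields(FIELDS, virtual=True, data=False, m2m=True)

# ...but their union needs two calls (or a richer query syntax).
combined = non_virtual_data + virtual_m2m
print([f.name for f in combined])  # ['title', 'author_id', 'tags']
```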

Plus, at the end of the day, get_fields() is abstracted behind highly
cached and optimised properties for key lookups. These properties are
effectively a cached call to get_fields() with a specific set of arguments
- so even if get_fields() doesn't expose a "one category per field"
requirement, the API will require, at some level, names that have clear
(and preferably non-overlapping) membership.
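A rough sketch of "a cached call to get_fields() with a specific set of arguments", assuming Python 3.8+ for `functools.cached_property`. The `Options` stand-in, the property names `forward_fields` and `m2m_fields`, and the `get_fields()` signature are hypothetical and are not Django's real `_meta` API.

```python
from functools import cached_property


class Options:
    """Hypothetical stand-in for a model's _meta; not Django's real class."""

    def __init__(self, fields):
        # fields: (name, category) pairs, exactly one category per field
        self._fields = list(fields)
        self.get_fields_calls = 0  # counts how often the uncached path runs

    def get_fields(self, **include):
        self.get_fields_calls += 1
        return tuple(name for name, cat in self._fields if include.get(cat, False))

    @cached_property
    def forward_fields(self):
        # A named group is just get_fields() with a fixed argument set, cached.
        return self.get_fields(data=True, foreign_key=True)

    @cached_property
    def m2m_fields(self):
        return self.get_fields(many_to_many=True)


opts = Options([("title", "data"), ("author", "foreign_key"), ("tags", "many_to_many")])
print(opts.forward_fields)    # ('title', 'author')
print(opts.forward_fields)    # second access is served from the cache
print(opts.get_fields_calls)  # 1
```

Since each property's membership is frozen by its arguments, every property name needs a clear definition of what it contains.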

> I don't see how a TranslationField would fit into the above categorization.
> A TranslationField is defined as a field that gets a single translation
> from a related translations table. So, it is the reverse side of a foreign
> key with an additional restriction on language (in effect generating a join
> condition JOIN article_translations ON article.id =
> article_translations.article_id AND article_translations.language = 'fi').
> As defined, this isn't in category g, as it doesn't return all reverse
> objects of category b. It doesn't fit into any other category either. So,
> we need some changes to the wording.
>
> As another example, we might someday want to allow fields with fully
> custom join conditions. These fields wouldn't be foreign key, external
> data, or many-to-many fields, nor the reverse of any of those categories.
>
>> Comments welcome. Obviously, this has enormous potential to devolve into
>> bike shedding, so I'd appreciate it if people kept that in mind. If you
>> have a preference for something like short vs long form names, feel free to
>> state it, but please don't let this devolve into arguments over the
>> relative merits of pith over verbosity in API naming. It's much more
>> important that we clarify the matters of substance - i.e., that we have a
>> complete and correct taxonomy - not that we fixate on the names themselves.
>>
>
> I don't think users actually want to get fields based on the suggested
> categorization. I feel we get an easier-to-use and more flexible API if we
> have higher-level categories and allow fields to match multiple categories.
> As a practical example, if I want all relation fields, that is going to be
> hard with the suggested API. Getting all relation fields is a more
> realistic use case than getting related virtual objects.
>

Quite probably true. As a point of interest, the current (as in, 1.6) API
actually doesn't differentiate between category (a) "pure data" and
category (b) "relating data (i.e., FK)" fields - if you ask for "data
fields" you get pure data *and* foreign keys. So, at least as far as
Django's own usage is concerned, you're correct in saying that the taxonomy
I've described isn't fully required.

Daniel's survey of internal usage reveals that there are three use cases
for getting a list of fields in Django's internal API:

 * Get all data and m2m fields (i.e., categories a, b, and d). This is
effectively "all fields on *this* model".

 * Get all data, m2m, related objects, related m2m, and virtual fields
(i.e., categories a, b, d, f, g, h, i - excluding c and e because Django
doesn't currently have any fields of these types). This is "all fields on
this model, or related to this model".

 * Get all m2m fields (i.e., category d)
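As a sketch, the three groups from the survey can be written down as sets of category letters. The letters come from the taxonomy discussed in this thread (a = pure data, b = FK, d = m2m, g = reverse side of b); the group names `this_model`, `this_and_related` and `m2m_only`, the `Field` class and the helper function are hypothetical labels, not proposed API names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    category: str  # one taxonomy letter, e.g. "a", "b", "d", "g"


# Hypothetical names for the three internal use cases; c and e are left out
# because Django has no fields of those types yet.
GROUPS = {
    "this_model": frozenset("abd"),
    "this_and_related": frozenset("abdfghi"),
    "m2m_only": frozenset("d"),
}


def fields_in_group(fields, group):
    wanted = GROUPS[group]
    return [f.name for f in fields if f.category in wanted]


FIELDS = [
    Field("title", "a"),
    Field("author", "b"),
    Field("tags", "d"),
    Field("comment_set", "g"),  # reverse side of a FK on another model
]

for group in GROUPS:
    print(group, fields_in_group(FIELDS, group))
```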

So - at the very least, we need names to describe those three groups. My
intention with describing a richer taxonomy is to try and give names to
other groupings of interest.

> If we want every field to match one and only one category, then we need
> to redefine the categories to make sure ForeignKeys as virtual fields are
> possible, and that more esoteric fields based on custom joins fit into the
> categorization.
>

Agreed - that's why I threw this out there for discussion :-)

Properties like "data", "virtual", "external", "related", "relating" -
these are high level concepts describing the way a field manifests.
However, that doesn't mean we need to expose these properties as part of
the formal API.

Part of the underlying problem here: let's say we roll out Django 1.7 with
some version of this API, and in 1.8, foreign key fields change to become
virtual. That effectively becomes backwards incompatible for queries that
are sensitive to a "virtual" flag, but it doesn't change the underlying
need to identify that a field is a foreign key. We need to capture the
latter use case, but not necessarily the former.
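A small sketch of that compatibility hazard, assuming a hypothetical refactor in which a ForeignKey becomes a virtual field backed by a plain data column. The `Field` class, the `virtual`/`is_foreign_key` attributes and the BEFORE/AFTER field lists are illustrative assumptions, not real Django structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    virtual: bool
    is_foreign_key: bool


# Hypothetical field lists before and after FKs become virtual fields.
BEFORE = [
    Field("title", virtual=False, is_foreign_key=False),
    Field("author", virtual=False, is_foreign_key=True),
]
AFTER = [
    Field("title", virtual=False, is_foreign_key=False),
    Field("author_id", virtual=False, is_foreign_key=False),  # new backing column
    Field("author", virtual=True, is_foreign_key=True),
]


def non_virtual(fields):
    return [f.name for f in fields if not f.virtual]


def foreign_keys(fields):
    return [f.name for f in fields if f.is_foreign_key]


# A query keyed on the implementation detail changes meaning between releases...
print(non_virtual(BEFORE), non_virtual(AFTER))
# ...while a query keyed on "is this a foreign key?" stays stable.
print(foreign_keys(BEFORE), foreign_keys(AFTER))
```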

> BTW, where are the GitHub discussions located? I didn't spot them in the
> referenced PR 2894.
>

The discussions on GitHub aren't the best record of the discussions that
have taken place. They're mostly tied to earlier versions of the patch, and
an earlier pull request (the number of which I can't seem to find right
now). Unfortunately, most of the productive discussions in this area have
been on IRC or voice chat, so there isn't a good archive.

Russ %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAJxq848v3Dss9BSRobCgy_zfC93bV2_kj1iJ%3DOxv1BsCk24NyA%40mail.gmail.com.