On Wed, Apr 17, 2013 at 03:49:11AM -0700, Anssi Kääriäinen wrote:
> The basic idea is that there is a new ForeignObject class. A
> ForeignObject basically just takes a related model, and from_fields
> and to_fields which are the names of the fields used for the relation.
> Then, the ORM knows how to create JOINs, subqueries etc based on a
> ForeignObject. The ForeignObject itself is a virtual field, it doesn't
> have any concrete fields in the DB.
> 
> There is currently zero public API support for using the
> ForeignObject. The addition of ForeignObject is there to make the work
> on composite fields and multicolumn foreign keys in particular easier.

Thanks for the overview. For the sake of completeness, we had a
discussion on IRC earlier today, the outcome of which is that the
ForeignObject refactor will simplify the task immensely. We also came
upon a few concerns which I'll describe in a moment.

> I took a look at the code and I think the .Meta fields implementation
> is good. (Minor point is that there is no need for the field.virtual
> flag - it should be enough to check if it has a db column or not).

Agreed. Actually, the way I tried to go about the implementation, was
to replace field.column with field.columns which would be a tuple and
in case of virtual fields this would be a property which automatically
gathers the columns of all of its enclosed concrete fields, which
means a field.virtual would be necessary, but it looks like this
brings an inappropriate amount of complexity with little benefit.

> Do you have any plans for updatable primary keys? Updatable primary
> keys would be useful: the current api where obj.lastname = newval;
> obj.save() will result in save of new object instead of update of the
> lastname is plain ugly. OTOH This problem might be too complex to
> include in GSoC - cascading the updates is going to be hard. One
> possible solution is to add some way to create the foreign keys as ON
> UPDATE CASCADE and let the DB solve the problem.

Now I get to the issues that I still haven't figured out or would like
the opinion of others on. I'll separate them into sections for better
readability.

Updatable primary keys
----------------------

The one you mention here is the way the ORM finds out whether to
perform an INSERT or UPDATE on model.save(). The current way this
works is just fine for models with an AutoField which is not visible
to the user. The problem is that as soon as you use an explicit
primary key and aren't extremely careful, you'll get unexpected
results.

This isn't directly related to composite primary keys, however, those
make it more apparent. I actually raised this issue two years ago
(IIRC) and I think I discussed it with Carl and possibly others; the
outcome was that we'd just put a note into the docs that would warn
people to be careful not to make any part of primary key user-editable
in the admin and generally be cautious when handling them directly or
otherwise bad things would happen.

I still think it would be good to change this behavior. The downsides
are that
 1) it would subtly change behavior for models with explicit primary
    keys in a backwards-incompatible way
 2) it would require models to hold something like "the last known
    value of the primary key in DB"
 3) it would cause weird behavior in situations like if you have two
    separate instances corresponding to the same DB row, modify the PK
    in one and save. I don't really know how much of an issue this
    would be.
 4) as Anssi said, we would need a cascading mechanism for updates.
    With certain backends, this could probably be done by the database
    itself, with others, I believe we'd have to do it manually
    (SQLite?)...

Again, this is not entirely related to this project, but it would
still be nice to get this done before composite fields are released,
should we agree we want this implemented.

GenericForeignKey
-----------------

Two years ago we gave this some thought and decided to leave this for
a later stage. I think the later stage is here.

It's fairly easy to represent composite values as strings, something
like the following works quite well::

    ",".join(escape(value) for value in composite_value)

Of course, we can just do this and stick it into the database as the
object_id. The problem is with GenericRelation. This is something that
works on the database level and we can't just ask the database server
to compare a string built this way with a tuple of other values.

Ideally, we'd need to come up with an encoding that would be possible
to reproduce as a SQL expression, preferably with the same result on
all DB backends. (The expression itself can be backend-dependent, the
result should probably not.)

As far as I know, Simone Federici had something that was close to what
we need, but IIRC, it had some issues. Simone, could you please chime
in and provide some more detail on this? Have you succeeded? Was it a
dead end?

I'll admit right away that I'm not enough of a SQL wizard and I most
certainly don't know the quirks of all supported database backends
well enough to come up with a working solution to this. That's why
I'll need a bit of help from more experienced people.

Why I think we need to concern ourselves with this? The admin keeps a
log of changes. This log uses parts of the contenttypes framework.
It's true it does not use GenericForeignKey directly, but it does
practically the same thing.

Now, I think it wouldn't make much sense to merge support for
CompositeFields without them being supported by the admin. For that,
we need a string representation for composite values that we won't
change later -- doing that would invalidate all admin log entries for
models with a composite primary key from previous Django versions.
That, however, means we need to get the right representation *before*
this is merged into master. Otherwise we won't be able to change it
later, thus making it impossible to add support for
GenericForeignKey/GenericRelation.

It might be possible to still go ahead without contenttypes support
and then later use a different representation for contenttypes, but
that would mean we'd have to use one encoding for the admin and
another one for contenttypes. Ew. Not nice IMO.

Or we can just say that GenericRelation won't ever be supported, which
would make it all easier right now, but I don't really like this
either. Especially since there are people who have asked me about
contenttypes support explicitly.

If there's another possibility I haven't thought of, by all means, let
me know, because right now this looks quite bad to me.

Model._meta.fields and friends
------------------------------

In order to add proper support for virtual fields, I had to make
certain modifications to things in _meta, mainly the way the list of
fields is stored. Currently there's _meta.virtual_fields, _meta.fields
and a few others. (Right now, after the ForeignObject/ForeignKey work,
this is somewhat inconsistent as certain virtual field types are in
_meta.virtual_fields and others are in _meta.fields.)

Instead, I put everything into _meta.fields and created aliases for
certain meaningful subsets, some of which are there already (like
concrete_fields, local_concrete_fields, virtual_fields etc.).

If I turn ForeignKey into a virtual field which adds an extra local
field to store the raw id, this changes the contents of _meta.fields
for every model there is containing a ForeignKey. I know the _meta
object is considered private (though I seem to recall some discussions
about making it public which, I hope, has not happened yet), but we
all know there are lots of applications out there that use things in
_meta anyway. This has the potential of breaking some of them.

I don't think there's much we can do about it. We might try to keep
things in _meta the way they were but personally, I don't think it's
worth it -- we'd be cluttering internal APIs just to support people
doing things they should know might break between versions. I just
mention this here because this might have a nontrivial negative
impact.

Database backends
-----------------

This is quite obvious, some things like sql_create_model will require
updates, though these things are generally used from the base database
adaptor classes and not overridden by backends. However, certain
features, like backends.introspection.get_relations will require a
complete rewrite.

Take get_relations for example. This method is responsible for
detecting ForeignKeys and other relationships between database tables.
Currently the data structure it returns is designed to only support
single-column relationships which obviously doesn't have enough
expressive power to encode a composite relationship.

The question I'd like to raise here is, what's the status of the
database backend APIs? Are they private? Are they public? Are they
somewhere in between? I know there are quite a few third-party backend
connectors. Can I just go ahead and change the API of get_relations,
or do we need some kind of backwards compatibility?

Speaking of get_relations, I really look forward to parsing composite
relationships from SQLite CREATE TABLE statements... d-:

Those blasted __in lookups
--------------------------

__in lookups are used quite often, for example, during each
model.delete(), if the delete cascades. For now I do this as a long
CNF expression, which, as I am told, is horribly inefficient on
certain backends.

Say, you wanted to do the following (which is only valid in certain
SQL servers, I don't recall which ones anymore, though):

SELECT a, b, c FROM tbl1, tbl2
WHERE (a, b, c) IN [(1, 2, 3), (4, 5, 6)];

The current expression would be

SELECT a, b, c FROM tbl1, tbl2
WHERE (a = 1 AND b = 2 AND c = 3) OR (a = 4 AND b = 5 AND c = 6);

Might something like the following be a viable alternative? Looks
backend-agnostic to me as well, though it adds two levels of nested
SELECTs...

SELECT a, b, c FROM tbl1, tbl2
WHERE EXISTS (SELECT a1, b1, c1, FROM (SELECT 1 as a, 2 as b, 3 as c
                                       UNION SELECT 4, 5, 6)
              WHERE a1=1 AND b1=b AND c1=c);

Of course, the specific expression to be used would be generated by
the database backend, so we could still use
(a, b, c) IN [(1, 2, 3), (4, 5, 6)]
where available...


This would be all for tonight. I hope there isn't anything else, but a
quick scan through my notes doesn't yield anything.

Project timeline is in progress.

Michal

Attachment: signature.asc
Description: Digital signature

Reply via email to