On Wed, Apr 17, 2013 at 03:49:11AM -0700, Anssi Kääriäinen wrote: > The basic idea is that there is a new ForeignObject class. A > ForeignObject basically just takes a related model, and from_fields > and to_fields which are the names of the fields used for the relation. > Then, the ORM knows how to create JOINs, subqueries etc based on a > ForeignObject. The ForeignObject itself is a virtual field, it doesn't > have any concrete fields in the DB. > > There is currently zero public API support for using the > ForeignObject. The addition of ForeignObject is there to make the work > on composite fields and multicolumn foreign keys in particular easier.
Thanks for the overview. For the sake of completeness, we had a discussion on IRC earlier today, the outcome of which is that the ForeignObject refactor will simplify the task immensely. We also came upon a few concerns which I'll describe in a moment. > I took a look at the code and I think the .Meta fields implementation > is good. (Minor point is that there is no need for the field.virtual > flag - it should be enough to check if it has a db column or not). Agreed. Actually, the way I tried to go about the implementation, was to replace field.column with field.columns which would be a tuple and in case of virtual fields this would be a property which automatically gathers the columns of all of its enclosed concrete fields, which means a field.virtual would be necessary, but it looks like this brings an inappropriate amount of complexity with little benefit. > Do you have any plans for updatable primary keys? Updatable primary > keys would be useful: the current api where obj.lastname = newval; > obj.save() will result in save of new object instead of update of the > lastname is plain ugly. OTOH This problem might be too complex to > include in GSoC - cascading the updates is going to be hard. One > possible solution is to add some way to create the foreign keys as ON > UPDATE CASCADE and let the DB solve the problem. Now I get to the issues that I still haven't figured out or would like the opinion of others on. I'll separate them into sections for better readability. Updatable primary keys ---------------------- The one you mention here is the way the ORM finds out whether to perform an INSERT or UPDATE on model.save(). The current way this works is just fine for models with an AutoField which is not visible to the user. The problem is that as soon as you use an explicit primary key and aren't extremely careful, you'll get unexpected results. This isn't directly related to composite primary keys, however, those make it more apparent. I actually raised this issue two years ago (IIRC) and I think I discussed it with Carl and possibly others; the outcome was that we'd just put a note into the docs that would warn people to be careful not to make any part of primary key user-editable in the admin and generally be cautious when handling them directly or otherwise bad things would happen. I still think it would be good to change this behavior. The downsides are that 1) it would subtly change behavior for models with explicit primary keys in a backwards-incompatible way 2) it would require models to hold something like "the last known value of the primary key in DB" 3) it would cause weird behavior in situations like if you have two separate instances corresponding to the same DB row, modify the PK in one and save. I don't really know how much of an issue this would be. 4) as Anssi said, we would need a cascading mechanism for updates. With certain backends, this could probably be done by the database itself, with others, I believe we'd have to do it manually (SQLite?)... Again, this is not entirely related to this project, but it would still be nice to get this done before composite fields are released, should we agree we want this implemented. GenericForeignKey ----------------- Two years ago we gave this some thought and decided to leave this for a later stage. I think the later stage is here. It's fairly easy to represent composite values as strings, something like the following works quite well:: ",".join(escape(value) for value in composite_value) Of course, we can just do this and stick it into the database as the object_id. The problem is with GenericRelation. This is something that works on the database level and we can't just ask the database server to compare a string built this way with a tuple of other values. Ideally, we'd need to come up with an encoding that would be possible to reproduce as a SQL expression, preferably with the same result on all DB backends. (The expression itself can be backend-dependent, the result should probably not.) As far as I know, Simone Federici had something that was close to what we need, but IIRC, it had some issues. Simone, could you please chime in and provide some more detail on this? Have you succeeded? Was it a dead end? I'll admit right away that I'm not enough of a SQL wizard and I most certainly don't know the quirks of all supported database backends well enough to come up with a working solution to this. That's why I'll need a bit of help from more experienced people. Why I think we need to concern ourselves with this? The admin keeps a log of changes. This log uses parts of the contenttypes framework. It's true it does not use GenericForeignKey directly, but it does practically the same thing. Now, I think it wouldn't make much sense to merge support for CompositeFields without them being supported by the admin. For that, we need a string representation for composite values that we won't change later -- doing that would invalidate all admin log entries for models with a composite primary key from previous Django versions. That, however, means we need to get the right representation *before* this is merged into master. Otherwise we won't be able to change it later, thus making it impossible to add support for GenericForeignKey/GenericRelation. It might be possible to still go ahead without contenttypes support and then later use a different representation for contenttypes, but that would mean we'd have to use one encoding for the admin and another one for contenttypes. Ew. Not nice IMO. Or we can just say that GenericRelation won't ever be supported, which would make it all easier right now, but I don't really like this either. Especially since there are people who have asked me about contenttypes support explicitly. If there's another possibility I haven't thought of, by all means, let me know, because right now this looks quite bad to me. Model._meta.fields and friends ------------------------------ In order to add proper support for virtual fields, I had to make certain modifications to things in _meta, mainly the way the list of fields is stored. Currently there's _meta.virtual_fields, _meta.fields and a few others. (Right now, after the ForeignObject/ForeignKey work, this is somewhat inconsistent as certain virtual field types are in _meta.virtual_fields and others are in _meta.fields.) Instead, I put everything into _meta.fields and created aliases for certain meaningful subsets, some of which are there already (like concrete_fields, local_concrete_fields, virtual_fields etc.). If I turn ForeignKey into a virtual field which adds an extra local field to store the raw id, this changes the contents of _meta.fields for every model there is containing a ForeignKey. I know the _meta object is considered private (though I seem to recall some discussions about making it public which, I hope, has not happened yet), but we all know there are lots of applications out there that use things in _meta anyway. This has the potential of breaking some of them. I don't think there's much we can do about it. We might try to keep things in _meta the way they were but personally, I don't think it's worth it -- we'd be cluttering internal APIs just to support people doing things they should know might break between versions. I just mention this here because this might have a nontrivial negative impact. Database backends ----------------- This is quite obvious, some things like sql_create_model will require updates, though these things are generally used from the base database adaptor classes and not overridden by backends. However, certain features, like backends.introspection.get_relations will require a complete rewrite. Take get_relations for example. This method is responsible for detecting ForeignKeys and other relationships between database tables. Currently the data structure it returns is designed to only support single-column relationships which obviously doesn't have enough expressive power to encode a composite relationship. The question I'd like to raise here is, what's the status of the database backend APIs? Are they private? Are they public? Are they somewhere in between? I know there are quite a few third-party backend connectors. Can I just go ahead and change the API of get_relations, or do we need some kind of backwards compatibility? Speaking of get_relations, I really look forward to parsing composite relationships from SQLite CREATE TABLE statements... d-: Those blasted __in lookups -------------------------- __in lookups are used quite often, for example, during each model.delete(), if the delete cascades. For now I do this as a long CNF expression, which, as I am told, is horribly inefficient on certain backends. Say, you wanted to do the following (which is only valid in certain SQL servers, I don't recall which ones anymore, though): SELECT a, b, c FROM tbl1, tbl2 WHERE (a, b, c) IN [(1, 2, 3), (4, 5, 6)]; The current expression would be SELECT a, b, c FROM tbl1, tbl2 WHERE (a = 1 AND b = 2 AND c = 3) OR (a = 4 AND b = 5 AND c = 6); Might something like the following be a viable alternative? Looks backend-agnostic to me as well, though it adds two levels of nested SELECTs... SELECT a, b, c FROM tbl1, tbl2 WHERE EXISTS (SELECT a1, b1, c1, FROM (SELECT 1 as a, 2 as b, 3 as c UNION SELECT 4, 5, 6) WHERE a1=1 AND b1=b AND c1=c); Of course, the specific expression to be used would be generated by the database backend, so we could still use (a, b, c) IN [(1, 2, 3), (4, 5, 6)] where available... This would be all for tonight. I hope there isn't anything else, but a quick scan through my notes doesn't yield anything. Project timeline is in progress. Michal
signature.asc
Description: Digital signature