Re: Weekly check-in (this should be #5, right...?)
On Jul 6, 1:11 pm, Michal Petrucha wrote: > Hmm, this is exactly what I had in mind when thinking about this > problem. I see that even despite keeping the changes to a minimum, the > patch looks quite big. I'll definitely review this once I start > working on relationship fields. No wonder the patch is quite big. I accidentally branched from the conditional aggregation branch, so it has all the things in that patch included. And that patch is much larger than the multicolumn_join patch. I pushed a new branch to github (https://github.com/akaariai/django/ tree/multicolumn_join), this time the patch is much smaller: 4 files changed, 77 insertions(+), 63 deletions(-) I will destroy the composite_join branch. I didn't like the composite_join name anyways, multicolumn_join is much better name... :) - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: Weekly check-in (this should be #5, right...?)
On Tue, Jul 05, 2011 at 03:39:53PM -0700, akaariai wrote: > I did a glance-over of your github branch. I was especially looking > for how you will handle LEFT OUTER JOINS involving composite primary > keys / foreign keys. If I am not missing something, I think this > haven't been done yet. I have myself been thinking about this issue, > and I thought it would be good to share what I have found out. This is something I'd like to leave for the second half of the program. For now I'm focusing on making CompositeFields work in the model they've been added to, relationship fields and all this stuff is scheduled for the second half of July. (-: > The problematic part for multi-column join conditions is in > django.db.models.sql.query: > def join(self, connector, ...): > """ > Returns an alias for the join in 'connection', either reusing an > existing alias for that join or creating a new one. 'connection' > is a > tuple (lhs, table, lhs_col, col) where 'lhs' is either an existing > table alias or a table name. The join correspods to the SQL > equivalent > of:: > > lhs.lhs_col = table.col > """ > > Obviously this can not work for creating multi-column joins. > > The connection information is stored in alias_map, join_map and > rev_join_map. In particular, in alias_map is stored (table, alias, > join_type, lhs, lhs_col, col, nullable). Currently the contents of the > alias_map is turned into SQL (sql/compliler.py, get_from_clause()) as: > > result.append('%s %s%s ON (%s.%s = %s.%s)' > % (join_type, qn(name), alias_str, qn(lhs), > qn2(lhs_col), qn(alias), qn2(col))) > > The most simple way to extend this to contain more columns would > probably be the following: > - connection is defined as (lhs, table, lhs_col1, col1, lhs_col2, > col2, ...) > - alias_map format needs to change a bit so that the extra columns > can be stored in there. One could store the extra column after the > nullable. Cleaner would be to have the columns in one tuple: (table, > alias, join_type, lhs, (cols), nullable) > - Limited amount of places needs to be fixed, most notably the > get_from_clause() of compiler.py > > The downside of the above is that it does not support any other join > conditions than ones involving 2 tables and a list of anded columns. > For composite fields this is enough. Yeah, I had something like this planned. To be precise, the exact same thing you wrote in the second e-mail. > For future usage it would be nice if one could pass in Where nodes as > the connection. This would allow for arbitrary join conditions. The > Where node knows how to turn itself into SQL, how to relabel aliases > and so on. This approach has some problems, however: > - How to generate the Where node? > - How to match existing joins to new joins? Currently this is done > by checking that the connection four-tuple is equivalent to the > existing join's four tuple. I don't think Where nodes know how to > check equivalence to another node. And even if where nodes knew how to > do that, also all the leaf nodes would need to know how to do that. > - Performance issues, cloning a Where node is more expensive than > cloning a tuple. Also construction, equivalence checking and other > operations too are somewhat more expensive than using tuples. > - Overkill for composite fields > > Of course, the approaches could be combined, that is you pass in the > join condition as a tuple, and you can pass extra_filters (default > None) as a Where node. This would keep the normal case efficient but > allow for more complex join conditions if really needed. The join > having extra_filters could not be reused, except when explicitly > stated. I don't know, somehow I'm not entirely convinced this would be really useful. Yes, I realize sometimes people need more complex joins etc, but that's what raw queries are for. Besides, I can't really imagine how a user might use this feature -- I don't see how a QuerySet filter could be used to supply these extra conditions, only if we added some hooks things like custom fields could use or something like that. But still, this would add a ton of complexity into the code for little gain, IMHO. Anyway, should there be consensus that we want something like this, I'll gladly look into it sometime later. But for now, to keep things realistic as far as my GSoC project is concerned, I'll stick to the easier solution. > Having said all this, for this project "extend the connection tuple" > approach seems to be the only sane choice. > > The work you have done looks very promising. I hope this post has been > at least somewhat useful to you. Thanks, I really appreciate it. On Tue, Jul 05, 2011 at 06:15:50PM -0700, akaariai wrote: > I implemented this in my github branch > https://github.com/akaariai/django/tree/composite_join > > With this you can do: > a = Article.objects.all() > a.query.get_initial_alias() > a.query.join(('basic_article', 'foo', ('pk_part1', 'p
Re: Weekly check-in (this should be #5, right...?)
On Jul 6, 1:39 am, akaariai wrote: > Having said all this, for this project "extend the connection tuple" > approach seems to be the only sane choice. I implemented this in my github branch https://github.com/akaariai/django/tree/composite_join With this you can do: a = Article.objects.all() a.query.get_initial_alias() a.query.join(('basic_article', 'foo', ('pk_part1', 'pk_part2'), ('foo_pk_part1', 'foo_pk_part2')), promote=True) print a.query SELECT "basic_article"."id", "basic_article"."headline", "basic_article"."pub_date" FROM "basic_article" LEFT OUTER JOIN "foo" ON ("basic_article"."pk_part1" = "foo"."foo_pk_part1" AND "basic_article"."pk_part2" = "foo"."foo_pk_part2") The connection parameter is now a tuple (lhs, table, (lhs_col1, lhs_col2, ...), (col1, col2, ...)). This seemed to be the way of least pain. All current test are passed on sqlite3. There will probably be problems when more complex queries are tried with multi-column join conditions. I hope this gives at least an idea how to approach the multi-column outer joins problem. - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: Weekly check-in (this should be #5, right...?)
On Jun 27, 4:22 am, Michal Petrucha wrote: > some visible progress on my project at long last. I spent most of the > last week digging deep inside the ORM's entrails to make composite > field lookups possible and finally it looks promising. > > While working on this I found out the extra_filters approach I > intended to use was a dead end (which reminded me of what Russ wrote > in response to my proposal: "I'm almost completely certain you'll > find some gremlin lurking underneath some dark corner of the code"). I did a glance-over of your github branch. I was especially looking for how you will handle LEFT OUTER JOINS involving composite primary keys / foreign keys. If I am not missing something, I think this haven't been done yet. I have myself been thinking about this issue, and I thought it would be good to share what I have found out. The problematic part for multi-column join conditions is in django.db.models.sql.query: def join(self, connector, ...): """ Returns an alias for the join in 'connection', either reusing an existing alias for that join or creating a new one. 'connection' is a tuple (lhs, table, lhs_col, col) where 'lhs' is either an existing table alias or a table name. The join correspods to the SQL equivalent of:: lhs.lhs_col = table.col """ Obviously this can not work for creating multi-column joins. The connection information is stored in alias_map, join_map and rev_join_map. In particular, in alias_map is stored (table, alias, join_type, lhs, lhs_col, col, nullable). Currently the contents of the alias_map is turned into SQL (sql/compliler.py, get_from_clause()) as: result.append('%s %s%s ON (%s.%s = %s.%s)' % (join_type, qn(name), alias_str, qn(lhs), qn2(lhs_col), qn(alias), qn2(col))) The most simple way to extend this to contain more columns would probably be the following: - connection is defined as (lhs, table, lhs_col1, col1, lhs_col2, col2, ...) - alias_map format needs to change a bit so that the extra columns can be stored in there. One could store the extra column after the nullable. Cleaner would be to have the columns in one tuple: (table, alias, join_type, lhs, (cols), nullable) - Limited amount of places needs to be fixed, most notably the get_from_clause() of compiler.py The downside of the above is that it does not support any other join conditions than ones involving 2 tables and a list of anded columns. For composite fields this is enough. For future usage it would be nice if one could pass in Where nodes as the connection. This would allow for arbitrary join conditions. The Where node knows how to turn itself into SQL, how to relabel aliases and so on. This approach has some problems, however: - How to generate the Where node? - How to match existing joins to new joins? Currently this is done by checking that the connection four-tuple is equivalent to the existing join's four tuple. I don't think Where nodes know how to check equivalence to another node. And even if where nodes knew how to do that, also all the leaf nodes would need to know how to do that. - Performance issues, cloning a Where node is more expensive than cloning a tuple. Also construction, equivalence checking and other operations too are somewhat more expensive than using tuples. - Overkill for composite fields Of course, the approaches could be combined, that is you pass in the join condition as a tuple, and you can pass extra_filters (default None) as a Where node. This would keep the normal case efficient but allow for more complex join conditions if really needed. The join having extra_filters could not be reused, except when explicitly stated. Having said all this, for this project "extend the connection tuple" approach seems to be the only sane choice. The work you have done looks very promising. I hope this post has been at least somewhat useful to you. - Anssi -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: [GSoC composite fields] Weekly check-in (this should be #5, right...?)
Hi, some visible progress on my project at long last. I spent most of the last week digging deep inside the ORM's entrails to make composite field lookups possible and finally it looks promising. While working on this I found out the extra_filters approach I intended to use was a dead end (which reminded me of what Russ wrote in response to my proposal: "I'm almost completely certain you'll find some gremlin lurking underneath some dark corner of the code"). In the end I had to make virtual fields part of the field list in model options which required updating code in quite a few places to reflect the change. Most of it should be all right by now, the test suite runs well except for four GFK and admin related tests which I'll look at tomorrow. Setting a CompositeField's primary_key option still isn't supported, however, most of it should already be prepared. Watch my progress at GitHub [1]. Michal [1] https://github.com/konk/django signature.asc Description: Digital signature