Re: Weekly check-in (this should be #5, right...?)

2011-07-07 Thread akaariai
On Jul 6, 1:11 pm, Michal Petrucha  wrote:

> Hmm, this is exactly what I had in mind when thinking about this
> problem. I see that even despite keeping the changes to a minimum, the
> patch looks quite big. I'll definitely review this once I start
> working on relationship fields.

No wonder the patch is quite big. I accidentally branched from the
conditional aggregation branch, so it has all the things in that patch
included. And that patch is much larger than the multicolumn_join
patch.

I pushed a new branch to github (https://github.com/akaariai/django/
tree/multicolumn_join), this time the patch is much smaller:  4 files
changed, 77 insertions(+), 63 deletions(-)

I will destroy the composite_join branch. I didn't like the
composite_join name anyways, multicolumn_join is much better
name... :)

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Weekly check-in (this should be #5, right...?)

2011-07-06 Thread Michal Petrucha
On Tue, Jul 05, 2011 at 03:39:53PM -0700, akaariai wrote:
> I did a glance-over of your github branch. I was especially looking
> for how you will handle LEFT OUTER JOINS involving composite primary
> keys / foreign keys. If I am not missing something, I think this
> haven't been done yet. I have myself been thinking about this issue,
> and I thought it would be good to share what I have found out.

This is something I'd like to leave for the second half of the
program. For now I'm focusing on making CompositeFields work in the
model they've been added to, relationship fields and all this stuff is
scheduled for the second half of July. (-:
 
> The problematic part for multi-column join conditions is in
> django.db.models.sql.query:
> def join(self, connector, ...):
> """
> Returns an alias for the join in 'connection', either reusing an
> existing alias for that join or creating a new one. 'connection'
> is a
> tuple (lhs, table, lhs_col, col) where 'lhs' is either an existing
> table alias or a table name. The join correspods to the SQL
> equivalent
> of::
> 
> lhs.lhs_col = table.col
> """
> 
> Obviously this can not work for creating multi-column joins.
> 
> The connection information is stored in alias_map, join_map and
> rev_join_map. In particular, in alias_map is stored (table, alias,
> join_type, lhs, lhs_col, col, nullable). Currently the contents of the
> alias_map is turned into SQL (sql/compliler.py, get_from_clause()) as:
> 
> result.append('%s %s%s ON (%s.%s = %s.%s)'
> % (join_type, qn(name), alias_str, qn(lhs),
>  qn2(lhs_col), qn(alias), qn2(col)))
> 
> The most simple way to extend this to contain more columns would
> probably be the following:
>  - connection is defined as (lhs, table, lhs_col1, col1, lhs_col2,
> col2, ...)
>  - alias_map format needs to change a bit so that the extra columns
> can be stored in there. One could store the extra column after the
> nullable. Cleaner would be to have the columns in one tuple: (table,
> alias, join_type, lhs, (cols), nullable)
>  - Limited amount of places needs to be fixed, most notably the
> get_from_clause() of compiler.py
> 
> The downside of the above is that it does not support any other join
> conditions than ones involving 2 tables and a list of anded columns.
> For composite fields this is enough.

Yeah, I had something like this planned. To be precise, the exact
same thing you wrote in the second e-mail.

> For future usage it would be nice if one could pass in Where nodes as
> the connection. This would allow for arbitrary join conditions. The
> Where node knows how to turn itself into SQL, how to relabel aliases
> and so on. This approach has some problems, however:
>   - How to generate the Where node?
>   - How to match existing joins to new joins? Currently this is done
> by checking that the connection four-tuple is equivalent to the
> existing join's four tuple. I don't think Where nodes know how to
> check equivalence to another node. And even if where nodes knew how to
> do that, also all the leaf nodes would need to know how to do that.
>   - Performance issues, cloning a Where node is more expensive than
> cloning a tuple. Also construction, equivalence checking and other
> operations too are somewhat more expensive than using tuples.
>   - Overkill for composite fields
> 
> Of course, the approaches could be combined, that is you pass in the
> join condition as a tuple, and you can pass extra_filters (default
> None) as a Where node. This would keep the normal case efficient but
> allow for more complex join conditions if really needed. The join
> having extra_filters could not be reused, except when explicitly
> stated.

I don't know, somehow I'm not entirely convinced this would be really
useful. Yes, I realize sometimes people need more complex joins etc,
but that's what raw queries are for. Besides, I can't really imagine
how a user might use this feature -- I don't see how a QuerySet filter
could be used to supply these extra conditions, only if we added some
hooks things like custom fields could use or something like that. But
still, this would add a ton of complexity into the code for little
gain, IMHO.

Anyway, should there be consensus that we want something like this,
I'll gladly look into it sometime later. But for now, to keep things
realistic as far as my GSoC project is concerned, I'll stick to the
easier solution.

> Having said all this, for this project "extend the connection tuple"
> approach seems to be the only sane choice.
> 
> The work you have done looks very promising. I hope this post has been
> at least somewhat useful to you.

Thanks, I really appreciate it.

On Tue, Jul 05, 2011 at 06:15:50PM -0700, akaariai wrote:
> I implemented this in my github branch 
> https://github.com/akaariai/django/tree/composite_join
> 
> With this you can do:
> a = Article.objects.all()
> a.query.get_initial_alias()
> a.query.join(('basic_article', 'foo', ('pk_part1', 'p

Re: Weekly check-in (this should be #5, right...?)

2011-07-05 Thread akaariai
On Jul 6, 1:39 am, akaariai  wrote:
> Having said all this, for this project "extend the connection tuple"
> approach seems to be the only sane choice.

I implemented this in my github branch 
https://github.com/akaariai/django/tree/composite_join

With this you can do:
a = Article.objects.all()
a.query.get_initial_alias()
a.query.join(('basic_article', 'foo', ('pk_part1', 'pk_part2'),
('foo_pk_part1', 'foo_pk_part2')), promote=True)
print a.query
SELECT "basic_article"."id", "basic_article"."headline",
"basic_article"."pub_date" FROM "basic_article" LEFT OUTER JOIN "foo"
ON ("basic_article"."pk_part1" = "foo"."foo_pk_part1" AND
"basic_article"."pk_part2" = "foo"."foo_pk_part2")

The connection parameter is now a tuple (lhs, table, (lhs_col1,
lhs_col2, ...), (col1, col2, ...)). This seemed to be the way of least
pain.

All current test are passed on sqlite3. There will probably be
problems when more complex queries are tried with multi-column join
conditions. I hope this gives at least an idea how to approach the
multi-column outer joins problem.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Weekly check-in (this should be #5, right...?)

2011-07-05 Thread akaariai
On Jun 27, 4:22 am, Michal Petrucha  wrote:
> some visible progress on my project at long last. I spent most of the
> last week digging deep inside the ORM's entrails to make composite
> field lookups possible and finally it looks promising.
>
> While working on this I found out the extra_filters approach I
> intended to use was a dead end (which reminded me of what Russ wrote
> in response to my proposal: "I'm almost completely certain you'll
> find some gremlin lurking underneath some dark corner of the code").

I did a glance-over of your github branch. I was especially looking
for how you will handle LEFT OUTER JOINS involving composite primary
keys / foreign keys. If I am not missing something, I think this
haven't been done yet. I have myself been thinking about this issue,
and I thought it would be good to share what I have found out.

The problematic part for multi-column join conditions is in
django.db.models.sql.query:
def join(self, connector, ...):
"""
Returns an alias for the join in 'connection', either reusing an
existing alias for that join or creating a new one. 'connection'
is a
tuple (lhs, table, lhs_col, col) where 'lhs' is either an existing
table alias or a table name. The join correspods to the SQL
equivalent
of::

lhs.lhs_col = table.col
"""

Obviously this can not work for creating multi-column joins.

The connection information is stored in alias_map, join_map and
rev_join_map. In particular, in alias_map is stored (table, alias,
join_type, lhs, lhs_col, col, nullable). Currently the contents of the
alias_map is turned into SQL (sql/compliler.py, get_from_clause()) as:

result.append('%s %s%s ON (%s.%s = %s.%s)'
% (join_type, qn(name), alias_str, qn(lhs),
 qn2(lhs_col), qn(alias), qn2(col)))

The most simple way to extend this to contain more columns would
probably be the following:
 - connection is defined as (lhs, table, lhs_col1, col1, lhs_col2,
col2, ...)
 - alias_map format needs to change a bit so that the extra columns
can be stored in there. One could store the extra column after the
nullable. Cleaner would be to have the columns in one tuple: (table,
alias, join_type, lhs, (cols), nullable)
 - Limited amount of places needs to be fixed, most notably the
get_from_clause() of compiler.py

The downside of the above is that it does not support any other join
conditions than ones involving 2 tables and a list of anded columns.
For composite fields this is enough.

For future usage it would be nice if one could pass in Where nodes as
the connection. This would allow for arbitrary join conditions. The
Where node knows how to turn itself into SQL, how to relabel aliases
and so on. This approach has some problems, however:
  - How to generate the Where node?
  - How to match existing joins to new joins? Currently this is done
by checking that the connection four-tuple is equivalent to the
existing join's four tuple. I don't think Where nodes know how to
check equivalence to another node. And even if where nodes knew how to
do that, also all the leaf nodes would need to know how to do that.
  - Performance issues, cloning a Where node is more expensive than
cloning a tuple. Also construction, equivalence checking and other
operations too are somewhat more expensive than using tuples.
  - Overkill for composite fields

Of course, the approaches could be combined, that is you pass in the
join condition as a tuple, and you can pass extra_filters (default
None) as a Where node. This would keep the normal case efficient but
allow for more complex join conditions if really needed. The join
having extra_filters could not be reused, except when explicitly
stated.

Having said all this, for this project "extend the connection tuple"
approach seems to be the only sane choice.

The work you have done looks very promising. I hope this post has been
at least somewhat useful to you.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: [GSoC composite fields] Weekly check-in (this should be #5, right...?)

2011-06-26 Thread Michal Petrucha
Hi,

some visible progress on my project at long last. I spent most of the
last week digging deep inside the ORM's entrails to make composite
field lookups possible and finally it looks promising.

While working on this I found out the extra_filters approach I
intended to use was a dead end (which reminded me of what Russ wrote
in response to my proposal: "I'm almost completely certain you'll
find some gremlin lurking underneath some dark corner of the code").

In the end I had to make virtual fields part of the field list in
model options which required updating code in quite a few places to
reflect the change. Most of it should be all right by now, the test
suite runs well except for four GFK and admin related tests which I'll
look at tomorrow.

Setting a CompositeField's primary_key option still isn't supported,
however, most of it should already be prepared.

Watch my progress at GitHub [1].

Michal

[1] https://github.com/konk/django


signature.asc
Description: Digital signature