Re: Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

2014-07-04 Thread Anssi Kääriäinen


On Saturday, July 5, 2014 1:47:30 AM UTC+3, Jon Dufresne wrote:

> Sorry, but I felt like I was reporting information interesting to 
> developers. 
>
> 1.) Wrong JOIN type (OUTER vs INNER) producing inefficient queries 
> 2.) FieldError in 1.7 where there wasn't one in 1.6 
>
> If these are of no interest, I will not continue the discussion. 
>

This is the right list, especially concerning 2). Even if 2) turns out to 
be a false alarm, I much rather get a couple of false alarms than a 
regression in final release of 1.7.

 - Anssi 

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/37ea28e0-20da-4bb3-bf31-210472d35fd9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

2014-07-04 Thread Anssi Kääriäinen
On Saturday, July 5, 2014 12:42:39 AM UTC+3, Jon Dufresne wrote:
>
> Suppose I have the following models: 
>
> --- 
> class Container(models.Model): 
> pass 
>
> class Item(models.Model): 
> container = models.ForeignKey(Container) 
> previous = models.ForeignKey('self', null=True) 
> current = models.BooleanField() 
> flag = models.BooleanField() 
> --- 
>
> Item represents a chain of items, all grouped by a container (like a 
> linked list). The field "current" represents the most recent item 
> (front of the list). The field "flag" is simply something to query on. 
>
> Suppose I perform the following query: 
>
> --- 
> Item.objects.filter(current=True, 
> container__item__previous__isnull=True, container__item__flag=True) 
> --- 
>
> That is, I'm looking for all current items such that the first item in 
> the chain has flag = true. Django 1.6 produces the following SQL for 
> this query: 
>
> --- 
> SELECT ... 
> FROM "myapp_item" 
> INNER JOIN "myapp_container" 
> ON ( "myapp_item"."container_id" = "myapp_container"."id" ) 
> LEFT OUTER JOIN "myapp_item" T3 
> ON ( "myapp_container"."id" = T3."container_id" ) 
> WHERE ("myapp_item"."current" = True  AND T3."previous_id" IS NULL AND 
> T3."flag" = True ) 
> --- 
>
> The OUTER JOIN is the problem. Why is this not a more efficient INNER 
> JOIN? This query is very inefficient as the the database gets larger. 
> This causes slow downs on pages. 
>

This have been fixed in 1.7. Pre-1.7 there was no join "demotion". That is, 
when the ORM generated a LEFT JOIN expression it was never changed back to 
INNER JOIN. Here the container__item__previous__isnull=True generates LEFT 
JOIN for the item join.

In 1.7 there exists some new code to do join demotion. In this case it 
works because container__item__flag=True can produce results only when the 
item join produces results => the query never produces results with LEFT 
JOIN of item => the existing LEFT JOIN for alias T3 can be "demoted" to 
INNER JOIN. 

Interestingly, if I try this on 1.7, this query is apparently not even 
> supported. I get the following error: 
>

I wasn't able to reproduce this - are you sure the query is exactly the 
same both on 1.6 and 1.7? Are models are set up correctly before generating 
the query? If after double-checking you still get the error, could you post 
a sample project somewhere?

 - Anssi 

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/d9253e4a-df9c-4e75-bc5a-3f883a62db0f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

2014-07-04 Thread Jon Dufresne
> ... and this is more appropriate on django-users

Sorry, but I felt like I was reporting information interesting to developers.

1.) Wrong JOIN type (OUTER vs INNER) producing inefficient queries
2.) FieldError in 1.7 where there wasn't one in 1.6

If these are of no interest, I will not continue the discussion.

Cheers,
Jon

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CADhq2b5-QZffBC1eVbfKZuKfjJdSUBXP4e9Ad5HGK8-U%2B4o8AA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

2014-07-04 Thread Javier Guerra Giraldez
On Fri, Jul 4, 2014 at 5:23 PM, Stephen J. Butler
 wrote:
> * Build a list of relevant containers first:
> Container.objects.filter(item__previous__isnull=True,
> item__flag=True).values_list('pk', flat=True). Then
> Items.objects.filter(current=True, container__in=flagged_containers). Not a
> problem as long as your query doesn't grow too large for your DB (that is,
> too many containers that might be considered)


i guess Items.objects.filter(current=True,
container__in=Container.objects.filter(item__previous__isnull=True,
item__flag=True)) would be better, it gives the the opportunity to not
get the container list in Python.

... and this is more appropriate on django-users

-- 
Javier

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFkDaoROfS14GrKPb-7XbSTGxyb8g1hNBvZSFDQhSzwTEq7DOg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

2014-07-04 Thread Stephen J. Butler
On Fri, Jul 4, 2014 at 4:42 PM, Jon Dufresne  wrote:

> Suppose I have the following models:
>
> ---
> class Container(models.Model):
> pass
>
> class Item(models.Model):
> container = models.ForeignKey(Container)
> previous = models.ForeignKey('self', null=True)
> current = models.BooleanField()
> flag = models.BooleanField()
> ---
>

[snip]


> A slow query is better than no query at all. Any particular reason
> this was removed? How would one model and query what I'm trying to
> achieve moving forward to avoid inefficient queries and exceptions?
>

Two ways come to mind:

* If it's only "flag" on the first item in a sequence then move "flag" to
be on present on Container also. A "first_flag" field. Could manage in a
signal or override Item.save(). That would let you not nest the query, and
could be more efficient.

* Build a list of relevant containers first:
Container.objects.filter(item__previous__isnull=True,
item__flag=True).values_list('pk', flat=True). Then
Items.objects.filter(current=True, container__in=flagged_containers). Not a
problem as long as your query doesn't grow too large for your DB (that is,
too many containers that might be considered)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAD4ANxVC4SJjWM9_bw4uN3--HgcVDSj8s%3DAd%3D1vimTcxJ0Vh5A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

2014-07-04 Thread Jon Dufresne
On Fri, Jul 4, 2014 at 2:54 PM, Javier Guerra Giraldez
 wrote:
> what purpose does the "container__item__previous__isnull=True"
> argument serve here?

To filter on the initial item in the list. I'm looking for flag=True
*only* on the first item in the list. flag=True could be set on other
items in the same chain, but I don't care about those. The goal of the
query is:

I'm looking for all *current* items such that the *first item* in the
chain has *flag = true*.

In raw SQL I want the query to be:

SELECT myapp_item.*
FROM "myapp_item"
INNER JOIN "myapp_container"
ON ( "myapp_item"."container_id" = "myapp_container"."id" )
INNER JOIN "myapp_item" T3
ON ( "myapp_container"."id" = T3."container_id" )
WHERE ("myapp_item"."current" = True  AND T3."previous_id" IS NULL AND
T3."flag" = True )

So the WHERE condition flag=True, only applies to the initial item.

>i think it means "an item that belongs to a container that has an item with no 
>'previous'"

Correct, but not just any container, but the same container as the
item being filtered.

> which if it's a linked list, then any non-empty container would comply

Right, except then the "container__item__flag=True" should influence it further.

> and since you start the query from the item, then the container is non-empty 
> by definition.

Yes, correct.

> btw, a linked list in a database?  can you elaborate on that idea?

It is just one way to think about it. It is not really a linked list.
I simply meant the items are linked together by the "previous" point.

In my actual application item represent a step with history (previous
being that history). Container represents the process of the step (has
actual fields). So each step has a previous step in the process.
Sometimes I want to query for the current step in the process based on
the initial step.

Cheers,
Jon

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CADhq2b7Uy6nY0Vo0__%2B3Ka1%2B_Hv7_smnJrDb4rtL%2B6sF%2B3QE5A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

2014-07-04 Thread Javier Guerra Giraldez
On Fri, Jul 4, 2014 at 4:42 PM, Jon Dufresne  wrote:
> Item.objects.filter(current=True,
> container__item__previous__isnull=True, container__item__flag=True)
> ---
>
> That is, I'm looking for all current items such that the first item in
> the chain has flag = true. Django 1.6 produces the following SQL for
> this query:


what purpose does the "container__item__previous__isnull=True"
argument serve here?  i think it means "an item that belongs to a
container that has an item with no 'previous'", which if it's a linked
list, then any non-empty container would comply, and since you start
the query from the item, then the container is non-empty by
definition.

btw, a linked list in a database?  can you elaborate on that idea?

-- 
Javier

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFkDaoR6GwBq2r68V7HbHhCnjednpL7GBGA1rpOOeru15Wr_%3DQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

2014-07-04 Thread Jon Dufresne
Suppose I have the following models:

---
class Container(models.Model):
pass

class Item(models.Model):
container = models.ForeignKey(Container)
previous = models.ForeignKey('self', null=True)
current = models.BooleanField()
flag = models.BooleanField()
---

Item represents a chain of items, all grouped by a container (like a
linked list). The field "current" represents the most recent item
(front of the list). The field "flag" is simply something to query on.

Suppose I perform the following query:

---
Item.objects.filter(current=True,
container__item__previous__isnull=True, container__item__flag=True)
---

That is, I'm looking for all current items such that the first item in
the chain has flag = true. Django 1.6 produces the following SQL for
this query:

---
SELECT ...
FROM "myapp_item"
INNER JOIN "myapp_container"
ON ( "myapp_item"."container_id" = "myapp_container"."id" )
LEFT OUTER JOIN "myapp_item" T3
ON ( "myapp_container"."id" = T3."container_id" )
WHERE ("myapp_item"."current" = True  AND T3."previous_id" IS NULL AND
T3."flag" = True )
---

The OUTER JOIN is the problem. Why is this not a more efficient INNER
JOIN? This query is very inefficient as the the database gets larger.
This causes slow downs on pages.

Interestingly, if I try this on 1.7, this query is apparently not even
supported. I get the following error:

---
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/jon/djtest/djtest/test.py", line 3, in 
qs = Item.objects.filter(current=True,
container__item__previous__isnull=True, container__item__flag=True)
  File 
"/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/manager.py",
line 92, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
  File 
"/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/query.py",
line 689, in filter
return self._filter_or_exclude(False, *args, **kwargs)
  File 
"/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/query.py",
line 707, in _filter_or_exclude
clone.query.add_q(Q(*args, **kwargs))
  File 
"/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/sql/query.py",
line 1287, in add_q
clause, require_inner = self._add_q(where_part, self.used_aliases)
  File 
"/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/sql/query.py",
line 1314, in _add_q
current_negated=current_negated, connector=connector)
  File 
"/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/sql/query.py",
line 1181, in build_filter
lookups, value)
  File 
"/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/fields/related.py",
line 1506, in get_lookup_constraint
raise exceptions.FieldError('Relation fields do not support nested lookups')
FieldError: Relation fields do not support nested lookups
---

A slow query is better than no query at all. Any particular reason
this was removed? How would one model and query what I'm trying to
achieve moving forward to avoid inefficient queries and exceptions?

Should I file this as a bug?

Cheers,
Jon

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CADhq2b7%3DcsYSOU2gSCL%3DwTD8Mztc9On%2ByRLWmUef%2B-y-fWryuA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.