subject:"QuerySet.count\(\) inaccurate across ForeignKey relationships"

Re: OneToOneField direction? (was Re: QuerySet.count() inaccurate across ForeignKey relationships)

2007-11-01 Thread George Vilches


Karen Tracey wrote:
> On 11/1/07, *George Vilches* <[EMAIL PROTECTED]> wrote:
> [snip]
> 
> For reporting purposes though, we would like to be able to
> .select_related() on User, and get a cached copy of each of the OneToOne
> relationships.  It seems reasonable by the very essence of OneToOne, but
> I don't know if there's some limitation that would prevent that
> following from happening.  However, when we pull 50 users, having 50*N
> tables of extra queries when we need data from a few separate places
> makes the whole task unappealing to use the Django ORM for.  I wouldn't
> think about asking that this should be in the ORM, except that Django
> supports reverse foreign keys so intelligently that it seems a direct
> 1:1 correlation between rows is intuitive to go in either direction.
> 
> 
> In the absence of explicit support for the reverse relation on OneToOne 
> fields, can't you use the QuerySet extra() method ( 
> http://www.djangoproject.com/documentation/db-api/#extra-select-none-where-none-params-none-tables-none)
>  
> to pull in all the various tables/columns you are interested in in a 
> single query? 

We have used this in a few places already, and it's fine for just 
manipulating the data directly, but we really wanted to have instances 
of the Models themselves populated from that single query.  The models 
have a bunch of nice methods for manipulating the data, and it would be 
annoying to have to manually create instances of 10+ models and insert 
40+ columns of data into them to get to use those helper methods.  Too 
much typing. :)

 > p.s. to the Django website maintainers: pretty favicon!

Agreed, I like it.


George

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---

Re: OneToOneField direction? (was Re: QuerySet.count() inaccurate across ForeignKey relationships)

2007-11-01 Thread Karen Tracey

On 11/1/07, George Vilches <[EMAIL PROTECTED]> wrote:
[snip]

> For reporting purposes though, we would like to be able to
> .select_related() on User, and get a cached copy of each of the OneToOne
> relationships.  It seems reasonable by the very essence of OneToOne, but
> I don't know if there's some limitation that would prevent that
> following from happening.  However, when we pull 50 users, having 50*N
> tables of extra queries when we need data from a few separate places
> makes the whole task unappealing to use the Django ORM for.  I wouldn't
> think about asking that this should be in the ORM, except that Django
> supports reverse foreign keys so intelligently that it seems a direct
> 1:1 correlation between rows is intuitive to go in either direction.
>

In the absence of explicit support for the reverse relation on OneToOne
fields, can't you use the QuerySet extra() method (
http://www.djangoproject.com/documentation/db-api/#extra-select-none-where-none-params-none-tables-none)
to pull in all the various tables/columns you are interested in in a single
query?

Karen

p.s. to the Django website maintainers: pretty favicon!


Re: OneToOneField direction? (was Re: QuerySet.count() inaccurate across ForeignKey relationships)

2007-11-01 Thread Malcolm Tredinnick

On Thu, 2007-11-01 at 08:53 -0400, George Vilches wrote:
> (Off-list because this mostly doesn't apply to non qs-rf people)
> 
> Thank you for the clarification on OneToOneFields and required 
> relationships.  We've been working with a very high volume legacy 
> database that we're importing the data into, and that data's not exactly 
> pristine.  We're used to related tables solving the problems with 
> non-correlated data, it sounds like we're just going to be forced to 
> clean it up, and that's fine.  Django's ORM provides a different way of 
> approaching the problem, and we just have to be willing to take full 
> advantage of that mechanism and not the previous SQL "trickery" that's 
> been used. :)
> 
> As always, your involvement in these questions is appreciated. 
> Sometimes what we expect intuitively is a bug just isn't because we're 
> taking advantage of a "non-feature" of the system.
> 
> ---
> 
> I had one other question in there directed to you, regarding 
> OneToOneFields.  In the new qs-rf, is it *possible* to make these 
> relationships bidirectional for the purposes of caching data?

Probably best to ask these questions on django-developers in future, so
that we can get a few opinions.

It might be possible when I fix up #5020. There's probably no good
reason that won't work for reverse relations. By default
select_related() won't follow reverse relations ... there are just too
many possibilities and, by and large, it would be very inefficient,
since you generally don't care about them (in the 90% case). However,
with the features of #5020 in place, you could probably specify the
reverse relations you wanted to follow and have it work. After all, User
has an implicit field called "member_set", or something like that, in
your example, so qs.select_related('member_set') might work -- not quite
sure what the reverse for a OneToOneField looks like, because I've never
used it -- I tend to stick to ForeignKey(unique=True) for a lot of those
types of things, but that's mostly an implementation detail.

I haven't finished integrating David Cramer's work there. His idea is
quite a good one, but I have to port the whole implementation across.
I've started doing it in a branch locally, but it's still work in
progress. So I don't know if this really will work easily in practice,
but it's not beyond the realms of possibility.

> If they're not going to be currently but it is something the ORM is 
> capable of, it would be something I would work on as well as the Bit 
> class.  Here's our deal:
> 
> class User(models.Model): pass
> 
> class Member(models.Model):
>    models.OneToOneField(User)
> class MemberAvatar(models.Model):
>    models.OneToOneField(User)
> class MemberBadges(models.Model):
>    models.OneToOneField(User)
> ...
> 
> We have 10+ of these types of tables.  The data makes sense to be 
> separated because there's a lot of large data blocks in some of these 
> columns, and we have *very large* amounts of data (100GB range on one 
> implementation of this system), so having it all in one table is 
> inappropriate.
> 
> For reporting purposes though, we would like to be able to 
> .select_related() on User, and get a cached copy of each of the OneToOne 
> relationships.  It seems reasonable by the very essence of OneToOne, but 
> I don't know if there's some limitation that would prevent that 
> following from happening.  However, when we pull 50 users, having 50*N 
> tables of extra queries when we need data from a few separate places 
> makes the whole task unappealing to use the Django ORM for.

Well, I'd be rearranging my access so that it's only N extra queries:
collect all the User objects first and then add a
filter(user_id__in=[...]). But I understand your point.
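A rough sketch of that rearrangement, written against the standard-library sqlite3 module rather than the Django ORM so that it runs on its own. The table names, column names and the fetch_related() helper are assumptions made up for illustration; in Django the per-table step would be the filter(user_id__in=[...]) call mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE member (user_id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE member_avatar (user_id INTEGER PRIMARY KEY, path TEXT);
""")
conn.executemany("INSERT INTO auth_user VALUES (?, ?)",
                 [(i, "user%d" % i) for i in range(1, 51)])
conn.executemany("INSERT INTO member VALUES (?, ?)",
                 [(i, "nick%d" % i) for i in range(1, 51)])
conn.executemany("INSERT INTO member_avatar VALUES (?, ?)",
                 [(i, "/avatars/%d.png" % i) for i in range(1, 51)])


def fetch_related(conn, table, user_ids):
    """Fetch one related table in a single query, keyed by user_id.

    Hypothetical helper: `table` must come from a fixed, trusted list,
    because it is interpolated into the SQL text.
    """
    placeholders = ", ".join("?" for _ in user_ids)
    sql = "SELECT * FROM %s WHERE user_id IN (%s)" % (table, placeholders)
    return {row[0]: row for row in conn.execute(sql, user_ids)}


# One query for the users, then one query per related table (N extra
# queries) instead of one query per user per table (50 * N).
users = conn.execute("SELECT id, username FROM auth_user ORDER BY id").fetchall()
user_ids = [u[0] for u in users]
related = {t: fetch_related(conn, t, user_ids)
           for t in ("member", "member_avatar")}

# Stitch the pieces together in Python.
report = [(name, related["member"][uid][1], related["member_avatar"][uid][1])
          for uid, name in users]
```

The cost is that the stitching happens in Python rather than in a single JOIN, which is the trade-off being described here.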

For some things, though, Django's ORM isn't going to be ideal. We aren't
trying to be SQLAlchemy. That's intentional.

Something that's been requested in the past and might be worth fleshing
out in the future (probably after queryset-refactor, though) is how to
take a result set from custom SQL and easily convert that back into a
collection of models. So that people can write their own custom SQL but
get back models in a QuerySet sort of format. Not exactly a QuerySet,
since adding extra filters probably won't work easily, but it's
interesting to play with. That's possibly even post-1.0 work, but it's
also not something that necessarily has to sit in core after the
queryset refactor lands, so I'm hoping there's some exploration that
happens there in a few months.
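To make the idea concrete, here is a minimal sketch of turning a custom SQL result set back into objects that carry helper methods. It uses sqlite3 and a dataclass as stand-ins; the Person class, its is_active() method and the rows_to_objects() helper are all hypothetical and are not Django APIs.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    id: int
    name: str
    status: int

    def is_active(self):
        # Stand-in for the kind of helper method a real model would carry.
        return self.status == 1


def rows_to_objects(cursor, cls):
    """Build one cls instance per row, matching columns to fields by name."""
    columns = [desc[0] for desc in cursor.description]
    return [cls(**dict(zip(columns, row))) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, status INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "user1", 1), (2, "user2", 0)])

# Hand-written SQL goes in, a plain list of objects comes out.
cursor = conn.execute("SELECT id, name, status FROM person ORDER BY id")
people = rows_to_objects(cursor, Person)
```

The result is a plain list, not a QuerySet, so further filters cannot be chained onto it; that matches the limitation noted above.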

Regards,
Malcolm

-- 
Why be difficult when, with a little bit of effort, you could be
impossible. 
http://www.pointy-stick.com/blog/


Re: OneToOneField direction? (was Re: QuerySet.count() inaccurate across ForeignKey relationships)

2007-11-01 Thread George Vilches


Alright, I guess it's not offlist.  Sorry for the extra chatter folks.

George Vilches wrote:
> (Off-list because this mostly doesn't apply to non qs-rf people)
> 
> Thank you for the clarification on OneToOneFields and required 
> relationships.  We've been working with a very high volume legacy 
> database that we're importing the data into, and that data's not exactly 
> pristine.  We're used to related tables solving the problems with 
> non-correlated data, it sounds like we're just going to be forced to 
> clean it up, and that's fine.  Django's ORM provides a different way of 
> approaching the problem, and we just have to be willing to take full 
> advantage of that mechanism and not the previous SQL "trickery" that's 
> been used. :)
> 
> As always, your involvement in these questions is appreciated. 
> Sometimes what we expect intuitively is a bug just isn't because we're 
> taking advantage of a "non-feature" of the system.
> 
> ---
> 
> I had one other question in there directed to you, regarding 
> OneToOneFields.  In the new qs-rf, is it *possible* to make these 
> relationships bidirectional for the purposes of caching data?
> 
> If they're not going to be currently but it is something the ORM is 
> capable of, it would be something I would work on as well as the Bit 
> class.  Here's our deal:
> 
> class User(models.Model): pass
> 
> class Member(models.Model):
>    models.OneToOneField(User)
> class MemberAvatar(models.Model):
>    models.OneToOneField(User)
> class MemberBadges(models.Model):
>    models.OneToOneField(User)
> ...
> 
> We have 10+ of these types of tables.  The data makes sense to be 
> separated because there's a lot of large data blocks in some of these 
> columns, and we have *very large* amounts of data (100GB range on one 
> implementation of this system), so having it all in one table is 
> inappropriate.
> 
> For reporting purposes though, we would like to be able to 
> .select_related() on User, and get a cached copy of each of the OneToOne 
> relationships.  It seems reasonable by the very essence of OneToOne, but 
> I don't know if there's some limitation that would prevent that 
> following from happening.  However, when we pull 50 users, having 50*N 
> tables of extra queries when we need data from a few separate places 
> makes the whole task unappealing to use the Django ORM for.  I wouldn't 
> think about asking that this should be in the ORM, except that Django 
> supports reverse foreign keys so intelligently that it seems a direct 
> 1:1 correlation between rows is intuitive to go in either direction.
> 
> As always, we don't ask that you actually do anything for this if it 
> isn't done, we would just like to know if it's possible, and if you have 
> any thoughts about where the likely place to go about fixing it is in 
> qs-rf.  Or, you can tell us that we're wrong about our assumption that 
> OneToOneFields should be bidirectional, because of XXX, and we'll 
> respect your decisions.  You know what's better for Django than us, by 
> far. :)
> 
> 
> Thanks,
> George
> 
> 
> Malcolm Tredinnick wrote:
>> On Thu, 2007-11-01 at 00:43 -0400, George Vilches wrote:
>>> Karen Tracey wrote:
>>>> On 10/31/07, *George Vilches* <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Or (I just saw your follow-up e-mail), is all of this a moot point
>>>> since something like this is going to be made totally invalid in the
>>>> future?
>>>>
>>>>
>>>> That's it.  Targeting a non-unique column in a ForeignKey field is
>>>> invalid.  It's invalid now, and will continue to be invalid.  The
>>>> difference is that it's not caught and flagged as an error now, and will
>>>> be when queryset-refactor gets merged into trunk.
>>>>
>>>> If you can whittle down your real problem to an example that does not
>>>> include this kind of invalid relationship (you said you are actually
>>>> using OneToOne fields?) maybe we could make some progress on
>>>> understanding/solving the real issue you are facing.
>>> Here's an example that's a lot closer to home:
>>>
>>> class Person(models.Model):
>>>  name = models.CharField(max_length=100)
>>>  status = models.IntegerField()
>>>
>>> class PersonInfo(models.Model):
>>>  person = models.OneToOneField(Person)
>>>  phone = models.CharField(max_length=10)
>>>
>>>
>>>
>>>  >>> from qs.models import *
>>>  >>> Person.objects.create(name='user1', status=1)
>>> <Person: Person object>
>>>  >>> PersonInfo.objects.create(person_id=1, phone='111')
>>> <PersonInfo: PersonInfo object>
>>>  >>> PersonInfo.objects.create(person_id=99, phone='222')
>>> <PersonInfo: PersonInfo object>
>>>
>>> Yes, I know.  At this point we've added a PersonInfo that doesn't map to 
>>> an existing person (id=99).
>>>
>>> If at this point you're saying, "well that's stupid", I would ask you to 
>>> point out where it says that a OneToOneField has any sort of forced 
>>> referential integrity across the relationship (it most definitely is not 
>>> generating constraints like the unique ForeignKey does).  
>> On

OneToOneField direction? (was Re: QuerySet.count() inaccurate across ForeignKey relationships)

2007-11-01 Thread George Vilches

(Off-list because this mostly doesn't apply to non qs-rf people)

Thank you for the clarification on OneToOneFields and required 
relationships.  We've been working with a very high volume legacy 
database that we're importing the data into, and that data's not exactly 
pristine.  We're used to related tables solving the problems with 
non-correlated data, it sounds like we're just going to be forced to 
clean it up, and that's fine.  Django's ORM provides a different way of 
approaching the problem, and we just have to be willing to take full 
advantage of that mechanism and not the previous SQL "trickery" that's 
been used. :)

As always, your involvement in these questions is appreciated. 
Sometimes what we expect intuitively is a bug just isn't because we're 
taking advantage of a "non-feature" of the system.

---

I had one other question in there directed to you, regarding 
OneToOneFields.  In the new qs-rf, is it *possible* to make these 
relationships bidirectional for the purposes of caching data?

If they're not going to be currently but it is something the ORM is 
capable of, it would be something I would work on as well as the Bit 
class.  Here's our deal:

class User(models.Model): pass

class Member(models.Model):
   models.OneToOneField(User)
class MemberAvatar(models.Model):
   models.OneToOneField(User)
class MemberBadges(models.Model):
   models.OneToOneField(User)
...

We have 10+ of these types of tables.  The data makes sense to be 
separated because there's a lot of large data blocks in some of these 
columns, and we have *very large* amounts of data (100GB range on one 
implementation of this system), so having it all in one table is 
inappropriate.

For reporting purposes though, we would like to be able to 
.select_related() on User, and get a cached copy of each of the OneToOne 
relationships.  It seems reasonable by the very essence of OneToOne, but 
I don't know if there's some limitation that would prevent that 
following from happening.  However, when we pull 50 users, having 50*N 
tables of extra queries when we need data from a few separate places 
makes the whole task unappealing to use the Django ORM for.  I wouldn't 
think about asking that this should be in the ORM, except that Django 
supports reverse foreign keys so intelligently that it seems a direct 
1:1 correlation between rows is intuitive to go in either direction.

As always, we don't ask that you actually do anything for this if it 
isn't done, we would just like to know if it's possible, and if you have 
any thoughts about where the likely place to go about fixing it is in 
qs-rf.  Or, you can tell us that we're wrong about our assumption that 
OneToOneFields should be bidirectional, because of XXX, and we'll 
respect your decisions.  You know what's better for Django than us, by 
far. :)

Thanks,
George

Malcolm Tredinnick wrote:
> On Thu, 2007-11-01 at 00:43 -0400, George Vilches wrote:
>> Karen Tracey wrote:
>>> On 10/31/07, *George Vilches* <[EMAIL PROTECTED]> wrote:
>>>
>>> Or (I just saw your follow-up e-mail), is all of this a moot point since
>>> something like this is going to be made totally invalid in the future?
>>>
>>>
>>> That's it.  Targeting a non-unique column in a ForeignKey field is 
>>> invalid.  It's invalid now, and will continue to be invalid.  The 
>>> difference is that it's not caught and flagged as an error now, and will 
>>> be when queryset-refactor gets merged into trunk. 
>>>
>>> If you can whittle down your real problem to an example that does not 
>>> include this kind of invalid relationship (you said you are actually 
>>> using OneToOne fields?) maybe we could make some progress on 
>>> understanding/solving the real issue you are facing.
>> Here's an example that's a lot closer to home:
>>
>> class Person(models.Model):
>>  name = models.CharField(max_length=100)
>>  status = models.IntegerField()
>>
>> class PersonInfo(models.Model):
>>  person = models.OneToOneField(Person)
>>  phone = models.CharField(max_length=10)
>>
>>
>>
>>  >>> from qs.models import *
>>  >>> Person.objects.create(name='user1', status=1)
>> <Person: Person object>
>>  >>> PersonInfo.objects.create(person_id=1, phone='111')
>> <PersonInfo: PersonInfo object>
>>  >>> PersonInfo.objects.create(person_id=99, phone='222')
>> <PersonInfo: PersonInfo object>
>>
>> Yes, I know.  At this point we've added a PersonInfo that doesn't map to 
>> an existing person (id=99).
>>
>> If at this point you're saying, "well that's stupid", I would ask you to 
>> point out where it says that a OneToOneField has any sort of forced 
>> referential integrity across the relationship (it most definitely is not 
>> generating constraints like the unique ForeignKey does).  
> 
> On the contrary, it is asking for referential integrity at database
> table creation time. Have a look at the output of "manage.py sql
> " for that app. Unfortunately, SQLite and MySQL with the MyISAM
> storage engine won't raise complaints (since they don't enforce the
>

Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread Malcolm Tredinnick


On Thu, 2007-11-01 at 15:55 +1100, Malcolm Tredinnick wrote:
[...]
>  Unfortunately, SQLite and MySQL with the MyISAM
> storage engine won't raise complaints (since they don't enforce the
> integrity) and MySQL with the InnoDB engine has what is really a bug in
> that you can't use normal SQL syntax to create the Foreign Key, so it
> might not be enforcing the constraint either.

By the way, this last item is ticket #5729. I'll fix it at some point.
Not too hard, just needs doing.

Malcolm

-- 
On the other hand, you have different fingers. 
http://www.pointy-stick.com/blog/



Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread Malcolm Tredinnick

On Thu, 2007-11-01 at 00:43 -0400, George Vilches wrote:
> Karen Tracey wrote:
> > On 10/31/07, *George Vilches* <[EMAIL PROTECTED]> wrote:
> > 
> > Or (I just saw your follow-up e-mail), is all of this a moot point since
> > something like this is going to be made totally invalid in the future?
> > 
> > 
> > That's it.  Targeting a non-unique column in a ForeignKey field is 
> > invalid.  It's invalid now, and will continue to be invalid.  The 
> > difference is that it's not caught and flagged as an error now, and will 
> > be when queryset-refactor gets merged into trunk. 
> > 
> > If you can whittle down your real problem to an example that does not 
> > include this kind of invalid relationship (you said you are actually 
> > using OneToOne fields?) maybe we could make some progress on 
> > understanding/solving the real issue you are facing.
> 
> Here's an example that's a lot closer to home:
> 
> class Person(models.Model):
>  name = models.CharField(max_length=100)
>  status = models.IntegerField()
> 
> class PersonInfo(models.Model):
>  person = models.OneToOneField(Person)
>  phone = models.CharField(max_length=10)
> 
> 
> 
>  >>> from qs.models import *
>  >>> Person.objects.create(name='user1', status=1)
> <Person: Person object>
>  >>> PersonInfo.objects.create(person_id=1, phone='111')
> <PersonInfo: PersonInfo object>
>  >>> PersonInfo.objects.create(person_id=99, phone='222')
> <PersonInfo: PersonInfo object>
> 
> Yes, I know.  At this point we've added a PersonInfo that doesn't map to 
> an existing person (id=99).
> 
> If at this point you're saying, "well that's stupid", I would ask you to 
> point out where it says that a OneToOneField has any sort of forced 
> referential integrity across the relationship (it most definitely is not 
> generating constraints like the unique ForeignKey does).  

On the contrary, it is asking for referential integrity at database
table creation time. Have a look at the output of "manage.py sql
" for that app. Unfortunately, SQLite and MySQL with the MyISAM
storage engine won't raise complaints (since they don't enforce the
integrity) and MySQL with the InnoDB engine has what is really a bug in
that you can't use normal SQL syntax to create the Foreign Key, so it
might not be enforcing the constraint either.

You don't mention what database you are using, but if you're using a
backend that doesn't enforce referential integrity, then we can't catch
these types of errors. It's up to you to be careful.
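As an illustration of a backend silently accepting the orphaned row, here is a small sqlite3 sketch. It relies on the behaviour of current SQLite builds, where foreign key enforcement is switched per connection with PRAGMA foreign_keys; that switch is an assumption about the reader's SQLite version, not something stated in this thread. The schema and the try_orphan_insert() helper are hypothetical and only mirror the Person/PersonInfo example.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    CREATE TABLE personinfo (
        person_id INTEGER PRIMARY KEY REFERENCES person (id),
        phone TEXT
    );
    INSERT INTO person VALUES (1, 'user1', 1);
"""


def try_orphan_insert(enforce):
    """Return True if a personinfo row pointing at a missing person is accepted."""
    conn = sqlite3.connect(":memory:")
    # Set the switch explicitly so the result does not depend on build defaults.
    conn.execute("PRAGMA foreign_keys = %s" % ("ON" if enforce else "OFF"))
    conn.executescript(SCHEMA)
    try:
        conn.execute("INSERT INTO personinfo VALUES (99, '222')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


accepted_without_enforcement = try_orphan_insert(enforce=False)
accepted_with_enforcement = try_orphan_insert(enforce=True)
```

With enforcement off the bad row goes in without complaint, which is the situation described above: the constraint is declared in the schema, but nothing checks it.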

> I don't think 
> it is a requirement,

It is a requirement. It's a one-to-one non-null *relationship*. It must
relate to something!

>  and would ask that you take the rest of this 
> example in that spirit.

Except that examining the results of invalid input is "garbage in,
garbage out". You can't draw conclusions (in fact, you're getting
garbage back because of this bad input; it's not just hypothetical).

>  >>> PersonInfo.objects.count()
> 2L
>  >>> PersonInfo.objects.filter(person__id__gte=-1).count()
> 2L
>  >>> PersonInfo.objects.filter(person__status__gte=-1).count()
> 1L
> 
> So the filter has to be on a non-related field in order to get the query 
> to force the INNER JOIN and not optimize the result out.
> 
> More importantly, I'm getting different counts, and the reasoning seems 
> to be based on an internal optimization. 

The optimisation is entirely valid. What Django does is remove any
comparison to a remote primary key. Since we are comparing to a
non-null, unique field the value on table A's field (the one doing the
referencing) must be the same as the value on table B's primary key
field. No information is lost in removing this join.
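The three counts from the example can be reproduced at the SQL level. The sketch below uses sqlite3 with hand-written queries that approximate the two query shapes being discussed (join removed versus join kept); they are assumptions for illustration, not the exact SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    CREATE TABLE personinfo (person_id INTEGER PRIMARY KEY, phone TEXT);
    INSERT INTO person VALUES (1, 'user1', 1);
    INSERT INTO personinfo VALUES (1, '111');
    INSERT INTO personinfo VALUES (99, '222');  -- orphan: no person with id 99
""")


def count(sql):
    return conn.execute(sql).fetchone()[0]


# No filter: count the referencing table directly.
total = count("SELECT COUNT(*) FROM personinfo")

# Filter on the remote primary key: the join can be dropped and the local
# person_id column compared instead, so the orphan row is still counted.
by_local_key = count("SELECT COUNT(*) FROM personinfo WHERE person_id >= -1")

# Filter on another remote column: the join is required, and the INNER JOIN
# discards the orphan row.
by_remote_column = count("""
    SELECT COUNT(*) FROM personinfo
    INNER JOIN person ON personinfo.person_id = person.id
    WHERE person.status >= -1
""")
```

With valid data (no orphan row) all three counts agree, which is why dropping the join loses no information.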

There is a subtle bug in this optimisation for to_field attributes, but
that's not what you're seeing. You no doubt came across it when
searching for related tickets whilst investigating this (#4088, #4306).

In your example, you've broken referential integrity, which invalidates
one of the invariants of the optimisation. But it's your input data that
is bad. Given invalid input data, all bets are off as to what output you
get.

Please stop trying to twist the ORM beyond all expectations. If you have
relations, they must actually refer to something. Uniquely. You are
poking into edge cases that really don't even look like they should
work intuitively, and certainly don't in practice.

Regards,
Malcolm

-- 
Despite the cost of living, have you noticed how popular it remains? 
http://www.pointy-stick.com/blog/


Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread George Vilches

Karen Tracey wrote:
> On 10/31/07, *George Vilches* <[EMAIL PROTECTED]> wrote:
> 
> Or (I just saw your follow-up e-mail), is all of this a moot point since
> something like this is going to be made totally invalid in the future?
> 
> 
> That's it.  Targeting a non-unique column in a ForeignKey field is 
> invalid.  It's invalid now, and will continue to be invalid.  The 
> difference is that it's not caught and flagged as an error now, and will 
> be when queryset-refactor gets merged into trunk. 
> 
> If you can whittle down your real problem to an example that does not 
> include this kind of invalid relationship (you said you are actually 
> using OneToOne fields?) maybe we could make some progress on 
> understanding/solving the real issue you are facing.

Here's an example that's a lot closer to home:

class Person(models.Model):
 name = models.CharField(max_length=100)
 status = models.IntegerField()

class PersonInfo(models.Model):
 person = models.OneToOneField(Person)
 phone = models.CharField(max_length=10)

 >>> from qs.models import *
 >>> Person.objects.create(name='user1', status=1)
<Person: Person object>
 >>> PersonInfo.objects.create(person_id=1, phone='111')
<PersonInfo: PersonInfo object>
 >>> PersonInfo.objects.create(person_id=99, phone='222')
<PersonInfo: PersonInfo object>

Yes, I know.  At this point we've added a PersonInfo that doesn't map to 
an existing person (id=99).

If at this point you're saying, "well that's stupid", I would ask you to 
point out where it says that a OneToOneField has any sort of forced 
referential integrity across the relationship (it most definitely is not 
generating constraints like the unique ForeignKey does).  I don't think 
it is a requirement, and would ask that you take the rest of this 
example in that spirit.

 >>> PersonInfo.objects.count()
2L
 >>> PersonInfo.objects.filter(person__id__gte=-1).count()
2L
 >>> PersonInfo.objects.filter(person__status__gte=-1).count()
1L

So the filter has to be on a non-related field in order to get the query 
to force the INNER JOIN and not optimize the result out.

More importantly, I'm getting different counts, and the reasoning seems 
to be based on an internal optimization.  That seems like a bug to me, 
I've made an effort to get a fake filter on the other table, and am 
still denied the accurate count.

What can I do to get the JOIN without having to also generate a (very 
slow when data sets are large) useless WHERE clause?

---

Someone might suggest that I do the selection from the POV of Person: 
Person.objects.filter(personinfo__...).  This is no good because I need 
to use the data from both Person and PersonInfo (I didn't put the 
select_related() in here for clarity's sake), and OneToOneFields don't 
work in the opposite direction, so I'd get a bunch of extra queries, one 
for each instance of Person returned (to get the PersonInfo bits).  The 
intention here is for reporting, and that returns a lot of rows, so I 
need to select on the relationship in the direction that caches the most 
data.

If OneToOneFields worked in both directions from the point of view of 
select_related(), which seems like a reasonable assumption, then pretty 
much this entire e-mail would not be a problem.  Malcolm, is this 
something that might be happening in qs-rf, having a OneToOneField that 
will follow in both directions?  We have a setup that looks like this: 
User is the base class, and then Member, MemberInfo, MemberData, and 
about 5 others all have OneToOneFields to User.  We would *LOVE* to be 
able to do User.objects.select_related(depth=2) and get all those
relations in a single query.  Right now, you can get at most 2
tables/models worth of data with a single query, and all the other model
instances that are touched 
result in an extra query.  Yuck. :)

Thanks,
George


Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread Karen Tracey

On 10/31/07, George Vilches <[EMAIL PROTECTED]> wrote:

> Or (I just saw your follow-up e-mail), is all of this a moot point since
> something like this is going to be made totally invalid in the future?
>

That's it.  Targeting a non-unique column in a ForeignKey field is invalid.
It's invalid now, and will continue to be invalid.  The difference is that
it's not caught and flagged as an error now, and will be when
queryset-refactor gets merged into trunk.

If you can whittle down your real problem to an example that does not
include this kind of invalid relationship (you said you are actually using
OneToOne fields?) maybe we could make some progress on understanding/solving
the real issue you are facing.
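To see why a non-unique target column is a problem, here is a sketch with sqlite3. The assembly and item_group tables and the group_code column are hypothetical; the models from the original report are not shown in this part of the thread, so this only imitates the shape of the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- group_code is NOT unique in item_group: three rows share each code.
    CREATE TABLE item_group (id INTEGER PRIMARY KEY, group_code TEXT);
    CREATE TABLE assembly (id INTEGER PRIMARY KEY, group_code TEXT);
""")
conn.executemany("INSERT INTO item_group (group_code) VALUES (?)",
                 [("A",), ("A",), ("A",), ("B",), ("B",), ("B",)])
conn.executemany("INSERT INTO assembly (group_code) VALUES (?)",
                 [("A",), ("B",)])


def count(sql):
    return conn.execute(sql).fetchone()[0]


# Two assembly rows exist.
plain = count("SELECT COUNT(*) FROM assembly")

# Joining on the non-unique column repeats each assembly once per match.
joined = count("""
    SELECT COUNT(*) FROM assembly
    INNER JOIN item_group ON assembly.group_code = item_group.group_code
""")

# Counting distinct assembly ids collapses the repeats again.
joined_distinct = count("""
    SELECT COUNT(DISTINCT assembly.id) FROM assembly
    INNER JOIN item_group ON assembly.group_code = item_group.group_code
""")
```

Whether the join is part of the query decides whether the answer is 2 or 6. A relation that targets a unique column cannot multiply rows this way.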

Karen


Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread Malcolm Tredinnick


On Wed, 2007-10-31 at 23:05 -0400, George Vilches wrote:
> Malcolm Tredinnick wrote:
> >>
> >>  >>> Assembly.objects.select_related()
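> >> [<Assembly: Assembly object>, <Assembly: Assembly object>,
> >> <Assembly: Assembly object>, <Assembly: Assembly object>,
> >> <Assembly: Assembly object>, <Assembly: Assembly object>]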
> >>  >>> len(Assembly.objects.select_related())
> >> 6
> >>  >>> Assembly.objects.select_related().count()
> >> 2L
> >>
> >>
> >> Since I'm using select_related(), I would expect it to follow the INNER 
> >> JOINs to do the .count().  I can force it to use the JOINs by doing this:
> > 
> > This is where you've made a mistake (aside from the to_field problem
> > Karen pointed out). Your assumption is wrong. select_related() is only
> > an optimisation as far as loading the data from the database. It should
> > *never* change the result of a quantitative query like this. If it does,
> > it would be a bug.
> > 
> > Regards,
> > Malcolm
> 
> 
> 
> So is it expected behavior then that when I use filter criteria like this:
> 
>  >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).count()
> 6L

Once again, the select_related() bit has nothing to do with it. The
filter() call is dragging in the remote table. Filters have an effect on
the result set, select_related() is just an optimisation that controls
when the result set pieces are retrieved.

Rather than addressing each of your questions one-by-one, since they're
all of the same variety, I'll simply repeat that if including
select_related() changes the result set, it's a bug. So you should try
each of your queries with and without select_related() if you're in
doubt.

Regards,
Malcolm

-- 
He who laughs last thinks slowest. 
http://www.pointy-stick.com/blog/



Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread George Vilches


Malcolm Tredinnick wrote:
>>
>>  >>> Assembly.objects.select_related()
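>> [<Assembly: Assembly object>, <Assembly: Assembly object>,
>>  <Assembly: Assembly object>, <Assembly: Assembly object>,
>>  <Assembly: Assembly object>, <Assembly: Assembly object>]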
>>  >>> len(Assembly.objects.select_related())
>> 6
>>  >>> Assembly.objects.select_related().count()
>> 2L
>>
>>
>> Since I'm using select_related(), I would expect it to follow the INNER 
>> JOINs to do the .count().  I can force it to use the JOINs by doing this:
> 
> This is where you've made a mistake (aside from the to_field problem
> Karen pointed out). Your assumption is wrong. select_related() is only
> an optimisation as far as loading the data from the database. It should
> *never* change the result of a quantitative query like this. If it does,
> it would be a bug.
> 
> Regards,
> Malcolm



So is it expected behavior then that when I use filter criteria like this:

 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).count()
6L

That I get 6 instead of 2?  (It appears that the combination of 
select_related() and filter() is now affecting the count of the Assembly 
rows in the query, which is what would be expected from a strictly SQL 
perspective).


If that's correct, then is this also correct? (rehash of the message 
about distinct()/count()):

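 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).distinct()
[<Assembly: Assembly object>, <Assembly: Assembly object>,
 <Assembly: Assembly object>, <Assembly: Assembly object>,
 <Assembly: Assembly object>, <Assembly: Assembly object>]
 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).distinct().count()
2L
 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).count()
6L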


Or (I just saw your follow-up e-mail), is all of this a moot point since 
something like this is going to be made totally invalid in the future?

Thanks,
George



Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread Malcolm Tredinnick

On Wed, 2007-10-31 at 18:45 -0400, George Vilches wrote:
> First of all, thank you for the well thought-out response, I appreciate 
> your efforts. :)
> 
> Here is the crux of the matter, without any examples, and you phrased it 
> best:
>  > "You've overridden that by
>  > specifying a to_field whose value is not unique (allowing that may be
>  > a bug in Django)."
> 
> The whole point is that I think the behavior is a bug in Django, because 
> it's inconsistent as to which the count() is generating.  I was hoping 
> someone knew the core well enough to explain this inconsistency.  Core 
> devs, is this a bug, or do all of us who have commented on it not 
> understand why this behavior gives these apparently inconsistent values?

It's a bug. It's possible, although I haven't dug into the SQL spec in
detail to confirm this yet, that it's actually invalid SQL to have a
reference to a non-unique column like this. Certainly PostgreSQL will
raise an error if you try to do it and as far as I can understand it,
that should be the correct result. The fact that some databases don't
raise an error means it's either undefined or another case of database
vendors treating the spec as a wishlist from some committee, rather than
a requirement. The root problem is that the join relation is not
well-defined in this case when you look at it from the relational
algebra perspective.

It will be a validation error once queryset-refactor is merged into
trunk. You must put unique=True on the targets of a "to_field" relation
and you might as well start doing that now.
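
A rough illustration of what a unique target means at the database level, using Python's sqlite3 module instead of Django (an assumption made so the sketch runs on its own; the table layout is borrowed from the example earlier in the thread). With a UNIQUE constraint on the column that to_field points at, the second item in group 1 cannot even be stored, so the join can never fan out:

```python
import sqlite3

# Stand-in for the Item table with unique=True on "group" (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE qs_item (id INTEGER PRIMARY KEY, name TEXT, "group" INTEGER UNIQUE)'
)
conn.execute('INSERT INTO qs_item (name, "group") VALUES (?, ?)', ("item1", 1))

try:
    # A second row with the same group value violates the UNIQUE constraint.
    conn.execute('INSERT INTO qs_item (name, "group") VALUES (?, ?)', ("item2", 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

In other words, the sample data from the original post is only possible because the target column was allowed to hold duplicates.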

Regards,
Malcolm

-- 
Many are called, few volunteer. 
http://www.pointy-stick.com/blog/


Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread Malcolm Tredinnick


On Wed, 2007-10-31 at 11:09 -0400, George Vilches wrote:
> Stated simply:
> queryset.select_related().count() with no filter criteria generates a 
> wrong query across a ForeignKey relationship.
> 
> The problem: A QuerySet operation that involves a .count() across a 
> ForeignKey relationship does not actually join in the ForeignKey tables 
> to do the select_related().count(), unless you force it to use criteria 
> on the ForeignKey table.  If the ForeignKey match happens to be the 
> primary key in the foreign table as well (not in the example), then the 
> filter can't be on it, because QuerySet seems to optimize out the JOIN 
> (less important but possibly helpful info).
> 
> 
> Example (create a new Django app and paste this right in, and then do 
> the rest from the shell):
> 
> Assume I have two models in models.py:
> 
> class Item(models.Model):
>  name = models.CharField(max_length=100)
>  group = models.IntegerField()
> 
> class Assembly(models.Model):
>  desc = models.CharField(max_length=100)
>  item_group = models.ForeignKey(Item, to_field='group')
> 
> 
> and they are populated with data like so:
> 
> from qs.models import *
> Item.objects.create(name='item1', group=1)
> Item.objects.create(name='item2', group=1)
> Item.objects.create(name='item3', group=1)
> Item.objects.create(name='item4', group=2)
> Item.objects.create(name='item5', group=2)
> Item.objects.create(name='item6', group=2)
> Assembly.objects.create(desc='as1', item_group_id=1)
> Assembly.objects.create(desc='as1', item_group_id=2)
> 
> 
> 
> Now, run these commands (I've included output from the Python interpreter):
> 
> 
>  >>> Assembly.objects.select_related()
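> [<Assembly: Assembly object>, <Assembly: Assembly object>,
>  <Assembly: Assembly object>, <Assembly: Assembly object>,
>  <Assembly: Assembly object>, <Assembly: Assembly object>]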
>  >>> len(Assembly.objects.select_related())
> 6
>  >>> Assembly.objects.select_related().count()
> 2L
> 
> 
> Since I'm using select_related(), I would expect it to follow the INNER 
> JOINs to do the .count().  I can force it to use the JOINs by doing this:

This is where you've made a mistake (aside from the to_field problem
Karen pointed out). Your assumption is wrong. select_related() is only
an optimisation as far as loading the data from the database. It should
*never* change the result of a quantitative query like this. If it does,
it would be a bug.

Regards,
Malcolm

-- 
I intend to live forever - so far so good. 
http://www.pointy-stick.com/blog/



Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread George Vilches

First of all, thank you for the well thought-out response, I appreciate 
your efforts. :)

Here is the crux of the matter, without any examples, and you phrased it 
best:
 > "You've overridden that by
 > specifying a to_field whose value is not unique (allowing that may be
 > a bug in Django)."

The whole point is that I think the behavior is a bug in Django, because 
it's inconsistent as to which the count() is generating.  I was hoping 
someone knew the core well enough to explain this inconsistency.  Core 
devs, is this a bug, or do all of us who have commented on it not 
understand why this behavior gives these apparently inconsistent values?

I'm going to have to generate a different example, because what I'm 
using here is obviously causing people to think about the wrong part of 
the problem.  The problem is that .select_related().count() has wildly 
differing behavior depending on whether there's filter() criteria on the 
ForeignKey table, and/or the use of distinct() prior to the count().

Our internal example that has a problem is related to the use of a 
OneToOneField, so there is no many-to-many relationship there.  In fact, 
it's a zero-to-one relation or a one-to-one relation, but it exhibits 
the same behavior because OneToOneFields function like ForeignKeys. 
People sometimes get upset and confused by OneToOneField, but it looks 
like I just made it worse for everybody. :)

So, my example (which attempted to simplify things with ForeignKeys) 
just caused people to try to think about the relationships here, and 
that's not what was really the problem.  For that, I apologize.

Thanks,
George

Karen Tracey wrote:
> On 10/31/07, George Vilches <[EMAIL PROTECTED]> wrote:
> 
> [snip]
> 
> > Here is what I want to count:
> >
> > "All the quantity of items necessary to complete all the selected
> > assemblies".
> >
> > That's not thinking in SQL, that's very much thinking about the objects
> > at hand.  I have Assemblies, and I need to know the total number of
> > Items.  That's a concept that can be thought about strictly from the
> > objects.
> >
> > So, if I have two assemblies, and Assembly 1 connects to 2 items, and
> > Assembly 2 connects to 3 items, I want the total result of 5 when I do a
> > count() where I refer to the Items (select_related()).
> 
> 
> A ForeignKey in the Assembly model (as you have it now) is not the right 
> way to describe the relationship you state in the above sentence.  A 
> ForeignKey is used to describe a many-to-one relationship, where the 
> "many" side is the model containing the ForeignKey and the "one" side is 
> the model that is the target of the ForeignKey.  That is, many Assembly 
> instances might "point to" the same Item instance, but each individual 
> Assembly is associated with exactly one Item.  So a single Item would 
> have a set of Assembly objects "connected" to it, not vice-versa.
> 
> This happens naturally by default because the ForeignKey uses the 
> primary key field of the target model.  You've overridden that by 
> specifying a to_field whose value is not unique (allowing that may be a 
> bug in Django). I think this is the crux of what's causing the 
> differences you see in the various ways of determining count() -- Django 
> is assuming a unique relationship which in your case does not exist.
> 
> Now, as to fixing it for your case, I'm not sure how to do it because 
> I'm not clear on the relationship between Assembly and Item.  Is it 
> many-to-one (only reversed from the way currently defined in your 
> models) or many-to-many?  If it really is many-to-one, then moving the 
> ForeignKey field over into the Item model may be all you need to do 
> (plus turning around your thinking, which often trips me up when trying 
> to work out ForeignKey relationships).
> 
> Karen


Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread Karen Tracey

On 10/31/07, George Vilches <[EMAIL PROTECTED]> wrote:
[snip]

> Here is what I want to count:
>
> "All the quantity of items necessary to complete all the selected
> assemblies".
>
> That's not thinking in SQL, that's very much thinking about the objects
> at hand.  I have Assemblies, and I need to know the total number of
> Items.  That's a concept that can be thought about strictly from the
> objects.
>
> So, if I have two assemblies, and Assembly 1 connects to 2 items, and
> Assembly 2 connects to 3 items, I want the total result of 5 when I do a
> count() where I refer to the Items (select_related()).

A ForeignKey in the Assembly model (as you have it now) is not the right way
to describe the relationship you state in the above sentence.  A ForeignKey
is used to describe a many-to-one relationship, where the "many" side is the
model containing the ForeignKey and the "one" side is the model that is the
target of the ForeignKey.  That is, many Assembly instances might "point to"
the same Item instance, but each individual Assembly is associated with
exactly one Item.  So a single Item would have a set of Assembly objects
"connected" to it, not vice-versa.

This happens naturally by default because the ForeignKey uses the primary
key field of the target model.  You've overridden that by specifying a
to_field whose value is not unique (allowing that may be a bug in Django). I
think this is the crux of what's causing the differences you see in the
various ways of determining count() -- Django is assuming a unique
relationship which in your case does not exist.

Now, as to fixing it for your case, I'm not sure how to do it because I'm
not clear on the relationship between Assembly and Item.  Is it many-to-one
(only reversed from the way currently defined in your models) or
many-to-many?  If it really is many-to-one, then moving the ForeignKey field
over into the Item model may be all you need to do (plus turning around your
thinking, which often trips me up when trying to work out ForeignKey
relationships).
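
A minimal sketch of the reversed layout, written against Python's sqlite3 module rather than Django so it runs standalone. The table and column names (assembly, item, assembly_id) and the sample rows are assumptions for illustration only; they follow the "Assembly 1 connects to 2 items, Assembly 2 connects to 3 items" description quoted above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assembly (id INTEGER PRIMARY KEY, "desc" TEXT);
    -- The foreign key now lives on the "many" side and targets a primary key.
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        name TEXT,
        assembly_id INTEGER REFERENCES assembly(id)
    );
    INSERT INTO assembly (id, "desc") VALUES (1, 'as1'), (2, 'as2');
    INSERT INTO item (name, assembly_id) VALUES
        ('item1', 1), ('item2', 1),
        ('item3', 2), ('item4', 2), ('item5', 2);
""")

# Counting assemblies and counting their items are two separate questions.
assembly_count = conn.execute("SELECT COUNT(*) FROM assembly").fetchone()[0]
total_items = conn.execute(
    "SELECT COUNT(*) FROM item WHERE assembly_id IN (1, 2)"
).fetchone()[0]
per_assembly = conn.execute(
    "SELECT assembly_id, COUNT(*) FROM item GROUP BY assembly_id ORDER BY assembly_id"
).fetchall()

print(assembly_count, total_items, per_assembly)
```

With the key on the item side, the item total comes from counting items directly, and the assembly count stays at 2 regardless of how many items each one has.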

Karen


Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread guettli . google


> But all of this is beside the point, because one way or the other, 
> Django is doing *something* wrong.  Sometimes the .count() returns 6, 
> and sometimes it returns 2 in the provided examples.

About .distinct():

Django does not count: It uses 'SELECT count(*) ...'. The results
you see are from your database. 

Have a look at the SQL statements that are actually executed, either
with [1] or with:
from django.db import connection
assert False, connection.queries


1: SQLLogMiddleware http://www.djangosnippets.org/snippets/344/
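
The same "look at the SQL" idea can be tried outside Django. The sketch below is only an analogue, not the Django API: it uses the set_trace_callback hook of Python's sqlite3 module to collect every statement sent to the database, with a made-up one-column table:

```python
import sqlite3

statements = []  # every SQL string the connection executes ends up here

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(statements.append)

conn.execute("CREATE TABLE qs_assembly (id INTEGER PRIMARY KEY)")
conn.execute("SELECT COUNT(*) FROM qs_assembly").fetchone()

for sql in statements:
    print(sql)
```

Seeing the literal statement makes it obvious whether a JOIN is present in the count query or not.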

Some days ago, I was confused, too, because one QuerySet contained
an object several times.

I didn't look at the select_related problem, since I have never used it.

 Thomas

-- 
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de



Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread George Vilches

koenb wrote:
> First things first: I would either define a separate Group object
> here, or use a many2manyfield if you don't need to know the group
> numbers.

The example was crafted because the actual models and data are part of an 
internal application that I am not allowed to publish the source of, and 
is more appropriately structured in the DB (no fake integers to 
demonstrate the problem).  This example strictly demonstrates the 
problem we're seeing internally in the most minimal set we could devise.

> Second thing: why use the assembly object if you want to count items ?
> You are thinking too much in SQL instead of ORM.
> If you say Assembly.objects.count(), I would think you want to count
> assembly objects, which should yield 2 no matter what extra stuff you
> pull in using select_related(). For the ORM side of things,
> select_related just prepopulates some related data, no more, no less
> (which is of no use in combination with count).
> In that view, it would be better if the filter would automatically use
> distinct here and also return 2, since you are still only counting
> assembly objects. If you do the same using a many2manyfield, the
> result is correct as far as I can tell.
> If you want to count items, I would use
> Item.objects.filter(whatever_filter).count().

Here is what I want to count:

"All the quantity of items necessary to complete all the selected 
assemblies".

That's not thinking in SQL, that's very much thinking about the objects 
at hand.  I have Assemblies, and I need to know the total number of 
Items.  That's a concept that can be thought about strictly from the 
objects.

So, if I have two assemblies, and Assembly 1 connects to 2 items, and 
Assembly 2 connects to 3 items, I want the total result of 5 when I do a 
count() where I refer to the Items (select_related()).

.select_related() does more than "prepopulates some related data".  It 
actively changes the face of the total information pulled, especially 
when you are filtering on ForeignKey-based criteria.  I can filter on 
something that's more than two relations away, and I would be changing 
the total face of the data that's being made available in the ORM.

But all of this is beside the point, because one way or the other, 
Django is doing *something* wrong.  Sometimes the .count() returns 6, 
and sometimes it returns 2 in the provided examples.  From your point of 
view, it should always return 2.  From my point of view, it should 
return 6 whenever .select_related() occurs.  Either way, neither of our 
behaviors are occurring, and we probably need someone more intimate with 
the Django internals to verify if this is expected behavior.

Do you see how my needs still fit within the concept of the ORM?  Also, 
do you see how there's likely still a bug here of some sort?  I'm as 
interested in isolating the exact source of the bug as I am in 
getting the proper behavior defined for this activity. :)

Thanks,
George


Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread koenb


First things first: I would either define a separate Group object
here, or use a many2manyfield if you don't need to know the group
numbers.

Second thing: why use the assembly object if you want to count items ?
You are thinking too much in SQL instead of ORM.
If you say Assembly.objects.count(), I would think you want to count
assembly objects, which should yield 2 no matter what extra stuff you
pull in using select_related(). For the ORM side of things,
select_related just prepopulates some related data, no more, no less
(which is of no use in combination with count).
In that view, it would be better if the filter would automatically use
distinct here and also return 2, since you are still only counting
assembly objects. If you do the same using a many2manyfield, the
result is correct as far as I can tell.
If you want to count items, I would use
Item.objects.filter(whatever_filter).count().

Koen






Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread George Vilches


[EMAIL PROTECTED] wrote:
> On Wed, Oct 31, 2007 at 11:09:49AM -0400, George Vilches wrote:
>> Stated simply:
>> queryset.select_related().count() with no filter criteria generates a 
>> wrong query across a ForeignKey relationship.
> 
> Hi,
> 
> do you get the correct results, if you use .distinct()?
> 
> Distinct is not enabled by default. This means, a query over foreignkeys
> can give you duplicate entries in the queryset.
> 
>  Thomas
> 

I do not get the correct results if I use .distinct(), and that's to be 
expected, because the INNER JOIN results are 6 totally unique rows.  For 
verification:

 >>> Assembly.objects.select_related().distinct().count()
2L

But the results actually expand to show things that are even more 
...wrong/strange/odd:


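 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).distinct()
[<Assembly: Assembly object>, <Assembly: Assembly object>,
 <Assembly: Assembly object>, <Assembly: Assembly object>,
 <Assembly: Assembly object>, <Assembly: Assembly object>]
 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).distinct().count()
2L
 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).count()
6L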


Why would the .distinct() show 6 objects, but the .distinct().count() 
only show a count of 2?  Something seems broken about this.
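
One way the two numbers could both come out of the same data is sketched below with Python's sqlite3 module. This is a guess at the mechanism, not a trace of what Django emits: the COUNT(DISTINCT ...) form is an assumption, and only the table layout and rows are taken from the example in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qs_item (id INTEGER PRIMARY KEY, name TEXT, "group" INTEGER);
    CREATE TABLE qs_assembly (id INTEGER PRIMARY KEY, "desc" TEXT, item_group_id INTEGER);
    INSERT INTO qs_item (name, "group") VALUES
        ('item1', 1), ('item2', 1), ('item3', 1),
        ('item4', 2), ('item5', 2), ('item6', 2);
    INSERT INTO qs_assembly ("desc", item_group_id) VALUES ('as1', 1), ('as1', 2);
""")

JOIN = (
    "FROM qs_assembly INNER JOIN qs_item "
    'ON (qs_assembly.item_group_id = qs_item."group")'
)

# DISTINCT over every selected column: each joined row carries a different
# item id, so nothing collapses.
distinct_rows = conn.execute(
    "SELECT DISTINCT qs_assembly.*, qs_item.* " + JOIN
).fetchall()

# DISTINCT applied only to the assembly key (assumed form of the count query).
distinct_assemblies = conn.execute(
    "SELECT COUNT(DISTINCT qs_assembly.id) " + JOIN
).fetchone()[0]

print(len(distinct_rows), distinct_assemblies)
```

If the count query de-duplicates on the assembly key while the row query de-duplicates on whole rows, a 6-versus-2 split is what the database would be expected to report.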


Thanks,
George


Re: QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread guettli . google


On Wed, Oct 31, 2007 at 11:09:49AM -0400, George Vilches wrote:
> 
> Stated simply:
> queryset.select_related().count() with no filter criteria generates a 
> wrong query across a ForeignKey relationship.

Hi,

do you get the correct results, if you use .distinct()?

Distinct is not enabled by default. This means, a query over foreignkeys
can give you duplicate entries in the queryset.

 Thomas

-- 
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de



QuerySet.count() inaccurate across ForeignKey relationships

2007-10-31 Thread George Vilches


Stated simply:
queryset.select_related().count() with no filter criteria generates a 
wrong query across a ForeignKey relationship.

The problem: A QuerySet operation that involves a .count() across a 
ForeignKey relationship does not actually join in the ForeignKey tables 
to do the select_related().count(), unless you force it to use criteria 
on the ForeignKey table.  If the ForeignKey match happens to be the 
primary key in the foreign table as well (not in the example), then the 
filter can't be on it, because QuerySet seems to optimize out the JOIN 
(less important but possibly helpful info).


Example (create a new Django app and paste this right in, and then do 
the rest from the shell):

Assume I have two models in models.py:

class Item(models.Model):
 name = models.CharField(max_length=100)
 group = models.IntegerField()

class Assembly(models.Model):
 desc = models.CharField(max_length=100)
 item_group = models.ForeignKey(Item, to_field='group')


and they are populated with data like so:

from qs.models import *
Item.objects.create(name='item1', group=1)
Item.objects.create(name='item2', group=1)
Item.objects.create(name='item3', group=1)
Item.objects.create(name='item4', group=2)
Item.objects.create(name='item5', group=2)
Item.objects.create(name='item6', group=2)
Assembly.objects.create(desc='as1', item_group_id=1)
Assembly.objects.create(desc='as1', item_group_id=2)



Now, run these commands (I've included output from the Python interpreter):


 >>> Assembly.objects.select_related()
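[<Assembly: Assembly object>, <Assembly: Assembly object>,
 <Assembly: Assembly object>, <Assembly: Assembly object>,
 <Assembly: Assembly object>, <Assembly: Assembly object>]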
 >>> len(Assembly.objects.select_related())
6
 >>> Assembly.objects.select_related().count()
2L


Since I'm using select_related(), I would expect it to follow the INNER 
JOINs to do the .count().  I can force it to use the JOINs by doing this:

 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).count()
6L

But that requires a dependency on the data that seems ridiculous.  By 
providing .select_related() without any filter criteria, the JOINs 
should still be done just like they would be in the normal query. 
Here's an example of the query logs:

 >>> Assembly.objects.select_related()
gives this SQL:
  SELECT `qs_assembly`.`id`, `qs_assembly`.`desc`, 
`qs_assembly`.`item_group_id`, `qs_item`.`id`, `qs_item`.`name`, 
`qs_item`.`group` FROM `qs_assembly` INNER JOIN `qs_item` ON 
(`qs_assembly`.`item_group_id` = `qs_item`.`group`)

BUT
 >>> Assembly.objects.select_related().count()
gives this SQL:
SELECT COUNT(*) FROM `qs_assembly`

Why doesn't it have an INNER JOIN?

 >>> Assembly.objects.select_related().filter(item_group__id__gte=-1).count()
gives this SQL:
  SELECT COUNT(*) FROM `qs_assembly` INNER JOIN `qs_item` ON 
(`qs_assembly`.`item_group_id` = `qs_item`.`group`)
WHERE `qs_item`.`id` >= -1

which would be what I want, but without the WHERE clause.
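
The two COUNT queries above can be replayed outside Django. The following is a minimal sketch using Python's sqlite3 module (an assumption chosen so it runs on its own; the original logs are from MySQL), with the same tables and rows as the example. It shows that the difference between 2 and 6 comes entirely from whether the JOIN is present:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE qs_item (id INTEGER PRIMARY KEY, name TEXT, "group" INTEGER);
    CREATE TABLE qs_assembly (id INTEGER PRIMARY KEY, "desc" TEXT, item_group_id INTEGER);
""")
conn.executemany(
    'INSERT INTO qs_item (name, "group") VALUES (?, ?)',
    [("item1", 1), ("item2", 1), ("item3", 1),
     ("item4", 2), ("item5", 2), ("item6", 2)],
)
conn.executemany(
    'INSERT INTO qs_assembly ("desc", item_group_id) VALUES (?, ?)',
    [("as1", 1), ("as1", 2)],
)

# Same shape as the query logged for .select_related().count()
plain_count = conn.execute("SELECT COUNT(*) FROM qs_assembly").fetchone()[0]

# Same shape as the filtered query, minus the WHERE clause
joined_count = conn.execute(
    "SELECT COUNT(*) FROM qs_assembly "
    'INNER JOIN qs_item ON (qs_assembly.item_group_id = qs_item."group")'
).fetchone()[0]

print(plain_count, joined_count)
```

Each assembly matches three items through the non-unique group column, so the joined count is three times the number of assemblies.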


This whole thing also has implications on using .select_related(depth=N) 
parameters.  The further you select_related() depth-wise, your count may 
expand or shrink even more erratically.


So, is this a bug?  If it isn't, can someone explain the difference in 
behavior between .select_related()'s repr() and 
.select_related().count()'s behavior?


Thanks,
George

