For an app I've been assembling, I need to filter a particular data-set based on multiple many-to-many criteria where the criteria span multiple records in the far participant of the M2M. The basics are the recordset (Items) and each Item has a M2M relationship with Properties (a key/value pair)
class Property(model): name = CharField(...) value = CharField(...) class Item(model: properties = ManyToMany(Property) I want to filter my list of items for items that have two different properties. In SQL, it would look something like SELECT * FROM x_item WHERE x_id IN ( SELECT ip.item_id FROM x_item_property ip INNER JOIN x_property p ON ip.property_id = p.id WHERE p.name = 'foo' and p.value = 'bar' ) AND x_id IN ( SELECT ip.item_id FROM x_item_property ip INNER JOIN x_property p ON ip.property_id = p.id WHERE p.name = 'baz' and p.value = 'bip' ) This query returns all the items that have both "foo=bar" and "baz=bip" from their associated property lists. This is different than just filtering the M2M relationship based on the select_related() because once you've filtered for those items that have "foo=bar", you've eliminated all rows in which "baz=bip" and thus filtering by that again would return an empty dataset. However, I'm having trouble getting Django to do all the work on the server side. I've got a workaround in place where I do items = models.Items.objects properties = models.Items.objects.select_related( ).filter(properties__name='foo' ).filter(properties__value='bar') item_ids = [item.id for item in items] # XXX items = items.filter(id__in = item_ids) properties = models.Items.objects.select_related( ).filter(properties__name='baz' ).filter(properties__value='bip') item_ids = [item.id for item in items] # XXX items = items.filter(id__in = item_ids) # do something with items While this works (this is some pseudo-code in the email, but should give you the idea of what's going on, even if there are a couple syntax errors), it pulls back the data-sets for the properties (in the XXX lines), building potentially huge lists on the Django side (rather than the DB side) and then shipping those huge lists back to the server as the clause to an IN query. I've already experienced some problems with sluggish behavior and this is a only a medium-sized record-set (about 800 IDs coming back per sub-query). Live data will actually have some that bring back thousands upon thousands. Ideally, I'd have some way of doing something like items = models.Items.objects properties = models.Items.objects.select_related( ).filter(properties__name='foo' ).filter(properties__value='bar') items = items.filter(id__in = properties) properties = models.Items.objects.select_related( ).filter(properties__name='baz' ).filter(properties__value='bip') items = items.filter(id__in = properties) # do something with items This would allow for lazy evaluation of a single query, doing all the filtering on the DB side rather than shipping huge quantities of data over the connection 3x. However, when I tried the above, the "= properties" clause expanded as a string (perhaps as a repr() of bits of the data), not as the full SELECT query, and produced bogus SQL which the server promptly complained about. Any hints on how to go about this and make the server do all the work? Push come to shove, I can monkey with an .extra() call and put the SQL in the WHERE clause by hand, but it would be nice/legible/understandable to use pure Django. Thanks, -tkc --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-users@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-users?hl=en -~----------~----~----~----~------~----~------~--~---