#28519: Add `filter`, `exclude` and other base `QuerySet` methods to combined
`QuerySet`
-------------------------------------+-------------------------------------
     Reporter:  Stanislav Karpov     |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  1.11
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  union,               |             Triage Stage:
  intersection, difference           |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Description changed by Stanislav Karpov:

Old description:

> Hello.
>
> I found a discussion about the filtering of combined `QuerySet` (union,
> intersection, difference):
> [https://github.com/django/django/pull/7727#issuecomment-269283216]
> As I understood it, as a result of the discussion, it was suggested to
> filter the queries before union:
>   If I understood your code correctly (I still did not test it), you are
> still generating select * from (x union y) -- while this is more powerful
> it is also unnecessary in most cases. I'd also argue that it is less
> efficient, cause filtering on that result can (usually and probably
> should) be pushed down to the underlying queries (when possible) to limit
> the result set before joining them.
> [https://github.com/django/django/pull/7727#issuecomment-269461236]
>
> But such a decision blocks the possibility of re-use of `QuerySet`,
> which is very convenient when working with `GenericForeignKey`.
>
> **A similar example of the structure of a real project (only a structure,
> not an entities).**
>
> {{{
> import typing
>
> from django.db import models
> from django.contrib.contenttypes.fields import GenericForeignKey,
> GenericRelation
>

> class Manager(models.Model):
>     first_name = models.CharField(max_length=255)
>     last_name = models.CharField(max_length=255)
>

> class Organization(models.Model):
>     # pickup
>     title = models.CharField(max_length=255)
>     foundation_date = models.DateField()
>     manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
>

> class Warehouse(models.Model):
>     # pickup
>     organization = models.ForeignKey(Organization,
> on_delete=models.CASCADE)
>     address = models.TextField()
>

> class City(models.Model):
>     # bus
>     title = models.CharField(max_length=255)
>     code = models.CharField(max_length=10)
>     manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
>

> class Depot(models.Model):
>     # bus
>     title = models.CharField(max_length=255)
>     city = models.ForeignKey(City, on_delete=models.CASCADE)
>

> class Garage(models.Model):
>     # wagon
>     address = models.TextField()
>     manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
>

> class AirbagManager(models.Manager):
>     def filter_by_manager__union_and_inner_join(
>         self, manager: typing.Union[Manager, int],
>     ) -> models.QuerySet:
>         manager_id = self._clean_manager_id(manager)
>         return (
>             self
>             # If I'm not mistaken, now we can't direct call `union` from
> `Manager`,
>             # but this variant increases the readability
>             .union(
>                 # INNER JOIN, INNER JOIN, INNER JOIN
> self.filter(pickup__warehouse__organization__manager_id=manager_id),
>                 # INNER JOIN, INNER JOIN, INNER JOIN
>                 self.filter(bus__depot__city__manager_id=manager_id),
>                 # INNER JOIN, INNER JOIN
>                 self.filter(wagon__garage__manager_id=manager_id),
>             )
>         )
>
>     def filter_by_manager__left_join(
>         self, manager: typing.Union[Manager, int],
>     ) -> models.QuerySet:
>         # NOT WORKING (only suggestion)
>
>         manager_id = self._clean_manager_id(manager)
>         return self.filter(
>             # LEFT JOIN, LEFT JOIN, LEFT JOIN
> models.Q(pickup__warehouse__organization__manager_id=manager_id)
>             # LEFT JOIN, LEFT JOIN, LEFT JOIN
>             | models.Q(bus__depot__city__manager_id=manager_id)
>             # LEFT JOIN, LEFT JOIN
>             | models.Q(wagon__garage__manager_id=manager_id),
>         )
>
>     def _clean_manager_id(self, manager: typing.Union[Manager, int]) ->
> int:
>         if isinstance(manager, Manager):
>             return manager.id
>         elif isinstance(manager, int):
>             return manager
>         else:
>             raise ValueError
>

> class Airbag(models.Model):
>     serial_number = models.CharField(max_length=255)
>     state = models.IntegerField()
>
>     vehicle_content_type = models.ForeignKey(
>         'contenttypes.ContentType',
>         on_delete=models.CASCADE,
>     )
>     vehicle_id = models.IntegerField(db_index=True)
>     vehicle = GenericForeignKey('vehicle_content_type', 'vehicle_id')
>
>     objects = AirbagManager()
>

> class BaseVehicle(models.Model):
>     production_date = models.DateField()
>     airbags = GenericRelation(
>         Airbag,
>         object_id_field='vehicle_id',
>         content_type_field='vehicle_content_type',
>         related_query_name='%(class)s',
>     )
>
>     class Meta:
>         abstract = True
>

> class Pickup(BaseVehicle):
>     carrying_kg = models.FloatField()
>     warehouse = models.ForeignKey(Warehouse, on_delete=models.CASCADE)
>

> class Bus(BaseVehicle):
>     floors_number = models.IntegerField()
>     depot = models.ForeignKey(Depot, on_delete=models.CASCADE)
>

> class Wagon(BaseVehicle):
>     garage = models.ForeignKey(Garage, on_delete=models.CASCADE)
>

> Airbag.objects.filter_by_manager__union_and_inner_join(15).filter(state__gt=2)
> # Expected SQL
> """
> SELECT *
> FROM (
>   (
>     SELECT
>       "airbag"."id",
>       "airbag"."serial_number",
>       "airbag"."state",
>       "airbag"."vehicle_content_type_id",
>       "airbag"."vehicle_id"
>     FROM "airbag"
>       INNER JOIN "pickup" ON ("airbag"."vehicle_id" = "pickup"."id" AND
> ("airbag"."vehicle_content_type_id" = 46))
>       INNER JOIN "warehouse" ON ("pickup"."warehouse_id" =
> "warehouse"."id")
>       INNER JOIN "organization" ON ("warehouse"."organization_id" =
> "organization"."id")
>     WHERE "organization"."manager_id" = 15
>   )
>
>   UNION
>
>   (
>     SELECT
>       "airbag"."id",
>       "airbag"."serial_number",
>       "airbag"."state",
>       "airbag"."vehicle_content_type_id",
>       "airbag"."vehicle_id"
>     FROM "airbag"
>       INNER JOIN "bus" ON ("airbag"."vehicle_id" = "bus"."id" AND
> ("airbag"."vehicle_content_type_id" = 49))
>       INNER JOIN "depot" ON ("bus"."depot_id" = "depot"."id")
>       INNER JOIN "city" ON ("depot"."city_id" = "city"."id")
>     WHERE "city"."manager_id" = 15
>   )
>
>   UNION
>   (
>     SELECT
>       "airbag"."id",
>       "airbag"."serial_number",
>       "airbag"."state",
>       "airbag"."vehicle_content_type_id",
>       "airbag"."vehicle_id"
>     FROM "airbag"
>       INNER JOIN "wagon" ON ("airbag"."vehicle_id" = "wagon"."id" AND
> ("airbag"."vehicle_content_type_id" = 43))
>       INNER JOIN "garage" ON ("wagon"."garage_id" = "garage"."id")
>     WHERE "garage"."manager_id" = 15
>   )
> ) AS U
> WHERE U.state > 2;
> """
>
> # VS
>
> Airbag.objects.filter_by_manager__left_join(15).filter(state__gt=2)
> # Real SQL
> """
> SELECT
>   "airbag"."id",
>   "airbag"."serial_number",
>   "airbag"."state",
>   "airbag"."vehicle_content_type_id",
>   "airbag"."vehicle_id"
> FROM "airbag"
>   LEFT OUTER JOIN "pickup" ON ("airbag"."vehicle_id" = "pickup"."id" AND
> ("airbag"."vehicle_content_type_id" = 46))
>   LEFT OUTER JOIN "warehouse" ON ("pickup"."warehouse_id" =
> "warehouse"."id")
>   LEFT OUTER JOIN "organization" ON ("warehouse"."organization_id" =
> "organization"."id")
>
>   LEFT OUTER JOIN "bus" ON ("airbag"."vehicle_id" = "bus"."id" AND
> ("airbag"."vehicle_content_type_id" = 49))
>   LEFT OUTER JOIN "depot" ON ("bus"."depot_id" = "depot"."id")
>   LEFT OUTER JOIN "city" ON ("depot"."city_id" = "city"."id")
>
>   LEFT OUTER JOIN "wagon" ON ("airbag"."vehicle_id" = "wagon"."id" AND
> ("airbag"."vehicle_content_type_id" = 43))
>   LEFT OUTER JOIN "garage" ON ("wagon"."garage_id" = "garage"."id")
> WHERE (
>   (
>       "organization"."manager_id" = 15
>       OR "city"."manager_id" = 15
>       OR "garage"."manager_id" = 15
>   )
>   AND "airbag"."state" > 2
> );
> """
> }}}
>
> Select from a test database containing 30 million airbags with using
> UNION and INNER JOIN takes 1.1 seconds, and using LEFT JOIN - 10.0
> seconds (as the database grows, the access time will increase).
> PostgreSQL 9.6 (default Docker image).

New description:

 Hello.

 I found a discussion about the filtering of combined `QuerySet` (union,
 intersection, difference):
 [https://github.com/django/django/pull/7727#issuecomment-269283216]
 As I understood it, as a result of the discussion, it was suggested to
 filter the queries before union:
   If I understood your code correctly (I still did not test it), you are
 still generating select * from (x union y) -- while this is more powerful
 it is also unnecessary in most cases. I'd also argue that it is less
 efficient, cause filtering on that result can (usually and probably
 should) be pushed down to the underlying queries (when possible) to limit
 the result set before joining them.
 [https://github.com/django/django/pull/7727#issuecomment-269461236]

 But such a decision blocks the possibility of re-use of `QuerySet`,  which
 is very convenient when working with `GenericForeignKey`.

 **A similar example of the structure of a real project (only a structure,
 not an entities).**

 {{{
 import typing

 from django.db import models
 from django.contrib.contenttypes.fields import GenericForeignKey,
 GenericRelation


 class Manager(models.Model):
     first_name = models.CharField(max_length=255)
     last_name = models.CharField(max_length=255)


 class Organization(models.Model):
     # pickup
     title = models.CharField(max_length=255)
     foundation_date = models.DateField()
     manager = models.ForeignKey(Manager, on_delete=models.CASCADE)


 class Warehouse(models.Model):
     # pickup
     organization = models.ForeignKey(Organization,
 on_delete=models.CASCADE)
     address = models.TextField()


 class City(models.Model):
     # bus
     title = models.CharField(max_length=255)
     code = models.CharField(max_length=10)
     manager = models.ForeignKey(Manager, on_delete=models.CASCADE)


 class Depot(models.Model):
     # bus
     title = models.CharField(max_length=255)
     city = models.ForeignKey(City, on_delete=models.CASCADE)


 class Garage(models.Model):
     # wagon
     address = models.TextField()
     manager = models.ForeignKey(Manager, on_delete=models.CASCADE)


 class AirbagManager(models.Manager):
     def filter_by_manager__union_and_inner_join(
         self, manager: typing.Union[Manager, int],
     ) -> models.QuerySet:
         # NOT WORKING (only proposal)

         manager_id = self._clean_manager_id(manager)
         return (
             self
             # If I'm not mistaken, now we can't direct call `union` from
 `Manager`,
             # but this variant increases the readability
             .union(
                 # INNER JOIN, INNER JOIN, INNER JOIN
 self.filter(pickup__warehouse__organization__manager_id=manager_id),
                 # INNER JOIN, INNER JOIN, INNER JOIN
                 self.filter(bus__depot__city__manager_id=manager_id),
                 # INNER JOIN, INNER JOIN
                 self.filter(wagon__garage__manager_id=manager_id),
             )
         )

     def filter_by_manager__left_join(
         self, manager: typing.Union[Manager, int],
     ) -> models.QuerySet:
         manager_id = self._clean_manager_id(manager)
         return self.filter(
             # LEFT JOIN, LEFT JOIN, LEFT JOIN
 models.Q(pickup__warehouse__organization__manager_id=manager_id)
             # LEFT JOIN, LEFT JOIN, LEFT JOIN
             | models.Q(bus__depot__city__manager_id=manager_id)
             # LEFT JOIN, LEFT JOIN
             | models.Q(wagon__garage__manager_id=manager_id),
         )

     def _clean_manager_id(self, manager: typing.Union[Manager, int]) ->
 int:
         if isinstance(manager, Manager):
             return manager.id
         elif isinstance(manager, int):
             return manager
         else:
             raise ValueError


 class Airbag(models.Model):
     serial_number = models.CharField(max_length=255)
     state = models.IntegerField()

     vehicle_content_type = models.ForeignKey(
         'contenttypes.ContentType',
         on_delete=models.CASCADE,
     )
     vehicle_id = models.IntegerField(db_index=True)
     vehicle = GenericForeignKey('vehicle_content_type', 'vehicle_id')

     objects = AirbagManager()


 class BaseVehicle(models.Model):
     production_date = models.DateField()
     airbags = GenericRelation(
         Airbag,
         object_id_field='vehicle_id',
         content_type_field='vehicle_content_type',
         related_query_name='%(class)s',
     )

     class Meta:
         abstract = True


 class Pickup(BaseVehicle):
     carrying_kg = models.FloatField()
     warehouse = models.ForeignKey(Warehouse, on_delete=models.CASCADE)


 class Bus(BaseVehicle):
     floors_number = models.IntegerField()
     depot = models.ForeignKey(Depot, on_delete=models.CASCADE)


 class Wagon(BaseVehicle):
     garage = models.ForeignKey(Garage, on_delete=models.CASCADE)


 Airbag.objects.filter_by_manager__union_and_inner_join(15).filter(state__gt=2)
 # Expected SQL
 """
 SELECT *
 FROM (
   (
     SELECT
       "airbag"."id",
       "airbag"."serial_number",
       "airbag"."state",
       "airbag"."vehicle_content_type_id",
       "airbag"."vehicle_id"
     FROM "airbag"
       INNER JOIN "pickup" ON ("airbag"."vehicle_id" = "pickup"."id" AND
 ("airbag"."vehicle_content_type_id" = 46))
       INNER JOIN "warehouse" ON ("pickup"."warehouse_id" =
 "warehouse"."id")
       INNER JOIN "organization" ON ("warehouse"."organization_id" =
 "organization"."id")
     WHERE "organization"."manager_id" = 15
   )

   UNION

   (
     SELECT
       "airbag"."id",
       "airbag"."serial_number",
       "airbag"."state",
       "airbag"."vehicle_content_type_id",
       "airbag"."vehicle_id"
     FROM "airbag"
       INNER JOIN "bus" ON ("airbag"."vehicle_id" = "bus"."id" AND
 ("airbag"."vehicle_content_type_id" = 49))
       INNER JOIN "depot" ON ("bus"."depot_id" = "depot"."id")
       INNER JOIN "city" ON ("depot"."city_id" = "city"."id")
     WHERE "city"."manager_id" = 15
   )

   UNION
   (
     SELECT
       "airbag"."id",
       "airbag"."serial_number",
       "airbag"."state",
       "airbag"."vehicle_content_type_id",
       "airbag"."vehicle_id"
     FROM "airbag"
       INNER JOIN "wagon" ON ("airbag"."vehicle_id" = "wagon"."id" AND
 ("airbag"."vehicle_content_type_id" = 43))
       INNER JOIN "garage" ON ("wagon"."garage_id" = "garage"."id")
     WHERE "garage"."manager_id" = 15
   )
 ) AS U
 WHERE U.state > 2;
 """

 # VS

 Airbag.objects.filter_by_manager__left_join(15).filter(state__gt=2)
 # Real SQL
 """
 SELECT
   "airbag"."id",
   "airbag"."serial_number",
   "airbag"."state",
   "airbag"."vehicle_content_type_id",
   "airbag"."vehicle_id"
 FROM "airbag"
   LEFT OUTER JOIN "pickup" ON ("airbag"."vehicle_id" = "pickup"."id" AND
 ("airbag"."vehicle_content_type_id" = 46))
   LEFT OUTER JOIN "warehouse" ON ("pickup"."warehouse_id" =
 "warehouse"."id")
   LEFT OUTER JOIN "organization" ON ("warehouse"."organization_id" =
 "organization"."id")

   LEFT OUTER JOIN "bus" ON ("airbag"."vehicle_id" = "bus"."id" AND
 ("airbag"."vehicle_content_type_id" = 49))
   LEFT OUTER JOIN "depot" ON ("bus"."depot_id" = "depot"."id")
   LEFT OUTER JOIN "city" ON ("depot"."city_id" = "city"."id")

   LEFT OUTER JOIN "wagon" ON ("airbag"."vehicle_id" = "wagon"."id" AND
 ("airbag"."vehicle_content_type_id" = 43))
   LEFT OUTER JOIN "garage" ON ("wagon"."garage_id" = "garage"."id")
 WHERE (
   (
       "organization"."manager_id" = 15
       OR "city"."manager_id" = 15
       OR "garage"."manager_id" = 15
   )
   AND "airbag"."state" > 2
 );
 """
 }}}

 Select from a test database containing 30 million airbags with using UNION
 and INNER JOIN takes 1.1 seconds, and using LEFT JOIN - 10.0 seconds (as
 the database grows, the access time will increase). PostgreSQL 9.6
 (default Docker image).

--

-- 
Ticket URL: <https://code.djangoproject.com/ticket/28519#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/063.d90b25e696f923f20b23564cf4220c41%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to