#17276: Slow anti-join query against Postgres -------------------------------------+------------------------------------- Reporter: dmitry@… | Owner: nobody Type: | Status: new Cleanup/optimization | Version: 1.3 Component: Database layer | Resolution: (models, ORM) | Triage Stage: Accepted Severity: Normal | Needs documentation: 0 Keywords: orm, postgres, | Patch needs improvement: 0 join, anti-join | UI/UX: 0 Has patch: 0 | Needs tests: 0 | Easy pickings: 0 | -------------------------------------+------------------------------------- Description changed by ramiro:
Old description: > For background on this ticket please see the following two discussions: > > * http://stackoverflow.com/questions/8190474/postgres-slow-outer-query- > when-using-a-primary-key > * https://groups.google.com/group/django- > users/browse_thread/thread/7d13f2d8748b4f9f > > Basically, in a many-to-many mappings between models Student and Course, > if I want to find all instances of Students that aren't registered for > classes, I would issue the following Django query: > > Student.objects.filter(course__isnull=True) > > Django translates this into the following query: > > SELECT "student"."id", "student"."name" > FROM "student" > LEFT OUTER JOIN "course_students" > ON ("student"."id" = "course_students"."student_id") > LEFT OUTER JOIN "course" > ON ("course_students"."course_id" = "course"."id") > WHERE "course"."id" IS NULL > > The problem is that the way the WHERE clause is generated is very > inefficient (at least when used with Postgres). Changing WHERE to > "course_students"."student_id" IS NOT NULL yields orders of magnitude > improved query plan. Here's the difference I'm seeing on real data: > > Django way: (cost=1072479.36..6256437.79 rows=1 width=145) > Hand-crafted: (cost=1518.71..1533.35 rows=1 width=145) > > I'm attaching a sample project with the model already set up. To see the > generated SQL query, simply run "python manage.py anti-join." New description: For background on this ticket please see the following two discussions: * http://stackoverflow.com/questions/8190474/postgres-slow-outer-query- when-using-a-primary-key * https://groups.google.com/group/django- users/browse_thread/thread/7d13f2d8748b4f9f Basically, in a many-to-many mappings between models Student and Course, if I want to find all instances of Students that aren't registered for classes, I would issue the following Django query: {{{ Student.objects.filter(course__isnull=True) }}} Django translates this into the following query: {{{ SELECT "student"."id", "student"."name" FROM "student" LEFT OUTER JOIN "course_students" ON ("student"."id" = "course_students"."student_id") LEFT OUTER JOIN "course" ON ("course_students"."course_id" = "course"."id") WHERE "course"."id" IS NULL }}} The problem is that the way the WHERE clause is generated is very inefficient (at least when used with Postgres). Changing WHERE to "course_students"."student_id" IS NOT NULL yields orders of magnitude improved query plan. Here's the difference I'm seeing on real data: * Django way: (cost=1072479.36..6256437.79 rows=1 width=145) * Hand-crafted: (cost=1518.71..1533.35 rows=1 width=145) I'm attaching a sample project with the model already set up. To see the generated SQL query, simply run "python manage.py anti-join." -- -- Ticket URL: <https://code.djangoproject.com/ticket/17276#comment:2> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To post to this group, send email to django-updates@googlegroups.com. To unsubscribe from this group, send email to django-updates+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-updates?hl=en.