Re: [Django] #34597: Django ORM query SQL generation changed (and is apparently a lot slower)
#34597: Django ORM query SQL generation changed (and is apparently a lot slower) -+- Reporter: Lorand Varga |Owner: nobody Type: | Status: new Cleanup/optimization | Component: Database layer | Version: 3.2 (models, ORM) | Severity: Normal | Resolution: Keywords: | Triage Stage: | Unreviewed Has patch: 0| Needs documentation: 0 Needs tests: 0| Patch needs improvement: 0 Easy pickings: 0|UI/UX: 0 -+- Changes (by Lorand Varga): * status: closed => new * resolution: needsinfo => Comment: Adding more information: Django 2.2 behavior: {{{ EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" = True AND "test_app_blog"."api_has_translation" = False AND NOT ( "test_app_blog"."id" IN ( SELECT U0."id" FROM "test_app_blog" U0 LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = ("test_app_blog"."id") ) ) ) ) ORDER BY "test_app_blog"."updated_at" DESC Sort (cost=1047131.99..1047357.72 rows=90293 width=1985) (actual time=888.530..888.533 rows=4 loops=1) Sort Key: test_app_blog.updated_at DESC Sort Method: quicksort Memory: 33kB -> Index Scan using test_app_blog_is_published_4b47c652_uniq on test_app_blog (cost=0.42..995257.82 rows=90293 width=1985) (actual time=72.414..888.496 rows=4 loops=1) Index Cond: (is_published = true) Filter: ((NOT api_has_translation) AND (NOT (SubPlan 1))) Rows Removed by Filter: 191941 SubPlan 1 -> Nested Loop Left Join (cost=0.71..8.57 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=179902) Join Filter: (u0.id = u1.blog_id) Filter: (u1.id IS NULL) Rows Removed by Filter: 0 -> Index Only Scan using test_app_blog_pkey on test_app_blog u0 (cost=0.42..2.44 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=179902) Index Cond: (id = test_app_blog.id) Heap Fetches: 27765 -> Index Scan using test_app_translation_51c6d5db on test_app_translation u1 (cost=0.29..6.07 rows=5 width=8) (actual time=0.001..0.001 rows=0 loops=179902) Index Cond: (blog_id = test_app_blog.id) Planning Time: 0.500 ms Execution Time: 888.606 ms }}} Django 3.2 (and 4.2) behavior: {{{ EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" AND NOT "test_app_blog"."api_has_translation" AND NOT ( EXISTS( SELECT (1) AS "a" FROM "test_app_blog" U0 LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = "test_app_blog"."id" AND "test_app_blog"."id" = "test_app_blog"."id" ) LIMIT 1 ) ) ) ORDER BY "test_app_blog"."updated_at" DESC Nested Loop Anti Join (cost=1001.15..131927.02 rows=180586 width=1985) (actual time=208264.409..3034435.256 rows=4 loops=1) Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id = test_app_blog.id)) Rows Removed by Join Filter: 16853800739 -> Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on test_app_blog (cost=0.42..107115.43 rows=180586 width=1985) (actual time=1.080..17454.750
Re: [Django] #34597: Django ORM query SQL generation changed (and is apparently a lot slower)
#34597: Django ORM query SQL generation changed (and is apparently a lot slower) -+- Reporter: Lorand Varga |Owner: nobody Type: | Status: closed Cleanup/optimization | Component: Database layer | Version: 3.2 (models, ORM) | Severity: Normal | Resolution: needsinfo Keywords: | Triage Stage: | Unreviewed Has patch: 0| Needs documentation: 0 Needs tests: 0| Patch needs improvement: 0 Easy pickings: 0|UI/UX: 0 -+- Changes (by Simon Charette): * status: new => closed * severity: Release blocker => Normal * cc: Simon Charette (added) * type: Bug => Cleanup/optimization * version: 4.2 => 3.2 * resolution: => needsinfo Comment: More likely to be related to #32143 (8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1) which switched to using `EXISTS` over `NOT IN` and was already part of Django 3.2. AFAIK this is the only report in three years about this change making things slower and the fact it jumped from a couple of seconds to hours makes me think there might be something else at play here (lack of analyze?). FWIW this change was advised [https://www.percona.com/blog/sql- optimizations-in-postgresql-in-vs-exists-vs-any-all-vs-join/ by this article] and examples in the wild of `NOT IN` performing poorly. Without more details such as the Postgres execution plans (use `EXPLAIN` on the query) it's very hard to provide any guidance here so I'll close as ''needsinfo'' for now as I cannot reproduce any form of slowdown with the provided model. The `AND "blog"."id" = "blog"."id"` part seems definitely fishy though so it'd be great to see if you get fast results without it as that appear to be a bug. In the mean time you can use `~Q(id__in=Blog.objects.filter(translation=None, id=OuterRef("id")))` instead of `~Q(translation=None)`. -- Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:1> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/0107018853e4646b-54496b3e-4852-4d0e-8bc8-92ee048ac4e2-00%40eu-central-1.amazonses.com.
[Django] #34597: Django ORM query SQL generation changed (and is apparently a lot slower)
#34597: Django ORM query SQL generation changed (and is apparently a lot slower) -+- Reporter: Lorand | Owner: nobody Varga | Type: Bug| Status: new Component: Database |Version: 4.2 layer (models, ORM)| Severity: Release| Keywords: blocker| Triage Stage: | Has patch: 0 Unreviewed | Needs documentation: 0 |Needs tests: 0 Patch needs improvement: 0 | Easy pickings: 0 UI/UX: 0 | -+- My app has 2 models (making things simple so debugging is easier): {{{ class Blog(models.Model): title = models.CharField(max_length=60, blank=True) is_published = models.BooleanField(default=generate_random_bool, db_index=True) class Translation(models.Model): blog = models.ForeignKey('Blog', on_delete=models.CASCADE, related_name='translation') }}} There is a reverse foreign key relation between translation and Blog. Also a blog can have multiple translations (or none). I have this code block: {{{ from django.db.models import Q qs = Blog.objects.filter(Q(is_published=True) & ~Q(translation=None)) print(qs.query) }}} django 2.2 ORM generates this output: {{{ SELECT "blog"."id", "blog"."title", "blog"."is_published" FROM "blog" WHERE ( "blog"."is_published" = True AND NOT ( "blog"."id" IN ( SELECT U0."id" FROM "blog" U0 LEFT OUTER JOIN "translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = ("blog"."id") ) ) ) ) }}} For the same code block the django 3.2+ ORM (tried also the latest version of django 4.2.1) it generates: {{{ SELECT "blog"."id", "blog"."title", "blog"."is_published" FROM "blog" WHERE ( "blog"."is_published" AND NOT ( EXISTS( SELECT (1) AS "a" FROM "blog" U0 LEFT OUTER JOIN "translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = "blog"."id" AND "blog"."id" = "blog"."id" ) LIMIT 1 ) ) ) }}} which is a whole lot slower (at least on the same postgresql 13.7 instance). The django 2.2 version executes in a couple of seconds while the newer version executes in half an hour. I think (but am not completely sure, might be terribly wrong) that the issue was introduced here: https://github.com/django/django/pull/13300/files Environment: asgiref==3.7.1 backports.zoneinfo==0.2.1 Django==3.2.19 model-bakery==1.11.0 psycopg2-binary==2.8.6 pytz==2023.3 sqlparse==0.4.4 typing_extensions==4.6.1 Db environment: PostgreSQL 13.7 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit Bug does not manifest in django 2.2.12. Bug manifests in django 3.2 and django 4.2. -- Ticket URL: <https://code.djangoproject.com/ticket/34597> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/01070188537179c3-15612b98-395c-4d46-814a-db7b517e619c-00%40eu-central-1.amazonses.com.