Re: [Django] #34597: Django ORM query SQL generation changed (and is apparently a lot slower)

2023-05-28 Thread Django
#34597: Django ORM query SQL generation changed (and is apparently a lot slower)
-+-
 Reporter:  Lorand Varga |Owner:  nobody
 Type:   |   Status:  new
  Cleanup/optimization   |
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by Lorand Varga):

 * status:  closed => new
 * resolution:  needsinfo =>


Comment:

 Adding more information:

 Django 2.2 behavior:

 {{{
 EXPLAIN ANALYZE
 SELECT
   "test_app_blog"."id",
   "test_app_blog"."created_at",
   "test_app_blog"."updated_at",
   "test_app_blog"."is_published",
   "test_app_blog"."api_has_translation"
 FROM
   "test_app_blog"
 WHERE
   (
 "test_app_blog"."is_published" = True
 AND "test_app_blog"."api_has_translation" = False
 AND NOT (
   "test_app_blog"."id" IN (
 SELECT
   U0."id"
 FROM
   "test_app_blog" U0
   LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
 U1."blog_id")
 WHERE
   (
 U1."id" IS NULL
 AND U0."id" = ("test_app_blog"."id")
   )
   )
 )
   )
 ORDER BY
   "test_app_blog"."updated_at" DESC

 Sort  (cost=1047131.99..1047357.72 rows=90293 width=1985) (actual
 time=888.530..888.533 rows=4 loops=1)
   Sort Key: test_app_blog.updated_at DESC
   Sort Method: quicksort  Memory: 33kB
   ->  Index Scan using test_app_blog_is_published_4b47c652_uniq on
 test_app_blog  (cost=0.42..995257.82 rows=90293 width=1985) (actual
 time=72.414..888.496 rows=4 loops=1)
 Index Cond: (is_published = true)
 Filter: ((NOT api_has_translation) AND (NOT (SubPlan 1)))
 Rows Removed by Filter: 191941
 SubPlan 1
   ->  Nested Loop Left Join  (cost=0.71..8.57 rows=1 width=4)
 (actual time=0.003..0.003 rows=1 loops=179902)
 Join Filter: (u0.id = u1.blog_id)
 Filter: (u1.id IS NULL)
 Rows Removed by Filter: 0
 ->  Index Only Scan using test_app_blog_pkey on
 test_app_blog u0  (cost=0.42..2.44 rows=1 width=4) (actual
 time=0.002..0.002 rows=1 loops=179902)
   Index Cond: (id = test_app_blog.id)
   Heap Fetches: 27765
 ->  Index Scan using test_app_translation_51c6d5db on
 test_app_translation u1  (cost=0.29..6.07 rows=5 width=8) (actual
 time=0.001..0.001 rows=0 loops=179902)
   Index Cond: (blog_id = test_app_blog.id)
 Planning Time: 0.500 ms
 Execution Time: 888.606 ms
 }}}

 Django 3.2 (and 4.2) behavior:
 {{{
 EXPLAIN ANALYZE
 SELECT
   "test_app_blog"."id",
   "test_app_blog"."created_at",
   "test_app_blog"."updated_at",
   "test_app_blog"."is_published",
   "test_app_blog"."api_has_translation"
 FROM
   "test_app_blog"
 WHERE
   (
 "test_app_blog"."is_published"
 AND NOT "test_app_blog"."api_has_translation"
 AND NOT (
   EXISTS(
 SELECT
   (1) AS "a"
 FROM
   "test_app_blog" U0
   LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" =
 U1."blog_id")
 WHERE
   (
 U1."id" IS NULL
 AND U0."id" = "test_app_blog"."id"
 AND "test_app_blog"."id" = "test_app_blog"."id"
   )
 LIMIT
   1
   )
 )
   )
 ORDER BY
   "test_app_blog"."updated_at" DESC

  Nested Loop Anti Join  (cost=1001.15..131927.02 rows=180586 width=1985)
 (actual time=208264.409..3034435.256 rows=4 loops=1)
Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id =
 test_app_blog.id))
Rows Removed by Join Filter: 16853800739
->  Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on
 test_app_blog  (cost=0.42..107115.43 rows=180586 width=1985) (actual
 time=1.080..17454.750 

Re: [Django] #34597: Django ORM query SQL generation changed (and is apparently a lot slower)

2023-05-25 Thread Django
#34597: Django ORM query SQL generation changed (and is apparently a lot slower)
-+-
 Reporter:  Lorand Varga |Owner:  nobody
 Type:   |   Status:  closed
  Cleanup/optimization   |
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:  needsinfo
 Keywords:   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-
Changes (by Simon Charette):

 * status:  new => closed
 * severity:  Release blocker => Normal
 * cc: Simon Charette (added)
 * type:  Bug => Cleanup/optimization
 * version:  4.2 => 3.2
 * resolution:   => needsinfo


Comment:

 More likely to be related to #32143
 (8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1) which switched to using
 `EXISTS` over `NOT IN` and was already part of Django 3.2.

 AFAIK this is the only report in three years about this change making
 things slower and the fact it jumped from a couple of seconds to hours
 makes me think there might be something else at play here (lack of
 analyze?). FWIW this change was advised [https://www.percona.com/blog/sql-
 optimizations-in-postgresql-in-vs-exists-vs-any-all-vs-join/ by this
 article] and examples in the wild of `NOT IN` performing poorly.

 Without more details such as the Postgres execution plans (use `EXPLAIN`
 on the query) it's very hard to provide any guidance here so I'll close as
 ''needsinfo'' for now as I cannot reproduce any form of slowdown with the
 provided model. The `AND "blog"."id" = "blog"."id"` part seems definitely
 fishy though so it'd be great to see if you get fast results without it as
 that appear to be a bug.

 In the mean time you can use
 `~Q(id__in=Blog.objects.filter(translation=None, id=OuterRef("id")))`
 instead of `~Q(translation=None)`.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34597#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018853e4646b-54496b3e-4852-4d0e-8bc8-92ee048ac4e2-00%40eu-central-1.amazonses.com.


[Django] #34597: Django ORM query SQL generation changed (and is apparently a lot slower)

2023-05-25 Thread Django
#34597: Django ORM query SQL generation changed (and is apparently a lot slower)
-+-
   Reporter:  Lorand |  Owner:  nobody
  Varga  |
   Type:  Bug| Status:  new
  Component:  Database   |Version:  4.2
  layer (models, ORM)|
   Severity:  Release|   Keywords:
  blocker|
   Triage Stage: |  Has patch:  0
  Unreviewed |
Needs documentation:  0  |Needs tests:  0
Patch needs improvement:  0  |  Easy pickings:  0
  UI/UX:  0  |
-+-
 My app has 2 models (making things simple so debugging is easier):
 {{{
 class Blog(models.Model):
   title = models.CharField(max_length=60, blank=True)
   is_published = models.BooleanField(default=generate_random_bool,
 db_index=True)

 class Translation(models.Model):
   blog = models.ForeignKey('Blog', on_delete=models.CASCADE,
 related_name='translation')
 }}}

 There is a reverse foreign key relation between translation and Blog. Also
 a blog can have multiple translations (or none).

 I have this code block:
 {{{
 from django.db.models import Q
 qs = Blog.objects.filter(Q(is_published=True) & ~Q(translation=None))
 print(qs.query)
 }}}

 django 2.2 ORM generates this output:
 {{{
 SELECT
   "blog"."id",
   "blog"."title",
   "blog"."is_published"
 FROM
   "blog"
 WHERE
   (
 "blog"."is_published" = True
 AND NOT (
   "blog"."id" IN (
 SELECT
   U0."id"
 FROM
   "blog" U0
   LEFT OUTER JOIN "translation" U1 ON (U0."id" = U1."blog_id")
 WHERE
   (
 U1."id" IS NULL
 AND U0."id" = ("blog"."id")
   )
   )
 )
   )
 }}}

 For the same code block the django 3.2+ ORM (tried also the latest version
 of django 4.2.1) it generates:
 {{{
 SELECT
   "blog"."id",
   "blog"."title",
   "blog"."is_published"
 FROM
   "blog"
 WHERE
   (
 "blog"."is_published"
 AND NOT (
   EXISTS(
 SELECT
   (1) AS "a"
 FROM
   "blog" U0
   LEFT OUTER JOIN "translation" U1 ON (U0."id" = U1."blog_id")
 WHERE
   (
 U1."id" IS NULL
 AND U0."id" = "blog"."id"
 AND "blog"."id" = "blog"."id"
   )
 LIMIT
   1
   )
 )
   )
 }}}
 which is a whole lot slower (at least on the same postgresql 13.7
 instance).
 The django 2.2 version executes in a couple of seconds while the newer
 version executes in half an hour.

 I think (but am not completely sure, might be terribly wrong) that the
 issue was introduced here:
 https://github.com/django/django/pull/13300/files

 Environment:
 asgiref==3.7.1
 backports.zoneinfo==0.2.1
 Django==3.2.19
 model-bakery==1.11.0
 psycopg2-binary==2.8.6
 pytz==2023.3
 sqlparse==0.4.4
 typing_extensions==4.6.1

 Db environment:
 PostgreSQL 13.7 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1
 20180712 (Red Hat 7.3.1-6), 64-bit

 Bug does not manifest in django 2.2.12.
 Bug manifests in django 3.2 and django 4.2.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34597>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070188537179c3-15612b98-395c-4d46-814a-db7b517e619c-00%40eu-central-1.amazonses.com.