Re: [Django] #31202: Bulk update suffers from poor performance with large numbers of models and columns

Django Fri, 31 Jan 2025 10:24:09 -0800

#31202: Bulk update suffers from poor performance with large numbers of models
and columns
----------------------------------------------+-------------------------------------
     Reporter:  Tom Forbes                    |                    Owner:  Tom Forbes
         Type:  Cleanup/optimization          |                   Status:  assigned
    Component:  Database layer (models, ORM)  |                  Version:  dev
     Severity:  Normal                        |               Resolution:
     Keywords:                                |             Triage Stage:  Accepted
    Has patch:  0                             |      Needs documentation:  0
  Needs tests:  0                             |  Patch needs improvement:  0
Easy pickings:  0                             |                    UI/UX:  0
----------------------------------------------+-------------------------------------
Comment (by Adam Sołtysik):

Even though the thread specifically mentions "large numbers of columns",
performance issues are noticeable even with something as simple as
`ManyToManyField`.

Let's say I have 1 million users to add to a group. Django's
`group.users.add(*user_ids)` takes 38 seconds, while the same SQL query
built directly in Python takes 10 seconds. Similarly, removing all the
users with `group.users.remove(*user_ids)` takes 7.5 seconds, while a raw
SQL query takes 2 seconds. That is a ~4x difference in total time in both
cases. Since both figures include the time spent in the DB, the gap in
query-building overhead alone is even larger, and it's not the biggest
I've seen after rewriting some other queries. I'm struggling to imagine
what could be taking Django so long to build those queries.
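
For illustration, here is a minimal sketch of what "the same SQL query
built directly in Python" could look like. The table name `group_users`,
its columns, the helper names and the batch size are all assumptions made
up for this example, not the schema or code behind the timings above. It
also does not skip rows that already exist. The demo uses the stdlib
`sqlite3` module so that it runs on its own; with Django one would
presumably pass `connection.cursor()` from `django.db` and
`placeholder="%s"` instead.

```python
import sqlite3
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_add_to_group(cursor, group_id, user_ids, batch_size=400, placeholder="?"):
    """Insert (group_id, user_id) rows with multi-row INSERT statements.

    `group_users` is a hypothetical join table used only for this sketch.
    Values are always passed as query parameters, never interpolated.
    """
    inserted = 0
    for batch in chunked(user_ids, batch_size):
        # One "(?, ?)" group per row in this batch.
        values_sql = ", ".join(f"({placeholder}, {placeholder})" for _ in batch)
        # Flatten to [group_id, user_id, group_id, user_id, ...].
        params = [value for user_id in batch for value in (group_id, user_id)]
        cursor.execute(
            f"INSERT INTO group_users (group_id, user_id) VALUES {values_sql}",
            params,
        )
        inserted += len(batch)
    return inserted


# Small self-contained demo against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_users (group_id INTEGER, user_id INTEGER)")
bulk_add_to_group(conn.cursor(), group_id=1, user_ids=range(1, 1001))
print(conn.execute("SELECT COUNT(*) FROM group_users").fetchone()[0])  # 1000
```

The batch size of 400 is chosen only to stay below SQLite's default limits
on bound parameters and compound terms in older versions; a different
backend would likely call for a different value.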

In our project we also tried using the `django-bulk-load` library, as
suggested above. It's certainly faster, but there are still some issues.
First, the library still requires creating Django object instances, which
is another known bottleneck (discussed e.g. in
https://forum.djangoproject.com/t/how-to-avoid-the-overhead-of-model-instances-in-bulk-create/25538).
Second, the `COPY FROM` approach actually
turns out to be slower than a direct `INSERT INTO` in our Postgres
database. Overall, our bare SQL queries ended up being 2x-3x faster than
operations performed with `django-bulk-load`, which seems to be worth the
slight increase in code length.
--
Ticket URL: <https://code.djangoproject.com/ticket/31202#comment:16>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
