#31202: Bulk update suffers from poor performance with large numbers of models and columns
-------------------------------------+-------------------------------------
     Reporter:  Tom Forbes           |                    Owner:  Tom Forbes
         Type:  Cleanup/optimization |                   Status:  assigned
    Component:  Database layer       |                  Version:  dev
                (models, ORM)        |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Comment (by Adam Sołtysik):

 Even though the thread specifically mentions "large numbers of columns",
 performance issues are noticeable even with something as simple as
 `ManyToManyField`.

 Let's say I have 1 million users to add to a group. Django's
 `group.users.add(*user_ids)` takes 38 seconds, while the same SQL query
 built directly in Python takes 10 seconds. Similarly, to remove all the
 users, `group.users.remove(*user_ids)` takes 7.5 seconds, while a raw SQL
 query takes 2 seconds. This is a ~4x performance difference, and since
 both timings include the time spent in the DB, the overhead on the Python
 side alone is even larger. It's also not the biggest difference I've seen
 after rewriting some other queries. I'm struggling to imagine what could
 be taking so long for Django to build those.
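
 For illustration, a hand-built version of the add and remove queries
 might look like the sketch below. The actual queries from the comment are
 not shown, so everything here is an assumption: the through-table name
 `app_group_users`, its `group_id` and `user_id` columns, and the helper
 names are hypothetical, and `sqlite3` is used only so the sketch runs on
 its own (on Postgres the placeholders would be `%s` rather than `?`).
 Unlike `add()` and `remove()`, it sends no `m2m_changed` signals and does
 no duplicate handling.

```python
import sqlite3
from itertools import islice

# Hypothetical through-table name, for illustration only.
THROUGH_TABLE = "app_group_users"


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def add_users(conn, group_id, user_ids, chunk_size=400):
    """Insert (group_id, user_id) pairs using multi-row INSERT statements."""
    for chunk in chunked(user_ids, chunk_size):
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        # Flatten the pairs into one parameter list: g, u1, g, u2, ...
        params = [value for user_id in chunk for value in (group_id, user_id)]
        conn.execute(
            f"INSERT INTO {THROUGH_TABLE} (group_id, user_id) "
            f"VALUES {placeholders}",
            params,
        )


def remove_users(conn, group_id, user_ids, chunk_size=400):
    """Delete the given users from the group, one IN (...) list per chunk."""
    for chunk in chunked(user_ids, chunk_size):
        placeholders = ", ".join("?" * len(chunk))
        conn.execute(
            f"DELETE FROM {THROUGH_TABLE} "
            f"WHERE group_id = ? AND user_id IN ({placeholders})",
            [group_id, *chunk],
        )
```

 The chunk size is kept small to stay under SQLite's default limit on bound
 parameters; other databases have their own limits, so the right value
 would need to be tuned per backend.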

 In our project we also tried using the `django-bulk-load` library, as
 suggested above. It's certainly faster, but there are still some issues.
 First, the library still requires creating Django object instances, which
 is another known bottleneck (discussed e.g. in
 https://forum.djangoproject.com/t/how-to-avoid-the-overhead-of-model-instances-in-bulk-create/25538).
 Second, the `COPY FROM` approach actually
 turns out to be slower than a direct `INSERT INTO` in our Postgres
 database. Overall, our bare SQL queries ended up being 2x-3x faster than
 operations performed with `django-bulk-load`, which seems to be worth the
 slight increase in code length.
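
 As a rough idea of what "bare SQL without model instances" can mean, the
 sketch below builds one multi-row `INSERT INTO` statement from plain
 tuples. It is not the code from our project: `build_insert` and the table
 and column names in the usage note are hypothetical, and the `%s`
 placeholder style is assumed to match what the Postgres backend expects.

```python
def build_insert(table, columns, rows):
    """Return (sql, params) for one multi-row INSERT using %s placeholders.

    `table` and `columns` must be trusted identifiers: they are interpolated
    directly into the SQL text, while row values are passed as parameters.
    """
    rows = list(rows)
    if not rows:
        raise ValueError("rows must not be empty")
    # One "(%s, %s, ...)" group per row, one placeholder per column.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    # Flatten the row tuples into a single parameter list.
    params = [value for row in rows for value in row]
    return sql, params
```

 The resulting pair could then be passed to `cursor.execute(sql, params)`
 on a cursor obtained from `django.db.connection`, for example with
 `build_insert("app_item", ["name", "price"], [("a", 1), ("b", 2)])`. For
 very large inputs the rows would still need to be split into chunks.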
-- 
Ticket URL: <https://code.djangoproject.com/ticket/31202#comment:16>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
