#31869: Improving data migration using `dumpdata` and `loaddata`
-------------------------------------+-------------------------------------
               Reporter:  Matthijs   |          Owner:  nobody
  Kooijman                           |
                   Type:  New        |         Status:  new
  feature                            |
              Component:  Core       |        Version:  3.1
  (Management commands)              |
               Severity:  Normal     |       Keywords:
           Triage Stage:             |      Has patch:  0
  Unreviewed                         |
    Needs documentation:  0          |    Needs tests:  0
Patch needs improvement:  0          |  Easy pickings:  0
                  UI/UX:  0          |
-------------------------------------+-------------------------------------
 At first glance, using `manage.py dumpdata` and `loaddata` together seems
 like a great way to make a full copy of an existing Django installation
 (e.g. for migrating to a different server, or getting a local copy of your
 production data).


 Documentation suggests this should be possible. An obvious way would be to
 do `dumpdata` on one system, followed by `flush` and `loaddata` on the
 other system.
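
 As a rough sketch of that naive workflow, the helper below only builds and
 prints the three `manage.py` invocations; it does not run them. The fixture
 name `dump.json` and the helper name are illustrative assumptions, not
 anything Django prescribes.

```python
import sys


def naive_copy_commands(fixture="dump.json"):
    """Return the manage.py invocations of the naive copy, grouped by system."""
    manage = [sys.executable, "manage.py"]
    return {
        # Run on the system that holds the data to copy.
        "source": [manage + ["dumpdata", "--output", fixture]],
        # Run on the system that should receive the copy.
        "target": [
            manage + ["flush", "--no-input"],
            manage + ["loaddata", fixture],
        ],
    }


if __name__ == "__main__":
    for system, commands in naive_copy_commands().items():
        for argv in commands:
            print(system + ":", " ".join(argv))
```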

 However, when you try it, you get duplicate key errors in the
 contenttypes and similar tables, such as:

 {{{
 MySQLdb._exceptions.IntegrityError: (1062, "Duplicate entry 'someapp-somemodel' for key 'django_content_type_app_label_model_76bd3d3b_uniq'")
 }}}

 What seems to happen is that `flush`
 ([https://docs.djangoproject.com/en/dev/ref/django-admin/#flush as
 documented]) flushes all tables and then reruns "post-synchronization
 handlers", which create content-types and I think permissions and maybe
 other things as well. Since `dumpdata` does dump these tables, this
 creates a conflict.
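
 The conflict can be reproduced outside Django with a heavily simplified
 stand-in for the content types table. This is only a sketch under
 assumptions: the schema is reduced to the columns that matter, the primary
 key `7` is an arbitrary illustrative value, and a plain `INSERT` stands in
 for what `loaddata` does when the dumped primary key does not exist yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_content_type ("
    " id INTEGER PRIMARY KEY,"
    " app_label TEXT NOT NULL,"
    " model TEXT NOT NULL,"
    " UNIQUE (app_label, model))"
)

# Simplified effect of the handlers that flush re-runs: a row is recreated
# for an installed model, with a freshly assigned id.
conn.execute(
    "INSERT INTO django_content_type (app_label, model) VALUES (?, ?)",
    ("someapp", "somemodel"),
)


def load_fixture_row(connection):
    """Insert the dumped row, which carries a different id (7, arbitrary)."""
    try:
        connection.execute(
            "INSERT INTO django_content_type (id, app_label, model)"
            " VALUES (?, ?, ?)",
            (7, "someapp", "somemodel"),
        )
    except sqlite3.IntegrityError as exc:
        # The unique (app_label, model) constraint rejects the second row.
        return str(exc)
    return None


error = load_fixture_row(conn)
print(error)
```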

 Currently, I think you can prevent this by:
  - Making and importing a full database dump outside of Django (e.g. using
 mysqldump). This is a good way to guarantee a really identical copy
 (though there might be timezone issues with e.g. MySQL), but is often less
 convenient and does not work across database types (e.g. dumping a remote
 MySQL database to a local sqlite database).
  - Using natural keys when dumping. The
 [https://docs.djangoproject.com/en/dev/ref/django-admin/#dumpdata
 documentation for `dumpdata --natural-foreign`] suggests using natural
 keys when contenttypes and permissions are involved. I believe this works
 because the natural foreign keys allow associating any references to these
 tables with the autocreated versions in the target database. In addition,
 and I think the documentation does not make this explicit, you would also
 need to exclude the contenttypes, permissions and any other auto-created
 models from the dumpdata, or also add `--natural-primary`, which I believe
 makes loaddata overwrite existing data based on the natural primary key
 rather than adding new data. [[BR]]
    Having to manually exclude models is quite cumbersome for a quick
 dump-and-load cycle. Also, if the dumped database somehow contained *fewer*
 contenttypes, permissions, etc. than the autocreated ones, the newly
 loaded database would still contain the extra ones. More generally, the
 loaded database is not an identical copy of the original one.[[BR]]
    I also had some issues with this approach, due to circular references
 in my natural keys, but I think this has since been fixed in git.
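
 To illustrate the natural-key variant, the sketch below builds a `dumpdata`
 command line either with explicit excludes or with `--natural-primary`. The
 helper name, the fixture name and the list of auto-created models are
 assumptions for illustration; a real project may need to exclude more.

```python
import sys

# Assumption: the auto-created models of a typical project. Adjust as needed.
AUTO_CREATED = ("contenttypes", "auth.Permission")


def dumpdata_argv(fixture="dump.json", exclude=AUTO_CREATED, natural_primary=False):
    """Build a dumpdata invocation that uses natural foreign keys."""
    argv = [sys.executable, "manage.py", "dumpdata", "--natural-foreign"]
    if natural_primary:
        # Alternative to excluding: also serialize natural primary keys.
        argv.append("--natural-primary")
    else:
        # Leave out the models that the target system recreates by itself.
        for label in exclude:
            argv += ["--exclude", label]
    argv += ["--output", fixture]
    return argv


if __name__ == "__main__":
    print(" ".join(dumpdata_argv()))
    print(" ".join(dumpdata_argv(natural_primary=True)))
```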


 I wonder if we can make this process easier somehow?

 One solution that springs to mind is to add a `flush --no-handlers` option
 (or something like that), to prevent running the "post-synchronization
 handlers". This would (should) result in empty tables for all tables that
 are dumped by `dumpdata` (I think this means all tables empty, except for
 the migration table). Then doing a `dumpdata`, `flush --no-handlers` and
 `loaddata` could, I think, produce an exact copy of the database,
 including matching primary keys.
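
 For comparison, the `flush` command already accepts an undocumented
 `inhibit_post_migrate` option that Django uses internally. Whether it is a
 suitable basis for the proposed flag is an assumption on my part; the sketch
 below only shows how it could be passed through `call_command` from within a
 configured project (e.g. `manage.py shell`). The helper names are
 hypothetical.

```python
def flush_options(database="default"):
    """Keyword arguments for call_command("flush", ...)."""
    return {
        "database": database,
        "interactive": False,
        # Undocumented internal option of the flush command; using it as a
        # stand-in for the proposed `--no-handlers` flag is an assumption.
        "inhibit_post_migrate": True,
    }


def flush_without_handlers(database="default"):
    """Empty the tables without re-running the post-synchronization handlers."""
    # Imported lazily so flush_options() can be inspected without Django.
    from django.core.management import call_command

    call_command("flush", **flush_options(database))
```

 A `loaddata` run after `flush_without_handlers()` would then start from
 tables that were not repopulated, which is the situation the proposed flag
 aims for.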

 Or are there any other existing ways to make this easier that I missed
 and/or could be (better) documented?

-- 
Ticket URL: <https://code.djangoproject.com/ticket/31869>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
