#31869: Improving data migration using `dumpdata` and `loaddata`
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: nobody
Type: New feature | Status: new
Component: Core (Management commands) | Version: 3.1
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
At first glance, using `manage.py dumpdata` and `loaddata` together seems
like a great way to make a full copy of an existing Django installation
(e.g. for migrating to a different server, or getting a local copy of your
production data).
The documentation suggests this should be possible. An obvious way would be
to run `dumpdata` on one system, followed by `flush` and `loaddata` on the
other system.
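As a rough sketch of that workflow (not taken from the documentation), the two halves can be written as `manage.py` argument lists. The helper names `source_commands`, `target_commands` and `run`, and the fixture name `dump.json`, are made up for illustration; only the `dumpdata`, `flush` and `loaddata` commands and their `--output` and `--no-input` options are Django's.

```python
import subprocess
import sys


def source_commands(fixture="dump.json"):
    """manage.py invocations to run on the system being copied."""
    return [["dumpdata", "--output", fixture]]


def target_commands(fixture="dump.json"):
    """manage.py invocations to run on the system receiving the copy."""
    return [
        # Empties the tables, then re-runs the post-synchronization handlers.
        ["flush", "--no-input"],
        # Loads the rows dumped on the source system.
        ["loaddata", fixture],
    ]


def run(commands, manage_py="manage.py"):
    """Run each invocation in order, stopping at the first failure."""
    for args in commands:
        subprocess.run([sys.executable, manage_py, *args], check=True)
```

On the source system this would be `run(source_commands())`; after copying `dump.json` across, `run(target_commands())` on the target.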
However, when you try it, you get duplicate-key errors in the
contenttypes and similar tables, such as:
{{{
MySQLdb._exceptions.IntegrityError: (1062, "Duplicate entry 'someapp-somemodel' for key 'django_content_type_app_label_model_76bd3d3b_uniq'")
}}}
What seems to happen is that `flush`
([https://docs.djangoproject.com/en/dev/ref/django-admin/#flush as
documented]) flushes all tables and then reruns "post-synchronization
handlers", which create content-types and I think permissions and maybe
other things as well. Since `dumpdata` does dump these tables, this
creates a conflict.
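The conflict can be reproduced in miniature without Django. The following is a simplified model of my understanding, not Django code: the `content_type` table, its columns and the primary key value 7 are invented stand-ins, and plain SQL `INSERT`s stand in for what the post-synchronization handlers and `loaddata` do.

```python
import sqlite3


def simulate_flush_then_load():
    """Return the integrity error message, or None if the load succeeded."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content_type ("
        "id INTEGER PRIMARY KEY, app_label TEXT, model TEXT, "
        "UNIQUE (app_label, model))"
    )
    # A row as dumped on the source system, where it has primary key 7.
    dumped = (7, "someapp", "somemodel")

    # "flush": empty the table, then a handler re-creates the row.
    # The re-created row gets a fresh primary key, not the dumped one.
    db.execute("DELETE FROM content_type")
    db.execute(
        "INSERT INTO content_type (app_label, model) VALUES (?, ?)",
        dumped[1:],
    )

    # "loaddata": insert the dumped row with its original primary key.
    try:
        db.execute("INSERT INTO content_type VALUES (?, ?, ?)", dumped)
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None
```

Calling `simulate_flush_then_load()` returns a uniqueness error rather than `None`: no row has primary key 7, so the insert is attempted, but the `(app_label, model)` pair already exists.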
Currently, I think you can prevent this by:
- Making and importing a full database dump outside of Django (e.g. using
mysqldump). This is a good way to guarantee a truly identical copy
(though there might be timezone issues with e.g. MySQL), but it is often
less convenient and does not work across database types (e.g. dumping a
remote MySQL database to a local SQLite database).
- Using natural keys when dumping. The
[https://docs.djangoproject.com/en/dev/ref/django-admin/#dumpdata
documentation for `dumpdata --natural-foreign`] suggests using natural
keys when contenttypes and permissions are involved. I believe this works
because natural foreign keys let any references to these tables resolve to
the autocreated rows in the database being loaded into. In addition,
and I think the documentation does not make this explicit, you would also
need to exclude the contenttypes, permissions and any other auto-created
models from the dumpdata, or also add `--natural-primary`, which I believe
makes loaddata overwrite existing data based on the natural primary key
rather than adding new data. [[BR]]
Having to manually exclude models is quite cumbersome for a quick
dump-and-load cycle. Also, if the dumped database somehow contained *fewer*
contenttypes, permissions, etc. than the autocreated ones, the newly
loaded database would still contain the extra ones. More generally, the
loaded database is not an identical copy of the original one.[[BR]]
I also had some issues with this approach, due to circular references
in my natural keys, but I think this has since been fixed in git.
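For reference, a sketch of what the natural-key variant looks like as a `manage.py` argument list. The helper name `natural_key_dump_args` is hypothetical, and the default exclusion list is only my assumption about which models are auto-created in a typical project; the `--natural-foreign`, `--natural-primary`, `--exclude` and `--output` options are existing `dumpdata` options.

```python
def natural_key_dump_args(
    fixture="dump.json",
    exclude=("contenttypes", "auth.Permission"),
):
    """Build dumpdata arguments that use natural keys and skip some models."""
    args = ["dumpdata", "--natural-foreign", "--natural-primary"]
    for label in exclude:
        # Each entry is an app label or an app_label.ModelName pair.
        args += ["--exclude", label]
    args += ["--output", fixture]
    return args
```

Any other auto-created models in a given project would have to be added to `exclude` by hand, which is the cumbersome part.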
I wonder if we can make this process easier somehow?
One solution that springs to mind is to add a `flush --no-handlers` option
(or something like that), to prevent running the "post synchronization
handlers". This would (should) result in empty tables for all tables that
are dumped by `dumpdata` (I think this means all tables empty, except for
the migration table). Then doing a `dumpdata`, `flush --no-handlers` and
`loaddata` could, I think, produce an exact copy of the database,
including matching primary keys.
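For what it is worth, `flush` already seems to accept an undocumented `inhibit_post_migrate` option when invoked through `call_command()`; as far as I can tell, Django's own test teardown uses it. It is not exposed on the command line and could change without notice, so the following is only a sketch of what a `flush --no-handlers` might do. The helper name `flush_without_handlers` and its injectable `call_command` parameter are made up for illustration.

```python
def flush_without_handlers(database="default", call_command=None):
    """Empty all tables without re-running the post-migrate handlers."""
    if call_command is None:
        # Imported lazily so the function can be exercised with a stand-in.
        from django.core.management import call_command
    call_command(
        "flush",
        interactive=False,          # same effect as --no-input
        database=database,
        inhibit_post_migrate=True,  # undocumented; skips the handlers
    )
```

This would have to run inside a configured Django project on the target system (for example from `manage.py shell`), followed by `loaddata`.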
Or are there any other existing ways to make this easier that I missed
and/or could be (better) documented?
--
Ticket URL: <https://code.djangoproject.com/ticket/31869>