Greetings, Jazz Guitarists,

I've briefly talked about this with Markus and he mentioned that the 
subject was already brought up by Tyson Clugg but I think it deserves a 
proper discussion here.

I'm typing this from the comfort of the Django: Under the Hood sprints, so 
please excuse the poor grammar and the somewhat chaotic explanations that 
follow. I'm very tired and English is not my mother tongue. This is not a 
DEP, merely a stream of consciousness I'd love to get some feedback on.

Here are some of the problems we face when dealing with migrations:

1. Dependency resolution, which turns the migration dependency graph into an 
ordered list, happens every time you try to create or execute a migration. 
With several hundred migrations it becomes quite slow: I'm talking 
multiple-minutes kind of slow. As you can imagine, working with multiple 
branches or perfecting your migrations quickly becomes a tedious task.

2. Dependency resolution is only stable as long as the migration set is 
frozen. Sometimes introducing a new migration is enough to break existing 
migrations by causing them to execute in a slightly different order. We 
often have to backtrack and edit existing migrations and enforce a strict 
resolution order by introducing arbitrary dependencies.

3. Removing an app from a project is a nightmare. You can't migrate to the 
zero state unless the app is still there. There is no way to add "revert all 
migrations for app X" to the migration graph; it's something you need to 
run manually. There is no clean way to remove an app that was ever 
referenced in a relation. We were forced to do all kinds of hacks to get 
around this. Sometimes it's necessary to create an empty eggshell app with 
the same name, copy all the migrations there, then add the necessary data 
migrations, and finally add migrations that remove all the models, indices, 
procedures, etc. Sometimes people just leave a dead application in 
INSTALLED_APPS to avoid dealing with this.

4. Squashing migrations is wonky at best. If you create a model in one 
migration, alter one of its fields in another and then finally drop the 
model sometime later, the squashed migration will have Django try to 
execute the alter first and complain about the table not being there. Also, 
the only reason we need to squash migrations is to keep problem 1 above 
from getting even worse. If migrations were only as slow as the 
underlying SQL commands, we'd likely never squash them.

5. There's no simple way to roll back all the migrations introduced after a 
particular point in time, which would be very useful when working with 
multiple feature branches. In my current project, dropping the database 
means having to reimport over 200 MB of data snapshots. Switching branches 
requires me to look at branch diffs to determine which migrations to revert.

6. Conflict detection and resolution (makemigrations --merge) is a 
make-believe solution. It just trains people to execute the command without 
investigating whether their migration history still makes sense.
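
To make the workaround from problem 2 concrete, here is a minimal sketch of 
pinning the order of two otherwise unrelated migrations by adding an 
explicit dependency. It uses Python's standard graphlib as a stand-in for 
Django's own resolver (which works differently), and the app and migration 
names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical migration graph: each key maps to the set of migrations
# it depends on. None of these names come from a real project.
graph = {
    "shop.0001_initial": set(),
    "shop.0002_add_price": {"shop.0001_initial"},
    "billing.0001_initial": set(),
    "billing.0002_backfill": {"billing.0001_initial"},
}


def resolve(deps):
    """Flatten the dependency graph into one valid execution order."""
    return list(TopologicalSorter(deps).static_order())


def pin_before(deps, first, then):
    """Return a copy of the graph in which `then` also depends on `first`."""
    pinned = {name: set(parents) for name, parents in deps.items()}
    pinned[then].add(first)
    return pinned


# Without the extra edge nothing orders these two relative to each other;
# with it, every valid ordering runs the shop migration first.
pinned = pin_before(graph, "shop.0002_add_price", "billing.0002_backfill")
order = resolve(pinned)
```

In a real migration file the equivalent is adding ("shop", 
"0002_add_price") to the `dependencies` list of the billing migration.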


Some of these I need to dig into more deeply and probably file proper 
tickets for. For example, I have an idea for fixing 4, but it would make 1 
even slower.

I took some time to have a good long look at what other ORMs are doing. The 
graph-based dependency solving approach is rather uncommon. Most systems 
treat migrations as part of the project rather than of the packages it uses.


Possible solution (or "how I'd build it today if there were no existing 
code in Django core"):

a. Make migrations part of the project and not individual apps. This takes 
care of problem 3 above.

b. Prefix individual migration files with a UTC timestamp 
(20161105151023_add_foo) to provide a strict sorting order. This removes 
the depsolving requirement and takes care of 1 and 2. With those gone, 4 
becomes largely moot, as squashing migrations would be pointless.

c. Have reusable apps provide migration templates that Django then copies 
to my project when "makemigrations" is run.

d. Maintain a separate directory for each database connection.

e. Execute all migrations in alphabetical order (which, given the prefix, 
means by timestamp first). When an unapplied migration is followed by an 
applied one, ask the user whether to just apply it or to first unapply the 
migrations that came after it. To me this would work better than 6.

f. Migrating to a timestamp solves 5.
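
A rough sketch of how b, e and f could fit together, assuming migration 
names carry a 14-digit UTC timestamp prefix as in the example above. The 
function names are hypothetical and nothing here is existing Django API.

```python
from datetime import datetime, timezone


def migration_name(slug, now=None):
    """Build a name like 20161105151023_add_foo from a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return f"{now:%Y%m%d%H%M%S}_{slug}"


def out_of_order(all_names, applied):
    """Unapplied migrations that sort before the newest applied one (point e)."""
    if not applied:
        return []
    newest = max(applied)
    return [n for n in sorted(all_names) if n not in applied and n < newest]


def rollback_plan(applied, timestamp):
    """Applied migrations newer than `timestamp`, newest first (point f)."""
    return sorted((n for n in applied if n[:14] > timestamp), reverse=True)


# Hypothetical project state: add_bar arrived from another branch after
# add_foo had already been applied locally.
all_names = [
    "20161101000000_initial",
    "20161103120000_add_bar",
    "20161105151023_add_foo",
]
applied = {"20161101000000_initial", "20161105151023_add_foo"}

late = out_of_order(all_names, applied)
to_revert = rollback_plan(applied, "20161102000000")
```

Here `late` holds the migration the user would be asked about in e, and 
`to_revert` lists what would be unapplied when migrating back to 
20161102000000.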


Of course we do have migration support in core and it's not compatible with 
most of the list above. Any ideas? I think serializing the dependency 
solver state and reusing it between runs could be fairly low-hanging 
fruit (like "npm shrinkwrap" or yarn's lock file).
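
To illustrate the lock-file idea, here is a minimal sketch that stores the 
resolved order together with a fingerprint of the graph and reuses it while 
the graph is unchanged. It again uses graphlib as a stand-in for Django's 
resolver; the lock format and function names are assumptions of mine.

```python
import hashlib
import json
from graphlib import TopologicalSorter


def fingerprint(deps):
    """Stable hash of the dependency graph, independent of dict/set order."""
    canonical = json.dumps({k: sorted(v) for k, v in deps.items()}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def resolve_with_lock(deps, lock=None):
    """Return (order, lock), reusing the locked order if the graph matches."""
    key = fingerprint(deps)
    if lock and lock.get("fingerprint") == key:
        return lock["order"], lock  # cache hit: no dependency solving needed
    order = list(TopologicalSorter(deps).static_order())
    return order, {"fingerprint": key, "order": order}


# Hypothetical graph: each migration maps to the set it depends on.
deps = {
    "shop.0001_initial": set(),
    "shop.0002_add_price": {"shop.0001_initial"},
}

order, lock = resolve_with_lock(deps)
# The lock is plain JSON, so it could live in a file next to the project.
saved = json.loads(json.dumps(lock))
cached_order, _ = resolve_with_lock(deps, saved)
```

A lock like this only saves the resolution step when nothing has changed; 
any new or edited migration changes the fingerprint and triggers a full 
resolve.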
