Hello everyone, I spent a couple hours last night trying to improve the migration squasher optimizer (migrations were taking almost 15 minutes in CI). I came up with a couple ideas for anyone interested in improvements: 1- Having an interactive mode for squashing would be interested. Currently, when squashing migrations, I do the following:
- Generate an initial squash - Edit it (namely, move around operations to get more optimizations to work) - remove the "replaces" tag, then rerun migration squashing to "re-optimize" - repeat until I get something I like, then add the original "replaces" tag It would be cool if instead, the process were (with a flag): - Generate an initial squash, but have the process wait for confirmation to "commit" this squash as final (though writing out the file) - Edit the file, and tell the process to try re-optimizing with the same file (getting around the "no-squash of squashes" rule) - Potentially, allow us to also step back For example, the "squashmigrations" command output could look like: generated 0001_squashed_mig.py > optmize migration[yN]? > <user inputs y> > regenerated 0001_squashed_mig.py > ( 20 operations -> 10 operations) > optimize migration[Ynr]? > <user inputs y> > regenerated 0001_squashed_mig.py > ( No change in operation count) > optimize migration[Ynr]? > <user inputs r> > rolled back to previous version > optimize migration[Ynr]? > <user inputs n> > Saved migration > A simpler version of this command would simply be to add an "optimizemigration" command that just reads in a single migration and optimizes the operations, without touching any of the squashiness. 2- The reducer might be a bit too pessimistic Currently, the optimizer lets "reduce" operations (that take 2 operations and return 0,1, or 2 operations, or None if nothing can be change) do whatever they want. Because of that, if you have [A, B, C ,D] and B depends on A, you can't reduce A and C because the reduction might remove A. In reality there are two kinds of reduction operations that we could be taking into account: - reducing "left". if you have [A, B, C, D] (for example, A is a CreateModel , C is an AddField for the same model), and you can reduce A and C into just A' (A with C), giving [A', B, D], then it doesn't matter that B depends on A. The thing that matters is if C depends on B (for example, C adds a foreign key to a model created in B). This is actually already encoded in the CreateMode + AddField reduction, but is perhaps a more general case. In a sense, reducing A and C "to the left" means that we're bringing A and C closer together only by moving C. This is a major part of the potential reductions that the current optimizer is missing. - reducing right. If you have [A, B, C, D] (for example, A is a CreateModel, C is a RemoveModel for the same model), and you can reduce A and C into just C' (C with A), giving [B, C', D], then it does matter that B depends on A. C can't depend on B (assuming causality holds in our universe) This is the current mechanism, essentially. If B depends on A, then you can't move A past B. Removing both operations is a special case of reducing right (You can make C' into a no-op). I had monkeypatched a special case of reducing left (taking CreateModel, AddField of different models and swapping them . For example [CreateModel(A), CreateModel(B), AddField(A.foo)] -> [CreateModel(A), AddField(A.foo), CreateModel(B)]) and got decent results, but I think making the optimization code express these two concepts separately would catch even more of the optimizations I saw that the optimizer didn't. I hope some of this is useful Raphael -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at https://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/6042857b-fd48-467a-badc-1cf5305ecac4%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.