Hello everyone,

   I spent a couple hours last night trying to improve the migration 
squasher optimizer (migrations were taking almost 15 minutes in CI). I came 
up with a couple ideas for anyone interested in improvements:
 
 1- Having an interactive mode for squashing would be interesting. 
Currently, when squashing migrations, I do the following: 

   - Generate an initial squash
   - Edit it (namely, move around operations to get more optimizations to 
   work)
   - Remove the "replaces" attribute, then rerun migration squashing to 
   "re-optimize"
   - Repeat until I get something I like, then add the original "replaces" 
   attribute back

   It would be cool if instead, the process were (with a flag):

   - Generate an initial squash and write out the file, but have the process 
   wait for confirmation before "committing" this squash as final
   - Edit the file, and tell the process to try re-optimizing with the same 
   file (getting around the "no-squash of squashes" rule)
   - Potentially, allow us to also step back

 For example, the "squashmigrations" command output could look like:

generated 0001_squashed_mig.py
> optimize migration[yN]? 
> <user inputs y>
> regenerated 0001_squashed_mig.py
> ( 20 operations -> 10 operations)
> optimize migration[Ynr]? 
> <user inputs y>
> regenerated 0001_squashed_mig.py
> ( No change in operation count)
> optimize migration[Ynr]? 
> <user inputs r>
> rolled back to previous version
> optimize migration[Ynr]? 
> <user inputs n>
> Saved migration
>
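
To make the flow above a bit more concrete, here is a rough sketch of the 
prompt loop in plain Python. It is not tied to Django's actual command code: 
interactive_optimize, optimize_once, ask and report are all hypothetical 
names, and the operations are just strings. The step where the user edits 
the file between rounds is left out.

```python
from typing import Callable, List, Sequence

Operations = List[str]  # stand-in for a list of migration operations


def interactive_optimize(
    operations: Sequence[str],
    optimize_once: Callable[[Operations], Operations],
    ask: Callable[[str], str] = input,
    report: Callable[[str], None] = print,
) -> Operations:
    """Keep re-optimizing until the user answers 'n'; 'r' undoes one step."""
    history: List[Operations] = [list(operations)]  # history[-1] is current
    while True:
        # Anything other than 'n' or 'r' (including empty input) means yes.
        answer = ask("optimize migration [Ynr]? ").strip().lower() or "y"
        if answer == "n":
            report("Saved migration")
            return history[-1]
        if answer == "r":
            if len(history) > 1:
                history.pop()
                report("rolled back to previous version")
            else:
                report("nothing to roll back to")
            continue
        before = history[-1]
        after = list(optimize_once(before))
        history.append(after)
        if len(after) == len(before):
            report("(No change in operation count)")
        else:
            report(f"({len(before)} operations -> {len(after)} operations)")


# Scripted session: optimize twice, roll back once, then save.
answers = iter(["y", "y", "r", "n"])
log: List[str] = []
final = interactive_optimize(
    [f"op{i}" for i in range(20)],
    optimize_once=lambda ops: ops[: max(1, len(ops) // 2)],  # fake optimizer
    ask=lambda prompt: next(answers),
    report=log.append,
)
# final has 10 operations: 20 -> 10 -> 5, then one rollback to 10
```

Keeping every intermediate version in a list is what makes the "step back" 
option cheap.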


A simpler alternative would be to add an "optimizemigration" command that 
just reads in a single migration and optimizes its operations, without 
touching any of the squashiness. 


 2- The reducer might be a bit too pessimistic

 Currently, the optimizer lets "reduce" operations (that take 2 operations 
and return 0, 1, or 2 operations, or None if nothing can be changed) do 
whatever they want. Because of that, if you have [A, B, C, D] and B depends 
on A, you can't reduce A and C because the reduction might remove A.

 In reality there are two kinds of reduction operations that we could be 
taking into account:

   - Reducing "left". If you have [A, B, C, D] (for example, A is a 
CreateModel, C is an AddField for the same model), and you can reduce A 
and C into just A' (A with C), giving [A', B, D], then it doesn't matter 
that B depends on A. 

The thing that matters is whether C depends on B (for example, C adds a 
foreign key to a model created in B). This is actually already encoded in 
the CreateModel + AddField reduction, but is perhaps a more general case.

In a sense, reducing A and C "to the left" means that we're bringing A and 
C closer together only by moving C. This is a major part of the potential 
reductions that the current optimizer is missing.
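
A toy version of the "left" check, as a sketch only: Op, depends_on and 
reduce_left are made-up names, not Django's optimizer API, and the 
dependency rule is deliberately crude (same model, or a reference to the 
other operation's model).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str                      # "CreateModel" or "AddField" in this toy
    model: str                     # model the operation acts on
    fields: tuple = ()             # field names carried by the operation
    refs: frozenset = frozenset()  # other models it points at (FK targets)


def depends_on(later: Op, earlier: Op) -> bool:
    """Crude rule: same model, or `later` references `earlier`'s model."""
    return later.model == earlier.model or earlier.model in later.refs


def reduce_left(ops: List[Op], i: int, j: int) -> Optional[List[Op]]:
    """Fold ops[j] into ops[i], leaving the merged op at position i."""
    a, c = ops[i], ops[j]
    if not (a.kind == "CreateModel" and c.kind == "AddField"
            and a.model == c.model):
        return None
    # Only C moves, so only C's dependencies on the in-between ops matter.
    if any(depends_on(c, b) for b in ops[i + 1:j]):
        return None
    merged = Op("CreateModel", a.model, a.fields + c.fields, a.refs | c.refs)
    return ops[:i] + [merged] + ops[i + 1:j] + ops[j + 1:]


ops = [
    Op("CreateModel", "A", ("id",)),
    Op("CreateModel", "B", ("id", "a"), frozenset({"A"})),  # B depends on A
    Op("AddField", "A", ("foo",)),
    Op("CreateModel", "D", ("id",)),
]
reduced = reduce_left(ops, 0, 2)   # allowed even though B depends on A

# Same list, but the added field is a foreign key to B: C depends on B.
blocked = ops[:2] + [Op("AddField", "A", ("b",), frozenset({"B"}))] + ops[3:]
refused = reduce_left(blocked, 0, 2)  # None
```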

   - Reducing "right". If you have [A, B, C, D] (for example, A is a 
CreateModel, C is a DeleteModel for the same model), and you can reduce A 
and C into just C' (C with A), giving [B, C', D], then it does matter that 
B depends on A. C can't depend on B (assuming causality holds in our 
universe).

This is the current mechanism, essentially. If B depends on A, then you 
can't move A past B. 

 Removing both operations is a special case of reducing right (you can make 
C' into a no-op).
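
The "right" case, sketched the same way for the create-then-delete pair 
(DeleteModel is Django's name for the model-removal operation). Again Op, 
depends_on and reduce_right are hypothetical helpers, and C' is taken to be 
a no-op, so both operations disappear.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str                      # "CreateModel" or "DeleteModel" here
    model: str                     # model the operation acts on
    refs: frozenset = frozenset()  # other models it points at (FK targets)


def depends_on(later: Op, earlier: Op) -> bool:
    """Crude rule: same model, or `later` references `earlier`'s model."""
    return later.model == earlier.model or earlier.model in later.refs


def reduce_right(ops: List[Op], i: int, j: int) -> Optional[List[Op]]:
    """Cancel a CreateModel at i against a DeleteModel at j, if safe."""
    a, c = ops[i], ops[j]
    if not (a.kind == "CreateModel" and c.kind == "DeleteModel"
            and a.model == c.model):
        return None
    # A is the one that moves (past B), so B must not depend on A.
    if any(depends_on(b, a) for b in ops[i + 1:j]):
        return None
    # C' is a no-op in this case, so neither A nor C survives.
    return ops[:i] + ops[i + 1:j] + ops[j + 1:]


independent = [
    Op("CreateModel", "A"),
    Op("CreateModel", "B"),
    Op("DeleteModel", "A"),
    Op("CreateModel", "D"),
]
cancelled = reduce_right(independent, 0, 2)   # [B, D]

dependent = [
    Op("CreateModel", "A"),
    Op("CreateModel", "B", frozenset({"A"})),  # B has a foreign key to A
    Op("DeleteModel", "A"),
]
refused = reduce_right(dependent, 0, 2)       # None: can't move A past B
```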

I had monkeypatched a special case of reducing left (taking a CreateModel 
and an AddField of different models and swapping them; for example, 
[CreateModel(A), CreateModel(B), AddField(A.foo)] -> [CreateModel(A), 
AddField(A.foo), CreateModel(B)]) and got decent results, but I think 
making the optimization code express these two concepts separately would 
catch even more of the optimizations that the optimizer currently misses.
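
For reference, the swap amounts to something like the reordering pass below. 
This is a simplified reconstruction for illustration, not the actual patch: 
operations are plain (kind, model, referenced models) tuples and 
bubble_add_fields_left is a made-up name. The normal optimizer would then 
run on the reordered list and merge the now-adjacent pairs.

```python
from typing import FrozenSet, List, Tuple

Op = Tuple[str, str, FrozenSet[str]]  # (kind, model, models referenced)


def bubble_add_fields_left(ops: List[Op]) -> List[Op]:
    """Move each AddField left past CreateModels of unrelated models."""
    ops = list(ops)
    for start in range(1, len(ops)):
        k = start
        while k > 0:
            kind, model, refs = ops[k]
            prev_kind, prev_model, _ = ops[k - 1]
            movable = (
                kind == "AddField"
                and prev_kind == "CreateModel"
                and prev_model != model     # stop next to our own CreateModel
                and prev_model not in refs  # don't jump over an FK target
            )
            if not movable:
                break
            ops[k - 1], ops[k] = ops[k], ops[k - 1]
            k -= 1
    return ops


none = frozenset()
before = [
    ("CreateModel", "A", none),
    ("CreateModel", "B", none),
    ("AddField", "A", none),            # AddField(A.foo)
]
after = bubble_add_fields_left(before)
# after: CreateModel(A), AddField(A.foo), CreateModel(B)
```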


I hope some of this is useful.
 
  Raphael
