Re: Rethinking migrations

2016-11-08 Thread Andrew Godwin
On Tue, Nov 8, 2016 at 12:34 PM, Patryk Zawadzki 
wrote:

> I've just hit another problem related to custom fields.
>
> Currently migrations contain information about "rich" fields. If you use a
> custom field type, the migration code will currently import your field type
> from its Python module. This is highly problematic in case either the code
> moves or you later stop using that field type and want to remove the
> dependency.
>
> I am currently in the process of rewriting some of my existing migrations
> by hand to replace all instances of a custom field type with the type it
> actually uses for storage. This will eventually allow me to drop the
> dependency but it's not very nice.
>

This was a hard choice to make - I was obviously aware of the risks here,
but eventually chose the current system given that it's far easier to reset
the migrations and start over in the Django system than it was in South,
and that removing code is generally rarer than adding it in.


>
> Another problem is that for many custom field types makemigrations detects
> changes made to arguments that do not affect the database in any way (as
> they are returned by deconstruction).
>

This has to be done unless fields came with a list of keyword arguments
that were "known safe" and all subclasses of those fields also implemented
that method (in case you, e.g., subclassed CharField and made the `choices`
kwarg actually use a MySQL ENUM).
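A minimal sketch of the "known safe" keyword argument idea, in plain Python. The names NON_DB_KWARGS, ENUM_NON_DB_KWARGS and needs_migration are hypothetical and not part of Django's API; the kwargs are assumed to be the dict part of a field's deconstruct() output. It also shows why subclasses must be able to override the list: once `choices` maps to an ENUM, it is no longer safe to ignore.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical list of kwargs assumed to have no effect on the column
# a plain text field creates (not Django's API).
NON_DB_KWARGS: FrozenSet[str] = frozenset(
    {"help_text", "verbose_name", "choices", "validators"}
)

# A subclass that turns `choices` into a MySQL ENUM must drop it from the
# safe list, otherwise a change to `choices` would never produce a migration.
ENUM_NON_DB_KWARGS: FrozenSet[str] = NON_DB_KWARGS - {"choices"}


def schema_relevant(kwargs: Dict[str, Any], non_db: FrozenSet[str]) -> Dict[str, Any]:
    """Keep only the kwargs that can affect the database schema."""
    return {key: value for key, value in kwargs.items() if key not in non_db}


def needs_migration(
    old: Dict[str, Any],
    new: Dict[str, Any],
    non_db: FrozenSet[str] = NON_DB_KWARGS,
) -> bool:
    """Report a change only when a schema-relevant kwarg differs."""
    return schema_relevant(old, non_db) != schema_relevant(new, non_db)


if __name__ == "__main__":
    # Only help_text changed, so this prints False.
    print(needs_migration({"max_length": 20, "help_text": "a"}, {"max_length": 20, "help_text": "b"}))
```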


>
> If we could ever break backwards compatibility, I'd suggest having field
> deconstruction only return the column type (and necessary arguments) it
> wants the schema editor to create. This would prevent the migrations from
> having external dependencies (which is a major win in itself).
>

That's not possible if you want to keep the migrations database-agnostic,
as the type of a column varies based on the backend (and sometimes other
things). If you want a system that is fixed to an exact database, at some
point it might be better to just use SQL.

(There is totally room for generating migrations as raw SQL and still
having them work in the current system, which would also get around the
field problem you describe)


>
> I'd also consider having apps.get_model() just use introspection to read
> the schema and return transient models with default field types for each
> underlying column type (so a custom JSONField would become a regular boring
> TextField inside migration code). This would save us tons of "rendering
> model states" time for the relatively small cost of having to cast certain
> columns to your preferred Python types inside a couple of data migrations.
>
>
This runs into issues when the schema you read does not give you enough
information - e.g. some field types (especially geospatial ones) are more
than just a column; there can also be a sequence, some indexes, constraints,
etc. involved.

I wrote a more advanced introspection backend as part of the migrations
work, but you'd need to extend it even more and improve upon features like
foreign key implication before it would be possible to do this.

Andrew

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFwN1uq9d6h79uujMr%2BKtOTim_1HQ810qA9ZTu%2BnewF%2BdRZ%2B1g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Rethinking migrations

2016-11-08 Thread Patryk Zawadzki
I've just hit another problem related to custom fields.

Currently migrations contain information about "rich" fields. If you use a
custom field type, the migration code will currently import your field type
from its Python module. This is highly problematic in case either the code
moves or you later stop using that field type and want to remove the
dependency.

I am currently in the process of rewriting some of my existing migrations
by hand to replace all instances of a custom field type with the type it
actually uses for storage. This will eventually allow me to drop the
dependency but it's not very nice.

Another problem is that for many custom field types makemigrations detects
changes made to arguments that do not affect the database in any way (as
they are returned by deconstruction).

If we could ever break backwards compatibility, I'd suggest having field
deconstruction only return the column type (and necessary arguments) it
wants the schema editor to create. This would prevent the migrations from
having external dependencies (which is a major win in itself).

I'd also consider having apps.get_model() just use introspection to read
the schema and return transient models with default field types for each
underlying column type (so a custom JSONField would become a regular boring
TextField inside migration code). This would save us tons of "rendering
model states" time for the relatively small cost of having to cast certain
columns to your preferred Python types inside a couple of data migrations.
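A rough sketch of the transient-model idea: map each introspected column type to a default field class name, so a custom JSONField stored in a text column comes back as a plain TextField. The type strings and the DEFAULT_FIELD_FOR_COLUMN table are assumptions (real names differ per backend), and, as Andrew points out, this loses anything that is not a plain column, such as sequences, indexes and constraints.

```python
from typing import Dict, List, Tuple

# Hypothetical table of introspected column types -> default field classes.
# Real type names differ between backends.
DEFAULT_FIELD_FOR_COLUMN: Dict[str, str] = {
    "integer": "IntegerField",
    "bigint": "BigIntegerField",
    "varchar": "CharField",
    "text": "TextField",
    "boolean": "BooleanField",
    "timestamp with time zone": "DateTimeField",
}


def transient_fields(columns: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map (column name, column type) pairs to default field class names.

    Unknown column types fall back to TextField as a conservative default.
    """
    return {
        name: DEFAULT_FIELD_FOR_COLUMN.get(column_type, "TextField")
        for name, column_type in columns
    }


if __name__ == "__main__":
    # A custom JSONField stored in a text column comes back as a TextField.
    print(transient_fields([("id", "integer"), ("payload", "text")]))
```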

Cheers,

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CANw2pUG4quLkCUskBkTSoTLVVb7QsZVZgva0y4xOHui75_P1zA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Rethinking migrations

2016-11-06 Thread charettes
> I assume that with a linear chain of migrations we'd only have to render
> model states when detecting database changes (makemigrations) and when
> executing RunPython code?

Right, but I think it should be possible to prevent the rendering of model
states in the autodetector. I'm planning to sprint on this today. There's not
much we can do for the RunPython case unless we manage to make model
rendering lazy. For example, apps.get_model('app.Foo') could lazily render
the Foo model and all its forward and reverse relationships. This seems hard
to do, but it could be worth investigating once we've dealt with all the
low-hanging fruit.
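A minimal sketch of lazy rendering, assuming models are identified by "app.Model" labels and that rendering a model plus its direct forward and reverse neighbours is enough. LazyApps is hypothetical, not Django's StateApps, and type() stands in for building a real model class from a ModelState.

```python
from typing import Dict, Set


class LazyApps:
    """Hypothetical registry that renders model classes only on demand.

    `relations` maps a model label to the labels it points at (forward
    relations); reverse relations are derived from it.
    """

    def __init__(self, relations: Dict[str, Set[str]]) -> None:
        self.relations = relations
        self.reverse: Dict[str, Set[str]] = {}
        for label, targets in relations.items():
            for target in targets:
                self.reverse.setdefault(target, set()).add(label)
        self.rendered: Dict[str, type] = {}

    def _render(self, label: str) -> type:
        # Stand-in for building a real model class from its ModelState.
        if label not in self.rendered:
            self.rendered[label] = type(label.replace(".", "_"), (), {"label": label})
        return self.rendered[label]

    def get_model(self, label: str) -> type:
        """Render the model and its direct forward and reverse neighbours."""
        model = self._render(label)
        for neighbour in self.relations.get(label, set()) | self.reverse.get(label, set()):
            self._render(neighbour)
        return model


if __name__ == "__main__":
    apps = LazyApps({"shop.Order": {"shop.Customer"}, "shop.Customer": set(), "blog.Post": set()})
    apps.get_model("shop.Customer")
    print(sorted(apps.rendered))  # blog.Post has not been rendered
```

Models unrelated to the one requested are never rendered, which is where the savings would come from; a real implementation would still have to handle relations of the neighbours themselves.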

> I run a mix of 1.9 and 1.10 and am aware of the recent optimisations as I
> helped with some of them during previous DUTHs.

Great, thanks for that!





Re: Rethinking migrations

2016-11-06 Thread Patryk Zawadzki
On Sun, 6.11.2016, 00:58, charettes wrote:

> I have to agree with Marten.
>
> From my experience, what really slows down the migrate and makemigrations
> commands is the rendering of model states into concrete model classes. This
> is something I concluded from my work on adding the plan object to the
> pre_migrate and post_migrate signals.
>

Yes, rendering model states is very slow but in our case ordering them is
also taking quite some time.

I assume that with a linear chain of migrations we'd only have to render
model states when detecting database changes (makemigrations) and when
executing RunPython code?

> As soon as an operation accesses state.apps, the rendering kicks in, which
> triggers the dynamic creation of multiple model classes and the computation
> of reverse relationships. There are mechanisms in place to prevent the whole
> project's model classes from being rendered again when a model state is
> altered, but if the operation is performed on a model referenced by many
> others, the relationship chain might force a large number of them to be
> rendered again, causing massive slowdowns.
>
> Markus Holtermann has been working on teaching the migration framework how
> to perform database operations without relying on state.apps, which should
> solve the remaining performance issues of the migrate command. In the case
> of makemigrations, the last remaining issue in the master branch should be
> solved by no longer relying on state.apps in RenameModel.state_forwards[1].
>
> Patryk, many improvements landed in 1.9 and 1.10 to speed up the commands
> dealing with migrations. Are you still seeing the same slowdown on these
> versions?
>

I run a mix of 1.9 and 1.10 and am aware of the recent optimisations as I
helped with some of them during previous DUTHs.

> Simon
>
> [1] https://github.com/django/django/pull/7468
>
>
> On Sunday, 6 November 2016 00:32:04 UTC+1, Marten Kenbeek wrote:
>
> On Saturday, November 5, 2016 at 4:53:49 PM UTC+1, Patryk Zawadzki wrote:
>
> 1. Dependency resolution that turns the migration dependency graph into an
> ordered list happens every time you try to create or execute a migration.
> If you have several hundred migrations it becomes quite slow. I'm talking
> multiple minutes kind of slow. As you can imagine working with multiple
> branches or perfecting your migrations quickly becomes a tedious task.
>
>
> Did the dependency resolution actually come up in benchmarks/profiles as a
> bottleneck? When I optimized and benchmarked the dependency graph code, it
> had no trouble ordering ~1000 randomly generated migrations with lots of
> inter-app dependencies in less than a second. I'd be surprised if this had
> any significant impact on the overall performance of migrations.
>
> An easy way to test this is the `showmigrations` command, which will only
> generate the graph without any model state changes or model rendering
> taking place. It does some other things, but nothing that should take in
> the order of minutes.
>



Re: Rethinking migrations

2016-11-05 Thread charettes
I have to agree with Marten.

From my experience, what really slows down the migrate and makemigrations
commands is the rendering of model states into concrete model classes. This
is something I concluded from my work on adding the plan object to the
pre_migrate and post_migrate signals.

As soon as an operation accesses state.apps, the rendering kicks in, which
triggers the dynamic creation of multiple model classes and the computation
of reverse relationships. There are mechanisms in place to prevent the whole
project's model classes from being rendered again when a model state is
altered, but if the operation is performed on a model referenced by many
others, the relationship chain might force a large number of them to be
rendered again, causing massive slowdowns.

Markus Holtermann has been working on teaching the migration framework how
to perform database operations without relying on state.apps, which should
solve the remaining performance issues of the migrate command. In the case
of makemigrations, the last remaining issue in the master branch should be
solved by no longer relying on state.apps in RenameModel.state_forwards[1].

Patryk, many improvements landed in 1.9 and 1.10 to speed up the commands
dealing with migrations. Are you still seeing the same slowdown on these
versions?

Simon

[1] https://github.com/django/django/pull/7468

On Sunday, 6 November 2016 00:32:04 UTC+1, Marten Kenbeek wrote:
>
> On Saturday, November 5, 2016 at 4:53:49 PM UTC+1, Patryk Zawadzki wrote:
>>
>> 1. Dependency resolution that turns the migration dependency graph into 
>> an ordered list happens every time you try to create or execute a 
>> migration. If you have several hundred migrations it becomes quite slow. 
>> I'm talking multiple minutes kind of slow. As you can imagine working with 
>> multiple branches or perfecting your migrations quickly becomes a tedious 
>> task.
>>
>
> Did the dependency resolution actually come up in benchmarks/profiles as a 
> bottleneck? When I optimized and benchmarked the dependency graph code, it 
> had no trouble ordering ~1000 randomly generated migrations with lots of 
> inter-app dependencies in less than a second. I'd be surprised if this had 
> any significant impact on the overall performance of migrations.
>
> An easy way to test this is the `showmigrations` command, which will only 
> generate the graph without any model state changes or model rendering 
> taking place. It does some other things, but nothing that should take in 
> the order of minutes.
>



Re: Rethinking migrations

2016-11-05 Thread Marten Kenbeek
On Saturday, November 5, 2016 at 4:53:49 PM UTC+1, Patryk Zawadzki wrote:
>
> 1. Dependency resolution that turns the migration dependency graph into an 
> ordered list happens every time you try to create or execute a migration. 
> If you have several hundred migrations it becomes quite slow. I'm talking 
> multiple minutes kind of slow. As you can imagine working with multiple 
> branches or perfecting your migrations quickly becomes a tedious task.
>

Did the dependency resolution actually come up in benchmarks/profiles as a 
bottleneck? When I optimized and benchmarked the dependency graph code, it 
had no trouble ordering ~1000 randomly generated migrations with lots of 
inter-app dependencies in less than a second. I'd be surprised if this had 
any significant impact on the overall performance of migrations.

An easy way to test this is the `showmigrations` command, which will only 
generate the graph without any model state changes or model rendering 
taking place. It does some other things, but nothing that should take in 
the order of minutes.
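The kind of benchmark described here can be approximated with the standard library's graphlib (Python 3.9+). This is a sketch, not the benchmark mentioned above: random_migration_graph is a hypothetical generator in which every dependency points at a lower migration number, so the graph is always acyclic, and Django's own MigrationGraph is not used.

```python
import random
import time
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (app label, migration number)


def random_migration_graph(apps: int = 20, per_app: int = 50, seed: int = 0) -> Dict[Node, Set[Node]]:
    """Hypothetical random DAG of migrations with inter-app dependencies.

    Every dependency points at a lower migration number, so there are no cycles.
    """
    rng = random.Random(seed)
    graph: Dict[Node, Set[Node]] = {}
    for app in range(apps):
        for number in range(per_app):
            deps: Set[Node] = set()
            if number > 0:
                deps.add((f"app{app}", number - 1))  # previous migration, same app
                if rng.random() < 0.3:
                    other = rng.randrange(apps)
                    if other != app:
                        deps.add((f"app{other}", rng.randrange(number)))
            graph[(f"app{app}", number)] = deps
    return graph


def resolve(graph: Dict[Node, Set[Node]]) -> List[Node]:
    """Turn the dependency graph into a flat execution order."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    graph = random_migration_graph()
    start = time.perf_counter()
    order = resolve(graph)
    print(f"ordered {len(order)} migrations in {time.perf_counter() - start:.4f}s")
```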



Re: Rethinking migrations

2016-11-05 Thread Patryk Zawadzki
On Saturday, 5 November 2016 19:57:38 UTC+1, Aymeric Augustin wrote:
>
> My solution is to throw away and remake all migrations on a regular basis. 
> Then I `TRUNCATE TABLE django_migrations` and `django-admin migrate 
> --fake`. Obviously this isn’t a great solution. 
>
> Since I work mostly on small projects with just a couple developers on 
> staff, doing this every few months suffices to keep the run time below one 
> minute (which is still quite annoying). 
>
> There’s a risk to lose important, manually generated migrations, typically 
> those that create indexes. I diff the schema before / after with apgdiff to 
> avoid such problems. 
>

That's the main problem we're facing. I'm currently leading a project that 
predates dinosaurs and when it was switched from South to Django, all the 
data migrations were just carried over from the old code. They are holy 
gifts from the elder gods and are rich with eldritch symbols. Nobody wants 
to have to copy and paste them every month or so when we decide to redo all 
of the migrations.

There's also the problem of having many long-running (weeks to months) 
feature branches that make it hard to find a point in time where all 
migrations can be safely discarded.

I can also imagine it's much harder to redo initial migrations in projects 
where two-way relations exist between certain applications.
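When two apps reference each other, their initial migrations form a dependency cycle, which is one reason regenerating them from scratch is awkward. A small sketch using graphlib's CycleError to detect this; the migration labels are hypothetical. Moving one foreign key into a second migration breaks the cycle, which I believe is roughly what makemigrations does when it emits separate 0001/0002 migrations for circular relations.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle if the graph has any, else None."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        # graphlib stores the cycle as the second element of args.
        return exc.args[1]
    return None


# Hypothetical initial migrations that point at each other's models.
CYCLIC = {
    "orders.0001_initial": {"customers.0001_initial"},
    "customers.0001_initial": {"orders.0001_initial"},
}

# Moving one foreign key into a second migration breaks the cycle.
SPLIT = {
    "customers.0001_initial": set(),
    "orders.0001_initial": {"customers.0001_initial"},
    "customers.0002_order_fk": {"customers.0001_initial", "orders.0001_initial"},
}

if __name__ == "__main__":
    print(find_cycle(CYCLIC))
    print(find_cycle(SPLIT))
```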

Cheers,



Re: Rethinking migrations

2016-11-05 Thread Patryk Zawadzki
On Saturday, 5 November 2016 18:40:24 UTC+1, Shai Berger wrote:
>
> > 2. Dependency resolution is only stable as long as the migration set is
> > frozen. Sometimes introducing a new migration is enough to break existing
> > migrations by causing them to execute in a slightly different order. We
> > often have to backtrack and edit existing migrations and enforce a strict
> > resolution order by introducing arbitrary dependencies.
> >
>
> So, you say you really have implicit dependencies between migrations --
> dependencies in substance, which aren't recorded as dependencies. This seems
> to indicate that you have a lot of manually-written migrations (data
> migrations?), since the automatically-written ones do include relevant
> dependencies. This seems odd -- it sounds like you're doing something out of
> the ordinary.
>
> This would also explain some of your bad experience with squashing --
> indeed, if you have many data migrations, squashing can become much less
> effective.
>

Let's not jump to conclusions. Django only supports predicate
dependencies. You can say "not earlier than after these are applied", but
that does not mean "immediately after they are applied". Sometimes Django
tries to run the migration much later. If you have your models scattered
across a large number of applications (we use apps to group entire classes
of related features), sometimes the late migration tries to reference a
column in another model that was long since removed by a migration added
much later to its respective app.
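A tiny brute-force illustration of "not earlier than" dependencies, using hypothetical migrations: a.0001 adds a column, a.0002 drops it, and b.0001 reads it but only declares a dependency on a.0001. Both orders satisfy the declared dependencies, and one of them breaks b.0001. Recording the implicit requirement as an explicit edge (comparable to Django's run_before) leaves only the safe order.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def valid_orders(deps: Dict[str, Set[str]]) -> List[Tuple[str, ...]]:
    """Enumerate every order satisfying "not earlier than" dependencies.

    Brute force, so only usable on tiny illustrative graphs.
    """
    orders = []
    for order in permutations(sorted(deps)):
        position = {name: i for i, name in enumerate(order)}
        if all(position[d] < position[m] for m, ds in deps.items() for d in ds):
            orders.append(order)
    return orders


def breaks_b(order: Tuple[str, ...]) -> bool:
    """b.0001 reads the column, so it must run before a.0002 drops it."""
    return order.index("b.0001") > order.index("a.0002")


# a.0001 adds a column, a.0002 drops it, b.0001 reads it but only
# declares "not earlier than a.0001".
DEPS = {"a.0001": set(), "a.0002": {"a.0001"}, "b.0001": {"a.0001"}}

# Recording the implicit requirement explicitly (similar to run_before).
FIXED = {**DEPS, "a.0002": {"a.0001", "b.0001"}}

if __name__ == "__main__":
    for order in valid_orders(DEPS):
        print(order, "broken" if breaks_b(order) else "ok")
```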

> > 3. Removing an app from a project is a nightmare. You can't migrate to
> > zero state unless the app is still there. There is no way to add "revert
> > all migrations for app X" to the migration graph; it's something you need
> > to run manually. There is no clean way to remove an app that was ever
> > referenced in a relation. We were forced to do all kinds of hacks to get
> > around this. Sometimes it's necessary to create an empty eggshell app
> > with the same name and copy all migrations there, then add necessary data
> > migrations and finally migrations that remove all the models, indices,
> > procedures etc. Sometimes people just leave a dead application in
> > INSTALLED_APPS to not have to deal with this.
>
> Clear out (maybe even remove) models.py and type "makemigrations", and you
> get a migration that deletes everything. The answer to getting rid of the
> historical migrations is squashing, but of course you first need squashing
> to work properly.
>

I cannot clear out anything from an app that came from PyPI. That's why I 
mentioned creating fake empty apps that are just containers for their 
migration history. Squashing does nothing to help with that if another
application references any of those models. Squashing only helps you
have fewer migrations. If the migrations were always in the correct order, 
the migration engine could collapse them automatically at execution time.

> > 4. Squashing migrations is wonky at best. If you create a model in one
> > migration, alter one of its fields in another and then finally drop the
> > model sometime later, the squashed migration will have Django try to
> > execute the alter first and complain about the table not being there.
> > Also the only reason we need to squash migrations is to prevent problem 1
> > above from becoming exponentially worse. If migrations were only as slow
> > as the underlying SQL commands, we'd likely never squash them.
> > 
>
> If that's so, it's a bug you should report; it's also an issue you can
> work around by editing the migration to remove the redundant operation.
> There are issues with squashing, to be sure, but I don't think this is one
> of the serious ones.
>

It's a bug that I will report at some point, but I mostly encounter it in
environments where I can't afford the time needed to debug it properly.
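Django's migration optimizer performs reductions along these lines when squashing. The following is not its implementation but a simplified sketch of the create/alter/delete case from point 4: operations are plain (kind, model, details) tuples, and operations on other models are assumed not to reference the model being collapsed.

```python
from typing import Any, Dict, List, Tuple

# Simplified stand-in for a migration operation: (kind, model, details).
Op = Tuple[str, str, Dict[str, Any]]


def collapse(ops: List[Op]) -> List[Op]:
    """Fold CreateModel/AlterField/DeleteModel sequences for each model.

    AlterField after CreateModel is merged into the CreateModel, and
    DeleteModel after CreateModel cancels every operation on that model.
    Assumes no other model's operations refer to a collapsed model.
    """
    result: List[Op] = []
    for kind, model, details in ops:
        created = next(
            (i for i, op in enumerate(result) if op[0] == "CreateModel" and op[1] == model),
            None,
        )
        if created is not None and kind == "AlterField":
            fields = dict(result[created][2]["fields"])
            fields[details["name"]] = details["field"]
            result[created] = ("CreateModel", model, {"fields": fields})
        elif created is not None and kind == "DeleteModel":
            result = [op for op in result if op[1] != model]
        else:
            result.append((kind, model, details))
    return result


if __name__ == "__main__":
    history = [
        ("CreateModel", "Foo", {"fields": {"name": "CharField(max_length=10)"}}),
        ("AlterField", "Foo", {"name": "name", "field": "CharField(max_length=20)"}),
        ("DeleteModel", "Foo", {}),
    ]
    print(collapse(history))  # nothing left to run, so no ALTER on a missing table
```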

> > 6. Conflict detection and resolution (makemigrations --merge) is a
> > make-believe solution. It just trains people to execute the command
> > without investigating whether their migration history still makes sense.
>
> It could be smarter, assuming it understood the content of migrations. We
> could probably improve it to a point where, for most cases, it would either
> know to merge automatically or know that there really is a conflict. This
> would probably not help you if you have a lot of RunPython operations in
> your migrations.
>

Depends on the project. I really don't care about the framework trying to 
reason about the results of a git merge. Django does not have enough 
understanding of the code and the version control history to do the job of 
the person responsible for the merge. Getting stuff right 60% of the time
is not reliable enough to depend on, but it is reliable enough to make some
people lazy.
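For reference, as far as I know Django's conflict check boils down to finding an app whose migrations end in more than one leaf node, where only dependents in the same app count. A self-contained sketch of that check follows; it is not Django's MigrationLoader code, and the (app, name) tuples are hypothetical. It also illustrates the point above: the check only sees the shape of the graph, not whether the two branches actually conflict.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # (app label, migration name)


def conflicting_leaves(graph: Dict[Node, Set[Node]]) -> Dict[str, List[str]]:
    """Return apps whose migration history ends in more than one leaf.

    A leaf is a migration that no other migration in the same app depends
    on; dependents in other apps do not count.
    """
    same_app_parents: Set[Node] = {
        dep for (app, _), deps in graph.items() for dep in deps if dep[0] == app
    }
    leaves: Dict[str, List[str]] = defaultdict(list)
    for app, name in graph:
        if (app, name) not in same_app_parents:
            leaves[app].append(name)
    return {app: sorted(names) for app, names in leaves.items() if len(names) > 1}


if __name__ == "__main__":
    # Two feature branches each added a 0002 migration to the shop app.
    graph = {
        ("shop", "0001_initial"): set(),
        ("shop", "0002_add_price"): {("shop", "0001_initial")},
        ("shop", "0002_add_sku"): {("shop", "0001_initial")},
        ("blog", "0001_initial"): {("shop", "0002_add_price")},
    }
    print(conflicting_leaves(graph))
```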

> Some of these I need to dig deeper into and probably file proper 

Re: Rethinking migrations

2016-11-05 Thread Aymeric Augustin
Hello,

My solution is to throw away and remake all migrations on a regular basis. Then 
I `TRUNCATE TABLE django_migrations` and `django-admin migrate --fake`. 
Obviously this isn’t a great solution.

Since I work mostly on small projects with just a couple developers on staff, 
doing this every few months suffices to keep the run time below one minute 
(which is still quite annoying).

There’s a risk of losing important, manually generated migrations, typically
those that create indexes. I diff the schema before / after with apgdiff to
avoid such problems.
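The before/after schema comparison can also be roughed out without apgdiff, as a line-based diff of two schema dumps (for example from pg_dump --schema-only). A stdlib sketch, assuming one statement per line; unlike apgdiff it does not understand SQL, so reordered statements also show up as changes.

```python
import difflib
from typing import List


def schema_changes(before: str, after: str) -> List[str]:
    """Return lines removed ("- ") or added ("+ ") between two schema dumps.

    Purely line-based: one statement per line is assumed and SQL is not parsed.
    """
    return [
        line
        for line in difflib.ndiff(before.splitlines(), after.splitlines())
        if line.startswith(("- ", "+ "))
    ]


if __name__ == "__main__":
    before = "CREATE TABLE foo (id integer);\nCREATE INDEX foo_id_idx ON foo (id);"
    after = "CREATE TABLE foo (id integer);"
    # The hand-written index did not survive the regenerated migrations.
    print(schema_changes(before, after))
```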

This is quite doable but outside the comfort zone of many developers: my 
clients prefer to have me do it even though I documented the steps in detail.

So… yeah, it would be nice to have something more practical, even if it 
requires trading off some purity in the design of migrations.

-- 
Aymeric.



Re: Rethinking migrations

2016-11-05 Thread Patryk Zawadzki
On Saturday, 5 November 2016 17:30:15 UTC+1, Andrew Godwin wrote:
>
> Hello! I have opinions about this :)
>  
>
>> Possible solution (or "how I'd build it today if there was no existing 
>> code in Django core"):
>>
>> a. Make migrations part of the project and not individual apps. This 
>> takes care of problem 3 above.
>>
>
> It also means it's impossible for apps to ship migrations and define how 
> to upgrade from version to version. I realise that (c) below is part of a 
> proposed solution to this, but how do you propose to match up what's 
> already been run in the database without having names match (and then you 
> just have app migrations by another name)?
>

I would actually insist on keeping the names intact. It means that adding 
an external dependency could inject a migration with a date from the 
previous year but I think that's not a problem as it's guaranteed not to 
conflict with any other migrations.


>> b. Prefix individual migration files with a UTC timestamp 
>> (20161105151023_add_foo) to provide a strict sorting order. This removes 
>> the depsolving requirement and takes care of 1 and 2. By eliminating those 
>> it makes 4 kind of obsolete as squashing migrations would become pointless.
>>
>
> Unfortunately this does not help all the time as computers' clocks aren't 
> necessarily right or in sync, so it would merely be an approximation and 
> you'd still get the occasional clash.
>

You only need your own migrations to be ordered so you can safely assume 
the previous one to be applied before the one you're writing at the moment. 
For two unrelated changes the order pretty much does not matter.
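A sketch of the proposed naming scheme. migration_name is hypothetical; it normalises to UTC so names created in different time zones sort consistently, but, as noted in the quoted reply, it cannot fix skewed clocks, so it only reliably orders migrations created on the same machine.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def migration_name(slug: str, now: Optional[datetime] = None) -> str:
    """Build a name such as 20161105151023_add_foo from the current UTC time."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"{moment:%Y%m%d%H%M%S}_{slug}"


if __name__ == "__main__":
    # 16:10:23 in UTC+1 is 15:10:23 UTC.
    local = datetime(2016, 11, 5, 16, 10, 23, tzinfo=timezone(timedelta(hours=1)))
    print(migration_name("add_foo", local))  # prints 20161105151023_add_foo
```

Because the prefix is fixed-width, plain string sorting of the file names gives chronological order.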
 

>> c. Have reusable apps provide migration templates that Django then copies
>> to my project when "makemigrations" is run.
>>
>
> Would these be lined up with their own timestamp in the single serial 
> migration timeline? Would you have to make sure any of these templates from 
> any app update was copied across and put in the order before you used the 
> new columns?
>

I'd say use proper timestamps. This way two apps can depend on each other 
and the migrations will still get run in proper order.
 

>> d. Maintain a separate directory for each database connection.
>>
>
> This I think might be a good idea, though I'd like to see a more
> generalised idea of "migration sets", where you then say which alias uses
> which set (so you can share sets among more than one connection).
>

Agreed.
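A sketch of what a "migration sets" mapping could look like. MIGRATION_SETS and aliases_for_set are hypothetical; Django has no such setting. Each database alias names the set it follows, so two connections can share one set.

```python
from typing import Dict, List

# Hypothetical setting: each database alias names the migration set it follows.
MIGRATION_SETS: Dict[str, str] = {
    "default": "main",
    "eu": "main",  # shares the "main" set with the default connection
    "analytics": "warehouse",
}


def aliases_for_set(migration_set: str, config: Dict[str, str] = MIGRATION_SETS) -> List[str]:
    """Return every alias whose schema is managed by the given migration set."""
    return sorted(alias for alias, name in config.items() if name == migration_set)


if __name__ == "__main__":
    print(aliases_for_set("main"))
```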
 

>> e. Execute all migrations in alphabetical order (which means by timestamp
>> first). When an unapplied migration is followed by an applied one, ask 
>> whether to attempt to just apply it or if the user wants to first unapply 
>> migrations that came after it. To me this would work better than 6.
>>
>
> This is basically what South used to do, and it worked reasonably well in 
> either being successful or exploding enough that people noticed. Given that 
> you're proposing per-project migrations, however, people are going to run 
> into this almost constantly, as they will clash significantly more than 
> per-app ones.
>

South was not perfect, but I'd say the current solution is not better; it's 
just different. Some of my projects use a lot of long-running feature 
branches, so I have an application where every other migration is a merge 
migration with accepted default values. We do try to make migrations 
backwards-compatible where needed, but I don't think it's a common scenario 
to add conflicting changes on two feature branches. Most of our conflicts 
look like this: department A added a field they needed while department 
B added a data migration to fix a denormalized field.
 

> Of course we do have migration support in core and it's not compatible 
>> with most of the above list. Any ideas? I think serializing the dependency 
>> solver state and reusing it between runs could be a pretty low hanging 
>> fruit (like "npm shrinkwrap" or yarn's lock file).
>>
>
> I think not only could the dependency solver state be serialised but that 
> it would be a replacement for the datetimes-on-filename proposal in that 
> you could easily pull out a previously-serialised order from disk and then 
> work out what the new ones do.
>
> I am generally not keen on the idea of per-project migrations, though - it 
> makes what's in the database a property of the project, not the app, and 
> that's not how Django has worked traditionally. I think an effort to get a 
> more reliable, exposed global ordering of those individual app migrations 
> would go a long way towards the end goal without having to have migration 
> templates, upgrade instructions, and way more collisions between branches.
>

I do believe that the database is my property, and I'd much rather see the 
project code reign over its structure. Some problems simply cannot be 
solved by submitting an upstream patch (project-specific or 
backend-specific indexes come to mind).
 

> At the end of the day, though, 

Re: Rethinking migrations

2016-11-05 Thread Shai Berger
Hi,

On Saturday 05 November 2016 17:53:49 Patryk Zawadzki wrote:
> 
> I'm typing this from the comfort of Django: Under the Hood sprints so
> please excuse poor grammar and the somewhat chaotic explanations that
> follow. I'm very tired and English is not my mother tongue. This is not a
> DEP but merely a stream of consciousness I'd love to get some feedback on.
> 

I am dealing with some similar issues, but I've reached very different 
conclusions. In much the same spirit, this is not very orderly.


> Here are some of the problems we face when dealing with migrations:
> 
> 1. Dependency resolution that turns the migration dependency graph into an
> ordered list happens every time you try to create or execute a migration.
> If you have several hundred migrations it becomes quite slow. I'm talking
> multiple minutes kind of slow. As you can imagine working with multiple
> branches or perfecting your migrations quickly becomes a tedious task.
> 

I've known this to happen, indeed.

> 2. Dependency resolution is only stable as long as the migration set is
> frozen. Sometimes introducing a new migration is enough to break existing
> migrations by causing them to execute in a slightly different order. We
> often have to backtrack and edit existing migrations and enforce a strict
> resolution order by introducing arbitrary dependencies.
> 

So, you say you really have implicit dependencies between migrations -- 
dependencies in substance, which aren't recorded as dependencies. This seems 
to indicate that you have a lot of manually-written migrations (data 
migrations?), since the automatically-written ones do include relevant 
dependencies. This seems odd -- it sounds like you're doing something out of 
the ordinary.

This would also explain some of your bad experience with squashing -- indeed, 
if you have many data migrations, squashing can become much less effective.

> 3. Removing an app from a project is a nightmare. You can't migrate to zero
> state unless the app is still there. There is no way to add "revert all
> migrations for app X" to the migration graph, it's something you need to
> run manually. There is no clean way to remove an app that was ever
> referenced in a relation. We were forced to do all kinds of hacks to get
> around this. Sometimes it's necessary to create an empty eggshell app with
> the same name and copy all migrations there then add necessary data
> migrations and finally migrations that remove all the models, indices,
> procedures etc. Sometimes people just leave a dead application in
> INSTALLED_APPS to not have to deal with this.

Clear out (maybe even remove) models.py and type "makemigrations", and you get 
a migration that deletes everything. The answer to getting rid of the 
historical migrations is squashing, but of course you first need squashing to 
work properly.

> 
> 4. Squashing migrations is wonky at best. If you create a model in one
> migration, alter one of its fields in another and then finally drop the
> model sometime later, the squashed migration will have Django try to
> execute the alter first and complain about the table not being there. Also
> the only reason we need to squash migrations is to prevent problem 1 above
> from becoming exponentially worse. If migrations were only as slow as the
> underlying SQL commands, we'd likely never squash them.
> 

If that's so, it's a bug you should report; it's also an issue you can work
around by editing the migration to remove the redundant operation. There are
issues with squashing, to be sure, but I don't think this is one of the 
serious ones.

> 5. There's no simple way to roll back all the migrations introduced after a
> particular point in time which is very useful when working with multiple
> feature branches. In my current project dropping the database means having
> to reimport over 200 MB of data snapshots. Switching branches requires me
> to look at branch diffs to determine which migrations to revert.
> 

Yes, this is a real issue, with one modification -- I'd much rather have a good 
way to migrate to a point-in-version-history than to a point-in-time.
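One way to picture "migrate to a point in version history", as a minimal sketch: assume you already have, for some past revision, the migration names each app had at that point (how to obtain them, e.g. from a version-control checkout, is left out). The hypothetical helper `migrate_commands` turns that snapshot into ordinary `manage.py migrate <app_label> <migration_name>` invocations, using `zero` for apps that had no migrations yet. It assumes linear per-app histories with Django's default numbered names, so the lexically largest name is the latest one.

```python
def migrate_commands(snapshot, current_apps):
    """Build `manage.py migrate <app> <target>` commands to return to a snapshot.

    `snapshot` maps app_label -> migration names present at the chosen revision
    (hypothetical input). Assumes each app's history is linear and that names
    sort in order, as Django's default numbered names (0001_..., 0002_...) do.
    Apps absent from the snapshot are migrated to "zero".
    """
    commands = []
    for app in sorted(current_apps):
        names = snapshot.get(app)
        target = max(names) if names else "zero"
        commands.append(f"python manage.py migrate {app} {target}")
    return commands


snapshot = {"accounts": ["0001_initial", "0002_add_email"], "shop": ["0001_initial"]}
for command in migrate_commands(snapshot, {"accounts", "shop", "blog"}):
    print(command)
```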

This is even more than a development issue -- I've encountered a use-case for 
doing something like this in production. Suppose I want to export an object 
represented by a model (or set of models) by serializing it and saving the 
serialized version, and then import it back in after the app has progressed. 
To support that generically, I'd need a way to migrate a database to the point 
where the object was exported, import it, and then roll the database forward 
to the "present".

> 6. Conflict detection and resolution (makemigrations --merge) is a make-believe
> solution. It just trains people to execute the command without
> investigating whether their migration history still makes sense.
> 

It could be smarter, assuming it understood the content of migrations. We 
could probably improve it to a point where, for most 

Re: Rethinking migrations

2016-11-05 Thread Andrew Godwin
Hello! I have opinions about this :)


> Possible solution (or "how I'd build it today if there was no existing
> code in Django core"):
>
> a. Make migrations part of the project and not individual apps. This takes
> care of problem 3 above.
>

It also means it's impossible for apps to ship migrations and define how to
upgrade from version to version. I realise that (c) below is part of a
proposed solution to this, but how do you propose to match up what's
already been run in the database without having names match (and then you
just have app migrations by another name)?


>
> b. Prefix individual migration files with a UTC timestamp
> (20161105151023_add_foo) to provide a strict sorting order. This removes
> the depsolving requirement and takes care of 1 and 2. By eliminating those
> it makes 4 kind of obsolete as squashing migrations would become pointless.
>

Unfortunately this does not help all the time as computers' clocks aren't
necessarily right or in sync, so it would merely be an approximation and
you'd still get the occasional clash.
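The skewed-clock case is easy to reproduce on paper. In the sketch below (file names invented), a developer whose clock runs ten minutes slow writes a migration that depends on a colleague's, yet gets the earlier prefix, so a plain timestamp sort runs it first. The hypothetical `misordered` check can only catch this if migrations still declare their dependencies, which is an assumption of the sketch rather than part of the timestamp-only proposal.

```python
def misordered(ordered_names, dependencies):
    """Return (migration, dependency) pairs where the given order places a
    migration before a migration it depends on."""
    position = {name: index for index, name in enumerate(ordered_names)}
    return [
        (name, dep)
        for name, deps in dependencies.items()
        for dep in deps
        if position[dep] > position[name]
    ]


# Developer A's clock is correct; developer B's runs ten minutes slow.
names = sorted([
    "20161105151023_add_foo",  # written by A
    "20161105150023_use_foo",  # written later by B, but stamped earlier
])
dependencies = {"20161105150023_use_foo": ["20161105151023_add_foo"]}
print(misordered(names, dependencies))
# [('20161105150023_use_foo', '20161105151023_add_foo')]
```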


>
> c. Have reusable apps provide migration templates that Django then copies
> to my project when "makemigrations" is run.
>

Would these be lined up with their own timestamp in the single serial
migration timeline? Would you have to make sure any of these templates from
any app update was copied across and put in the order before you used the
new columns?


>
> d. Maintain a separate directory for each database connection.
>

This I think might be a good idea, though I'd like to see a more
generalised idea of "migration sets" and you then say which alias uses
which set (so you can share sets among more than one connection)


>
> e. Execute all migrations in alphabetical order (which means by timestamp
> first). When an unapplied migration is followed by an applied one, ask
> whether to attempt to just apply it or if the user wants to first unapply
> migrations that came after it. To me this would work better than 6.
>

This is basically what South used to do, and it worked reasonably well in
either being successful or exploding enough that people noticed. Given that
you're proposing per-project migrations, however, people are going to run
into this almost constantly, as they will clash significantly more than
per-app ones.


>
> Of course we do have migration support in core and it's not compatible
> with most of the above list. Any ideas? I think serializing the dependency
> solver state and reusing it between runs could be a pretty low hanging
> fruit (like "npm shrinkwrap" or yarn's lock file).
>

I think not only could the dependency solver state be serialised but that
it would be a replacement for the datetimes-on-filename proposal in that
you could easily pull out a previously-serialised order from disk and then
work out what the new ones do.
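A rough sketch of that idea, assuming previously serialised entries are trusted and never re-ordered: keep the cached order as-is and only resolve the migrations not yet in it. The function name and the `(app_label, migration_name)` node format are illustrative, not an existing Django API. If someone later edits the dependencies of an already-cached migration, this approach would not notice; a real implementation would need to invalidate the cache in that case.

```python
def extend_order(cached, graph):
    """Append migrations missing from `cached` once their dependencies are placed.

    `cached`: previously serialised list of (app_label, name) nodes, trusted as-is.
    `graph`: maps every current node to the nodes it depends on.
    """
    placed = list(cached)
    done = set(cached)
    pending = sorted(node for node in graph if node not in done)
    while pending:
        # Nodes whose dependencies are all placed can go next; working from a
        # sorted list keeps the result deterministic.
        ready = [node for node in pending if all(dep in done for dep in graph[node])]
        if not ready:
            raise ValueError("new migrations have missing or circular dependencies")
        placed.extend(ready)
        done.update(ready)
        pending = [node for node in pending if node not in done]
    return placed


cached = [("accounts", "0001_initial"), ("shop", "0001_initial")]
graph = {
    ("accounts", "0001_initial"): [],
    ("shop", "0001_initial"): [("accounts", "0001_initial")],
    ("shop", "0002_add_price"): [("shop", "0001_initial")],
    ("shop", "0003_add_sku"): [("shop", "0002_add_price")],
    ("accounts", "0002_add_email"): [("accounts", "0001_initial")],
}
print(extend_order(cached, graph))
```

Only the new nodes go through resolution, so the work grows with the number of new migrations rather than with the whole history.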

I am generally not keen on the idea of per-project migrations, though - it
makes what's in the database a property of the project, not the app, and
that's not how Django has worked traditionally. I think an effort to get a
more reliable, exposed global ordering of those individual app migrations
would go a long way towards the end goal without having to have migration
templates, upgrade instructions, and way more collisions between branches.

At the end of the day, though, there's a reason I made the schema editing
separate from the migration runners - you can re-use all the nasty work in
the schema editing interface and just replace the other part. This huge
change is the sort of thing I'd want to see working and proven before we
considered changing core, preferably as a third-party app, but of course
I'd like to talk through potential smaller changes first, rather than
throwing out the entire system.

Andrew

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAFwN1uqLBdv-0LGB6T1FS7eSxxKwjpfrVibXUT7KS-v9WCYK9g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Rethinking migrations

2016-11-05 Thread Patryk Zawadzki
Greetings, Jazz Guitarists,

I've briefly talked about this with Markus and he mentioned that the 
subject was already brought up by Tyson Clugg but I think it deserves a 
proper discussion here.

I'm typing this from the comfort of Django: Under the Hood sprints so 
please excuse poor grammar and the somewhat chaotic explanations that 
follow. I'm very tired and English is not my mother tongue. This is not a 
DEP but merely a stream of consciousness I'd love to get some feedback on.

Here are some of the problems we face when dealing with migrations:

1. Dependency resolution that turns the migration dependency graph into an 
ordered list happens every time you try to create or execute a migration. 
If you have several hundred migrations it becomes quite slow. I'm talking 
multiple minutes kind of slow. As you can imagine working with multiple 
branches or perfecting your migrations quickly becomes a tedious task.
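For readers less familiar with the internals, the resolution step amounts to turning a directed acyclic graph of `(app_label, migration_name)` nodes into a list. A minimal sketch using Kahn's algorithm on an invented four-migration graph follows. Django's own loader and graph classes do considerably more (reading migration files, handling squashed replacements), so this only illustrates the shape of the computation. Ties between independent nodes are broken here by sort order; any other tie-break gives a different but equally valid order, which is the kind of instability problem 2 describes.

```python
import heapq
from collections import defaultdict

# Invented migration graph: each node maps to the nodes it depends on.
GRAPH = {
    ("accounts", "0001_initial"): [],
    ("accounts", "0002_add_email"): [("accounts", "0001_initial")],
    ("shop", "0001_initial"): [("accounts", "0001_initial")],
    ("shop", "0002_add_price"): [("shop", "0001_initial"), ("accounts", "0002_add_email")],
}


def linearize(graph):
    """Return one valid execution order using Kahn's algorithm.

    A min-heap breaks ties between independent nodes, so the result is
    deterministic for a given graph.
    """
    indegree = {node: len(deps) for node, deps in graph.items()}
    dependents = defaultdict(list)
    for node, deps in graph.items():
        for dep in deps:
            dependents[dep].append(node)

    ready = [node for node, count in indegree.items() if count == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for child in dependents[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, child)

    if len(order) != len(graph):
        raise ValueError("cycle or missing dependency in migration graph")
    return order


for app_label, name in linearize(GRAPH):
    print(app_label, name)
```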

2. Dependency resolution is only stable as long as the migration set is 
frozen. Sometimes introducing a new migration is enough to break existing 
migrations by causing them to execute in a slightly different order. We 
often have to backtrack and edit existing migrations and enforce a strict 
resolution order by introducing arbitrary dependencies.

3. Removing an app from a project is a nightmare. You can't migrate to zero 
state unless the app is still there. There is no way to add "revert all 
migrations for app X" to the migration graph, it's something you need to 
run manually. There is no clean way to remove an app that was ever 
referenced in a relation. We were forced to do all kinds of hacks to get 
around this. Sometimes it's necessary to create an empty eggshell app with 
the same name and copy all migrations there then add necessary data 
migrations and finally migrations that remove all the models, indices, 
procedures etc. Sometimes people just leave a dead application in 
INSTALLED_APPS to not have to deal with this.

4. Squashing migrations is wonky at best. If you create a model in one 
migration, alter one of its fields in another and then finally drop the 
model sometime later, the squashed migration will have Django try to 
execute the alter first and complain about the table not being there. Also 
the only reason we need to squash migrations is to prevent problem 1 above 
from becoming exponentially worse. If migrations were only as slow as the 
underlying SQL commands, we'd likely never squash them.
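The create/alter/delete sequence described in problem 4 should squash down to nothing for that model. A toy reduction pass, working on made-up `(kind, model_name)` tuples rather than real operation classes, shows the intended net effect. It is not the optimizer Django runs when squashing, and it deliberately ignores cross-model references such as foreign keys, which is where real reductions get difficult.

```python
def squash(operations):
    """Drop all operations on models that are created first and deleted last
    within the squashed range: their net effect on the schema is nothing."""
    first, last = {}, {}
    for kind, model in operations:
        first.setdefault(model, kind)
        last[model] = kind
    transient = {m for m in first if first[m] == "create" and last[m] == "delete"}
    return [(kind, model) for kind, model in operations if model not in transient]


operations = [
    ("create", "Foo"),
    ("create", "Bar"),
    ("alter", "Foo"),   # would fail if run after Foo's table is gone
    ("delete", "Foo"),
    ("alter", "Bar"),
]
print(squash(operations))
# [('create', 'Bar'), ('alter', 'Bar')]
```

Checking the first and last operation, rather than just "was it created and deleted somewhere", keeps a delete-then-recreate sequence intact, since that one does change an existing table.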

5. There's no simple way to roll back all the migrations introduced after a 
particular point in time which is very useful when working with multiple 
feature branches. In my current project dropping the database means having 
to reimport over 200 MB of data snapshots. Switching branches requires me 
to look at branch diffs to determine which migrations to revert.

6. Conflict detection and resolution (makemigrations --merge) is a make-believe 
solution. It just trains people to execute the command without 
investigating whether their migration history still makes sense.
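Part of the criticism is that detection is purely structural. Roughly, Django treats an app as conflicted when it has more than one leaf migration, meaning a migration that no other migration in the same app depends on. The sketch below models only that rule on an invented graph, with hypothetical function names. Two branches adding unrelated fields and two branches making incompatible changes look identical to it.

```python
from collections import defaultdict


def leaf_nodes_by_app(graph):
    """Map each app to migrations that no other migration in the same app depends on."""
    depended_on = set()
    for node, deps in graph.items():
        for dep in deps:
            if dep[0] == node[0]:  # only same-app children count
                depended_on.add(dep)
    leaves = defaultdict(list)
    for node in sorted(graph):
        if node not in depended_on:
            leaves[node[0]].append(node[1])
    return dict(leaves)


def conflicts(graph):
    """Apps with more than one leaf, i.e. diverging migration histories."""
    return {app: names for app, names in leaf_nodes_by_app(graph).items() if len(names) > 1}


graph = {
    ("accounts", "0001_initial"): [],
    ("shop", "0001_initial"): [],
    ("shop", "0002_add_price"): [("shop", "0001_initial")],  # branch A
    ("shop", "0002_add_sku"): [("shop", "0001_initial")],    # branch B
}
print(conflicts(graph))
# {'shop': ['0002_add_price', '0002_add_sku']}
```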


Some of these I need to dig deeper into and probably file proper tickets. 
For example I have an idea on how to fix 4 but it would make 1 even slower.

I took some time to get a good long look at what other ORMs are doing. The 
graph-based dependency solving approach is rather uncommon. Most systems 
treat migrations as part of the project rather than the packages it uses.


Possible solution (or "how I'd build it today if there was no existing code 
in Django core"):

a. Make migrations part of the project and not individual apps. This takes 
care of problem 3 above.

b. Prefix individual migration files with a UTC timestamp 
(20161105151023_add_foo) to provide a strict sorting order. This removes 
the depsolving requirement and takes care of 1 and 2. By eliminating those 
it makes 4 kind of obsolete as squashing migrations would become pointless.
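A minimal sketch of ordering under (b), assuming the `YYYYMMDDHHMMSS_name.py` pattern from the example file name. Because the prefix is fixed-width and zero-padded, lexical order already matches chronological order; parsing it anyway rejects malformed prefixes early. Ties within the same second fall back to the rest of the file name, an arbitrary but stable choice. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def migration_timestamp(filename):
    """Parse the leading UTC timestamp from e.g. '20161105151023_add_foo.py'.

    Raises ValueError if the prefix is not a valid YYYYMMDDHHMMSS value.
    """
    prefix = filename.split("_", 1)[0]
    return datetime.strptime(prefix, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def ordered(filenames):
    """Sort by timestamp, then by full name for migrations from the same second."""
    return sorted(filenames, key=lambda name: (migration_timestamp(name), name))


files = [
    "20161105151023_add_foo.py",
    "20161104090000_initial.py",
    "20161105151023_add_bar.py",
]
print(ordered(files))
```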

c. Have reusable apps provide migration templates that Django then copies 
to my project when "makemigrations" is run.

d. Maintain a separate directory for each database connection.

e. Execute all migrations in alphabetical order (which means by timestamp 
first). When an unapplied migration is followed by an applied one, ask 
whether to attempt to just apply it or if the user wants to first unapply 
migrations that came after it. To me this would work better than 6.
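Proposal (e) needs to spot unapplied migrations that sort before an applied one. A sketch, assuming migration names as strings ordered by their timestamp prefix, and a set of applied names read from wherever applied state is recorded (the `django_migrations` table today). The function name is hypothetical.

```python
def out_of_order(ordered_names, applied):
    """Return unapplied migrations that sort before at least one applied migration.

    These are the cases where (e) would ask whether to apply the migration in
    place or to first unapply everything that came after it.
    """
    result = []
    applied_later = False
    # Walk from newest to oldest, remembering whether anything newer is applied.
    for name in reversed(ordered_names):
        if name in applied:
            applied_later = True
        elif applied_later:
            result.append(name)
    return list(reversed(result))


names = ["20161101090000_a", "20161102090000_b", "20161103090000_c", "20161104090000_d"]
applied = {"20161101090000_a", "20161103090000_c"}
print(out_of_order(names, applied))
# ['20161102090000_b']
```

`20161104090000_d` is simply pending: nothing after it is applied, so it can run normally.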

f. Migrating to a timestamp solves 5.
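With fixed-width UTC prefixes, "migrate to a timestamp" reduces to a string comparison. A minimal sketch, with a hypothetical `rollback_plan` helper that lists the applied migrations newer than a 14-digit target, newest first, i.e. in the order they would be unapplied:

```python
def rollback_plan(ordered_names, applied, target):
    """Applied migrations whose prefix is later than `target` (YYYYMMDDHHMMSS),
    newest first. Equal-width digit strings compare in chronological order."""
    return [
        name
        for name in reversed(ordered_names)
        if name in applied and name.split("_", 1)[0] > target
    ]


names = ["20161101090000_a", "20161102090000_b", "20161103090000_c", "20161104090000_d"]
print(rollback_plan(names, set(names), "20161102000000"))
# ['20161104090000_d', '20161103090000_c', '20161102090000_b']
```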


Of course we do have migration support in core and it's not compatible with 
most of the above list. Any ideas? I think serializing the dependency 
solver state and reusing it between runs could be a pretty low hanging 
fruit (like "npm shrinkwrap" or yarn's lock file).
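As a rough illustration of the lock-file idea: the resolved order could be written out as plain JSON and reused as long as the set of migrations on disk has not changed. The format and function names below are invented for this sketch. The freshness check only compares node sets, so it would not notice edited dependencies on existing migrations.

```python
import json


def dump_lock(order):
    """Serialise a resolved order of (app_label, migration_name) nodes to JSON text."""
    return json.dumps([list(node) for node in order], indent=2)


def load_lock(text):
    """Parse text written by dump_lock back into a list of tuples."""
    return [tuple(node) for node in json.loads(text)]


def lock_is_fresh(order, graph):
    """Reuse the cached order only if it covers exactly the current nodes.

    Does not detect changed dependencies on migrations already in the lock.
    """
    return len(order) == len(graph) and set(order) == set(graph)


order = [("accounts", "0001_initial"), ("shop", "0001_initial")]
graph = {
    ("accounts", "0001_initial"): [],
    ("shop", "0001_initial"): [("accounts", "0001_initial")],
}
print(dump_lock(order))
print(lock_is_fresh(load_lock(dump_lock(order)), graph))
```

When the check fails, a full re-resolution (or an incremental one over just the new nodes) would rebuild the lock.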
