Here is a revised proposal.

Abstract
------------------------------------------------------------------------------
A database migration helper has been one of the longest-standing feature
requests in Django. Though Django has an excellent database creation helper,
when faced with schema design changes, developers have to resort to either
writing raw SQL and manually performing the migrations, or using third-party
apps like South[1] and Nashvegas[2].

[1] http://south.aeracode.org/
[2] https://github.com/paltman/nashvegas/

Clearly Django will benefit from having a database migration helper as an
integral part of its codebase.

From the summary on the django-developers mailing list[3], the task of building
a migrations framework will involve:
1. Adding a db.backends module to provide an abstract interface to migration
   primitives (add column, add index, rename column, rename table, and so on).
2. Adding a contrib app that performs the high-level accounting of "has
   migration X been applied", and management commands to "apply all
   outstanding migrations".
3. Providing an API that allows end users to define raw-SQL migrations, or
   native Python migrations using the backend primitives.
4. Leaving the hard tasks of determining dependencies, introspecting database
   models and so on to the toolset contributed by the broader community.

[3] http://groups.google.com/group/django-developers/msg/cf379a4f353a37f8
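
To make steps 2 and 3 of the list above more concrete, here is a minimal
sketch of the "has migration X been applied" accounting, using Python's
standard sqlite3 module. MigrationRecorder, apply_outstanding, the
applied_migrations table and the 0001_initial style names are all hypothetical
and chosen only for illustration; they are not part of Django or South. A
migration here is either a raw SQL string or a Python callable.

```python
import sqlite3


class MigrationRecorder:
    """Hypothetical bookkeeping: remembers which migrations were applied."""

    def __init__(self, connection):
        self.connection = connection
        connection.execute(
            "CREATE TABLE IF NOT EXISTS applied_migrations ("
            "app TEXT NOT NULL, name TEXT NOT NULL, PRIMARY KEY (app, name))"
        )

    def is_applied(self, app, name):
        row = self.connection.execute(
            "SELECT 1 FROM applied_migrations WHERE app = ? AND name = ?",
            (app, name),
        ).fetchone()
        return row is not None

    def record(self, app, name):
        self.connection.execute(
            "INSERT INTO applied_migrations (app, name) VALUES (?, ?)",
            (app, name),
        )


def apply_outstanding(connection, recorder, app, migrations):
    """Apply, in list order, every migration that is not yet recorded.

    Each migration is a (name, step) pair where step is either a raw SQL
    string or a callable that receives the connection.
    """
    applied = []
    for name, step in migrations:
        if recorder.is_applied(app, name):
            continue
        if callable(step):
            step(connection)          # native Python migration
        else:
            connection.execute(step)  # raw-SQL migration
        recorder.record(app, name)
        applied.append(name)
    return applied


def add_body_column(connection):
    connection.execute('ALTER TABLE "blog_post" ADD COLUMN "body" text')


connection = sqlite3.connect(":memory:")
recorder = MigrationRecorder(connection)
migrations = [
    ("0001_initial", 'CREATE TABLE "blog_post" ("id" integer PRIMARY KEY)'),
    ("0002_add_body", add_body_column),
]

first_run = apply_outstanding(connection, recorder, "blog", migrations)
second_run = apply_outstanding(connection, recorder, "blog", migrations)
print(first_run, second_run)
```

The second call applies nothing because both names are already recorded. The
sketch deliberately ignores dependencies, ordering across apps and backward
migrations.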

I would like to work on the 1st step as part of this year's GSoC.


Implementation plan
------------------------------------------------------------------------------
The idea is to have a CRUD interface to the database schema (with some
additional utility functions for indexing etc.) with functions like:
* create_table
* rename_table
* delete_table
* add_column
and so on, which will take the *explicit* names of the tables/columns to be
modified as parameters. It will be the responsibility of the higher-level
API caller (not undertaken as part of GSoC) to translate model/field
names to explicit table/column names. These functions will be directly
responsible for modifying the schema, and any interaction with the database
schema will take place by calling these functions. Most of these functions
will come from South.
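
As a rough illustration of the interface described above, the following is a
minimal sketch against SQLite using Python's standard sqlite3 module. The
class name SchemaOperations, the method signatures and the helper functions
are assumptions made for this example; the real API would be settled during
the design discussion and would live in db.backends.

```python
import sqlite3


class SchemaOperations:
    """Hypothetical schema primitives that take explicit table/column names."""

    def __init__(self, connection):
        self.connection = connection

    def quote_name(self, name):
        # Double embedded quotes so a name cannot terminate the identifier.
        return '"%s"' % name.replace('"', '""')

    def execute(self, sql):
        self.connection.execute(sql)

    def create_table(self, table, columns):
        # columns is a list of (column_name, column_type_sql) pairs.
        column_sql = ", ".join(
            "%s %s" % (self.quote_name(name), type_sql)
            for name, type_sql in columns
        )
        self.execute(
            "CREATE TABLE %s (%s)" % (self.quote_name(table), column_sql)
        )

    def rename_table(self, old_name, new_name):
        self.execute(
            "ALTER TABLE %s RENAME TO %s"
            % (self.quote_name(old_name), self.quote_name(new_name))
        )

    def delete_table(self, table):
        self.execute("DROP TABLE %s" % self.quote_name(table))

    def add_column(self, table, column, type_sql):
        self.execute(
            "ALTER TABLE %s ADD COLUMN %s %s"
            % (self.quote_name(table), self.quote_name(column), type_sql)
        )


def table_names(connection):
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def column_names(connection, table):
    # PRAGMA table_info yields one row per column; index 1 is the name.
    rows = connection.execute('PRAGMA table_info("%s")' % table)
    return [row[1] for row in rows]


connection = sqlite3.connect(":memory:")
ops = SchemaOperations(connection)
ops.create_table(
    "blog_post", [("id", "integer PRIMARY KEY"), ("title", "varchar(100)")]
)
ops.add_column("blog_post", "body", "text")
ops.rename_table("blog_post", "blog_entry")
print(table_names(connection))
print(column_names(connection, "blog_entry"))
```

Note that the caller passes "blog_post" and "body" directly; nothing in the
sketch knows about models or fields, which matches the division of
responsibility described above.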

These API functions will also have a "dry-run" or test mode, in which they
will output the raw SQL representation of the migration or display errors if
they occur. This will be useful in:
1. The MySQL backend. MySQL does not have transaction support for schema
   modification, and hence the migrations will be run in dry-run mode first
   so that any errors can be captured before altering the schema.
2. The django-admin commands sql and sqlall that return the SQL (for creation
   and indexing) for an app. They will capture the SQL returned from the API
   running in dry-run mode.
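
One possible shape for the dry-run mode is sketched below, again with the
standard sqlite3 module. DryRunSchemaEditor, its dry_run flag and its
collected_sql list are hypothetical names for this example only. The sketch
covers just the "output raw SQL" half of the idea: it records statements
without executing them and does not attempt to detect errors ahead of time.

```python
import sqlite3


class DryRunSchemaEditor:
    """Hypothetical editor: records every statement, executes unless dry_run."""

    def __init__(self, connection, dry_run=False):
        self.connection = connection
        self.dry_run = dry_run
        self.collected_sql = []

    def execute(self, sql):
        # Always record the statement; only touch the database in real mode.
        self.collected_sql.append(sql + ";")
        if not self.dry_run:
            self.connection.execute(sql)

    def create_table(self, table, columns):
        column_sql = ", ".join('"%s" %s' % (n, t) for n, t in columns)
        self.execute('CREATE TABLE "%s" (%s)' % (table, column_sql))

    def add_column(self, table, column, type_sql):
        self.execute(
            'ALTER TABLE "%s" ADD COLUMN "%s" %s' % (table, column, type_sql)
        )


def run_operations(editor):
    # The same sequence of primitives is used for the preview and the real run.
    editor.create_table("blog_post", [("id", "integer PRIMARY KEY")])
    editor.add_column("blog_post", "title", "varchar(100)")


def count_tables(connection):
    row = connection.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
    ).fetchone()
    return row[0]


connection = sqlite3.connect(":memory:")

preview = DryRunSchemaEditor(connection, dry_run=True)
run_operations(preview)
print("\n".join(preview.collected_sql))
tables_after_dry_run = count_tables(connection)

real = DryRunSchemaEditor(connection)
run_operations(real)
tables_after_real_run = count_tables(connection)
```

A command such as sqlall could then print collected_sql from a dry run, while
a real run would execute the same statements.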

As for the future of the current Django creation API, it will have to be
refactored (not under GSoC) to make use of the 'create' part of our new CRUD
interface, for consistency purposes.

The GeoDjango backends will also have to be refactored to use the new API.
Since they build upon the base code in db.backends, db.backends will have to
be refactored first.

Last year xtrqt wrote, documented and tested code for at least the
SQLite backend[4]. As per Andrew's suggestion, I will not rely too much
on that code, but some parts can still be salvaged.

[4] https://groups.google.com/forum/?fromgroups#!searchin/django-developers/xtrqt/django-developers/pSICNJBJRy8/Hl7frp-O-dMJ


Schedule and Goal
------------------------------------------------------------------------------
Week 1     : Discussion on API design and writing tests
Week 2-3   : Developing the base migration API
Week 4     : Developing extensions and overrides for PostgreSQL
Week 5-6   : Developing extensions and overrides for MySQL
Week 7-8.5 : Developing extensions and overrides for SQLite (may be shorter
             or longer (by 0.5 week) depending on how much of xtrqt's code is
             considered acceptable)
Week 8.5-10: Writing documentation and leftover regression tests, if any
Week 11-12 : Buffer weeks for the unexpected

I will consider my project to be successful when we have working, tested and
documented migration primitives for Postgres, MySQL and SQLite. If we can
develop a working fork of South that uses these primitives, that will be a
strong indicator of the project's success.


About me and my inspiration for the project
------------------------------------------------------------------------------
I am Kushagra Sinha, a pre-final year student at the Institute of Technology
(about to be converted to an Indian Institute of Technology),
Banaras Hindu University, Varanasi, India.

I can be reached at:
Gmail: sinha.kushagra
Alternative email: kush [at] j4nu5 [dot] com
IRC: Nick j4nu5 on #django-dev and #django
Twitter: @j4nu5
github: j4nu5

I had been happily using PHP for nearly all of my webdev work since my high
school days (CakePHP being my framework of choice) till I was introduced to
Django a year and a half ago. Comparing Django with CakePHP (which is Ruby on
Rails inspired), I felt more attached to Django's philosophy than RoR's
"hidden magic" approach. I have been in love ever since :)

Last year I had an internship at MobStac[5] (named India's hottest young
startup by BusinessWorld magazine[6]). Their stack is on Django+MySQL. I was
involved in a heavy database migration of their analytics platform. Since they
had not been using a migrations framework, the situation looked grim.
Fortunately, South came to the rescue and we were able to carry out the
migration, but it left everyone a little frustrated and clearly in want of a
migrations framework built within Django itself.

[5] http://mobstac.com/
[6] http://blog.mobstac.com/blog/2011/06/businessworld-declares-mobstac-indias-hottest-young-startup/


Experience
------------------------------------------------------------------------------
I have experience working on a high-stakes database migration through my
internship, as stated before. I am also familiar with Django's contribution
guidelines and have written a couple of patches[7]. One patch has been
accepted and the second got blocked by 1.4's feature freeze.
My other projects can be seen on my GitHub[8].

[7] https://code.djangoproject.com/query?owner=~j4nu5
[8] https://github.com/j4nu5

On Mon, Mar 19, 2012 at 5:03 AM, Russell Keith-Magee wrote:

>
> On 18/03/2012, at 7:38 PM, Kushagra Sinha wrote:
>
> > Abstract
> > ------------------------------------------------------------------------------
> > A database migration helper has been one of the most long standing feature
> > requests in Django. Though Django has an excellent database creation helper,
> > when faced with schema design changes, developers have to resort to either
> > writing raw SQL and manually performing the migrations, or using third party
> > apps like South[1] and Nashvegas[2].
> >
> > Clearly Django will benefit from having a database migration helper as an
> > integral part of its codebase.
> >
> > From [3], the consensus seems to be on building a Ruby on Rails ActiveRecord
> > Migrations[4] like framework, which will essentially emit python code after
> > inspecting user models and current state of the database.
>
> Check the edit dates on that wiki -- most of the content on that page is
> historical, reflecting discussions that were happening over 3 years ago.
> There have been many more recent discussions.
>
> The "current consensus" (at least, the consensus of what the core team is
> likely to accept) is better reflected by the GSoC project that was
> accepted, but not completed last year. I posted to Django-developers about
> this a week or so ago [1]; there were some follow up conversations in that
> thread, too [2].
>
> [1] http://groups.google.com/group/django-developers/msg/cf379a4f353a37f8
> [2] http://groups.google.com/group/django-developers/msg/2f287e5e3dc9f459
>
> > The python code generated will then be fed to a 'migrations API' that will
> > actually handle the task of migration. This is the approach followed by
> > South (as opposed to Nashvegas's approach of generating raw SQL migration
> > files). This ensures modularity, one of the trademarks of Django.
>
> I don't think you're going to be able to ignore raw SQL migrations quite
> that easily. Just like the ORM isn't able to express every query, there
> will be migrations that you can't express in any schema migration
> abstraction. Raw SQL migrations will always need to be an option (even if
> they're feature limited).
>
> > Third party developers can create their own inspection and ORM versioning
> > tools, provided the inspection tool emits python code conforming to our new
> > migrations API.
> >
> > To sum up, the complete migrations framework will need, at the highest level:
> > 1. A migrations API that accepts python code and actually performs the
> >    migrations.
>
> This is certainly needed. I'm a little concerned by your phrasing of an
> "API that accepts python code", though. An API is something that Python
> code can invoke, not the other way around. We're looking for
> django.db.backends.migration as an analog of django.db.backends.creation,
> not a code consuming utility library.
>
> > 2. An inspection tool that generates the appropriate python code after
> >    inspecting models and current state of database.
>
> The current consensus is that this shouldn't be Django's domain -- at
> least, not in the first instance. It might be appropriate to expose an API
> to extract the current model state in a Pythonic form, but not a
> fully-fledged, user accessible "tool".
>
> > 3. A versioning tool to keep track of migrations. This will allow 'backward'
> >    migrations.
>
> If backward migrations is the only reason to have a versioning tool, then
> I'd argue you don't need versioning.
>
> However, that's not the only reason to have versioning, is it :-)
>
> > South's syncdb:
> > class Command(NoArgsCommand):
> >     def handle_noargs(self, migrate_all=False, **options):
>
> As a guide for the future -- large wads of code like this aren't very
> compelling as part of a proposal unless you're trying to demonstrate
> something specific. In this case, you're just duplicating some of South's
> internals -- "I'm going to take South's lead" is all you really needed to
> say.
>
> > If migrations become a core part of Django, every user app will have a
> > migration folder(module) under it, created at the time of issuing
> > django-admin.py startapp. Thus by modifying the startapp command to create a
> > migrations module for every app it creates, we will be able to use South's
> > syncdb code as is and will also save the user from issuing
> > schemamigration --initial for all his/her apps.
> >
> > Now that we have a guaranteed migrations history for every user app, migrate
> > command will also be more or less a copy of South's migrate command.
>
> What does this "history" look like? Are migrations named? Are they dated?
> Numbered? How do you handle dependencies? Ordering? Collisions between
> parallel development?
>
> *This* is the sort of thing a proposal should be elaborating.
> >
> > As much as I would have liked to use Django creation API's code for creating
> > and destroying models, we cannot. The reason for this is Django's creation
> > API uses its inspection tools to generate *SQL* which is then directly fed to
> > cursor.execute. What we need is a migrations API which gobbles up *python*
> > code generated by the inspection tool. Moreover deprecating/removing Django's
> > creation API to use the new migrations API everywhere will give rise to
> > performance issues since time will be wasted in generating python code and
> > then converting python to SQL for Django's core apps which will never have
> > migrations anyways.
>
> This sounds like a false economy to me. If we're talking about the core
> pipeline for handling a HTTP request, then every method call and
> abstraction counts. However, that's not what we're talking about. We're
> talking about utilities used to synchronize the database. They're called by
> manual invocation, infrequently, and *never* as part of the
> request/response cycle.
>
> Yes, there will probably be a slowdown -- but we get the benefit of a
> consistent interface to database creation. However, unless the slowdown to
> syncdb is such that it becomes *seriously* observable -- e.g., turns syncdb
> into a 1 minute operation, rather than a 1 second operation -- then you're
> advocating for duplicating code paths in order to maintain a false economy.
>
> > The creation API and code that depends on it (syncdb, sql, django.test.simple
> > and django.contrib.gis.db.backends) will be left as is.
> >
> > Therefore much of the code for our new migrations API will come from South.
>
> Again, the code snippet highlights nothing here. Anyone qualified to
> review your proposal is at least familiar with South, so there's no need to
> give a page long example of South's usage unless you're trying to say
> something specific about South's API and usage.
>
> > Schedule and Goal
> > ------------------------------------------------------------------------------
> > Week 1    : Discussion on API design and overriding django-admin startapp
> > Week 2-3  : Developing the base migration API
> > Week 4    : Developing migration extensions and overrides for PostgreSQL
> > Week 5    : Developing migration extensions and overrides for MySQL
> > Week 6    : Developing migration extensions and overrides for SQLite
> > Week 7    : Developing the inspection tools
> > Week 8    : Developing the ORM versioning tools and glue code
> > Week 9-10 : Writing tests/documentation
> > Week 11-12: Buffer weeks for the unexpected, Oracle DB? and
> >             django.contrib.gis.backends?
> >
>
> Week 13 - profit.
>
> Seriously, this is a very unconvincing timetable. What are you basing
> these estimates on?
>
> Some of the things that raise flags for me:
>
>  * What makes you think that MySQL, PostgreSQL and SQLite are all equally
> complex when it comes to migrations? SQLite doesn't let you rename a table.
> Tracking MySQL index changes is non-trivial.
>
>  * On what basis do you assert that "developing inspection tools" --
> presumably for all three databases covered in weeks 4-6 -- will take 1 week?
>
>  * If you're not working on tests until week 9-10, how do you plan to
> establish that the work you do in week 1 actually works?
>
> > Note: Work on Oracle and GIS may not be possible as part of GSoC
> >
> > I will personally consider my project to be successful if I have created and
> > tested at least the base API + PostgreSQL extension and inspection + version
> > tools.
>
> If that's the case, then why does your schedule say you're going to
> complete MySQL and SQLite, and possibly Oracle as well?
>
> I can see that you're obviously enthused by this project, but as it
> stands, I can't say this is a very compelling proposal.
>
>  * It ignores the most recent activity in the area (last year's GSoC, in
> particular)
>
>  * It is extremely light in detail on how some very big details (like your
> "versioning tools" will work)
>
>  * The proposed schedule reads more like a list of things you know you
> need to do, not a detailed work breakdown backed by realistic estimates.
>
> Thanks for taking the time to submit this proposal. I'd encourage you to
> have a second swing at this. Read the recent discussions on the topic; take
> a look at last year's GSoC proposal; and spend some time elaborating on the
> details that I've highlighted.
>
> Yours,
> Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.
