Hi

I think this is a great idea! I'm just wondering whether there aren't existing 
projects that could help us with this (something like sqlalchemy-migrate...)

Cheers
David

----- Original Message -----
From: "Nathan Jones" <nat...@ncjones.com>
To: "pytrainer developers list" <pytrainer-devel@lists.sourceforge.net>
Sent: Friday, May 27, 2011 3:53:53 PM
Subject: [Pytrainer-devel] Automated data migration

Currently Pytrainer has limited automatic database schema migration
functionality. It handles adding of tables and columns but not
renaming or deleting and not data migration. It needs to handle data
migration so I can clean invalid data as part of an upgrade and
introduce strict value checking.

I'm proposing to implement automated data migration by defining an
upgrade script for each version that contains schema changes,
disallowing schema changes in branch versions and replacing the DB
version number with an application version number. Some explanation is
provided below.

Auto migration basically requires two things: that the schema version
is stored with the data; and that the application contains a list of
steps for upgrading between any consecutive versions. Thus, when the
application starts up, it can check the schema version of the existing
data and run the upgrade steps for all subsequent versions up to the
current version. This is pretty straightforward, but there are a
couple of issues that need considering: upgrading branch versions and
upgrading configuration.
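
The start-up check described above could be sketched roughly as
follows. This is only an illustration, assuming an SQLite database;
the "meta" and "records" tables, the column names and the upgrade
functions are hypothetical and are not Pytrainer's actual schema or
code.

```python
import sqlite3

# Hypothetical upgrade steps, keyed by the schema version they produce.
def upgrade_to_2(conn):
    # Schema change: add a column (illustrative names only).
    conn.execute("ALTER TABLE records ADD COLUMN comments TEXT")

def upgrade_to_3(conn):
    # Data migration: clean invalid values before stricter checking.
    conn.execute("UPDATE records SET distance = 0 WHERE distance < 0")

UPGRADES = {2: upgrade_to_2, 3: upgrade_to_3}
CURRENT_VERSION = 3

def get_version(conn):
    # The schema version is stored alongside the data it describes.
    conn.execute("CREATE TABLE IF NOT EXISTS meta (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM meta").fetchone()
    if row is None:
        # Assumption: data without a stored version is at version 1.
        conn.execute("INSERT INTO meta (version) VALUES (1)")
        return 1
    return row[0]

def migrate(conn):
    # Run every upgrade step after the stored version, in order,
    # recording the new version after each step.
    version = get_version(conn)
    for target in range(version + 1, CURRENT_VERSION + 1):
        UPGRADES[target](conn)
        conn.execute("UPDATE meta SET version = ?", (target,))
        conn.commit()
    return get_version(conn)
```

Calling migrate() a second time does nothing, since the stored version
already equals the current version.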

Upgrading of branch versions means upgrading from a more recent
release of an earlier version (the branched version) to an older
release of a later version. For example, if versions 1.8, 1.9 and
1.8.1 are released in that order then 1.8.1 is the branched version.
It should be possible for a user to upgrade to any later (higher
numbered) version, but, since the branch version is chronologically
more recent, the later version cannot provide a specific upgrade path
for it like it can for all versions that are its direct ancestors.
Note that this scenario only causes a problem if the branch version
contains schema changes. This problem, as far as I can tell, has three
solutions:

1. Do not allow upgrading from a branch version to a later version
that was released chronologically earlier. Instead, every time a
branch version with schema changes is released, release a new latest
version which provides an upgrade path for the branch version.
2. Only allow branch versions to contain schema changes that are
back-ported from trunk and describe all schema changes with enough
granularity so that they can be applied only if not applied already.
3. Do not make schema changes in branch versions.

Option one gives the developer freedom to make any change they wish on
a branch version but sacrifices flexibility in upgrade paths as well
as making the upgrade paths more complex. Option two provides the best
balance between allowing some schema changes in branch versions and
having simple upgrade paths. Option three is the simplest to implement
and still guarantees an upgrade path to any later version.
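
To illustrate what "applied only if not applied already" in option two
might mean in practice, here is a minimal sketch assuming SQLite. The
helper names are hypothetical; the idea is that a back-ported change
checks the existing schema before touching it, so running it from both
a branch version and trunk is harmless.

```python
import sqlite3

def column_exists(conn, table, column):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    # The table name is interpolated, so pass trusted constants only.
    rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    return any(row[1] == column for row in rows)

def add_column_if_missing(conn, table, column, decl):
    # Apply the schema change only if it has not been applied already.
    # Returns True when the column was added, False when it was present.
    if column_exists(conn, table, column):
        return False
    conn.execute("ALTER TABLE %s ADD COLUMN %s %s" % (table, column, decl))
    return True
```

Every kind of schema change would need a similar guard, which is part
of why option two costs more to implement than option three.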

Not allowing schema changes in branch versions is my preferred option.
It will mean that there needs to be discipline so as to not break
upgrade paths when branching. I think this is good enough for us
though since, in the history of Pytrainer, there has never been a
branch release anyway.

Configuration, just like data, also may need to be migrated
occasionally. It is not required for what I am trying to achieve at
the moment but I think it is highly likely in the future that we need
to move or delete parts of the application configuration. Configuration
migration is exactly the same problem as data migration - the
persistence mechanism is really the only difference - so the
application configuration needs to contain a configuration version
number and each application version with configuration changes must
have a configuration upgrade script. I think it makes sense to
consolidate the data and configuration migration together. The
implication is that, instead of having separate database schema and
configuration version numbers, there is just a single application
version number used by both, and each application version upgrade
script contains a description of both the configuration changes
and the data changes.
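
A consolidated upgrade script of this kind might look something like
the sketch below. Everything in it is an assumption made for
illustration: the configuration is modelled as a plain dict, the single
application version is assumed to be stored in that configuration as a
tuple, and the option, table and function names are invented.

```python
import sqlite3

def upgrade_1_9_config(config):
    # Hypothetical configuration change: move an option to a new key.
    if "units" in config:
        config["measurement_system"] = config.pop("units")

def upgrade_1_9_data(conn):
    # Hypothetical data change: replace missing titles with empty text.
    conn.execute("UPDATE records SET title = '' WHERE title IS NULL")

# One entry per application version that changes configuration or data,
# listed in ascending version order.
UPGRADES = [
    ((1, 9), upgrade_1_9_config, upgrade_1_9_data),
]

def upgrade(config, conn):
    # Assumption: configuration without a version predates 1.9.
    stored = tuple(config.setdefault("version", (1, 8)))
    for version, config_step, data_step in UPGRADES:
        if version > stored:
            config_step(config)
            data_step(conn)
            config["version"] = version
    conn.commit()
    return tuple(config["version"])
```

Comparing versions as tuples also places a branch version such as
(1, 8, 1) before (1, 9), which matches the "higher numbered" ordering
used earlier.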

For anyone interested in what an automated data migration
implementation might look like, see the Trac source code:
 * http://trac.edgewall.org/browser//trunk/trac/env.py
 * http://trac.edgewall.org/browser//trunk/trac/upgrades

 - Nathan

------------------------------------------------------------------------------
vRanger cuts backup time in half-while increasing security.
With the market-leading solution for virtual backup and recovery, 
you get blazing-fast, flexible, and affordable data protection.
Download your free trial now. 
http://p.sf.net/sfu/quest-d2dcopy1
_______________________________________________
Pytrainer-devel mailing list
Pytrainer-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytrainer-devel
