Currently Pytrainer has limited automatic database schema migration functionality. It handles adding tables and columns, but not renaming or deleting them, and not data migration. It needs to handle data migration so that I can clean invalid data as part of an upgrade and introduce strict value checking.
I'm proposing to implement automated data migration by defining an upgrade script for each version that contains schema changes, disallowing schema changes in branch versions, and replacing the DB version number with an application version number. Some explanation is provided below.

Auto migration basically requires two things: that the schema version is stored with the data, and that the application contains a list of steps for upgrading between any two consecutive versions. Thus, when the application starts up, it can check the schema version of the existing data and run the upgrade steps for all subsequent versions up to the current version. This is pretty straightforward, but there are a couple of issues that need considering: upgrading branch versions and upgrading configuration.

Upgrading of branch versions means upgrading from a more recent release of an earlier version (the branched version) to an older release of a later version. For example, if versions 1.8, 1.9 and 1.8.1 are released in that order, then 1.8.1 is the branched version. It should be possible for a user to upgrade to any later (higher numbered) version but, since the branch version is chronologically more recent, the later version cannot provide a specific upgrade path for it like it can for all versions that are its direct ancestors. Note that this scenario only causes a problem if the branch version contains schema changes.

This problem, as far as I can tell, has three solutions:

1. Do not allow upgrading from a branch version to a later version that was released chronologically earlier and, every time a branch version with schema changes is released, release a new latest version which provides an upgrade path for the branch version.
2. Only allow branch versions to contain schema changes that are back-ported from trunk, and describe all schema changes with enough granularity that they can be applied only if not applied already.
3. Do not make schema changes in branch versions.
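
To illustrate what option 2 would demand of each schema change, here is a minimal sketch of a change that is applied only if it has not been applied already. It assumes an SQLite database; the helper name, the `records` table and the `duration` column are hypothetical and not taken from Pytrainer.

```python
import sqlite3

def add_column_if_missing(conn, table, column, decl):
    """Add a column only when the table does not already have it."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)}
    if column in existing:
        return False  # already applied, e.g. by a back-ported branch release
    conn.execute("ALTER TABLE %s ADD COLUMN %s %s" % (table, column, decl))
    return True

# Hypothetical schema, used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
first = add_column_if_missing(conn, "records", "duration", "INTEGER")
second = add_column_if_missing(conn, "records", "duration", "INTEGER")
```

The second call is a no-op, which is the property option 2 relies on. Every schema change would need a guard of this kind, which is part of the cost of that option.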
Option one gives the developer freedom to make any change they wish on a branch version, but sacrifices flexibility in upgrade paths and makes the upgrade paths more complex. Option two provides the best balance between allowing some schema changes in branch versions and having simple upgrade paths. Option three is the simplest to implement and still guarantees an upgrade path to any later version.

Not allowing schema changes in branch versions (option three) is my preferred option. It will mean that there needs to be discipline so as not to break upgrade paths when branching. I think this is good enough for us though since, in the history of Pytrainer, there has never been a branch release anyway.

Configuration, just like data, may also need to be migrated occasionally. It is not required for what I am trying to achieve at the moment, but I think it is highly likely that in the future we will need to move or delete parts of the application configuration. Configuration migration is exactly the same problem as data migration (the persistence mechanism is really the only difference), so the application configuration needs to contain a configuration version number and each application version with configuration changes must have a configuration upgrade script.

I think it makes sense to consolidate the data and configuration migration. The implication is that, instead of having separate database schema and configuration version numbers, there is just a single application version number used by both, and each application version upgrade script will contain a description of both the configuration changes and the data changes.
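
The startup check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: it assumes SQLite, keeps the configuration in a plain dict, and every name in it (`app_version`, `records`, `old_units`, the upgrade functions) is hypothetical.

```python
import sqlite3

# One hypothetical upgrade step per application version that changes the
# schema, the data or the configuration. Versions are tuples so that they
# compare in the expected order.
def upgrade_1_8(conn, config):
    conn.execute("ALTER TABLE records ADD COLUMN duration INTEGER")

def upgrade_1_9(conn, config):
    # Data migration: clean invalid values.
    conn.execute(
        "UPDATE records SET duration = 0 WHERE duration IS NULL OR duration < 0")
    # Configuration migration: rename a setting.
    if "old_units" in config:
        config["units"] = config.pop("old_units")

UPGRADES = [((1, 8), upgrade_1_8), ((1, 9), upgrade_1_9)]

def read_version(conn):
    row = conn.execute("SELECT major, minor FROM app_version").fetchone()
    return (row[0], row[1])

def migrate(conn, config):
    """Run every upgrade step newer than the stored version, in order."""
    current = read_version(conn)
    applied = []
    for version, step in UPGRADES:
        if version > current:
            step(conn, config)
            # Record the single application version for both data and config.
            conn.execute("UPDATE app_version SET major = ?, minor = ?", version)
            config["version"] = version
            conn.commit()
            current = version
            applied.append(version)
    return applied

# Existing data at hypothetical version 1.7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_version (major INTEGER, minor INTEGER)")
conn.execute("INSERT INTO app_version VALUES (1, 7)")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO records (title) VALUES ('morning run')")
config = {"version": (1, 7), "old_units": "metric"}
applied = migrate(conn, config)
```

Running `migrate` a second time applies nothing, because the stored version already matches the newest step. A real implementation would also have to decide how to persist the configuration and what to do when a step fails part way through.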
For anyone interested in what an automated data migration implementation might look like, see the Trac source code:

* http://trac.edgewall.org/browser//trunk/trac/env.py
* http://trac.edgewall.org/browser//trunk/trac/upgrades

- Nathan

_______________________________________________
Pytrainer-devel mailing list
Pytrainer-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytrainer-devel