Re: Schema migration process

John Meinel Thu, 12 Jun 2014 02:34:26 -0700

If I read the conversations on IRC correctly, they were talking about changing
the backup to be just a POST to an HTTP endpoint: you get back the contents
of the DB, and the spooled copy would be deleted when the backup completes.
You could still probably use whatever internal helpers spool the data to a
temp location to do the same for backup & restore.



On Thu, Jun 12, 2014 at 8:40 AM, Menno Smits <menno.sm...@canonical.com>
wrote:

> I've updated the schema migration document with the ideas that have come
> up in recent discussions. The scope of the schema migrations work has been
> reduced somewhat by making the upgrade step Apply/Rollback concept a
> separate project (database changes can be rolled back through the use of
> mongobackup/restore).
>
> I've raised a few issues in the comments about handling various failure
> modes. Input would be greatly appreciated.
>
> Nate: it would be good for you to have a look at this because we're
> planning on leaning on the new backup functionality quite a bit. Let me
> know if anything I'm proposing isn't compatible with what your team is
> working on.
>
>
> https://docs.google.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>
> On 6 June 2014 13:18, Menno Smits <menno.sm...@canonical.com> wrote:
>
>> After some fruitful discussions, Tim and I have come up with something
>> that I think is starting to look pretty good. There's a significant change
>> to how we handle backups and rollbacks that seems like the right direction.
>> I've tried to capture it all in a Google Doc as this email thread is
>> starting to get impractical. Feel free to add comments and edit.
>>
>>
>> https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>
>>
>> On 3 June 2014 13:34, Menno Smits <menno.sm...@canonical.com> wrote:
>>
>>> On 30 May 2014 01:47, John Meinel <j...@arbash-meinel.com> wrote:
>>>
>>>>
>>>>
>>>>> Building on John's thoughts, and adding Tim's and mine, here's what
>>>>> I've got so far:
>>>>>
>>>>> - Introduce a "database-version" key into the EnvironConfig document
>>>>> which tracks the Juju version that the database schema matches. More on
>>>>> this later.
>>>>>
>>>>
>>>> For clarity, I would probably avoid putting this key into
>>>> EnvironConfig, but instead have it in a separate document. That also makes
>>>> it easy to watch for just this value changing.
>>>>
>>>
>>> SGTM. I've got no strong opinion on this.
>>>
>>>
>>>>
>>>> Potentially, I would decouple the value in this key from the actual
>>>> agent versions. Otherwise you do null DB schema upgrades on every minor
>>>> release. Maybe that's sane, but it *feels* like they are two separate
>>>> issues. (What the version of the DB schema is, is orthogonal to what version
>>>> of the code I'm running.) It may be that the clarity and simplification of
>>>> just one version wins out.
>>>>
>>>
>>> I think it makes sense to just use the Juju version for the DB schema
>>> version. When you think about it, the DB schema is actually quite tightly
>>> coupled to the code version so why introduce another set of numbers to
>>> track? I'm thinking that if there are no schema upgrade steps required for a
>>> given software version then the DB is left alone except that the schema
>>> version number gets bumped.
>>>
>>>
>>>>> - Introduce a MasterStateServer upgrade target which marks upgrade
>>>>> steps which are only to run on the master state server. Also more below.
>>>>>
>>>>
>>>> This is just a compiled-in list of steps to apply, right?
>>>>
>>>
>>> Yes. I was thinking that schema upgrade steps would be defined in the
>>> same place and way that other upgrade steps are currently defined so that
>>> they could even be interleaved with other kinds of upgrade steps.
>>>
>>> What I'm proposing here is that where we currently have 2 types of
>>> upgrade targets - AllMachines and StateServer - we introduce a third target
>>> called MasterStateServer which would be primarily (exclusively?) used for
>>> schema migration steps.
>>>
>>>
>>>>> - Non-master JobManageEnviron machine agents run their upgrade steps
>>>>> as usual and then watch for EnvironConfig changes. They don't consider the
>>>>> upgrade to be complete (and therefore let their other workers start) until
>>>>> database-version matches agent-version. This prevents the new version of
>>>>> the state server agents from running before the schema migrations for the
>>>>> new software version have run.
>>>>>
>>>>
>>>> I'm not sure if schema should be done before or after other upgrade
>>>> steps. Given we're really stopping the world here, it might be prudent to
>>>> just wait to do your upgrade steps until you know that the DB upgrade has
>>>> been done.
>>>>
>>>
>>> As mentioned above, with what I'm thinking there is no real distinction
>>> between schema migration steps and other types of upgrade steps so there's
>>> no concept of schema migrations happening before or after other upgrade
>>> steps.
>>>
>>>>> *Observations/Questions/Issues*
>>>>
>>>>>
>>>>> - There are a lot of moving parts here. What could be made simpler?
>>>>>
>>>>> - What do we do if the master mongo database or host fails during the
>>>>> upgrade? Is it a goal for one of the other state servers to take over and run
>>>>> the schema upgrades itself and let the upgrade finish? If so, is this a
>>>>> must-have up-front requirement or a nice-to-have?
>>>>>
>>>>
>>>> Some thoughts:
>>>>
>>>
>>>
>>>> 1. If the actual master mongo DB fails, that will cause reelection,
>>>> which should cause all of the servers to get their connections to Mongo
>>>> bounced, and then they'll notice that there is a new master who is
>>>> responsible for applying the database changes.
>>>>
>>>
>>> We will have to do some testing to ensure that this scenario actually
>>> works. Maybe I'm overthinking it, but my gut says there's plenty
>>> to go wrong here.
>>>
>>>> 2. If it is just the master Juju process that fails, I don't think there
>>>> is any great expectation that a different process running the same code is
>>>> going to succeed, is there?
>>>>
>>>
>>> Agreed.
>>>
>>>
>>>> 3. There is also a fair possibility that the schema migration we've
>>>> written won't work with real data in the wild. (we assumed this field was
>>>> never written, but suddenly it is, etc). We've talked about the ability to
>>>> have Upgrade roll back, and maybe we could consider that here. Some
>>>> possible steps are:
>>>>
>>>>
>>>>    1. Copy the db to another location
>>>>    2. Try to apply the schema updates (either in place or only to the
>>>>    backup)
>>>>    3. If upgrade fails, roll back to the old version, and update the
>>>>    AgentVersion in environ config so that the other agents will try to
>>>>    "upgrade" themselves back to the old version. This would also be a
>>>>    reason to do the DB schema before actually applying any other upgrade
>>>>    steps. We probably want some sort of "could not upgrade because of"
>>>>    tracking here, so that it can be reported to the user.
>>>>
>>>>
>>> I like this and it should work as long as there's enough storage
>>> available to make a copy of the database. I'm not exactly clear on how we
>>> would revert to the backup instance if the migration fails but I'm sure
>>> this can be made to work. It might be enough for the first iteration if we
>>> initially make some kind of backup that the user has access to that they
>>> can restore from manually.
>>>
>>> As you mention, this would benefit from the DB schema steps being
>>> separate from the other upgrade steps. I have no real issue with this other
>>> than having them separate will probably mean more change to the existing
>>> upgrades package. This voids some of the things I've said earlier in this
>>> email :-)  I'll think some more about how this could look.
>>>
>>>> 4. As long as we do some sort of "backup before applying the change" we
>>>> allow users a way to recover the system if something failed. If we have
>>>> proper Backup support integrated into core, one option is that we just
>>>> trigger a backup and then upgrade in place, if stuff breaks, we at least
>>>> have *something* that should be recoverable.
>>>>
>>>
>>> It's a pity that the full Backup feature isn't there yet as this could
>>> be a nice way to get a first version of schema migrations working quickly.
>>>
>>>>
>>>>
>>>>
>>>>> - Upgrade steps currently have access to State but I think this
>>>>> probably won't be sufficient to perform many types of schema migrations
>>>>> (e.g. accessing defunct fields, removing fields, adding indexes etc). Do
>>>>> we want to extend State to provide a number of schema migration helpers
>>>>> or do we expose mongo connections directly to the upgrade steps?
>>>>>
>>>>
>>>> I believe the existing Upgrade logic actually has access to the API not
>>>> to State itself, so we'll need something there. The State object has raw
>>>> mongo collections on it (environs, charms, etc).
>>>>
>>>
>>> The existing upgrade logic has access to both the API and State (the
>>> latter only on state machines obviously, that arg is nil otherwise) so
>>> that's already done.
>>>
>>>
>>>> DB Schema (IMO) inherently is going to be at the raw DB level, vs
>>>> changes in the abstract objects. (I expect that it will be defined in terms
>>>> of Apply this function to all entities in this collection, rather than
>>>> iterate over Machine objects and set data on them.)
>>>> I could be wrong, but it does seem like we'll want the syntax of db
>>>> schema changes to be on mgo.Collection objects, and not on State objects.
>>>>
>>>
>>> I completely agree that we need schema migrations to work in the mongodb
>>> world and not via application level objects. Some schema migration tasks
>>> just won't make sense at the application object level.
>>>
>>> State doesn't expose its mgo collections to the outside though so how
>>> would a schema migration step interact with them, especially for tasks such
>>> as adding new collections or indexes? Do we add a bunch of schema migration
>>> helper methods on to State (e.g. AddCollection(), AddIndex(),
>>> ApplyToCollection() etc) or do we add a single method which exposes the
>>> mongo database object (clearly marked as exclusively there for use by
>>> schema upgrade steps), or do we have schema migration steps pass a function
>>> that takes a mongo DB object to act on? We already expose the mongo session
>>> with MongoSession() so there is some precedent for this.
>>>
>>>
>>>>
>>>>> - There is a possibility that a non-master state server won't upgrade,
>>>>> blocking the master from completing the upgrade. Should there be a timeout
>>>>> before the master gives up on state servers upgrading themselves and
>>>>> performs its own upgrade steps anyway?
>>>>>
>>>>
>>>> Arguably this is a better case for "rollback" than "just move forward".
>>>>
>>>
>>> Ok - sounds good.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> - Given the order of documents a juju system stores, it's likely that
>>>>> the schema migration steps will be quite quick, even for a large
>>>>> installation.
>>>>>
>>>>>
>>>> "order of magnitude" right?
>>>>
>>>
>>> Yes - sorry that wasn't very clear.
>>>
>>>
>>>> Yeah, we're talking megabytes, GB being really large, not many GB of
>>>> data.
>>>>
>>>
>>> Great.
>>>
>>> Thanks for the excellent feedback.
>>>
>>> - Menno
>>>
>>>
>>
>
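
For anyone skimming the quoted thread, the upgrade-target and database-version
ideas can be sketched roughly as below. This is not the real upgrades package:
the Step and Machine types, stepsFor and upgradeComplete are invented for
illustration, and only the three target names come from the discussion.

```go
package main

import "fmt"

// Target says which machines an upgrade step runs on. AllMachines and
// StateServer exist today per the thread; MasterStateServer is the proposal.
type Target string

const (
	AllMachines       Target = "allMachines"
	StateServer       Target = "stateServer"
	MasterStateServer Target = "masterStateServer"
)

// Step is a simplified, hypothetical upgrade step.
type Step struct {
	Description string
	Target      Target
}

// Machine describes the agent choosing its steps (hypothetical).
type Machine struct {
	IsStateServer bool
	IsMaster      bool
}

// stepsFor returns the steps that apply to m, preserving their order so
// schema steps can stay interleaved with other kinds of steps.
func stepsFor(m Machine, steps []Step) []Step {
	var out []Step
	for _, s := range steps {
		switch s.Target {
		case AllMachines:
			out = append(out, s)
		case StateServer:
			if m.IsStateServer {
				out = append(out, s)
			}
		case MasterStateServer:
			if m.IsStateServer && m.IsMaster {
				out = append(out, s)
			}
		}
	}
	return out
}

// upgradeComplete is the gate a non-master state server would wait on
// before starting its other workers: database-version == agent-version.
func upgradeComplete(databaseVersion, agentVersion string) bool {
	return databaseVersion == agentVersion
}

func main() {
	steps := []Step{
		{"rewrite local config", AllMachines},
		{"add index to a collection", MasterStateServer},
		{"restart API worker", StateServer},
	}
	master := Machine{IsStateServer: true, IsMaster: true}
	for _, s := range stepsFor(master, steps) {
		fmt.Println(s.Description)
	}
	fmt.Println(upgradeComplete("1.19.3", "1.19.4"))
}
```

The version strings in main are arbitrary examples. With the "bump even when
there are no steps" rule, the master would still set database-version after an
empty step list, so the gate opens on every release.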

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev
