On 5/21/15 12:28 PM, Jay Pipes wrote:
Hi Mike,

I had some initial concerns around the online db schema migration work as you do below. However, once I realized one big thing, those concerns were allayed.

Here's the big thing that you're missing: no data migrations are allowed any more in DB migration scripts.

Yes, that sounds exceedingly weird, I know, but hear me out in the comments inline... :) I've trimmed your original email for brevity.


And this is where you're missing the important piece :) While it *is* true that the expand phase would combine the *DDL* schema migrations into just a final table structure as shown above, it is actually *not* true that the knowledge is lost of how the model schemas changed.

The key to all of this is that each change to the DB schema would be in a patch that contained corresponding changes to the nova.objects.* classes that are used to represent the data. And it is *these* object classes that have object versioning and model changes tracked in them.

Ah. OK, I didn't even have to read the rest of that to understand (well, at least to a minimal degree, as I think more about it) what you mean. You can do data migrations via the API, which is versioned, and in between expand and contract, both APIs are available. (Goes to read the next several paragraphs to confirm that's what you mean).



So, for instance, let's say the widget class was represented in a nova.objects.widget.Widget model in the following code, before the patch that added the #1 schema migration:
...
Over time, the modified_timestamp DB field will be populated and there will be no need to run a single data migrationin any SQLAlchemy-migrate or Alembic script...

OK, I think my understanding is correct, though I'm not sure what "over time" means.... is there a "migrate the data via API version 1.0 -> 2.0" step somewhere? Or is the idea that, "over time" definitely is finished before "contract" is ever run, right?


OK, now here comes another developer and she adds DB schema migration #2 which adds the status_code field to the widgets table. Note that the widget_status table will still be around! The contract phase, which would remove the widget_status table isn't going to be run until the DBA has determined that all data migrations that happen in the Nova objects have occurred...
OK, not knowing the specifics, you just said, "all data migrations that happen in the Nova objects", so OK, there *are* data migrations that someone has to run, somewhere. They are between API version numbers, and the headache Nova has is that they have to constantly update these objects to maintain cross-API-version compatibility whenever the schema changes, OK.

The Widget class would look like this in the patch that adds the #2 DB schema migration:


 class Widget(base.NovaPersistentObject):
     VERSION = '1.1'
     fields = {
         'id': fields.IntegerField(),
         'name': fields.StringField(),
         'modified_timestamp': fields.DatetimeField(),
         'status_code': fields.IntegerField()
     }

     @staticmethod
     def _from_db_object(context, widget, db_obj):
         # db_obj is what comes back from the DB API call
         # widget_get(). The SQL for this call will look like this:
         #
         # SELECT id, modified_date, modified_timestamp,
         #   w.status_code, ws.status_code as ws_status_code
         # FROM widgets w JOIN widget_status ws ON id = widget_id
         # WHERE w.id = ?
         #
         # The returned data will look like this:
         # {'id': 1, 'name': 'my widget',
         #  'modified_date': '20150601',
         #  'modified_timestamp': None or datetime,
         #  'status_code': None or integer,
         #  'ws_status_code': None or integer
         for f in ('id', 'name'):
             widget[f] = db_obj[f]
         ts = db_obj['modified_timestamp']
         if ts is None:
             ts = datetime.datetime.strftime(db_obj['modified_date'])
         widget['modified_timestamp'] = ts
         sc = db_obj['status_code']
         if sc is None:
             sc = db_obj['ws_status_code']

OK now I...well I thought I really get it, but still seems like something I'm missing because this seems still very wrong: You not only can't do data migrations in the traditional way anymore, you have to write *all of your model code to be compatible with all versions of the schema at all times*. That is, moving tables, or even columns, around and such, very difficult (if not impossible?). For example. Your _from_db_object() above calls into a method that has to emit a JOIN from "widget" to "widget_status". Meanwhile, the migration we really want to do is that we "DROP" widget_status in the "contract" phase. My understanding is, a Nova application that is in the "migrate" stage can be sent through the "contract" stage with no code changes, correct? Then how can you possibly write the model code that queries for "widget JOIN widget_status" when on one side of "contract", "widget_status" exists, and on the other, "widget_status" does *not* exist? The Nova objects layer doesn't do the SQL queries, it calls into sqlalchemy/api.py ultimately.

Put another way, if I want to do a migration that someday, "over time", eventually, etc., will drop the "widget_status" table and merge it into the "widget" table, how do I write the "get_widget()" DB API call ? Same for just dropping a column. If we have to write code that will return the value of the old table / old column while it exists, how does that part work? Somewhere, there has to be code that in one case says: "query(Widget).join(WidgetStatus)" and in other case says "query(Widget)". You have to maintain *both* SQL queries at the same time?


Also, I'd note that the "online schema migrations" concept that other projects are undoubtedly going to look at apparently has an *enormous* dependency on the fact that a project uses oslo.objects to maintain all data *completely*. This is not at all mentioned in the blueprint which IMO is a pretty giant omission.



__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to