On 05/23/2017 01:10 PM, Octave J. Orgeron wrote:
Comments below.

On 5/21/2017 1:38 PM, Monty Taylor wrote:

For example: an HA strategy using slave promotion and a VIP that points at the current write master, paired with an application incorrectly configured for that setup, can lead to writes going to the wrong host after a failover event, and to an application that seems to be running fine until the data turns up weird after a while.

This is definitely a more complicated area that becomes more and more specific to the clustering technology being used. Galera vs. MySQL Cluster is a good example. Galera has an active/passive architecture where the above issues become a concern for sure.

This is not my understanding; Galera is multi-master, and if you lose a node you don't lose any committed transactions: the writesets are validated as acceptable by, and pushed out to, all nodes before your commit succeeds. There's an option to make the commit wait until those writesets are fully written to disk as well, but even with that option turned off, if you COMMIT to one node and that node then explodes, you lose nothing: your writesets have already been verified as acceptable by all the other nodes.

active/active is the second bullet point on the main homepage: http://galeracluster.com/products/
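For what it's worth, checking cluster health from the application side is straightforward. A minimal sketch, using plain SQLAlchemy (oslo.db exposes the same engine API) and the standard wsrep status variables; the DSN and database name are invented for illustration:

import sqlalchemy as sa

# Hypothetical DSN; "db-vip" stands in for whatever address the app uses.
engine = sa.create_engine("mysql+pymysql://user:pass@db-vip/nova")

with engine.connect() as conn:
    rows = conn.execute(sa.text(
        "SHOW STATUS WHERE Variable_name IN "
        "('wsrep_cluster_status', 'wsrep_local_state_comment')"))
    status = dict(rows.fetchall())

# 'Primary' means the node is in the quorum component; 'Synced' means it
# has applied all writesets, so it is safe to send reads and writes here.
if (status.get("wsrep_cluster_status") != "Primary"
        or status.get("wsrep_local_state_comment") != "Synced"):
    raise RuntimeError("not a healthy Galera member: %s" % status)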



In the "active" approach, we still document expectations, but we also validate them. If they are not what we expect but can be changed at runtime, we change them overriding conflicting environmental config, and if we can't, we hard-stop indicating an unsuitable environment. Rather than providing helper tools, we perform the steps needed ourselves, in the order they need to be performed, ensuring that they are done in the manner in which they need to be done.

This might be a trickier situation, especially if the database(s) are in a separate or dedicated environment that the OpenStack service processes don't have access to. For SQL commands this isn't a problem, of course, but changing the configuration files and restarting the database may be a harder thing to expect.

Nevertheless, the HA setup within TripleO does do this, currently using Pacemaker and resource agents. This is within the scope of at least parts of OpenStack.



In either approach the OpenStack service has to be able to talk to both old and new versions of the schema. And in either approach we need to make sure to limit the schema change operations to the set that can be accomplished in an online fashion. We also have to be careful to not start writing values to new columns until all of the nodes have been updated, because the replication stream can't replicate the new column value to nodes that don't have the new column.
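As an illustration of staying inside that safe set, a hedged Alembic sketch; the table and column names are invented:

import sqlalchemy as sa
from alembic import op

def upgrade():
    # Safe for a rolling upgrade: an additive, NULLable column that no
    # service writes to until every node is running the new code.
    op.add_column("instances",
                  sa.Column("new_flag", sa.Boolean(), nullable=True))

    # NOT safe in the same window: dropping or renaming a column the old
    # code still reads, or adding a NOT NULL column whose server default
    # forces a rewrite of every existing row.
    # op.drop_column("instances", "old_flag")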

This is another area where something like MySQL Cluster (NDB) would operate differently, because it's an active/active architecture. So limiting the number of online changes while a table is locked across the cluster would be very important. There are also the application timeouts to consider, something that could again be abstracted with oslo.db.

To confirm but also clarify Monty's point: the DDL we do on Galera runs under "total order isolation" (TOI), which means it holds up the whole cluster while the DDL is applied to all nodes. Monty says this disqualifies it as an "online upgrade", because if you emitted DDL that had to write default values into a million rows, the whole cluster would temporarily have to wait for that to happen. We handle that by making sure we don't do migrations with that kind of data requirement, so while the database does have to wait for each schema change to apply, the waits are at least very short (in theory). For practical purposes it is *mostly* an "online" style of migration, because all the services that talk to the database can keep on talking to it without being stopped, upgraded to a new software version, and restarted, which IMO is what's really hard about "online" upgrades. It does mean that services will see a little more latency while operations proceed. Maybe we need a new term like "quasi-online".
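To put the mode in concrete terms (an illustrative sketch only; the DDL itself is made up): Galera's wsrep_OSU_method selects between TOI, described above, and RSU, which desyncs and upgrades one node at a time and is only safe for changes the old schema can tolerate:

import sqlalchemy as sa

engine = sa.create_engine("mysql+pymysql://user:pass@galera-node/nova")

with engine.connect() as conn:
    # TOI (the default): every node in the cluster waits while this DDL
    # is applied everywhere, in the same transaction-ordering slot.
    conn.execute(sa.text("SET SESSION wsrep_OSU_method = 'TOI'"))
    conn.execute(sa.text(
        "ALTER TABLE instances ADD COLUMN new_flag TINYINT NULL"))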

Facebook has released a Python version of their "online" schema migration tool for MySQL, which takes the full-blown "create a new, blank table" approach: it builds a second table containing the newer version of the schema, so that nothing at all stops or slows down. To manage between the two tables while everything is running, it also makes a "change capture" table to keep track of what's going on, and to wire it all together it uses... triggers! https://github.com/facebookincubator/OnlineSchemaChange/wiki/How-OSC-works. Crazy Facebook kids. How they determine that "make two more tables and wire it all together with new triggers" is in fact more performant than just "add a column to the table", I'm not sure. I don't see an OpenStack cluster as quite the same thing as hosting a site like Facebook, so I lean towards the more liberal interpretation of "online upgrades".
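Per that wiki page, the moving parts look roughly like this (a heavily simplified sketch; all table and trigger names are invented, and the copy/replay/swap steps are elided):

import sqlalchemy as sa

engine = sa.create_engine("mysql+pymysql://user:pass@db-host/nova")

with engine.connect() as conn:
    # 1. A shadow table carrying the new version of the schema.
    conn.execute(sa.text("CREATE TABLE instances__new LIKE instances"))
    conn.execute(sa.text(
        "ALTER TABLE instances__new ADD COLUMN new_flag TINYINT NULL"))

    # 2. A change-capture table, populated by triggers on the original
    #    table (one trigger per DML type; only INSERT is shown).
    conn.execute(sa.text(
        "CREATE TABLE instances__chg "
        "(id BIGINT NOT NULL, dml_type CHAR(1) NOT NULL)"))
    conn.execute(sa.text(
        "CREATE TRIGGER instances__ins AFTER INSERT ON instances "
        "FOR EACH ROW INSERT INTO instances__chg VALUES (NEW.id, 'I')"))

    # 3. Bulk-copy old rows into instances__new, replay instances__chg,
    #    then atomically RENAME the tables into place (elided).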




* Versions

It's worth noting that behavior for schema updates and other things changes over time with the backend database version. We set minimum versions of other things, like libvirt and OVS, so we might also want to set minimum versions for what we can support in the database. That way, for a given release of OpenStack, we can know which DDL operations are safe to use for a rolling upgrade and which are not. That means detecting the version and potentially refusing to perform an upgrade if it isn't acceptable. This reduces the operator's ability to choose what version of the database software to run, but increases our ability to provide tooling and operations that we can be confident will work.

Validating the MySQL database version is a good idea; the features do change over time. A good example is how, in 5.7, you'll get warnings that duplicate indexes will be dropped in a future release, which will definitely affect multiple services today.
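A minimal sketch of what such a version gate could look like; the floor version here is an arbitrary placeholder, not a proposal:

import sqlalchemy as sa

MIN_VERSION = (5, 6, 0)  # placeholder floor, not a real proposal

engine = sa.create_engine("mysql+pymysql://user:pass@db-host/nova")

with engine.connect() as conn:
    raw = conn.execute(sa.text("SELECT VERSION()")).scalar()

# e.g. '5.7.18-log' -> (5, 7, 18)
version = tuple(int(p) for p in raw.split("-")[0].split(".")[:3])
if version < MIN_VERSION:
    raise SystemExit(
        "MySQL %s is below the minimum %s supported for online schema "
        "changes; refusing to upgrade." % (raw, MIN_VERSION))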


== Summary ==

These are just a couple of examples - but I hope they're at least mildly useful to explain some of the sorts of issues at hand - and why I think we need to clarify what our intent is separate from the issue of what databases we "support".

Some operations have one and only one "right" way to be done. For those operations, if we take an 'active' approach, we can implement them once rather than making all of our deployers and distributors each implement and run them. However, there is a cost to that. Automatic and prescriptive behavior has a higher dev cost that is proportional to the number of supported architectures, which in turn implies a need to limit deployer architecture choices.

On the other hand, taking an 'external' approach allows us to federate the work of supporting the different architectures out to the deployers. This means more work on the deployer's part, but also potentially a greater amount of freedom on their part to deploy supporting services the way they want. It also means that some of the things that have been requested of us - such as easier operation and an increase in the number of things that can be upgraded with no downtime - might become prohibitively costly for us to implement.

I honestly think that both are acceptable choices we can make and that for any given topic there are middle grounds to be found at any given moment in time.

BUT - without a decision as to what our long-term philosophical intent in this space is that is clear and understandable to everyone, we cannot have successful discussions about the impact of implementation choices, since we will not have a shared understanding of the problem space or the solutions we're talking about.

For my part - I hear complaints that OpenStack is 'difficult' to operate and requests for us to make it easier. This is why I have been advocating some actions that are clearly rooted in an 'active' worldview.

Finally, this is focused on the database layer, but similar questions arise in other places: what is our philosophy on prescriptive/active choices on our part, coupled with automated action and ease of operation, vs. expanded choices for the deployer at the expense of configuration and operational complexity? For now, let's see if we can answer it for databases and see where that gets us.

Thanks for reading.

Monty

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



