Re: [openstack-dev] [tc] Active or passive role with our database layer
On 05/23/2017 07:07 PM, Boris Pavlovic wrote:
> And how can someone, that is trying to deploy OpenStack, understand/find
> the right config for db? Or it's Ops tasks and community doesn't care
> about them?

Neither. It's the ops' responsibility to understand database configuration fundamentals (since, you know, they are operating a database...). It's the dev community's responsibility to document necessary configuration settings, provide reasonable defaults, and advise on how and when ops should adjust those defaults. It's a give and take.

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [tc] Active or passive role with our database layer
Zane,

> This is your periodic reminder that we have ~50 applications sharing the
> same database and not only do none of them know how the deployer will
> configure the database, most will not even have an idea which set of
> assumptions the other ~49 are making about how the deployer will
> configure the database.

And how can someone, that is trying to deploy OpenStack, understand/find the right config for db? Or it's Ops tasks and community doesn't care about them?

I would rather give Ops one config, say that everything should work with it, and find a way to align everybody in the community and make it the default for all projects.

Best regards,
Boris Pavlovic

On Tue, May 23, 2017 at 2:18 PM, Zane Bitter wrote:
> On 21/05/17 15:38, Monty Taylor wrote:
>> One might argue that HA strategies are an operator concern, but in
>> reality the set of workable HA strategies is tightly constrained by how
>> the application works, and pairing an application expecting one HA
>> strategy with a deployment implementing a different one can have
>> negative results ranging from unexpected downtime to data corruption.
>
> This is your periodic reminder that we have ~50 applications sharing the
> same database and not only do none of them know how the deployer will
> configure the database, most will not even have an idea which set of
> assumptions the other ~49 are making about how the deployer will
> configure the database.
>
> (Ditto for RabbitMQ.)
>
> - ZB
Re: [openstack-dev] [tc] Active or passive role with our database layer
On 21/05/17 15:38, Monty Taylor wrote:
> One might argue that HA strategies are an operator concern, but in
> reality the set of workable HA strategies is tightly constrained by how
> the application works, and the pairing of an application expecting one HA
> strategy with a deployment implementing a different one can have negative
> results ranging from unexpected downtime to data corruption.

This is your periodic reminder that we have ~50 applications sharing the same database, and not only do none of them know how the deployer will configure the database, most will not even have an idea which set of assumptions the other ~49 are making about how the deployer will configure the database.

(Ditto for RabbitMQ.)

- ZB
Re: [openstack-dev] [tc] Active or passive role with our database layer
On 05/23/2017 03:16 PM, Edward Leafe wrote:
> On May 23, 2017, at 1:43 PM, Jay Pipes wrote:
>> [1] Witness the join constructs in Golang in Kubernetes as they work
>> around etcd not being a relational data store:
>
> Maybe it's just me, but I found that Go code more understandable than
> some of the SQL we are using in the placement engine. :) I assume that
> the SQL in a relational engine is faster than the same thing in code, but
> is that difference significant? For extremely large data sets I think
> that the database processing may be rate limiting, but is that the case
> here? Sometimes it seems that we are overly obsessed with optimizing data
> handling when the amount of data is relatively small. A few million
> records should be fast enough using just about anything.

When you write your app fresh and put some data into it, a few hundred rows, it's no problem at all. Pull it all into memory and sort/filter all you want; SQL is too hard. Push it to production! Works great. Send the customer your bill.

Six months later: the customer has 10K rows. The tools their contractor wrote seem a little sticky. Not sure when that happened?

A year later: the customer is at 300K rows, nowhere near "a few million" records. The application regularly crashes when asked to search and filter results. That's because the Python interpreter uses a fair amount of memory for a result set, multiplied by the overhead of a Python object() / dict() per row == hundreds or thousands of megs of memory to hold 300K objects in memory all at once. Multiply by dozens of threads / processes handling concurrent requests, and the Python interpreter rarely returns memory. Then add the latency of fetching 300K rows over the wire and converting them to objects. Concurrent requests pile up because they're slower; == more processes, == more memory.

A new contractor is called in to rewrite the whole thing in MongoDB. Now it's fast again!

Proceed to chapter 2, "So you decided to use MongoDB" :)
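The failure mode Mike describes — fetch everything, filter in the interpreter — can be demonstrated with a minimal SQLite sketch (table and column names are invented for illustration; real deployments add network latency on top of the per-row object overhead shown here):

```python
import sqlite3
import sys

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rows (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO rows (val) VALUES (?)",
                 ((i % 100,) for i in range(50000)))

# Anti-pattern: pull every row into Python and filter there.
everything = conn.execute("SELECT id, val FROM rows").fetchall()
matches_py = [r for r in everything if r[1] == 7]

# Let the database do the filtering; only matching rows cross the wire.
matches_sql = conn.execute(
    "SELECT id, val FROM rows WHERE val = 7").fetchall()

assert matches_py == matches_sql
# The list holding all 50K row-tuples is two orders of magnitude larger
# than the one holding only the 500 matches (interpreter-dependent):
print(sys.getsizeof(everything), "bytes vs", sys.getsizeof(matches_sql))
```

Same answer either way; the difference is whether 50,000 tuples or 500 ever exist in the Python process.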
Re: [openstack-dev] [tc] Active or passive role with our database layer
On 05/23/2017 03:16 PM, Edward Leafe wrote:
> On May 23, 2017, at 1:43 PM, Jay Pipes wrote:
>> [1] Witness the join constructs in Golang in Kubernetes as they work
>> around etcd not being a relational data store:
>
> Maybe it's just me, but I found that Go code more understandable than
> some of the SQL we are using in the placement engine. :)

Err, apples and oranges. The Golang code is doing a single JOIN operation. The placement API is doing dozens of join operations, aggregate operations, and more.

> I assume that the SQL in a relational engine is faster than the same
> thing in code, but is that difference significant? For extremely large
> data sets I think that the database processing may be rate limiting, but
> is that the case here? Sometimes it seems that we are overly obsessed
> with optimizing data handling when the amount of data is relatively
> small. A few million records should be fast enough using just about
> anything.

You are more than welcome to implement the placement API in etcd or Cassandra, Ed. :)

Best,
-jay
Re: [openstack-dev] [tc] Active or passive role with our database layer
On May 23, 2017, at 1:43 PM, Jay Pipes wrote:
> [1] Witness the join constructs in Golang in Kubernetes as they work
> around etcd not being a relational data store:

Maybe it's just me, but I found that Go code more understandable than some of the SQL we are using in the placement engine. :)

I assume that the SQL in a relational engine is faster than the same thing in code, but is that difference significant? For extremely large data sets I think that the database processing may be rate limiting, but is that the case here? Sometimes it seems that we are overly obsessed with optimizing data handling when the amount of data is relatively small. A few million records should be fast enough using just about anything.

-- Ed Leafe
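The "SQL in the engine vs. the same thing in code" question Ed raises can be made concrete with a tiny aggregate example (resource names here are loosely inspired by placement's vocabulary but the schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alloc (resource TEXT, used INTEGER)")
conn.executemany("INSERT INTO alloc VALUES (?, ?)",
                 [("VCPU", 4), ("VCPU", 2),
                  ("MEMORY_MB", 512), ("MEMORY_MB", 256)])

# In the engine: one statement, one pass, only two rows come back.
in_db = dict(conn.execute(
    "SELECT resource, SUM(used) FROM alloc GROUP BY resource"))

# In code: every row crosses the wire and is aggregated in Python.
in_code = {}
for resource, used in conn.execute("SELECT resource, used FROM alloc"):
    in_code[resource] = in_code.get(resource, 0) + used

# Identical answers; the difference is where the work (and the data
# transfer) happens, which is what dominates at scale.
assert in_db == in_code
```

With four rows the difference is invisible; with millions of rows the in-code version pays for every row twice — wire transfer plus per-row Python object overhead.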
Re: [openstack-dev] [tc] Active or passive role with our database layer
On Tue, 23 May 2017, Jay Pipes wrote:
> Err, in my experience, having a *completely* dumb persistence layer --
> i.e. one that tries to assuage the differences between, say, relational
> and non-relational stores -- is a recipe for disaster. The developer just
> ends up writing join constructs in that business layer instead of using a
> relational data store the way it is intended to be used. Same for
> aggregate operations. [1]
>
> Now, if what you're referring to is "don't use vendor-specific extensions
> in your persistence layer", then yes, I agree with you.

If you've committed to doing an RDBMS then, yeah, stick with relational, but dumb relational. Since that's where we are [3] in OpenStack, then we should go with that.

[3] Of course sometimes I'm sad that we made that commitment and that we don't instead have an abstract storage interface, an implementation of which was stupid text files on disk, another which was generic sqlalchemy, and another which was raw SQL extracted wholesale from the mind of jaypipes, optimized for Drizzle 8.x. But then I'm often sad about completely unrealistic things.

-- Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/ freenode: cdent tw: @anticdent
Re: [openstack-dev] [tc] Active or passive role with our database layer
On 05/23/2017 07:23 AM, Chris Dent wrote:
> That "higher dev cost" is one of my objections to the 'active' approach
> but it is another implication that worries me more. If we limit deployer
> architecture choices at the persistence layer then it seems very likely
> that we will be tempted to build more and more power and control into
> the persistence layer rather than in the so-called "business" layer. In
> my experience this is a recipe for ossification. The persistence layer
> needs to be dumb and replaceable.

Err, in my experience, having a *completely* dumb persistence layer -- i.e. one that tries to assuage the differences between, say, relational and non-relational stores -- is a recipe for disaster. The developer just ends up writing join constructs in that business layer instead of using a relational data store the way it is intended to be used. Same for aggregate operations. [1]

Now, if what you're referring to is "don't use vendor-specific extensions in your persistence layer", then yes, I agree with you.

Best,
-jay

[1] Witness the join constructs in Golang in Kubernetes as they work around etcd not being a relational data store:

https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/deployment/deployment_controller.go#L528-L556

Instead of a single SQL statement:

SELECT p.* FROM pods AS p JOIN deployments AS d ON p.deployment_id = d.id WHERE d.name = $name;

the deployments controller code has to read every Pod message from etcd and loop through each Pod message, returning a list of Pods that match the deployment searched for.

Similarly, the Kubernetes API does not support any aggregate (SUM, GROUP BY, etc.) functionality. Instead, clients are required to perform these kinds of calculations/operations in memory. This is because etcd, being an (awesome) key/value store, is not designed for aggregate operations (just as Cassandra or CockroachDB do not allow most aggregate operations).

My point here is not to denigrate Kubernetes. Far from it. They (to date) have a relatively shallow relational schema, and doing join and index maintenance [2] operations in client-side code has so far been a cost that the project has been OK carrying.

The point I'm trying to make is that the choice of data store semantics (relational or not, columnar or not, eventually-consistent or not, etc.) *does make a difference* to the architecture of a project, its deployment, and the amount of code that the project needs to keep to properly handle its data schema. There's no way -- in my experience -- to make a "persistence layer" that papers over these differences and ends up being useful.

[2] In Kubernetes, all services are required to keep all relevant data in memory:

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/principles.md

This means that code maintaining a bunch of in-memory indexes of various data objects ends up being placed into every component. Here's an example of this in the kubelet (the equivalent-ish of the nova-compute daemon) pod manager, keeping an index of pods and mirrored pods in memory:

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod/pod_manager.go#L104-L114
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod/pod_manager.go#L159-L181
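The contrast Jay draws — a single SQL JOIN versus reading every record and joining by hand — can be sketched with SQLite standing in for the relational store (the pods/deployments schema below is invented to mirror the SQL statement in his message):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deployments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pods (id INTEGER PRIMARY KEY,
                       deployment_id INTEGER, name TEXT);
    INSERT INTO deployments VALUES (1, 'web'), (2, 'db');
    INSERT INTO pods VALUES (10, 1, 'web-0'), (11, 1, 'web-1'),
                            (12, 2, 'db-0');
""")

# Relational store: one JOIN, the engine does the matching.
joined = conn.execute("""
    SELECT p.id, p.name FROM pods AS p
    JOIN deployments AS d ON p.deployment_id = d.id
    WHERE d.name = 'web' ORDER BY p.id
""").fetchall()

# Key/value-style store: read everything, join in application code,
# which is roughly what the deployments controller does against etcd.
deps = {i: n for i, n in conn.execute("SELECT id, name FROM deployments")}
by_hand = sorted(
    (pid, pname)
    for pid, did, pname in conn.execute(
        "SELECT id, deployment_id, name FROM pods")
    if deps.get(did) == "web")

assert joined == by_hand
```

Both return the same two pods; the difference is that the by-hand version must materialize every pod and every deployment in the client before it can discard the non-matches.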
Re: [openstack-dev] [tc] Active or passive role with our database layer
On 05/23/2017 01:10 PM, Octave J. Orgeron wrote:
> Comments below..
>
> On 5/21/2017 1:38 PM, Monty Taylor wrote:
>> For example: An HA strategy using slave promotion and a VIP that points
>> at the current write master paired with an application incorrectly
>> configured to do such a thing can lead to writes to the wrong host
>> after a failover event and an application that seems to be running fine
>> until the data turns up weird after a while.
>
> This is definitely a more complicated area that becomes more and more
> specific to the clustering technology being used. Galera vs. MySQL
> Cluster is a good example. Galera has an active/passive architecture
> where the above issues become a concern for sure.

This is not my understanding; Galera is multi-master, and if you lose a node, you don't lose any committed transactions; the writesets are validated as acceptable by, and pushed out to, all nodes before your commit succeeds. There's an option to make it wait until all those writesets are fully written to disk as well, but even with that option flipped off, if you COMMIT to one node and then that node explodes, you lose nothing: your writesets have been verified as acceptable by all the other nodes. Active/active is the second bullet point on the main homepage: http://galeracluster.com/products/

>> In the "active" approach, we still document expectations, but we also
>> validate them. If they are not what we expect but can be changed at
>> runtime, we change them overriding conflicting environmental config,
>> and if we can't, we hard-stop indicating an unsuitable environment.
>> Rather than providing helper tools, we perform the steps needed
>> ourselves, in the order they need to be performed, ensuring that they
>> are done in the manner in which they need to be done.
>
> This might be a trickier situation, especially if the database(s) are in
> a separate or dedicated environment that the OpenStack service processes
> don't have access to. Of course for SQL commands, this isn't a problem.
> But changing the configuration files and restarting the database may be
> a harder thing to expect.

Nevertheless, the HA setup within tripleo does do this, currently using Pacemaker and resource agents. This is within the scope of at least parts of OpenStack.

>> In either approach the OpenStack service has to be able to talk to both
>> old and new versions of the schema. And in either approach we need to
>> make sure to limit the schema change operations to the set that can be
>> accomplished in an online fashion. We also have to be careful to not
>> start writing values to new columns until all of the nodes have been
>> updated, because the replication stream can't replicate the new column
>> value to nodes that don't have the new column.
>
> This is another area where something like MySQL Cluster (NDB) would
> operate differently because it's an active/active architecture. So
> limiting the number of online changes while a table is locked across the
> cluster would be very important. There are also the timeouts for the
> applications to consider, something that could be abstracted again with
> oslo.db.

So the DDL we do on Galera, to confirm but also clarify Monty's point, runs under "total order isolation", which means it's going to hold up the whole cluster while the DDL is applied to all nodes. Monty says this disqualifies it as an "online upgrade", because if you emitted DDL that had to write default values into a million rows then your whole cluster would temporarily have to wait for that to happen; we handle that by making sure we don't do migrations with that kind of data requirement, and while yes, the DB has to wait for a schema change to apply, the waits are at least very short (in theory). For practical purposes, it is *mostly* an "online" style of migration, because all the services that talk to the database can keep on talking to the database without being stopped, upgraded to a new software version, and restarted, which IMO is what's really hard about "online" upgrades. It does mean that services will just have a little more latency while operations proceed. Maybe we need a new term like "quasi-online" or something.

Facebook has released a Python version of their "online" schema migration tool for MySQL, which does the full-blown "create a new, blank table" approach -- the new table contains the newer version of the schema, so that nothing at all stops or slows down. Then, to manage between the two tables while everything is running, it also makes a "change capture" table to keep track of what's going on, and to wire it all together it uses... triggers! https://github.com/facebookincubator/OnlineSchemaChange/wiki/How-OSC-works. Crazy Facebook kids. As for how we'd know that "make two more tables and wire it all together with new triggers" is in fact more performant than just "add a column to the table", I'm not sure how/when they make that determination. I don't see
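The shadow-table flow Mike describes (new table, change-capture table, triggers, bulk copy, replay, swap) can be sketched end-to-end with SQLite stand-ins. All table and column names are invented for illustration, and the real OSC tool is vastly more involved; this only shows the moving parts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# The live table, receiving traffic.
c.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
c.executemany("INSERT INTO t (name) VALUES (?)", [("a",), ("b",)])

# Step 1: shadow table with the new schema (an extra column).
c.execute("""CREATE TABLE t_new (id INTEGER PRIMARY KEY, name TEXT,
                                 status TEXT DEFAULT 'ok')""")

# Step 2: change-capture table, populated by a trigger on the old table.
c.execute("CREATE TABLE t_delta (op TEXT, id INTEGER, name TEXT)")
c.execute("""CREATE TRIGGER t_ins AFTER INSERT ON t
             BEGIN INSERT INTO t_delta VALUES ('I', NEW.id, NEW.name);
             END""")

# Step 3: bulk-copy existing rows into the shadow table.
c.execute("INSERT INTO t_new (id, name) SELECT id, name FROM t")

# A write arrives mid-copy; the trigger captures it.
c.execute("INSERT INTO t (name) VALUES ('c')")

# Step 4: replay captured changes, then swap the tables.
for op, rid, name in c.execute("SELECT * FROM t_delta").fetchall():
    if op == "I":
        c.execute("INSERT OR IGNORE INTO t_new (id, name) VALUES (?, ?)",
                  (rid, name))
c.execute("DROP TABLE t")          # also drops the trigger
c.execute("ALTER TABLE t_new RENAME TO t")

rows = c.execute("SELECT id, name, status FROM t ORDER BY id").fetchall()
```

After the swap, all three rows — including the one written during the copy — are present with the new column defaulted, and no writer ever saw a locked table.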
Re: [openstack-dev] [tc] Active or passive role with our database layer
Comments below..

On 5/21/2017 1:38 PM, Monty Taylor wrote:
> Hi all!
>
> As the discussion around PostgreSQL has progressed, it has become clear
> to me that there is a decently deep philosophical question on which we
> do not currently share either definition or agreement. I believe that
> the lack of clarity on this point is one of the things that makes the
> PostgreSQL conversation difficult.
>
> I believe the question is between these two things:
>
> * Should OpenStack assume the existence of an external database service
>   that it treats as a black box on the other side of a connection
>   string?
> * Should OpenStack take an active and/or opinionated role in managing
>   the database service?
>
> A potentially obvious question about that (asked by Mike Bayer in a
> different thread) is: "what do you mean by managing?"
>
> What I mean by managing is doing all of the things you can do related to
> database operational controls short of installing the software, writing
> the basic db config files to disk and stopping and starting the
> services. It means being much more prescriptive about what types of
> config we support, validating config settings that cannot be overridden
> at runtime and refusing to operate if they are unworkable.

I think it's helpful and important for us to have automation tooling like tripleo, puppet, etc. that can stand up a MySQL database. But we also have to realize that as shops mature, they will deploy more complicated database topologies, clustered configurations, and replication scenarios. So I think we shouldn't go overboard with being prescriptive. We also have to realize that in the enterprise space, databases are usually deployed and managed by a separate database team, which means less control over that layer. So we shouldn't force people into this model. We should provide best-practice documentation and examples (tripleo, puppet, ansible, etc.), and leave it up to the operator.

> Why would we want to be 'more active'?
>
> When managing and tuning databases, there are some things that are
> driven by the environment and some things that are driven by the
> application.
>
> Things that are driven by the environment include things like the amount
> of RAM actually available, whether or not the machines running the
> database are dedicated or shared, firewall settings, selinux settings
> and what versions of software are available.

This is a good example of an area where we should focus on documenting best practices and leave it to the operator to implement. Guidelines around CPU, memory, security settings, tunables, etc. are what's needed here. Today, there isn't really any guidance or best practice on even sizing the database(s) for a given deployment size.

> Things that are driven by the application are things like character set
> and collation, schema design, data types, schema upgrade and HA
> strategies. These are things that we can have a bit more control or
> direction on.
>
> One might argue that HA strategies are an operator concern, but in
> reality the set of workable HA strategies is tightly constrained by how
> the application works, and pairing an application expecting one HA
> strategy with a deployment implementing a different one can have
> negative results ranging from unexpected downtime to data corruption.
>
> For example: An HA strategy using slave promotion and a VIP that points
> at the current write master paired with an application incorrectly
> configured to do such a thing can lead to writes to the wrong host after
> a failover event and an application that seems to be running fine until
> the data turns up weird after a while.

This is definitely a more complicated area that becomes more and more specific to the clustering technology being used. Galera vs. MySQL Cluster is a good example. Galera has an active/passive architecture where the above issues become a concern for sure. MySQL Cluster (NDB), on the other hand, is an active/active architecture, so losing a node only affects uncommitted transactions, which could easily be addressed with a retry. These topologies will become more complicated as people start looking at cross-regional replication and DR.

> For the areas in which the characteristics of the database are tied
> closely to the application behavior, there is a constrained set of valid
> choices at the database level. Sometimes that constrained set only has
> one member.
>
> The approach to those is what I'm talking about when I ask the question
> about "external" or "active".
>
> In the "external" approach, we document the expectations and then write
> the code assuming that the database is set up appropriately. We may
> provide some helper tools, such as 'nova-manage db sync' and
> documentation on the sequence of steps the operator should take.
>
> In the "active" approach, we still document expectations, but we also
> validate them. If they are not what we expect but can be changed at
> runtime, we change them overriding conflicting environmental config, and
> if we can't,
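The "losing a node only affects uncommitted transactions, which could easily be addressed with a retry" idea above can be sketched generically. `TransientDBError` and `with_retries` are invented names for illustration, not oslo.db API; real drivers raise their own disconnect/deadlock exception types, and oslo.db has its own retry machinery:

```python
import time


class TransientDBError(Exception):
    """Stand-in for a driver's 'node went away mid-transaction' error."""


def with_retries(fn, attempts=3, delay=0.0):
    # Re-run the whole transaction on a transient failure. In an
    # active/active cluster every *committed* write already lives on the
    # surviving nodes, so only the in-flight transaction needs a retry.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransientDBError:
            if attempt == attempts:
                raise
            time.sleep(delay)


calls = {"n": 0}


def flaky_commit():
    calls["n"] += 1
    if calls["n"] < 3:  # first two attempts hit a failing node
        raise TransientDBError()
    return "committed"


result = with_retries(flaky_commit)
```

The key design point is that the retry must wrap the whole transaction, not an individual statement: a half-applied transaction on a dead node was never committed anywhere, so re-running it from the top is safe.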
Re: [openstack-dev] [tc] Active or passive role with our database layer
On Tue, 23 May 2017, Sean Dague wrote:
> Do you have an example of an Open Source project that (after it was
> widely deployed) replaced their core storage engine for their existing
> users?

That's not the point here. The point is that new deployments may choose to use a different one, and old ones can choose to change if they like (but don't have to) if storage is abstracted.

The notion of a "core storage engine" is not something that I see as currently existing in OpenStack. It is clear it is something that at least you, and likely several other people, would like to see. But it is most definitely not something we have now, and as I responded to Monty, getting there from where we are now would be a huge undertaking with as yet unproven value [1].

> I do get that when building more targeted things, this might be a value,
> but I don't see that as a useful design constraint for OpenStack.

Completely the opposite from my point of view. When something is as frameworky as OpenStack is (perhaps accidentally and probably unfortunately), then _of course_ replaceable DBs are the norm, expected, useful, and potentially required to satisfy more use cases. Adding specialization (tier 1?) is probably something we want and want to encourage, but it is not something we should build into the "core" of the "product".

But there's that philosophical disagreement again. I'm not sure we can resolve it. What I'm hoping is that by starting the ball rolling other people will join in, and people like you and me can step out of the way.

[1] Of the issues described elsewhere in the thread, the only one which seems to be a sticking point is the trigger thing, and there's significant disagreement on that being "okay".

-- Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/ freenode: cdent tw: @anticdent
Re: [openstack-dev] [tc] Active or passive role with our database layer
On 05/23/2017 07:23 AM, Chris Dent wrote:
>> Some operations have one and only one "right" way to be done. For
>> those operations if we take an 'active' approach, we can implement
>> them once and not make all of our deployers and distributors each
>> implement and run them. However, there is a cost to that. Automatic
>> and prescriptive behavior has a higher dev cost that is proportional
>> to the number of supported architectures. This then implies a need to
>> limit deployer architecture choices.
>
> That "higher dev cost" is one of my objections to the 'active'
> approach but it is another implication that worries me more. If we
> limit deployer architecture choices at the persistence layer then it
> seems very likely that we will be tempted to build more and more
> power and control into the persistence layer rather than in the
> so-called "business" layer. In my experience this is a recipe for
> ossification. The persistence layer needs to be dumb and
> replaceable.

Why? Do you have an example of an Open Source project that (after it was widely deployed) replaced their core storage engine for their existing users?

I do get that when building more targeted things, this might be a value, but I don't see that as a useful design constraint for OpenStack.

-Sean

-- Sean Dague http://dague.net
Re: [openstack-dev] [tc] Active or passive role with our database layer
On Sun, 21 May 2017, Monty Taylor wrote:
> As the discussion around PostgreSQL has progressed, it has become clear
> to me that there is a decently deep philosophical question on which we
> do not currently share either definition or agreement. I believe that
> the lack of clarity on this point is one of the things that makes the
> PostgreSQL conversation difficult.

Good analysis. I think this does hit at least some of the core differences, maybe even most. And as with so many other things we do in OpenStack, because we have landed somewhere in the middle between the two positions, we find ourselves in a pickle (see, for example, the different needs for and attitudes to orchestration underlying this thread [1]). You're right to say we need to pick one and move in that direction, but our standard struggles with reaching agreement across the entire community, especially on an opinionated position, will need to be overcome. Writing about it to make it visible is a good start.

> In the "external" approach, we document the expectations and then write
> the code assuming that the database is set up appropriately. We may
> provide some helper tools, such as 'nova-manage db sync' and
> documentation on the sequence of steps the operator should take.
>
> In the "active" approach, we still document expectations, but we also
> validate them. If they are not what we expect but can be changed at
> runtime, we change them overriding conflicting environmental config, and
> if we can't, we hard-stop indicating an unsuitable environment. Rather
> than providing helper tools, we perform the steps needed ourselves, in
> the order they need to be performed, ensuring that they are done in the
> manner in which they need to be done.

I think there's a middle ground here, which is "externalize but validate":

* document expectations
* validate them
* do _not_ change at runtime, but tell people what's wrong

> Some operations have one and only one "right" way to be done. For those
> operations, if we take an 'active' approach, we can implement them once
> and not make all of our deployers and distributors each implement and
> run them. However, there is a cost to that. Automatic and prescriptive
> behavior has a higher dev cost that is proportional to the number of
> supported architectures. This then implies a need to limit deployer
> architecture choices.

That "higher dev cost" is one of my objections to the 'active' approach, but it is another implication that worries me more. If we limit deployer architecture choices at the persistence layer, then it seems very likely that we will be tempted to build more and more power and control into the persistence layer rather than into the so-called "business" layer. In my experience this is a recipe for ossification. The persistence layer needs to be dumb and replaceable.

> On the other hand, taking an 'external' approach allows us to federate
> the work of supporting the different architectures to the deployers.
> This means more work on the deployer's part, but also potentially a
> greater amount of freedom on their part to deploy supporting services
> the way they want. It means that some of the things that have been
> requested of us - such as easier operation and an increase in the number
> of things that can be upgraded with no downtime - might become
> prohibitively costly for us to implement.

That's not necessarily the case. Consider that in an external approach, where the persistence layer is opaque to the application, third parties (downstream consumers, the market, the invisible hand, etc.) have the option to do all kinds of wacky stuff. Probably avec containers™. In that model, the core functionality is simple and adequate but not deluxe. Deluxe is an after-market add-on.

> BUT - without a decision as to what our long-term philosophical intent
> in this space is that is clear and understandable to everyone, we cannot
> have successful discussions about the impact of implementation choices,
> since we will not have a shared understanding of the problem space or
> the solutions we're talking about.

Yes.

> For my part - I hear complaints that OpenStack is 'difficult' to operate
> and requests for us to make it easier. This is why I have been
> advocating some actions that are clearly rooted in an 'active'
> worldview.

If OpenStack were more of a monolith, instead of a system with 3 to many different databases along with some optional number of other ways to do other kinds of (short-term) persistence, I would find the 'active' model a good option. If we were to start over, I'd say let's do that. But as it stands, implementing actually useful 'active' management of the database feels like a very large amount of work that will take so long that by the time we complete it, it will be not just out of date but will also limit us. "Externalize but validate" feels much more viable. What we really want is that people can get reasonably good results without trying that hard and great (but also various) results
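The "externalize but validate" position — document, check, hard-stop with a useful message, but never mutate the operator's setup — can be sketched as a startup check. The two variable names below are real MySQL server variables, but the checker itself (`EXPECTED`, `validate_db_config`) is a hypothetical illustration, not oslo.db code; a real implementation would read live values via `SHOW VARIABLES` rather than take a dict:

```python
# Expectations the project documents (names are real MySQL variables;
# the required values here are illustrative).
EXPECTED = {
    "character_set_server": "utf8mb4",
    "innodb_file_per_table": "ON",
}


def validate_db_config(observed):
    """Hard-stop on an unsuitable environment instead of fixing it."""
    problems = [
        f"{key} is {observed.get(key)!r}, expected {want!r}"
        for key, want in EXPECTED.items()
        if observed.get(key) != want
    ]
    if problems:
        # Externalize-but-validate: tell the operator exactly what to
        # change; do not override their config at runtime.
        raise RuntimeError(
            "unsuitable database configuration: " + "; ".join(problems))


# A well-configured server passes silently.
validate_db_config({"character_set_server": "utf8mb4",
                    "innodb_file_per_table": "ON"})

# A misconfigured one fails fast with an actionable message.
error = None
try:
    validate_db_config({"character_set_server": "utf8"})
except RuntimeError as exc:
    error = str(exc)
```

The design choice is that the service refuses to start rather than silently papering over the difference, which keeps the operator in control while eliminating the "runs fine until the data turns up weird" failure mode.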
Re: [openstack-dev] [tc] Active or passive role with our database layer
On 05/21/2017 10:09 PM, Mike Bayer wrote:
>> A similar issue lurks with the fact that MySQL unicode storage is
>> 3-byte by default and 4-byte is opt-in. We could take the 'external'
>> approach and document it and assume the operator has configured their
>> my.cnf with the appropriate default, or take an 'active' approach
>> where we override it in all the models and write migrations to get us
>> from 3 to 4 byte.
>
> let's force MySQL to use utf8mb4! Although I am curious what is the
> actual use case we want to hit here (which gets into, zzzeek is ignorant
> as to which unicode glyphs actually live in 4-byte utf8 characters).

There are sets of existing CJK ideographs in the 4-byte range, and the reality is that all the world's languages are still not encoded in Unicode, so more Asian languages will probably land in there in the future. We've had specific bug reports on this in Nova, but it's a lot to dig out of, because that db migration seems expensive.

-Sean

-- Sean Dague http://dague.net
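To make the 3-byte vs. 4-byte distinction concrete: MySQL's legacy "utf8" charset stores at most 3 bytes per character, so anything encoding to 4 bytes in UTF-8 — CJK Extension B ideographs, emoji, and other supplementary-plane characters — needs utf8mb4. A quick check in Python:

```python
# A common CJK ideograph (U+5B57) fits in MySQL's 3-byte "utf8".
three_byte = "字"
# A CJK Extension B ideograph (U+2070E) needs 4 bytes, i.e. utf8mb4;
# inserting it into a legacy "utf8" column fails or mangles the data.
four_byte = "𠜎"

assert len(three_byte.encode("utf-8")) == 3
assert len(four_byte.encode("utf-8")) == 4
```

Any character above U+FFFF (the Basic Multilingual Plane) falls into the 4-byte range, which is why user-supplied names and descriptions are the fields most likely to trip over a 3-byte default.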
Re: [openstack-dev] [tc] Active or passive role with our database layer
On 05/21/2017 03:38 PM, Monty Taylor wrote:
> documentation on the sequence of steps the operator should take. In the "active" approach, we still document expectations, but we also validate them. If they are not what we expect but can be changed at runtime, we change them, overriding conflicting environmental config, and if we can't, we hard-stop, indicating an unsuitable environment. Rather than providing helper tools, we perform the steps needed ourselves, in the order they need to be performed, ensuring that they are done in the manner in which they need to be done.

we do this in places like tripleo. The MySQL configs and such are checked into the source tree; that includes details like innodb_file_per_table, timeouts used by haproxy, etc. I know tripleo is not like the service itself, like Nova, but it's also not exactly something we hand off to the operators to figure out from scratch either.

We do some of it in oslo.db as well. We set things like MySQL SQL_MODE. We try to make sure the unicode-ish flags are set up and that we're using utf-8 encoding.

> Some examples:
>
> * Character Sets / Collations
>
> We currently enforce at testing time that all database migrations are explicit about InnoDB. We also validate in oslo.db that table character sets have the string 'utf8' in them. (only on MySQL) We do not have any check for case-sensitive or case-insensitive collations (these affect sorting and comparison operations). Because we don't, different server config settings or different database backends for different clouds can actually behave differently through the REST API.
>
> To deal with that: First we'd have to decide whether case sensitive or case insensitive was what we wanted. If we decided we wanted case sensitive, we could add an enforcement of that in oslo.db, and write migrations to get from case insensitive indexes to case sensitive indexes on tables where we detected that a case insensitive collation had been used.
>
> If we decided we wanted to stick with case insensitive, we could similarly add code to enforce it on MySQL. To enforce it actively on PostgreSQL, we'd need to either switch our code that's using comparisons to use the sqlalchemy case-insensitive versions explicitly, or maybe write some sort of overloaded driver for PG that turns all comparisons into case-insensitive, which would wrap both sides of comparisons in lower() calls (which has some indexing concerns, but let's ignore that for the moment).
>
> We could also take the 'external' approach and just document it, then define API tests and try to tie the insensitive behavior in the API to Interop Compliance. I'm not 100% sure how a db operator would remediate this - but PG has some fancy computed index features - so maybe it would be possible.

let's make the case sensitivity explicitly enforced!

> A similar issue lurks with the fact that MySQL unicode storage is 3-byte by default and 4-byte is opt-in. We could take the 'external' approach and document it and assume the operator has configured their my.cnf with the appropriate default, or taken an 'active' approach where we override it in all the models and make migrations to get us from 3 to 4 byte.

let's force MySQL to use utf8mb4! Although I am curious what is the actual use case we want to hit here (which gets into, zzzeek is ignorant as to which unicode glyphs actually live in 4-byte utf8 characters).

> * Schema Upgrades
>
> The way you roll out online schema changes is highly dependent on your database architecture. Just limiting to the MySQL world: If you do Galera, you can roll them out in Total Order or Rolling fashion. Total Order locks basically everything while it's happening, so isn't a candidate for "online". In rolling, you apply the schema change to one node at a time. If you do that, the application has to be able to deal with both forms of the table, and you have to deal with ensuring that data can replicate appropriately while the schema change is happening.

Galera replicates DDL operations. If I add a column on a node, it pops up on the other nodes too, in a similar way as transactions are replicated, e.g. nearly synchronous. I would *assume* it has to do this in the context of its usual transaction ordering, even though MySQL doesn't do transactional DDL, so that if the cluster sees transaction A, schema change B, transaction C that depends on B, that ordering is serialized appropriately. However, even if it doesn't do that, the rolling upgrades we do don't start the services talking to the new schema structures until the DDL changes are complete, and Galera is near-synchronous replication.

Also speaking to the "active" question, we certainly have all kinds of logic in OpenStack (the optimistic update strategy in particular) that takes "Galera" into account. And of course we have Galera config inside of tripleo. So that's kind of the "active" approach, I think.

> If you do DRBD
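(As a footnote to the Galera discussion above: the choice between the two rollout modes is controlled per session by the real `wsrep_OSU_method` variable — `TOI` for Total Order, `RSU` for Rolling. A rough sketch of the rolling flavor, using hypothetical DB-API connections, one per node; this is illustrative, not what any OpenStack tooling actually ships:)

```python
# Sketch of a Rolling Schema Upgrade (RSU) on a Galera cluster: each
# node is switched to RSU mode (DDL is applied locally only, not
# replicated), altered, then returned to the default Total Order
# Isolation (TOI) mode. Connections and the DDL below are hypothetical.

DDL = "ALTER TABLE instances ADD COLUMN annotations TEXT"

def rolling_schema_upgrade(node_connections):
    """node_connections: one DB-API connection per Galera node (assumed)."""
    for conn in node_connections:
        cur = conn.cursor()
        cur.execute("SET SESSION wsrep_OSU_method = 'RSU'")  # desync this node
        cur.execute(DDL)                                     # local-only DDL
        cur.execute("SET SESSION wsrep_OSU_method = 'TOI'")  # rejoin total order
        cur.close()
```

While this loop runs, some nodes have the old table shape and some the new — which is exactly why, as Monty says, the application must tolerate both forms for the duration.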
[openstack-dev] [tc] Active or passive role with our database layer
Hi all!

As the discussion around PostgreSQL has progressed, it has become clear to me that there is a decently deep philosophical question on which we do not currently share either definition or agreement. I believe that the lack of clarity on this point is one of the things that makes the PostgreSQL conversation difficult.

I believe the question is between these two things:

* Should OpenStack assume the existence of an external database service that it treats as a black box on the other side of a connection string?

* Should OpenStack take an active and/or opinionated role in managing the database service?

A potentially obvious question about that (asked by Mike Bayer in a different thread) is: "what do you mean by managing?"

What I mean by managing is doing all of the things you can do related to database operational controls short of installing the software, writing the basic db config files to disk and stopping and starting the services. It means being much more prescriptive about what types of config we support, validating config settings that cannot be overridden at runtime and refusing to operate if they are unworkable.

Why would we want to be 'more active'? When managing and tuning databases, there are some things that are driven by the environment and some things that are driven by the application. Things that are driven by the environment include the amount of RAM actually available, whether or not the machines running the database are dedicated or shared, firewall settings, selinux settings and what versions of software are available. Things that are driven by the application are things like character set and collation, schema design, data types, schema upgrade and HA strategies.
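(To make the "validating config settings ... and refusing to operate" idea concrete, here is a minimal sketch of what an "active" startup check could look like. All names are illustrative — this is not oslo.db's actual API — and the two settings shown are just plausible examples:)

```python
# Sketch of the "active" posture: on startup, compare server settings
# against what the application requires; fix what is session-settable,
# hard-stop on anything unworkable. Names are hypothetical, not oslo.db.

class UnsuitableEnvironment(RuntimeError):
    """Raised when the database cannot be made to match requirements."""

# setting name -> (required value, fixable at runtime?)
REQUIRED_SETTINGS = {
    "character_set_server": ("utf8mb4", False),  # needs my.cnf + restart
    "sql_mode": ("TRADITIONAL", True),           # settable per session
}

def validate_database(get_setting, set_setting):
    """get_setting/set_setting: callables wrapping SHOW/SET VARIABLES."""
    for name, (expected, runtime_fixable) in REQUIRED_SETTINGS.items():
        actual = get_setting(name)
        if actual == expected:
            continue
        if runtime_fixable:
            # Override conflicting environmental config at runtime.
            set_setting(name, expected)
        else:
            # Hard-stop: indicate an unsuitable environment.
            raise UnsuitableEnvironment(
                f"{name} is {actual!r}, need {expected!r}; "
                "fix the server config and restart the database")
```

The "external" approach, by contrast, would put the same table of requirements in the deployment docs and trust the operator to have applied it.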
One might argue that HA strategies are an operator concern, but in reality the set of workable HA strategies is tightly constrained by how the application works, and pairing an application expecting one HA strategy with a deployment implementing a different one can have negative results ranging from unexpected downtime to data corruption.

For example: an HA strategy using slave promotion and a VIP that points at the current write master, paired with an application incorrectly configured for such a setup, can lead to writes to the wrong host after a failover event and an application that seems to be running fine until the data turns up weird after a while.

For the areas in which the characteristics of the database are tied closely to the application behavior, there is a constrained set of valid choices at the database level. Sometimes that constrained set only has one member. The approach to those is what I'm talking about when I ask the question about "external" or "active".

In the "external" approach, we document the expectations and then write the code assuming that the database is set up appropriately. We may provide some helper tools, such as 'nova-manage db sync', and documentation on the sequence of steps the operator should take.

In the "active" approach, we still document expectations, but we also validate them. If they are not what we expect but can be changed at runtime, we change them, overriding conflicting environmental config, and if we can't, we hard-stop, indicating an unsuitable environment. Rather than providing helper tools, we perform the steps needed ourselves, in the order they need to be performed, ensuring that they are done in the manner in which they need to be done.

Some examples:

* Character Sets / Collations

We currently enforce at testing time that all database migrations are explicit about InnoDB. We also validate in oslo.db that table character sets have the string 'utf8' in them.
(only on MySQL) We do not have any check for case-sensitive or case-insensitive collations (these affect sorting and comparison operations). Because we don't, different server config settings or different database backends for different clouds can actually behave differently through the REST API.

To deal with that: First we'd have to decide whether case sensitive or case insensitive was what we wanted. If we decided we wanted case sensitive, we could add an enforcement of that in oslo.db, and write migrations to get from case insensitive indexes to case sensitive indexes on tables where we detected that a case insensitive collation had been used.

If we decided we wanted to stick with case insensitive, we could similarly add code to enforce it on MySQL. To enforce it actively on PostgreSQL, we'd need to either switch our code that's using comparisons to use the sqlalchemy case-insensitive versions explicitly, or maybe write some sort of overloaded driver for PG that turns all comparisons into case-insensitive, which would wrap both sides of comparisons in lower() calls (which has some indexing concerns, but let's ignore that for the moment).

We could also take the 'external' approach and just document it, then
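(The lower()-wrapping idea above can be demonstrated with stdlib sqlite3, whose default text comparison is case sensitive like PostgreSQL's — a sketch, not OpenStack code:)

```python
import sqlite3

# SQLite's default text comparison is case sensitive, like PostgreSQL's,
# so it illustrates the two behaviors: rely on the backend's default
# collation, or wrap both sides of the comparison in lower().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE servers (name TEXT)")
conn.execute("INSERT INTO servers VALUES ('Web-01')")

# Case-sensitive (default): no match for differently-cased input.
cs = conn.execute(
    "SELECT count(*) FROM servers WHERE name = ?",
    ("web-01",)).fetchone()[0]

# Case-insensitive via lower() on both sides -- the overloaded-driver
# trick -- finds the row, but defeats a plain index on `name` (the
# indexing concern noted above).
ci = conn.execute(
    "SELECT count(*) FROM servers WHERE lower(name) = lower(?)",
    ("web-01",)).fetchone()[0]

print(cs, ci)  # 0 1
```

The same query returning different rows depending on backend collation is precisely how the REST API can behave differently between clouds.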