Excerpts from Amrith Kumar's message of 2017-07-12 06:14:28 -0500:
> All:
>
> First, let me thank all of you who responded and provided feedback
> on what I wrote. I've summarized what I heard below and am posting
> it as one consolidated response rather than responding to each
> of your messages and making this thread even deeper.
>
> As I say at the end of this email, I will be setting up a session at
> the Denver PTG to specifically continue this conversation and hope
> you will all be able to attend. As soon as time slots for PTG are
> announced, I will try and pick this slot and request that you please
> attend.
>
> ----
>
> Thierry: naming issue; call it Hoard if it does not have a migration
> path.
>
> ----
>
> Kevin: use a container approach with k8s as the orchestration
> mechanism, addresses multiple issues including performance. Trove to
> provide containers for multiple components which cooperate to provide
> a single instance of a database or cluster. Don't put all components
> (agent, monitoring, database) in a single VM, decoupling makes
> migration and upgrades easier and allows trove to reuse database
> vendor supplied containers. Performance of databases in VM's poor
> compared to databases on bare-metal.
>
> ----
>
> Doug Hellmann:
>
> > Does "service VM" need to be a first-class thing? Akanda creates
> > them, using a service user. The VMs are tied to a "router" which is
> > the billable resource that the user understands and interacts with
> > through the API.
>
> Amrith: Doug, yes because we're looking not just for service VM's but all
> resources provisioned by a service. So, to Matt's comment about a
> blackbox DBaaS, the VM's, storage, snapshots, ... they should all be
> owned by the service, charged to a user's quota but not visible to the
> user directly.
I still don't understand. If you have entities that represent the
DBaaS "host" or "database" or "database backup" or whatever, then you
put a quota on those entities and you bill for them. If the database
actually runs in a VM or the backup is a snapshot, those are
implementation details. You don't want to have to rewrite your quota
management or billing integration if those details change.

Doug

>
> ----
>
> Jay:
>
> > Frankly, I believe all of these types of services should be built
> > as applications that run on OpenStack (or other)
> > infrastructure. In other words, they should not be part of the
> > infrastructure itself.
> >
> > There's really no need for a user of a DBaaS to have access to the
> > host or hosts the DB is running on. If the user really wanted
> > that, they would just spin up a VM/baremetal server and install
> > the thing themselves.
>
> and subsequently in follow-up with Zane:
>
> > Think only in terms of what a user of a DBaaS really wants. At the
> > end of the day, all they want is an address in the cloud where they
> > can point their application to write and read data from.
> > ...
> > At the end of the day, I think Trove is best implemented as a hosted
> > application that exposes an API to its users that is entirely
> > separate from the underlying infrastructure APIs like
> > Cinder/Nova/Neutron.
>
> Amrith: Yes, I agree, +1000
>
> ----
>
> Clint (in response to Jay's proposal regarding the service making all
> resources multi-tenant) raised a concern about having multi-tenant
> shared resources. The issue is with ensuring separation between
> tenants (don't want to use the word isolation because this is database
> related).
>
> Amrith: yes, definitely a concern and one that we don't have today
> because each DB is a VM of its own. Personally, I'd rather stick with
> that construct, one DB per VM/container/baremetal and leave that be
> the separation boundary.
>
> ----
>
> Zane: Discomfort over throwing out working code, grass is greener on
> the other side, is there anything to salvage?
>
> Amrith: Yes, there is certainly a 'grass is greener with a rewrite'
> fallacy. But, there is stuff that can be salvaged. The elements are
> still good, they are separable and can be used with the new
> project. Much of the controller logic however will fall by the
> wayside.
>
> In a similar vein, Clint asks about the elements that Trove provides,
> "how has that worked out".
>
> Amrith: Honestly, not well. Trove only provided reference elements
> suitable for development use. Never really production hardened
> ones. For example, the image elements trove provides don't bake the
> guest agent in; they assume that at VM launch, the guest agent code
> will be slurped (technical term) from the controller and
> launched. Great for debugging, not great for production. That is
> something that should change. But, equally, I've heard disagreements
> saying that slurping the guest agent at runtime is clever and good
> in production.
>
> ----
>
> Zane: consider using Mistral for workflow.
>
> > The disadvantage, obviously, is that it requires the cloud to offer
> > Mistral as-a-Service, which currently doesn't include nearly as many
> > clouds as I'd like.
>
> Amrith: Yes, as we discussed, we are in agreement with both parts of
> this recommendation.
>
> Zane, Jay and Dims: a subtle distinction between Tessmaster and Magnum
> (I want a database figure out the lower layers, vs. I want a k8s
> cluster).
>
> ----
>
> Zane: Fun fact: Trove started out as a *complete fork* of Nova(!).
>
> Amrith: Not fun at all :) Never, ever, ever, ever f5g do that
> again. Yeah, sure, if you can have i18n, and k8s, I can have f5g :)
>
> ----
>
> Thierry:
>
> > We generally need to be very careful about creating dependencies
> > between OpenStack projects.
> > ...
> > I understand it's a hard trade-off: you want to reuse functionality
> > rather than reinvent it in every project... we just need to
> > recognize the cost of doing that.
>
> Amrith: Yes, this is part of my concern re: Mistral, and earlier in
> trove's life on depending on Manila for Oracle RAC. Clint raised a
> similar concern about the dependency on Heat.
>
> In response, Kevin:
>
> > That view of dependencies is why Kubernetes development is outpacing
> > OpenStack's and some users are leaving IMO. Not trying to be mean
> > here but trying to shine some light on this issue.
>
> I disagree, but that's a topic for another email thread and maybe not
> even an email thread but an in-person conversation with suitable
> beverages. It is a religious discussion which is best handled in a
> different forum; such as the emacs-vi forum.
>
> ----
>
> I wrote:
>
> > - A guest agent running on a tenant instance, with connectivity to a
> > shared management message bus is a security loophole; encrypting
> > traffic, per-tenant-passwords, and any other scheme is merely
> > lipstick on a security hole
>
> Clint asks:
>
> This is a broad statement, and I'm not sure I understand the actual
> risk you're presenting here as "a security loophole".
>
> How else would you administer a database server than through some
> kind of agent? Whether that agent is a python daemon of our making,
> sshd, or whatever kubernetes component lets you change things,
> they're all administrative pieces that sit next to the resource.
>
> Amrith: The issue is that the guest agent (currently) running in a
> tenant's context needs to establish a connection to a shared rabbitmq
> server running in the service (control plane) context. I am fine with
> a guest agent running in the control plane establishing a connection
> into a guest VM if required, not the other way around.
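To make the direction-of-connection point concrete, here is a minimal
Python sketch. It is an illustration only, not Trove code: the Zone and
Endpoint types and the connection_allowed function are hypothetical
names, and the policy encodes just the rule Amrith states above (the
control plane may connect into a guest VM; a process in a tenant's
context may not connect into the control plane).

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    """Trust zones assumed for this sketch (hypothetical names)."""
    CONTROL_PLANE = "control_plane"  # service context, e.g. the shared message bus
    TENANT = "tenant"                # guest VM running in a tenant's context


@dataclass(frozen=True)
class Endpoint:
    name: str
    zone: Zone


def connection_allowed(initiator: Endpoint, target: Endpoint) -> bool:
    """Return True if `initiator` may open a connection to `target`.

    Assumed policy: anything in the control plane may initiate a
    connection anywhere; a tenant-side endpoint may only reach other
    tenant-side endpoints, never the control plane.
    """
    if initiator.zone is Zone.CONTROL_PLANE:
        return True
    return target.zone is Zone.TENANT
```

Under this policy, a guest agent in a tenant VM asking to connect to
the shared rabbitmq server is refused, while a control-plane component
asking to connect into the guest VM is accepted.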
>
> ----
>
> Clint makes a distinction between a database cluster within an
> OpenStack deployment and an uber database cluster spanning clouds,
> recommending that the latter is best left to a tertiary
> orchestrator. Further, these are two distinct things, pick one and do
> it well.
>
> Amrith: A valid approach and one that will allow Trove to focus on the
> high value single OpenStack deployment of a db cluster (and to Jay's
> point, do it well).
>
> ----
>
> Consensus:
>
> Trove should expose (what Matt Fischer calls) BlackBox DB, not storage +
> compute.
>
> Address rabbitmq security concerns differently; move guest agent off
> instance.
>
> Don't reinvent the orchestration piece.
>
> Fewer DB's better support
>
> Clusters are first class citizens, not an afterthought
>
> Clusters spanning regions and openstack deployments
>
> Restart the service VM's discussion:
> https://review.openstack.org/#/c/438134/
>
> ----
>
> Several people emailed me privately and said they (or their companies)
> would like to invest resources in Trove. Some indicated that they (or
> their companies) would like to invest resources in Trove if the
> commitment was towards a certain direction or technology choice.
> Others have offered resources if the direction would be to provide
> an AWS compatible API.
>
> To anyone who wants to contribute resources to a project, please do
> it. Big companies considering contributing one or two people to a
> project and making it seem like a big decision is really an indication
> of a lack of seriousness. If the project is really valuable to you,
> you'd have put people on it already. The fact that you haven't speaks
> volumes.
>
> To those who want to place pre-conditions on technology choice, I have
> no (good) words for you.
>
> Thanks to all who participated, I appreciate all the input. I will be
> setting up a session at the Denver PTG to specifically continue this
> conversation and hope you will all be able to attend.
> As soon as time slots for PTG are announced, I will try and pick this
> slot and request that you please attend.
>
> Thanks,
>
> -amrith
>
>
> On Sun, Jun 18, 2017 at 6:35 AM, Amrith Kumar <amrith.ku...@gmail.com>
> wrote:
>
> > Trove has evolved rapidly over the past several years, since integration
> > in IceHouse when it only supported single instances of a few databases.
> > Today it supports a dozen databases including clusters and replication.
> >
> > The user survey [1] indicates that while there is strong interest in the
> > project, there are few large production deployments that are known of (by
> > the development team).
> >
> > Recent changes in the OpenStack community at large (company realignments,
> > acquisitions, layoffs) and the Trove community in particular, coupled with
> > a mounting burden of technical debt have prompted me to make this proposal
> > to re-architect Trove.
> >
> > This email summarizes several of the issues that face the project, both
> > structurally and architecturally. This email does not claim to include a
> > detailed specification for what the new Trove would look like, merely the
> > recommendation that the community should come together and develop one so
> > that the project can be sustainable and useful to those who wish to use it
> > in the future.
> >
> > TL;DR
> >
> > Trove, with support for a dozen or so databases today, finds itself in a
> > bind because there are few developers, and a code-base with a significant
> > amount of technical debt.
> >
> > Some architectural choices which the team made over the years have
> > consequences which make the project less than ideal for deployers.
> >
> > Given that there are no major production deployments of Trove at present,
> > this provides us an opportunity to reset the project, learn from our v1 and
> > come up with a strong v2.
> >
> > An important aspect of making this proposal work is that we seek to
> > eliminate the effort (planning, and coding) involved in migrating existing
> > Trove v1 deployments to the proposed Trove v2. Effectively, with work
> > beginning on Trove v2 as proposed here, Trove v1 as released with Pike will
> > be marked as deprecated and users will have to migrate to Trove v2 when it
> > becomes available.
> >
> > While I would very much like to continue to support the users on Trove v1
> > through this transition, the simple fact is that absent community
> > participation this will be impossible. Furthermore, given that there are no
> > production deployments of Trove at this time, it seems pointless to build
> > that upgrade path from Trove v1 to Trove v2; it would be the proverbial
> > bridge from nowhere.
> >
> > This (previous) statement is, I realize, contentious. There are those who
> > have told me that an upgrade path must be provided, and there are those who
> > have told me of unnamed deployments of Trove that would suffer. To this,
> > all I can say is that if an upgrade path is of value to you, then please
> > commit the development resources to participate in the community to make
> > that possible. But equally, preventing a v2 of Trove or delaying it will
> > only make the v1 that we have today less valuable.
> >
> > We have learned a lot from v1, and the hope is that we can address that in
> > v2.
> > Some of the more significant things that I have learned are:
> >
> > - We should adopt a versioned front-end API from the very beginning;
> > making the REST API versioned is not a ‘v2 feature’
> >
> > - A guest agent running on a tenant instance, with connectivity to a
> > shared management message bus is a security loophole; encrypting traffic,
> > per-tenant-passwords, and any other scheme is merely lipstick on a security
> > hole
> >
> > - Reliance on Nova for compute resources is fine, but dependence on Nova
> > VM specific capabilities (like instance rebuild) is not; it makes things
> > like containers or bare-metal second class citizens
> >
> > - A fair portion of what Trove does is resource orchestration; don’t
> > reinvent the wheel, there’s Heat for that. Admittedly, Heat wasn’t as far
> > along when Trove got started but that’s not the case today and we have an
> > opportunity to fix that now
> >
> > - A similarly significant portion of what Trove does is to implement a
> > state-machine that will perform specific workflows involved in implementing
> > database specific operations. This makes the Trove taskmanager a stateful
> > entity. Some of the operations could take a fair amount of time. This is a
This is a > > serious architectural flaw > > > > - Tenants should not ever be able to directly interact with the underlying > > storage and compute used by database instances; that should be the default > > configuration, not an untested deployment alternative > > > > - The CI should test all databases that are considered to be ‘supported’ > > without excessive use of resources in the gate; better code modularization > > will help determine the tests which can safely be skipped in testing changes > > > > - Clusters should be first class citizens not an afterthought, single > > instance databases may be the ‘special case’, not the other way around > > > > - The project must provide guest images (or at least complete tooling for > > deployers to build these); while the project can’t distribute operating > > systems and database software, the current deployment model merely impedes > > adoption > > > > - Clusters spanning OpenStack deployments are a real thing that must be > > supported > > > > This might sound harsh, that isn’t the intent. Each of these is the > > consequence of one or more perfectly rational decisions. Some of those > > decisions have had unintended consequences, and others were made knowing > > that we would be incurring some technical debt; debt we have not had the > > time or resources to address. Fixing all these is not impossible, it just > > takes the dedication of resources by the community. > > > > I do not have a complete design for what the new Trove would look like. > > For example, I don’t know how we will interact with other projects (like > > Heat). Many questions remain to be explored and answered. > > > > Would it suffice to just use the existing Heat resources and build > > templates around those, or will it be better to implement custom Trove > > resources and then orchestrate things based on those resources? 
> >
> > Would Trove implement the workflows required for multi-stage database
> > operations by itself, or would it rely on some other project (say Mistral)
> > for this? Is Mistral really a workflow service, or just cron on steroids? I
> > don’t know the answer but I would like to find out.
> >
> > While we don’t have the answers to these questions, I think this is a
> > conversation that we must have, one that we must decide on, and then as a
> > community commit the resources required to make a Trove v2 which delivers
> > on the mission of the project; “To provide scalable and reliable Cloud
> > Database as a Service provisioning functionality for both relational and
> > non-relational database engines, and to continue to improve its
> > fully-featured and extensible open source framework.”[2]
> >
> > Thanks,
> >
> > -amrith
> >
> >
> > [1] https://www.openstack.org/assets/survey/April2017SurveyReport.pdf
> > [2] https://wiki.openstack.org/wiki/Trove#Mission_Statement
> >
> >

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev