Re: [openstack-dev] [tc] [all] TC Report 18-26

Zane Bitter Tue, 17 Jul 2018 18:13:54 -0700

On 17/07/18 10:44, Thierry Carrez wrote:

Finally found the time to properly read this...

For anybody else who found the wall of text challenging, I distilled the longest part into a blog post:


https://www.zerobanana.com/archive/2018/07/17#openstack-layer-model-limitations

Zane Bitter wrote:
[...]
We chose to add features to Nova to compete with vCenter/oVirt, and not to add features that would have enabled OpenStack as a whole to compete with more than just the compute provisioning subset of EC2/Azure/GCP.
Could you give an example of an EC2 action that would be beyond the "compute provisioning subset" that you think we should have built into Nova?


Automatic provision/rotation of application credentials.
Reliable, user-facing event notifications.

Collection of usage data suitable for autoscaling, billing, and whatever it is that Watcher does.

Meanwhile, the other projects in OpenStack were working on building the other parts of an AWS/Azure/GCP competitor. And our vague one-sentence mission statement allowed us all to maintain the delusion that we were all working on the same thing and pulling in the same direction, when in truth we haven't been at all.
Do you think that organizing (tying) our APIs along [micro]services, rather than building a sanely-organized user API on top of a sanely-organized set of microservices, played a role in that divide?

TBH, not really. If I were making a list of contributing factors I would probably put 'path dependence' at #1, #2 and #3.

At the start of this discussion, Jay posted on IRC a list of things that he thought shouldn't have been in the Nova API[1]:


- flavors
- shelve/unshelve
- instance groups
- boot from volume where nova creates the volume during boot
- create me a network on boot
- num_instances > 1 when launching
- evacuate
- host-evacuate-live
- resize where the user 'confirms' the operation
- force/ignore host
- security groups in the compute API
- force delete server
- restore soft deleted server
- lock server
- create backup

Some of those are trivially composable in higher-level services (e.g. boot from volume where nova creates the volume, get me a network, security groups). I agree with Jay that in retrospect it would have been cleaner to delegate those to some higher level than the Nova API (or, equivalently, for some lower-level API to exist within what is now Nova). And maybe if we'd had a top-level API like that we'd have been more aware of the ways that the lower-level ones lacked legibility for orchestration tools (oaktree is effectively an example of a top-level API like this; I'm sure Monty can give us a list of complaints ;)
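
To make 'trivially composable' concrete, here is a minimal Python sketch of the boot-from-volume case. Everything in it is hypothetical: VolumeService, ComputeService and boot_from_new_volume are in-memory stand-ins invented for illustration, not Nova or Cinder APIs. It only shows the shape of the idea: the low-level compute call accepts an existing volume, and a separate higher-level function does the create-then-boot composition (including cleaning up after itself).

```python
"""Sketch: 'boot from a new volume' composed above a minimal compute API.

All classes and functions here are hypothetical in-memory stand-ins,
not real OpenStack (Nova/Cinder) interfaces.
"""
import itertools


class VolumeService:
    """Stand-in for a block storage service."""

    def __init__(self):
        self.volumes = {}
        self._ids = itertools.count(1)

    def create(self, size_gb, image):
        vol_id = f"vol-{next(self._ids)}"
        self.volumes[vol_id] = {"size_gb": size_gb, "image": image}
        return vol_id

    def delete(self, vol_id):
        del self.volumes[vol_id]


class ComputeService:
    """Stand-in for a low-level compute API: boots only from an existing volume."""

    def __init__(self):
        self.servers = {}
        self._ids = itertools.count(1)

    def boot(self, name, volume_id):
        if not name:
            raise ValueError("server name must not be empty")
        server_id = f"srv-{next(self._ids)}"
        self.servers[server_id] = {"name": name, "volume_id": volume_id}
        return server_id


def boot_from_new_volume(compute, volumes, name, size_gb, image):
    """Higher-level composition: create a volume, then boot from it.

    If the boot fails, the freshly created volume is deleted so that the
    composed operation does not leak resources.
    """
    vol_id = volumes.create(size_gb, image)
    try:
        return compute.boot(name, vol_id)
    except Exception:
        volumes.delete(vol_id)
        raise
```

The point of the sketch is that the compute stand-in never needs to know how to create volumes; that knowledge lives entirely in the composing function, which could sit in any higher-level service.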

But others on the list involve operations at a low level that don't appear to me to be composable out of simpler operations. (Maybe Jay has a shorter list of low-level APIs that could be combined to implement all of these, I don't know.) Once we decided to add those features, it was inevitable that they would reach right the way down through the stack to the lowest level.

There's nothing _organisational_ stopping Nova from creating an internal API (it need not even be a ReST API) for the 'plumbing' parts, with a separate layer that does orchestration-y stuff. That they're not doing so suggests to me that they don't think this is the silver bullet for managing complexity.

What would have been a silver bullet is saying 'no' to a bunch of those features, preferably starting with 'restore soft deleted server'(!!) and shelve/unshelve(?!). When AWS got feature requests like that they didn't say 'we'll have to add that in a higher-level API', they said 'if your application needs that then cloud is not for you'. We were never prepared to say that.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2018-06-26.log.html#t2018-06-26T15:30:33

We can decide that we want to be one, or the other, or both. But if we don't all decide together then a lot of us are going to continue wasting our time working at cross-purposes.
If you are saying that we should choose between being vCenter or AWS, I would definitely say the latter.


Agreed.

But I'm still not sure I see this issue in such a binary manner.

I don't know that it's still a viable option to say 'AWS' now. Given our installed base of users and our commitment to not breaking them, our practical choices may well be between 'vCenter' or 'both'.

It's painful because had we chosen 'AWS' at the beginning then we could have avoided the complexity hit of many of those features listed above, and spent our complexity budget on cloud features instead. Now we are locked in to supporting that legacy complexity forever, and it has reportedly maxed out our complexity budget to the point where people are reluctant to implement any cloud features, and unable to refactor to make them easier.

Astute observers will note that this is a *textbook* case of the Innovator's Dilemma.

Imagine if (as suggested above) we refactored the compute node and gave it a user API, would that be one, the other, both?

In itself, it would have no effect. But if the refactor made the code easier to maintain, it might increase the willingness to move from one to both.

Or just a sane addition to improve what OpenStack really is today: a set of open infrastructure components providing different services, each with their own API, with slight gaps and overlaps between them?

If nothing else, it would make it possible for somebody (probably Jay ;) to write a simpler compute API without any legacy cruft. Then at least when the Nova API's lunch gets eaten it might be by something in OpenStack rather than something like kubevirt.

Personally, I'm not very interested in discussing what OpenStack could have been if we started building it today. I'm much more interested in discussing what to add or change in order to make it usable for more use cases while continuing to serve the needs of our existing users.

It feels strange to argue against this, because it's the exact same philosophy of bottom-up incremental change that I've pushed for many, many years.

However, I'm increasingly of the opinion that in some circumstances - particularly when some of your fundamental assumptions have changed, or you realise you had the wrong model of the problem - it's more helpful to step back and imagine how things would look if you were designing from scratch. And only _then_ look for incremental ways to get closer to that design. Skipping that step tends to lead to either (a) patchwork solutions that lack conceptual integrity, or (b) giving up and sticking with what you have. And often both, now that I think about it.

And I'm not convinced that's an either/or choice...


I said specifically that it's an either/or/and choice.

So it's not a binary choice but it's very much a ternary choice IMHO. The middle ground, where each project - or even each individual contributor within a project - picks an option independently and proceeds on the implicit assumption that everyone else chose the same option (although - spoiler alert - they didn't)... that's not a good place to be.


cheers,
Zane.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
