On 17/07/18 10:44, Thierry Carrez wrote:
Finally found the time to properly read this...

For anybody else who found the wall of text challenging, I distilled the longest part into a blog post:

https://www.zerobanana.com/archive/2018/07/17#openstack-layer-model-limitations

Zane Bitter wrote:
[...]
We chose to add features to Nova to compete with vCenter/oVirt, and not to add features that would have enabled OpenStack as a whole to compete with more than just the compute provisioning subset of EC2/Azure/GCP.

Could you give an example of an EC2 action that would be beyond the "compute provisioning subset" that you think we should have built into Nova?

- Automatic provision/rotation of application credentials.
- Reliable, user-facing event notifications.
- Collection of usage data suitable for autoscaling, billing, and whatever it is that Watcher does.

Meanwhile, the other projects in OpenStack were working on building the other parts of an AWS/Azure/GCP competitor. And our vague one-sentence mission statement allowed us all to maintain the delusion that we were all working on the same thing and pulling in the same direction, when in truth we haven't been at all.

Do you think that organizing (tying) our APIs along [micro]services, rather than building a sanely-organized user API on top of a sanely-organized set of microservices, played a role in that divide?

TBH, not really. If I were making a list of contributing factors I would probably put 'path dependence' at #1, #2 and #3.

At the start of this discussion, Jay posted on IRC a list of things that he thought shouldn't have been in the Nova API[1]:

- flavors
- shelve/unshelve
- instance groups
- boot from volume where nova creates the volume during boot
- create me a network on boot
- num_instances > 1 when launching
- evacuate
- host-evacuate-live
- resize where the user 'confirms' the operation
- force/ignore host
- security groups in the compute API
- force delete server
- restore soft deleted server
- lock server
- create backup

Some of those are trivially composable in higher-level services (e.g. boot from volume where nova creates the volume, get me a network, security groups). I agree with Jay that in retrospect it would have been cleaner to delegate those to some higher level than the Nova API (or, equivalently, for some lower-level API to exist within what is now Nova). And maybe if we'd had a top-level API like that we'd have been more aware of the ways that the lower-level ones lacked legibility for orchestration tools (oaktree is effectively an example of a top-level API like this; I'm sure Monty can give us a list of complaints ;)
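
As a rough illustration of what "composable in a higher-level service" could mean for the boot-from-volume case, here is a minimal Python sketch. Everything in it is hypothetical: VolumeService, ComputeService, and boot_from_new_volume are in-memory stand-ins invented for this example, not real Nova or Cinder interfaces. The assumption is that the low-level compute call only accepts a volume that already exists, so that creating the volume (and cleaning it up on failure) becomes the job of a layer above.

```python
from dataclasses import dataclass, field
from itertools import count


# Hypothetical low-level "plumbing" services. These are in-memory
# stand-ins for illustration only, not real OpenStack APIs.
@dataclass
class VolumeService:
    volumes: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def create(self, size_gb, image):
        vol_id = f"vol-{next(self._ids)}"
        self.volumes[vol_id] = {"size_gb": size_gb, "image": image}
        return vol_id

    def delete(self, vol_id):
        del self.volumes[vol_id]


@dataclass
class ComputeService:
    known_flavors: tuple = ("small", "large")
    servers: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def boot(self, flavor, root_volume):
        # The low-level call only accepts a volume that already exists;
        # it never creates one on the caller's behalf.
        if flavor not in self.known_flavors:
            raise ValueError(f"unknown flavor: {flavor}")
        srv_id = f"srv-{next(self._ids)}"
        self.servers[srv_id] = {"flavor": flavor, "root_volume": root_volume}
        return srv_id


def boot_from_new_volume(compute, volumes, flavor, image, size_gb):
    """Higher-level convenience composed from the two primitives above."""
    vol_id = volumes.create(size_gb, image)
    try:
        return compute.boot(flavor, vol_id)
    except Exception:
        # Undo the volume creation so a failed boot leaves nothing behind.
        volumes.delete(vol_id)
        raise
```

The point of the sketch is only that the orchestration logic (create, boot, roll back) lives entirely outside the compute primitive; the operations further down the list have no obvious decomposition of this kind.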

But others on the list involve operations at a low level that don't appear to me to be composable out of simpler operations. (Maybe Jay has a shorter list of low-level APIs that could be combined to implement all of these, I don't know.) Once we decided to add those features, it was inevitable that they would reach right the way down through the stack to the lowest level.

There's nothing _organisational_ stopping Nova from creating an internal API (it need not even be a ReST API) for the 'plumbing' parts, with a separate layer that does orchestration-y stuff. That they're not doing so suggests to me that they don't think this is the silver bullet for managing complexity.

What would have been a silver bullet is saying 'no' to a bunch of those features, preferably starting with 'restore soft deleted server'(!!) and shelve/unshelve(?!). When AWS got feature requests like that they didn't say 'we'll have to add that in a higher-level API', they said 'if your application needs that then cloud is not for you'. We were never prepared to say that.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2018-06-26.log.html#t2018-06-26T15:30:33

We can decide that we want to be one, or the other, or both. But if we don't all decide together then a lot of us are going to continue wasting our time working at cross-purposes.

If you are saying that we should choose between being vCenter or AWS, I would definitely say the latter.

Agreed.

But I'm still not sure I see this issue in such a binary manner.

I don't know that it's still a viable option to say 'AWS' now. Given our installed base of users and our commitment to not breaking them, our practical choices may well be between 'vCenter' or 'both'.

It's painful because had we chosen 'AWS' at the beginning then we could have avoided the complexity hit of many of those features listed above, and spent our complexity budget on cloud features instead. Now we are locked in to supporting that legacy complexity forever, and it has reportedly maxed out our complexity budget to the point where people are reluctant to implement any cloud features, and unable to refactor to make them easier.

Astute observers will note that this is a *textbook* case of the Innovator's Dilemma.

Imagine if (as suggested above) we refactored the compute node and gave it a user API: would that be one, the other, or both?

In itself, it would have no effect. But if the refactor made the code easier to maintain, it might increase the willingness to move from one to both.

Or just a sane addition to improve what OpenStack really is today: a set of open infrastructure components providing different services, each with its own API, with slight gaps and overlaps between them?

If nothing else, it would make it possible for somebody (probably Jay ;) to write a simpler compute API without any legacy cruft. Then at least when the Nova API's lunch gets eaten it might be by something in OpenStack rather than something like kubevirt.

Personally, I'm not very interested in discussing what OpenStack could have been if we started building it today. I'm much more interested in discussing what to add or change in order to make it usable for more use cases while continuing to serve the needs of our existing users.

It feels strange to argue against this, because it's the exact same philosophy of bottom-up incremental change that I've pushed for many, many years.

However, I'm increasingly of the opinion that in some circumstances - particularly when some of your fundamental assumptions have changed, or you realise you had the wrong model of the problem - it's more helpful to step back and imagine how things would look if you were designing from scratch. And only _then_ look for incremental ways to get closer to that design. Skipping that step tends to lead to either (a) patchwork solutions that lack conceptual integrity, or (b) giving up and sticking with what you have. And often both, now that I think about it.

And I'm not convinced that's an either/or choice...

I said specifically that it's an either/or/and choice.

So it's not a binary choice but it's very much a ternary choice IMHO. The middle ground, where each project - or even each individual contributor within a project - picks an option independently and proceeds on the implicit assumption that everyone else chose the same option (although - spoiler alert - they didn't)... that's not a good place to be.

cheers,
Zane.

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
