Re: [openstack-dev] [TripleO] Should we have a TripleO API, or simply use Mistral?

Jiří Stránský Wed, 20 Jan 2016 02:07:37 -0800

On 18.1.2016 19:49, Tzu-Mainn Chen wrote:

----- Original Message -----

On Thu, 2016-01-14 at 16:04 -0500, Tzu-Mainn Chen wrote:


----- Original Message -----

On Wed, Jan 13, 2016 at 04:41:28AM -0500, Tzu-Mainn Chen wrote:

Hey all,

I realize now from the title of the other TripleO/Mistral thread
[1] that
the discussion there may have gotten confused.  I think using
Mistral for
TripleO processes that are obviously workflows - stack
deployment, node
registration - makes perfect sense.  That thread is exploring
practicalities
for doing that, and I think that's great work.

What I inappropriately started to address in that thread was a
somewhat
orthogonal point that Dan asked in his original email, namely:

"what it might look like if we were to use Mistral as a
replacement for the
TripleO API entirely"

I'd like to create this thread to talk about that; more of a
'should we'
than 'can we'.  And to do that, I want to indulge in a thought
exercise
stemming from an IRC discussion with Dan and others.  All, please
correct
me
if I've misstated anything.

The IRC discussion revolved around one use case: deploying a Heat
stack
directly from a Swift container.  With an updated patch, the Heat
CLI can
support this functionality natively.  Then we don't need a
TripleO API; we
can use Mistral to access that functionality, and we're done,
with no need
for additional code within TripleO.  And, as I understand it,
that's the
true motivation for using Mistral instead of a TripleO API:
avoiding custom
code within TripleO.

That's definitely a worthy goal... except from my perspective,
the story
doesn't quite end there.  A GUI needs additional functionality,
which boils
down to: understanding the Heat deployment templates in order to
provide
options for a user; and persisting those options within a Heat
environment
file.

Right away I think we hit a problem.  Where does the code for
'understanding
options' go?  Much of that understanding comes from the
capabilities map
in tripleo-heat-templates [2]; it would make sense to me that
responsibility
for that would fall to a TripleO library.

Still, perhaps we can limit the amount of TripleO code.  So to
give API
access to 'getDeploymentOptions', we can create a Mistral
workflow.

   Retrieve Heat templates from Swift -> Parse capabilities map

Which is fine-ish, except from an architectural perspective
'getDeploymentOptions' violates the abstraction layer between
storage and
business logic, a problem that is compounded because
'getDeploymentOptions'
is not the only functionality that accesses the Heat templates
and needs
exposure through an API.  And, as has been discussed on a
separate TripleO
thread, we're not even sure Swift is sufficient for our needs;
one possible
consideration right now is allowing deployment from templates
stored in
multiple places, such as the file system or git.


Actually, that whole capabilities map thing is a workaround for a
missing
feature in Heat, which I have proposed, but am having a hard time
reaching
consensus on within the Heat community:

https://review.openstack.org/#/c/196656/

Given that is a large part of what's anticipated to be provided by
the
proposed TripleO API, I'd welcome feedback and collaboration so we
can move
that forward, vs solving only for TripleO.

Are we going to have duplicate 'getDeploymentOptions' workflows
for each
storage mechanism?  If we consolidate the storage code within a
TripleO
library, do we really need a *workflow* to call a single
function?  Is a
thin TripleO API that contains no additional business logic
really so bad
at that point?


Actually, this is an argument for making the validation part of the
deployment a workflow - then the interface with the storage
mechanism
becomes more easily pluggable vs baked into an opaque-to-operators
API.

E.g., in the long term, imagine the capabilities feature exists in
Heat, you
then have a pre-deployment workflow that looks something like:

1. Retrieve golden templates from a template store
2. Pass templates to Heat, get capabilities map which defines
features user
must/may select.
3. Prompt user for input to select required capabilities
4. Pass user input to Heat, validate the configuration, get a
mapping of
required options for the selected capabilities (nested validation)
5. Push the validated pieces ("plan" in TripleO API terminology) to
a
template store

This is a pre-deployment validation workflow, and it's a superset
of the
getDeploymentOptions feature you refer to.
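
The five steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: `InMemoryStore`,
`pre_deployment_workflow` and the three callables are hypothetical names,
and the callables stand in for Heat calls (capabilities lookup, nested
validation) that do not exist at the time of this thread.

```python
class InMemoryStore:
    """Hypothetical template store; Swift, git or a file system
    could sit behind the same two methods."""

    def __init__(self, templates):
        self._templates = dict(templates)
        self.plans = {}

    def fetch(self, name):
        # Return a copy so callers cannot mutate the stored templates.
        return dict(self._templates[name])

    def push(self, name, plan):
        self.plans[name] = plan


def pre_deployment_workflow(store, name, get_capabilities, prompt_user, validate):
    """Run the five pre-deployment steps; every callable is an assumption."""
    templates = store.fetch(name)               # 1. retrieve golden templates
    capabilities = get_capabilities(templates)  # 2. what the user must/may select
    selection = prompt_user(capabilities)       # 3. user selects capabilities
    required = validate(templates, selection)   # 4. validation -> required options
    plan = {
        "templates": templates,
        "selection": selection,
        "required_options": required,
    }
    store.push(name, plan)                      # 5. push the validated plan
    return plan
```

Because the store is passed in, swapping the storage mechanism means
providing another object with `fetch` and `push`; the workflow itself
stays the same.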

Historically, TripleO has had a major gap wrt workflow, meaning
that we've
always implemented it either via shell scripts (tripleo-incubator)
or
python code (tripleo-common/tripleo-client, potentially TripleO
API).

So I think what Dan is exploring is, how do we avoid reimplementing
a
workflow engine, when a project exists which already does that.

My gut reaction is to say that proposing Mistral in place of a
TripleO API
is to look at the engineering concerns from the wrong
direction.  The
Mistral alternative comes from a desire to limit custom TripleO
code at all
costs.  I think that is an extremely dangerous attitude that
leads to
compromises and workarounds that will quickly lead to a shaky
code base
full of design flaws that make it difficult to implement or
extend any
functionality cleanly.


I think it's not about limiting TripleO code at all costs, it's
about
learning from past mistakes, where long-term TripleO specific
workarounds
for gaps in other projects have become serious technical debt.

For example, the old merge.py approach to template composition was
a
workaround for missing heat features, then Tuskar was another
workaround
(arguably) for missing heat features, and now we're again proposing
a
long-term workaround for some missing heat features, some of which
are
already proposed (referring to the API for capabilities
resolution).


This is an important point, thanks for bringing it up!

I think that I might have a different understanding of the lessons to
be
learned from Tuskar's limitations.  There were actually two issues
that
arose.  The first was that Tuskar was far too specific in how it
tried to
manipulate Heat pieces.  The second - and more serious, from my
point of
view - was that there literally was no way for an API-based GUI to
perform the tasks it needed to in order to do the correct
manipulation
(environment selection), because there was no Heat API in place for
doing
so.

My takeaway from the first issue was that any potential TripleO API
in
the future needed to be very low-level, a light skimming on top of
the
OpenStack services it uses.  The plan creation process that the
tripleo-common library spec describes is that: it's just a couple of
methods designed to allow a user to create an environment file, which
can then be used for deploying the overcloud.

My takeaway from the second issue was a bit more complicated.  A
required feature was missing, and although the proper functionality
needed to enable it in Heat was identified, it was unclear (and
remains
unclear) whether that feature truly belonged in Heat.  What does a
GUI
do then?  The GUI could take a cycle off, which is essentially what
happened here; I don't think that's a reasonable solution.  We could
hope that we arrive at a 100% foolproof and immutable deployment
solution
in the future, arriving at a point where no new features would ever
be
needed; I don't think that's a practical hope.

The third solution that came to mind was the idea of creating the
TripleO API.  It gives us a place to add in missing features if
needed.
And I think it also gives us a useful layer of indirection.  The
consumers of TripleO want a stable API, so that a new release doesn't
force them to do a massive update of their code; the TripleO API
would
provide that, allowing us to switch code behind the scenes (say, if
the capabilities feature lands in Heat).


I think the above example would work equally well in a generic workflow
sort of tool. You could imagine that the inputs to the workflow remain
the same... but rather than running our own code in some interim step
we simply call Heat directly for the capabilities map feature.

So regardless of whether we build our own API or use a generic workflow
tool I think we still have what I would call a "release valve" to let us
inject some custom code (actions) into the workflow. Like we discussed
last week on IRC I would like to minimize the number of custom actions
we have (with an eye towards things living in the upstream OpenStack
projects) but it is fine to do this either way and would work equally
well w/ Mistral and TripleO API.


I think I kinda view TripleO as a 'best practices' project.  Using
OpenStack is a confusing experience, with a million different options
and choices to make.  TripleO provides users with an excellent guide.
But the problem is that best practices change, and I think that
perceived instability is dangerous for adoption of TripleO.

So having a TripleO library and its associated API be a 'best
practices'
library makes sense to me.  It gives consumers a stable platform upon
which to use TripleO, while allowing us to be flexible behind the
scenes.
The 'best practice' for Heat capabilities right now is a workaround,
because it hasn't been judged to be suitable to go into Heat itself.
If that changes, we get to shift as well - and all of these changes
are
invisible to the API consumer.



I mentioned this in my "Driving workflows with Mistral" thread but with
regards to stability I view say Heat's v1 API or Mistral's v2 API as
both being way more stable than what we could ever achieve with TripleO
API. The real trick to API stability with something like Heat or
Mistral is how we manage the inputs and outputs to Stacks and Workflows
themselves. So long as we are mindful of this I can't imagine an end user
(say a GUI writer or whoever) would really care whether they POST to
Mistral or something we've created. The nice thing about using other
OpenStack projects like Heat or Mistral is that they very likely have
better community and documentation around these things than we
would ever have.

The more I look at using Mistral for some of the cases that have been
brought up the more it seems to make sense for a lot of the workflows
we need. I don't believe we can achieve better stability by creating
what sounds more and more like a shim/proxy API rather than using the
versioned APIs that OpenStack already provides.

There may be some corner cases where a "GUI helper" API comes into play
for some sort of caching or something. I'm not blocking anyone from
creating these sorts of features if they need them. And again if it is
something that could be added to an upstream OpenStack project like
Heat or Mistral I would look there first. So perhaps Zaqar for
websockets instead of rolling our own, this sort of thing.

What does concern me is that we are overstating what TripleO API should
actually contain should we choose to pursue it. Initially it was
positioned as the "TripleO workflow API". I think we now agree that we
probably shouldn't put all of our workflows behind it. So if our stance
has changed would it make sense to compile a new list of what we
believe belongs behind our own TripleO API vs. what we consider
workflows?



I wonder if it would be helpful to get operator feedback here - show them
the advantages/disadvantages of both options and get a sense of what
might be useful/necessary for them to use TripleO effectively?

(I'm going off on a tangent a bit, but please bear with me, i'm using all that to support the point in the end. The implications of building a TripleO API touch on various topics.)

Yes i think we should gather operator feedback. We already got some, but we should gather more whenever possible.

One kind of (negative) feedback i've heard is that overcloud management is too much of a "blackbox" compared to what operators are used to. The feedback i recall was that it's hard to tell what is going to happen when running an overcloud stack update, and that we cannot re-execute the software config management independently.

Building another umbrella API to rule the already largely umbrella-like deployment process (think of all the responsibilities that lie within the tripleo-heat-templates codebase, and within the single 'overcloud' Heat stack) would probably make matters more blackboxy and go further in the direction of "i feel like i don't know what's happening to my cloud when i use the management tool".

What i think could improve the situation for operators is trying to chunk up what we already have into smaller, more independently operable parts. The split-stack approach already discussed in the TripleO meeting and on #tripleo could help with this: essentially, separating our hardware management from our software config management, and being able to re-apply software configuration without being afraid of having nodes accidentally re-provisioned from scratch.

In general i think TripleO could be a little more "UNIXy" - composed of smaller parts that make sense on their own, transparent to the operator, more modular and modifiable, and in effect more receptive to how varied real-world deployment environments are (various Neutron and Cinder plugins, Keystone backends, composable set of services, custom node types etc.).

Workflow persisted in a data-like fashion is probably more modifiable by the operator than Python code of a REST API. We've seen hard assumptions cause problems in the past. (Think the unoverridable CLI parameters issue we used to have, and how we had to move to a model of "CLI provides its values, but you can always override them or provide additional ones with an environment file if needed", which we now use extensively.) I'm a bit concerned that building a new REST API on top of everything would impose new rigid assumptions that could cause more harm than good in the end. I'm concerned that it would be usable only for very basic deployments, while the world of real deployments has its own pace and requirements that don't fit the "best practices" as defined by the API, so that operators would have to bypass the API far too often, slowly pushing it into abandonment over time.
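
A minimal sketch of that override model, assuming parameters are plain
dictionaries and that overrides live under the `parameter_defaults` key
of an environment file; `resolve_parameters` is a hypothetical helper
for illustration, not actual TripleO client code, and the parameter
values are made up:

```python
def resolve_parameters(cli_values, *environments):
    """Start from the values the CLI provides, then let each
    environment override them or add new ones (later ones win)."""
    merged = dict(cli_values)
    for env in environments:
        merged.update(env.get("parameter_defaults", {}))
    return merged


# Values the CLI would supply on its own (illustrative).
cli_values = {"ControllerCount": 1, "ComputeCount": 1}

# An operator-supplied environment overriding one value and adding another.
operator_env = {"parameter_defaults": {"ControllerCount": 3, "ExtraSetting": "on"}}

params = resolve_parameters(cli_values, operator_env)
```

The point is that the CLI's values are only a starting point; nothing it
sets is out of the operator's reach.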

My mind is probably biased towards the operator feedback that resonated with me the most; i've heard pro-blackbox opinions too (though not from operators yet IIRC). So take what i wrote just as my 2 cents, but i think it's necessary to consider the above issues when thinking about the implications of building a TripleO API.

Regarding the non-workflow kind of features we need for empowering GUI, wouldn't those be useful for normal (tenant) Heat stack deployments in the overcloud too? It sounds to me that features like "driving a Heat stack deployment with the same powers from CLI or GUI", "updating a CLI-created stack from GUI and vice versa", "understanding/parsing what are the configuration options of my Heat templates" are all features that are not specific to TripleO, and could be useful for tenant Heat stacks too. So perhaps these should be implemented in Heat? If that can't happen fast enough, then we might need to put some workarounds in place for now, but it might be better if we didn't advertise those as a stable solution.



Jirka


Mainn

Dan


Mainn

I think the correct attitude is to simply look at the problem
we're
trying to solve and find the correct architecture.  For these
get/set
methods that the API needs, it's pretty simple: storage -> some
logic ->
a REST API.  Adding a workflow engine on top of that is unneeded,
and I
believe that means it's an incorrect solution.


What may help is if we can work through the proposed API spec, and
identify which calls can reasonably be considered workflows vs
those where
it's really just proxying an API call with some logic?

When we have a defined list of "not workflow" API requirements,
it'll
probably be much easier to rationalize over the value of a bespoke
API vs
mistral?


Steve

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


