[openstack-dev] [TripleO][Kolla][Heat][Higgins][Magnum][Kuryr] Gap analysis: Heat as a k8s orchestrator

Zane Bitter Fri, 27 May 2016 15:34:05 -0700

I spent a bit of time exploring the idea of using Heat as an external orchestration layer on top of Kubernetes - specifically in the case of TripleO controller nodes, though I think it could be more generally useful too - but eventually came to the conclusion that it doesn't work yet, and probably won't for a while. Nevertheless, I think it's helpful to document what I found, both to help other people avoid going down the same path and to help us focus on working toward the point where it _is_ possible, since I think there are other contexts where it would be useful too.

We tend to refer to Kubernetes as a "Container Orchestration Engine" but it does not actually do any orchestration, unless you count just starting everything at roughly the same time as 'orchestration'. Which I wouldn't. You generally handle any orchestration requirements between services within the containers themselves, possibly using external services like etcd to co-ordinate. (The Kubernetes project refer to this as "choreography", and explicitly disclaim any attempt at orchestration.)
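To illustrate what "choreography" means in practice, here is a minimal Python sketch of the kind of readiness check a container entrypoint might run itself before starting its service: it polls a dependency's TCP port until it answers or a timeout expires. The function name and parameters are hypothetical, and a real deployment might coordinate through etcd rather than by probing ports.

```python
import socket
import time


def wait_for_dependency(host: str, port: int, timeout: float = 60.0,
                        interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires.

    Under choreography the container itself, not an external orchestrator,
    decides when its dependencies are ready enough for it to start.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect is taken as "dependency is up".
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An entrypoint would call something like `wait_for_dependency("mysql", 3306)` (hostname hypothetical) and exit non-zero on `False`, leaving Kubernetes to restart the container and try again.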

What Kubernetes *does* do is more like an actively-managed version of Heat's SoftwareDeploymentGroup (emphasis on the _Group_). Brief recap: SoftwareDeploymentGroup is a type of ResourceGroup; you give it a map of resource names to server UUIDs and it creates a SoftwareDeployment for each server. You have to generate the list of servers somehow to give it (the easiest way is to obtain it from the output of another ResourceGroup containing the servers). If e.g. a server goes down you have to detect that externally, and trigger a Heat update that removes it from the templates, redeploys a replacement server, and regenerates the server list before a replacement SoftwareDeployment is created. In contrast, Kubernetes runs on a cluster of servers, can use rules to determine where to run containers, and can very quickly redeploy without external intervention in response to a server or container falling over. (It also does rolling updates, which Heat can also do, albeit in a somewhat hacky way when it comes to SoftwareDeployments - which we're planning to fix.)
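A toy Python model may make the contrast concrete. `deployment_group` mimics the static fan-out described above (one deployment per entry in a name-to-UUID map, with nothing watching the servers afterwards), while `reconcile` is a simplified single pass of a Kubernetes-style control loop. Both functions are hypothetical illustrations of the behaviour, not Heat or Kubernetes code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    server_id: str
    config: str


def deployment_group(servers: dict[str, str], config: str) -> dict[str, Deployment]:
    """Static fan-out: one deployment per (resource name -> server UUID) entry.

    If a server dies, the caller must notice, rebuild the map and call this
    again; that is the external "Heat update" step.
    """
    return {name: Deployment(name, uuid, config) for name, uuid in servers.items()}


def reconcile(desired: int, running: list[str], healthy: set[str]) -> tuple[list[str], int]:
    """One pass of an actively-managed control loop (simplified model).

    Drops instances that are no longer healthy and reports how many
    replacements must be started to return to the desired count.
    """
    survivors = [inst for inst in running if inst in healthy]
    return survivors, max(desired - len(survivors), 0)
```

The point of the model is where the loop lives: Kubernetes runs `reconcile` continuously by itself, whereas with SoftwareDeploymentGroup the equivalent of that loop sits outside Heat.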

So this seems like an opportunity: if the dependencies between services could be encoded in Heat templates rather than baked into the containers, then we could use Heat as the orchestration layer following the dependency-based style I outlined in [1]. (TripleO is already moving in this direction with the way that composable-roles uses SoftwareDeploymentGroups.) One caveat is that fully using this style likely rules out, for all practical purposes, the current Pacemaker-based HA solution. We'd need to move to a lighter-weight HA solution, but I know that TripleO is considering that anyway.
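As a rough sketch of what "dependencies encoded in the templates" buys us, the Python below takes a hypothetical service graph (the kind of edges that would be expressed with `depends_on` in Heat rather than inside container images) and groups services into waves that could start in parallel, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The service names and edges are illustrative assumptions, and Heat itself starts each resource as soon as its own dependencies complete, so the wave grouping is a simplification.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each service maps to the set of services it depends on.
SERVICES = {
    "mysql": set(),
    "rabbitmq": set(),
    "keystone": {"mysql"},
    "glance": {"keystone", "mysql"},
    "nova-api": {"keystone", "mysql", "rabbitmq"},
}


def deployment_waves(graph):
    """Return lists of services; everything in one list can start in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(deployment_waves(SERVICES), start=1):
        print(number, wave)
```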

What's more, assuming this could be made to work for a Kubernetes cluster, a couple of remappings in the Heat environment file should get you an otherwise-equivalent single-node non-HA deployment basically for free. That's particularly exciting to me because there are definitely deployments of TripleO that need HA clustering, and deployments that don't and which wouldn't want to pay the complexity cost of running Kubernetes when they don't make any real use of it.

So you'd have a Heat resource type for the controller cluster that maps to either an OS::Nova::Server or (the equivalent of) an OS::Magnum::Bay, and a bunch of software deployments that map to either an OS::Heat::SoftwareDeployment that calls (I assume) docker-compose directly, or a Kubernetes Pod resource to be named later.
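The environment-file remapping can be sketched in Python as a lookup through the environment's `resource_registry`. The abstract type names (`OS::TripleO::ControllerCluster`, `OS::TripleO::Services::Keystone`) and the template file names are hypothetical placeholders; in particular the Kubernetes Pod type does not exist yet. The lookup is deliberately one level deep: Heat's real registry also supports wildcards and per-resource overrides, which this ignores.

```python
# Abstract resource types used by the (hypothetical) top-level template.
TEMPLATE_RESOURCES = {
    "controller": "OS::TripleO::ControllerCluster",
    "keystone": "OS::TripleO::Services::Keystone",
}

# Two hypothetical environments that map the same abstract types differently.
SINGLE_NODE_ENV = {
    "resource_registry": {
        "OS::TripleO::ControllerCluster": "OS::Nova::Server",
        "OS::TripleO::Services::Keystone": "keystone-docker-compose.yaml",
    }
}
KUBERNETES_ENV = {
    "resource_registry": {
        "OS::TripleO::ControllerCluster": "OS::Magnum::Bay",
        "OS::TripleO::Services::Keystone": "keystone-k8s-pod.yaml",
    }
}


def resolve(resources, environment):
    """Map each resource to the implementation the environment selects.

    Types with no registry entry are left unchanged, as built-in types would be.
    """
    registry = environment.get("resource_registry", {})
    return {name: registry.get(rtype, rtype) for name, rtype in resources.items()}
```

The templates stay identical between the two cases; only the environment passed alongside them changes, which is what makes the single-node variant "basically free".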

The first obstacle is that we'd need that Kubernetes Pod resource in Heat. Currently there is no such resource type, and the OpenStack API that would be expected to provide it (Magnum's /container endpoint) is being deprecated, so that's not a long-term solution.[2] Some folks from the Magnum community may or may not be working on a separate project (which may or may not be called Higgins) to do that. It'd be some time away, though.

An alternative, though not a good one, would be to create a Kubernetes resource type in Heat that has the credentials passed in somehow. I'm very against that, though. Heat is just not good at handling credentials other than Keystone ones. We haven't ever created a resource type like this before, except for the Docker one in /contrib that serves as a prime example of what *not* to do. And if it doesn't make sense to wrap an OpenStack API around this, then IMO it isn't going to make any more sense to wrap a Heat resource around it.

A third option might be a SoftwareDeployment, possibly on one of the controller nodes themselves, that calls the k8s client. (We could create a software deployment hook to make this easy.) That would suffer from all of the same issues that TripleO currently has about having to choose a server on which to deploy, though.
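A minimal sketch of such a hook in Python, assuming the usual heat-config hook convention (the deployment is read as JSON on stdin, and a JSON result with `deploy_stdout`, `deploy_stderr` and `deploy_status_code` is written to stdout). The idea that the deployment's `config` holds a Kubernetes manifest, and that `kubectl` on the chosen node is already configured with cluster credentials, are assumptions; no such hook exists today. The `runner` parameter is only there so the command can be stubbed out.

```python
import json
import subprocess
import sys


def run_hook(deployment, runner=subprocess.run):
    """Apply the manifest held in the deployment's config with kubectl.

    Assumes the config body is a Kubernetes manifest (YAML or JSON) and
    that kubectl on this server can already reach the cluster.
    """
    manifest = deployment.get("config", "")
    proc = runner(
        ["kubectl", "apply", "-f", "-"],  # read the manifest from stdin
        input=manifest,
        capture_output=True,
        text=True,
    )
    return {
        "deploy_stdout": proc.stdout,
        "deploy_stderr": proc.stderr,
        "deploy_status_code": proc.returncode,
    }


def main():
    deployment = json.load(sys.stdin)
    json.dump(run_hook(deployment), sys.stdout)


if __name__ == "__main__":
    main()
```

Note that this does nothing about the problem raised above: the hook still has to run on some particular server, which must be chosen and kept alive.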

The second obstacle is networking. TripleO has some pretty complicated networking requirements (specifically network isolation for the various services) that for now can't be supported when deploying a cluster with Magnum. The Kuryr project is working on improved networking for Magnum, but I don't know whether this is a use case that would be covered.

There's also the issue that IIUC Magnum operates its Neutron L3 agents in such a way that connectivity to the user nodes is guaranteed only if Magnum itself is running in an HA cloud. This is a problematic assumption in general, but it's particularly problematic in the case of the TripleO *undercloud*, which is not HA and which we very much do not want to be in the networking path for the overcloud controller nodes. Again, I don't know if or when this will be resolved by Kuryr.

Magnum does offer the option to pass a custom template, and I assume that would allow us to set up the networking the way we want it. However, TripleO uses all kinds of tricks with the environment and parameters, so there would quite likely need to be some enhancements to both Heat (in order to access the current environment from within a template) and Magnum (to pass an environment along with the template) to support that.

At that point it's a legitimate question to ask what exactly Magnum is buying us if TripleO has to maintain its own Kubernetes deployment templates anyway. I can think of only two things: an easier transition later if we do believe that the networking stuff will be resolved, and the /containers API. And the /containers API is being deprecated.

In that sense, the Magnum/Higgins split could be a good thing for the Heat+Kubernetes use case in the long term: if we had a Keystone-authenticated API that allows Heat to make use of any k8s cluster, not just those deployed via Magnum, then Magnum could be cut out of the loop in those cases where networking issues preclude its use.

In the short term, though, there seem to be a number of obstacles. Perhaps some of the folks involved in the relevant projects could comment on when, or if, those are likely to be resolved.


cheers,
Zane.

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-March/090055.html

[2] https://etherpad.openstack.org/p/newton-magnum-unified-abstraction

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
