The etcd operator is an awesome addition for the stability of any service 
that relies on it, including k8s itself. If etcd is safe, the cluster is 
safe.


Since etcd can run outside kubernetes, the problem of keeping etcd safe is 
really an independent problem from kubernetes. The project I'm working on 
(Rook) depends on etcd and has a requirement to run in both a kubernetes 
environment and a standalone environment. We have started implementing what 
amounts to a *very* basic etcd operator that will manage the health of the 
etcd cluster, but want to replace it with your much more complete operator. 
We would benefit now and going forward from the etcd operator.


What would it take to factor out the management of etcd from the dependency 
on kubernetes? Looking at the code, it seems we could define an interface, 
or interfaces, that describe how the operator interacts with a generalized 
cluster, with methods such as "enumerate etcd members", "start instance", 
"stop instance", and the other operations that kubernetes takes care of 
today. The etcd operator would become a library to be used by different 
types of clusters. In each environment where etcd runs, the cluster would 
benefit from a common implementation of monitoring etcd health, 
growing/shrinking the membership, backup/restore, and more. 
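
To make the idea concrete, here is a rough sketch of what such an interface 
might look like. All of the names here (Cluster, EnumerateMembers, 
StartInstance, StopInstance, Reconcile, memCluster) are made up for 
illustration and are not taken from the etcd operator code. The in-memory 
implementation only stands in for a real environment.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Cluster is a hypothetical abstraction over the environment hosting etcd.
// The method names mirror the operations listed above; they are assumptions,
// not part of the etcd operator.
type Cluster interface {
	EnumerateMembers() ([]string, error)
	StartInstance(name string) error
	StopInstance(name string) error
}

// memCluster is an in-memory stand-in used only for illustration.
type memCluster struct {
	members map[string]bool
}

func newMemCluster() *memCluster {
	return &memCluster{members: map[string]bool{}}
}

// EnumerateMembers returns the running instance names in sorted order.
func (c *memCluster) EnumerateMembers() ([]string, error) {
	names := make([]string, 0, len(c.members))
	for n := range c.members {
		names = append(names, n)
	}
	sort.Strings(names)
	return names, nil
}

func (c *memCluster) StartInstance(name string) error {
	if c.members[name] {
		return fmt.Errorf("instance %q already running", name)
	}
	c.members[name] = true
	return nil
}

func (c *memCluster) StopInstance(name string) error {
	if !c.members[name] {
		return fmt.Errorf("instance %q not found", name)
	}
	delete(c.members, name)
	return nil
}

// Reconcile grows or shrinks the membership until it matches desired.
// It talks only to the Cluster interface, never to a specific platform.
func Reconcile(c Cluster, desired int) error {
	if desired < 0 {
		return errors.New("desired size must not be negative")
	}
	members, err := c.EnumerateMembers()
	if err != nil {
		return err
	}
	running := map[string]bool{}
	for _, m := range members {
		running[m] = true
	}
	// Grow: start instances under the first unused "etcd-N" names.
	for i := 0; len(members) < desired; i++ {
		name := fmt.Sprintf("etcd-%d", i)
		if running[name] {
			continue
		}
		if err := c.StartInstance(name); err != nil {
			return err
		}
		members = append(members, name)
	}
	// Shrink: stop instances from the end of the list.
	for len(members) > desired {
		last := members[len(members)-1]
		if err := c.StopInstance(last); err != nil {
			return err
		}
		members = members[:len(members)-1]
	}
	return nil
}

func main() {
	c := newMemCluster()
	for _, size := range []int{3, 1} {
		if err := Reconcile(c, size); err != nil {
			fmt.Println("reconcile failed:", err)
			return
		}
		members, _ := c.EnumerateMembers()
		fmt.Println("desired", size, "members", members)
	}
}
```

A real implementation would also need health checks and backup/restore 
hooks; the point is only that the reconcile logic never references 
kubernetes directly.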


This means that all references to kubernetes would be factored out to a new 
package. For the k8s scenario, the etcd-operator would be initialized with 
the kubernetes cluster implementation. In the Rook scenario, the etcd 
operator would be initialized with the Rook cluster implementation.
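
Roughly, the initialization could look like the sketch below. Again, every 
name here (Cluster, Operator, NewOperator, kubernetesCluster, rookCluster) 
is hypothetical, and both cluster types are stubs rather than real clients.

```go
package main

import (
	"fmt"
	"strings"
)

// Cluster is a hypothetical interface; under this proposal it would live in
// a package with no kubernetes imports.
type Cluster interface {
	Kind() string
	EnumerateMembers() ([]string, error)
}

// Operator holds only environment-neutral logic.
type Operator struct {
	cluster Cluster
}

// NewOperator injects the environment-specific Cluster implementation.
func NewOperator(c Cluster) *Operator {
	return &Operator{cluster: c}
}

// Summary reports membership through whichever Cluster was injected.
func (o *Operator) Summary() (string, error) {
	members, err := o.cluster.EnumerateMembers()
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s: %d members (%s)",
		o.cluster.Kind(), len(members), strings.Join(members, ", ")), nil
}

// kubernetesCluster is a stub; a real one would wrap a kubernetes API client.
type kubernetesCluster struct{ pods []string }

func (k kubernetesCluster) Kind() string                        { return "kubernetes" }
func (k kubernetesCluster) EnumerateMembers() ([]string, error) { return k.pods, nil }

// rookCluster is a stub; a real one would ask Rook's own node inventory.
type rookCluster struct{ nodes []string }

func (r rookCluster) Kind() string                        { return "rook" }
func (r rookCluster) EnumerateMembers() ([]string, error) { return r.nodes, nil }

func main() {
	clusters := []Cluster{
		kubernetesCluster{pods: []string{"etcd-0", "etcd-1", "etcd-2"}},
		rookCluster{nodes: []string{"node-a", "node-b", "node-c"}},
	}
	for _, c := range clusters {
		s, err := NewOperator(c).Summary()
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```

The operator code is identical in both cases; only the value passed to 
NewOperator changes.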


Any reason the operator couldn't run outside kubernetes given this 
abstraction? 


Another level of abstraction to consider is the operator pattern itself. In 
our clusters, we effectively have a Ceph operator that manages the 
distributed storage subsystems. Currently the etcd and Prometheus operators 
don't appear to share any common operator library. Is there a planned 
operator library, or is the k8s management all they are expected to have in 
common? Perhaps this abstraction would become obvious with the other 
refactoring suggested for etcd, but it might be different. Thoughts on this? 


Thanks!

Travis Nielsen

https://github.com/rook/rook
