Excerpts from Fox, Kevin M's message of 2014-10-14 17:40:16 -0700:
I'm not arguing that everything should be managed by one systemd, I'm
just saying, for certain types of containers, a single docker container
with systemd in it might be preferable to trying to slice it unnaturally
into several containers.
Can you be more concrete? Most of the time things that need to be in
the same machine tend to have some kind of controller already. Meanwhile
it is worth noting that you can have one _image_, but several containers
running from that one image. So if you're trying to run a few pieces of
Neutron, for instance, you can have multiple containers each from that
one neutron image.
Systemd has invested a lot of time/effort to be able to relaunch failed
services, support spawning and maintaining unix sockets and services
across them, etc, that you'd have to push out of and across docker
containers. All of that can be done, but why reinvent the wheel? Like you
said, Pacemaker can be made to make it all work, but I have yet to see
a way to deploy Pacemaker services anywhere near as easily as systemd+yum
makes it. (Thanks be to Red Hat. :)
There are some of us who are rather annoyed that systemd tries to do
this in such a naive way and assumes everyone will want that kind of
management. It's the same naiveté that leads people to think that making
their app server systemd service depend on their mysql systemd service
will eliminate startup problems. Once you have more than one server, it
doesn't work, because systemd only orders units on the local machine.
Kubernetes adds a distributed awareness of the containers that makes it
uniquely positioned to do most of those jobs much better than systemd
can.
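
To illustrate the startup-ordering point: a minimal Python sketch,
assuming the app server retries its own database connection instead of
trusting unit ordering. The name wait_for_dependency, its parameters, and
the host and port in the usage note are hypothetical and only for
illustration.

```python
import socket
import time


def wait_for_dependency(host, port, attempts=5, base_delay=0.5,
                        connect=socket.create_connection, sleep=time.sleep):
    """Try to connect to host:port, backing off between failed attempts.

    Hypothetical helper: `connect` and `sleep` are injectable so the retry
    logic can be exercised without a real network or real waiting.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            # The dependency may live on another machine and may not be up yet.
            return connect((host, port), timeout=2.0)
        except OSError:
            if attempt == attempts:
                raise  # give up and let the supervisor decide what to do
            sleep(delay)
            delay *= 2  # exponential backoff between attempts
```

Something like wait_for_dependency("db.example.com", 3306) at app startup
works the same whether the database is on the same box or a different
one, which is what unit ordering alone cannot promise.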
The answer seems to be that it's not "dockerish". That's OK. I just wanted
to understand the issue for what it is: whether there is a really good
reason for not wanting to do it, or whether it's just "not the way things
are done". I've had kind of the opposite feeling regarding docker
containers. Docker used to do very bad things when killing the container,
which was nasty if you wanted your database not to go corrupt. Killing
pid 1 is a bit sketchy, and then forcing the container down after 10
seconds was particularly bad. Having something like systemd in place
allows the database to be notified and then shut down properly. Sure, you
can script up enough shell to make this work, but you have to write some
difficult code, over and over again... Docker has gotten better more
recently, but it still makes me a bit nervous using it for stateful
things.
What I think David was saying was that the process you want to run under
systemd is the pid 1 of the container. So if killing that would be bad,
it would also be bad to stop the systemd service, which would do the
same thing: send it SIGTERM. If that causes all hell to break loose, the
stateful thing isn't worth a dime, because it isn't crash safe.
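
As a rough sketch of the SIGTERM point, assuming a Python service that
runs as pid 1 of its container: it installs a SIGTERM handler, finishes
its loop, and flushes state before exiting. GracefulService and flush are
hypothetical names, and the flush body is a stand-in for whatever the real
service would need to persist.

```python
import signal
import threading


class GracefulService:
    """Hypothetical stateful service that shuts down cleanly on SIGTERM."""

    def __init__(self):
        self.stop_requested = threading.Event()
        self.flushed = False

    def handle_sigterm(self, signum, frame):
        # Only record the request here; the real work happens in run().
        self.stop_requested.set()

    def flush(self):
        # Stand-in for syncing data to disk, closing connections, etc.
        self.flushed = True

    def run(self, poll_interval=0.1):
        # Must be called from the main thread, as signal.signal requires.
        signal.signal(signal.SIGTERM, self.handle_sigterm)
        while not self.stop_requested.wait(poll_interval):
            pass  # one unit of real work would go here
        self.flush()
        return 0
```

With GracefulService().run() as the container's entry point, the same
handler runs whether the SIGTERM comes from stopping the container or
from systemd stopping a service. A service that is not safe without that
handler is still not crash safe either way.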
As for recovery, systemd can do the recovery too. I'd argue that, at this
point in time, I'd expect systemd recovery to probably work better
than some custom shell scripts when it comes to doing the right thing
recovering at bring-up. The other thing is, recovery is not just about
pid 1 going away. Often it sticks around and other badness is going
on. It's a way to know things are bad, but you can't necessarily rely on
it to know the container's healthy. You need more robust checks for that.
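
One way to picture "more robust checks", as a minimal Python sketch: treat
the process being alive as necessary but not sufficient, and also require
application-level probes to pass. pid_alive, container_healthy, and the
probe callables are hypothetical names; what a real probe checks (a test
query, a socket round trip) is left open.

```python
import os


def pid_alive(pid):
    """Return True if a process with this pid exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def container_healthy(pid, probes):
    """Healthy only if the process is alive and every probe returns True."""
    if not pid_alive(pid):
        return False
    return all(probe() for probe in probes)
```

A process that is still around but wedged would pass pid_alive and fail a
probe, which is the case the pid 1 check alone misses.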
I think one thing people like about Kubernetes is that when a container
crashes, and needs to be brought back up, it may actually be brought
up on a different, less busy, more healthy host. I could be wrong, or
that might be in the "FUTURE" section. But the point is, recovery and
start-up are not things that always want to happen on the same box.