On 2/14/18 11:58 AM, Daniel Alvarez Sanchez wrote:
On Wed, Feb 14, 2018 at 5:40 AM, Brian Haley <haleyb....@gmail.com> wrote:
On 02/13/2018 05:08 PM, Armando M. wrote:
On 13 February 2018 at 14:02, Brent Eagles <beag...@redhat.com> wrote:
Hi,

The neutron agents are implemented in such a way that key functionality is implemented in terms of haproxy, dnsmasq, keepalived and radvd configuration. The agents manage instances of these services but, by design, the parent of those processes is the top-most process (pid 1).

On baremetal this has the advantage that, while control plane changes cannot be made while the agents are not available, the configuration at the time the agents were stopped will keep working (for example, VMs that are restarted can request their IPs, etc.). In short, the dataplane is not affected by shutting down the agents.

In the TripleO containerized version of these agents, the supporting processes (haproxy, dnsmasq, etc.) are run within the agent's container, so when the container is stopped the supporting processes are also stopped. That is, the behavior with the current containers is significantly different than on baremetal, and stopping/restarting containers effectively breaks the dataplane. At the moment this is being considered a blocker and, unless we can find a resolution, we may need to recommend running the L3, DHCP and metadata agents on baremetal.
I didn't think the neutron metadata agent was affected, but just the ovn-metadata agent? Or is there a problem with the UNIX domain sockets the haproxy instances use to connect to it when the container is restarted?
That's right. In ovn-metadata-agent we spawn haproxy inside the q-ovnmeta namespace, and this is where we'll find a problem if the process goes away. As you said, the neutron metadata agent is basically receiving the proxied requests from the haproxies residing in either q-router or q-dhcp namespaces on its UNIX socket and sending them to Nova.
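
Roughly speaking, the spawning looks like this (a sketch for illustration only; the function and file names are made up, not the agent's actual code):

    # haproxy is started inside the per-network metadata namespace as a child
    # of the agent container, so it dies when that container is stopped.
    import subprocess

    def spawn_metadata_haproxy(namespace, cfg_path):
        # e.g. namespace = the q-ovnmeta namespace for a network,
        # cfg_path = an haproxy config that forwards 169.254.169.254:80
        # to the metadata agent's UNIX socket
        return subprocess.Popen(
            ["ip", "netns", "exec", namespace, "haproxy", "-f", cfg_path])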
There's quite a bit to unpack here: are you suggesting that running these services in an HA configuration doesn't help either with the data plane being gone after a stop/restart? Ultimately this boils down to where the state is persisted, and while certain agents rely on namespaces and processes whose ephemeral nature is hard to persist, enough could be done to allow for a non-disruptive bumping of the aforementioned services.
Armando - https://review.openstack.org/#/c/542858/ (if accepted) should help with dataplane downtime, as sharing the namespaces lets them persist, which eases what the agent has to configure on the restart of a container (think of what the l3-agent needs to create for 1000 routers).
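
In other words, a surviving namespace becomes state to reconcile rather than rebuild. A toy sketch of the idea (not the actual patch under review):

    # If the qrouter- namespace outlived the container restart, skip the
    # expensive re-plumbing and only reconcile what is already there.
    import subprocess

    def ensure_router_namespace(router_id):
        ns = "qrouter-%s" % router_id
        if ns in subprocess.check_output(["ip", "netns", "list"]).decode():
            return ns  # interfaces, routes and iptables rules are still there
        subprocess.check_call(["ip", "netns", "add", ns])
        # ... re-create interfaces, routes and iptables rules from scratch ...
        return ns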
But it doesn't address dnsmasq being unavailable when the dhcp-agent container is restarted, like it is today. Maybe one way around that is to run 2+ agents per network, but that still leaves a regression from how it works today. Even with l3-ha I'm not sure things are perfect; we might wind up with two masters sometimes.
I've seen one suggestion of putting all these processes in their own container instead of the agent container so they continue to run; it just might be invasive to the neutron code. Maybe there is another option?
I had some idea based on that one to reduce the impact on neutron code and its dependency on containers. Basically, we would be running dnsmasq, haproxy, keepalived, radvd, etc. in separate containers (it makes sense as they have independent lifecycles) and we would drive
+1 for that separation
those through the docker socket from neutron agents. In order to reduce this dependency, I thought of having some sort of 'rootwrap-daemon-docker' which takes the
Let's please avoid using 'docker' in names, could it be rootwrap-cri or
rootwrap-engine-moby or something?
commands and checks if it has to spawn the process in a separate container (for example, iptables wouldn't be the case) and, if so, it'll use the docker socket to do it. We'll also have to monitor the PID files on those containers to respawn them in case they die.
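
Something along these lines, perhaps (a rough sketch only; the image name, the command set and the use of the docker SDK are assumptions, and the real thing could just as well speak raw HTTP to the docker/CRI socket):

    # A rootwrap-style dispatcher: short-lived commands run on the host as
    # today, long-lived services are spawned as sidecar containers so they
    # survive an agent container restart.
    import shlex
    import subprocess

    import docker  # docker SDK for Python, talks to /var/run/docker.sock

    SIDECAR_COMMANDS = {"dnsmasq", "haproxy", "keepalived", "radvd"}

    def dispatch(cmd_line):
        argv = shlex.split(cmd_line)
        if argv[0] not in SIDECAR_COMMANDS:
            # iptables, ip, sysctl, ... keep going through plain rootwrap
            return subprocess.check_call(argv)
        client = docker.from_env()
        # Host networking, privileges and /run/netns are needed so the
        # service can live inside the qrouter-/qdhcp- namespaces.
        return client.containers.run(
            "neutron-sidecar:latest", argv, detach=True,
            network_mode="host", privileged=True,
            volumes={"/run/netns": {"bind": "/run/netns", "mode": "rw"}})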
IMHO, this is far from the containers philosophy since we're using host networking, privileged access, sharing namespaces, relying on 'sidecar' containers... but I can't think of a better way to do it.
This still looks like it fits well into the k8s pods concept [0], with healthchecks, shared namespaces and logical coupling of sidecars, i.e. the agents and the helper daemons running in namespaces. I hope it does.

[0] https://kubernetes.io/docs/concepts/workloads/pods/pod/
-Brian
--
Best regards,
Bogdan Dobrelya,
Irc #bogdando
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev