Re: Old instances continue to accept connections after graceful reload

Bryan Talbot Fri, 05 Feb 2016 20:09:27 -0800

On Fri, Feb 5, 2016 at 7:07 PM, Bryan Talbot <bryan.tal...@ijji.com> wrote:


> I think you're just attempting to reload haproxy too fast. There are race
> conditions in getting the list of running pids and passing them into
> haproxy -- that list changes before the next proxy is started.
>

After further investigation, I think this answer is only partially correct.
Reloads are happening too fast, but the -wait option for consul-template
wasn't fully effective.

You've probably seen this old bug report on the issue, which seems to affect
consul-template after version 0.10.0:
https://github.com/hashicorp/consul-template/issues/442

I don't use consul but took this opportunity to try it out. consul-template
seems to have issues delivering signals -- at least to haproxy -- as those
bug reports attest.

To avoid the signal handling issue, I got your setup to work using
consul-template, but with the processes structured so that consul-template
doesn't manage the haproxy processes and just sends HUP signals to trigger a
reload. For me, this is working even when starting and stopping 10s of
backend processes per second.

The process tree in the docker container now looks like this:

docker top haproxy xf
PID    TTY  STAT  TIME  COMMAND
14272  ?    Ss    0:00  \_ /bin/sh /entrypoint.sh
14286  ?    Sl    0:00  | \_ consul-template -config=/tmp/haproxy.ctmpl.cfg -log-level=debug
14287  ?    S     0:00  | \_ /usr/local/sbin/haproxy-systemd-wrapper -f /usr/local/etc/haproxy/haproxy.cfg -p /run/haproxy.pid
15380  ?    S     0:00  | \_ /usr/local/sbin/haproxy -f /usr/local/etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 184
15381  ?    Ss    0:00  | \_ /usr/local/sbin/haproxy -f /usr/local/etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 184
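
A rough way to check whether an old instance is still around after a reload
is to compare the haproxy PIDs seen before and after the HUP. The helper
below is only a sketch: still_running is a made-up name, and the PID lists
are assumed to come from something like `pidof haproxy`. With -sf, an old
process is expected to linger until its existing connections finish, so a
PID showing up here is a hint to look closer, not proof of a bug.

```shell
#!/bin/sh
# Hypothetical helper (not part of haproxy or consul-template):
# print every PID from the "before" list that also appears in the "after" list.
still_running() {
    before=$1
    # Collapse newlines/tabs/multiple spaces into single spaces.
    after=$(echo $2)
    for pid in $before; do
        # Pad with spaces so that PID 1 does not match PID 12.
        case " $after " in
            *" $pid "*) echo "$pid" ;;
        esac
    done
}

# Possible usage inside the container (assumes pidof and killall exist there):
#   before=$(pidof haproxy || true)
#   killall -s HUP haproxy-systemd-wrapper
#   sleep 5
#   after=$(pidof haproxy || true)
#   still_running "$before" "$after"
```

Any PID it prints belonged to a haproxy process that was running before the
reload and is still running afterwards.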




Since it uses haproxy-systemd-wrapper to manage the haproxy processes and
the alpine haproxy package doesn't include it, I used my own haproxy docker
image, which does include the wrapper. Below are the files from your
config with the changes that make this work for me.



$> cat Dockerfile

FROM fingershock/haproxy-base:1.6.3
MAINTAINER Pure Storage version: 0.1

ENV CONSUL_TEMPLATE_VERSION=0.12.2
ENV CONSUL_URL=https://releases.hashicorp.com/consul-template/${CONSUL_TEMPLATE_VERSION}/consul-template_${CONSUL_TEMPLATE_VERSION}_linux_amd64.zip

# Download consul-template
RUN ( wget --no-verbose --no-check-certificate ${CONSUL_URL} -O /tmp/consul_template.zip && \
      unzip -d /tmp/ -o /tmp/consul_template.zip && \
      mv /tmp/consul-template /usr/bin && \
      rm -rf /tmp/* )

COPY *.http /etc/haproxy/errors/

COPY haproxy.ctmpl.cfg /tmp/
COPY haproxy.ctmpl /tmp/
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]





$> cat entrypoint.sh
#!/bin/sh

# This script starts haproxy and consul-template.
# It then remains around as PID1 and exits when a signal to terminate is received.
# It will also exit if any child processes die (haproxy-systemd-wrapper, consul-template).

#set -o xtrace
set -o errexit
if [ -n "$BASH_VERSION" ]; then
    # These are not POSIX and won't work (or are not needed) with dash shell
    # but are needed for bash.
    # Enable monitor so job control is active and we get SIGCHLD (when using bash)
    set -o monitor
    set -o pipefail
fi

if [ $# -ne 0 ]; then
  exec "$@"
fi


# generate initial configuration
consul-template -config=/tmp/haproxy.ctmpl.cfg -log-level=debug -once

# and watch for changes
consul-template -config=/tmp/haproxy.ctmpl.cfg -log-level=debug &

# start haproxy using systemd-wrapper as that process should not exit and
# manages haproxy daemons
/usr/local/sbin/haproxy-systemd-wrapper -f /usr/local/etc/haproxy/haproxy.cfg -p /run/haproxy.pid &


# exit for normal termination signals
quit()
{
    exit 0
}
# exit if haproxy or consul stops running
childquit()
{
    exit 1
}

trap childquit CHLD
trap quit INT QUIT TERM

wait
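
The script above relies on the shell delivering a CHLD trap, which behaves
differently across shells (hence the bash-only `set -o monitor`). If that
turns out to be unreliable in a given image, one alternative is to poll the
child PIDs instead. This is a different technique from the trap used above
and only a sketch: wait_for_any_exit and POLL_INTERVAL are made-up names, and
it assumes the PIDs were captured with $! right after each background start.

```shell
#!/bin/sh
# Hypothetical polling alternative to "trap ... CHLD" (illustrative only).
# Seconds between checks; POLL_INTERVAL is an assumed, made-up variable name.
POLL_INTERVAL="${POLL_INTERVAL:-1}"

# Return as soon as any of the given PIDs no longer exists.
wait_for_any_exit() {
    while :; do
        for pid in "$@"; do
            # kill -0 sends no signal; it only tests that the PID exists.
            if ! kill -0 "$pid" 2>/dev/null; then
                return 0
            fi
        done
        sleep "$POLL_INTERVAL"
    done
}

# Possible usage in an entrypoint:
#   consul-template -config=/tmp/haproxy.ctmpl.cfg -log-level=debug &
#   ct_pid=$!
#   /usr/local/sbin/haproxy-systemd-wrapper -f /usr/local/etc/haproxy/haproxy.cfg -p /run/haproxy.pid &
#   hap_pid=$!
#   wait_for_any_exit "$ct_pid" "$hap_pid"
#   exit 1
```

The trade-off is a detection delay of up to a couple of polling intervals,
in exchange for not depending on how a particular shell handles SIGCHLD.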




$> cat haproxy.ctmpl.cfg
template {
  source = "/tmp/haproxy.ctmpl"
  destination = "/usr/local/etc/haproxy/haproxy.cfg"
  command = "killall -s HUP haproxy-systemd-wrapper || true"
}
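
If a badly rendered template is a concern, the reload command could first
run haproxy's configuration check mode (-c) and only send the HUP when the
check passes. The following is a sketch under assumptions, not what I ran:
safe_reload is a made-up name, and wiring it in would mean pointing the
template's command at a script containing it.

```shell
#!/bin/sh
# Hypothetical wrapper around the reload command (illustrative only).
# $1 is the rendered haproxy configuration file to validate.
safe_reload() {
    cfg=$1
    # "haproxy -c -f FILE" parses the configuration and exits without starting.
    if haproxy -c -f "$cfg" >/dev/null 2>&1; then
        # Same signal as the template command above; "|| true" ignores the
        # case where no wrapper process is running yet.
        killall -s HUP haproxy-systemd-wrapper || true
        return 0
    fi
    echo "config check failed for $cfg; not reloading" >&2
    return 1
}

# Possible usage:
#   safe_reload /usr/local/etc/haproxy/haproxy.cfg
```

Unlike the original command, this returns non-zero when the check fails, so
append `|| true` to the call if the caller should never see a failure.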



-Bryan




>
> Your test case is reloading haproxy about 10 times per second. There are
> several reports on this list about that being an issue. AFAIK, the issue
> isn't with the proxy code itself, but just with the way that the list of
> running processes is collected and signaled.
>
> Anyway, for your case, adding a -wait=5s option to consul-template makes
> the problem go away for me.
>
> Maybe you can get away with reloading every 1 second, but multiple times
> per second is not likely to work reliably.
>
>
> -Bryan
>
>
> On Fri, Feb 5, 2016 at 6:11 PM, Maciej Katafiasz <mkatafi...@purestorage.com> wrote:
>
>> I will try with explicit -reap, but we have actually investigated that
>> as one possible cause, and consul-template reports automatically
>> engaging reap mode as it sees itself as PID 1, and then reaps the
>> processes that actually exit correctly. The problem is that old
>> haproxy processes continue to run and process requests when they're
>> not supposed to, which is different to dead, but unreaped processes
>> that you'd see with faulty PID 1. As I said, most of the "aha!"
>> moments we've had with this problem turned out to be spurious
>> correlations and the issue ultimately came back. It's just that
>> sometimes, for no articulable reason, it works fine, and then the next
>> time it doesn't.
>>
>> Cheers,
>>
>> On 5 February 2016 at 16:59, Cyril Bonté <cyril.bo...@free.fr> wrote:
>> > Hi,
>> >
>> >
>> > Le 06/02/2016 01:03, Maciej Katafiasz a écrit :
>> >>
>> >> On 5 February 2016 at 16:02, Maciej Katafiasz
>> >> <mkatafi...@purestorage.com> wrote:
>> >>>
>> >>> Link to the tarball:
>> >>> https://purestorage.app.box.com/s/nnzqueais46plzd9xfisnmkeab7j9s0y
>> >>>
>> >>> I will be sending it as an attachment in a separate mail as a followup
>> >>> to this one, in case the mailing list software scrubs attachments
>> >>> and/or considers them spam.
>> >>
>> >>
>> >> And here's the tarball as an attachment
>> >
>> >
>> > This looks to concern consul-template inside a docker container more
>> > than haproxy itself. But this may explain some similar reports made by
>> > other users recently.
>> >
>> > Use the -reap option with consul-template, and haproxy will reload
>> > correctly.
>> >
>> > As a quick example:
>> > $ docker run --name=haproxy -d --net=host haproxy-bugtest consul-template -config=/tmp/haproxy.ctmpl.cfg -log-level=debug -reap
>> >
>> > $ docker exec -ti haproxy sh
>> > # ps aux
>> > ...
>> > 16 root       0:00 haproxy -f /etc/haproxy/haproxy.cfg -d -p /var/run/haproxy.pid -sf
>> > ...
>> >
>> > # haproxy -f /etc/haproxy/haproxy.cfg -d -p /var/run/haproxy.pid -sf 16 &> /tmp/debug.log&
>> >
>> > # ps aux
>> > ...
>> >    27 root       0:00 haproxy -f /etc/haproxy/haproxy.cfg -d -p /var/run/haproxy.pid -sf 16
>> > ...
>> > => No more PID 16
>> >
>> >
>> >
>> >
>> > --
>> > Cyril Bonté
>>
>>
>
