Re: mworker: seamless reloads broken since 1.8.1
Hello,

I am reviving this old thread since we are hitting the issue again with:

# haproxy -vv
HA-Proxy version 1.8.4-1deb90d 2018/02/08

I am trying to see whether I can reproduce it easily.

Best,
--
William
Re: mworker: seamless reloads broken since 1.8.1
Hi Pierre,

On Wed, Jan 24, 2018 at 03:07:54PM +0100, Pierre Cheynier wrote:
> Willy, I confirm that it works well again running the following version:
>
> $ haproxy -v
> HA-Proxy version 1.8.3-945f4cf 2018/01/23
>
> Added nbthread again, reloads are transparent.

Excellent, many thanks for confirming!

Willy
Re: mworker: seamless reloads broken since 1.8.1
On 23/01/2018 19:29, Willy Tarreau wrote:
> Pierre, please give a try to the latest 1.8 branch or the next nightly
> snapshot tomorrow morning. It addresses the aforementioned issue, and
> I hope it's the same you're facing.
>
> Cheers,
> Willy

Willy, I confirm that it works well again running the following version:

$ haproxy -v
HA-Proxy version 1.8.3-945f4cf 2018/01/23

Added nbthread again, reloads are transparent.

Thanks,
Pierre
Re: mworker: seamless reloads broken since 1.8.1
On Tue, Jan 23, 2018 at 06:43:51PM +0100, Willy Tarreau wrote:
> I'm switching to this now after having dealt with the polling fixes,
> I'll try to have something testable this evening or tomorrow.

Pierre, please give a try to the latest 1.8 branch or the next nightly
snapshot tomorrow morning. It addresses the aforementioned issue, and
I hope it's the same one you're facing.

Cheers,
Willy
Re: mworker: seamless reloads broken since 1.8.1
Hi Pierre,

On Wed, Jan 17, 2018 at 05:03:18PM +0100, Pierre Cheynier wrote:
> Hi,
>
> On 08/01/2018 14:32, Pierre Cheynier wrote:
> > I retried this morning, I confirm that on 1.8.3, using
> (...)
> > I get RSTs (not seamless reloads) when I introduce the global/nbthread
> > X, after a systemctl haproxy restart.
>
> Any news on that ?
>
> I saw one mworker commit ("execvp failure depending on argv[0]") but I
> guess it's completely independent.

In another thread with Marc Fournier, we've identified a real issue with
the way threads start the listeners and close the mworker pipe. It causes
all sorts of random behaviours, like closing just-created listeners. That
could very possibly match what you're seeing.

I'm switching to this now after having dealt with the polling fixes,
I'll try to have something testable this evening or tomorrow.

Cheers,
Willy
Re: mworker: seamless reloads broken since 1.8.1
Hi,

On 08/01/2018 14:32, Pierre Cheynier wrote:
> I retried this morning, I confirm that on 1.8.3, using
(...)
> I get RSTs (not seamless reloads) when I introduce the global/nbthread
> X, after a systemctl haproxy restart.

Any news on that?

I saw one mworker commit ("execvp failure depending on argv[0]") but I
guess it's completely independent.

Thanks,
Pierre
Re: mworker: seamless reloads broken since 1.8.1
Hi,

On 08/01/2018 10:24, Lukas Tribus wrote:
>
> FYI there is a report on discourse mentioning this problem, and the
> poster appears to be able to reproduce the problem without the nbthread
> parameter as well:
>
> https://discourse.haproxy.org/t/seamless-reloads-dont-work-with-systemd/1954
>
>
> Lukas

I retried this morning, I confirm that on 1.8.3, using

$ haproxy -vv
HA-Proxy version 1.8.3-205f675 2017/12/30
Copyright 2000-2017 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -DTCP_USER_TIMEOUT=18
  OPTIONS = USE_LINUX_TPROXY=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_SYSTEMD=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_TFO=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

I get RSTs (not seamless reloads) when I introduce the global/nbthread
X, after a systemctl haproxy restart.

Pierre
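Since the thread keeps coming back to whether the binary was built with USE_SYSTEMD=1, here is a minimal sketch of checking a build option from the `haproxy -vv` output shown above. The helper name `has_build_option` is made up for illustration, and it assumes the options appear on a single `OPTIONS =` line as in the 1.8.3 output quoted in this message.

```shell
#!/bin/sh
# has_build_option NAME
# Reads `haproxy -vv` output on stdin and succeeds if NAME=1 is listed on
# the "OPTIONS =" line. Hypothetical helper, not part of HAProxy.
has_build_option() {
    # Keep only the OPTIONS line, then look for the exact token NAME=1.
    grep -E '^[[:space:]]*OPTIONS[[:space:]]*=' | grep -qwF -- "$1=1"
}

# Example (needs an installed haproxy binary):
#   haproxy -vv | has_build_option USE_SYSTEMD && echo "built with systemd support"
```

The function only inspects text, so it can also be run against a saved copy of the output from another machine.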
Re: mworker: seamless reloads broken since 1.8.1
Hello,

On Fri, Jan 5, 2018 at 4:44 PM, William Lallemand wrote:
> I'm able to reproduce, looks like it happens with the nbthread parameter only,
> I'll try to find the problem in the code.

FYI there is a report on discourse mentioning this problem, and the
poster appears to be able to reproduce the problem without the nbthread
parameter as well:

https://discourse.haproxy.org/t/seamless-reloads-dont-work-with-systemd/1954

Lukas
Re: mworker: seamless reloads broken since 1.8.1
On 05/01/2018 16:44, William Lallemand wrote:
> I'm able to reproduce, looks like it happens with the nbthread parameter only,

Exactly, I observe the same. At least I have a workaround for now to
perform the upgrade.

> I'll try to find the problem in the code.

Thanks!

Pierre
Re: mworker: seamless reloads broken since 1.8.1
On Fri, Jan 05, 2018 at 03:52:22PM +0100, Pierre Cheynier wrote:
> OK so now that I've applied all of Lukas recos (I kept the -x added ) :
>
> * I don't see any ALERT log anymore.. Only the WARNs
>

I'm still seeing a few of them in journalctl. Maybe you don't see those
emitted by the workers, there is still room for improvement there. I'm
taking notes.

> Jan 05 14:47:12 hostname systemd[1]: Reloaded HAProxy Load Balancer.
> Jan 05 14:47:12 hostname haproxy[59888]: [WARNING] 004/144712 (59888) : Former worker 61331 exited with code 0
> Jan 05 14:47:25 hostname haproxy[59888]: [WARNING] 004/144712 (59888) : Reexecuting Master process
> Jan 05 14:47:26 hostname systemd[1]: Reloaded HAProxy Load Balancer.
> Jan 05 14:47:26 hostname haproxy[59888]: [WARNING] 004/144726 (59888) : Former worker 61355 exited with code 0
>
> * I still observe the same issue (here doing an ab during a
> rolling/upgrade of my test app => consequently triggering N reloads on
> HAProxy as long as the app instances are created/destroyed).
>
> $ ab -n10 http://test-app.tld/
> (..)
> Benchmarking test-app.tld (be patient)
> apr_socket_recv: Connection reset by peer (104)
> Total of 3031 requests completed
>

I'm able to reproduce, looks like it happens with the nbthread parameter only,
I'll try to find the problem in the code.

--
William Lallemand
Re: mworker: seamless reloads broken since 1.8.1
>> Hi,
>>
>>> Your systemd configuration is not up to date.
>>>
>>> Please:
>>> - make sure haproxy is compiled with USE_SYSTEMD=1
>>> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
>>> - update the unit file: use Type=notify instead of Type=forking
>> In fact that should work with this configuration too.
> OK, I have to admit that we started experiments on 1.8-dev2, at that
> time I had to do that to make it work.
> And true, we build the RPM and so didn't notice there were some updates
> after the 1.8.0 release for the systemd unit file provided in contrib/.
> Currently recompiling, bumping the release on CI / dev environment etc...
>>
>>> We always ship an up-to-date unit file in
>>> contrib/systemd/haproxy.service.in (just make sure you maintain the
>>> $OPTIONS variable, otherwise you are missing the -x call for the
>>> seamless reload).
>> You don't need the -x with -W or -Ws, it's added automatically by the master
>> during a reload.
> Interesting. Is this new ? Because I noticed it was not the case at some
> point.
>>> Run "systemctl daemon-reload" after updating the unit file and
>>> completely stop the old service (don't reload after updating the unit
>>> file), to make sure you have a "clean" situation.
>>>
>>> I don't see how this systemd thing would affect the actual seamless
>>> reload (systemd shouldn't be a requirement), but let's fix it
>>> nonetheless before continuing the troubleshooting. Maybe the
>>> regression only affects non-systemd mode.
>> Shouldn't be a problem, but it's better to use -Ws with systemd.
>>
>> During a reload, if the -x fails, you should have this kind of errors:
>>
>> [WARNING] 004/135908 (12013) : Failed to connect to the old process socket
>> '/tmp/sock4'
>> [ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!
>>
>> Are you seeing anything like this?
> Yes, in > 1.8.0. If I rollback to 1.8.0 it's fine on this aspect.
>
> I'll give updates after applying Lukas recommendations.
>
> Pierre
>

OK so now that I've applied all of Lukas' recommendations (I kept the
added -x):

* I don't see any ALERT log anymore, only the WARNs:

Jan 05 14:47:12 hostname systemd[1]: Reloaded HAProxy Load Balancer.
Jan 05 14:47:12 hostname haproxy[59888]: [WARNING] 004/144712 (59888) : Former worker 61331 exited with code 0
Jan 05 14:47:25 hostname haproxy[59888]: [WARNING] 004/144712 (59888) : Reexecuting Master process
Jan 05 14:47:26 hostname systemd[1]: Reloaded HAProxy Load Balancer.
Jan 05 14:47:26 hostname haproxy[59888]: [WARNING] 004/144726 (59888) : Former worker 61355 exited with code 0

* I still observe the same issue (here running ab during a rolling
upgrade of my test app, which triggers N reloads on HAProxy as the
app instances are created/destroyed).

$ ab -n10 http://test-app.tld/
(..)
Benchmarking test-app.tld (be patient)
apr_socket_recv: Connection reset by peer (104)
Total of 3031 requests completed

Pierre
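For anyone trying to reproduce this without a rolling app upgrade, the test above (ab running while HAProxy is reloaded repeatedly) could be sketched roughly as follows. This is only an assumption about how to drive it: the helper names `reload_loop` and `has_resets` are invented here, the unit name `haproxy` is assumed, and `$REQUESTS` is a placeholder to choose for the ab request count.

```shell
#!/bin/sh
# reload_loop COUNT DELAY CMD...
# Runs CMD COUNT times, sleeping DELAY seconds after each run, and stops
# at the first failing run. Hypothetical helper for illustration.
reload_loop() {
    count=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" || return 1
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$i reloads issued"
}

# has_resets
# Succeeds if the ab output on stdin reports a reset connection, as in
# the "apr_socket_recv: Connection reset by peer (104)" line above.
has_resets() {
    grep -q 'Connection reset by peer'
}

# Example (as root, with haproxy running under systemd):
#   reload_loop 20 1 systemctl reload haproxy &
#   ab -n "$REQUESTS" http://test-app.tld/ 2>&1 | has_resets \
#       && echo "reloads are not seamless"
```

ab reports the socket error on stderr, hence the `2>&1` in the example.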
Re: mworker: seamless reloads broken since 1.8.1
> Hi,
>
>>> $ cat /usr/lib/systemd/system/haproxy.service
>>> [Unit]
>>> Description=HAProxy Load Balancer
>>> After=syslog.target network.target
>>>
>>> [Service]
>>> EnvironmentFile=/etc/sysconfig/haproxy
>>> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
>>> ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
>>> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
>>> ExecReload=/bin/kill -USR2 $MAINPID
>>> Type=forking
>>> KillMode=mixed
>>> Restart=always
>> Your systemd configuration is not up to date.
>>
>> Please:
>> - make sure haproxy is compiled with USE_SYSTEMD=1
>> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
>> - update the unit file: use Type=notify instead of Type=forking
> In fact that should work with this configuration too.

OK, I have to admit that we started experiments on 1.8-dev2, at that
time I had to do that to make it work.
And true, we build the RPM and so didn't notice there were some updates
after the 1.8.0 release for the systemd unit file provided in contrib/.
Currently recompiling, bumping the release on CI / dev environment etc...

>
>> We always ship an up-to-date unit file in
>> contrib/systemd/haproxy.service.in (just make sure you maintain the
>> $OPTIONS variable, otherwise you are missing the -x call for the
>> seamless reload).
> You don't need the -x with -W or -Ws, it's added automatically by the master
> during a reload.

Interesting. Is this new? Because I noticed it was not the case at some
point.

>> Run "systemctl daemon-reload" after updating the unit file and
>> completely stop the old service (don't reload after updating the unit
>> file), to make sure you have a "clean" situation.
>>
>> I don't see how this systemd thing would affect the actual seamless
>> reload (systemd shouldn't be a requirement), but let's fix it
>> nonetheless before continuing the troubleshooting. Maybe the
>> regression only affects non-systemd mode.
> Shouldn't be a problem, but it's better to use -Ws with systemd.
>
> During a reload, if the -x fails, you should have this kind of errors:
>
> [WARNING] 004/135908 (12013) : Failed to connect to the old process socket
> '/tmp/sock4'
> [ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!
>
> Are you seeing anything like this?

Yes, in > 1.8.0. If I rollback to 1.8.0 it's fine on this aspect.

I'll give updates after applying Lukas' recommendations.

Pierre
Re: mworker: seamless reloads broken since 1.8.1
Hi,

> > $ cat /usr/lib/systemd/system/haproxy.service
> > [Unit]
> > Description=HAProxy Load Balancer
> > After=syslog.target network.target
> >
> > [Service]
> > EnvironmentFile=/etc/sysconfig/haproxy
> > ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> > ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
> > ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> > ExecReload=/bin/kill -USR2 $MAINPID
> > Type=forking
> > KillMode=mixed
> > Restart=always
>
> Your systemd configuration is not up to date.
>
> Please:
> - make sure haproxy is compiled with USE_SYSTEMD=1
> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
> - update the unit file: use Type=notify instead of Type=forking

In fact that should work with this configuration too.

> We always ship an up-to-date unit file in
> contrib/systemd/haproxy.service.in (just make sure you maintain the
> $OPTIONS variable, otherwise you are missing the -x call for the
> seamless reload).

You don't need the -x with -W or -Ws, it's added automatically by the master
during a reload.

> Run "systemctl daemon-reload" after updating the unit file and
> completely stop the old service (don't reload after updating the unit
> file), to make sure you have a "clean" situation.
>
> I don't see how this systemd thing would affect the actual seamless
> reload (systemd shouldn't be a requirement), but let's fix it
> nonetheless before continuing the troubleshooting. Maybe the
> regression only affects non-systemd mode.

Shouldn't be a problem, but it's better to use -Ws with systemd.

During a reload, if the -x fails, you should see this kind of error:

[WARNING] 004/135908 (12013) : Failed to connect to the old process socket '/tmp/sock4'
[ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!

Are you seeing anything like this?

--
William Lallemand
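To answer the question above quickly on a given host, the two messages can be searched for in the logs. A minimal sketch, assuming the service logs to the journal under a unit named `haproxy`; the function name `find_fd_transfer_errors` is invented for this example.

```shell
#!/bin/sh
# find_fd_transfer_errors
# Reads log text on stdin and prints the lines carrying either of the two
# messages quoted above. Exits non-zero when none is found.
# Hypothetical helper, not part of HAProxy.
find_fd_transfer_errors() {
    grep -E 'Failed to connect to the old process socket|Failed to get the sockets from the old process'
}

# Example (unit name "haproxy" is an assumption):
#   journalctl -u haproxy --since "10 minutes ago" | find_fd_transfer_errors
```

An empty result only means the socket transfer did not report a failure; as the rest of the thread shows, RSTs can still occur without these messages.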
Re: mworker: seamless reloads broken since 1.8.1
Hello Pierre,

On Fri, Jan 5, 2018 at 11:48 AM, Pierre Cheynier wrote:
> Hi list,
>
> We've recently tried to upgrade from 1.8.0 to 1.8.1, then 1.8.2, 1.8.3
> on a preprod environment and noticed that the reload is not so seamless
> since 1.8.1 (easily getting TCP RSTs while reloading).
>
> Having a short look on the haproxy-1.8 git remote on the changes
> affecting haproxy.c, c2b28144 can be eliminated, so 3 commits remains:
>
> * 3ce53f66 MINOR: threads: Fix pthread_setaffinity_np on FreeBSD. (5
> weeks ago)
> * f926969a BUG/MINOR: mworker: detach from tty when in daemon mode (5
> weeks ago)
> * 4e612023 BUG/MINOR: mworker: fix validity check for the pipe FDs (5
> weeks ago)
>
> In case it matters: we use threads and did the usual worker setup (which
> again works very well in 1.8.0).

Ok, so the change in behavior is between 1.8.0 and 1.8.1.

> $ cat /usr/lib/systemd/system/haproxy.service
> [Unit]
> Description=HAProxy Load Balancer
> After=syslog.target network.target
>
> [Service]
> EnvironmentFile=/etc/sysconfig/haproxy
> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecReload=/bin/kill -USR2 $MAINPID
> Type=forking
> KillMode=mixed
> Restart=always

Your systemd configuration is not up to date.

Please:
- make sure haproxy is compiled with USE_SYSTEMD=1
- update the unit file: start haproxy with -Ws instead of -W (ExecStart)
- update the unit file: use Type=notify instead of Type=forking

We always ship an up-to-date unit file in
contrib/systemd/haproxy.service.in (just make sure you maintain the
$OPTIONS variable, otherwise you are missing the -x call for the
seamless reload).

Run "systemctl daemon-reload" after updating the unit file and
completely stop the old service (don't reload after updating the unit
file), to make sure you have a "clean" situation.

I don't see how this systemd thing would affect the actual seamless
reload (systemd shouldn't be a requirement), but let's fix it
nonetheless before continuing the troubleshooting. Maybe the
regression only affects non-systemd mode.

Regards,
Lukas
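The unit-file changes listed above can be sketched as follows. This is not the file shipped in contrib/systemd/; it is simply the unit quoted in this message with -W replaced by -Ws and Type=forking replaced by Type=notify. The function names `render_unit` and `apply_unit` are invented for illustration, and `apply_unit` assumes the unit path from the quoted `cat` command and a service named `haproxy`.

```shell
#!/bin/sh
# render_unit
# Prints the unit from this thread with the two suggested changes applied:
# ExecStart uses -Ws, and Type is notify. Everything else is unchanged.
render_unit() {
    cat <<'EOF'
[Unit]
Description=HAProxy Load Balancer
After=syslog.target network.target

[Service]
EnvironmentFile=/etc/sysconfig/haproxy
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $OPTIONS
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
Type=notify
KillMode=mixed
Restart=always
EOF
}

# apply_unit
# Follows the order described above: update the file, daemon-reload, then
# fully stop and start the service instead of reloading it.
# Must run as root; it is defined here but deliberately not called.
apply_unit() {
    render_unit > /usr/lib/systemd/system/haproxy.service &&
    systemctl daemon-reload &&
    systemctl stop haproxy &&
    systemctl start haproxy
}
```

On a packaged install it may be preferable to put the modified unit under /etc/systemd/system/ so a package update does not overwrite it; that choice is outside what the thread discusses.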
mworker: seamless reloads broken since 1.8.1
Hi list,

We've recently tried to upgrade from 1.8.0 to 1.8.1, then 1.8.2, 1.8.3
on a preprod environment and noticed that the reload is not so seamless
since 1.8.1 (easily getting TCP RSTs while reloading).

Having a short look at the haproxy-1.8 git remote for the changes
affecting haproxy.c, c2b28144 can be eliminated, so 3 commits remain:

* 3ce53f66 MINOR: threads: Fix pthread_setaffinity_np on FreeBSD. (5 weeks ago)
* f926969a BUG/MINOR: mworker: detach from tty when in daemon mode (5 weeks ago)
* 4e612023 BUG/MINOR: mworker: fix validity check for the pipe FDs (5 weeks ago)

In case it matters: we use threads and did the usual worker setup (which
again works very well in 1.8.0).

Here is a config extract:

$ cat /etc/haproxy/haproxy.cfg
(...)
user haproxy
group haproxy
nbproc 1
daemon
stats socket /var/lib/haproxy/stats level admin mode 644 expose-fd listeners
stats timeout 2m
nbthread 11
(...)

$ cat /etc/sysconfig/haproxy
(...)
CONFIG="/etc/haproxy/haproxy.cfg"
PIDFILE="/run/haproxy.pid"
OPTIONS="-x /var/lib/haproxy/stats"
(...)

$ cat /usr/lib/systemd/system/haproxy.service
[Unit]
Description=HAProxy Load Balancer
After=syslog.target network.target

[Service]
EnvironmentFile=/etc/sysconfig/haproxy
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
Type=forking
KillMode=mixed
Restart=always

Does the observed behavior sound consistent with the changes that
occurred between 1.8.0 and 1.8.1? Before trying to bisect, compile,
test etc. I'd like to get your feedback.

Thanks in advance,
Pierre