Re: mworker: seamless reloads broken since 1.8.1

2018-02-20 Thread William Dauchy
Hello,

I am reviving this old thread since we are seeing the issue again with:
# haproxy -vv
HA-Proxy version 1.8.4-1deb90d 2018/02/08

I am trying to see whether I can reproduce it easily.

Best,
-- 
William



Re: mworker: seamless reloads broken since 1.8.1

2018-01-24 Thread Willy Tarreau
Hi Pierre,

On Wed, Jan 24, 2018 at 03:07:54PM +0100, Pierre Cheynier wrote:
> Willy, I confirm that it works well again running the following version:
> 
> $ haproxy -v
> HA-Proxy version 1.8.3-945f4cf 2018/01/23
> 
> Added nbthread again, reloads are transparent.

Excellent, many thanks for confirming!

Willy



Re: mworker: seamless reloads broken since 1.8.1

2018-01-24 Thread Pierre Cheynier
On 23/01/2018 19:29, Willy Tarreau wrote:
> Pierre, please give a try to the latest 1.8 branch or the next nightly
> snapshot tomorrow morning. It addresses the aforementioned issue, and
> I hope it's the same one you're facing.
>
> Cheers,
> Willy
Willy, I confirm that it works well again running the following version:

$ haproxy -v
HA-Proxy version 1.8.3-945f4cf 2018/01/23

Added nbthread again, reloads are transparent.

Thanks,

Pierre





Re: mworker: seamless reloads broken since 1.8.1

2018-01-23 Thread Willy Tarreau
On Tue, Jan 23, 2018 at 06:43:51PM +0100, Willy Tarreau wrote:
> I'm switching to this now after having dealt with the polling fixes,
> I'll try to have something testable this evening or tomorrow.

Pierre, please give a try to the latest 1.8 branch or the next nightly
snapshot tomorrow morning. It addresses the aforementioned issue, and
I hope it's the same one you're facing.

Cheers,
Willy



Re: mworker: seamless reloads broken since 1.8.1

2018-01-23 Thread Willy Tarreau
Hi Pierre,

On Wed, Jan 17, 2018 at 05:03:18PM +0100, Pierre Cheynier wrote:
> Hi,
> 
> On 08/01/2018 14:32, Pierre Cheynier wrote:
> > I retried this morning, I confirm that on 1.8.3, using
> (...)
> > I get RSTs (not seamless reloads) when I introduce the global/nbthread
> > X, after a systemctl haproxy restart.
> 
> Any news on that ?
> 
> I saw one mworker commit ("execvp failure depending on argv[0]") but I
> guess it's completely independent.

In another thread with Marc Fournier, we've identified a real issue with
the way threads start the listeners and close the mworker pipe. It causes
all sorts of random behaviours, like closing just-created listeners. That
could very possibly match what you're seeing.

I'm switching to this now after having dealt with the polling fixes,
I'll try to have something testable this evening or tomorrow.

Cheers,
Willy



Re: mworker: seamless reloads broken since 1.8.1

2018-01-17 Thread Pierre Cheynier
Hi,

On 08/01/2018 14:32, Pierre Cheynier wrote:
> I retried this morning, I confirm that on 1.8.3, using
(...)
> I get RSTs (not seamless reloads) when I introduce the global/nbthread
> X, after a systemctl haproxy restart.

Any news on that ?

I saw one mworker commit ("execvp failure depending on argv[0]") but I
guess it's completely independent.

Thanks,

Pierre





Re: mworker: seamless reloads broken since 1.8.1

2018-01-08 Thread Pierre Cheynier
Hi,

On 08/01/2018 10:24, Lukas Tribus wrote:
>
> FYI there is a report on discourse mentioning this problem, and the
> poster appears to be able to reproduce the problem without nbthread
> parameter as well:
>
> https://discourse.haproxy.org/t/seamless-reloads-dont-work-with-systemd/1954
>
>
> Lukas
I retried this morning, I confirm that on 1.8.3, using

$ haproxy -vv
HA-Proxy version 1.8.3-205f675 2017/12/30
Copyright 2000-2017 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -DTCP_USER_TIMEOUT=18
  OPTIONS = USE_LINUX_TPROXY=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_SYSTEMD=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_TFO=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

I get RSTs (not seamless reloads) when I introduce the global/nbthread
X, after a systemctl haproxy restart.
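
Since the RSTs appear here once nbthread is introduced, it can help to
confirm what a given configuration actually declares before testing. A
minimal sketch: the function name is hypothetical, and it only looks for
lines whose first word is "nbthread", without checking that they sit in
the global section.

```shell
# nbthread_value FILE: print the value of each "nbthread" directive in FILE.
# Prints nothing when the directive is absent.
# Hypothetical helper, not part of HAProxy.
nbthread_value() {
    awk '$1 == "nbthread" { print $2 }' "$1"
}
```

For example, "nbthread_value /etc/haproxy/haproxy.cfg" (the path is an
assumption) prints the configured thread count, if any.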

Pierre




Re: mworker: seamless reloads broken since 1.8.1

2018-01-08 Thread Lukas Tribus
Hello,


On Fri, Jan 5, 2018 at 4:44 PM, William Lallemand wrote:
> I'm able to reproduce, looks like it happens with the nbthread parameter only,
> I'll try to find the problem in the code.

FYI there is a report on discourse mentioning this problem, and the
poster appears to be able to reproduce the problem without nbthread
parameter as well:

https://discourse.haproxy.org/t/seamless-reloads-dont-work-with-systemd/1954


Lukas



Re: mworker: seamless reloads broken since 1.8.1

2018-01-05 Thread Pierre Cheynier
On 05/01/2018 16:44, William Lallemand wrote:
> I'm able to reproduce, looks like it happens with the nbthread parameter only,
Exactly, I observe the same.
At least I have a workaround for now to perform the upgrade.
> I'll try to find the problem in the code.
>
Thanks !

Pierre




Re: mworker: seamless reloads broken since 1.8.1

2018-01-05 Thread William Lallemand
On Fri, Jan 05, 2018 at 03:52:22PM +0100, Pierre Cheynier wrote:
> OK, so now that I've applied all of Lukas' recommendations (I kept the added -x):
> 
> * I don't see any ALERT logs anymore, only the WARNINGs
> 

I'm still seeing a few of them in journalctl. Maybe you don't see those emitted
by the workers, there is still room for improvement there. I'm taking notes.

> Jan 05 14:47:12 hostname systemd[1]: Reloaded HAProxy Load Balancer.
> Jan 05 14:47:12 hostname haproxy[59888]: [WARNING] 004/144712 (59888) :
> Former worker 61331 exited with code 0
> Jan 05 14:47:25 hostname haproxy[59888]: [WARNING] 004/144712 (59888) :
> Reexecuting Master process
> Jan 05 14:47:26 hostname systemd[1]: Reloaded HAProxy Load Balancer.
> Jan 05 14:47:26 hostname haproxy[59888]: [WARNING] 004/144726 (59888) :
> Former worker 61355 exited with code 0
> 
> * I still observe the same issue (here doing an ab during a
> rolling/upgrade of my test app => consequently triggering N reloads on
> HAProxy as long as the app instances are created/destroyed).
> 
> $ ab -n10  http://test-app.tld/
> (..)
> Benchmarking test-app.tld (be patient)
> apr_socket_recv: Connection reset by peer (104)
> Total of 3031 requests completed
> 

I'm able to reproduce, looks like it happens with the nbthread parameter only,
I'll try to find the problem in the code.

-- 
William Lallemand



Re: mworker: seamless reloads broken since 1.8.1

2018-01-05 Thread Pierre Cheynier

>> Hi,
>>
>>> Your systemd configuration is not up to date.
>>>
>>> Please:
>>> - make sure haproxy is compiled with USE_SYSTEMD=1
>>> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
>>> - update the unit file: use Type=notify instead of Type=forking
>> In fact that should work with this configuration too.
> OK, I have to admit that we started experiments on 1.8-dev2, at that
> time I had to do that to make it work.
> And true, we build the RPM and so didn't notice there were some updates
> after the 1.8.0 release for the systemd unit file provided in contrib/.
> Currently recompiling, bumping the release on CI / dev environment etc...
>>  
>>> We always ship an up-to-date unit file in
>>> contrib/systemd/haproxy.service.in (just make sure you maintain the
>>> $OPTIONS variable, otherwise you are missing the -x call for the
>>> seamless reload).
>> You don't need the -x with -W or -Ws, it's added automatically by the master
>> during a reload. 
> Interesting. Is this new ? Because I noticed it was not the case at some
> point.
>>> Run "systemctl daemon-reload" after updating the unit file and
>>> completely stop the old service (don't reload after updating the unit
>>> file), to make sure you have a "clean" situation.
>>>
>>> I don't see how this systemd thing would affect the actual seamless
>>> reload (systemd shouldn't be a requirement), but let's fix it
>>> nonetheless before continuing the troubleshooting. Maybe the
>>> regression only affects non-systemd mode.
>> Shouldn't be a problem, but it's better to use -Ws with systemd.
>>
>> During a reload, if the -x fails, you should see this kind of error:
>>
>> [WARNING] 004/135908 (12013) : Failed to connect to the old process socket 
>> '/tmp/sock4'
>> [ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!
>>
>> Are you seeing anything like this?
> Yes, in > 1.8.0. If I rollback to 1.8.0 it's fine on this aspect.
>
> I'll give updates after applying Lukas' recommendations.
>
> Pierre
>
OK, so now that I've applied all of Lukas' recommendations (I kept the added -x):

* I don't see any ALERT logs anymore, only the WARNINGs

Jan 05 14:47:12 hostname systemd[1]: Reloaded HAProxy Load Balancer.
Jan 05 14:47:12 hostname haproxy[59888]: [WARNING] 004/144712 (59888) :
Former worker 61331 exited with code 0
Jan 05 14:47:25 hostname haproxy[59888]: [WARNING] 004/144712 (59888) :
Reexecuting Master process
Jan 05 14:47:26 hostname systemd[1]: Reloaded HAProxy Load Balancer.
Jan 05 14:47:26 hostname haproxy[59888]: [WARNING] 004/144726 (59888) :
Former worker 61355 exited with code 0

* I still observe the same issue (here doing an ab during a
rolling/upgrade of my test app => consequently triggering N reloads on
HAProxy as long as the app instances are created/destroyed).

$ ab -n10  http://test-app.tld/
(..)
Benchmarking test-app.tld (be patient)
apr_socket_recv: Connection reset by peer (104)
Total of 3031 requests completed
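
The reproduction above can be scripted so that reloads are triggered at a
fixed pace while the benchmark runs. A minimal sketch: reload_loop and the
RELOAD_CMD variable are hypothetical names, and the default command
assumes the systemd unit is called "haproxy".

```shell
# reload_loop COUNT [INTERVAL]: run the reload command COUNT times,
# sleeping INTERVAL seconds (default 1) after each run.
# RELOAD_CMD is a hypothetical override; when unset, the loop assumes a
# systemd unit named "haproxy". Stops early if a reload command fails.
reload_loop() {
    count=$1
    interval=${2:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Left unquoted on purpose so the command and its arguments split.
        ${RELOAD_CMD:-systemctl reload haproxy} || return 1
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Start it in the background ("reload_loop 20 1 &"), run the same ab command
against the frontend, then "wait". Any "Connection reset by peer" in the ab
output points to a reload that was not seamless.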

Pierre






Re: mworker: seamless reloads broken since 1.8.1

2018-01-05 Thread Pierre Cheynier

> Hi,
>
>>> $ cat /usr/lib/systemd/system/haproxy.service
>>> [Unit]
>>> Description=HAProxy Load Balancer
>>> After=syslog.target network.target
>>>
>>> [Service]
>>> EnvironmentFile=/etc/sysconfig/haproxy
>>> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
>>> ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
>>> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
>>> ExecReload=/bin/kill -USR2 $MAINPID
>>> Type=forking
>>> KillMode=mixed
>>> Restart=always
>> Your systemd configuration is not up to date.
>>
>> Please:
>> - make sure haproxy is compiled with USE_SYSTEMD=1
>> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
>> - update the unit file: use Type=notify instead of Type=forking
> In fact that should work with this configuration too.
OK, I have to admit that we started experiments on 1.8-dev2, at that
time I had to do that to make it work.
And true, we build the RPM and so didn't notice there were some updates
after the 1.8.0 release for the systemd unit file provided in contrib/.
Currently recompiling, bumping the release on CI / dev environment etc...
>  
>> We always ship an up-to-date unit file in
>> contrib/systemd/haproxy.service.in (just make sure you maintain the
>> $OPTIONS variable, otherwise you are missing the -x call for the
>> seamless reload).
> You don't need the -x with -W or -Ws, it's added automatically by the master
> during a reload. 
Interesting. Is this new ? Because I noticed it was not the case at some
point.
>> Run "systemctl daemon-reload" after updating the unit file and
>> completely stop the old service (don't reload after updating the unit
>> file), to make sure you have a "clean" situation.
>>
>> I don't see how this systemd thing would affect the actual seamless
>> reload (systemd shouldn't be a requirement), but let's fix it
>> nonetheless before continuing the troubleshooting. Maybe the
>> regression only affects non-systemd mode.
> Shouldn't be a problem, but it's better to use -Ws with systemd.
>
> During a reload, if the -x fails, you should see this kind of error:
>
> [WARNING] 004/135908 (12013) : Failed to connect to the old process socket 
> '/tmp/sock4'
> [ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!
>
> Are you seeing anything like this?
Yes, in > 1.8.0. If I rollback to 1.8.0 it's fine on this aspect.

I'll give updates after applying Lukas' recommendations.

Pierre






Re: mworker: seamless reloads broken since 1.8.1

2018-01-05 Thread William Lallemand
Hi,

> > $ cat /usr/lib/systemd/system/haproxy.service
> > [Unit]
> > Description=HAProxy Load Balancer
> > After=syslog.target network.target
> >
> > [Service]
> > EnvironmentFile=/etc/sysconfig/haproxy
> > ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> > ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
> > ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> > ExecReload=/bin/kill -USR2 $MAINPID
> > Type=forking
> > KillMode=mixed
> > Restart=always
> 
> Your systemd configuration is not up to date.
> 
> Please:
> - make sure haproxy is compiled with USE_SYSTEMD=1
> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
> - update the unit file: use Type=notify instead of Type=forking

In fact that should work with this configuration too.
 
> We always ship an up-to-date unit file in
> contrib/systemd/haproxy.service.in (just make sure you maintain the
> $OPTIONS variable, otherwise you are missing the -x call for the
> seamless reload).

You don't need the -x with -W or -Ws, it's added automatically by the master
during a reload. 

> Run "systemctl daemon-reload" after updating the unit file and
> completely stop the old service (don't reload after updating the unit
> file), to make sure you have a "clean" situation.
> 
> I don't see how this systemd thing would affect the actual seamless
> reload (systemd shouldn't be a requirement), but let's fix it
> nonetheless before continuing the troubleshooting. Maybe the
> regression only affects non-systemd mode.

Shouldn't be a problem, but it's better to use -Ws with systemd.

During a reload, if the -x fails, you should see this kind of error:

[WARNING] 004/135908 (12013) : Failed to connect to the old process socket 
'/tmp/sock4'
[ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!

Are you seeing anything like this?
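
One way to check is to filter the logs for those two messages. A minimal
sketch, assuming the messages keep the wording quoted above; the function
name is hypothetical.

```shell
# haproxy_x_failures: read log lines on stdin and print only those that
# indicate a failed socket transfer (-x) from the old process.
# The two patterns are copied from the messages quoted above.
haproxy_x_failures() {
    grep -E 'Failed to connect to the old process socket|Failed to get the sockets from the old process'
}
```

With the service logging to the systemd journal under a unit named
"haproxy" (an assumption), this could be fed with
"journalctl -u haproxy | haproxy_x_failures". No output means neither
message was logged.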

-- 
William Lallemand



Re: mworker: seamless reloads broken since 1.8.1

2018-01-05 Thread Lukas Tribus
Hello Pierre,


On Fri, Jan 5, 2018 at 11:48 AM, Pierre Cheynier  wrote:
> Hi list,
>
> We've recently tried to upgrade from 1.8.0 to 1.8.1, then 1.8.2, 1.8.3
> on a preprod environment and noticed that the reload is not so seamless
> since 1.8.1 (easily getting TCP RSTs while reloading).
>
> Having had a short look at the haproxy-1.8 git remote for the changes
> affecting haproxy.c, c2b28144 can be eliminated, so 3 commits remain:
>
> * 3ce53f66 MINOR: threads: Fix pthread_setaffinity_np on FreeBSD.  (5
> weeks ago)
> * f926969a BUG/MINOR: mworker: detach from tty when in daemon mode  (5
> weeks ago)
> * 4e612023 BUG/MINOR: mworker: fix validity check for the pipe FDs  (5
> weeks ago)
>
> In case it matters: we use threads and did the usual worker setup (which
> again works very well in 1.8.0).

Ok, so the change in behavior is between 1.8.0 and 1.8.1.



> $ cat /usr/lib/systemd/system/haproxy.service
> [Unit]
> Description=HAProxy Load Balancer
> After=syslog.target network.target
>
> [Service]
> EnvironmentFile=/etc/sysconfig/haproxy
> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecReload=/bin/kill -USR2 $MAINPID
> Type=forking
> KillMode=mixed
> Restart=always

Your systemd configuration is not up to date.

Please:
- make sure haproxy is compiled with USE_SYSTEMD=1
- update the unit file: start haproxy with -Ws instead of -W (ExecStart)
- update the unit file: use Type=notify instead of Type=forking

We always ship an up-to-date unit file in
contrib/systemd/haproxy.service.in (just make sure you maintain the
$OPTIONS variable, otherwise you are missing the -x call for the
seamless reload).
Run "systemctl daemon-reload" after updating the unit file and
completely stop the old service (don't reload after updating the unit
file), to make sure you have a "clean" situation.

I don't see how this systemd thing would affect the actual seamless
reload (systemd shouldn't be a requirement), but let's fix it
nonetheless before continuing the troubleshooting. Maybe the
regression only affects non-systemd mode.
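
The two unit-file changes listed above can be applied mechanically. This
is a minimal sketch, not the shipped contrib/systemd/haproxy.service.in:
the function name update_unit is hypothetical, and it assumes the
ExecStart line has the shape shown in the quoted unit file ("haproxy -W "
followed by further arguments).

```shell
# update_unit FILE: print FILE with the two requested changes applied:
#   1. ExecStart: -W -> -Ws (master-worker mode with systemd notify support)
#   2. Type=forking -> Type=notify
# Hypothetical helper; it writes to stdout and leaves FILE untouched.
update_unit() {
    sed -e '/^ExecStart=/s/ -W / -Ws /' \
        -e 's/^Type=forking$/Type=notify/' "$1"
}
```

Review the output before installing it as the unit file, then run
"systemctl daemon-reload" and fully stop and start the service as
described above.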



Regards,
Lukas