Re: [RFC PATCH] BUG/MINOR: h2: use valid stream id in GOAWAY

2017-11-21 Thread Willy Tarreau
Hi Lukas,

On Wed, Nov 22, 2017 at 01:43:32AM +0100, Lukas Tribus wrote:
> Since af1e4f5167 ("MEDIUM: h2: perform a graceful shutdown on "Connection:
> close"") we send GOAWAY with last stream identifier set to 2^31-1, as per
> the RFC suggestion [1]. However that is only part of what the RFC suggests
> if we want to close the connection gracefully; after at least 1 RTT we
> would have to send another GOAWAY with a proper and valid last stream ID.
> Just the "2^31-1" GOAWAY is not enough.
> 
> In fact this confuses Chrome and leads to a hung connection that clears
> only by "timeout client" or "timeout server" (whatever strikes first).

This is annoying. I'm still wondering whether it's a good thing to try
to close the connection when we see a "Connection: close". I've been wanting
to ensure we can close a connection so that all users having "http-request deny"
and similar rules can continue to use them to fight attacks, but a graceful
shutdown doesn't appear 100% efficient either :-/

Also I'd like to change all the error messages to support keep-alive (at
least in the announce) so that we don't close on 401 etc. But once we
stop closing on any such codes, we can wonder whether it still makes
sense to close when "connection: close" appears in the response :-/

Maybe we should have a "close" action on the request to explicitly close
a connection, regardless of headers. We already have such an action on
tcp-response to close the server connection. It can make sense.

> When all 3 connection slots to the HTTP2 server are filled, Chrome is
> unable to communicate with the server at all.

Then at least we're not the only ones to have a bug in this chain :-)

> Trivial fix (to be discussed) is to not set the last stream id to 2^31-1.

We must really not do this, that's what we used to do before the patch
you mentionned above. It invites the browser to *safely* retry all
streams with a higher ID than the one mentionned there, but in practice
some clients are not able to retry so they report errors as their pending
requests are simply lost.
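As an aside, the two-step sequence RFC 7540 section 6.8 describes can be modeled in isolation like this (a standalone illustration, not haproxy code; all names are invented for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Step 1 of a graceful shutdown: advertise last-stream-id 2^31-1,
 * meaning "no stream is refused yet", so in-flight requests are safe. */
uint32_t first_goaway_last_sid(void)
{
	return (1U << 31) - 1;
}

/* Step 2, at least one RTT later: the definitive GOAWAY carries the
 * highest stream ID actually processed. */
uint32_t final_goaway_last_sid(uint32_t max_id_processed)
{
	return max_id_processed;
}

/* Streams above the final last-stream-id were never processed, so the
 * client may safely retry them on another connection. */
int safe_to_retry(uint32_t stream_id, uint32_t last_sid)
{
	return stream_id > last_sid;
}
```

Sending only step 1 leaves the client without a definitive answer, which matches the hang described above.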

I suspect we could act differently : when the last stream has been closed,
we currently re-arm a timer on the H2 connection to close it if it's idle.
What we could do would be to check if we've already sent a graceful shutdown
and in this case arm a much shorter timer (eg: 1s) so that the final GOAWAY
is sent and the connection closed. Could you please try the attached patch,
which tries to do this ?

With that said, there is an issue with Chrome if it refrains from using
these connections and refrains from creating a new one as well, because
this means that it can be limited to a request rate of only 3 per RTT

Thanks,
Willy
diff --git a/src/mux_h2.c b/src/mux_h2.c
index eb8dd0e..4d71c95 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -2134,7 +2134,7 @@ static int h2_wake(struct connection *conn)
 
if (h2c->task) {
		if (eb_is_empty(&h2c->streams_by_id)) {
-			h2c->task->expire = tick_add(now_ms, h2c->timeout);
+			h2c->task->expire = tick_add(now_ms, h2c->last_sid < 0 ? h2c->timeout : MS_TO_TICKS(1000));
task_queue(h2c->task);
}
else
@@ -2166,6 +2166,7 @@ static struct task *h2_timeout_task(struct task *t)
}
 
/* try to send but no need to insist */
+   h2c->last_sid = h2c->max_id;
if (h2c_send_goaway_error(h2c, NULL) <= 0)
h2c->flags |= H2_CF_GOAWAY_FAILED;
 
@@ -2298,7 +2299,7 @@ static void h2_detach(struct conn_stream *cs)
}
else if (h2c->task) {
		if (eb_is_empty(&h2c->streams_by_id)) {
-			h2c->task->expire = tick_add(now_ms, h2c->timeout);
+			h2c->task->expire = tick_add(now_ms, h2c->last_sid < 0 ? h2c->timeout : MS_TO_TICKS(1000));
task_queue(h2c->task);
}
else


[RFC PATCH] BUG/MINOR: h2: use valid stream id in GOAWAY

2017-11-21 Thread Lukas Tribus
Since af1e4f5167 ("MEDIUM: h2: perform a graceful shutdown on "Connection:
close"") we send GOAWAY with last stream identifier set to 2^31-1, as per
the RFC suggestion [1]. However that is only part of what the RFC suggests
if we want to close the connection gracefully; after at least 1 RTT we
would have to send another GOAWAY with a proper and valid last stream ID.
Just the "2^31-1" GOAWAY is not enough.

In fact this confuses Chrome and leads to a hung connection that clears
only by "timeout client" or "timeout server" (whatever strikes first).

When all 3 connection slots to the HTTP2 server are filled, Chrome is
unable to communicate with the server at all.

Trivial fix (to be discussed) is to not set the last stream id to 2^31-1.

[1] https://tools.ietf.org/html/rfc7540#section-6.8
---

RFC patch: needs discussion and careful review; I just got my feet wet
with HTTP2 ...

Reproducer:
- use a frontend with no backend (503 error from haproxy)
- raise "timeout server" and "timeout client" to 30 seconds
- in Chrome refresh the error site 4-5 times
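A configuration matching that reproducer could look roughly like this (certificate path and names are illustrative):

```
defaults
    mode http
    timeout client 30s
    timeout server 30s
    timeout connect 5s

frontend h2_repro
    bind :443 ssl crt /etc/haproxy/site.pem alpn h2,http/1.1
    # no default_backend on purpose: every request gets haproxy's 503
```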

This is not limited to people refreshing error sites 5 times; I am hitting
this issue often when using HTTP Basic Auth ("401 Unauthorized" triggers a
close).

For another day, we may verify whether all those errors really need to
close the connection.

---
 src/mux_h2.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/mux_h2.c b/src/mux_h2.c
index eb8dd0e..87e7401 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -2737,8 +2737,6 @@ static int h2s_frt_make_resp_headers(struct h2s *h2s, struct buffer *buf)
		if (isteq(list[hdr].n, ist("connection"))) {
			if (!(h2c->flags & (H2_CF_GOAWAY_SENT|H2_CF_GOAWAY_FAILED)) &&
			    word_match(list[hdr].v.ptr, list[hdr].v.len, "close", 5)) {
-				if (h2c->last_sid < 0)
-					h2c->last_sid = (1U << 31) - 1;
if (h2c_send_goaway_error(h2c, h2s) <= 0) {
ret = 0;
goto end;
-- 
2.7.4



haproxy 1.8-rc4: cache and HTTP/1.1 response for HTTP/1.0 request

2017-11-21 Thread Cyril Bonté

Hi Willy and William,

I ran some tests with the cache filter.

In http_action_store_cache(), the code indicates that only HTTP/1.1 is 
cached. This explains why I failed on my first tests with apachebench :)
The protocol version is checked on the request side. Can't we rely on 
the response side instead ?
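The suggestion amounts to something of this shape (a self-contained model of the check, not the actual cache.c code; the flag value and struct layout are invented for the example):

```c
#include <assert.h>

#define HTTP_MSGF_VER_11 0x01  /* illustrative: "message is HTTP/1.1" */

struct http_msg { unsigned flags; };
struct http_txn { struct http_msg req, rsp; };

/* Current behaviour: cacheability keyed off the request version. */
int cacheable_by_request(const struct http_txn *txn)
{
	return (txn->req.flags & HTTP_MSGF_VER_11) != 0;
}

/* Suggested behaviour: key off the response version instead, so an
 * HTTP/1.0 request (e.g. from apachebench) can still populate and use
 * the cache when the server answers with HTTP/1.1. */
int cacheable_by_response(const struct http_txn *txn)
{
	return (txn->rsp.flags & HTTP_MSGF_VER_11) != 0;
}
```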


Btw, here are some first results for those who are interested :

* WITHOUT cache

- ab -n100000 -c100 http://localhost/image.gif
Time taken for tests:   59.127 seconds
Requests per second:1691.27 [#/sec] (mean)
Time per request:   59.127 [ms] (mean)
Time per request:   0.591 [ms] (mean, across all concurrent requests)
Transfer rate:  1344.43 [Kbytes/sec] received

- h2load -n100000 -c100 https://localhost/image.gif
finished in 60.06s, 1664.91 req/s, 1.11MB/s

* WITH cache :

- ab -n100000 -c100 http://localhost/image.gif
Same results as before, but once patched to rely on the response, we get 
more interesting results :

Time taken for tests:   1.801 seconds
Requests per second:55539.79 [#/sec] (mean)
Time per request:   1.801 [ms] (mean)
Time per request:   0.018 [ms] (mean, across all concurrent requests)
Transfer rate:  44149.79 [Kbytes/sec] received

- h2load -n100000 -c100 https://localhost/image.gif
finished in 1.49s, 67210.04 req/s, 44.80MB/s

for some details :
- image.gif = 510 bytes
- haproxy runs locally (the backend is in the same network) with this 
configuration :

global
nbthread 4
tune.ssl.default-dh-param 2048
log /dev/log local7 info err
stats socket /tmp/proxy.socket level admin

defaults
timeout client 300s
timeout server 300s
timeout connect 5s
timeout http-keep-alive 5s

listen proxy
mode http

bind :80
bind :443 ssl crt localhost.pem alpn h2,http/1.1

option httplog

http-response cache-store proxy
http-request cache-use proxy

server real backend:80

cache proxy
total-max-size 4


--
Cyril Bonté



Re: [PATCH v2] BUG/MINOR: systemd: ignore daemon mode

2017-11-21 Thread Willy Tarreau
On Tue, Nov 21, 2017 at 12:39:34PM +0100, Lukas Tribus wrote:
> Since we switched to notify mode in the systemd unit file in commit
> d6942c8, haproxy won't start if the daemon keyword is present in the
> configuration.
> 
> This change makes sure that haproxy remains in foreground when using
> systemd mode and adds a note in the documentation.

Now applied, thank you!

Willy



Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon parent staying alive/process-owner

2017-11-21 Thread PiBa-NL

Hi William,

I was intending to use the new feature to pass open sockets to the next 
haproxy process, and thought that master-worker is a 'requirement' to make 
that work, as it would manage the transfer of the sockets.

Now I'm thinking that's not actually how it's working at all..
I could 'manually' pass the  -x /haproxy.socket to the next process and 
make it take over the sockets that way, I guess? (How does this combine 
with nbproc>1 and multiple stats sockets bound to separate processes?)


Though I can imagine a future where the master would maybe provide some 
aggregated stats and a management socket to perform server status changes.

Perhaps I should step away from using master-worker for the moment.

However the -W -D combination doesn't seem to (fully) work as I 
expected; responses below..


Op 21-11-2017 om 2:59 schreef William Lallemand:

the master-worker was designed in a way to
replace the systemd-wrapper, and the systemd way to run a daemon is to keep it
in the foreground and pipe it to systemd so it can catch the errors on the
standard output.

However, it was also designed for normal people who want to daemonize,
so you can combine -W with -D, which will daemonize the master.



I'm not sure I get the issue there; the errors are still displayed upon
startup like in any other haproxy mode, there is really no change here.
I assume your only problem with your script is the daemonization, which you can
achieve by combining -W and -D.

I would prefer to both 'catch' startup errors and daemonize haproxy.
In my previous mail I'm starting it with -D, and the -W is equivalent of 
the global master-worker option in the config, so it 'should' daemonize, 
right?
But it did not (properly?); I've just tried with both startup parameters 
-D -W and the result is the same.
The master with pid 3061 is running under the system /sbin/init pid 1, 
however the pid 2926 also keeps running; I would want/expect 2926 to exit 
when startup is complete.


I just also noted that the 2926 actually becomes a 'zombie'?.. that can't 
be good, right?





A kill -1 itself won't tell if a newly configured bind cannot find the
interface address to bind to? And a -c beforehand won't find such a problem.

Upon a reload (SIGUSR2 on the master) the master will try to parse the
configuration again and start the listeners. If it fails, the master will
reexec itself in a wait() mode, and won't kill the previous workers, the
parsing/bind error should be displayed on the standard output of the master.
I think I saw it exit but cannot reproduce it anymore with the scenario 
of a wrong ip in the bind.. I might have issued a wrong signal there 
when I tried (a USR1 instead of a USR2 or something).

It seems to work properly the way you describe.. (when properly daemonized..)
Sorry for the noise on this part..

The end result that nothing is running and the error causing that
however should be 'caught' somehow for logging..? Should haproxy itself
log it to syslog? But how will the startup script know to notify the
user of a failure?

Well, the master doesn't do syslog, because there might be no syslog in your
configuration. I think you should try the systemd way and log the standard
output.
I don't want to use systemd, but I do want to log standard output, at 
least during initial startup..

Would it be possible, when starting haproxy with -sf, that it would tell
if the (original?) master was successful in reloading the config /
starting new workers, or how should this be done?

That may be badly documented but you are not supposed to use -sf with the 
master worker,
you just have to send the -USR2 signal to the master and it will parse again the
configuration, launch new workers and kill smoothly the previous ones.

Unfortunately signals are asynchronous, and we don't have a way yet to return
a bad exit code upon reload. But we might implement a synchronous
configuration notification in the future, using the admin socket for example.
Being able to signal the master to reload over an admin socket and 
getting 'feedback' about its results would likely also solve my 'reload 
feedback' problem.

Let's consider that a feature request :).
Though maybe I shouldn't be using master-worker at all for the moment..

Currently a whole new set of master-worker processes seems to take over..

Well, I supposed that's because you launched a new master-worker with -sf, it's
not supposed to be used that way but it should work too if you don't mind
having a new PID.
I kinda expected this to indeed be 'as intended'; -sf will fully replace 
the old processes.


Thanks for your reply.

Regards,
PiBa-NL / Pieter



Re: [PATCH v2] BUG/MINOR: systemd: ignore daemon mode

2017-11-21 Thread Tim Düsterhus
Lukas,

Am 21.11.2017 um 12:39 schrieb Lukas Tribus:
> This change makes sure that haproxy remains in foreground when using
> systemd mode and adds a note in the documentation.
> ---
>  doc/configuration.txt | 3 ++-
>  src/haproxy.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> *snip*

I agree that this is a good solution to the problem. I tested your patch
and can confirm it is working fine.

Best regards
Tim Düsterhus



Re: 4xx statistics made useless through health checks?

2017-11-21 Thread Georg Faerber
Hi,

On 17-11-21 14:20:39, Daniel Schneller wrote:
> > On 21. Nov. 2017, at 14:08, Lukas Tribus  wrote:
> > [...]
> > Instead of hiding specific errors counters, why not send an actual
> > HTTP request that triggers a 200 OK response? So health checking is
> > not exempt from the statistics and only generates error statistics
> > when actual errors occur?
> 
> Good point. I wanted to avoid, however, having these “high level”
> health checks from the many many sidecars being routed through to the
> actual backends.  Instead, I considered it enough to “only” check if
> the central haproxy is available. In case it is, the sidecars rely on
> it doing the actual health checks of the backends and responding with
> 503 or similar, when all backends for a particular request happen to
> be down.
> 
> However, your idea and a little more Googling led me to this Github
> repo
> https://github.com/jvehent/haproxy-aws#healthchecks-between-elb-and-haproxy
> where they configure a dedicated “health check frontend” (albeit in
> their case to work around an AWS/ELB limitation re/ PROXY protocol). I
> think I will adapt this and configure the sidecars to health check on
> a dedicated port like this.

monitor-uri [1] might help as well with this task.
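A dedicated health-check listener using monitor-uri could be sketched like this (port, names and the nbsrv condition are illustrative):

```
frontend health
    bind :8081
    mode http
    monitor-uri /health
    # optional: report failure when the real backend has no servers left
    acl app_down nbsrv(be_app) lt 1
    monitor fail if app_down
```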

Personally, I'm using Nginx behind HAProxy (on the same machine), a
dedicated domain routed from HAProxy to Nginx, and just one location in
Nginx which returns '200 OK' to accomplish this. Of course, HAProxy could
work and Nginx could fail at the same time, but I've never encountered
this, especially with such a limited, static Nginx config.
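The Nginx side of such a setup can be as small as this (illustrative; port and path are made up):

```
server {
    listen 127.0.0.1:8080;
    # single static endpoint; no upstream involved
    location = /health {
        return 200 "OK\n";
    }
}
```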

Cheers,
Georg


[1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#monitor-uri




Re: 4xx statistics made useless through health checks?

2017-11-21 Thread Daniel Schneller
Hi Pieter, 

>> Good point. I wanted to avoid, however, having these “high level” health 
>> checks from the many many sidecars being routed through to the actual 
>> backends.
>> Instead, I considered it enough to “only” check if the central haproxy is 
>> available. In case it is, the sidecars rely on it doing the actual health 
>> checks of the backends and responding with 503 or similar, when all backends 
>> for a particular request happen to be down.
> Maybe monitor-uri perhaps together with 'monitor fail' could help ?: 
> http://cbonte.github.io/haproxy-dconv/1.8/snapshot/configuration.html#4.2-monitor-uri
> It says it won't log or forward the request.. not sure but maybe stats will 
> also skip it.

Yes, that’s exactly what’s shown in that linked repo. Thanks for chiming in :)

> Regards,
> PiBa-NL / Pieter
> 

-- 
Daniel Schneller
Principal Cloud Engineer
 
CenterDevice GmbH  | Hochstraße 11
   | 42697 Solingen
tel: +49 1754155711| Deutschland
daniel.schnel...@centerdevice.de   | www.centerdevice.de

Geschäftsführung: Dr. Patrick Peschlow, Dr. Lukas Pustina,
Michael Rosbach, Handelsregister-Nr.: HRB 18655,
HR-Gericht: Bonn, USt-IdNr.: DE-815299431


Re: 4xx statistics made useless through health checks?

2017-11-21 Thread PiBa-NL

Hi Daniel,

Op 21-11-2017 om 14:20 schreef Daniel Schneller:

On 21. Nov. 2017, at 14:08, Lukas Tribus  wrote:
[...]
Instead of hiding specific errors counters, why not send an actual
HTTP request that triggers a 200 OK response? So health checking is
not exempt from the statistics and only generates error statistics
when actual errors occur?

Good point. I wanted to avoid, however, having these “high level” health checks 
from the many many sidecars being routed through to the actual backends.
Instead, I considered it enough to “only” check if the central haproxy is 
available. In case it is, the sidecars rely on it doing the actual health 
checks of the backends and responding with 503 or similar, when all backends 
for a particular request happen to be down.
Maybe monitor-uri perhaps together with 'monitor fail' could help ?: 
http://cbonte.github.io/haproxy-dconv/1.8/snapshot/configuration.html#4.2-monitor-uri
It says it won't log or forward the request.. not sure but maybe stats 
will also skip it.


However, your idea and a little more Googling led me to this Github repo 
https://github.com/jvehent/haproxy-aws#healthchecks-between-elb-and-haproxy 
where they configure a dedicated “health check frontend” (albeit in their case 
to work around an AWS/ELB limitation re/ PROXY protocol). I think I will adapt 
this and configure the sidecars to health check on a dedicated port like this.

I’ll let you know how it goes.

Thanks a lot for your thoughts, so far :)

Daniel


Regards,
PiBa-NL / Pieter




Re: 4xx statistics made useless through health checks?

2017-11-21 Thread Daniel Schneller
> On 21. Nov. 2017, at 14:08, Lukas Tribus  wrote:
> [...]
> Instead of hiding specific errors counters, why not send an actual
> HTTP request that triggers a 200 OK response? So health checking is
> not exempt from the statistics and only generates error statistics
> when actual errors occur?

Good point. I wanted to avoid, however, having these “high level” health checks 
from the many many sidecars being routed through to the actual backends.
Instead, I considered it enough to “only” check if the central haproxy is 
available. In case it is, the sidecars rely on it doing the actual health 
checks of the backends and responding with 503 or similar, when all backends 
for a particular request happen to be down.

However, your idea and a little more Googling led me to this Github repo 
https://github.com/jvehent/haproxy-aws#healthchecks-between-elb-and-haproxy 
where they configure a dedicated “health check frontend” (albeit in their case 
to work around an AWS/ELB limitation re/ PROXY protocol). I think I will adapt 
this and configure the sidecars to health check on a dedicated port like this.

I’ll let you know how it goes.

Thanks a lot for your thoughts, so far :)

Daniel

-- 
Daniel Schneller
Principal Cloud Engineer
 
CenterDevice GmbH  | Hochstraße 11
   | 42697 Solingen
tel: +49 1754155711| Deutschland
daniel.schnel...@centerdevice.de   | www.centerdevice.de

Geschäftsführung: Dr. Patrick Peschlow, Dr. Lukas Pustina,
Michael Rosbach, Handelsregister-Nr.: HRB 18655,
HR-Gericht: Bonn, USt-IdNr.: DE-815299431



Re: 4xx statistics made useless through health checks?

2017-11-21 Thread Lukas Tribus
Hello,


2017-11-21 13:54 GMT+01:00 Daniel Schneller :
> However, I still wonder if there is a good way to discern these from
> “actual” bad requests in the stats, so that we can rely on the error counters
> to show “real” problems.
>
> Some kind of “haproxy-to-haproxy” health checking that does not spoil the
> counters?

Instead of hiding specific errors counters, why not send an actual
HTTP request that triggers a 200 OK response? So health checking is
not exempt from the statistics and only generates error statistics
when actual errors occur?


Lukas



Re: 4xx statistics made useless through health checks?

2017-11-21 Thread Daniel Schneller
Hi Lukas,

thanks — was just about to reply to myself, but you beat me to it ;)

> Yes, we have "option dontlognull" for that:
> http://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4-option%20dontlognull

We can get rid of the logging via dontlognull, of course. I was about to mention 
that I had configured that in the meantime to get rid of the log spam.

However, I still wonder if there is a good way to discern these from 
“actual” bad requests in the stats, so that we can rely on the error counters to 
show “real” problems.

Some kind of “haproxy-to-haproxy” health checking that does not spoil the 
counters?

Daniel


Daniel Schneller
Principal Cloud Engineer
 
CenterDevice GmbH  | Hochstraße 11
   | 42697 Solingen
tel: +49 1754155711| Deutschland
daniel.schnel...@centerdevice.de   | www.centerdevice.de

Geschäftsführung: Dr. Patrick Peschlow, Dr. Lukas Pustina,
Michael Rosbach, Handelsregister-Nr.: HRB 18655,
HR-Gericht: Bonn, USt-IdNr.: DE-815299431



Re: 4xx statistics made useless through health checks?

2017-11-21 Thread Lukas Tribus
Hallo Daniel,


2017-11-21 10:08 GMT+01:00 Daniel Schneller :
> However, I see lots of 4xx errors counted on the central LBs. I have tracked
> those down to being caused by the health checks of all the sidecars,
> checking in every few seconds to see if their backends are healthy.
>
> The log shows this:
>
> Nov 21 09:58:08 loadbalancer-01 haproxy-internal[4053]: 10.205.100.53:54831
> [21/Nov/2017:09:58:08.277] api_internal~ api_internal/ -1/-1/-1/-1/0
> 400 0 - - CR-- 43/1/0/0/3 0/0 {} ""
> [...]
> Is there anything I can do to “make them both happy”?
> Any suggestions would be much appreciated.

Yes, we have "option dontlognull" for that:
http://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4-option%20dontlognull



Regards,
Lukas



incorrect search for a server's crt in ca-base

2017-11-21 Thread Alexander Lebedev
Hi!

In 1.7.9, with crt-base and ca-base set in the global section, haproxy tries
to look up a server's crt in ca-base. On bind lines all works as expected. If I
specify the full path for the crt on the server line, it's ok.

# cat /tmp/haproxy.cfg
global
  ca-base  /etc/haproxy/ca-base
  crt-base  /etc/haproxy/crt-base

defaults
  mode  http
  timeout  connect 5s
  timeout  client 120s
  timeout  server 120s

frontend cs
  bind *:4000 ssl crt test_combo.pem ca-file cabundle.pem verify required
  default_backend cs

backend cs
  server s1 1.2.3.4:4000 ssl crt test_combo.pem ca-file cabundle.pem verify
required

# haproxy -c -f /tmp/haproxy.cfg
[ALERT] 324/193009 (30768) : config : backend 'cs', server 's1': unable to
load SSL private key from PEM file '/etc/haproxy/ca-base/test_combo.pem'.
[ALERT] 324/193009 (30768) : Fatal errors found in configuration.
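Until the lookup is fixed, the workaround described above (spelling out the full path so crt-base/ca-base resolution is bypassed) would look like:

```
backend cs
  server s1 1.2.3.4:4000 ssl crt /etc/haproxy/crt-base/test_combo.pem ca-file cabundle.pem verify required
```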

# haproxy -vv
HA-Proxy version 1.7.9 2017/08/18
Copyright 2000-2017 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
-fwrapv
  OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1
USE_LUA=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[COMP] compression
[TRACE] trace
[SPOE] spoe


[PATCH v2] BUG/MINOR: systemd: ignore daemon mode

2017-11-21 Thread Lukas Tribus
Since we switched to notify mode in the systemd unit file in commit
d6942c8, haproxy won't start if the daemon keyword is present in the
configuration.

This change makes sure that haproxy remains in foreground when using
systemd mode and adds a note in the documentation.
---
 doc/configuration.txt | 3 ++-
 src/haproxy.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index bb255be..cc397e1 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -680,7 +680,8 @@ crt-base 
 daemon
   Makes the process fork into background. This is the recommended mode of
   operation. It is equivalent to the command line "-D" argument. It can be
-  disabled by the command line "-db" argument.
+  disabled by the command line "-db" argument. This option is ignored in
+  systemd mode.
 
 deviceatlas-json-file 
   Sets the path of the DeviceAtlas JSON data file to be loaded by the API.
diff --git a/src/haproxy.c b/src/haproxy.c
index b39a95f..4596dbd 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -1363,7 +1363,7 @@ static void init(int argc, char **argv)
else if (*flag == 'D')
arg_mode |= MODE_DAEMON;
else if (*flag == 'W' && flag[1] == 's') {
-   arg_mode |= MODE_MWORKER;
+   arg_mode |= MODE_MWORKER | MODE_FOREGROUND;
 #if defined(USE_SYSTEMD)
global.tune.options |= GTUNE_USE_SYSTEMD;
 #else
-- 
2.7.4
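The effect of the one-line change can be modeled in isolation like this (flag values are invented for the example; only the combination logic is the point):

```c
#include <assert.h>

/* Illustrative mode flags (values made up for the example). */
#define MODE_DAEMON      0x01
#define MODE_MWORKER     0x02
#define MODE_FOREGROUND  0x04

/* -Ws now implies foreground master-worker mode. */
unsigned apply_ws(unsigned arg_mode)
{
	return arg_mode | MODE_MWORKER | MODE_FOREGROUND;
}

/* The process only backgrounds when "daemon"/-D is set and nothing
 * forces foreground operation. */
int will_background(unsigned mode)
{
	return (mode & MODE_DAEMON) && !(mode & MODE_FOREGROUND);
}
```

With this, a "daemon" keyword left in the configuration can no longer background the master under -Ws.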



Re: [PATCH] BUG/MINOR: systemd: ignore daemon mode

2017-11-21 Thread Willy Tarreau
On Tue, Nov 21, 2017 at 11:45:29AM +0100, Lukas Tribus wrote:
> Since we switched to notify mode in the systemd unit file in commit
> d6942c8, haproxy won't start if the daemon keyword is present in the
> configuration.
> 
> Update the unit file with -db to disable background mode in all
> circumstances and add a note in the documentation.

Don't you want to try with this instead (sorry for the bold copy-paste) ?
It's also more in line with your doc update.

diff --git a/src/haproxy.c b/src/haproxy.c
index b39a95f..4596dbd 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -1363,7 +1363,7 @@ static void init(int argc, char **argv)
else if (*flag == 'D')
arg_mode |= MODE_DAEMON;
else if (*flag == 'W' && flag[1] == 's') {
-   arg_mode |= MODE_MWORKER;
+   arg_mode |= MODE_MWORKER | MODE_FOREGROUND;
 #if defined(USE_SYSTEMD)
global.tune.options |= GTUNE_USE_SYSTEMD;
 #else

Thanks,
Willy



[PATCH] BUG/MINOR: systemd: ignore daemon mode

2017-11-21 Thread Lukas Tribus
Since we switched to notify mode in the systemd unit file in commit
d6942c8, haproxy won't start if the daemon keyword is present in the
configuration.

Update the unit file with -db to disable background mode in all
circumstances and add a note in the documentation.
---
 contrib/systemd/haproxy.service.in | 2 +-
 doc/configuration.txt  | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/contrib/systemd/haproxy.service.in 
b/contrib/systemd/haproxy.service.in
index edbd4c2..d870db8 100644
--- a/contrib/systemd/haproxy.service.in
+++ b/contrib/systemd/haproxy.service.in
@@ -7,7 +7,7 @@ After=network.target
 # socket if you want seamless reloads.
 Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
 ExecStartPre=@SBINDIR@/haproxy -f $CONFIG -c -q
-ExecStart=@SBINDIR@/haproxy -Ws -f $CONFIG -p $PIDFILE
+ExecStart=@SBINDIR@/haproxy -Ws -db -f $CONFIG -p $PIDFILE
 ExecReload=@SBINDIR@/haproxy -f $CONFIG -c -q
 ExecReload=/bin/kill -USR2 $MAINPID
 KillMode=mixed
diff --git a/doc/configuration.txt b/doc/configuration.txt
index bb255be..cc397e1 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -680,7 +680,8 @@ crt-base 
 daemon
   Makes the process fork into background. This is the recommended mode of
   operation. It is equivalent to the command line "-D" argument. It can be
-  disabled by the command line "-db" argument.
+  disabled by the command line "-db" argument. This option is ignored in
+  systemd mode.
 
 deviceatlas-json-file 
   Sets the path of the DeviceAtlas JSON data file to be loaded by the API.
-- 
2.7.4



Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support

2017-11-21 Thread Lukas Tribus
Hello,


2017-11-21 11:18 GMT+01:00 Willy Tarreau :
>> That's not it, the hold-off timer is only a consequence of this
>> problem.
>
> OK but if it's really 100ms, it can be a problem for people loading GeoIP
> maps of millions of entries, or large configs (the largest I saw had 30
> backends and 60 servers and took several seconds to process).

The default timeout to start a daemon is 90 seconds if I understand correctly:
https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html#DefaultTimeoutStartSec=

The 100ms "timeout" limits the restart interval itself, so those
defaults should be sane.



Regards,
Lukas



Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support

2017-11-21 Thread William Lallemand
On Tue, Nov 21, 2017 at 11:28:55AM +0100, Willy Tarreau wrote:
> On Tue, Nov 21, 2017 at 11:19:41AM +0100, William Lallemand wrote:
> > I don't like the idea of the "daemon" keyword being ignored,
> 
> We already do that with -D which ignores "debug" for example. The command
> line purposely has precedence over the config.
> 
> > however, we could
> > exit with an error when we try to start with -Ws + -D or daemon.
> 
> Errors only cause gray hair to users, especially when they concern stuff
> that can be automatically recovered from. We could definitely consider
> that -Ws automatically implies MODE_FOREGROUND which will do exactly
> what -db does, but without having to think about it. After all, "-Ws"
> is "master-worker mode with systemd awareness" so it makes sense not
> to background in this case.
> 

That's right, I agree.

-- 
William Lallemand



Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support

2017-11-21 Thread William Lallemand
On Tue, Nov 21, 2017 at 11:12:45AM +0100, Lukas Tribus wrote:
> Hello,
> 
> 2017-11-21 8:39 GMT+01:00 William Lallemand :
> 
> That's not it, the hold-off timer is only a consequence of this
> problem. I believe the notification does not work in my case, which is
> why, from systemd's point of view, haproxy did not start correctly and
> why systemd continues to restart haproxy.
> I found the root of the issue: the "daemon" keyword in the global
> configuration section. We need to remove "daemon" from the
> configuration for systemd mode to work correctly in notify mode.
> 
> We can override the "daemon" keyword in the configuration by passing
> -db to haproxy. Adding it to the unit file fixes the issue even if
> the keyword is in the configuration (so we pass "-Ws -db").
> 
> "-Ws -db" along with a note in the documentation that "daemon" may be
> ignored in systemd mode seems like a good compromise here? I will send
> an RFC patch shortly.
> 

You should never use systemd with the daemon mode, either in notify or default
mode, it's not the recommended setup.

I don't like the idea of the "daemon" keyword being ignored, however, we could
exit with an error when we try to start with -Ws + -D or daemon.

-- 
William Lallemand



Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support

2017-11-21 Thread Willy Tarreau
Hi Lukas,

On Tue, Nov 21, 2017 at 11:12:45AM +0100, Lukas Tribus wrote:
> > I suggest we configure it to a greater value per default, it can be set to
> > infinity too, but I don't like the idea.
> 
> That's not it, the hold-off timer is only a consequence of this
> problem.

OK but if it's really 100ms, it can be a problem for people loading GeoIP
maps of millions of entries, or large configs (the largest I saw had 30
backends and 60 servers and took several seconds to process).

> I believe the notification does not work in my case, which is
> why, from systemd's point of view, haproxy did not start correctly and
> why systemd continues to restart haproxy.

I'm feeling less and less confident in using this mode by default :-/

> I found the root of the issue: the "daemon" keyword in the global
> configuration section. We need to remove "daemon" from the
> configuration for systemd mode to work correctly in notify mode.
> 
> We can override the "daemon" keyword in the configuration by passing
> -db to haproxy. Adding it to the unit file fixes the issue even if
> the keyword is in the configuration (so we pass "-Ws -db").
> 
> "-Ws -db" along with a note in the documentation that "daemon" may be
> ignored in systemd mode seems like a good compromise here?

Yes and it definitely is needed, just like we used to suggest users to
always have -D in init scripts to force backgrounding of the process
when there was no "deamon" statement in the conf.

> I will send an RFC patch shortly.

Thank you!
Willy



Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support

2017-11-21 Thread Lukas Tribus
Hello,



2017-11-21 8:39 GMT+01:00 William Lallemand :
> On Tue, Nov 21, 2017 at 07:16:19AM +0100, Willy Tarreau wrote:
>>
>> I really don't like this. My fears with becoming more systemd-friendly
>> was that we'd make users helpless when something decides not to work
>> just to annoy them, and this reports seems to confirm this feeling :-/
>>
>> Tim, have you seen this message about a "hold-off timer over" being
>> displayed at the same second as the startup message ? Isn't there a
>> timeout setting or something like this to place in the config file ?
>> And is there a way to disable it so that people with huge configs
>> are still allowed to load them ?
>>
>
> There is indeed a timeout value, which is configurable in the unit file, the
> default value is 100ms.
>
> https://www.freedesktop.org/software/systemd/man/systemd.service.html#RestartSec=
>
>
> There is also the start one there:
>
> https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStartSec=
>
> I suggest we configure it to a greater value per default, it can be set to 
> infinity too, but I don't like the idea.

That's not it, the hold-off timer is only a consequence of this
problem. I believe the notification does not work in my case, which is
why, from systemd's point of view, haproxy did not start correctly and
why systemd continues to restart haproxy.
I found the root of the issue: the "daemon" keyword in the global
configuration section. We need to remove "daemon" from the
configuration for systemd mode to work correctly in notify mode.

We can override the "daemon" keyword in the configuration by passing
-db to haproxy. Adding it to the unit file fixes the issue even if
the keyword is in the configuration (so we pass "-Ws -db").

"-Ws -db" along with a note in the documentation that "daemon" may be
ignored in systemd mode seems like a good compromise here? I will send
an RFC patch shortly.



Regards,
Lukas