Re: [RFC PATCH] BUG/MINOR: h2: use valid stream id in GOAWAY
Hi Lukas,

On Wed, Nov 22, 2017 at 01:43:32AM +0100, Lukas Tribus wrote:
> Since af1e4f5167 ("MEDIUM: h2: perform a graceful shutdown on "Connection:
> close"") we send GOAWAY with last stream identifier set to 2^31-1, as per
> the RFC suggestion [1]. However that is only part of what the RFC suggests
> if we want to close the connection gracefully; after at least 1 RTT we
> would have to send another GOAWAY with a proper and valid last stream ID.
> Just the "2^31-1" GOAWAY is not enough.
>
> In fact this confuses Chrome and leads to a hung connection that clears
> only by "timeout client" or "timeout server" (whatever strikes first).

This is annoying. I'm still wondering whether it's a good thing to try to
close the connection when we have a close. I've been wanting to ensure we
can close a connection so that all users having "http-request deny" and
similar rules can continue to use them to fight attacks, but a graceful
shutdown doesn't appear 100% efficient either :-/

Also I'd like to change all the error messages to support keep-alive (at
least in the announce) so that we don't close on 401 etc. But once we stop
closing on any such codes, we can wonder whether it still makes sense to
close when "connection: close" appears in the response :-/ Maybe we should
have a "close" action on the request to explicitly close a connection,
regardless of headers. We already have such an action on tcp-response to
close the server connection. It can make sense.

> When all 3 connection slots to the HTTP2 server are filled, Chrome is
> unable to communicate with the server at all.

Then at least we're not the only ones to have a bug in this chain :-)

> Trivial fix (to be discussed) is to not set the last stream id to 2^31-1.

We must really not do this, that's what we used to do before the patch you
mentioned above. It invites the browser to *safely* retry all streams with
a higher ID than the one mentioned there, but in practice some clients are
not able to retry, so they report errors as their pending requests are
simply lost.

I suspect we could act differently: when the last stream has been closed,
we currently re-arm a timer on the H2 connection to close it if it's idle.
What we could do would be to check whether we've already sent a graceful
shutdown and in this case arm a much shorter timer (eg: 1s) so that the
final GOAWAY is sent and the connection closed. Could you please try the
attached patch, which tries to do this?

With that said, there is an issue with Chrome if it refrains from using
these connections and refrains from creating a new one as well, because
this means that it can be limited to a request rate of only 3 per RTT if
each connection handles a single request.

Thanks,
Willy

diff --git a/src/mux_h2.c b/src/mux_h2.c
index eb8dd0e..4d71c95 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -2134,7 +2134,7 @@ static int h2_wake(struct connection *conn)
 	if (h2c->task) {
 		if (eb_is_empty(&h2c->streams_by_id)) {
-			h2c->task->expire = tick_add(now_ms, h2c->timeout);
+			h2c->task->expire = tick_add(now_ms, h2c->last_sid < 0 ? h2c->timeout : MS_TO_TICKS(1000));
 			task_queue(h2c->task);
 		}
 		else
@@ -2166,6 +2166,7 @@ static struct task *h2_timeout_task(struct task *t)
 	}
 
 	/* try to send but no need to insist */
+	h2c->last_sid = h2c->max_id;
 	if (h2c_send_goaway_error(h2c, NULL) <= 0)
 		h2c->flags |= H2_CF_GOAWAY_FAILED;
 
@@ -2298,7 +2299,7 @@ static void h2_detach(struct conn_stream *cs)
 	}
 	else if (h2c->task) {
 		if (eb_is_empty(&h2c->streams_by_id)) {
-			h2c->task->expire = tick_add(now_ms, h2c->timeout);
+			h2c->task->expire = tick_add(now_ms, h2c->last_sid < 0 ? h2c->timeout : MS_TO_TICKS(1000));
 			task_queue(h2c->task);
 		}
 		else
[RFC PATCH] BUG/MINOR: h2: use valid stream id in GOAWAY
Since af1e4f5167 ("MEDIUM: h2: perform a graceful shutdown on "Connection:
close"") we send GOAWAY with last stream identifier set to 2^31-1, as per
the RFC suggestion [1]. However that is only part of what the RFC suggests
if we want to close the connection gracefully; after at least 1 RTT we
would have to send another GOAWAY with a proper and valid last stream ID.
Just the "2^31-1" GOAWAY is not enough.

In fact this confuses Chrome and leads to a hung connection that clears
only by "timeout client" or "timeout server" (whatever strikes first).
When all 3 connection slots to the HTTP2 server are filled, Chrome is
unable to communicate with the server at all.

Trivial fix (to be discussed) is to not set the last stream id to 2^31-1.

[1] https://tools.ietf.org/html/rfc7540#section-6.8
---
RFC patch: needs discussion and careful review; I just got my feet wet
with HTTP2 ...

Reproducer:
- use a frontend with no backend (503 error from haproxy)
- raise "timeout server" and "timeout client" to 30 seconds
- in Chrome refresh the error site 4-5 times

This is not limited to people refreshing error sites 5 times; I am hitting
this issue often when using HTTP Basic Auth ("401 Unauthorized" triggers a
close). For another day, we may verify whether all those errors really
need to close the connection.
---
 src/mux_h2.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/mux_h2.c b/src/mux_h2.c
index eb8dd0e..87e7401 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -2737,8 +2737,6 @@ static int h2s_frt_make_resp_headers(struct h2s *h2s, struct buffer *buf)
 		if (isteq(list[hdr].n, ist("connection"))) {
 			if (!(h2c->flags & (H2_CF_GOAWAY_SENT|H2_CF_GOAWAY_FAILED)) &&
 			    word_match(list[hdr].v.ptr, list[hdr].v.len, "close", 5)) {
-				if (h2c->last_sid < 0)
-					h2c->last_sid = (1U << 31) - 1;
 				if (h2c_send_goaway_error(h2c, h2s) <= 0) {
 					ret = 0;
 					goto end;
-- 
2.7.4
haproxy 1.8-rc4: cache and HTTP/1.1 response for HTTP/1.0 request
Hi Willy and William,

I ran some tests with the cache filter. In http_action_store_cache(), the
code indicates that only HTTP/1.1 is cached. This explains why I failed on
my first tests with apachebench :) The protocol version is checked on the
request side. Can't we rely on the response side instead?

Btw, here are some first results for those who are interested:

* WITHOUT cache
- ab -n10 -c100 http://localhost/image.gif
  Time taken for tests:   59.127 seconds
  Requests per second:    1691.27 [#/sec] (mean)
  Time per request:       59.127 [ms] (mean)
  Time per request:       0.591 [ms] (mean, across all concurrent requests)
  Transfer rate:          1344.43 [Kbytes/sec] received
- h2load -n10 -c100 https://localhost/image.gif
  finished in 60.06s, 1664.91 req/s, 1.11MB/s

* WITH cache
- ab -n10 -c100 http://localhost/image.gif
  Same results as before, but once patched to rely on the response, we get
  more interesting results:
  Time taken for tests:   1.801 seconds
  Requests per second:    55539.79 [#/sec] (mean)
  Time per request:       1.801 [ms] (mean)
  Time per request:       0.018 [ms] (mean, across all concurrent requests)
  Transfer rate:          44149.79 [Kbytes/sec] received
- h2load -n10 -c100 https://localhost/image.gif
  finished in 1.49s, 67210.04 req/s, 44.80MB/s

For some details:
- image.gif = 510 bytes
- haproxy runs locally (the backend is in the same network) with this
  configuration:

global
    nbthread 4
    tune.ssl.default-dh-param 2048
    log /dev/log local7 info err
    stats socket /tmp/proxy.socket level admin

defaults
    timeout client 300s
    timeout server 300s
    timeout connect 5s
    timeout http-keep-alive 5s

listen proxy
    mode http
    bind :80
    bind :443 ssl crt localhost.pem alpn h2,http/1.1
    option httplog
    http-response cache-store proxy
    http-request cache-use proxy
    server real backend:80

cache proxy
    total-max-size 4

-- 
Cyril Bonté
Re: [PATCH v2] BUG/MINOR: systemd: ignore daemon mode
On Tue, Nov 21, 2017 at 12:39:34PM +0100, Lukas Tribus wrote:
> Since we switched to notify mode in the systemd unit file in commit
> d6942c8, haproxy won't start if the daemon keyword is present in the
> configuration.
>
> This change makes sure that haproxy remains in foreground when using
> systemd mode and adds a note in the documentation.

Now applied, thank you!
Willy
Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon parent staying alive/process-owner
Hi William,

I was intending to use the new feature to pass open sockets to the next
haproxy process, and thought that master-worker is a 'requirement' to make
that work, as it would manage the transfer of sockets. Now I'm thinking
that's not actually how it works at all.. I could 'manually' pass the
-x /haproxy.socket to the next process and make it take over the sockets
that way, I guess? (How does this combine with nbproc>1 and multiple stats
sockets bound to separate processes?) Though I can imagine a future where
the master would maybe provide some aggregated stats and a management
socket to perform server status changes. Perhaps I should step away from
using master-worker for the moment. However the -W -D combination doesn't
seem to (fully) work as I expected, responses below..

Op 21-11-2017 om 2:59 schreef William Lallemand:
> the master-worker was designed in a way to replace the systemd-wrapper,
> and the systemd way to run a daemon is to keep it in the foreground and
> pipe it to systemd so it can catch the errors on the standard output.
> However, it was also designed for normal people who want to daemonize,
> so you can combine -W with -D which will daemonize the master.
>
> I'm not sure of getting the issue there, the errors are still displayed
> upon startup like in any other haproxy mode, there is really no change
> here. I assume your only problem with your script is the daemonize that
> you can achieve by combining -W and -D.

I would prefer to both 'catch' startup errors and daemonize haproxy. In my
previous mail I'm starting it with -D, and the -W is the equivalent of the
global master-worker option in the config, so it 'should' daemonize,
right? But it did not (properly?); I've just tried with both startup
parameters -D -W and the result is the same. The master with pid 3061 is
running under the system /sbin/init pid 1, however the pid 2926 also keeps
running; I would want/expect 2926 to exit when startup is complete. I just
also noted that the 2926 actually becomes a 'zombie'.?. That can't be
good, right?

A kill -1 itself won't tell if a newly configured bind cannot find the
interface address to bind to? And a -c beforehand won't find such a
problem.

> Upon a reload (SIGUSR2 on the master) the master will try to parse the
> configuration again and start the listeners. If it fails, the master
> will reexec itself in a wait() mode, and won't kill the previous
> workers, the parsing/bind error should be displayed on the standard
> output of the master.

I think I saw it exit but cannot reproduce it anymore with the scenario of
a wrong ip in the bind.. I might have issued a wrong signal there when I
tried (a USR1 instead of a USR2 or something..). It seems to work properly
the way you describe (when properly daemonized..). Sorry for the noise on
this part.. The end result that nothing is running, and the error causing
that, should however be 'caught' somehow for logging.?. Should haproxy
itself log it to syslog? But how will the startup script know to notify
the user of a failure?

> Well, the master doesn't do syslog, because there might be no syslog in
> your configuration. I think you should try the systemd way and log the
> standard output.

I don't want to use systemd, but I do want to log standard output, at
least during initial startup..

Would it be possible when starting haproxy with -sf that it would tell if
the (original?) master was successful in reloading the config / starting
new workers, or how should this be done?

> That may be badly documented but you are not supposed to use -sf with
> the master worker, you just have to send the -USR2 signal to the master
> and it will parse the configuration again, launch new workers and kill
> smoothly the previous ones.
>
> Unfortunately signals are asynchronous, and we don't have a way yet to
> return a bad exit code upon reload. But we might implement a synchronous
> configuration notification in the future, using the admin socket for
> example.

Being able to signal the master to reload over an admin socket and getting
'feedback' about its results would likely also solve my 'reload feedback'
problem. Let's consider that a feature request :). Though maybe I
shouldn't be using master-worker at all for the moment..

Currently a whole new set of master-worker processes seems to take over..

> Well, I supposed that's because you launched a new master-worker with
> -sf, it's not supposed to be used that way but it should work too if you
> don't mind having a new PID.

I kinda expected this to indeed be 'as intended': -sf will fully replace
the old processes.

Thanks for your reply.

Regards,
PiBa-NL / Pieter
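For what it's worth, the 'manual' takeover described above would look roughly like this (a sketch, not executed: the config path and pid are illustrative, and in 1.8 the stats socket needs 'expose-fd listeners' for the -x transfer to work):

```
# new process fetches the listening sockets from the old one's stats
# socket (-x) and asks the old pid to finish smoothly (-sf)
haproxy -D -f /etc/haproxy/haproxy.cfg -x /haproxy.socket -sf 3061
```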
Re: [PATCH v2] BUG/MINOR: systemd: ignore daemon mode
Lukas,

Am 21.11.2017 um 12:39 schrieb Lukas Tribus:
> This change makes sure that haproxy remains in foreground when using
> systemd mode and adds a note in the documentation.
> ---
>  doc/configuration.txt | 3 ++-
>  src/haproxy.c         | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> *snip*

I agree that this is a good solution to the problem. I tested your patch
and can confirm it is working fine.

Best regards
Tim Düsterhus
Re: 4xx statistics made useless through health checks?
Hi,

On 17-11-21 14:20:39, Daniel Schneller wrote:
> > On 21. Nov. 2017, at 14:08, Lukas Tribus wrote:
> > [...]
> > Instead of hiding specific errors counters, why not send an actual
> > HTTP request that triggers a 200 OK response? So health checking is
> > not exempt from the statistics and only generates error statistics
> > when actual errors occur?
>
> Good point. I wanted to avoid, however, having these “high level”
> health checks from the many many sidecars being routed through to the
> actual backends. Instead, I considered it enough to “only” check if
> the central haproxy is available. In case it is, the sidecars rely on
> it doing the actual health checks of the backends and responding with
> 503 or similar, when all backends for a particular request happen to
> be down.
>
> However, your idea and a little more Googling led me to this Github
> repo
> https://github.com/jvehent/haproxy-aws#healthchecks-between-elb-and-haproxy
> where they configure a dedicated “health check frontend” (albeit in
> their case to work around an AWS/ELB limitation re/ PROXY protocol). I
> think I will adapt this and configure the sidecars to health check on
> a dedicated port like this.

monitor-uri [1] might help as well with this task. Personally, I'm using
Nginx behind HAProxy (on the same machine), a dedicated domain routed from
HAProxy to Nginx, and just one location in Nginx which returns '200 OK' to
accomplish this. Of course, HAProxy could work and Nginx could fail at the
same time, but I've never encountered this, especially with such a
limited, static Nginx config.

Cheers,
Georg

[1] https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#monitor-uri
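A dedicated health-check frontend along those lines could look like this. This is a sketch: the frontend name, port, URI and backend name are made up, and the 'monitor fail' part is only needed if the check should also reflect backend health:

```
frontend health
    bind :8888
    mode http
    monitor-uri /haproxy_status
    # optional: report failure once no server is left in the backend
    acl backend_dead nbsrv(api_internal) lt 1
    monitor fail if backend_dead
```

Requests to the monitor-uri are answered by haproxy itself and are neither logged nor forwarded, which keeps them out of the per-backend error counters.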
Re: 4xx statistics made useless through health checks?
Hi Pieter,

>> Good point. I wanted to avoid, however, having these “high level” health
>> checks from the many many sidecars being routed through to the actual
>> backends. Instead, I considered it enough to “only” check if the central
>> haproxy is available. In case it is, the sidecars rely on it doing the
>> actual health checks of the backends and responding with 503 or similar,
>> when all backends for a particular request happen to be down.
>
> Maybe monitor-uri, perhaps together with 'monitor fail', could help?:
> http://cbonte.github.io/haproxy-dconv/1.8/snapshot/configuration.html#4.2-monitor-uri
> It says it won't log or forward the request.. not sure, but maybe stats
> will also skip it.

Yes, that’s exactly what’s shown in that linked repo. Thanks for chiming in :)

> Regards,
> PiBa-NL / Pieter

-- 
Daniel Schneller
Principal Cloud Engineer
CenterDevice GmbH | Hochstraße 11 | 42697 Solingen
tel: +49 1754155711 | Deutschland
daniel.schnel...@centerdevice.de | www.centerdevice.de
Geschäftsführung: Dr. Patrick Peschlow, Dr. Lukas Pustina, Michael Rosbach,
Handelsregister-Nr.: HRB 18655, HR-Gericht: Bonn, USt-IdNr.: DE-815299431
Re: 4xx statistics made useless through health checks?
Hi Daniel,

Op 21-11-2017 om 14:20 schreef Daniel Schneller:
>> On 21. Nov. 2017, at 14:08, Lukas Tribus wrote:
>> [...]
>> Instead of hiding specific errors counters, why not send an actual
>> HTTP request that triggers a 200 OK response? So health checking is
>> not exempt from the statistics and only generates error statistics
>> when actual errors occur?
>
> Good point. I wanted to avoid, however, having these “high level”
> health checks from the many many sidecars being routed through to the
> actual backends. Instead, I considered it enough to “only” check if
> the central haproxy is available. In case it is, the sidecars rely on
> it doing the actual health checks of the backends and responding with
> 503 or similar, when all backends for a particular request happen to
> be down.

Maybe monitor-uri, perhaps together with 'monitor fail', could help?:
http://cbonte.github.io/haproxy-dconv/1.8/snapshot/configuration.html#4.2-monitor-uri
It says it won't log or forward the request.. not sure, but maybe stats
will also skip it.

> However, your idea and a little more Googling led me to this Github
> repo
> https://github.com/jvehent/haproxy-aws#healthchecks-between-elb-and-haproxy
> where they configure a dedicated “health check frontend” (albeit in
> their case to work around an AWS/ELB limitation re/ PROXY protocol). I
> think I will adapt this and configure the sidecars to health check on
> a dedicated port like this.
>
> I’ll let you know how it goes. Thanks a lot for your thoughts, so far :)
>
> Daniel

Regards,
PiBa-NL / Pieter
Re: 4xx statistics made useless through health checks?
> On 21. Nov. 2017, at 14:08, Lukas Tribuswrote: > [...] > Instead of hiding specific errors counters, why not send an actual > HTTP request that triggers a 200 OK response? So health checking is > not exempt from the statistics and only generates error statistics > when actual errors occur? Good point. I wanted to avoid, however, having these “high level” health checks from the many many sidecars being routed through to the actual backends. Instead, I considered it enough to “only” check if the central haproxy is available. In case it is, the sidecars rely on it doing the actual health checks of the backends and responding with 503 or similar, when all backends for a particular request happen to be down. However, your idea and a little more Googling led me to this Github repo https://github.com/jvehent/haproxy-aws#healthchecks-between-elb-and-haproxy where they configure a dedicated “health check frontend” (albeit in their case to work around an AWS/ELB limitation re/ PROXY protocol). I think I will adapt this and configure the sidecars to health check on a dedicated port like this. I’ll let you know how it goes. Thanks a lot for your thoughts, so far :) Daniel -- Daniel Schneller Principal Cloud Engineer CenterDevice GmbH | Hochstraße 11 | 42697 Solingen tel: +49 1754155711| Deutschland daniel.schnel...@centerdevice.de | www.centerdevice.de Geschäftsführung: Dr. Patrick Peschlow, Dr. Lukas Pustina, Michael Rosbach, Handelsregister-Nr.: HRB 18655, HR-Gericht: Bonn, USt-IdNr.: DE-815299431
Re: 4xx statistics made useless through health checks?
Hello,

2017-11-21 13:54 GMT+01:00 Daniel Schneller:
> However, I still wonder if there is a good way to discern these from
> “actual” bad requests in the stats, so that we can rely on the error
> counters to show “real” problems.
>
> Some kind of “haproxy-to-haproxy” health checking that does not spoil the
> counters?

Instead of hiding specific errors counters, why not send an actual
HTTP request that triggers a 200 OK response? So health checking is
not exempt from the statistics and only generates error statistics
when actual errors occur?

Lukas
Re: 4xx statistics made useless through health checks?
Hi Lukas,

thanks — was just about to reply to myself, but you beat me to it ;)

> Yes, we have "option dontlognull" for that:
> http://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4-option%20dontlognull

We can get rid of the logging via dontlognull, of course. Was about to
mention that I had configured that in the meantime to get rid of the log
spam.

However, I still wonder if there is a good way to discern these from
“actual” bad requests in the stats, so that we can rely on the error
counters to show “real” problems.

Some kind of “haproxy-to-haproxy” health checking that does not spoil the
counters?

Daniel

-- 
Daniel Schneller
Principal Cloud Engineer
CenterDevice GmbH | Hochstraße 11 | 42697 Solingen
tel: +49 1754155711 | Deutschland
daniel.schnel...@centerdevice.de | www.centerdevice.de
Geschäftsführung: Dr. Patrick Peschlow, Dr. Lukas Pustina, Michael Rosbach,
Handelsregister-Nr.: HRB 18655, HR-Gericht: Bonn, USt-IdNr.: DE-815299431
Re: 4xx statistics made useless through health checks?
Hallo Daniel, 2017-11-21 10:08 GMT+01:00 Daniel Schneller: > However, I see lots of 4xx errors counted on the central LBs. I have tracked > those down to being caused by the health checks of all the sidecars, > checking in every few seconds to see if their backends are healthy. > > The log shows this: > > Nov 21 09:58:08 loadbalancer-01 haproxy-internal[4053]: 10.205.100.53:54831 > [21/Nov/2017:09:58:08.277] api_internal~ api_internal/ -1/-1/-1/-1/0 > 400 0 - - CR-- 43/1/0/0/3 0/0 {} "" > [...] > Is there anything I can do to “make them both happy”? > Any suggestions would be much appreciated. Yes, we have "option dontlognull" for that: http://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4-option%20dontlognull Regards, Lukas
incorrect search for a server's crt in ca-base
Hi!

In 1.7.9, with crt-base and ca-base in global, haproxy tries to look up a
server's crt in ca-base. In bind everything works as expected. If I
specify the full path for crt in server, it's ok.

# cat /tmp/haproxy.cfg
global
    ca-base /etc/haproxy/ca-base
    crt-base /etc/haproxy/crt-base

defaults
    mode http
    timeout connect 5s
    timeout client 120s
    timeout server 120s

frontend cs
    bind *:4000 ssl crt test_combo.pem ca-file cabundle.pem verify required
    default_backend cs

backend cs
    server s1 1.2.3.4:4000 ssl crt test_combo.pem ca-file cabundle.pem verify required

# haproxy -c -f /tmp/haproxy.cfg
[ALERT] 324/193009 (30768) : config : backend 'cs', server 's1': unable to load SSL private key from PEM file '/etc/haproxy/ca-base/test_combo.pem'.
[ALERT] 324/193009 (30768) : Fatal errors found in configuration.

# haproxy -vv
HA-Proxy version 1.7.9 2017/08/18
Copyright 2000-2017 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv
  OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
	[COMP] compression
	[TRACE] trace
	[SPOE] spoe
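Until this is fixed, the workaround mentioned above is to spell out the crt-base directory on the server line, which bypasses the wrong ca-base prefixing. A sketch using the paths from the config above:

```
backend cs
    # absolute path avoids the erroneous ca-base prefix on server crt
    server s1 1.2.3.4:4000 ssl crt /etc/haproxy/crt-base/test_combo.pem ca-file cabundle.pem verify required
```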
[PATCH v2] BUG/MINOR: systemd: ignore daemon mode
Since we switched to notify mode in the systemd unit file in commit
d6942c8, haproxy won't start if the daemon keyword is present in the
configuration.

This change makes sure that haproxy remains in foreground when using
systemd mode and adds a note in the documentation.
---
 doc/configuration.txt | 3 ++-
 src/haproxy.c         | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index bb255be..cc397e1 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -680,7 +680,8 @@ crt-base
 daemon
   Makes the process fork into background. This is the recommended mode of
   operation. It is equivalent to the command line "-D" argument. It can be
-  disabled by the command line "-db" argument.
+  disabled by the command line "-db" argument. This option is ignored in
+  systemd mode.
 
 deviceatlas-json-file
   Sets the path of the DeviceAtlas JSON data file to be loaded by the API.
diff --git a/src/haproxy.c b/src/haproxy.c
index b39a95f..4596dbd 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -1363,7 +1363,7 @@ static void init(int argc, char **argv)
 			else if (*flag == 'D')
 				arg_mode |= MODE_DAEMON;
 			else if (*flag == 'W' && flag[1] == 's') {
-				arg_mode |= MODE_MWORKER;
+				arg_mode |= MODE_MWORKER | MODE_FOREGROUND;
 #if defined(USE_SYSTEMD)
 				global.tune.options |= GTUNE_USE_SYSTEMD;
 #else
-- 
2.7.4
Re: [PATCH] BUG/MINOR: systemd: ignore daemon mode
On Tue, Nov 21, 2017 at 11:45:29AM +0100, Lukas Tribus wrote:
> Since we switched to notify mode in the systemd unit file in commit
> d6942c8, haproxy won't start if the daemon keyword is present in the
> configuration.
>
> Update the unit file with -db to disable background mode in all
> circumstances and add a note in the documentation.

Don't you want to try with this instead (sorry for the bold copy-paste)?
It's also more in line with your doc update.

diff --git a/src/haproxy.c b/src/haproxy.c
index b39a95f..4596dbd 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -1363,7 +1363,7 @@ static void init(int argc, char **argv)
 			else if (*flag == 'D')
 				arg_mode |= MODE_DAEMON;
 			else if (*flag == 'W' && flag[1] == 's') {
-				arg_mode |= MODE_MWORKER;
+				arg_mode |= MODE_MWORKER | MODE_FOREGROUND;
 #if defined(USE_SYSTEMD)
 				global.tune.options |= GTUNE_USE_SYSTEMD;
 #else

Thanks,
Willy
[PATCH] BUG/MINOR: systemd: ignore daemon mode
Since we switched to notify mode in the systemd unit file in commit
d6942c8, haproxy won't start if the daemon keyword is present in the
configuration.

Update the unit file with -db to disable background mode in all
circumstances and add a note in the documentation.
---
 contrib/systemd/haproxy.service.in | 2 +-
 doc/configuration.txt              | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/contrib/systemd/haproxy.service.in b/contrib/systemd/haproxy.service.in
index edbd4c2..d870db8 100644
--- a/contrib/systemd/haproxy.service.in
+++ b/contrib/systemd/haproxy.service.in
@@ -7,7 +7,7 @@ After=network.target
 # socket if you want seamless reloads.
 Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
 ExecStartPre=@SBINDIR@/haproxy -f $CONFIG -c -q
-ExecStart=@SBINDIR@/haproxy -Ws -f $CONFIG -p $PIDFILE
+ExecStart=@SBINDIR@/haproxy -Ws -db -f $CONFIG -p $PIDFILE
 ExecReload=@SBINDIR@/haproxy -f $CONFIG -c -q
 ExecReload=/bin/kill -USR2 $MAINPID
 KillMode=mixed
diff --git a/doc/configuration.txt b/doc/configuration.txt
index bb255be..cc397e1 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -680,7 +680,8 @@ crt-base
 daemon
   Makes the process fork into background. This is the recommended mode of
   operation. It is equivalent to the command line "-D" argument. It can be
-  disabled by the command line "-db" argument.
+  disabled by the command line "-db" argument. This option is ignored in
+  systemd mode.
 
 deviceatlas-json-file
   Sets the path of the DeviceAtlas JSON data file to be loaded by the API.
-- 
2.7.4
Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support
Hello,

2017-11-21 11:18 GMT+01:00 Willy Tarreau:
>> That's not it, the hold-off timer is only a consequence of this
>> problem.
>
> OK but if it's really 100ms, it can be a problem for people loading GeoIP
> maps of millions of entries, or large configs (the largest I saw had 30
> backends and 60 servers and took several seconds to process).

The default timeout to start a daemon is 90 seconds if I understand
correctly:
https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html#DefaultTimeoutStartSec=

The 100ms "timeout" limits the restart interval itself, so those defaults
should be sane.

Regards,
Lukas
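For reference, both knobs discussed here live in the [Service] section of the unit file; the values below are illustrative only, not recommended defaults:

```
[Service]
# Upper bound on how long systemd waits for the READY=1 notification
# in Type=notify mode (covers slow startups, e.g. huge maps/configs).
TimeoutStartSec=90
# Hold-off time between automatic restarts (100ms is the default).
RestartSec=100ms
```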
Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support
On Tue, Nov 21, 2017 at 11:28:55AM +0100, Willy Tarreau wrote: > On Tue, Nov 21, 2017 at 11:19:41AM +0100, William Lallemand wrote: > > I don't like the idea of the "daemon" keyword being ignored, > > We already do that with -D which ignores "debug" for example. The command > line purposely has precedence over the config. > > > however, we could > > exit with an error when we try to start with -Ws + -D or daemon. > > Errors only cause gray hair to users, especially when they concern stuff > that can be automatically recovered from. We could definitely consider > that -Ws automatically implies MODE_FOREGROUND which will do exactly > what -db does, but without having to think about it. After all, "-Ws" > is "master-worker mode with systemd awareness" so it makes sense not > to background in this case. > That's right, I agree. -- William Lallemand
Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support
On Tue, Nov 21, 2017 at 11:12:45AM +0100, Lukas Tribus wrote:
> Hello,
>
> 2017-11-21 8:39 GMT+01:00 William Lallemand:
>
> That's not it, the hold-off timer is only a consequence of this
> problem. I believe the notification does not work in my case, which is
> why haproxy did not start correctly for systemd, and why systemd
> continues to restart haproxy.
>
> I found the root of the issue: the "daemon" keyword in the global
> configuration section. We need to remove "daemon" from the
> configuration for systemd notify mode to work correctly.
>
> We can override the "daemon" keyword in the configuration by passing
> -db to haproxy. Adding it to the unit file fixes the issue even if
> the keyword is in the configuration (so we pass "-Ws -db").
>
> "-Ws -db" along with a note in the documentation that "daemon" may be
> ignored in systemd mode seems like a good compromise here? I will send
> an RFC patch shortly.

You should never use systemd with the daemon mode, either in notify or
default mode, it's not the recommended setup.

I don't like the idea of the "daemon" keyword being ignored; however, we
could exit with an error when we try to start with -Ws + -D or daemon.

-- 
William Lallemand
Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support
Hi Lukas,

On Tue, Nov 21, 2017 at 11:12:45AM +0100, Lukas Tribus wrote:
> > I suggest we configure it to a greater value per default, it can be set
> > to infinity too, but I don't like the idea.
>
> That's not it, the hold-off timer is only a consequence of this
> problem.

OK but if it's really 100ms, it can be a problem for people loading GeoIP
maps of millions of entries, or large configs (the largest I saw had 30
backends and 60 servers and took several seconds to process).

> I believe the notification does not work in my case, which is
> why haproxy did not start correctly for systemd, and why systemd
> continues to restart haproxy.

I'm feeling less and less confident in using this mode by default :-/

> I found the root of the issue: the "daemon" keyword in the global
> configuration section. We need to remove "daemon" from the
> configuration for systemd notify mode to work correctly.
>
> We can override the "daemon" keyword in the configuration by passing
> -db to haproxy. Adding it to the unit file fixes the issue even if
> the keyword is in the configuration (so we pass "-Ws -db").
>
> "-Ws -db" along with a note in the documentation that "daemon" may be
> ignored in systemd mode seems like a good compromise here?

Yes and it definitely is needed, just like we used to suggest users always
have -D in init scripts to force backgrounding of the process when there
was no "daemon" statement in the conf.

> I will send an RFC patch shortly.

Thank you!
Willy
Re: [PATCH v3 1/1] MEDIUM: mworker: Add systemd `Type=notify` support
Hello,

2017-11-21 8:39 GMT+01:00 William Lallemand:
> On Tue, Nov 21, 2017 at 07:16:19AM +0100, Willy Tarreau wrote:
>>
>> I really don't like this. My fears with becoming more systemd-friendly
>> was that we'd make users helpless when something decides not to work
>> just to annoy them, and this report seems to confirm this feeling :-/
>>
>> Tim, have you seen this message about a "hold-off timer over" being
>> displayed at the same second as the startup message? Isn't there a
>> timeout setting or something like this to place in the config file?
>> And is there a way to disable it so that people with huge configs
>> are still allowed to load them?
>
> There is indeed a timeout value, which is configurable in the unit file,
> the default value is 100ms.
>
> https://www.freedesktop.org/software/systemd/man/systemd.service.html#RestartSec=
>
> There is also the start one there:
>
> https://www.freedesktop.org/software/systemd/man/systemd.service.html#TimeoutStartSec=
>
> I suggest we configure it to a greater value per default, it can be set
> to infinity too, but I don't like the idea.

That's not it, the hold-off timer is only a consequence of this problem. I
believe the notification does not work in my case, which is why haproxy
did not start correctly for systemd, and why systemd continues to restart
haproxy.

I found the root of the issue: the "daemon" keyword in the global
configuration section. We need to remove "daemon" from the configuration
for systemd notify mode to work correctly.

We can override the "daemon" keyword in the configuration by passing -db
to haproxy. Adding it to the unit file fixes the issue even if the keyword
is in the configuration (so we pass "-Ws -db").

"-Ws -db" along with a note in the documentation that "daemon" may be
ignored in systemd mode seems like a good compromise here? I will send an
RFC patch shortly.

Regards,
Lukas