Re: Segfault on haproxy 1.7.10 with state file and slowstart
Hi Willy, Baptiste

On Fri, 29 Jun 2018 at 08:12 Willy Tarreau wrote:
> Hi Raghu,
> [...]
> > [ALERT] 248/130258 (21960) : parsing [/etc/haproxy/test.cfg:53] :
> > 'server bla' : no method found to resolve address '(null)'
> > [ALERT] 248/130258 (21960) : Failed to initialize server(s) addr.
>
> According to Nenad :
> "It's not a good way to fix the issue we were experiencing
> before. It will need a bigger rewrite, because the logic in
> srv_iterate_initaddr needs to be changed."
>
> Could you please try the example above in the commit message with and
> without your patch ? Maybe the problem is still there, or maybe the logic
> change mentioned by Nenad has already been done and this is no longer a
> problem, in which case we could take your patch this time.

Yes, this is a problem - I missed testing this case.

I have a proposal. The init-addr option only makes sense when an fqdn is
used, but the "last" method is also useful when plain IP addresses are used.
This patch enables the "last" method to work by default when IP addresses
are used. Can you please review?

Thanks
Raghu

0001-MEDIUM-init-allow-init-addr-to-work-with-server-ip-a.patch
Description: Binary data
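For illustration, a minimal configuration sketch of the behaviour being
proposed: with server-state loading enabled, the address recorded in the
state file ("last") would override the literal IP written in the
configuration, even though no fqdn is involved. The state-file path, backend
and server names, addresses and port below are made up for the example.

    global
        server-state-file /var/lib/haproxy/servers.state

    defaults
        load-server-state-from-file global

    backend testservers
        # With the proposed change, the "last" init-addr method would apply
        # even though the address is a literal IP: on reload, 10.0.19.10
        # would be replaced by whatever address the state file last recorded
        # for web1.
        server web1 10.0.19.10:9003 check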
Re: Segfault on haproxy 1.7.10 with state file and slowstart
Hi Willy, Baptiste

I have been running this in production without any issues. Can you please
review? I have verified that the patch applies to current devel as well.

On Fri, 6 Apr 2018 at 22:36 Willy Tarreau wrote:
> Hi Raghu,
>
> On Wed, Mar 28, 2018 at 08:30:39PM +0000, Raghu Udiyar wrote:
> > Hi Willy, Baptiste
> >
> > It looks like this was the intended behaviour as per the documentation.
> > But it looks to be simple to enable the init-addr behaviour for IP
> > addresses as well. Can you please review the attached patch, this is
> > against devel.
>
> Baptiste, I missed this one. Could you please take a look ?
>
> Thanks,
> Willy
Re: Segfault on haproxy 1.7.10 with state file and slowstart
Hi Willy, Baptiste

It looks like this was the intended behaviour as per the documentation, but
it seems simple to enable the init-addr behaviour for IP addresses as well.
Can you please review the attached patch? It is against devel.

0001-MEDIUM-init-allow-init-addr-to-work-with-server-ip-a.patch
Description: Binary data
Re: Segfault on haproxy 1.7.10 with state file and slowstart
Hi Baptiste

Were you able to look into this? A workaround is to use a dummy hostname
together with init-addr, like so :

    server testserver13 dummyhost:9003 check init-addr last,10.0.19.10
    server testserver14 dummyhost:9003 check init-addr last,10.0.19.12
    # disabled placeholder
    server testserver15 dummyhost:9003 disabled check init-addr last,169.254.0.9

This works as expected, but it seems like a hack to me, and the blog post
should be updated to describe this behaviour.

-- Raghu

On Mon, Jan 15, 2018 at 1:10 AM, Willy Tarreau wrote:
> Hi Raghu,
>
> On Sun, Jan 14, 2018 at 05:59:16PM +0530, Raghu Udiyar wrote:
> > Hi Willy
> >
> > Yes, this patch works, no more segfaults.
> >
> > However, unrelated to this issue, while testing I noticed that the
> > address from the state file is not applied if the configuration does
> > not use a hostname.
> >
> > In the example I provided, the configuration originally has the IP
> > address set to 169.254.0.9, which is then updated through the socket
> > to say 10.0.19.17.
>
> I noticed it as well but I thought I was the one doing something wrong.
> CCing Baptiste in case he remembers certain cases that have to be
> addressed a specific way by the state file. I remember this stuff was
> tested quite a bit so I suspect that it could depend on certain
> conditions.
>
> > Expectation was that upon haproxy restart the address in the state
> > file would override the address in the configuration, but this does
> > not happen.
> >
> > I see the relevant code in srv_init_addr :
> >
> >     if (srv->hostname)
> >         return_code |= srv_iterate_initaddr(srv);
> >
> > As per the server-template example in the dynamic scaling blog post,
> > this should have been supported. Am I missing some configuration?
>
> I can't really tell, let's wait for Baptiste's opinion on this.
>
> Thanks,
> Willy
Re: Segfault on haproxy 1.7.10 with state file and slowstart
Hi Willy

Yes, this patch works, no more segfaults.

However, unrelated to this issue, while testing I noticed that the address
from the state file is not applied if the configuration does not use a
hostname.

In the example I provided, the configuration originally has the IP address
set to 169.254.0.9, which is then updated through the socket to say
10.0.19.17. The expectation was that upon haproxy restart the address in the
state file would override the address in the configuration, but this does
not happen.

I see the relevant code in srv_init_addr :

    if (srv->hostname)
        return_code |= srv_iterate_initaddr(srv);

As per the server-template example in the dynamic scaling blog post, this
should have been supported. Am I missing some configuration?

Thanks!

-- Raghu

On Fri, Jan 12, 2018 at 4:14 PM, Willy Tarreau wrote:
> Hello Raghu,
>
> On Thu, Jan 11, 2018 at 02:20:34PM +0530, Raghu Udiyar wrote:
> > Hello,
> >
> > Haproxy 1.7.10 segfaults when the srv_admin_state is set to
> > SRV_ADMF_CMAINT (0x04) for a backend server, and that backend has the
> > `slowstart` option set.
> >
> > The following configuration reproduces it :
> (...)
>
> Thanks for all the details, they made it easy to reproduce.
>
> From what I'm seeing, it's a fundamental design issue in the state
> file handling in 1.7. It starts checks before they have been
> initialized, and tries to wake up a NULL task. In 1.8, due to the more
> dynamic config, the initialization sequence has changed and checks
> are initialized before parsing the state file, but I don't feel at
> ease with doing that in 1.7 since I don't know if some config elements
> may remain non-updated.
>
> So instead I've just protected against using the task wakeups during
> the state file parsing; they will be initialized later with the
> appropriate parameters.
>
> Could you please check the attached patch on top of 1.7.10 ?
>
> Thanks,
> Willy
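For reference, the kind of guard Willy describes amounts to not waking up
the slowstart warmup task before it exists. The snippet below is only a
minimal sketch of that idea, reconstructed from the backtrace
(task_schedule() called with task=0x0 from srv_clr_admin_flag()); it is not
the actual patch, and the exact expiry expression is an assumption.

    /* Sketch only: when a server leaves maintenance, 1.7 reschedules the
     * slowstart warmup task. While the state file is still being parsed,
     * checks are not initialized yet, so s->warmup may be NULL and must
     * not be scheduled.
     */
    if (s->warmup)
        task_schedule(s->warmup,
                      tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));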
Segfault on haproxy 1.7.10 with state file and slowstart
Hello,

Haproxy 1.7.10 segfaults when the srv_admin_state is set to SRV_ADMF_CMAINT
(0x04) for a backend server, and that backend has the `slowstart` option
set. The following configuration reproduces it :

# haproxy.cfg (replace below)
global
    maxconn 3
    user haproxy
    group haproxy
    server-state-file //servers.state
    log-tag haproxy
    nbproc 1
    cpu-map 1 2
    stats socket /run/haproxy.sock level admin
    stats socket /run/haproxy_op.sock mode 666 level operator

defaults
    mode http
    option forwardfor
    option dontlognull
    option httplog
    log 127.0.0.1 local1 debug
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    timeout http-request 8s
    load-server-state-from-file global

listen admin
    bind *:9002
    stats enable
    stats auth haproxyadmin:xxx

frontend testserver
    bind *:9000
    option tcp-smart-accept
    option splice-request
    option splice-response
    default_backend testservers

backend testservers
    balance roundrobin
    option tcp-smart-connect
    option splice-request
    option splice-response
    timeout server 2s
    timeout queue 2s
    default-server maxconn 10 slowstart 10s weight 1
    server testserver15 10.0.19.10:9003 check
    server testserver16 10.0.19.12:9003 check
    server testserver17 169.254.0.9:9003 disabled check
    server testserver20 169.254.0.9:9003 disabled check

# servers.state file
1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id
4 testservers 1 testserver15 10.0.19.10 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 2 testserver16 10.0.19.12 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 3 testserver17 169.254.0.9 0 5 1 1 924 1 0 0 14 0 0 0
4 testservers 4 testserver20 10.0.19.17 0 4 1 1 454 6 3 4 6 0 0 0

The srv_admin_state value of 4 above for testserver20 causes the segfault,
and it only occurs when slowstart is set. The configuration check can
reproduce it, i.e. :

    haproxy -c -f haproxy.cfg

The backtrace :

(gdb) bt
#0  task_schedule (when=-508447097, task=0x0) at include/proto/task.h:244
#1  srv_clr_admin_flag (mode=SRV_ADMF_FMAINT, s=0x1fb0fd0) at src/server.c:626
#2  srv_adm_set_ready (s=0x1fb0fd0) at include/proto/server.h:231
#3  srv_update_state (params=0x7ffe4f15e7d0, version=1, srv=0x1fb0fd0) at src/server.c:2289
#4  apply_server_state () at src/server.c:2664
#5  0x0044b60f in init (argc=<optimized out>, argc@entry=4, argv=<optimized out>, argv@entry=0x7ffe4f160d38) at src/haproxy.c:975
#6  0x004491be in main (argc=4, argv=0x7ffe4f160d38) at src/haproxy.c:1795

The way we use the state file is to have servers with the `disabled` option
in the configuration; during scaling we update the backend address and mark
the server as active using the socket. The 169.254.0.9 address is a dummy
address for the disabled servers.

Can someone take a look? I couldn't find any related bugs fixed in 1.8.

Thanks
-- Raghu
Re: Herald - Generic and extensible agent-check service for Haproxy
On Thu, Nov 17, 2016 at 8:03 PM, Willy Tarreau wrote:
> > Yes actually. When we started, herald was multi-service, which is when
> > we realised it's not possible to distinguish which backend the agent
> > request was for. An "agent-send" feature that sends something like
> > "<backend>/<server>\n" would be useful. Herald would require minor
> > modifications to support this.
>
> Right now haproxy already supports "agent-send" but you have to pass the
> string yourself in the configuration. So you can already update Herald
> to make use of this when configs support it. And later we may add a new
> directive so that users don't even have to configure the string anymore.

Ah, yes, I see it in 1.7; I'll add this to Herald.

> > > Do you want me to add a link to the main haproxy page, and if so to
> > > what location ?
> >
> > Yes, I think the external links section here is appropriate :
> > http://www.haproxy.org/#link
>
> Sure, that's what I intended, but I meant what should be the target of
> the link ? The github page, your article, anything else ?

Please link to the github page : https://github.com/helpshift/herald

Thanks,
Raghu
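For context, a rough sketch of what passing the string manually with the
existing agent-send server keyword could look like; the backend and server
names, addresses and agent port are made up, and whether a terminator needs
to be included depends on how the agent reads the connection:

    backend testservers
        # haproxy connects to the agent on port 9700 and first sends the
        # configured string, so an agent such as Herald could use it to
        # tell which backend/server the health query is for.
        server web1 10.0.19.10:9003 check agent-check agent-port 9700 agent-send "testservers/web1"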
Re: Herald - Generic and extensible agent-check service for Haproxy
On Thu, Nov 10, 2016 at 4:16 PM, Willy Tarreau wrote:
> Hello Raghu,
>
> > Blog post here :
> > https://engineering.helpshift.com/2016/herald-haproxy-loadfeedback-agent/
> >
> > Hope this is useful for the haproxy community.
>
> This is excellent, thank you for contributing this!

Thanks Willy!

> I'm having a few questions : did you miss anything from haproxy when
> doing this ? For example, did you have to work around the impossibility
> to do something via the agent, just because of the agent protocol or
> because of some validity checks performed by haproxy ? I'm asking
> because if we have to perform very minor tweaks, better do them before
> the release. Do you think you could benefit from "agent-send" so that
> the same agent is used for multiple servers, where haproxy would then
> indicate within the connection which server it is connecting for ? If so,
> would an automatic string such as "agent-send-name", sending the backend
> and the server name, be a useful improvement for this ?

Yes actually. When we started, herald was multi-service, which is when we
realised it's not possible to distinguish which backend the agent request
was for. An "agent-send" feature that sends something like
"<backend>/<server>\n" would be useful. Herald would require minor
modifications to support this.

> Do you want me to add a link to the main haproxy page, and if so to
> what location ?

Yes, I think the external links section here is appropriate :
http://www.haproxy.org/#link

Thanks,
Raghu
Herald - Generic and extensible agent-check service for Haproxy
Hello

We have developed a generic service (using Python gevent) to serve as an
agent-check service for Haproxy. It's extensible via plugins for application
feedback, and supports result caching, JSON expressions, arithmetic, regex
matching, and fallback in case of failure.

The source is here : https://github.com/helpshift/herald

Blog post here :
https://engineering.helpshift.com/2016/herald-haproxy-loadfeedback-agent/

Hope this is useful for the haproxy community.

Thanks,
-- Raghu
[PATCH] BUG/MINOR: stats: fix missing comma in stats on agent drain
The CSV stats format breaks when the agent changes a server's state to
drain: the "DRAIN (agent)" status string is missing its trailing comma.
Tools like hatop, metrics collectors or check agents will fail because of
this.

This should be backported to 1.6.
---
 src/dumpstats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/dumpstats.c b/src/dumpstats.c
index f43a5cc..b868ef5 100644
--- a/src/dumpstats.c
+++ b/src/dumpstats.c
@@ -3584,7 +3584,7 @@ static int stats_dump_sv_stats(struct stream_interface *si, struct proxy *px, in
 		[SRV_STATS_STATE_NOLB]                 = "NOLB,",
 		[SRV_STATS_STATE_DRAIN_GOING_DOWN]     = "DRAIN %d/%d,",
 		[SRV_STATS_STATE_DRAIN]                = "DRAIN,",
-		[SRV_STATS_STATE_DRAIN_AGENT]          = "DRAIN (agent)",
+		[SRV_STATS_STATE_DRAIN_AGENT]          = "DRAIN (agent),",
 		[SRV_STATS_STATE_NO_CHECK]             = "no check,"
 	};
--
2.7.0
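As a hypothetical illustration of the breakage (surrounding field values are
invented), the separator comma is part of these format strings, so without
it the status column runs straight into the next field and the CSV line no
longer has the expected number of fields:

    # before the fix: status merges with the following column
    ...,DRAIN (agent)1,...
    # after the fix: columns stay separated
    ...,DRAIN (agent),1,...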