problem with haproxy reload
Hi,

We hit a problem with haproxy. We have a script which deletes frontend and backend entries from the haproxy configuration by name, then reloads haproxy once a configuration file check has passed. In one such case, after deleting the frontend and backend and reloading, we found haproxy in the stopped state.

The logs below show that the backends were started again during the reload, but the frontends were not. The frontends only show as started after we manually restarted haproxy. Any feedback regarding this would be very useful.

Regards,
Senthil

May 18 19:36:10 indya-lb haproxy[7375]: Stopping frontend ssl_frontend_1 in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping backend ssl_frontend_1BACK in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping frontend ssl_frontend_2 in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping backend ssl_frontend_2BACK in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping frontend Star in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping backend StarBACK in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping frontend Staging in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping backend StagingBACK in 0 ms.
May 18 19:36:10 indya-lb haproxy[13147]: Proxy ssl_frontend_2BACK started.
May 18 19:36:10 indya-lb haproxy[13147]: Proxy StarBACK started.
May 18 19:36:10 indya-lb haproxy[13147]: Proxy StagingBACK started.
May 18 19:36:10 indya-lb haproxy[7375]: Proxy ssl_frontend_1 stopped (FE: 3886 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy ssl_frontend_1BACK stopped (FE: 0 conns, BE: 3583 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy ssl_frontend_2 stopped (FE: 0 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy ssl_frontend_2BACK stopped (FE: 0 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy Star stopped (FE: 60927284 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy StarBACK stopped (FE: 0 conns, BE: 59690087 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy Staging stopped (FE: 0 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy StagingBACK stopped (FE: 0 conns, BE: 0 conns).
May 18 20:09:32 indya-lb haproxy[13204]: Proxy ssl_frontend_2 started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy ssl_frontend_2BACK started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy Star started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy StarBACK started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy Staging started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy StagingBACK started.

We are using the init script to reload haproxy (service haproxy reload) on CentOS, and the script is as follows:

    #!/bin/sh
    #
    # chkconfig: - 85 15
    # description: HA-Proxy is a TCP/HTTP reverse proxy which is particularly suited \
    #              for high availability environments.
    # processname: haproxy
    # config: /etc/haproxy.cfg
    # pidfile: /var/run/haproxy.pid

    # Source function library.
    if [ -f /etc/init.d/functions ]; then
      . /etc/init.d/functions
    elif [ -f /etc/rc.d/init.d/functions ] ; then
      . /etc/rc.d/init.d/functions
    else
      exit 0
    fi

    # Source networking configuration.
    . /etc/sysconfig/network

    # Check that networking is up.
    [ "${NETWORKING}" = "no" ] && exit 0

    [ -f /etc/haproxy.cfg ] || exit 1

    RETVAL=0

    start() {
      /usr/sbin/haproxy -c -q -f /etc/haproxy.cfg
      if [ $? -ne 0 ]; then
        echo "Errors found in configuration file."
        return 1
      fi
      echo -n "Starting HAproxy: "
      daemon /usr/sbin/haproxy -D -f /etc/haproxy.cfg -p /var/run/haproxy.pid
      RETVAL=$?
      echo
      [ $RETVAL -eq 0 ] && touch /var/lock/subsys/haproxy
      return $RETVAL
    }

    stop() {
      echo -n "Shutting down HAproxy: "
      killproc haproxy -USR1
      RETVAL=$?
      echo
      [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/haproxy
      [ $RETVAL -eq 0 ] && rm -f /var/run/haproxy.pid
      return $RETVAL
    }

    restart() {
      /usr/sbin/haproxy -c -q -f /etc/haproxy.cfg
      if [ $? -ne 0 ]; then
        echo "Errors found in configuration file, check it with 'haproxy check'."
        return 1
      fi
      stop
      start
    }

    check() {
      /usr/sbin/haproxy -c -q -V -f /etc/haproxy.cfg
    }

    rhstatus() {
      status haproxy
    }

    condrestart() {
      [ -e /var/lock/subsys/haproxy ] && restart || :
    }
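Note that the restart() function above does a full stop followed by a start, which leaves a window with no listening process; haproxy's own -sf option performs a graceful handover instead. A minimal sketch of such a reload follows — the HAPROXY, CFG, and PIDFILE variables are illustrative names introduced here (not from the script above) so the logic can be exercised with a stub binary:

```shell
#!/bin/sh
# Sketch of a graceful reload using haproxy's -sf flag: the new process
# re-binds the listening sockets, then asks the old PIDs to finish their
# sessions and exit, so no connections are hard-dropped.
HAPROXY=${HAPROXY:-/usr/sbin/haproxy}
CFG=${CFG:-/etc/haproxy.cfg}
PIDFILE=${PIDFILE:-/var/run/haproxy.pid}

reload() {
    # Validate the configuration first; never reload on a broken config.
    "$HAPROXY" -c -q -f "$CFG" || {
        echo "Errors found in configuration file." >&2
        return 1
    }
    if [ -s "$PIDFILE" ]; then
        # -sf takes the list of old pids to signal once the new process is up.
        # Word splitting of $(cat ...) is intentional: one argument per pid.
        "$HAPROXY" -D -f "$CFG" -p "$PIDFILE" -sf $(cat "$PIDFILE")
    else
        # No previous instance recorded: plain daemon start.
        "$HAPROXY" -D -f "$CFG" -p "$PIDFILE"
    fi
}
```

With this in place, the init script's reload target can call reload() instead of stop/start, so frontends are never left unbound if something goes wrong mid-restart.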
Re: ACL routing help
Where you have:

    acl is_somedomain hdr_beg(host) -i www.somedomain.com

change it to:

    acl is_somedomain hdr_beg(host) -i somedomain.com www.somedomain.com

Space-delimited values are permitted, and apparently quite efficient :)

Chris

On 29/05/2012 17:53, Lofland, Bryan W. wrote:

I have an active/passive LB setup and I have multiple domains and applications behind the setup. I am hoping you can help me see if something is possible. I have a rule that states that www.somedomain.com gets forwarded to farm X. I have another rule that states that staging.somedomain.com gets routed to farm Y. I have been asked if I can allow http://somedomain.com to work. I understand that I have to mess with DNS as well, but from an HAProxy perspective, can I add an ACL that allows this to route to farm X? The existing rules would also need to continue to work, so staging.somedomain.com would still need to route to farm Y, etc. Below are my rules from my config.

    frontend http-in
        mode http
        # ACLs #
        ## These are test ones that would direct clients ##
        ## to different backends depending on the        ##
        ## host or domain field in the host header       ##
        acl is_lsr hdr_beg(host) -i www.domaina.com
        acl is_domainb hdr_beg(host) -i domainb.com
        acl is_domainb hdr_beg(host) -i www.domainb.com
        acl is_domainb dst 10.101.69.96
        acl is_domainb hdr_beg(host) 63.239.123.254
        acl is_domainc hdr_dom(host) -i domainc.com
        acl is_domaind hdr_dom(host) -i domaind.com
        acl is_domainc hdr_beg(host) -i 63.239.123.254
        acl is_domaind hdr_dom(host) -i domaind.org
        acl is_domainc hdr_dom(host) -i domainc.org
        acl is_domaind hdr_dom(host) -i domaind.net
        acl is_domainc hdr_dom(host) -i domainc.net
        acl is_punchout hdr_beg(host) -i punchout.domainb.com
        acl is_lbtest hdr_beg(host) -i lbtest.domainb.com
        acl is_stg hdr_beg(host) -i staging.domaina.com
        acl is_stg hdr_beg(host) -i staging.domainb.com
        acl is_stg dst 10.101.69.75
        acl is_load hdr_beg(host) -i load.domaina.com
        acl is_load hdr_beg(host) -i load.domainb.com
        acl is_domaine hdr_dom(host) -i domaine.org
        acl is_domainf hdr_dom(host) -i domainf.org
        acl is_domaine hdr_dom(host) -i domaine.com
        acl is_domainf hdr_dom(host) -i domainf.com
        acl is_domaine hdr_dom(host) -i domaine.net
        acl is_domainf hdr_dom(host) -i domainf.net

        redirect location http://www.domaine.org if is_domaine
        redirect location http://www.domaine.org if is_domainf
        redirect location http://www.domainc.com if is_domainc or is_domaind

        use_backend XYZ-HTTP if is_lsr or is_lbtest
        use_backend DOMA-HTTP if is_dharmacon or is_punchout
        use_backend DOMB-HTTP if is_open
        use_backend STG-HTTP if is_stg or is_load
        use_backend DOMF-HTTP if is_domaine or is_domainf
        ## ACLs ending ##

Thanks,
Bryan
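Putting Chris's suggestion together with Bryan's question, a sketch of the resulting rule could look like this (the FARMX-HTTP backend name is invented here for illustration; it is not from Bryan's config):

```
frontend http-in
    mode http
    # One ACL covering both the bare domain and the www host.
    # hdr_beg does prefix matching, so "staging.somedomain.com" matches
    # neither value and continues to hit the existing staging rule.
    acl is_somedomain hdr_beg(host) -i somedomain.com www.somedomain.com
    use_backend FARMX-HTTP if is_somedomain
```

Because hdr_beg compares prefixes, listing both somedomain.com and www.somedomain.com is required — www.somedomain.com does not begin with the string somedomain.com.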
Re: Problems with layer7 check timeout
I've been monitoring our service availability check (an HTTP HEAD of a resource that truly reports the availability status of the application). Under normal circumstances the check takes 2-3 seconds. We found periods of time where the application would take 15+ seconds and fail (I did not capture the HTTP code, but I'm fairly sure it was a 500-series response from what I've been looking through). These failure periods match the times where haproxy was indicating timeouts of 1002ms. So it looks like haproxy is doing its job. Is this then a bug in the logging of the timeout value (reporting 1002ms vs 15000+ms)?

We haven't had any problems since 25 May, but we're keeping watch.

- Kevin

On 5/25/12 11:18 AM, Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] wrote:

Willy, I'll try the patch, but not until next week because of the holiday weekend. I don't want to make a significant change that I would have to support over the long weekend. I'm capturing tcpdump between the SLB and the three backends. I'd like to have a capture during an outage. I expect to see something today, and I'll send it to you.

- Kevin

On May 25, 2012, at 2:12 AM, Willy Tarreau wrote:

Hi again Kevin,

Well, I suspect that there might be a corner case with the bug I fixed which might have caused what you observed. The connect timeout is computed from the last expire date. Since the check timeout was added upon connection establishment but the task was woken too late, after a first check failure is reported too late you can have the next check's timeout shortened. It's still unclear to me how the check timeout can be reported this small, considering that it's updated once the connect succeeds. But performing computations in the past is never a good way to have something reliable.

Could you please apply the attached fix for the bug I mentioned in the previous mail, to see if the issue is still present? After all, I would not be totally surprised if this bug has nasty side effects like this.

Thanks,
Willy

0001-BUG-MINOR-checks-expire-on-timeout.check-if-smaller-.patch

Kevin Lange
kevin.m.la...@nasa.gov | kla...@raytheon.com
W: +1 (301) 851-8450
Raytheon | NASA | ECS Evolution Development Program
https://www.echo.com | https://www.raytheon.com
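For readers hitting similar symptoms, the relevant knobs are timeout connect, timeout check, and the per-server check interval. A hypothetical fragment (backend name, URL, and addresses invented for illustration, sized against the 15+ second worst case described above) might look like:

```
backend app
    option httpchk HEAD /availability HTTP/1.0
    timeout connect 5s
    # timeout check bounds how long a check may wait for the response after
    # the connection is established; keep it above the slowest healthy reply.
    timeout check 20s
    server app1 10.0.0.1:80 check inter 10s fall 3 rise 2
```

This does not explain the mis-reported 1002ms value discussed in the thread, but it rules out an accidentally tiny check budget as the cause of flapping.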
Re: Problems with layer7 check timeout
Hi Kevin,

On Tue, May 29, 2012 at 03:08:17PM -0400, Kevin M Lange wrote:

I've been monitoring our service availability check (an HTTP HEAD of a resource that truly reports the availability status of the application). Under normal circumstances the check takes 2-3 seconds. We found periods of time where the application would take 15+ seconds and fail (I did not capture the HTTP code, but I'm fairly sure it was a 500-series response from what I've been looking through). These failure periods match the times where haproxy was indicating timeouts of 1002ms. So it looks like haproxy is doing its job. Is this then a bug in the logging of the timeout value (reporting 1002ms vs 15000+ms)?

This is the strange part, as I didn't manage to get this indication on my test platform. Would you accept to send me in private the network capture for a series of checks that were mis-reported? Depending on how it's segmented and aborted, maybe I could get a clue about what is happening.

We haven't had any problems since 25 May, but we're keeping watch.

It reminds me of the old days of early Opterons, when the clock was unsynced between the cores and jumped back and forth, causing early timeouts and wrong timer reports. The issue has come back with the use of VMs everywhere. This led me to implement the internal monotonic clock, which compensates for jumps; they cannot exceed 1s now. But even with a 1s jump, this does not explain 15000ms vs 1002ms, so right now I'm a bit stuck.

Regards,
Willy
FW: haproxy conditional healthchecks/failover
Am I wildly off course, or is this config salvageable?

Hi! I'm trying to use HAproxy to support the concepts of offline, in-maintenance-mode, and not-working servers.

Any good reason to do that??? (I'm a bit curious)

Sure. I want to be able to mark a machine offline by creating a file (as opposed to marking it online by creating a file), which is why I can't use disable-on-404 below. This covers situations where I need to take a machine out of public-facing operation for some reason, but perhaps I still want it to be able to render pages etc. -- maybe I'm testing a code deployment once it's already deployed, in order to verify the system is ready to be marked online.

I also want to be able to mark a machine down for maintenance by creating a file, maintenance.html, which apache will nicely rewrite URLs to during critical deployment phases or when performing other maintenance. In this case, I don't want it to render pages (usually to replace otherwise nasty-looking 500 error pages with a nice HTML facade).

For normal operations, I want the machine to be up. But if it's not intentionally placed offline or in maintenance and it fails heartbeat checks, then the machine is not working and should not be served requests. Does this make sense?

I have separate health checks for each condition and I have been trying to use ACLs to switch between backends. In addition to the fact that this doesn't seem to work, I'm also not loving having to repeat the server lists (which are the same) for each backend.

Nothing weird here, this is how HAProxy configuration works.

Cool, but variables would be nice to save time and avoid potential inconsistencies between sections.

-- I think it's more like "if any of these succeed, mark this server online" -- and that's what's making this scenario complex.

Euh, I might be misunderstanding something. There is nothing simpler than: if the health check is successful, then the server is considered healthy...

Since it's not strictly binary, as described above, it's a bit more complex.

    frontend staging 0.0.0.0:8080
        # if the number of servers *not marked offline* is *less than the total
        # number of app servers* (in this case, 2), then it is considered degraded
        acl degraded nbsrv(only_online) lt 2

This will match 0 and 1.

        # if the number of servers *not marked offline* is *less than one*,
        # the site is considered down
        acl down nbsrv(only_online) lt 1

This will match 0, so your down and degraded ACLs both cover the same value (0), which may lead to an issue later.

        # if the number of servers without the maintenance page is *less than the
        # total number of app servers* (in this case, 2), then it is considered
        # maintenance mode
        acl mx_mode nbsrv(maintenance) lt 2
        # if the number of servers without the maintenance page is less than 1,
        # we're down because everything is in maintenance mode
        acl down_mx nbsrv(maintenance) lt 1

Same remark as above.

        # if not running at full potential, use the backend that identified
        # the degraded state
        use_backend only_online if degraded
        use_backend maintenance if mx_mode
        # if we are down for any reason, use the backend that identified that fact
        use_backend backup_only if down
        use_backend backup_only if down_mx

Here is the problem (see above). The two use_backend lines above will NEVER match, because the degraded and mx_mode ACLs overlap their values!

Why would they never match? Aren't you saying they *both* should match, and wouldn't it then take action on the final match and switch the backend to maintenance mode? That's what I want. Maintenance mode overrides offline mode as a failsafe (since it's more restrictive) to prevent page rendering.

Do you know the disable-on-404 option? It may help you build your configuration in the right way (not considering a 404 as a healthy response).

Yes, but what I actually would need is enable-on-404 :)

Thanks for your feedback!
I'm definitely open to other options, but I'm hoping to not have to lose the flexibility described above! -chaz
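One way to resolve the overlap Baptiste points out is to make each state match exactly one nbsrv value, and to order the rules with the most restrictive first — haproxy applies the FIRST matching use_backend rule, not the last. A sketch for the two-server case described above (backend names taken from the thread's config):

```
# with two app servers, each state covers a distinct nbsrv value
acl down     nbsrv(only_online) eq 0
acl degraded nbsrv(only_online) eq 1
acl down_mx  nbsrv(maintenance) eq 0
acl mx_mode  nbsrv(maintenance) eq 1

# most restrictive conditions first: first match wins
use_backend backup_only if down
use_backend backup_only if down_mx
use_backend maintenance if mx_mode
use_backend only_online if degraded
```

With disjoint ranges and this ordering, "down" can no longer be shadowed by "degraded", and maintenance takes precedence over the degraded state as intended.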
[PATCH] BUG/MEDIUM: option forwardfor if-none doesn't work with some configurations
When option forwardfor is enabled in a frontend that uses backends, if-none ignores the header name provided in the frontend. This prevents haproxy from adding the X-Forwarded-For header if the option is not used in the backend. This may introduce security issues for servers/applications that rely on the header provided by haproxy.

A minimal configuration which can reproduce the bug:

    defaults
        mode http

    listen OK
        bind :9000
        option forwardfor if-none
        server s1 127.0.0.1:80

    listen BUG-frontend
        bind :9001
        option forwardfor if-none
        default_backend BUG-backend

    backend BUG-backend
        server s1 127.0.0.1:80

---
 src/proto_http.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/proto_http.c b/src/proto_http.c
index 7cf413d..b41b70a 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -3249,9 +3249,10 @@ int http_process_request(struct session *s, struct buffer *req, int an_bit)
 	 */
 	if ((s->fe->options | s->be->options) & PR_O_FWDFOR) {
 		struct hdr_ctx ctx = { .idx = 0 };
-		if (!((s->fe->options | s->be->options) & PR_O_FF_ALWAYS) &&
-		    http_find_header2(s->be->fwdfor_hdr_name, s->be->fwdfor_hdr_len, req->p, &txn->hdr_idx, &ctx)) {
+		if (!((s->fe->options | s->be->options) & PR_O_FF_ALWAYS) &&
+		    http_find_header2(s->be->fwdfor_hdr_len ? s->be->fwdfor_hdr_name : s->fe->fwdfor_hdr_name,
+		                      s->be->fwdfor_hdr_len ? s->be->fwdfor_hdr_len : s->fe->fwdfor_hdr_len, req->p, &txn->hdr_idx, &ctx)) {
 			/* The header is set to be added only if none is present
 			 * and we found it, so don't do anything.
 			 */
--
1.7.10
Re: [PATCH] BUG/MEDIUM: option forwardfor if-none doesn't work with some configurations
On Tue, May 29, 2012 at 11:27:41PM +0200, Cyril Bonté wrote:

When option forwardfor is enabled in a frontend that uses backends, if-none ignores the header name provided in the frontend. This prevents haproxy from adding the X-Forwarded-For header if the option is not used in the backend.

Thank you Cyril, applied to both 1.5-dev and 1.4.

Willy
Re: FW: haproxy conditional healthchecks/failover
On Tue, May 29, 2012 at 08:32:29PM +0000, Zulu Chas wrote:

Am I wildly off course, or is this config salvageable?

To be honest, your mail with overly long lines (half a kilobyte) is painful to read, and once I made the effort of reading it, I didn't understand why you're trying to cross-dress something which already exists and works. The disable-on-404 option is made to permit enabling/disabling a server with a simple touch or rm. It appears that you want to swap exactly these two commands; it really makes no sense to me to modify haproxy to support such a swap when a script can do it.

Another reason for disabling on 404 is that it will not accidentally enable a server which was started from an unmounted docroot file system. With your method, it would still start it.

Also, the suggested way of dealing with very specific health checks is to write a CGI or servlet to handle the various situations. Most people are already doing this, and if you absolutely want to use rm to start the server and touch to stop it, then five lines of shell in a CGI will do it.

Regards,
Willy
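The "few lines of shell in a CGI" Willy suggests could look roughly like this — a sketch under assumptions: the flag-file directory and file names (maintenance, offline) are invented here, and the 404 case is meant to pair with a health check using http-check disable-on-404:

```shell
#!/bin/sh
# Hypothetical health-check CGI: the admin touches/removes flag files,
# and haproxy's httpchk sees the corresponding status code.
health_response() {
    dir=${STATE_DIR:-/var/run/health}    # assumed flag-file location
    printf 'Content-Type: text/plain\r\n'
    if [ -e "$dir/maintenance" ]; then
        # hard failure: haproxy marks the server down
        printf 'Status: 503 Service Unavailable\r\n\r\nmaintenance\n'
    elif [ -e "$dir/offline" ]; then
        # soft state: with "http-check disable-on-404" the server keeps
        # serving existing sessions but stops receiving new ones
        printf 'Status: 404 Not Found\r\n\r\noffline\n'
    else
        printf 'Status: 200 OK\r\n\r\nup\n'
    fi
}
health_response
```

This inverts the touch/rm semantics Chaz wants without modifying haproxy: touch a file to take a machine out of rotation, rm it to bring the machine back.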
Re: No PID file when running in foreground
On Wed, May 23, 2012 at 09:08:15AM -0400, Chad Gatesman wrote:

Is there a major reason the -p option to generate a pid file is ignored when running haproxy in the foreground (e.g. using -db)? It would be nice if this file was still generated when specified, even in foreground mode. Could this be something that could be changed in a future release?

No, it's not planned, because all the -dXXX options are mainly debugging/development switches. -db is used all the time during development since it allows one to stop haproxy with a simple Ctrl-C. This is the only way I start it when developing or troubleshooting configs. Having haproxy fail to start because the pid file's directory is unwritable would be really annoying.

I really don't understand why this would be useful to you. A pid file makes sense for a background process, since it saves you from searching for it. But for a foreground process, what's the purpose? Normally you're supposed to stop it with Ctrl-C, so I fail to catch your use case.

Regards,
Willy
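If a pid file is still wanted for a foreground run, the launching shell already knows the child's pid and can write the file itself. A minimal sketch — HAPROXY and PIDFILE are illustrative variables introduced here so the logic can be exercised with a stub binary:

```shell
#!/bin/sh
# Workaround: haproxy writes no pid file with -db, so record the pid
# ourselves from the shell that starts it.
HAPROXY=${HAPROXY:-/usr/sbin/haproxy}
PIDFILE=${PIDFILE:-/var/run/haproxy-fg.pid}

start_foreground() {
    "$HAPROXY" -db -f /etc/haproxy.cfg &   # foreground/debug mode (-db)
    echo $! > "$PIDFILE"                   # write the pid file ourselves
    wait $!                                # stay attached until it exits
}
```

This keeps Ctrl-C behaviour intact (the shell forwards the signal group) while still leaving a pid on disk for monitoring tools that expect one.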