problem with haproxy reload

2012-05-29 Thread Senthil
Hi,

We faced an issue with haproxy. We have a script which deletes the frontend and backend entries of haproxy by name and then reloads haproxy after the haproxy configuration file check is done.

In one such scenario, after deleting the frontend and backend and reloading, we found that haproxy was left in a stopped state.

Below are the logs, which show that the backends were started again during the reload but the frontends were not; the frontends only appear in the logs after we manually restarted haproxy.

Any feedback regarding this would be very useful.

Regards
Senthil

May 18 19:36:10 indya-lb haproxy[7375]: Stopping frontend ssl_frontend_1 in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping backend ssl_frontend_1BACK in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping frontend ssl_frontend_2 in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping backend ssl_frontend_2BACK in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping frontend Star in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping backend StarBACK in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping frontend Staging in 0 ms.
May 18 19:36:10 indya-lb haproxy[7375]: Stopping backend StagingBACK in 0 ms.
May 18 19:36:10 indya-lb haproxy[13147]: Proxy ssl_frontend_2BACK started.
May 18 19:36:10 indya-lb haproxy[13147]: Proxy StarBACK started.
May 18 19:36:10 indya-lb haproxy[13147]: Proxy StagingBACK started.
May 18 19:36:10 indya-lb haproxy[7375]: Proxy ssl_frontend_1 stopped (FE: 3886 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy ssl_frontend_1BACK stopped (FE: 0 conns, BE: 3583 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy ssl_frontend_2 stopped (FE: 0 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy ssl_frontend_2BACK stopped (FE: 0 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy Star stopped (FE: 60927284 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy StarBACK stopped (FE: 0 conns, BE: 59690087 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy Staging stopped (FE: 0 conns, BE: 0 conns).
May 18 19:36:10 indya-lb haproxy[7375]: Proxy StagingBACK stopped (FE: 0 conns, BE: 0 conns).
May 18 20:09:32 indya-lb haproxy[13204]: Proxy ssl_frontend_2 started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy ssl_frontend_2BACK started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy Star started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy StarBACK started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy Staging started.
May 18 20:09:32 indya-lb haproxy[13204]: Proxy StagingBACK started.

We are using the init script to reload haproxy (service haproxy reload) on CentOS, and the script is as follows:

#!/bin/sh
#
# chkconfig: - 85 15
# description: HA-Proxy is a TCP/HTTP reverse proxy which is particularly suited \
#              for high availability environments.
# processname: haproxy
# config: /etc/haproxy.cfg
# pidfile: /var/run/haproxy.pid

# Source function library.
if [ -f /etc/init.d/functions ]; then
  . /etc/init.d/functions
elif [ -f /etc/rc.d/init.d/functions ] ; then
  . /etc/rc.d/init.d/functions
else
  exit 0
fi

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

[ -f /etc/haproxy.cfg ] || exit 1

RETVAL=0

start() {
  # Refuse to start if the configuration does not validate.
  /usr/sbin/haproxy -c -q -f /etc/haproxy.cfg
  if [ $? -ne 0 ]; then
    echo "Errors found in configuration file."
    return 1
  fi

  echo -n "Starting HAproxy: "
  daemon /usr/sbin/haproxy -D -f /etc/haproxy.cfg -p /var/run/haproxy.pid
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch /var/lock/subsys/haproxy
  return $RETVAL
}

stop() {
  echo -n "Shutting down HAproxy: "
  killproc haproxy -USR1
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/haproxy
  [ $RETVAL -eq 0 ] && rm -f /var/run/haproxy.pid
  return $RETVAL
}

restart() {
  # Refuse to restart if the new configuration does not validate.
  /usr/sbin/haproxy -c -q -f /etc/haproxy.cfg
  if [ $? -ne 0 ]; then
    echo "Errors found in configuration file, check it with 'haproxy check'."
    return 1
  fi
  stop
  start
}

check() {
  /usr/sbin/haproxy -c -q -V -f /etc/haproxy.cfg
}

rhstatus() {
  status haproxy
}

condrestart() {
  [ -e /var/lock/subsys/haproxy ] && restart || :
}
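For comparison, the excerpt above shows no reload function. A typical reload on CentOS validates the configuration and then starts a new haproxy process with -sf so that it takes over the listeners from the old PIDs. The following is only a sketch along the lines of the standard init scripts, reusing the same paths as above; the actual reload target in use may differ:

reload() {
  # Validate the new configuration first; reloading with a broken config
  # could otherwise leave no running process at all.
  /usr/sbin/haproxy -c -q -f /etc/haproxy.cfg
  if [ $? -ne 0 ]; then
    echo "Errors found in configuration file."
    return 1
  fi
  # -sf asks the old PIDs to finish their connections and exit once the new
  # process is bound, so frontends are taken over without a stop/start gap.
  /usr/sbin/haproxy -D -f /etc/haproxy.cfg -p /var/run/haproxy.pid \
      -sf $(cat /var/run/haproxy.pid)
}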

Re: ACL routing help

2012-05-29 Thread Chris Sarginson

Where you have

acl is_somedomain hdr_beg(host) -i www.somedomain.com

Change it to

acl is_somedomain hdr_beg(host) -i somedomain.com www.somedomain.com

Space delimited fields are permitted, and apparently quite efficient :)
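For example, a minimal sketch (is_somedomain and the FARMX-HTTP backend name are placeholders, not taken from your config):

acl is_somedomain hdr_beg(host) -i somedomain.com www.somedomain.com
# FARMX-HTTP stands in for whatever backend serves farm X
use_backend FARMX-HTTP if is_somedomain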

Chris

On 29/05/2012 17:53, Lofland, Bryan W. wrote:

I have an active/passive LB setup and I have multiple domains and applications 
behind the setup.  I am hoping you can help me see if something is possible.

I have a rule that states that www.somedomain.com gets forwarded to farm X.  I 
have another rule that states that staging.somedomain.com gets routed to farm 
Y.  I have been asked if I can allow http://somedomain.com to work.  I 
understand that I have to mess with DNS as well, but from an HAProxy 
perspective can I add an ACL that allows this to route to farm X?  The existing 
rules would also need to continue to work.  So staging.somedomain.com would 
still need to route to farm Y, etc.  Below are my rules per my config.


frontend http-in
mode http

## ACLs ##
## These are test ones that would direct clients
## to different backends depending on the
## host or domain field in the host header

acl is_lsr hdr_beg(host) -i www.domaina.com
acl is_domainb hdr_beg(host) -i domainb.com
acl is_domainb hdr_beg(host) -i www.domainb.com
acl is_domainb dst 10.101.69.96
acl is_domainb hdr_beg(host) 63.239.123.254
acl is_domainc hdr_dom(host) -i domainc.com
acl is_domaind hdr_dom(host) -i domaind.com
acl is_domainc hdr_beg(host) -i 63.239.123.254
acl is_domaind hdr_dom(host) -i domaind.org
acl is_domainc hdr_dom(host) -i domainc.org
acl is_domaind hdr_dom(host) -i domaind.net
acl is_domainc hdr_dom(host) -i domainc.net
acl is_punchout hdr_beg(host) -i punchout.domainb.com
acl is_lbtest hdr_beg(host) -i lbtest.domainb.com
acl is_stg hdr_beg(host) -i staging.domaina.com
acl is_stg hdr_beg(host) -i staging.domainb.com
acl is_stg dst 10.101.69.75
acl is_load hdr_beg(host) -i load.domaina.com
acl is_load hdr_beg(host) -i load.domainb.com
acl is_domaine hdr_dom(host) -i domaine.org
acl is_domainf hdr_dom(host) -i domainf.org
acl is_domaine hdr_dom(host) -i domaine.com
acl is_domainf hdr_dom(host) -i domainf.com
acl is_domaine hdr_dom(host) -i domaine.net
acl is_domainf hdr_dom(host) -i domainf.net
redirect location http://www.domaine.org if is_domaine
redirect location http://www.domaine.org if is_domainf
redirect location http://www.domainc.com if is_domainc or is_domaind
use_backend XYZ-HTTP if is_lsr or is_lbtest
use_backend DOMA-HTTP if is_dharmacon or is_punchout
use_backend DOMB-HTTP if is_open
use_backend STG-HTTP if is_stg or is_load
use_backend DOMF-HTTP if is_domaine or is_domainf
## ACLs ending ##

Thanks,

Bryan








Re: Problems with layer7 check timeout

2012-05-29 Thread Kevin M Lange
I've been monitoring our service availability check (an HTTP HEAD of a 
resource that truly reports the availability status of the application).  
Under normal circumstances, the check takes 2-3 seconds.  We found 
periods of time where the application would take 15+ seconds and fail (I 
did not capture the HTTP code, but I'm pretty sure it was a 500 series from 
what I've been looking through).  These failure periods match the times 
when haproxy was indicating timeouts of 1002ms.  So, it looks like 
haproxy is doing its job.  Is this then a bug in the logging of the 
timeout value (reporting 1002ms vs 15000+ms)?
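For reference, the check setup is along these lines (the URI, address and timings below are only illustrative, not our exact values); timeout check bounds how long a slow answer to the HEAD request may take before the check is counted as failed:

backend app
    option httpchk HEAD /availability HTTP/1.0
    # allow a slow application up to 20s to answer the check
    timeout check 20s
    server app1 10.0.0.1:8080 check inter 5s fall 3 rise 2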


We haven't had any problems since 25 May, but we're keeping watch.

- Kevin

On 5/25/12 11:18 AM, Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] wrote:

Willy,
I'll try the patch, but not until next week because of the holiday 
weekend.  I don't want to make a significant change that I would have 
to support over the long weekend.
I'm capturing tcpdump between the SLB and the three backends.  I'd like to 
have a capture during an outage.  I expect to see something today, 
and I'll send it to you.

- Kevin


On May 25, 2012, at 2:12 AM, Willy Tarreau wrote:


Hi again Kevin,

Well, I suspect that there might be a corner case with the bug I fixed
which might have caused what you observed.

The timeout connect is computed from the last expire date. Since the
check timeout was added upon connection establishment but the task
was woken too late, after a first check failure reported too late
you can have the next check timeout shortened.

It's still unclear to me how it is possible that the check timeout is
reported this small, considering that it's updated once the connect
succeeds. But performing computations in the past is never a good way
to have something reliable.

Could you please apply the attached fix for the bug I mentioned in the
previous mail, to see if the issue is still present? After all, I
would not be totally surprised if this bug has nasty side effects
like this.

Thanks,
Willy

0001-BUG-MINOR-checks-expire-on-timeout.check-if-smaller-.patch


Kevin Lange
kevin.m.la...@nasa.gov mailto:kevin.m.la...@nasa.gov
kla...@raytheon.com mailto:kla...@raytheon.com
W: +1 (301) 851-8450
Raytheon  | NASA  | ECS Evolution Development Program
https://www.echo.com  | https://www.raytheon.com





Re: Problems with layer7 check timeout

2012-05-29 Thread Willy Tarreau
Hi Kevin,

On Tue, May 29, 2012 at 03:08:17PM -0400, Kevin M Lange wrote:
 I've been monitoring our service availability check (an HTTP HEAD of a 
 resource that truly reports the availability status of the application).  
 Under normal circumstances, the check takes 2-3 seconds.  We found 
 periods of time where the application would take 15+ seconds and fail (I 
 did not capture the HTTP code, but I'm pretty sure it was a 500 series from 
 what I've been looking through).  These failure periods match the times 
 when haproxy was indicating timeouts of 1002ms.  So, it looks like 
 haproxy is doing its job.  Is this then a bug in the logging of the 
 timeout value (reporting 1002ms vs 15000+ms)?

This is the strange part, as I didn't manage to get this indication on
my test platform. Would you be willing to send me, in private, the network
capture for a series of checks that were mis-reported? Depending on
how it's segmented and aborted, maybe I could get a clue about what is
happening.

 We haven't had any problems since 25 May, but we're keeping watch.

It reminds me of the old days of early Opterons, where the clock was unsynced
between the cores and jumped back and forth, causing early timeouts
and wrong timer reports. The issue has come back with the use of VMs
everywhere. This led me to implement the internal monotonic clock, which
compensates for jumps, which now cannot exceed 1s. But even with a 1s
jump, this does not explain 15000ms vs 1002ms, so right now I'm a bit stuck.

Regards,
Willy




FW: haproxy conditional healthchecks/failover

2012-05-29 Thread Zulu Chas

am I wildly off course or is this config salvageable?






  Hi!
 
  I'm trying to use HAproxy to support the concepts of offline, in
  maintenance mode, and not working servers.
 
 Any good reason to do that???
 (I'm a bit curious)

Sure.  I want to be able to mark a machine offline by creating a file (as 
opposed to marking it online by creating a file), which is why I can't use 
disable-on-404 below.  This covers situations where I need to take a machine 
out of public-facing operation for some reason, but perhaps I still want it to 
be able to render pages etc -- maybe I'm testing a code deployment once it's 
already deployed in order to verify the system is ready to be marked online.
I also want to be able to mark a machine down for maintenance by creating a 
file, maintenance.html, which apache will nicely rewrite URLs to etc. during 
critical deployment phases or when performing other maintenance.  In this case, 
I don't want it to render pages (usually to replace otherwise nasty-looking 500 
error pages with a nice html facade).
For normal operations, I want the machine to be up.  But if it's not 
intentionally placed offline or in maintenance and the machines fail 
heartbeat checks, then the machine is not working and should not be served 
requests.
Does this make sense?
 
   I have separate health checks
  for each condition and I have been trying to use ACLs to be able to switch
  between backends.  In addition to the fact that this doesn't seem to work,
  I'm also not loving having to repeat the server lists (which are the same)
  for each backend.
 
 Nothing weird here, this is how HAProxy configuration works.
Cool, but variables would be nice to save time and avoid potential 
inconsistencies between sections.
  -- I think it's more like if any of
  these succeed, mark this server online -- and that's what's making this
  scenario complex.
 
 euh, I might be misunderstanding something.
 There is nothing simpler than this: if the health check is successful,
 then the server is considered healthy...

Since it's not strictly binary, as described above, it's a bit more complex.

  frontend staging 0.0.0.0:8080
# if the number of servers *not marked offline* is *less than the total
#   number of app servers* (in this case, 2), then it is considered degraded
acl degraded nbsrv(only_online) lt 2
 
 
 This will match 0 and 1
 
# if the number of servers *not marked offline* is *less than one*, the
#   site is considered down
acl down nbsrv(only_online) lt 1
 
 
 This will match 0, so both your down and degraded ACLs cover the
 same value (0),
 which may lead to an issue later.
 
# if the number of servers without the maintenance page is *less than the
#   total number of app servers* (in this case, 2), then it is
#   considered maintenance mode
acl mx_mode nbsrv(maintenance) lt 2
 
# if the number of servers without the maintenance page is less than 1,
#   we're down because everything is in maintenance mode
acl down_mx nbsrv(maintenance) lt 1
 
 
 Same remark as above.
 
 
# if not running at full potential, use the backend that identified the
#   degraded state
use_backend only_online if degraded
use_backend maintenance if mx_mode
 
# if we are down for any reason, use the backend that identified that fact
use_backend backup_only if down
use_backend backup_only if down_mx
 
 
 Here is the problem (see above).
 The 2 use_backend lines above will NEVER match, because the degraded and
 mx_mode ACLs overlap their values!

Why would they never match?  Aren't you saying they *both* should match and 
wouldn't it then take action on the final match and switch the backend to 
maintenance mode?  That's what I want.  Maintenance mode overrides offline mode 
as a failsafe (since it's more restrictive) to prevent page rendering.
 Do you know the disable-on-404 option?
 It may help you build your configuration the right way (not
 considering a 404 as a healthy response).
 

Yes, but what I actually would need is enable-on-404 :)
Thanks for your feedback!  I'm definitely open to other options, but I'm hoping 
to not have to lose the flexibility described above!
-chaz

  

[PATCH] BUG/MEDIUM: option forwardfor if-none doesn't work with some configurations

2012-05-29 Thread Cyril Bonté
When option forwardfor is enabled in a frontend that uses backends,
if-none ignores the header name provided in the frontend.
This prevents haproxy from adding the X-Forwarded-For header if the option is not
also used in the backend.

This may introduce security issues for servers/applications that rely on the
header provided by haproxy.

A minimal configuration which can reproduce the bug:
defaults
mode http

listen OK
bind :9000

option forwardfor if-none
server s1 127.0.0.1:80

listen BUG-frontend
bind :9001

option forwardfor if-none

default_backend BUG-backend

backend BUG-backend
server s1 127.0.0.1:80
---
 src/proto_http.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/proto_http.c b/src/proto_http.c
index 7cf413d..b41b70a 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -3249,9 +3249,10 @@ int http_process_request(struct session *s, struct buffer *req, int an_bit)
 	 */
 	if ((s->fe->options | s->be->options) & PR_O_FWDFOR) {
 		struct hdr_ctx ctx = { .idx = 0 };
-
 		if (!((s->fe->options | s->be->options) & PR_O_FF_ALWAYS) &&
-		    http_find_header2(s->be->fwdfor_hdr_name, s->be->fwdfor_hdr_len, req->p, &txn->hdr_idx, &ctx)) {
+		    http_find_header2(s->be->fwdfor_hdr_len ? s->be->fwdfor_hdr_name : s->fe->fwdfor_hdr_name,
+		                      s->be->fwdfor_hdr_len ? s->be->fwdfor_hdr_len : s->fe->fwdfor_hdr_len,
+		                      req->p, &txn->hdr_idx, &ctx)) {
 			/* The header is set to be added only if none is present
 			 * and we found it, so don't do anything.
 			 */
-- 
1.7.10




Re: [PATCH] BUG/MEDIUM: option forwardfor if-none doesn't work with some configurations

2012-05-29 Thread Willy Tarreau
On Tue, May 29, 2012 at 11:27:41PM +0200, Cyril Bonté wrote:
 When option forwardfor is enabled in a frontend that uses backends,
 if-none ignores the header name provided in the frontend.
 This prevents haproxy from adding the X-Forwarded-For header if the option is not
 also used in the backend.

Thank you Cyril, applied to both 1.5-dev and 1.4.

Willy




Re: FW: haproxy conditional healthchecks/failover

2012-05-29 Thread Willy Tarreau
On Tue, May 29, 2012 at 08:32:29PM +, Zulu Chas wrote:
 
 am I wildly off course or is this config salvageable?
 

To be honest, your mail with overly long lines (half a kilobyte) is painful
to read, and once I made the effort of reading it, I didn't understand why
you're trying to cross-dress something which already exists and works.
 
The disable-on-404 option is made to permit enabling/disabling a server by a simple
touch or rm. It appears that you want to exactly swap these two commands, and
it really makes no sense to me to modify haproxy to support such a swap in a
script.
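For example, a minimal sketch of that setup (the file name and addresses are just examples):

backend app
    option httpchk HEAD /alive.html HTTP/1.0
    # a 404 on the checked URL puts the server in soft-stop instead of hard down
    http-check disable-on-404
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check

Removing alive.html from a server's docroot makes its check return 404 and drains that server; touching the file again puts it back in rotation.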

Another reason for disabling on 404 is that it will not accidentally enable a
server which was started from an unmounted docroot file system. With your
method, it would still start it.

Also, the suggested way of dealing with very specific health checks is to
write a CGI or servlet to handle the various situations. Most people are
already doing this, and if you absolutely want to use rm to start the
server and touch to stop it, then 5 lines of shell in a CGI will do it.
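For illustration only (the file locations are just examples), such a check CGI can be as simple as:

#!/bin/sh
# Hypothetical health-check CGI; point "option httpchk" at its URL.
# /etc/haproxy-offline present      -> answer 503 (server taken out by hand)
# /var/www/maintenance.html present -> answer 503 (maintenance in progress)
# otherwise                         -> answer 200 (healthy)
if [ -f /etc/haproxy-offline ] || [ -f /var/www/maintenance.html ]; then
    printf "Status: 503 Service Unavailable\r\n"
    printf "Content-Type: text/plain\r\n\r\n"
    echo "down"
else
    printf "Status: 200 OK\r\n"
    printf "Content-Type: text/plain\r\n\r\n"
    echo "up"
fi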

Regards,
Willy




Re: No PID file when running in foreground

2012-05-29 Thread Willy Tarreau
On Wed, May 23, 2012 at 09:08:15AM -0400, Chad Gatesman wrote:
 Is there a major reason the -p option to generate a pid file is ignored
 when running haproxy the foreground (e.g. using -db)?  It would be nice if
 this file was still generated when specified--even in foreground mode.
 Could this be something that could be changed in a future releases?

No, it's not planned, because all -dXXX options are mainly debugging/development
switches. -db is used all the time during development since it allows
one to stop haproxy with a simple Ctrl-C. This is the only way I start it
when developing or troubleshooting configs. Having haproxy fail to start
because the directory for the pid file is unwritable would be really
annoying.

I really don't understand why this would be useful to you. A pid file makes
sense for a background process since it saves you from searching for it. But for a
foreground process, what's the purpose? Normally you're supposed to stop
it using Ctrl-C, so I fail to see your use case.

Regards,
Willy