Server health check being called from each pool

2015-05-01 Thread Michael Bushey
I have a master-master-master MySQL DB cluster, but run into deadlocks
if writes from one web node are across multiple DB servers, so I have
this:

   listen QA-Single-DB1:23321
bind 127.0.0.1:23321
option httpchk
default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
server db1 db1:3306 check
server db2 db2:3306 check backup
server db3 db3:3306 check backup

  listen QA-Single-DB2:23322
bind 127.0.0.1:23322
option httpchk
default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
server db2 db2:3306 check
server db3 db3:3306 check backup
server db1 db1:3306 check backup

  listen QA-Single-DB3:23323
bind 127.0.0.1:23323
option httpchk
default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
server db3 db3:3306 check
server db1 db1:3306 check backup
server db2 db2:3306 check backup


This works, but each listen section is doing a health check. Is there
any way to specify the health check as a global default? Not having
backup and using balance source would almost work, but I have
multiple sites on one server. I would like the sites spread out over
the the three DB servers but with fail-over.

Thanks for any help/insight/comments!
Michael



Re: Server health check being called from each pool

2015-05-01 Thread Bryan Talbot
You're looking for track

http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#track

-Bryan


On Fri, May 1, 2015 at 5:34 PM, Michael Bushey corw...@gmail.com wrote:

 I have a master-master-master MySQL DB cluster, but run into deadlocks
 if writes from one web node are across multiple DB servers, so I have
 this:

listen QA-Single-DB1:23321
 bind 127.0.0.1:23321
 option httpchk
 default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
 server db1 db1:3306 check
 server db2 db2:3306 check backup
 server db3 db3:3306 check backup

   listen QA-Single-DB2:23322
 bind 127.0.0.1:23322
 option httpchk
 default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
 server db2 db2:3306 check
 server db3 db3:3306 check backup
 server db1 db1:3306 check backup

   listen QA-Single-DB3:23323
 bind 127.0.0.1:23323
 option httpchk
 default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
 server db3 db3:3306 check
 server db1 db1:3306 check backup
 server db2 db2:3306 check backup


 This works, but each listen section is doing a health check. Is there
 any way to specify the health check as a global default? Not having
 backup and using balance source would almost work, but I have
 multiple sites on one server. I would like the sites spread out over
 the the three DB servers but with fail-over.

 Thanks for any help/insight/comments!
 Michael




Re: Config option for staging/dev backends?

2015-05-01 Thread Shawn Heisey
On 4/30/2015 4:08 PM, Cyril Bonté wrote:
 No, you didn't provide err as the minlevel argument.
 It should be something like :
   log 127.0.0.1   local0 notice err
 
 Also, ensure you don't have a log global somewhere in those backends
 or in the previously declared defaults section.

I now have a log line exactly like that (also tried with warning instead
of err) in my dev/staging backends, but I am still getting notifications
on those backends via the console and in ssh connections.

Message from syslogd@ at Fri May  1 10:43:27 2015 ...
localhost.localdomain haproxy[17258]: backend be-services-dev-8443 has
no server available!
Message from syslogd@ at Fri May  1 10:43:27 2015 ...
localhost.localdomain haproxy[23754]: backend be-services-stg-8443 has
no server available!
Message from syslogd@ at Fri May  1 11:29:51 2015 ...
localhost.localdomain haproxy[23754]: backend be-services-stg-8443 has
no server available!
Message from syslogd@ at Fri May  1 11:29:51 2015 ...
localhost.localdomain haproxy[23754]: backend be-services-dev-8443 has
no server available!

Any other ideas?

Thanks,
Shawn




100% epoll_wait loops in 1.5.11

2015-05-01 Thread dormando
Hi,

We're running a haproxy as a TLS unwrapping daemon (socket to socket) and
are running into some cases where processes will spin at 100% CPU for 5-30
seconds.

It looks related to half-closed or resetting TCP connections out to end
users, and always self-recovers after some amount of time.

The symptoms differ slightly:

Usually it looks like this:
epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) =
1
read(8881, 0x870ff53, 5)= -1 EAGAIN (Resource temporarily
unavailable)
epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) =
1
read(8881, 0x870ff53, 5)= -1 EAGAIN (Resource temporarily
unavailable)
epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) =
1
read(8881, 0x870ff53, 5)= -1 EAGAIN (Resource temporarily
unavailable)
epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) =
1
read(8881, 0x870ff53, 5)= -1 EAGAIN (Resource temporarily
unavailable)
epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 13) =
1

EPOLLIN is set, EPOLLHUP is set, and EPOLLRDHUP (0x2000, which is also
mapped to POLL_HUP internally). The read() always fails as EAGAIN and it
drops immediately back into the epoll loop.

Occasionally the syscalls are recvfrom() instead of read().

Occasionally the EPOLLERR flag is *also* set, yet it still loops. So the
connection has a fatal problem.

Occasionally it'll call epoll_wait() in a tight loop, with a combination of
the above options, but never make a syscall.

...and I've seen it with just EPOLLIN|RDHUP and no HUP.

The src/ssl_sock.c:ssl_sock_to_buf func appears to be most of the problem.
Compared to the raw socket function, nothing is checking for POLL_ERR.
However I'm not confident I know where to place the HUP detection, or how
it's looping with epoll_wait() without doing any syscalls.

OpenSSL internally buffers partial packets, but if the remote end is HUP'ed
there's probably no way you'll get the rest of the read.

I'm also not really sure how the sockets can end up in this state without
haproxy immediately closing them (IN + HUP + EAGAIN - socket was closed,
right?). It's an obvious enough weirdness that I feel like I'm reading this
wrong. It does also eventually wiggle into a state where haproxy closes the
conn completely, which is why it doesn't spin forever. It's not clear if
that's from a timeout or a state change in the socket.

Anything else I can dig up on this? We have some small modifications to
haproxy, but largely in unrelated areas of the code. I can share the
changes privately with someone if necessary. Haven't figured out a test
case yet, hoping the description makes the issue obvious to the authors.

thanks!
-Dormando
(not mailing from my usual address since I got RBL'ed :/ user on the same
IP had a spam field day)


Re: Config option for staging/dev backends?

2015-05-01 Thread Cyril Bonté

Le 01/05/2015 21:57, Shawn Heisey a écrit :

On 5/1/2015 12:30 PM, Cyril Bonté wrote:

Message from syslogd@ at Fri May  1 11:29:51 2015 ...
localhost.localdomain haproxy[23754]: backend be-services-dev-8443 has
no server available!

Any other ideas?


Please provide your configuration, I'm quite sure it's a
misconfiguration in it.


I'm very good at user error!  In some ways I prefer things that are my
fault, despite the embarrassment, because it means I can get the problem
fixed quickly.

Redacted config:

http://apaste.info/K5J


Ok, that's what I expected ;-)
As previously said, ensure that you don't have any log global in the 
defaults.
In order to not remove it from the defaults, you can also add no log 
to reset the logger list where you want.


Which means you can provide such configuration :
backend be-services-stg-8443
# description Back end for stg services requests.
no log
log 127.0.0.1   local0 notice warning

That should do the trick ;-)

--
Cyril Bonté



Re: Config option for staging/dev backends?

2015-05-01 Thread Shawn Heisey
On 5/1/2015 12:30 PM, Cyril Bonté wrote:
 Message from syslogd@ at Fri May  1 11:29:51 2015 ...
 localhost.localdomain haproxy[23754]: backend be-services-dev-8443 has
 no server available!

 Any other ideas?
 
 Please provide your configuration, I'm quite sure it's a
 misconfiguration in it.

I'm very good at user error!  In some ways I prefer things that are my
fault, despite the embarrassment, because it means I can get the problem
fixed quickly.

Redacted config:

http://apaste.info/K5J

Thanks,
Shawn




Re: Config option for staging/dev backends?

2015-05-01 Thread Cyril Bonté

Le 01/05/2015 19:36, Shawn Heisey a écrit :

On 4/30/2015 4:08 PM, Cyril Bonté wrote:

No, you didn't provide err as the minlevel argument.
It should be something like :
   log 127.0.0.1   local0 notice err

Also, ensure you don't have a log global somewhere in those backends
or in the previously declared defaults section.


I now have a log line exactly like that (also tried with warning instead
of err) in my dev/staging backends, but I am still getting notifications
on those backends via the console and in ssh connections.

Message from syslogd@ at Fri May  1 10:43:27 2015 ...
localhost.localdomain haproxy[17258]: backend be-services-dev-8443 has
no server available!
Message from syslogd@ at Fri May  1 10:43:27 2015 ...
localhost.localdomain haproxy[23754]: backend be-services-stg-8443 has
no server available!
Message from syslogd@ at Fri May  1 11:29:51 2015 ...
localhost.localdomain haproxy[23754]: backend be-services-stg-8443 has
no server available!
Message from syslogd@ at Fri May  1 11:29:51 2015 ...
localhost.localdomain haproxy[23754]: backend be-services-dev-8443 has
no server available!

Any other ideas?


Please provide your configuration, I'm quite sure it's a 
misconfiguration in it.



--
Cyril Bonté



Sharing a generic script for OCSP stapling retrieval

2015-05-01 Thread Shawn Heisey
I've built a shell script that will gather OCSP responses with the
'openssl' binary for a list of certificates.  This will be very helpful
for me when I get a production haproxy running that does OCSP stapling.

It consists of a script, a config file, and a set of certificates that
probably need to be PEM format.  Here is the script and an example of a
config file for that script, in case this might be helpful for anyone else:

http://apaste.info/Ume
http://apaste.info/kN8

Each line of the config file consists of two filenames separated by a
space.  The first filename must include the site certificate as the
first certificate in the file - just like you need for the crt option
on the bind parameter in haproxy.cfg.  The second filename must contain
the certificate for the issuing CA, which normally you will download
from the CA website.

If you find problems or have suggestions, don't hesitate to let me know.

The script may require some customization for *NIX operating systems
other than Ubuntu 14 with openssl installed from source.

I will be running this script hourly, but OCSP responses typically have
a valid lifetime well beyond one hour, so you could run it less frequently.

The script does not appear to require bash -- on Ubuntu 14, /bin/sh is
symlinked to dash, and it works.

Thanks,
Shawn