Server health check being called from each pool
I have a master-master-master MySQL DB cluster, but I run into deadlocks if writes from one web node are spread across multiple DB servers, so I have this:

    listen QA-Single-DB1:23321
        bind 127.0.0.1:23321
        option httpchk
        default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
        server db1 db1:3306 check
        server db2 db2:3306 check backup
        server db3 db3:3306 check backup

    listen QA-Single-DB2:23322
        bind 127.0.0.1:23322
        option httpchk
        default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
        server db2 db2:3306 check
        server db3 db3:3306 check backup
        server db1 db1:3306 check backup

    listen QA-Single-DB3:23323
        bind 127.0.0.1:23323
        option httpchk
        default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
        server db3 db3:3306 check
        server db1 db1:3306 check backup
        server db2 db2:3306 check backup

This works, but each listen section is doing its own health check. Is there any way to specify the health check as a global default?

Not having backup and using balance source would almost work, but I have multiple sites on one server. I would like the sites spread out over the three DB servers, but with fail-over.

Thanks for any help/insight/comments!
Michael
Re: Server health check being called from each pool
You're looking for the track server option, which lets a server take its state from another server's health checks instead of running its own:
http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#track

-Bryan

On Fri, May 1, 2015 at 5:34 PM, Michael Bushey corw...@gmail.com wrote:
> This works, but each listen section is doing a health check. Is there any
> way to specify the health check as a global default?
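A minimal sketch of how track could be applied to the configuration in the question. It assumes a dedicated backend that owns the health checks while the listen sections follow its results; the backend name db-health is hypothetical and does not appear in the original posts, and the fragment has not been tested against the poster's setup.

```
# Hypothetical check-only backend (the name "db-health" is an assumption).
# It carries no traffic; it exists so each DB server is checked once.
backend db-health
    option httpchk
    default-server port 9200 inter 5000 fastinter 2000 rise 2 fall 2
    server db1 db1:3306 check
    server db2 db2:3306 check
    server db3 db3:3306 check

# Each listen section takes server state from db-health via "track".
listen QA-Single-DB1:23321
    bind 127.0.0.1:23321
    server db1 db1:3306 track db-health/db1
    server db2 db2:3306 track db-health/db2 backup
    server db3 db3:3306 track db-health/db3 backup

# QA-Single-DB2 and QA-Single-DB3 would follow the same pattern,
# keeping their original primary/backup ordering.
```

In this layout the server lines that use track leave out check, since their state comes from the tracked server, so each DB server is probed once per interval rather than once per listen section.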
Re: Config option for staging/dev backends?
On 4/30/2015 4:08 PM, Cyril Bonté wrote:
> No, you didn't provide err as the minlevel argument. It should be
> something like:
>
>     log 127.0.0.1 local0 notice err
>
> Also, ensure you don't have a log global somewhere in those backends or
> in the previously declared defaults section.

I now have a log line exactly like that (also tried with warning instead of err) in my dev/staging backends, but I am still getting notifications on those backends via the console and in ssh connections.

    Message from syslogd@ at Fri May 1 10:43:27 2015 ...
    localhost.localdomain haproxy[17258]: backend be-services-dev-8443 has no server available!

    Message from syslogd@ at Fri May 1 10:43:27 2015 ...
    localhost.localdomain haproxy[23754]: backend be-services-stg-8443 has no server available!

    Message from syslogd@ at Fri May 1 11:29:51 2015 ...
    localhost.localdomain haproxy[23754]: backend be-services-stg-8443 has no server available!

    Message from syslogd@ at Fri May 1 11:29:51 2015 ...
    localhost.localdomain haproxy[23754]: backend be-services-dev-8443 has no server available!

Any other ideas?

Thanks,
Shawn
100% epoll_wait loops in 1.5.11
Hi,

We're running haproxy as a TLS unwrapping daemon (socket to socket) and are running into some cases where processes will spin at 100% CPU for 5-30 seconds. It looks related to half-closed or resetting TCP connections out to end users, and it always self-recovers after some amount of time.

The symptoms differ slightly. Usually it looks like this:

    epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) = 1
    read(8881, 0x870ff53, 5) = -1 EAGAIN (Resource temporarily unavailable)
    epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) = 1
    read(8881, 0x870ff53, 5) = -1 EAGAIN (Resource temporarily unavailable)
    epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) = 1
    read(8881, 0x870ff53, 5) = -1 EAGAIN (Resource temporarily unavailable)
    epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 14) = 1
    read(8881, 0x870ff53, 5) = -1 EAGAIN (Resource temporarily unavailable)
    epoll_wait(0, {{EPOLLIN|EPOLLHUP|0x2000, {u32=8881, u64=8881}}}, 200, 13) = 1

EPOLLIN, EPOLLHUP, and EPOLLRDHUP (0x2000, which is also mapped to POLL_HUP internally) are all set. The read() always fails with EAGAIN and it drops immediately back into the epoll loop.

Variations I've seen:

- Occasionally the syscalls are recvfrom() instead of read().
- Occasionally the EPOLLERR flag is *also* set, yet it still loops. So the connection has a fatal problem.
- Occasionally it'll call epoll_wait() in a tight loop, with a combination of the above flags, but never make another syscall.
- ...and I've seen it with just EPOLLIN|RDHUP and no HUP.

The ssl_sock_to_buf function in src/ssl_sock.c appears to be most of the problem. Compared to the raw socket function, nothing is checking for POLL_ERR. However, I'm not confident I know where to place the HUP detection, or how it's looping on epoll_wait() without doing any other syscalls. OpenSSL internally buffers partial packets, but if the remote end is HUP'ed there's probably no way you'll get the rest of the read.

I'm also not really sure how the sockets can end up in this state without haproxy immediately closing them (IN + HUP + EAGAIN - socket was closed, right?). It's an obvious enough weirdness that I feel like I'm reading this wrong.

It does also eventually wiggle into a state where haproxy closes the conn completely, which is why it doesn't spin forever. It's not clear if that's from a timeout or a state change in the socket.

Anything else I can dig up on this? We have some small modifications to haproxy, but largely in unrelated areas of the code. I can share the changes privately with someone if necessary. I haven't figured out a test case yet; I'm hoping the description makes the issue obvious to the authors.

thanks!
-Dormando

(not mailing from my usual address since I got RBL'ed :/ a user on the same IP had a spam field day)
Re: Config option for staging/dev backends?
On 01/05/2015 21:57, Shawn Heisey wrote:
> On 5/1/2015 12:30 PM, Cyril Bonté wrote:
>>> Message from syslogd@ at Fri May 1 11:29:51 2015 ...
>>> localhost.localdomain haproxy[23754]: backend be-services-dev-8443 has no server available!
>>>
>>> Any other ideas?
>>
>> Please provide your configuration, I'm quite sure it's a
>> misconfiguration in it.
>
> I'm very good at user error! In some ways I prefer things that are my
> fault, despite the embarrassment, because it means I can get the problem
> fixed quickly.
>
> Redacted config:
> http://apaste.info/K5J

Ok, that's what I expected ;-)

As I said previously, ensure that you don't have any log global in the defaults section. If you don't want to remove it from the defaults, you can instead add no log wherever you want to reset the logger list. That means you can use a configuration like this:

    backend be-services-stg-8443
        # description Back end for stg services requests.
        no log
        log 127.0.0.1 local0 notice warning

That should do the trick ;-)

-- 
Cyril Bonté
Re: Config option for staging/dev backends?
On 5/1/2015 12:30 PM, Cyril Bonté wrote:
>> Message from syslogd@ at Fri May 1 11:29:51 2015 ...
>> localhost.localdomain haproxy[23754]: backend be-services-dev-8443 has no server available!
>>
>> Any other ideas?
>
> Please provide your configuration, I'm quite sure it's a
> misconfiguration in it.

I'm very good at user error! In some ways I prefer things that are my fault, despite the embarrassment, because it means I can get the problem fixed quickly.

Redacted config:
http://apaste.info/K5J

Thanks,
Shawn
Re: Config option for staging/dev backends?
On 01/05/2015 19:36, Shawn Heisey wrote:
> On 4/30/2015 4:08 PM, Cyril Bonté wrote:
>> No, you didn't provide err as the minlevel argument. It should be
>> something like:
>>
>>     log 127.0.0.1 local0 notice err
>>
>> Also, ensure you don't have a log global somewhere in those backends or
>> in the previously declared defaults section.
>
> I now have a log line exactly like that (also tried with warning instead
> of err) in my dev/staging backends, but I am still getting notifications
> on those backends via the console and in ssh connections.
>
> Message from syslogd@ at Fri May 1 10:43:27 2015 ...
> localhost.localdomain haproxy[17258]: backend be-services-dev-8443 has no server available!
>
> Message from syslogd@ at Fri May 1 10:43:27 2015 ...
> localhost.localdomain haproxy[23754]: backend be-services-stg-8443 has no server available!
>
> Message from syslogd@ at Fri May 1 11:29:51 2015 ...
> localhost.localdomain haproxy[23754]: backend be-services-stg-8443 has no server available!
>
> Message from syslogd@ at Fri May 1 11:29:51 2015 ...
> localhost.localdomain haproxy[23754]: backend be-services-dev-8443 has no server available!
>
> Any other ideas?

Please provide your configuration, I'm quite sure it's a misconfiguration in it.

-- 
Cyril Bonté
Sharing a generic script for OCSP stapling retrieval
I've built a shell script that will gather OCSP responses with the 'openssl' binary for a list of certificates. This will be very helpful for me when I get a production haproxy running that does OCSP stapling. It consists of a script, a config file, and a set of certificates that probably need to be in PEM format.

Here is the script and an example of a config file for that script, in case this might be helpful for anyone else:

http://apaste.info/Ume
http://apaste.info/kN8

Each line of the config file consists of two filenames separated by a space. The first filename must include the site certificate as the first certificate in the file - just like you need for the crt option on the bind parameter in haproxy.cfg. The second filename must contain the certificate for the issuing CA, which normally you will download from the CA website.

If you find problems or have suggestions, don't hesitate to let me know. The script may require some customization for *NIX operating systems other than Ubuntu 14 with openssl installed from source.

I will be running this script hourly, but OCSP responses typically have a valid lifetime well beyond one hour, so you could run it less frequently. The script does not appear to require bash -- on Ubuntu 14, /bin/sh is symlinked to dash, and it works.

Thanks,
Shawn
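For readers who cannot reach the paste links, the approach described above might look roughly like the sketch below. This is not Shawn's script: the function names are hypothetical, comment handling in the config file is an assumption, and -noverify is used only to keep the sketch short (a production script would more likely verify the response against the issuer chain). The one convention taken from HAProxy itself is that the response is stored next to the certificate with an .ocsp suffix.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the script linked in the post.
# Config format (from the post): "<site-cert-file> <issuer-cert-file>" per line.

# HAProxy looks for the stapled response in "<crt file>.ocsp".
ocsp_out_path() {
    printf '%s.ocsp\n' "$1"
}

# Fetch one OCSP response with the openssl binary.
fetch_ocsp() {
    cert=$1
    issuer=$2
    # Read the responder URL from the certificate itself.
    url=$(openssl x509 -noout -ocsp_uri -in "$cert") || return 1
    [ -n "$url" ] || return 1
    out=$(ocsp_out_path "$cert")
    # Write to a temp file first so a failed fetch never replaces a good response.
    # -noverify skips response verification (an assumption made for brevity).
    openssl ocsp -noverify -issuer "$issuer" -cert "$cert" \
        -url "$url" -respout "$out.tmp" >/dev/null 2>&1 \
        || { rm -f "$out.tmp"; return 1; }
    mv "$out.tmp" "$out"
}

# Process every "cert issuer" pair in the config file given as $1.
process_config() {
    while read -r cert issuer; do
        # Skip blank lines and "#" comments (comment support is an assumption).
        case $cert in ''|'#'*) continue ;; esac
        fetch_ocsp "$cert" "$issuer" || echo "OCSP fetch failed: $cert" >&2
    done < "$1"
}

# Example call (path is hypothetical):
#   process_config /etc/haproxy/ocsp-certs.conf
```

HAProxy loads the .ocsp file when it starts or reloads, so a cron job built on something like this would normally be followed by a reload.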