OK, this is somewhat funny, but I was mostly done with this email when a VERY similar-sounding problem was asked a few minutes ago...
All,

Long story short(ish): We put HAProxy in front of a few servers that generate dynamic pages from a database. Here's a crude description of the setup:

HAProxy -> 2 to 10 Apache servers -> Gateway (connection to db) -> Local caching database server ---(LAN or WAN)-> Database

The point is that if the page is cached, the local caching db server replies very fast. If not, it may take a few seconds to respond.

We've also found that we basically HAVE to use keep-alive, if that makes a difference. For example, an image loads in well under a second without HAProxy and in perhaps 0.5 to 1.5 seconds through HAProxy with keep-alive on, whereas with keep-alive off the same image on the same page takes 12-18 seconds.

Here's where things get a bit... tricky? We have the HTTP health check (httpchk) disabled. This is essentially because it's not working for us, at least not the way we'd like it to. In a nutshell, we were getting a LOT of false positives where a server was listed as "up going down" or down when in reality a non-cached page was simply taking a couple of seconds (probably 3-5, but definitely less than 10) to load.

The point is, we get several 503 errors throughout the day, and they appear to be random. Apache never goes down or reports an error. Frankly, I think what's happening is that HAProxy hits a server which takes too long to respond, so it tries another server (which also doesn't have the page cached) and goes through the list until it gives up and reports a 503. Meanwhile, if you go directly to the page on the Apache server, it loads fine. If you reload through HAProxy, it works fine as well.

I'm just wondering where to start with this. We have several sites experiencing the same problem, but since we're using roughly the same setup for each one, I'm not opposed to saying it could be how we have HAProxy set up.

TIA.
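
P.S. To make the question more concrete, here is a rough sketch of the kind of settings I suspect matter here. This is NOT our actual config: the backend and server names, the addresses, the /health.html check URL, and every timeout value are placeholders made up for illustration. The idea would be to point the health check at a cheap static file so it doesn't depend on whether a page is cached, and to keep the server timeout comfortably above the slowest non-cached page.

```
defaults
    mode http
    option http-server-close        # keep-alive toward the client, close the server side after each response
    timeout connect 5s              # placeholder value
    timeout client  30s             # placeholder value
    timeout server  30s             # placeholder; should exceed the slowest non-cached page (3-5 s for us)
    retries 3
    option redispatch               # allow a retry to go to a different server

backend apache_farm                 # hypothetical backend name
    option httpchk GET /health.html # hypothetical static URL that never touches the database
    timeout check 10s               # placeholder; extra time allowed for the check response
    server web1 192.0.2.11:80 check inter 5s fall 3 rise 2
    server web2 192.0.2.12:80 check inter 5s fall 3 rise 2
```

As far as I understand it, `fall 3` means a server is only marked down after three consecutive failed checks and `rise 2` means it needs two consecutive good checks to come back, so a single slow response shouldn't take a server out of rotation. Whether values like these would actually stop our 503s is exactly what I'm unsure about.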