Thanks much, I'll start digging into the sysctls. I'm reasonably certain it isn't something with the app servers, because in the tcpdump output I can see the conversation between the load balancer and the app server complete successfully (all aspects of the request/response, even); it's just the leg from the load balancer to the client machines that gets tetchy. I will try the retry value, though; it certainly wouldn't hurt and sounds like a good idea.
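For the archives, the sysctl bumps you mention would presumably go in /etc/sysctl.conf along these lines -- the values below are only illustrative starting points I'm guessing at for our box, not recommendations:

```
# illustrative starting points only -- tune carefully, per Reyk's warning
kern.maxfiles=16384          # system-wide open file descriptor limit
kern.somaxconn=1024          # listen(2) backlog ceiling
kern.maxclusters=32768       # mbuf clusters available for network buffers
net.inet.ip.ifq.maxlen=512   # IP input queue length
```

I'll watch netstat -m and fstat output under load before and after to see which of these we actually bump into.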
Thanks again,

;P mn

On 2007/11/19 2:20 AM, "Reyk Floeter" <[EMAIL PROTECTED]> muttered eloquently:

> hi!
>
> are you sure that the apaches are not dropping the connections when
> you reach a specific limit of max connections? i've seen problems like
> this with apache2+linux webservers.
>
> - make sure that you tuned some sysctls for hoststated. for example
>   kern.maxfiles, kern.somaxconn, kern.maxclusters,
>   net.inet.ip.ifq.maxlen. you have to be very careful when tuning the
>   sysctls, but you mostly always have to bump them up for L7 load
>   balancing.
>
> - try out the "retry" option in the table configuration. this is a
>   work-around for buggy backends. i experienced that the _backend_
>   servers sometimes drop the inbound connection attempts, so i added
>   this option to immediately retry it... which works very well.
>
>   table foo {
>           real port 80
>           check http '/ZendPlatform/client/getPing.php' code 200
>
>           host $www01 retry 2
>           host $www02 retry 2
>           host $www03 retry 2
>           ...
>
>           demote carp
>   }
>
> reyk
>
> On Mon, Nov 19, 2007 at 12:14:18AM -0800, Preston Norvell wrote:
>> We have been trying to migrate from an Apache proxy balancer to
>> hoststated and have run into a couple of issues, one of which I have
>> asked about already and the other of which I write about now.
>>
>> We are using 4.2-stable:
>> OpenBSD mesh1 4.2 GENERIC.MP#1378 amd64
>>
>> This particular issue is rather odd, such that I'm afraid my
>> description may be somewhat confusing, but here goes...
>>
>> We are doing layer 7 HTTP load balancing for an application hosted on
>> 8+ machines behind the hoststated box for clients on the Internet. In
>> our testing, we seem to have an issue with hoststated somewhat
>> randomly dropping inbound connections to a resource behind it. It is
>> not exactly deterministic, in that we cannot seem to generate a
>> specific packet to make the connection fail, but it's just about
>> statistically guaranteed to fail.
>> The failure rate goes up as the traffic increases, though even a
>> sequential run of 1000 single connections is likely to fail once or
>> twice.
>>
>> From a tcpdump standpoint, I see the connection established through
>> the load balancer. The GET request is issued from the client machine,
>> which is delivered by hoststated to the server, which dutifully
>> considers the request and returns a valid response. Oddly though, on
>> the client-facing side of the load balancer, immediately after the
>> GET request is received, a FIN is sent from the load balancer itself.
>>
>> As stated, the likelihood of this occurring goes up with more
>> traffic, even with low-bandwidth request/response sequences. The only
>> message of any import in any log I've looked in is the following from
>> /var/log/daemon:
>>
>> Nov 18 17:17:02 mesh1 hoststated[1945]: relay appx, session 2948 (50
>> active), a.b.c.d -> 10.100.0.208:8080, session failed
>>
>> There are no blocks in pf, and no errors as far as the app server is
>> concerned. The connections work fine through a similarly configured
>> OpenBSD firewall without hoststated in the loop.
>>
>> I'm not sure where to start looking next to narrow down the issue
>> further; does anyone have any suggestions?
>>
>> Thanks much,
>>
>> ;P mn
>>
>> --
>> Preston M Norvell <[EMAIL PROTECTED]>
>> Systems/Network Administrator
>> Serials Solutions <http://www.serialssolutions.com>
>> Phone: (866) SERIALS (737-4257) ext 1094

--
Preston M Norvell <[EMAIL PROTECTED]>
Systems/Network Administrator
Serials Solutions <http://www.serialssolutions.com>
Phone: (866) SERIALS (737-4257) ext 1094