RE: Doesn't work for a very few visitors
This was definitely not caused by too many connections or by the conntrack table being full. In my test setup there were only five people trying to connect: two who were having the problem and three of us for whom it worked fine. Still, the strange thing is that the exact same iptables setup works fine for everyone involved when connecting directly to the Apache server, but not when going through HAProxy. I also tried a couple of other load balancing solutions and got the same result (Apache fine, load balancing solution failing). The only load balancing solution that worked was Apache mod_balancer, but it is too basic for my needs.

Anyway, on the load balancer I don't need a very sophisticated iptables setup. As long as I can do basic protection and run HAProxy, everything will be fine.

Thanks!

-- Joe Torsitano

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Saturday, December 19, 2009 9:58 PM
To: John Lauro
Cc: 'Joe Torsitano'; haproxy@formilux.org
Subject: Re: Doesn't work for a very few visitors
Re: Doesn't work for a very few visitors
On Sat, Dec 19, 2009 at 05:14:42PM -0500, John Lauro wrote:
> Are you using connection tracking with iptables? If so, you might want to
> consider using a more basic configuration without connection tracking.

Indeed! Most likely you have a rule somewhere which does a REJECT on INVALID packets, and those poor users are running a buggy TCP stack which breaks window scaling, SACKs or things like this, regularly causing some INVALID packets to be detected by the conntrack code. Once I even found a user who was doing all of his browsing using the same TCP source port! You bet the conntrack has good reasons to complain.

The other common issue with conntrack as shipped in common distros is that it's tuned for a desktop system (i.e. not tuned), and the table fills very fast when you use that on a server. You can easily detect this by messages in kernel logs: "Conntrack table is full".

Regards,
Willy
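Putting John's suggestion quoted above together with Willy's explanation, a firewall for the HAProxy box could skip connection tracking for client HTTP traffic and simply never REJECT or DROP on the INVALID state. The script below is a minimal sketch, not a tested production ruleset. It assumes (none of this comes from the thread) that clients connect on port 80, that SSH (22) and the stats listener (2680) should only be reachable from an admin network given by the placeholder ADMIN_NET (default 192.0.2.0/24, a documentation range), and that HAProxy's own connections to the Apache servers may stay tracked. The gen_ruleset function name is hypothetical; it only prints an iptables-restore ruleset and applies nothing.

```shell
#!/bin/sh
# Sketch only: prints (does not apply) an iptables-restore ruleset for the
# HAProxy host. Assumptions, not from the thread: clients use port 80; SSH
# (22) and the stats listener (2680) are limited to a placeholder admin
# network; HAProxy's own connections to the Apache servers stay tracked.
ADMIN_NET="${ADMIN_NET:-192.0.2.0/24}"

gen_ruleset() {
    # Heredoc lines must stay unindented; "#" lines are comments that
    # iptables-restore ignores.
    cat <<EOF
*raw
# Do not track client <-> HAProxy HTTP packets in either direction, so they
# use no conntrack table entries and are never classified as INVALID.
# (Equivalent newer spelling: -j CT --notrack)
-A PREROUTING -p tcp --dport 80 -j NOTRACK
-A OUTPUT -p tcp --sport 80 -j NOTRACK
COMMIT
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
# Untracked client traffic is matched statelessly on the port alone.
-A INPUT -p tcp --dport 80 -j ACCEPT
# Later packets of tracked connections: admin SSH sessions and
# HAProxy's own connections to the backends.
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p tcp -s $ADMIN_NET --dport 22 -j ACCEPT
-A INPUT -p tcp -s $ADMIN_NET --dport 2680 -j ACCEPT
# ICMP includes the "fragmentation needed" messages path MTU discovery needs.
-A INPUT -p icmp -j ACCEPT
COMMIT
EOF
}
```

One way to use it: "gen_ruleset > /tmp/haproxy.rules", then "iptables-restore --test < /tmp/haproxy.rules" to have it parsed without being committed, and only then "iptables-restore < /tmp/haproxy.rules", preferably from a console so a mistake cannot lock out SSH. Because port-80 packets are marked NOTRACK, they neither consume conntrack table entries nor can be classified INVALID, which sidesteps both failure modes described above.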
RE: Doesn't work for a very few visitors
Are you using connection tracking with iptables? If so, you might want to consider using a more basic configuration without connection tracking.

What does your iptables configuration look like?

From: Joe Torsitano [mailto:jtorsit...@weatherforyou.com]
Sent: Saturday, December 19, 2009 4:25 PM
To: Willy Tarreau
Cc: haproxy@formilux.org
Subject: Re: Doesn't work for a very few visitors
Re: Doesn't work for a very few visitors
Hi Willy,

I have been using iptables on the HAProxy servers. Luckily I found a couple of willing test subjects who were having the problem, and shutting off iptables seemed to correct it (they could then see the sites). I use a pretty basic iptables configuration, just to restrict access to SSH and close off all unused ports.

What is it about iptables that HAProxy doesn't get along with? Is there an iptables or other firewall configuration that will work with HAProxy, or do I just have to leave the server HAProxy is running on pretty much wide open?

Thanks for the information.

-- Joe Torsitano

On Fri, Dec 18, 2009 at 11:04 PM, Willy Tarreau wrote:
Re: Doesn't work for a very few visitors
On Fri, Dec 18, 2009 at 05:00:38PM -0800, Joe Torsitano wrote:
> Hi Willy,
>
> What's strange is traffic still appears normal, and is, for probably at
> least 99% of the visitors. Logged traffic remains about normal (hundreds of
> thousands of visitors a day). I just get a few e-mails asking why the site
> has been down for days or when it will be back. But I cannot recreate the
> problem. And I know there are probably people who just don't e-mail and,
> unfortunately, don't come back.

Yes, very possible, unfortunately.

> Here is the config file with the IP addresses changed, pretty much the
> default that comes with it...

A few questions that come to mind:

- What version are you running, by the way (haproxy -vv)? Several cases of truncated responses were observed between 1.3.16 and 1.3.18, and sometimes a 502 response could be sent if the server closed too fast before 1.3.19. So please ensure you're on 1.3.22. More info about the bugs in your version here:
  http://haproxy.1wt.eu/knownbugs-1.3.html

- Have you tried to look for client errors in the logs?

- Have you tried to look in the logs for some of the complainers' traces? Most often, you can check for the same class-B or class-C addresses as the IP that posted the mail, and try to isolate the accesses by taking the access time into account.

- Are you sure that 2000 concurrent connections are enough? You may check that in the logs too, as there is a field with connection counts.

- I'm seeing there is no "option httpclose" below. Could you try to add it in the defaults section and see if it changes anything? Before doing that, please check that you don't have iptables enabled on your haproxy machine.

I'm also thinking about something else. You said that when you don't go through haproxy you don't get any complaints. Are your systems configured similarly? I mean, the very low rate of problems could very well be caused by some TCP settings which are incompatible with a minority of users running behind a buggy router/firewall.

In order to check this, you could run the following command on each server (including the one with haproxy):

  $ sysctl -a | fgrep net.ipv4.tcp

Please verify that tcp_ecn and tcp_window_scaling have the same values. If not, start by setting tcp_ecn to 0 on the haproxy server. Later you can try to similarly disable tcp_window_scaling, though this one is far less likely to be the culprit because it's enabled almost everywhere.

Also check with "ip route" and "ip address" on all servers whether you see a different MTU value on the default route. It's possible that a small part of your clients are still running a misconfigured PPPoE ADSL line and can't send/receive full packets. There are still some large sites that deal with that by setting their MTU to 1492 or even 1452 on the external interface. But this is less likely.

Regards,
Willy
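Some of the checks Willy lists above lend themselves to small read-only helpers. The sketch below uses POSIX sh plus GNU "sort -V"; the function names are hypothetical. It assumes the sysctl output of each server was saved to a file (for example with "sysctl -a | fgrep net.ipv4.tcp > lb.txt"), and that haproxy logs use the "option httplog" format, in which the actconn/feconn/beconn/srv_conn/retries field comes right after the four-character termination state. feconn is the per-frontend count that a proxy's "maxconn" limits.

```shell
#!/bin/sh
# Read-only helpers for some of the checks suggested above (names are
# hypothetical). Assumes GNU "sort -V" and haproxy logs in "option httplog"
# format, where the actconn/feconn/beconn/srv_conn/retries field comes right
# after the 4-character termination state (e.g. "----").

# Print "key: A vs B" for each TCP option that differs between two saved
# "sysctl -a" dumps (lines look like "net.ipv4.tcp_ecn = 2").
tcp_sysctl_diff() {
    for key in net.ipv4.tcp_ecn net.ipv4.tcp_window_scaling; do
        a=$(awk -v k="$key" '$1 == k { print $3 }' "$1")
        b=$(awk -v k="$key" '$1 == k { print $3 }' "$2")
        [ "$a" = "$b" ] || echo "$key: $a vs $b"
    done
}

# Print the highest feconn (concurrent connections on the frontend) found
# in an httplog-format log file, to compare with the proxy's "maxconn".
peak_feconn() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($(i - 1) ~ /^[-A-Za-z][-A-Za-z][-A-Za-z][-A-Za-z]$/ &&
                $i ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+\/\+?[0-9]+$/) {
                split($i, c, "/")
                if (c[2] + 0 > max) max = c[2] + 0
                break
            }
        }
    } END { print max + 0 }' "$1"
}

# Succeed when version $1 is strictly older than version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

For instance, "tcp_sysctl_diff lb.txt web1.txt" prints nothing when both values match; "peak_feconn /var/log/haproxy.log" (log path assumed) gives a figure to compare with the "maxconn 2000" in the defaults section of Joe's configuration; and version_lt "$(haproxy -vv | awk 'NR == 1 { print $3 }')" 1.3.22 succeeds when the running version predates 1.3.22, assuming the first line of "haproxy -vv" reads "HA-Proxy version X.Y.Z ...".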
RE: Doesn't work for a very few visitors
Hi Willy,

What's strange is that traffic still appears normal, and is normal, for probably at least 99% of the visitors. Logged traffic remains about normal (hundreds of thousands of visitors a day). I just get a few e-mails asking why the site has been down for days or when it will be back, but I cannot recreate the problem. And I know there are probably people who just don't e-mail and, unfortunately, don't come back.

Here is the config file with the IP addresses changed, pretty much the default that comes with it:

    # this config needs haproxy-1.1.28 or haproxy-1.2.1

    global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        #log loghost    local0 info
        maxconn 4096
        chroot /var/lib/haproxy
        user haproxy
        group haproxy
        daemon
        #debug
        #quiet

    defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option  redispatch
        maxconn 2000
        contimeout  5000
        clitimeout  5
        srvtimeout  5

    listen HTTP 12.34.56.78:80
        mode http
        balance roundrobin
        server httpA 98.76.54.32:80 check inter 35000 rise 3 fall 3
        server httpB 23.45.67.89:80 check inter 35000 rise 3 fall 3
        source 10.176.192.82

    listen stats :2680
        mode http
        stats uri /

-- Joe Torsitano

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Thursday, December 17, 2009 9:46 PM
To: Joe Torsitano
Cc: haproxy@formilux.org
Subject: Re: Doesn't work for a very few visitors
Re: Doesn't work for a very few visitors
Hi,

On Thu, Dec 17, 2009 at 02:07:41PM -0800, Joe Torsitano wrote:
> Whenever I turn on HAProxy everything appears to be working great. However
> I always get two or three e-mails from people who ask when the site is going
> to be back up. They say the site can no longer be found in their browser,
> even people who have it bookmarked. They say it's like the server is down.
> As soon as I switch off HAProxy and have the requests delivered directly
> they say everything is fine again. Unfortunately I've never been able to
> recreate it on almost a dozen computers I try. I'm attempting to use
> HAProxy in HTTP mode with Apache servers. Using the Apache load balancer
> works. Any ideas?

Not at all, sounds very strange. Are you sure you don't have too short client-side timeouts, which would be compatible with your local computers but not with remote clients (e.g. 10 ms)? Otherwise, please post your config (you can mask your IPs if you want).

Willy
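Willy's suspicion of a client-side timeout short enough to hurt only distant clients can be checked mechanically. The sketch below scans an haproxy configuration file and reports connect/client/server timeouts under a threshold; the short_timeouts name is hypothetical. It assumes bare numbers are milliseconds (haproxy's default unit for these keywords), skips values written with a unit suffix such as "50s", ignores commented-out lines, and uses an arbitrary 1000 ms default threshold.

```shell
#!/bin/sh
# Sketch: report connect/client/server timeouts in an haproxy config that are
# below a threshold (hypothetical helper). Assumes bare numbers are in
# milliseconds; values with a unit suffix (e.g. "50s") are skipped here.
short_timeouts() {
    cfg=$1
    min_ms=${2:-1000}   # arbitrary default threshold, in milliseconds
    awk -v min="$min_ms" '
        # haproxy 1.3 keywords: contimeout / clitimeout / srvtimeout <ms>
        $1 ~ /^(con|cli|srv)timeout$/ && $2 ~ /^[0-9]+$/ {
            name = $1; val = $2
        }
        # newer keywords: timeout connect|client|server <ms>
        $1 == "timeout" && $2 ~ /^(connect|client|server)$/ && $3 ~ /^[0-9]+$/ {
            name = "timeout " $2; val = $3
        }
        name != "" {
            if (val + 0 < min + 0)
                printf "line %d: %s %s ms is below %s ms\n", NR, name, val, min
            name = ""
        }
    ' "$cfg"
}
```

Usage would be along the lines of "short_timeouts /etc/haproxy/haproxy.cfg" (path assumed), optionally with a different threshold in milliseconds as the second argument; each reported line gives the config line number, keyword and value.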