RE: Doesn't work for a very few visitors

2009-12-20 Thread Joe Torsitano
This was definitely not caused by too many connections or by the conntrack
table being full.  In my test setup there were only five people trying to
connect: two who were having the problem and three of us for whom it
worked fine.  The strange thing is that the exact same iptables setup
works fine for everyone involved when connecting directly to the Apache
server, but not when going through HAProxy.  I also tried a couple of other
load balancing solutions and got the same result (Apache works, the load
balancer fails).  The only load balancing solution that worked was Apache's
mod_proxy_balancer, but it is too basic for my needs.

Anyway, on the load balancer I don't need a very sophisticated iptables
setup.  As long as I can do basic protection and run HAProxy, everything
will be fine.

Thanks!


--
 Joe Torsitano


 

 




Re: Doesn't work for a very few visitors

2009-12-19 Thread Willy Tarreau
On Sat, Dec 19, 2009 at 05:14:42PM -0500, John Lauro wrote:
> Are you using connection tracking with iptables?  If so, you might want to
> consider using a more basic configuration without connection tracking.

Indeed!

Most likely you have a rule somewhere which does a REJECT on
INVALID packets and those poor users are running a buggy TCP
stack which breaks window scaling, SACKs or things like this,
regularly causing some INVALID packets to be detected by the
conntrack code.

Once I even found a user who was doing all of his browsing
using the same TCP source port ! You bet the conntrack has
good reasons to complain.

The other common issue with conntrack as shipped in common
distros is that it's tuned for a desktop system (i.e. not tuned).
And the table fills very fast when you use that on a server.
You can easily detect this by messages in kernel logs :
"Conntrack table is full".

Regards,
Willy
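
A minimal sketch of how the two conntrack conditions described above could be checked on the load balancer. The /proc paths are an assumption (they are the ones exposed by nf_conntrack on current kernels; older kernels used different names), and the helper name conntrack_usage is made up for illustration.

```shell
#!/bin/sh
# Print conntrack table usage as a percentage, given the current number
# of tracked connections and the configured maximum.
conntrack_usage() {
    count=$1
    max=$2
    [ "$max" -gt 0 ] || return 1
    echo "$(( count * 100 / max ))%"
}

# Assumed location of the counters; skip silently if they are absent.
dir=/proc/sys/net/netfilter
if [ -r "$dir/nf_conntrack_count" ] && [ -r "$dir/nf_conntrack_max" ]; then
    conntrack_usage "$(cat "$dir/nf_conntrack_count")" \
                    "$(cat "$dir/nf_conntrack_max")" || echo "max is 0"
fi

# A full table is reported in the kernel log; the exact wording depends
# on the kernel version, so the pattern is deliberately loose:
#   dmesg | grep -i 'conntrack.*table full'
# Rules acting on INVALID packets can be listed with (as root):
#   iptables -L -n -v | grep -i INVALID
```

A usage figure close to 100% points at the "table full" case; a low figure combined with a REJECT or DROP rule on INVALID packets points at the buggy-client case.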




RE: Doesn't work for a very few visitors

2009-12-19 Thread John Lauro
Are you using connection tracking with iptables?  If so, you might want to
consider using a more basic configuration without connection tracking.
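
One possible shape of such a basic configuration, as a sketch only. It filters on ports and TCP flags and never uses the state or conntrack match. Ports 80 and 2680 are taken from the HAProxy config posted in this thread; the ADMIN_NET value, the IPT override and the function name are assumptions made for illustration. Accepting every non-SYN TCP segment is the usual price of a stateless filter.

```shell
#!/bin/sh
# Emit a stateless INPUT ruleset. Set IPT=echo to preview the rules
# instead of applying them; ADMIN_NET is a placeholder network.
stateless_rules() {
    ipt=${IPT:-iptables}
    admin_net=${ADMIN_NET:-192.0.2.0/24}

    $ipt -F INPUT
    $ipt -A INPUT -i lo -j ACCEPT
    # Keep ICMP so that path MTU discovery keeps working.
    $ipt -A INPUT -p icmp -j ACCEPT
    # New connections are only allowed to the exposed services.
    $ipt -A INPUT -p tcp --dport 80 -j ACCEPT
    $ipt -A INPUT -p tcp --dport 2680 -j ACCEPT
    $ipt -A INPUT -p tcp -s "$admin_net" --dport 22 -j ACCEPT
    # Anything that is not a bare SYN belongs to an existing connection,
    # including replies from the backend servers to HAProxy.
    $ipt -A INPUT -p tcp ! --syn -j ACCEPT
    # DNS replies to this host.
    $ipt -A INPUT -p udp --sport 53 -j ACCEPT
    # Set the policy last so a mistake above does not cut off access early.
    $ipt -P INPUT DROP
}

# Preview:        IPT=echo stateless_rules
# Apply (root):   stateless_rules
```

Previewing with IPT=echo first is worthwhile when working over SSH, since the final DROP policy applies to any source not covered by ADMIN_NET.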

 

What does your iptables configuration look like?








Re: Doesn't work for a very few visitors

2009-12-19 Thread Joe Torsitano
Hi Willy,

I have been using iptables on the HAProxy servers.  Luckily I found a couple
of willing test subjects who were having the problem and shutting off
iptables seemed to correct it (they could then see the sites).  I use a
pretty basic iptables configuration just to restrict access to SSH and close
off all unused ports.  What is it about iptables that HAProxy doesn't get
along with?  Is there an iptables or other firewall configuration that will
work with HAProxy or do I just have to pretty much leave the server HAProxy
is running on wide open?

Thanks for the information.


-- 
Joe Torsitano





Re: Doesn't work for a very few visitors

2009-12-18 Thread Willy Tarreau
On Fri, Dec 18, 2009 at 05:00:38PM -0800, Joe Torsitano wrote:
> Hi Willy,
> 
> What's strange is traffic still appears normal, and is, for probably at
> least 99% of the visitors.  Logged traffic remains about normal (hundreds of
> thousands of visitors a day).  I just get a few e-mails asking why the site
> has been down for days or when it will be back.  But I cannot recreate the
> problem.  And I know there are probably people who just don't e-mail and,
> unfortunately, don't come back.

Yes, quite possibly, unfortunately.

> Here is the config file with the IP addresses changed, pretty much the
> default that comes with it...

A few questions that come to mind :
- What version are you running by the way (haproxy -vv) ?
  Several cases of truncated responses were observed between
  1.3.16 and 1.3.18, and sometimes a 502 response could be
  sent if the server closed too fast before 1.3.19. So please
  ensure you're on 1.3.22. More info here about the bugs in
  your version :

http://haproxy.1wt.eu/knownbugs-1.3.html

- Have you tried to look for client errors in the logs ?

- Have you tried to look in the logs if you could find some of
  the complainers' traces ? Most often, you can check for the
  same class-B or class-C addresses as the IP that posted the
  mail, and try to isolate the accesses by taking the access
  time into account.

- Are you sure that 2000 concurrent connections are enough ?
  You may check that in the logs too, as there is a field
  with connection counts.

- I'm seeing there is no "option httpclose" below. Could you
  try to add it in the defaults section and see if it changes
  anything ? Before doing that, please check that you don't
  have iptables enabled on your haproxy machine.
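
The log check suggested in the list above can be sketched as follows. It assumes HAProxy's HTTP log format, where the client appears as "ip:port", and treats "class C" as the first three octets of the address. The function name, the sample address and the log path are illustrative only.

```shell
#!/bin/sh
# Print log lines whose client address shares the first three octets
# (the same /24) with the given IP.
same_slash24() {
    ip=$1
    log=$2
    prefix=${ip%.*}                      # 203.0.113.57 -> 203.0.113
    # Escape the dots so they only match literal dots.
    pattern=$(printf '%s' "$prefix" | sed 's/\./\\./g')
    grep -E "(^|[^0-9.])${pattern}\.[0-9]{1,3}:[0-9]+" "$log"
}

# Usage, with the IP taken from the complaint e-mail's headers:
#   same_slash24 203.0.113.57 /var/log/haproxy.log
# Narrow the result further by the time of the report, for example:
#   same_slash24 203.0.113.57 /var/log/haproxy.log | grep '18/Dec/2009:17'
```

No matching lines around the reported time would suggest that the client's packets never reached HAProxy at all, which points at something below the proxy such as the firewall or TCP settings.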

I'm also thinking about something else. You said that when
you don't go through haproxy you don't get any complaint.
Are your systems configured similarly ? I mean, the very
low rate of problems could very well be caused by some TCP
settings which are incompatible with a minority of users
running behind a buggy router/firewall.

In order to check this, you could run the following command
on each server (including the one with haproxy) :

$ sysctl -a | fgrep net.ipv4.tcp

Please verify if tcp_ecn and tcp_window_scaling are at the
same values. If not, start by setting tcp_ecn to 0 on
the haproxy server. Then later you can try to similarly
disable tcp_window_scaling, though this one is far less
likely because it's enabled almost everywhere.
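
A small sketch of that comparison, assuming the output of the command above was saved to one file per host. The helper names and file names are hypothetical.

```shell
#!/bin/sh
# Print the two settings singled out above from a saved sysctl dump.
# The trailing space in the pattern keeps similarly named keys out.
pick_tcp_settings() {
    grep -E '^net\.ipv4\.tcp_(ecn|window_scaling) ' "$1" | sort
}

# Report whether two hosts agree on those settings.
compare_tcp_settings() {
    if [ "$(pick_tcp_settings "$1")" = "$(pick_tcp_settings "$2")" ]; then
        echo same
    else
        echo different
    fi
}

# Usage:
#   sysctl -a 2>/dev/null | fgrep net.ipv4.tcp > haproxy.txt   # on the LB
#   sysctl -a 2>/dev/null | fgrep net.ipv4.tcp > apache.txt    # on a server
#   compare_tcp_settings haproxy.txt apache.txt
# Disabling ECN on the haproxy machine until the next reboot (as root):
#   sysctl -w net.ipv4.tcp_ecn=0
```

Note that two dumps which both lack the keys also compare as "same", so it is worth looking at the output of pick_tcp_settings once by hand.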

Also check with "ip route" and "ip address" on all servers
if you don't see a different MTU value on the default
route. It's possible that a small part of your clients
are still running a misconfigured PPPoE ADSL line and
can't send/receive full packets. There are still some
large sites who deal with that by setting their MTU to
1492 or even 1452 on the external interface. But this
is less likely.
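
For the MTU check, a hedged sketch. The interface name eth0 is an assumption, and mtu_of is a made-up helper that only parses the output of "ip link show". The value 1492 is the usual 1500 minus the 8 bytes of PPPoE overhead.

```shell
#!/bin/sh
# Extract the MTU from "ip link show" style output read on stdin.
mtu_of() {
    sed -n 's/.* mtu \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (interface name is an assumption):
#   ip link show dev eth0 | mtu_of
# Temporarily lowering the MTU on the external interface (as root):
#   ip link set dev eth0 mtu 1492
# Probing a path with fragmentation prohibited (Linux iputils ping);
# 1472 bytes of payload plus 28 bytes of IP and ICMP headers is 1500:
#   ping -M do -s 1472 -c 3 <address-to-test>
```

Running the first command on the haproxy machine and on each Apache server makes any difference between them visible at a glance.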

Regards,
Willy




RE: Doesn't work for a very few visitors

2009-12-18 Thread Joe Torsitano
Hi Willy,

What's strange is traffic still appears normal, and is, for probably at
least 99% of the visitors.  Logged traffic remains about normal (hundreds of
thousands of visitors a day).  I just get a few e-mails asking why the site
has been down for days or when it will be back.  But I cannot recreate the
problem.  And I know there are probably people who just don't e-mail and,
unfortunately, don't come back.

Here is the config file with the IP addresses changed, pretty much the
default that comes with it...

# this config needs haproxy-1.1.28 or haproxy-1.2.1

global
    log 127.0.0.1   local0
    log 127.0.0.1   local1 notice
    #log loghost    local0 info
    maxconn 4096
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    option  redispatch
    maxconn 2000
    contimeout  5000
    clitimeout  5
    srvtimeout  5

listen  HTTP 12.34.56.78:80
    mode    http
    balance roundrobin
    server  httpA 98.76.54.32:80 check inter 35000 rise 3 fall 3
    server  httpB 23.45.67.89:80 check inter 35000 rise 3 fall 3
    source  10.176.192.82

listen stats :2680
    mode http
    stats uri /

--
 Joe Torsitano


 

 




Re: Doesn't work for a very few visitors

2009-12-17 Thread Willy Tarreau
Hi,

On Thu, Dec 17, 2009 at 02:07:41PM -0800, Joe Torsitano wrote:
> Whenever I turn on HAProxy everything appears to be working great.  However
> I always get two or three e-mails from people who ask when the site is going
> to be back up.  They say the site can no longer be found in their browser,
> even people who have it bookmarked.  They say it's like the server is down.
> As soon as I switch off HAProxy and have the requests delivered directly
> they say everything is fine again.  Unfortunately I've never been able to
> recreate it on almost a dozen computers I try.  I'm attempting to use
> HAProxy in HTTP mode with Apache servers.  Using the Apache load balancer
> works.  Any ideas?

Not at all, this sounds very strange. Are you sure you don't have
too short client-side timeouts, which would be compatible with
your local computers but not with remote clients (e.g. 10 ms) ?

Otherwise, please post your config (you can mask your IPs if
you want).
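
A quick way to look for the too-short timeouts mentioned above, as a sketch. It assumes the old-style contimeout, clitimeout and srvtimeout directives with plain integer values in milliseconds (no unit suffix). The function name and the 1000 ms default threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# Print timeout directives whose value in milliseconds is below a
# threshold (second argument, default 1000).
short_timeouts() {
    cfg=$1
    min_ms=${2:-1000}
    awk -v min="$min_ms" \
        '($1 ~ /^(con|cli|srv)timeout$/) && (($2 + 0) < min) { print $1, $2 }' \
        "$cfg"
}

# Usage (config path is an assumption):
#   short_timeouts /etc/haproxy/haproxy.cfg 1000
```

Any directive it prints is a candidate for the kind of timeout that works on a local network but cuts off slower remote clients.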

Willy