Thanks Willy for offering to help us out with this.

We are running on an Amazon EC2 m1.small instance, which is a very common
choice for a load balancer machine.

I changed /proc/sys/net/ipv4/tcp_timestamps to 1 - unfortunately to no
effect.
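
For reference, the current value can be checked with something like the
sketch below. The helper name tcp_timestamps_state and its optional file
argument are illustrative assumptions (the argument exists only so the
function can be pointed at a test file). The commented commands at the end
are the usual ways to set the value as root; they last only until reboot.

```shell
#!/bin/sh
# Report whether TCP timestamps are enabled on this machine.
# tcp_timestamps_state is a hypothetical helper name; the optional
# argument overrides the /proc path for testing.
tcp_timestamps_state() {
    file="${1:-/proc/sys/net/ipv4/tcp_timestamps}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$file")
    # 0 means disabled; any other value means timestamps are on.
    if [ "$value" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

tcp_timestamps_state

# To enable until the next reboot (as root), either of:
#   echo 1 > /proc/sys/net/ipv4/tcp_timestamps
#   sysctl -w net.ipv4.tcp_timestamps=1
```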

Here are my iptables settings (nothing special here that I can see - I
haven't modified anything):
r...@lb1:~$ iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

I would like to try accepting INVALIDs as you suggest - just to see if that
addresses the problem before digging deeper.  Unfortunately I'm not very
familiar with iptables - could you show me what I should run to try that?

If not that, perhaps something else about the EC2 infrastructure is using
sequence number randomization?  Are there other things I can look for?

Thanks again,
Lincoln


On Wed, Dec 2, 2009 at 5:42 PM, Willy Tarreau <w...@1wt.eu> wrote:

> Hi,
>
> On Wed, Dec 02, 2009 at 04:47:01PM -0500, Lincoln wrote:
> > Hi, I'm running HAProxy as my load balancer and sometimes (but not all
> > the time) clients experience an 11s delay.  The delay is always about
> > 11s when it happens.
> >
> > I used Wireshark to try and see what was happening (screenshot from the
> > capture on the haproxy box attached).
> >
> > As you can see SYNs are retried over and over but not ACKed until for
> > some reason WS, TSV, and TSER are not passed in the request.  When this
> > happens it always happens the same number of times and always takes the
> > same duration before acknowledgement happens.
> >
> > Here is my config file (it's worth mentioning that 7 of the 10 aux
> > servers are not in rotation - don't think that has anything to do with
> > anything).
> >
> > Any ideas on what is going on here?  I'm a relative novice at tuning tcp
> > (my tuning script is at the very bottom of this email).  Also, this
> > happens regardless of whether there is any load on our site.
>
> well, what you show is rather surprising. It has nothing to do with your
> haproxy config since haproxy only comes into play when the system receives
> the ACK from the client. However, I find the case interesting enough to
> investigate it.
>
> The fact that after some time the SYN is retransmitted without any option
> is caused by the client which finally attempts to remove some of its
> options (window scaling and timestamps) in the hope that it will finally
> establish (and it was right to try because it worked).
>
> The question is why doesn't the system acknowledge those SYN packets?
>
> I see nothing wrong in your network settings BTW.
>
> Hmmm, I have an idea. Would you happen to run your haproxy machine
> behind a firewall which does sequence number randomization (typically
> a cisco PIX, FWSM or an OpenBSD firewall) ? If so, what generally
> happens is that when a source port is reused early by a client and
> the session is still present in your system's table in TIME_WAIT
> state, then the second session's sequence number is random and can
> be less than the last sequence number of the previous session from
> the same port. The packet is then considered as invalid by the local
> TCP stack (which conforms to RFC793), but at least your stack should
> send an ACK back to the client (instead of a SYN/ACK), to indicate
> what sequence number it expects. The client can then send an RST and
> retry with a new SYN which will get accepted.
>
> Since we don't see this ACK, it means that either something is
> preventing the SYN from reaching the stack or something is preventing
> the SYN/ACK from going out.
>
> Are you running iptables on this machine ? If so, maybe your rules
> are too strict and either the faulty SYN or the SYN/ACK can't pass
> through. Check if you have a rule matching state INVALID and check
> the counter. From memory, I used to accept INVALID states on SYNs
> and a few other combinations to permit such situations to work, but
> that was a long time ago, so things might have changed since.
>
> I see that your local system has tcp_timestamps disabled. Normally,
> it's the workaround for random sequence numbers. You just enable
> timestamps and the local stack applies the PAWS algorithm to
> correctly identify that the new packet is not from the old session.
> Netfilter also supports it, but I think it will only work for it
> if your local system emits timestamps. You can do that by echoing
> 1 into /proc/sys/net/ipv4/tcp_timestamps. It may very well fix the
> issue.
>
> Please try that and keep us informed. And if your provider runs a
> firewall as described above, please tell them that their option is
> broken. At least on cisco it can be disabled on a per-rule basis
> now :-)
>
> Regards,
> Willy
>
>
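
As a rough illustration of the TIME_WAIT case described in the quoted
message: a new SYN on a reused port is only taken as a fresh connection if
its sequence number is "after" the one the old session expected, compared
in 32-bit wraparound arithmetic. The sketch below is a simplification under
stated assumptions: the function names seq_after and syn_in_time_wait are
made up for illustration, timestamps (PAWS) are ignored, and the shell is
assumed to do 64-bit integer arithmetic.

```shell
#!/bin/sh
# seq_after A B: succeeds if sequence number A is after B, treating both
# as 32-bit values that wrap around (hypothetical helper name).
seq_after() {
    diff=$(( ($1 - $2) & 0xFFFFFFFF ))
    [ "$diff" -ne 0 ] && [ "$diff" -lt 2147483648 ]
}

# syn_in_time_wait NEW_ISN EXPECTED_SEQ: prints what a stack following the
# description above would do with a SYN that hits a TIME_WAIT entry.
syn_in_time_wait() {
    if seq_after "$1" "$2"; then
        echo "accept"   # looks like a new session: answer with SYN/ACK
    else
        echo "ack"      # looks old: answer with an ACK of the expected seq
    fi
}

syn_in_time_wait 2000 1000        # higher sequence number
syn_in_time_wait 500 1000         # randomized ISN landed below the old one
syn_in_time_wait 10 4294967290    # higher after 32-bit wraparound
```

With a randomizing firewall in the path, the second case is the one that
would come up whenever a client reuses a source port early.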
