Re: weird tcp syn/ack problem

Lincoln Wed, 02 Dec 2009 21:30:42 -0800

Hi Willy, I agree it's pretty confusing.

I should have been clearer - the problem does not happen every time; it's
quite random.  But when it does happen, it always follows that exact
pattern - that's what I meant to say.


I actually have somaxconn set to 10000 so I don't think that's the issue.
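
For reference, a minimal sketch of how the setting can be double-checked. It
assumes the behaviour Willy describes in the quoted message below, namely that
somaxconn acts as an upper bound on the backlog an application asks for. The
function name effective_backlog and the requested value of 10000 are
hypothetical, chosen only for illustration; the fallback of 128 is the default
quoted in the thread.

```shell
#!/bin/sh
# Effective backlog = min(requested backlog, somaxconn).
# $1 = backlog requested by the application (hypothetical value),
# $2 = current somaxconn value.
effective_backlog() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Read the live limit when the file is readable (Linux only); otherwise
# fall back to the default of 128 mentioned in the thread.
SOMAXCONN_FILE=/proc/sys/net/core/somaxconn
if [ -r "$SOMAXCONN_FILE" ]; then
    current=$(cat "$SOMAXCONN_FILE")
else
    current=128
fi

echo "somaxconn: $current"
echo "effective backlog for a requested 10000: $(effective_backlog 10000 "$current")"
```

If the reported value is still the default, it can be raised as root with
"sysctl -w net.core.somaxconn=10000". Per the quoted advice, haproxy has to be
restarted afterwards for the new limit to apply.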

At this point I'm thinking about scrapping my EC2 instances and trying 2 new
ones - you never know.

Just in case you have any other insights here's the output from the 3
commands you mentioned.  Thanks again for all your help!

Lincoln

r...@lb1:~$ uname -a
Linux domU-12-31-39-0A-92-72 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 i686 i386 GNU/Linux

r...@lb1:~$ netstat -i
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500   0 67999261      0      0      0 70299595      0      0      0 BMRU
lo        16436   0  8045554      0      0      0  8045554      0      0      0 LRU

r...@lb1:~$ netstat -s
Ip:
    76004137 total packets received
    2 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    76004135 incoming packets delivered
    78424996 requests sent out
Icmp:
    1700441 ICMP messages received
    6 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 1485599
        echo requests: 74856
        echo replies: 139986
    1559234 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 1484378
        echo replies: 74856
Tcp:
    15400091 active connections openings
    1500044 passive connection openings
    2110125 failed connection attempts
    646 connection resets received
    1 connections established
    72811429 segments received
    74607063 segments send out
    56735 segments retransmited
    1510 bad segments received.
    781257 resets sent
Udp:
    7887 packets received
    1484378 packets to unknown port received.
    0 packet receive errors
    1781057 packets sent
UdpLite:
TcpExt:
    2722 invalid SYN cookies received
    1922 resets received for embryonic SYN_RECV sockets
    712136 TCP sockets finished time wait in fast timer
    22808 time wait sockets recycled by time stamp
    851007 TCP sockets finished time wait in slow timer
    39530 passive connections rejected because of time stamp
    160 packets rejects in established connections because of timestamp
    585822 delayed acks sent
    23 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6840 times
    19917 packets directly queued to recvmsg prequeue.
    338 packets directly received from prequeue
    15636688 packets header predicted
    26116808 acknowledgments not containing data received
    936810 predicted acknowledgments
    130 times recovered from packet loss due to fast retransmit
    7603 times recovered from packet loss due to SACK data
    3 bad SACKs received
    Detected reordering 22 times using FACK
    Detected reordering 6 times using SACK
    Detected reordering 13 times using reno fast retransmit
    Detected reordering 119 times using time stamp
    117 congestion windows fully recovered
    542 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 43
    14626 congestion windows recovered after partial ack
    6847 TCP data loss events
    60 timeouts after reno fast retransmit
    1965 timeouts after SACK recovery
    306 timeouts in loss state
    12099 fast retransmits
    3795 forward retransmits
    9935 retransmits in slow start
    23335 other TCP timeouts
    TCPRenoRecoveryFail: 74
    739 sack retransmits failed
    6890 DSACKs sent for old packets
    3367 DSACKs received
    19 DSACKs for out of order packets received
    643 connections reset due to unexpected data
    240 connections reset due to early user close
    201 connections aborted due to timeout

On Thu, Dec 3, 2009 at 12:16 AM, Willy Tarreau <w...@1wt.eu> wrote:

> On Wed, Dec 02, 2009 at 07:44:40PM -0500, Lincoln wrote:
> > Thanks Willy for offering to help us out with this.
> >
> > We are running on an Amazon EC2 m1.small instance, which is very common
> > for a load balancer machine.
> >
> > I changed /proc/sys/net/ipv4/tcp_timestamps to 1 - unfortunately to no
> > effect.
>
> OK.
>
> > Here are my iptables settings (nothing special here that I can see - I
> > haven't modified anything):
> > r...@lb1:~$ iptables -L
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source               destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target     prot opt source               destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source               destination
>
> OK so most likely it was not even loaded.
>
> > I would like to try accepting INVALIDs as you suggest - just to see if
> > that addresses the problem before digging deeper.  Unfortunately I'm not
> > very familiar with iptables - could you show me what I should run to try
> > that?
>
> You don't need to, because you don't have any iptables rules, so those are
> implicitly allowed. The common case I was talking about was when people
> explicitly drop packets in invalid state.
>
> > If not that, perhaps something else about the EC2 infrastructure is using
> > sequence number randomization?  Are there other things I can look for?
>
> If you don't have iptables, then your machine should have sent either a
> SYN/ACK or an ACK. If you really took the trace from the machine itself,
> then I have no explanation for the problem :-(
>
> You said that in every trace it was the same pattern, i.e. the first
> packet which was accepted was the SYN without timestamps. Are you
> absolutely sure it's *always* the case and it's not just random?
> I'm asking because the system might refrain from sending a SYN/ACK
> when the TCP SYN backlog is full, which is completely independent
> of the SYN packet's shape. Your TCP parameter tuning was OK,
> but for the backlog you also need to set /proc/sys/net/core/somaxconn
> to a large value, otherwise it serves as a max. By default it's very
> low (128). Try setting it to 10000 (you need to restart haproxy for
> the change to take effect).
>
> A "uname -a", "netstat -i" and "netstat -s" can help too.
>
> Regards,
> Willy
>
>
