On Thu, Dec 03, 2009 at 12:29:55AM -0500, Lincoln wrote:
> Hi Willy, I agree it's pretty confusing.
> 
> I should have been clearer - the problem does not happen every time, it's
> very random.  But when it happens it always follows that exact pattern -
> that's what I meant to say.

OK, that's what I understood first, but I wanted confirmation.

> I actually have somaxconn set to 10000 so I don't think that's the issue.

Indeed.

> At this point I'm thinking about scrapping my EC2 instances and trying 2 new
> ones - you never know.

One large site I know of had problems with some instances that were
a lot slower than others and appeared to be randomly losing a lot of
packets (probably because they shared the same physical machine with
others saturating the bandwidth). When they switched to other instances,
they discovered that some of them were immediately receiving attacks,
most likely because they had been abandoned by sites that were under
attack. It seems that whatever works well is already in use, and
whatever you can find unused is probably bad... This site finally moved
off EC2 to solve its problems, which were impossible to debug in a
virtualized environment.

> Just in case you have any other insights here's the output from the 3
> commands you mentioned.  Thanks again for all your help!
> 
> Lincoln
> 
> r...@lb1:~$ uname -a
> Linux domU-12-31-39-0A-92-72 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 i686 i386 GNU/Linux

I don't know if it's the latest Xen kernel available, but 2.6.21 does not
sound like one of the best kernels to me, so maybe that can explain things,
though I'm not specifically aware of issues in it. Don't you have anything
more recent for these boxes? This kernel was built almost 2 years ago, and
given the number of critical security vulnerabilities since then, there
must have been updates.

> r...@lb1:~$ netstat -i
> Kernel Interface table
> Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
> eth0       1500   0 67999261      0      0      0 70299595      0      0      0 BMRU
> lo        16436   0  8045554      0      0      0  8045554      0      0      0 LRU

OK, no drops here.

> r...@lb1:~$ netstat -s
> Tcp:
>     15400091 active connections openings
>     1500044 passive connection openings
>     2110125 failed connection attempts

Is it expected that you have that many failed connection
attempts? Maybe one of your servers is down and these are just
the failed health checks, but that seems a lot for health
checks alone. It's possible that we have the same problem on
both sides.

> TcpExt:
>     2722 invalid SYN cookies received

Do you have SYN cookies enabled? If so, could you try disabling
them?
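If you're unsure, the current setting can be read from the standard
Linux sysctl file for net.ipv4.tcp_syncookies. A small Python 3 sketch,
assuming the usual /proc layout; the helper name is hypothetical and
only treats a non-zero value as "enabled".

```python
from pathlib import Path

# Standard Linux location of the net.ipv4.tcp_syncookies sysctl.
SYNCOOKIES_PATH = Path("/proc/sys/net/ipv4/tcp_syncookies")


def syncookies_enabled(path: Path = SYNCOOKIES_PATH) -> bool:
    """Return True if the sysctl file holds a non-zero value."""
    return int(path.read_text().strip()) != 0


if __name__ == "__main__":
    state = "enabled" if syncookies_enabled() else "disabled"
    print(f"SYN cookies are {state}")
    # To disable them (as root): sysctl -w net.ipv4.tcp_syncookies=0
```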

>     1922 resets received for embryonic SYN_RECV sockets
>     712136 TCP sockets finished time wait in fast timer

That sounds like a lot. How many connections per second do you get
on average? And how many from the same IP address?
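One way to get the average rate is to sample the "Tcp:" counters in
/proc/net/snmp (the source of the "connection openings" lines in
netstat -s) twice and divide by the interval. A Python 3.9+ sketch;
the function names and the 10-second window are arbitrary choices.
It only gives the overall rate; the per-source-IP question needs
connection logs instead.

```python
import time


def read_tcp_counters(text: str) -> dict[str, int]:
    """Parse the two "Tcp:" lines of /proc/net/snmp (names, then values)."""
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith("Tcp:")]
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def opens_per_second(before: dict[str, int], after: dict[str, int],
                     seconds: float) -> tuple[float, float]:
    """Average active and passive openings per second between two snapshots."""
    active = (after["ActiveOpens"] - before["ActiveOpens"]) / seconds
    passive = (after["PassiveOpens"] - before["PassiveOpens"]) / seconds
    return active, passive


if __name__ == "__main__":
    interval = 10.0  # sampling window in seconds (arbitrary)
    with open("/proc/net/snmp") as f:
        first = read_tcp_counters(f.read())
    time.sleep(interval)
    with open("/proc/net/snmp") as f:
        second = read_tcp_counters(f.read())
    active, passive = opens_per_second(first, second, interval)
    print(f"active: {active:.1f}/s, passive: {passive:.1f}/s")
```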

>     39530 passive connections rejected because of time stamp

Troubling! This looks like what you're experiencing. I don't know
under what conditions it can happen. Maybe the sender's clock
goes backwards when it reuses the same connection?
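To check whether these rejections line up with the failures you see,
you could watch the counter move while reproducing the problem. In
/proc/net/netstat the "passive connections rejected because of time
stamp" line corresponds to the TcpExt field PAWSPassive. A Python 3.9+
sketch; the helper names and sampling window are illustrative, and
counter names can differ between kernel versions (missing ones read
as 0 here).

```python
import time


def read_tcpext(text: str) -> dict[str, int]:
    """Parse the two "TcpExt:" lines of /proc/net/netstat (names, then values)."""
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith("TcpExt:")]
    return {name: int(value) for name, value in zip(rows[0], rows[1])}


# TcpExt fields behind the netstat -s lines discussed above.
WATCHED = {
    "PAWSPassive": "passive connections rejected because of time stamp",
    "SyncookiesFailed": "invalid SYN cookies received",
    "TW": "TCP sockets finished time wait in fast timer",
}


def deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Increase of each watched counter between two snapshots."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in WATCHED}


if __name__ == "__main__":
    with open("/proc/net/netstat") as f:
        first = read_tcpext(f.read())
    time.sleep(10)  # arbitrary sampling window, in seconds
    with open("/proc/net/netstat") as f:
        second = read_tcpext(f.read())
    for name, delta in deltas(first, second).items():
        print(f"{delta:8d}  {WATCHED[name]}")
```

If PAWSPassive jumps at the moment a client reports a failed connection,
that would point at the timestamp check rather than at the network.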

Willy

