Re: node frequently goes down on another physical machine

Willy Tarreau Sun, 26 Dec 2010 21:55:42 -0800

Hi Amit,

On Fri, Dec 24, 2010 at 12:24:55PM +0530, Amit Nigam wrote:
(...)
I see nothing wrong in your configs which could justify your issues.


> Now in the new stats page I noticed one thing which was not in 1.3.22:
> LastChk. I wonder why tc1 is showing L7OK/302 in 324ms and tc2 is showing
> L7OK/302 in 104ms, while haproxy is currently running on LB1 and there are
> 13 retries at TC2.

The only explanation I can see is a network connection issue. What you
describe looks like packet loss over the wire. It's possible that one
of your NICs is dying, or that the network cable or switch port is
defective.

You should try to perform a file transfer between the machine showing
issues and another one on the local network to verify this hypothesis.
If you can't achieve wire speed, you may well be having such a
problem. In that case, first move to another switch port (generally
easy), then swap the cable for another one (possibly swap the cables
between your two LBs if they're close), then try another port on the
machine.
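
To make that transfer test repeatable, here is a minimal Python sketch
that pushes a fixed amount of data over TCP and reports the rate in
Mbit/s. It is only an illustration: the helper names (run_sink,
measure_throughput, fraction_of_link) are hypothetical, and the demo
runs both ends on localhost, whereas a real test would run the sink on
the other machine of the LAN.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call


def run_sink(server_sock: socket.socket) -> None:
    """Accept one connection, read and discard everything, then close it."""
    conn, _ = server_sock.accept()
    with conn:
        while conn.recv(CHUNK):
            pass  # drop the data, we only care about how fast it arrives


def measure_throughput(host: str, port: int, total_bytes: int) -> float:
    """Send total_bytes to host:port over TCP and return the rate in Mbit/s."""
    payload = b"\0" * CHUNK
    sent = 0
    with socket.create_connection((host, port)) as sock:
        start = time.perf_counter()
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
        # Signal end of data, then wait until the sink closes its side,
        # so the timing covers data actually read, not just buffered.
        sock.shutdown(socket.SHUT_WR)
        sock.recv(1)
        elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e6


def fraction_of_link(rate_mbps: float, link_mbps: float) -> float:
    """Measured rate as a fraction of the nominal link speed."""
    return rate_mbps / link_mbps


if __name__ == "__main__":
    # Localhost demo only; on a real LAN, run the sink on the peer host.
    srv = socket.create_server(("127.0.0.1", 0))
    port = srv.getsockname()[1]
    threading.Thread(target=run_sink, args=(srv,), daemon=True).start()
    rate = measure_throughput("127.0.0.1", port, 20 * 1024 * 1024)
    print(f"{rate:.0f} Mbit/s, {fraction_of_link(rate, 1000):.0%} of 1 Gbit/s")
    srv.close()
```

On a healthy gigabit link the rate should come reasonably close to
1000 Mbit/s minus protocol overhead; a result far below that from one
machine but not from its neighbour points at that machine's NIC, cable
or switch port.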

Another possible explanation, which has become quite rare nowadays,
would be that you're using a port forced to 100Mbps full duplex on your
switch with a gigabit port on your server: the server's autonegotiation
would then fall back to half duplex, and the resulting duplex mismatch
loses packets. You can check for that with "ethtool eth0" on your LBs
and TCs.
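
Checking several LBs and TCs by eye is error-prone, so here is a small
Python sketch that parses the Speed, Duplex and Auto-negotiation lines
printed by ethtool and flags a half-duplex link or one running below an
expected speed. It assumes the usual "Speed: 1000Mb/s" / "Duplex: Full"
layout, which may vary between drivers and ethtool versions; the
function names and the expected_mbps default of 1000 are hypothetical.

```python
import re
import subprocess
from typing import List, Optional


def _field(text: str, name: str) -> Optional[str]:
    """Return the value of a 'Name: value' line in ethtool output, if any."""
    match = re.search(rf"^[ \t]*{re.escape(name)}:([^\n]*)", text, re.MULTILINE)
    return match.group(1).strip() if match else None


def parse_ethtool(text: str) -> dict:
    """Extract negotiated speed (Mb/s), duplex and autoneg state."""
    speed = _field(text, "Speed")  # e.g. "1000Mb/s" or "Unknown!"
    speed_match = re.match(r"(\d+)Mb/s", speed or "")
    duplex = _field(text, "Duplex")  # e.g. "Full" or "Half"
    autoneg = _field(text, "Auto-negotiation")  # e.g. "on" or "off"
    return {
        "speed_mbps": int(speed_match.group(1)) if speed_match else None,
        "duplex": duplex.lower() if duplex else None,
        "autoneg": autoneg.lower() if autoneg else None,
    }


def link_warnings(info: dict, expected_mbps: int = 1000) -> List[str]:
    """List suspicious settings; expected_mbps is an assumed NIC speed."""
    warnings: List[str] = []
    if info["duplex"] == "half":
        warnings.append("half duplex negotiated (check the switch port settings)")
    if info["speed_mbps"] is not None and info["speed_mbps"] < expected_mbps:
        warnings.append(
            f"link at {info['speed_mbps']} Mb/s, expected {expected_mbps} Mb/s"
        )
    return warnings


def read_ethtool(iface: str = "eth0") -> str:
    """Run `ethtool <iface>` and return its output (Linux, ethtool installed)."""
    result = subprocess.run(["ethtool", iface], capture_output=True, text=True, check=True)
    return result.stdout


SAMPLE = """Settings for eth0:
\tSpeed: 100Mb/s
\tDuplex: Half
\tAuto-negotiation: on
\tLink detected: yes
"""

if __name__ == "__main__":
    info = parse_ethtool(SAMPLE)
    print(info)
    for warning in link_warnings(info):
        print("WARNING:", warning)
```

On a live host, link_warnings(parse_ethtool(read_ethtool("eth0")))
would flag the gigabit-server / forced-100Mbps-switch case described
above as a 100 Mb/s half-duplex link.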

> Also, can this issue be due to time differences between cluster nodes? I
> have seen there is a time difference of around 2 minutes between the VMs
> on physical machine 1 and the VMs on physical machine 2.

While it's a bad thing to have machines running with unsynchronized
clocks, I don't see how it could cause such an issue.

Regards,
Willy
