One more follow-up to see if there are any other suggestions. Yesterday I added a sixth real server to the cluster. All of these servers are of the exact same type (bare metal machines). I installed and configured the new server exactly like the others, added it to the cluster, and tried it. It failed too: a request sent to the VIP causes the real server to send a SYN-ACK (in response to the SYN), but the client never sees it. The one working server, of the same type, continues to respond correctly!
Today I reconfigured a non-working server to use Direct Routing via the arptables_jf technique. I tried a request and it failed. Then I reconfigured the working server to use arptables_jf and it worked. So the failure persists on all of the bad servers with either DR configuration, while the one good server works with both. I doubt five servers can have a hardware problem with their NICs. The cloud vendor has checked their smart switches and states that they are working fine.

Thanks for listening, and for any suggestions you may have.

Regards,
Bruce

On 3/3/14 1:54 PM, Bruce Rudolph wrote:
> On the failing real servers the response is sent but is never received
> by the client (e4:11:5b:ae:f9:e5). On the working server the response
> is sent and the client gets it and sends an ACK and the connection is
> open.
>
> I run tcpdump on the client (my Mac for the testing) and that is how I
> know that the SYN-ACK packet is not received from the failing real
> servers.
>
> This is the mind boggling thing...where are they going? Could it be a
> smart switch in the cloud environment? If so, then why would one
> server out of five work correctly?
>
> The real servers are not responding to arping. Only the Director does.
>
> Bruce
>
> On 3/3/14 12:28 PM, Julian Anastasov wrote:
>> Hello,
>>
>> On Mon, 3 Mar 2014, Bruce Rudolph wrote:
>>
>>> 18:21:12.346386 Out e4:11:5b:ae:f9:e5 ethertype IPv4 (0x0800),
>>> length 76: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP
>>> (6), length 60)
>>> <VIP>.80 > <CIP//>.62628: Flags [S.], cksum 0xf2a9 (correct),
>>> seq 4207299083, ack 4011092519, win 14480, options [mss
>>> 1460,sackOK,TS val 82369115 ecr 3844971164,nop,wscale 7], length 0
>> Response is going to e4:11:5b:ae:f9:e5 ? Do
>> you see it reaching there? Also, simple test with
>> client on LAN can reveal the problem, just check with
>> tcpdump on client box. It can show if problem comes
>> from router or from real servers. Sometimes, smart
>> switches can be the culprit too.
>>
>> Also, check on real servers (mostly the working
>> one) with tcpdump that you don't see the VIP in
>> outgoing ARP packets, only director can expose the VIP
>> in ARP packets. This can be also checked from client on
>> LAN with 'arping -c 1 VIP', only the director should
>> reply for VIP.
>>
>> Regards
>>
>> --
>> Julian Anastasov <j...@ssi.bg>
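
For anyone who wants to repeat the client-side check discussed above without reading captures by eye, here is a rough sketch in Python. It only scans tcpdump's text output for SYN-ACK segments (Flags [S.]) whose source is the VIP. The function name find_syn_acks and the addresses 192.0.2.10 and 198.51.100.7 are placeholders invented for illustration (the real VIP and client IP are redacted in this thread), and the pattern assumes the line layout shown in the capture quoted above.

```python
import re

# Matches the TCP summary part of a tcpdump text line, for example:
#   192.0.2.10.80 > 198.51.100.7.62628: Flags [S.], cksum ...
# Assumption: numeric output (no name resolution), as in the quoted capture.
TCP_LINE = re.compile(
    r"(?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]+)\]"
)


def find_syn_acks(lines, vip, port=80):
    """Return (source, destination) pairs for each SYN-ACK sent from vip:port.

    Hypothetical helper, not part of tcpdump or LVS.
    """
    hits = []
    for line in lines:
        m = TCP_LINE.search(line)
        if m is None:
            continue  # header line or non-TCP packet
        is_syn_ack = m["flags"] == "S."
        from_vip = m["src"] == vip and int(m["sport"]) == port
        if is_syn_ack and from_vip:
            hits.append(
                (f"{m['src']}.{m['sport']}", f"{m['dst']}.{m['dport']}")
            )
    return hits


if __name__ == "__main__":
    # Placeholder capture lines; substitute real tcpdump output.
    demo = [
        "    198.51.100.7.62628 > 192.0.2.10.80: Flags [S], seq 1, length 0",
        "    192.0.2.10.80 > 198.51.100.7.62628: Flags [S.], seq 2, ack 2, length 0",
    ]
    for src, dst in find_syn_acks(demo, "192.0.2.10"):
        print("SYN-ACK", src, "->", dst)
```

Run against a capture taken on the client, an empty result for a failing real server and a non-empty one for the working server would match what Bruce reports. Run against a capture taken on the real server itself, it confirms that the SYN-ACK at least leaves the box. Something like the output of `tcpdump -nn -v -l` could be piped in line by line, though the exact options depend on the platform.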