Hi,

My fresh LVS installation starts to malfunction after a few hours.

I use piranha / pulse on CentOS 5.4 to round-robin between two Nginx servers (static files only, no persistence). When I start the service, everything works as expected. After a few hours one of the real servers (a random one) becomes unavailable, even though ipvsadm shows it as OK. The real server that is not answering is still being checked by the nanny process every 6 seconds and is accessible through its real IP.

The only clue I have found is that instead of 2 nanny processes there are 4 (2 for each server). The logs show nothing of interest while pulse is running. When I stop pulse, all the related processes are terminated except the 4 nannies, which I have to kill by hand.

Here is a snippet from the logs:

Jan 18 09:57:21 blb1 pulse[5795]: Terminating due to signal 15
Jan 18 09:57:21 blb1 lvs[5798]: shutting down due to signal 15
Jan 18 09:57:21 blb1 lvs[5798]: shutting down virtual service Nginx
Jan 18 09:59:15 blb1 nanny[2668]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[2670]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[5812]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!
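
In case it helps anyone check for the same symptom, here is a rough sketch (not part of piranha) that groups nanny processes by the real server given after -h in "ps auxw" output and reports any server watched by more than one nanny. The function names nannies_by_host and duplicated are my own, the column positions assume the ps format shown in this post, and the sample lines are copied from the ps listing in this post.

```python
# Sketch only: group nanny processes by the real server they monitor.
# Assumes `ps auxw` columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

SAMPLE_PS = r"""root 2668 0.0 0.0 8456 692 ? Ss Jan16 0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 2670 0.0 0.0 8456 688 ? Ss Jan16 0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 5795 0.0 0.0 8488 372 ? Ss Jan17 0:00 pulse
root 5798 0.0 0.0 8476 656 ? Ss Jan17 0:00 /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
root 5812 0.0 0.0 8456 692 ? Ss Jan17 0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 5813 0.0 0.0 8456 692 ? Ss Jan17 0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
"""


def nannies_by_host(ps_output):
    """Return {real_server_address: [nanny PIDs]} parsed from `ps auxw` text."""
    hosts = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Field 10 is the command; skip anything that is not a nanny.
        if len(fields) < 11 or not fields[10].endswith("/nanny"):
            continue
        args = fields[11:]
        if "-h" not in args:
            continue
        idx = args.index("-h")
        if idx + 1 >= len(args):
            continue
        # The value after -h is the real server; field 1 is the PID.
        hosts.setdefault(args[idx + 1], []).append(int(fields[1]))
    return hosts


def duplicated(hosts):
    """Keep only the real servers that have more than one nanny."""
    return dict((host, pids) for host, pids in hosts.items() if len(pids) > 1)


for host, pids in sorted(duplicated(nannies_by_host(SAMPLE_PS)).items()):
    print("%s is monitored by %d nannies: %s" % (host, len(pids), pids))
```

On the director, SAMPLE_PS could be replaced with live "ps auxw" output. With the listing from this post, both real servers come back with two nanny PIDs each: one pair started Jan16 and one pair started Jan17.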

Here is the data I gathered while the setup was malfunctioning:

####### LVS server ( no backup server )

CentOS 5.4 - piranha-0.8.4-13.el5 - ipvsadm-1.24-10

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

# cat /etc/sysconfig/ha/lvs.cf
serial_no = 37
primary = 82.81.215.137
service = lvs
backup = 0.0.0.0
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
nat_nmask = 255.255.255.255
debug_level = NONE
virtual Nginx {
     active = 1
     address = 82.81.215.141 eth0:1
     vip_nmask = 255.255.255.224
     port = 80
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     use_regex = 0
     load_monitor = none
     scheduler = wlc # Supposed to be RR - changed only to test if the scheduler is the problem - same effect
     protocol = tcp
     timeout = 6
     reentry = 15
     quiesce_server = 1
     server bweb1.my-domain.com {
         address = 82.81.215.138
         active = 1
         weight = 1
     }
     server bweb2.my-domain.com {
         address = 82.81.215.139
         active = 1
         weight = 1
     }
     server bweb3.my-domain.com {
         address = 82.81.215.140
         active = 0
         weight = 1
     }
}

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  82.81.215.141:80 wlc
  -> 82.81.215.139:80             Route   1      0          0
  -> 82.81.215.138:80             Route   1      0          0

# ps auxw|egrep "nanny|ipv|lvs|pulse"
root 2668 0.0 0.0 8456 692 ? Ss Jan16 0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 2670 0.0 0.0 8456 688 ? Ss Jan16 0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 5795 0.0 0.0 8488 372 ? Ss Jan17 0:00 pulse
root 5798 0.0 0.0 8476 656 ? Ss Jan17 0:00 /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
root 5812 0.0 0.0 8456 692 ? Ss Jan17 0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root 5813 0.0 0.0 8456 692 ? Ss Jan17 0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs

####### One of the servers ( the one that does not answer, though it is identical to the other )

# arptables -L -n
Chain IN (policy ACCEPT)
target     source-ip            destination-ip       source-hw          destination-hw     hlen   op         hrd        pro
DROP       0.0.0.0/0            82.81.215.141        00/00              00/00              any    0000/0000  0000/0000  0000/0000

Chain OUT (policy ACCEPT)
target     source-ip            destination-ip       source-hw          destination-hw     hlen   op         hrd        pro
mangle     0.0.0.0/0            82.81.215.141        00/00              00/00              any    0000/0000  0000/0000  0000/0000 --mangle-ip-s 82.81.215.139

Chain FORWARD (policy ACCEPT)
target     source-ip            destination-ip       source-hw          destination-hw     hlen   op         hrd        pro

# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
          inet addr:82.81.215.139  Bcast:82.81.215.159  Mask:255.255.255.224
          inet6 addr: fe80::211:25ff:fe41:69a4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:602454 errors:0 dropped:0 overruns:0 frame:0
          TX packets:514536 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:51144864 (48.7 MiB)  TX bytes:251901147 (240.2 MiB)
          Interrupt:169 Memory:dcff0000-dd000000

eth0:1    Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
          inet addr:82.81.215.141  Bcast:82.81.215.159  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:dcff0000-dd000000

Thanks for any idea that might shed some light on this topic :)

Best,
Miki

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------