In short: ip_vs_conn_in_get() does not match on fwmark. Packets that the master LVS forwards to the backup LVS therefore match a synchronized connection and are sent through ipvs again on the backup LVS, which is also the destination realserver. ipvs then loops the packet, and the node hangs. Without conn sync the nodes work fine, though existing connections of course break on failover. Tested on Linux 2.6.33.
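The fwmark issue can be made concrete with a toy Python model of the input path on LVS B. This is not kernel code. The names `conn_in_get`, `deliver`, `Conn` and the `mark_matters` switch are hypothetical. The model also assumes that each connection entry remembers the fwmark of its virtual service; whether the 2.6.33 sync messages carry a fwmark at all is not established here.

```python
"""Toy model of the IPVS input path on LVS B (not kernel code).

Assumption (hypothetical): each connection entry records the fwmark of the
virtual service it belongs to, so a lookup *could* compare it.
"""
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

LOCAL_ADDR = "10.0.0.6"  # LVS B is also realserver B

Key = Tuple[str, str, int, str, int]


@dataclass(frozen=True)
class Packet:
    proto: str
    caddr: str
    cport: int
    vaddr: str
    vport: int
    fwmark: int = 0  # 0 = not marked by the mangle PREROUTING rule


@dataclass(frozen=True)
class Conn:
    fwmark: int  # hypothetical: mark of the virtual service
    dest: str    # real server chosen by the master


def conn_key(p: Packet) -> Key:
    # The reported behaviour: the lookup key is the 5-tuple only.
    return (p.proto, p.caddr, p.cport, p.vaddr, p.vport)


def conn_in_get(table: Dict[Key, Conn], p: Packet,
                mark_matters: bool) -> Optional[Conn]:
    """Return the matching entry, or None on a miss."""
    cp = table.get(conn_key(p))
    if cp is None:
        return None
    if mark_matters and cp.fwmark != p.fwmark:
        return None  # hypothetical fix: an unmarked packet misses a fwmark entry
    return cp


def deliver(table: Dict[Key, Conn], p: Packet, mark_matters: bool,
            max_passes: int = 5) -> Tuple[str, int]:
    """Follow one packet through repeated input passes on LVS B."""
    for passes in range(1, max_passes + 1):
        cp = conn_in_get(table, p, mark_matters)
        if cp is None:
            return ("local", passes)      # miss: handed to the local TCP stack
        if cp.dest != LOCAL_ADDR:
            return ("forwarded", passes)  # DR to another real server
        # Hit with a local destination: the packet goes out via lo and comes
        # back in for another lookup.
    return ("loop", max_passes)          # model stops here; the kernel does not


# Entry synced from LVS A for the client connection (hypothetical values).
table = {("TCP", "10.0.0.3", 54590, "10.0.0.10", 9999):
         Conn(fwmark=10, dest=LOCAL_ADDR)}
# Data packet forwarded by LVS A: the MAC match skips marking, so fwmark is 0.
pkt = Packet("TCP", "10.0.0.3", 54590, "10.0.0.10", 9999, fwmark=0)

print(deliver(table, pkt, mark_matters=False))  # ('loop', 5)
print(deliver(table, pkt, mark_matters=True))   # ('local', 1)
```

With the 5-tuple-only lookup, the unmarked packet keeps hitting the synced entry. With the hypothetical mark check, it misses and goes straight to the local stack. The model deliberately ignores how IPVS handles a local real server for newly scheduled connections.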
Here's my setup:

    client ----+
    10.0.0.3   |
         vip: 10.0.0.10
               / \
              /   \
  +------------+  +------------+
  | LVS A (mst)|  | LVS B (bkp)|
  |Realserver A|  |Realserver B|
  |  10.0.0.5  |  |  10.0.0.6  |
  +------------+  +------------+

Both nodes are set up as follows:

 * the vip is on lo:10
 * an iptables rule sets the fwmark unless the request comes from the other LVS
 * arp_ignore=1 and arp_announce=2 are set on all interfaces

See the network, iptables and sysctl config for the LVS master in [3] and for the backup in [4]. The realservers run lighttpd on port 9999, bound to 0.0.0.0.

Both nodes have an identical keepalived.conf, except for the priority. See the full keepalived.conf for LVS A in [5]. The important parts are shown below:

  virtual_server fwmark 10 {
      lb_algo rr
      lb_kind DR
      real_server 10.0.0.5 9999 {...}
      real_server 10.0.0.6 9999 {...}
  }

The config includes notify_master/notify_backup scripts that start/stop the ipvs connection synchronization daemon. For testing purposes, the sync threshold is tweaked so that a connection is synced once the TCP 3-way handshake is done (2 incoming packets seen: SYN and ACK):

  net.ipv4.vs.sync_threshold="2 10"

The debug kernel output in [1] shows how the connection fails when the client queries the vip, LVS A is master, and the connection is forwarded to realserver B.

The debug kernel output in [2] shows how the connection works when the client queries the vip, LVS B is master, and the connection is forwarded to realserver B (itself). In this case no connection synchronization takes place.

Questions:

1. Should ip_vs_conn_in_get() also take the fwmark into consideration when matching incoming packets against its list of established ipvs connections?

2. Is this the right way to set up a two-node LVS cluster with localnodes and connection synchronization on a modern kernel (assuming conn sync did not break)?

thanks!
S.

***

[1]: Example of failure: LVS A is master, balances to realserver B.
The output below is from the LVS B / realserver B kern.log, after:

 * adding LOG entries to iptables -t filter, chains INPUT and OUTPUT
 * setting net.ipv4.vs.debug_level to 13 (max)
 * stripping away some crud, cleaning timestamps, etc.
 * adding <notes> on progress

Interesting lines: 11, 21, 28

 1 <Connection from client to VIP>
 2 [52.351] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:lvsA_mac:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 SYN
 3 [52.351] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
 4 [52.351] IPVS: lookup/out TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
 5 [52.351] IPVS: lookup service: fwm 0 TCP 10.0.0.10:9999 not hit
 6 [52.351] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=54590 ACK SYN
 7 [52.457] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:lvsA_mac:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK
 8 [52.457] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
 9 [52.457] IPVS: lookup/out TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
10 <TCP handshake complete>
11 <IPVS state is synchronized from MASTER to BACKUP>
12 [52.869] IPVS: packet type=2 proto=17 daddr=224.0.0.81 ignored
13 [52.869] IPVS: Enter: ip_vs_receive, net/netfilter/ipvs/ip_vs_sync.c line 722
14 [52.869] IPVS: Leave: ip_vs_receive, net/netfilter/ipvs/ip_vs_sync.c line 733
15 [52.869] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 not hit
16 [52.869] IPVS: lookup service: fwm 0 TCP 10.0.0.10:9999 not hit
17 [53.353] IPVS: packet type=5 proto=2 daddr=224.0.0.81 ignored
18 <One line of data sent from client to VIP>
19 [60.906] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:lvsA_mac:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
20 <Packet matches synchronized state>
21 [60.906] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 hit
22 [60.906] IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 756
23 <IPVS forwards the packet to the local interface>
24 [60.906] filter-OUTPUT: IN= OUT=lo SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
25 [60.906] IPVS: Leave: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 789
26 [61.011] filter-INPUT : IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
27 <Packet matches synchronized state again ...>
28 [61.019] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 hit
29 [61.019] IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 756
30 <IPVS repeats the forwarding in a loop, machine stops responding>
31 [61.030] filter-OUTPUT: IN= OUT=lo SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
32 [61.041] IPVS: Leave: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 789
33 [61.074] filter-INPUT : IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=54590 DPT=9999 ACK PSH
34 [61.083] IPVS: lookup/in TCP 10.0.0.3:54590->10.0.0.10:9999 hit
35 [61.084] IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 756
36 <etc, etc>

Note that the incoming packet is not fwmarked, and that the ipvs lookup/in check does not try to match on fwmark.

***

[2]: Example of success: LVS B is master, balances to realserver B (itself).
The output below is from the LVS B / realserver B kern.log.

Interesting lines: 13-16

 1 <Connection from client to VIP>
 2 [74.370] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 SYN
 3 [74.370] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 4 [74.370] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 5 [74.370] IPVS: lookup service: fwm 0 TCP 10.0.0.10:9999 not hit
 6 [74.370] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK SYN
 7 [74.461] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK
 8 [74.471] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
 9 [74.471] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
10 <TCP handshake complete>
11 <One line of data sent from client to VIP>
12 [76.894] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK PSH
13 <Packet does not match synchronized state (there is none)>
14 [76.894] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
15 [76.894] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
16 <Packet exchange continues as normal>
17 [76.894] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK
18 [77.062] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK PSH
19 [77.062] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
20 [77.062] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
21 [77.062] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK
22 [77.300] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK PSH
23 [77.309] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
24 [77.309] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
25 [77.320] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK
26 [77.402] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK PSH
27 [77.439] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK
28 [77.450] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
29 [77.450] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
30 [77.463] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK FIN
31 [77.508] filter-INPUT : IN=eth0 OUT= MAC=lvsB_mac:00:18:8b:6a:3d:a2:08:00 SRC=10.0.0.3 DST=10.0.0.10 SPT=38258 DPT=9999 ACK FIN
32 [77.518] IPVS: lookup/in TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
33 [77.518] IPVS: lookup/out TCP 10.0.0.3:38258->10.0.0.10:9999 not hit
34 [77.531] filter-OUTPUT: IN= OUT=eth0 SRC=10.0.0.10 DST=10.0.0.3 SPT=9999 DPT=38258 ACK
35 <etc, etc>

***

[3]: LVS master / Realserver A:

1: lo: <LOOPBACK,UP,LOWER_UP>
    inet 127.0.0.1/8 scope host lo
    inet 10.0.0.10/32 scope global lo:10
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
    inet 10.0.0.5/24 brd 10.0.0.255 scope global eth0

iptables -t mangle -A PREROUTING -d 10.0.0.10 -p tcp -m tcp \
    --dport 9999 -m mac ! --mac-source <realserverB_mac> \
    -j MARK --set-mark 10

net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.vs.sync_threshold = 2 10

***

[4]: LVS backup / Realserver B:

1: lo: <LOOPBACK,UP,LOWER_UP>
    inet 127.0.0.1/8 scope host lo
    inet 10.0.0.10/32 scope global lo:10
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
    inet 10.0.0.6/24 brd 10.0.0.255 scope global eth0

iptables -t mangle -A PREROUTING -d 10.0.0.10 -p tcp -m tcp \
    --dport 9999 -m mac ! --mac-source <realserverA_mac> \
    -j MARK --set-mark 10

net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.vs.sync_threshold = 2 10

***

[5]: LVS master keepalived.conf

global_defs {
    lvs_id testlvs
}

vrrp_sync_group test {
    group {
        VI_1
    }
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 10
    priority 100
    advert_int 1
    notify_master /etc/keepalived/master.sh
    notify_backup /etc/keepalived/backup.sh
    notify_fault /etc/keepalived/fault.sh
    authentication {
        auth_type pass
        auth_pass hulahoop
    }
    virtual_ipaddress {
        10.0.0.10
    }
    nopreempt
}

virtual_server fwmark 10 {
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    delay_loop 20
    protocol TCP

    real_server 10.0.0.5 9999 {
        weight 1
        TCP_CHECK {
            connect_timeout 20
        }
    }
    real_server 10.0.0.6 9999 {
        weight 1
        TCP_CHECK {
            connect_timeout 20
        }
    }
}

***

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-requ...@linuxvirtualserver.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users