Thanks Julian, this helps me understand it a lot better. Are you suggesting the masquerading (NAT) method? That isn't an ideal option for me, unless of course it is the only option.
To see how much further I could get using DR, I removed the REDIRECT rules and added the following rule to both real servers:

iptables -t nat -A PREROUTING -p tcp -m tcp --destination 172.17.0.24 --dport 80 -j DNAT --to-destination 172.17.0.24:50000

After the DNAT change, packets are now sent to real server 2; however, the port is not what the client expects. Real server 2 receives the packets on the mapped port 50000 instead of port 80. Here is the debug output when the client connects to real server 2:

Jan 28 23:58:57 pc01 kernel: IPVS: lookup service: fwm 100 TCP 172.17.0.24:50000 hit
Jan 28 23:58:57 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 28 23:58:57 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0 refcnt 1 weight 100
Jan 28 23:58:57 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:38193 v:172.17.0.24:50000 d:172.17.0.17:50000 fwd:R s:4 conn->flags:183 conn->refcnt:1 dest->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:38193 v:172.17.0.24:50000 d:172.17.0.17:50000 conn->flags:101C3 conn->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: TCP input [S...] 172.17.0.17:50000->172.17.0.2:38193 state: NONE->SYN_RECV conn->refcnt:2
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:22->172.17.0.2:41024 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:22->172.17.0.2:41024 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:41024->172.17.0.16:22 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:41024->172.17.0.16:22 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:38193->172.17.0.24:50000 not hit
Jan 28 23:58:57 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:38193->172.17.0.24:50000 hit
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:57 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1031
Jan 28 23:58:57 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116

So we see above that the virtual address is 172.17.0.24:50000, whereas ideally it would be port 80. Alternatively, the destination for RIP2 (172.17.0.17) should be port 80.
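To compare which virtual (v:) and destination (d:) addresses IPVS bound each connection to, the "Bind-dest" debug lines can be pulled out of the kernel log. This is a minimal POSIX sh sketch, assuming the IPVS level-12 debug format shown above; the function name ipvs_bind_fields is hypothetical and not part of ipvsadm or any LVS tool.

```shell
# Hypothetical helper (not an LVS tool): for each IPVS "Bind-dest" debug
# line on stdin, print "client vip dest"; all other lines are dropped.
# Assumes the level-12 debug format, e.g.
#   IPVS: Bind-dest TCP c:172.17.0.2:38193 v:172.17.0.24:50000 d:172.17.0.17:50000 fwd:R ...
ipvs_bind_fields() {
    # -n suppresses default output; the trailing p prints only lines where
    # the substitution matched. " *" tolerates a space after "v:".
    sed -n 's/.*Bind-dest TCP c:\([0-9.:]*\) v: *\([0-9.:]*\) d:\([0-9.:]*\).*/\1 \2 \3/p'
}

# Example usage on the director (or against the syslog file holding the
# kernel messages):
#   dmesg | ipvs_bind_fields
```

For the Jan 28 run above this prints "172.17.0.2:38193 172.17.0.24:50000 172.17.0.17:50000": the VIP is preserved, but the port is already 50000 by the time IPVS sees the packet. In the REDIRECT run quoted further down, the same field shows v:172.17.0.16 instead.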
The following tcpdump on real server 2 shows it replying to the client from the unexpected port 50000; the client answers with a reset, so the connect hangs:

tcpdump -iany -nn port 80 or port 50000   # (nothing was seen on the loopback, only on bond0)
23:58:59.446423 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [S], seq 1458168690, win 14600, options [mss 1460,sackOK,TS val 447300324 ecr 0,nop,wscale 7], length 0
23:58:59.446423 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [S], seq 1458168690, win 14600, options [mss 1460,sackOK,TS val 447300324 ecr 0,nop,wscale 7], length 0
23:58:59.446484 IP 172.17.0.24.50000 > 172.17.0.2.38193: Flags [S.], seq 59199797, ack 1458168691, win 28960, options [mss 1460,sackOK,TS val 353113117 ecr 447300324,nop,wscale 7], length 0
23:58:59.446487 IP 172.17.0.24.50000 > 172.17.0.2.38193: Flags [S.], seq 59199797, ack 1458168691, win 28960, options [mss 1460,sackOK,TS val 353113117 ecr 447300324,nop,wscale 7], length 0
23:58:59.446839 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [R], seq 1458168691, win 0, length 0
23:58:59.446839 IP 172.17.0.2.38193 > 172.17.0.24.50000: Flags [R], seq 1458168691, win 0, length 0

Here is the debug output when the client connects to real server 1:

Jan 28 23:58:47 pc01 kernel: IPVS: lookup service: fwm 100 TCP 172.17.0.24:50000 hit
Jan 28 23:58:47 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
Jan 28 23:58:47 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0 refcnt 1 weight 100
Jan 28 23:58:47 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:38192 v:172.17.0.24:50000 d:172.17.0.16:50000 fwd:R s:65276 conn->flags:183 conn->refcnt:1 dest->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:38192 v:172.17.0.24:50000 d:172.17.0.16:50000 conn->flags:101C3 conn->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: TCP input [S...] 172.17.0.16:50000->172.17.0.2:38192 state: NONE->SYN_RECV conn->refcnt:2
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1009
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.24:50000->172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.24:50000->172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup service: fwm 0 TCP 172.17.0.2:38192 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:22->172.17.0.2:41024 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/out TCP 172.17.0.2:38192->172.17.0.24:50000 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.16:22->172.17.0.2:41024 not hit
Jan 28 23:58:47 pc01 kernel: IPVS: lookup/in TCP 172.17.0.2:38192->172.17.0.24:50000 hit
Jan 28 23:58:47 pc01 kernel: IPVS: TCP input [..A.] 172.17.0.16:50000->172.17.0.2:38192 state: SYN_RECV->ESTABLISHED conn->refcnt:2

The tcpdump output on real server 1 shows that the connection with the client is good:

tcpdump -iany -nn port 80 or port 50000   # (nothing was seen on the loopback, only on bond0)
23:58:47.241028 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [S], seq 2188819762, win 14600, options [mss 1460,sackOK,TS val 447290123 ecr 0,nop,wscale 7], length 0
23:58:47.241028 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [S], seq 2188819762, win 14600, options [mss 1460,sackOK,TS val 447290123 ecr 0,nop,wscale 7], length 0
23:58:47.241128 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [S.], seq 709044054, ack 2188819763, win 28960, options [mss 1460,sackOK,TS val 353091780 ecr 447290123,nop,wscale 7], length 0
23:58:47.241131 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [S.], seq 709044054, ack 2188819763, win 28960, options [mss 1460,sackOK,TS val 353091780 ecr 447290123,nop,wscale 7], length 0
23:58:47.241308 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 1, win 115, options [nop,nop,TS val 447290123 ecr 353091780], length 0
23:58:47.241308 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 1, win 115, options [nop,nop,TS val 447290123 ecr 353091780], length 0
23:58:47.241409 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [P.], seq 1:174, ack 1, win 115, options [nop,nop,TS val 447290123 ecr 353091780], length 173
23:58:47.241409 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [P.], seq 1:174, ack 1, win 115, options [nop,nop,TS val 447290123 ecr 353091780], length 173
23:58:47.241443 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 174, win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241446 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 174, win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241569 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [F.], seq 1, ack 174, win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241573 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [F.], seq 1, ack 174, win 235, options [nop,nop,TS val 353091780 ecr 447290123], length 0
23:58:47.241824 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 2, win 115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241824 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [.], ack 2, win 115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241907 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [F.], seq 174, ack 2, win 115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241907 IP 172.17.0.2.38192 > 172.17.0.24.80: Flags [F.], seq 174, ack 2, win 115, options [nop,nop,TS val 447290124 ecr 353091780], length 0
23:58:47.241944 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 175, win 235, options [nop,nop,TS val 353091781 ecr 447290124], length 0
23:58:47.241946 IP 172.17.0.24.80 > 172.17.0.2.38192: Flags [.], ack 175, win 235, options [nop,nop,TS val 353091781 ecr 447290124], length 0

Thanks again for spending time debugging this.

Jacoby

On Tue, Jan 28, 2014 at 1:16 AM, Julian Anastasov <j...@ssi.bg> wrote:
>
> Hello,
>
> On Mon, 27 Jan 2014, Jacoby Hickerson wrote:
>
> > Certainly and that makes sense, I will consolidate what I've emailed before
> > with the additional information here.
> >
> > # PC info: Linux 3.12.5 for real servers 1 and 2, and Linux 3.9.10 for the client box.
> >
> > There are 3 boxes total, client box, director/RIP1 (real server 1) and RIP2 (real server 2):
> > - client box:
> > inet 172.17.0.2/16 brd 172.17.255.255 scope global eth1 #CIP
> >
> > - director which is the same as real server 1 (RIP1). The client is on a separate box.
> > inet 172.17.0.16/16 brd 172.17.255.255 scope global bond0 #RIP1
> > inet 172.17.0.24/16 brd 172.17.255.255 scope global secondary bond0:2 #VIP
> >
> > - real server 2 (RIP2)
> > inet 172.17.0.24/32 scope global lo:0 #VIP on loopback
> > inet 172.17.0.17/16 brd 172.17.255.255 scope global bond0 #RIP2
> >
> > # ipvs setup on real server 1 (RIP1) only
> > ipvsadm -C
> > ipvsadm -A -f 100 -s rr
> > ipvsadm -a -f 100 -r 172.17.0.16 -w 100
> > ipvsadm -a -f 100 -r 172.17.0.17 -w 100
> >
> > # iptable rules (these rules are set for both real server 1 and real server 2)
> > iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x64/0xffffffff
> > iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 50000
> > iptables -t nat -A OUTPUT -o lo -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 50000
> >
> > The test I'm conducting is an http get from the client box connecting to the VIP:
> > - Issue the following command on the client box:
> > curl -v 'http://172.17.0.24'
> >
> > On both real servers there is an nginx webserver listening on port 50000
> >
> > I also turned on debugging and ran the curl command with port mapping using
> > level 12 debug (this is output when the issue occurs of no load balancing).
> > Debug output on real server 1 after executing the curl command the first time:
> >
> > Jan 24 23:05:44 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0 refcnt 1 weight 100
>
> The debug output was very helpful.
>
> Looks like -j REDIRECT combined with DR is a bad idea.
> When packet comes to IPVS the daddr is already 172.17.0.16,
> see the "v:172.17.0.16" line below:
>
> > Jan 24 23:05:44 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37455 v:172.17.0.16:50130 d:172.17.0.17:50130 fwd:R s:65276 conn->flags:183 conn->refcnt:1 dest->refcnt:2
>
> The remote real server 2 is not configured for
> such VIP (172.17.0.16). I don't remember when was
> -j REDIRECT used for IPVS setups, may be for transparent
> proxy setups.
>
> Why not just use NAT method for both servers
> without any REDIRECT rules?
>
> Even -j DNAT --to-destination VIP:50000 has better
> chance to use VIP instead of first IP.
>
> > Jan 24 23:05:44 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37455 v:172.17.0.16:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
> > Jan 24 23:05:44 pc01 kernel: IPVS: TCP input [S...] 172.17.0.17:50130->172.17.0.2:37455 state: NONE->SYN_RECV conn->refcnt:2
> > Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1009
> > Jan 24 23:05:44 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
> > Jan 24 23:05:44 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1031
>
> Above "ip_vs_xmit.c line 1031" means packet was
> sent to remote real server 2 (172.17.0.17) but due to
> -j REDIRECT the daddr is 172.17.0.16.
>
> ...
>
> > Debug output on real server 1 after executing the curl command a second time:
> >
> > Jan 24 23:05:45 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> > Jan 24 23:05:45 pc01 kernel: IPVS: RR: server 172.17.0.16:0 activeconns 0 refcnt 1 weight 100
> > Jan 24 23:05:45 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:37456 v:172.17.0.16:50130 d:172.17.0.16:50130 fwd:R s:65276 conn->flags:183 conn->refcnt:1 dest->refcnt:2
> > Jan 24 23:05:45 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:37456 v:172.17.0.16:50130 d:172.17.0.16:50130 conn->flags:101C3 conn->refcnt:2
> > Jan 24 23:05:45 pc01 kernel: IPVS: TCP input [S...] 172.17.0.16:50130->172.17.0.2:37456 state: NONE->SYN_RECV conn->refcnt:2
> > Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1009
> > Jan 24 23:05:45 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
>
> No "ip_vs_xmit.c line 1031" here, packet was delivered
> locally with NF_ACCEPT, so it goes to local real server
> as per the "d:172.17.0.16" info.
>
> > Jan 24 23:05:45 pc01 kernel: IPVS: lookup/out TCP 172.17.0.16:50130->172.17.0.2:37456 hit
>
> ...
>
> > Below is an example of good results when connecting directly to port 50000.
>
> So, no -j REDIRECT => no problem?
>
> > For this scenario I removed port 80 and updated iptables with fwmark for
> > port 50000:
> > iptables -t mangle -A PREROUTING -d 172.17.0.24/32 ! -i lo -p tcp -m tcp --dport 50000 -j MARK --set-xmark 0x64/0xffffffff
> >
> > Debug output on real server 1 when not port mapping first test (curl -v 'http://172.17.0.24:50000'):
> >
> > Jan 25 00:19:37 pc01 kernel: IPVS: ip_vs_rr_schedule(): Scheduling...
> > Jan 25 00:19:37 pc01 kernel: IPVS: RR: server 172.17.0.17:0 activeconns 0 refcnt 1 weight 100
> > Jan 25 00:19:37 pc01 kernel: IPVS: Bind-dest TCP c:172.17.0.2:42815 v:172.17.0.24:50130 d:172.17.0.17:50130 fwd:R s:4 conn->flags:183 conn->refcnt:1 dest->refcnt:2
>
> Yep, "v:172.17.0.24" means no -j REDIRECT was used.
>
> > Jan 25 00:19:37 pc01 kernel: IPVS: Schedule fwd:R c:172.17.0.2:42815 v:172.17.0.24:50130 d:172.17.0.17:50130 conn->flags:101C3 conn->refcnt:2
> > Jan 25 00:19:37 pc01 kernel: IPVS: TCP input [S...] 172.17.0.17:50130->172.17.0.2:42815 state: NONE->SYN_RECV conn->refcnt:2
> > Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1009
> > Jan 25 00:19:37 pc01 kernel: IPVS: new dst 172.17.0.17, src 172.17.0.16, refcnt=1
> > Jan 25 00:19:37 pc01 kernel: IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1116
> > Jan 25 00:19:37 pc01 kernel: IPVS: Leave: ip_vs_dr_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 1031
>
> Regards
>
> --
> Julian Anastasov <j...@ssi.bg>

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-requ...@linuxvirtualserver.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users