On Thu, 18 Jun 2015 10:45:12 +0300 (EEST) Julian Anastasov <j...@ssi.bg> wrote:

> Hello,

Hi,

> On Wed, 17 Jun 2015, Michael Vallaly wrote:
>
> > IPVS clusters with realservers on a remote L3 network (FWM 254), IPVS
> > encapsulates the original packet (src 8.8.8.8 dst 4.4.4.4) in an IP
> > header (src 192.168.254.15 dst 192.168.254.48) and emits the packet via
> > the vlan500 interface, passing it to the L2 (MAC address) of
> > 192.168.254.1.
> >
> > This worked swimmingly well until I noticed that, very intermittently,
> > the following happens:
> >
> > IPVS encapsulates the original packet (src 8.8.8.8 dst 4.4.4.4) in an IP
> > header (src 172.23.10.11 dst 192.168.254.48) and emits the packet via
> > the vlan500 interface, passing it to the MAC address of 192.168.254.1.
> > (NOTE: the use of 172.23.10.11 rather than the expected 192.168.254.48
> > source IP in the TUN header)
>
> 	May be IP 192.168.254.15 was removed?

Sorry, I fat-fingered the 192.168.254.15 IP address in the email. The
sentence should have read "the use of 172.23.10.11 rather than the
expected _192.168.254.15_ source IP in the TUN header".
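For anyone following along who hasn't looked at LVS-TUN before, the
encapsulation described above can be sketched in a few lines of userspace
Python (a simplified illustration only, not the kernel code; the header
checksum is left zero and no options are handled):

```python
import socket
import struct

def ipip_encapsulate(inner_packet: bytes, outer_src: str, outer_dst: str) -> bytes:
    """Prepend a minimal IPv4 outer header (protocol 4 = IP-in-IP) to an
    already-built inner IP packet, the way the IPVS TUN method does."""
    version_ihl = (4 << 4) | 5           # IPv4, 20-byte header, no options
    total_len = 20 + len(inner_packet)
    outer_header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,                               # TOS
        total_len,
        0,                               # identification
        0,                               # flags / fragment offset
        64,                              # TTL
        4,                               # protocol 4 = IP-in-IP
        0,                               # checksum (omitted in this sketch)
        socket.inet_aton(outer_src),
        socket.inet_aton(outer_dst),
    )
    return outer_header + inner_packet

# Expected case: outer src is the vlan500 address of the director.
inner = b"\x45" + b"\x00" * 19           # stand-in 20-byte inner IP header
pkt = ipip_encapsulate(inner, "192.168.254.15", "192.168.254.48")
```

The bug under discussion is effectively that the kernel sometimes fills
the `outer_src` slot with an address from an unrelated interface.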
> 	__ip_vs_get_out_rt should provide previous saddr but
> after commit 026ace060dfe ("ipvs: optimize dst usage for real server")
> we always provide 0.0.0.0 as initial source, so
> do_output_route4 should always get fresh source address
> and then will get second route with this source.

If my understanding of this is correct, this means the IPVS TUN code
just binds its socket to all interfaces and leverages the kernel routing
code to select the "best" interface? As far as I can tell, this affects
all traffic emitted by the IPVS TUN code for this IPVS cluster during
the timeframe; e.g. I see no "valid" src_ip packets emitted for the
remote L3 realserver during the timeframe. I confirmed this happens even
with multiple local realservers in the same cluster.

> 	So, now on dst_cache refresh we do not try
> to preserve the previous saddr.
>
> 	If you see different address here, it means
> it is returned by routing. The routing cache does
> not keep source addresses but nexthops can remember
> source returned by fib_info_update_nh_saddr. It
> should be from the same subnet because you have
> "via 192.168.254.1". Otherwise, first address from
> device or system is returned.

I was under the impression that the routing cache was removed from the
kernel after 3.6? I see there is some sort of i4flow caching that seems
to be done now in the fib_trie, but I am not very familiar with it. Do
you know of a way to dump/monitor the route nexthop information? Before
my email I had attempted to use "ip monitor route/neigh/link", and I
don't see any netlink events around/during the incorrect packet
emission timeframes.

> 	Also __ip_vs_dst_cache_reset is called on
> dest add/edit, for dests coming from trash...
> 	I'll think more on this problem but for now I don't
> see what can be the cause.

To be explicitly clear, the IPVS config for these tests is static (the
realserver IP never changes), and no interface / route / policy changes
take place on the machine.
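As a side note, one cheap way to ask the kernel which source address its
routing code would currently pick for the tunnel destination is the UDP
connect() trick: connect() performs a route lookup without transmitting
anything, and getsockname() then reveals the chosen saddr. A sketch
(note this goes through the ordinary routing tables, so it may not
exactly reproduce an fwmark/policy-routed lookup like the FWM 254 case):

```python
import socket

def kernel_selected_source(dest_ip: str) -> str:
    """Return the source address the kernel would pick for dest_ip.
    connect() on a UDP socket does a route lookup but sends no packet;
    getsockname() then exposes the selected saddr."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((dest_ip, 9))  # discard port; nothing is transmitted
        return s.getsockname()[0]
    finally:
        s.close()

# On the affected director one would expect
# kernel_selected_source("192.168.254.48") to return 192.168.254.15,
# and could poll this in a loop to see whether the answer ever flips
# to 172.23.10.11 during an errant 2-5 min window.
```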
I have been suspecting that, since the packets get emitted correctly
>99% of the time, the routing code obviously works the majority of the
time, but maybe we are missing a "use" count and the route to the
nexthop eventually expires? Maybe there is a corner-case code path
being executed? Unfortunately, even if that were the case, I don't
understand why a route lookup would ever select a SRC IP from a
different interface entirely, especially since the packet gets emitted
out the correct interface (vlan500). Given that the policy route is in
place, which explicitly defines which interface / next hop to use, and
there is a local IP bound to the interface directly connected to the
nexthop subnet, why would any other interface's IP be returned by the
routing code?

Additionally strange/convenient is the observed use of SRC IPs from
interfaces currently in use by other IPVS clusters. E.g. on a box with
20 VLAN interfaces, 100% of the errant SRC IPs (7) were sourced from
VLANs actively running other IPVS clusters (with locally connected
realservers); the remaining 13 VLAN interfaces' IPs never got used by
the IPVS xmit code (but maybe I just need to wait longer). I don't
see, for example, consistent use of only the first route/interface in
the main routing table, which would seem to me a more natural
failback/last resort.

> > So in the last 24 hours, out of 1.5M packets emitted by IPVS on vlan500
> > (FWM 254), I had 349 packets which got emitted with the wrong source IP
> > address in the tunnel IP header. The periods where the wrong source IP
> > is used by IPVS seem to last ~2-5 min at a time, and affect all
> > traffic in the LVS cluster with remote L3 realservers.
>
> 	Do you see same IP/routing config when this
> happens?

Yup, no IP/routing changes are being made to the system. I can fairly
easily reproduce this behavior, and would be happy to try / provide any
additional discovery/testing to get to the bottom of this.
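For what it's worth, the tally above (349 errant packets out of 1.5M)
came from grepping captures of the outer tunnel headers; something like
the following does the counting. This is a minimal sketch that assumes
tcpdump-style IPIP output, where the first "IP a.b.c.d > realserver:"
on each line is the outer header; the exact log format is hypothetical:

```python
import re
from collections import Counter

EXPECTED_SRC = "192.168.254.15"   # the vlan500 address of the director

# Outer header of a tcpdump IPIP line, e.g.
# "12:00:01.000000 IP 172.23.10.11 > 192.168.254.48: IP 8.8.8.8 > 4.4.4.4: ..."
OUTER = re.compile(r"IP (\d+\.\d+\.\d+\.\d+) > 192\.168\.254\.48:")

def tally_errant_sources(lines):
    """Count outer tunnel source IPs that differ from the expected saddr."""
    errant = Counter()
    for line in lines:
        m = OUTER.search(line)
        if m and m.group(1) != EXPECTED_SRC:
            errant[m.group(1)] += 1
    return errant
```

Binning the matches by timestamp as well is what shows the errant
sources clustering into the 2-5 minute windows described above.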
All suggestions / speculation welcome ;)

Thanks!

-Mike

> 	Regards
>
> --
> Julian Anastasov <j...@ssi.bg>

--
Michael Vallaly <l...@nolatency.com>

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-requ...@linuxvirtualserver.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users