On Thu, 18 Jun 2015 10:45:12 +0300 (EEST)
Julian Anastasov <j...@ssi.bg> wrote:
 
>       Hello,
Hi, 
 
> On Wed, 17 Jun 2015, Michael Vallaly wrote:
> 
> > IPVS clusters with realservers on a remote L3 network (FWM 254), IPVS
> > encapsulates the original packet (src 8.8.8.8 dst 4.4.4.4) in an IP
> > header (src 192.168.254.15 dst 192.168.254.48) and emits the packet via
> > the vlan500 interface passing it to the L2 (mac address) of
> > 192.168.254.1. 
> > 
> > This worked swimmingly well until I noticed that very intermittently
> > the following happens:
> > 
> > IPVS encapsulates the original packet (src 8.8.8.8 dst 4.4.4.4) in an IP
> > header (src 172.23.10.11 dst 192.168.254.48) and emits the packet via
> > the vlan500 interface passing it to the mac address of 192.168.254.1.
> > (NOTE: the use of 172.23.10.11 rather than the expected 192.168.254.48
> > source IP in the TUN header)
> 
>       Maybe IP 192.168.254.15 was removed?
> 
Sorry, I fat-fingered the 192.168.254.15 IP address in the email. The
sentence should have read: "the use of 172.23.10.11 rather than the
expected _192.168.254.15_ source IP in the TUN header".

>       __ip_vs_get_out_rt should provide previous saddr but
> after commit 026ace060dfe ("ipvs: optimize dst usage for real server")
> we always provide 0.0.0.0 as initial source, so
> do_output_route4 should always get fresh source address
> and then will get second route with this source.
> 
If my understanding of this is correct, this means that the IPVS TUN
code just binds its socket to all interfaces and leverages the
kernel routing code to select the "best" interface?
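
For anyone following along, here is a rough userspace analogue of "no
source given, let routing pick one". It is only a sketch: IPVS does its
route lookup inside the kernel rather than through a socket, so this
illustrates the general source-selection behaviour, not the IPVS code
path. The function name and the optional fwmark parameter are made up
for illustration; setting SO_MARK needs CAP_NET_ADMIN on Linux.

```python
import socket


def kernel_selected_source(dst_ip, fwmark=None):
    """Return the IPv4 source address the kernel picks for dst_ip.

    The socket is never bound, so the source comes purely from the
    routing decision, much like a lookup that starts from 0.0.0.0.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if fwmark is not None:
            # Linux only, requires CAP_NET_ADMIN; lets policy routing
            # rules that match on fwmark take part in the lookup.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_MARK, fwmark)
        # connect() on a UDP socket sends nothing; it only resolves a route
        # and fixes the local address that would be used.
        s.connect((dst_ip, 9))
        return s.getsockname()[0]
    finally:
        s.close()
```

With the addresses from this thread one would call
kernel_selected_source("192.168.254.48", fwmark=254) and expect
192.168.254.15 back, assuming the policy rule matches on the mark.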

As far as I can tell, this affects all traffic emitted from the
IPVS TUN code for this IPVS cluster during the timeframe. E.g., I see no
"valid" src_ip packets emitted for the remote L3 realserver during the
timeframe. I confirmed this happens even with multiple local
realservers in the same cluster as well.

>       So, now on dst_cache refresh we do not try
> to preserve the previous saddr.
> 
>       If you see different address here, it means
> it is returned by routing. The routing cache does
> not keep source addresses but nexthops can remember
> source returned by fib_info_update_nh_saddr. It
> should be from the same subnet because you have
> "via 192.168.254.1". Otherwise, first address from
> device or system is returned.
> 
I was under the impression that the routing cache was removed from the
kernel as of 3.6? I see there is some sort of i4flow caching that
seems to be done now in the fib_trie, but I am not very familiar with
it. Do you know of a way to dump/monitor the route nexthop information?
Before sending my email I tried "ip monitor route/neigh/link", and I
saw no netlink events around/during the incorrect packet emission
timeframes.
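
One possible way to watch the routing decision from userspace is to
poll "ip route get" with the fwmark and log whenever the reported src
changes. This is a sketch under assumptions: the helper names are
hypothetical, and it is not confirmed that the userspace lookup sees
the same nexthop state the IPVS xmit path does.

```python
import subprocess
import time


def parse_src(route_get_output):
    """Return the token following 'src' in `ip route get` output, or None."""
    tokens = route_get_output.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == "src":
            return tokens[i + 1]
    return None


def watch_route_src(dst, fwmark, expected_src, interval=1.0):
    """Poll the route lookup and print a line whenever src is unexpected.

    Runs until interrupted (Ctrl-C).
    """
    cmd = ["ip", "-4", "route", "get", dst, "mark", str(fwmark)]
    while True:
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        src = parse_src(out)
        if src != expected_src:
            stamp = time.strftime("%H:%M:%S")
            print(stamp, "unexpected src:", src, "|", out.strip())
        time.sleep(interval)
```

For the setup in this thread the call would be
watch_route_src("192.168.254.48", 254, "192.168.254.15"). If nothing is
ever logged while bad packets are on the wire, that would suggest the
wrong source comes from state the userspace lookup does not exercise.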

>       Also __ip_vs_dst_cache_reset is called on
> dest add/edit, for dests coming from trash...
> I'll think more on this problem but for now I don't
> see what can be the cause.

To be explicit, the IPVS config for these tests is static (the
realserver IP never changes), and no interface / route / policy changes
take place on the machine.

Since the packets are emitted correctly more than 99% of the time, the
routing code obviously works in the common case. Maybe we are missing a
"use" count and the route to the NH eventually expires? Or maybe a
corner-case code path is being executed? Even if that were the case, I
don't understand why a route lookup would ever select a SRC IP from a
different interface entirely, especially since the packet is still
emitted out the correct interface (vlan500). The policy route explicitly
defines which interface / next hop to use, and a local IP bound to that
interface is directly connected to the nexthop subnet, so why would the
routing code return any other interface's IP?

Also strange (or convenient): the errant SRC_IPs observed all belong to
interfaces currently being used by other IPVS clusters.

E.g., on a box with 20 vlan interfaces, 100% of the errant SRC_IPs
(7) were sourced from VLANs actively running other IPVS clusters (with
locally connected realservers); the remaining 13 vlan interfaces' IPs
were never used by the IPVS xmit code (but maybe I just need to wait
longer). I don't see, for example, consistent use of only the first
route/interface in the main routing table, which would seem to me to
be a more natural fallback/last resort.
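
A tally like the one above can be produced from captured packets with
something along these lines. It is a minimal sketch: it assumes raw
IPv4 packets (link-layer header already stripped) as bytes objects, and
the function names are hypothetical. It only inspects the outer header
of IP-in-IP (protocol 4) packets.

```python
import socket
import struct
from collections import Counter

IPPROTO_IPIP = 4  # outer header carries an encapsulated IPv4 packet


def outer_addrs(packet):
    """Return (src, dst) of the outer IPv4 header for IPIP packets, else None."""
    if len(packet) < 20:
        return None
    # Fixed 20-byte IPv4 header: version/IHL, TOS, total length, ID,
    # flags/fragment offset, TTL, protocol, checksum, source, destination.
    fields = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    version = fields[0] >> 4
    proto = fields[6]
    if version != 4 or proto != IPPROTO_IPIP:
        return None
    return socket.inet_ntoa(fields[8]), socket.inet_ntoa(fields[9])


def tally_errant_sources(packets, expected_src, realserver):
    """Count tunnel packets to realserver whose outer src is not expected_src."""
    errant = Counter()
    for pkt in packets:
        addrs = outer_addrs(pkt)
        if addrs is None:
            continue
        src, dst = addrs
        if dst == realserver and src != expected_src:
            errant[src] += 1
    return errant
```

Called as tally_errant_sources(pkts, "192.168.254.15", "192.168.254.48"),
it returns a Counter keyed by each unexpected outer source address,
which can then be matched against the per-VLAN interface addresses.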

> 
> > So in the last 24 hours out of 1.5M packets emitted by IPVS on vlan500
> > (FWM 254) I had 349 packets which get emitted with the wrong source IP
> > address in the Tunnel IP header. The periods where the wrong source IP
> > is used by IPVS seem to last for ~2-5min at a time, and affects all
> > traffic in the LVS cluster with remote L3 realservers. 
> 
>       Do you see same IP/routing config when this
> happens?

Yup, no IP/routing changes are being made to the system.

I can fairly easily reproduce this behavior, and would be happy to
try / provide any additional discovery/testing to get to the bottom of
this.

All suggestions / speculation welcome ;)

Thanks!

-Mike

> 
> Regards
> 
> --
> Julian Anastasov <j...@ssi.bg>
> 
> _______________________________________________
> Please read the documentation before posting - it's available at:
> http://www.linuxvirtualserver.org/
> 
> LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
> Send requests to lvs-users-requ...@linuxvirtualserver.org
> or go to http://lists.graemef.net/mailman/listinfo/lvs-users


-- 
Michael Vallaly <l...@nolatency.com>
