----- dan...@iogearbox.net wrote:

> On 03/15/2018 10:21 AM, Shmulik Ladkani wrote:
> > Regarding the premise of this commit, this "reduces" the
> > ipvs/orphan/mark scrubbing in the following *non* xnet situations:
> >
> > 1. macvlan port xmit to other macvlan ports in Bridge Mode
> > 2. similarly for ipvlan
> > 3. veth xmit
> > 4. l2tp_eth_dev_recv
> > 5. bpf redirect/clone_redirect ingress actions
> >
> > Regarding l2tp recv, this commit seems to align the scrubbing behavior
> > with ip tunnels (full scrub only if crossing netns, see ip_tunnel_rcv).
> >
> > Regarding veth xmit, it does make sense to preserve the fields if not
> > crossing netns. This is also the case when one uses tc mirred.
> >
> > Regarding bpf redirect, well, it depends on the expectations of each bpf
> > program.
> > I'd argue that preserving the fields (at least the mark field) in the
> > *non* xnet case makes sense and provides more information and therefore
> > more capabilities; alas, this might change behavior already being relied on.
> >
> > Maybe Daniel can comment on the matter.
>
> Overall I think it might be nice to not need scrubbing the skb in such cases,
> although my concern would be that this has the potential to break existing
> setups that expect the mark to be zero on the other veth peer in any case,
> since that has been the behavior for a long time already. The safer option
> would be to have some sort of explicit opt-in, e.g. on link creation, to let
> the skb->mark pass through unscrubbed. This would definitely be a useful
> option, e.g. when the mark is set in the netns-facing veth via clsact/egress
> on xmit and when the container is unprivileged anyway.
>
> Thanks,
> Daniel

I see your point in regard to backward compatibility. However, scrubbing the
skb differently depending on which kernel function happens to forward it, for
the same netns transition, is basically a bug, and one that could easily
break again with a little more refactoring. Therefore, it seems a bit odd to
me that from now on we would force every user to consider, at link creation,
that there was once a bug leading to this odd behavior on specific netdevs.
Thus, I suggest controlling this via a global /proc/sys/net file instead.

-Liran
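A minimal userspace sketch of the global-knob idea, assuming the behavior described in this thread: the mark, ipvs state and socket ownership are cleared only when the packet crosses a netns (xnet), unless a system-wide knob forces the legacy always-scrub behavior. `struct model_skb`, `model_scrub()` and `hypothetical_sysctl_legacy_scrub` are hypothetical names; no such /proc/sys/net entry exists, and this only models the netns-dependent part of skb_scrub_packet(), not the kernel function itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the few sk_buff fields discussed in the thread;
 * this is NOT the kernel's struct sk_buff. */
struct model_skb {
	uint32_t mark;       /* stands in for skb->mark */
	bool ipvs_property;  /* stands in for the state ipvs_reset() clears */
	bool has_owner;      /* stands in for skb->sk, dropped by skb_orphan() */
};

/* Hypothetical global knob, standing in for a /proc/sys/net entry.
 * 1 = legacy: always scrub mark/ipvs/owner when forwarding.
 * 0 = scrub them only when the packet crosses a netns (xnet). */
static int hypothetical_sysctl_legacy_scrub = 1;

/* Models only the xnet-dependent tail of the scrub step. */
static void model_scrub(struct model_skb *skb, bool xnet)
{
	if (!xnet && !hypothetical_sysctl_legacy_scrub)
		return; /* same netns and knob off: keep mark, ipvs, owner */

	skb->ipvs_property = false;
	skb->has_owner = false;
	skb->mark = 0;
}
```

In this sketch the knob defaults to 1 so existing setups keep seeing a zero mark on the peer, which is the compatibility concern raised above; an admin who wants the mark to survive a same-netns hop would flip it once system-wide rather than per link.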