Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
Le 14/03/2016 18:57, Mahesh Bandewar a écrit : On Sun, Mar 13, 2016 at 8:53 PM, David Miller wrote: [snip] Furthermore, when you walk across the ns boundary, that old device has to disappear. That's why that is the device assigned to skb->dev. The layer boundaries are not that well maintained. We do check for the xfrm policies in L4 and expect the skb->dev pointing to the L3 device. So unless we have a way to derive a L3 dev from skb->dev, I don't think xfrm will work. Unless some Xfrm-expert asserts that this is not needed. Adding a hook "at the right place" to do the switch is probably the better way. For xfrm, you will need to handle it in this hook or rearrange things. I don't think that a quick and easy solution will be possible.
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
On Sun, Mar 13, 2016 at 8:53 PM, David Miller wrote: > > Please stop pretending that this device switching is ok, it's not. +1 This is what I have been complaining about since v1...
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
On Sun, Mar 13, 2016 at 5:01 PM, Mahesh Bandewar wrote: >>> If I understand correctly (and as Cong already said), information are >>> leaking >>> between netns during the input phase. On the tx side, skb_scrub_packet() is >>> called, but not on the rx side. I think it's wrong. There should be an >>> explicit >>> boundary. >> >> That is not what I am complaining about. >> >> I dislike the trick of switching skb->dev pointer with skb->dev->l3_dev. >> This is not how we switch netns, nor the way how netns works. >> > How it is different from what we are doing currently? > > Current: Use skb->dev for L3 processing and derive netns from skb->dev > Proposal: use skb->dev->l3_dev for L3 processing and derive netns from > skb->dev->l3_dev If you ever read the part you quote below, you will have the answer. > >> Look at veth pair or dev_change_net_namespace(), each time when we >> switch netns, we need to do a full reregistration or a full reentrance, we >> never just switch some pointers to switch netns. This is why I said it breaks >> isolation. ^ You miss this part. >> >> Also, it is ugly to hide such a ipvlan-specific pointer for half of the RX >> code >> path. > I think I have already mentioned, I'm adding RX code now and later > I'll add TX code to use > l3_dev to make it symmetric. This way all L3 (Tx/Rx) will use this > device reference > always. You are trying to convince me by telling me you will add more ugly code?? Seriously??
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
On Sun, Mar 13, 2016 at 8:53 PM, David Miller wrote: > From: Mahesh Bandewar > Date: Sun, 13 Mar 2016 19:29:58 -0700 > >> On Sun, Mar 13, 2016 at 6:50 PM, David Miller wrote: >>> It doesn't matter whether doing so or not makes sense. >>> >>> You're going to have to find a way to do both, and also I'm concerned >>> about how you're leaking the source namespace's "stuff" into the >>> destination's. That's very worrisome to me. >> >> If we add a new mode (e.g. L3s) and preserve current mode as is it, >> then that should address your first concern. > > Also, I don't want all of this device translation stuff all over the > place. > I could add skb->dev. Is that OK? Then non of this translation / helper-stuff is required. I'm definitely open for suggestions. > Furthermore, when you walk across the ns boundary, that old device has > to disappear. That's why that is the device assigned to skb->dev. > The layer boundaries are not that well maintained. We do check for the xfrm policies in L4 and expect the skb->dev pointing to the L3 device. So unless we have a way to derive a L3 dev from skb->dev, I don't think xfrm will work. Unless some Xfrm-expert asserts that this is not needed. > Please stop pretending that this device switching is ok, it's not.
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
From: Mahesh Bandewar Date: Sun, 13 Mar 2016 19:29:58 -0700 > On Sun, Mar 13, 2016 at 6:50 PM, David Miller wrote: >> It doesn't matter whether doing so or not makes sense. >> >> You're going to have to find a way to do both, and also I'm concerned >> about how you're leaking the source namespace's "stuff" into the >> destination's. That's very worrisome to me. > > If we add a new mode (e.g. L3s) and preserve current mode as is it, > then that should address your first concern. Also, I don't want all of this device translation stuff all over the place. Furthermore, when you walk across the ns boundary, that old device has to disappear. That's why that is the device assigned to skb->dev. Please stop pretending that this device switching is ok, it's not.
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
On Sun, Mar 13, 2016 at 6:50 PM, David Miller wrote: > From: Mahesh Bandewar > Date: Wed, 9 Mar 2016 13:49:49 -0800 > >> This does happen as expected for egress traffic however on ingress >> traffic, the IPvlan packet-handler changes the skb->dev and this >> forces packet to be processed with the IPvlan slave and it's >> associated ns. This causes above mentioned problem and few other >> which are not yet reported / attempted. e.g. IPsec with L3 mode or >> even ingress routing. > > And now your changes make it so that we can't run netfilter rules on > the slave's device within the slave's ns. > > If someone is doing that now you're breaking things for them. > Yes, I agree. If someone is using netfilter hooks in slave-ns, then this would break for them. If we do this as a separate mode (e.g. L3s) keeping the current mode as it is. Would that be an acceptable solution? > It doesn't matter whether doing so or not makes sense. > > You're going to have to find a way to do both, and also I'm concerned > about how you're leaking the source namespace's "stuff" into the > destination's. That's very worrisome to me. > If we add a new mode (e.g. L3s) and preserve current mode as is it, then that should address your first concern. Now about leaking ns "stuff". I do not want to leak stuff across ns. In the current L3 mode there is an issue that Nicolas has pointed out and I'll fix that soon. In the new mode (L3s); right from the beginning the L2 and L3 processing always (and consistently) will happen with master device (proposed skb->dev->l3_dev). > There is no way I'm applying this as-is, sorry.
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
From: Mahesh Bandewar Date: Wed, 9 Mar 2016 13:49:49 -0800 > This does happen as expected for egress traffic however on ingress > traffic, the IPvlan packet-handler changes the skb->dev and this > forces packet to be processed with the IPvlan slave and it's > associated ns. This causes above mentioned problem and few other > which are not yet reported / attempted. e.g. IPsec with L3 mode or > even ingress routing. And now your changes make it so that we can't run netfilter rules on the slave's device within the slave's ns. If someone is doing that now you're breaking things for them. It doesn't matter whether doing so or not makes sense. You're going to have to find a way to do both, and also I'm concerned about how you're leaking the source namespace's "stuff" into the destination's. That's very worrisome to me. There is no way I'm applying this as-is, sorry.
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
>> If I understand correctly (and as Cong already said), information are >> leaking >> between netns during the input phase. On the tx side, skb_scrub_packet() is >> called, but not on the rx side. I think it's wrong. There should be an >> explicit >> boundary. > > That is not what I am complaining about. > > I dislike the trick of switching skb->dev pointer with skb->dev->l3_dev. > This is not how we switch netns, nor the way how netns works. > How it is different from what we are doing currently? Current: Use skb->dev for L3 processing and derive netns from skb->dev Proposal: use skb->dev->l3_dev for L3 processing and derive netns from skb->dev->l3_dev > Look at veth pair or dev_change_net_namespace(), each time when we > switch netns, we need to do a full reregistration or a full reentrance, we > never just switch some pointers to switch netns. This is why I said it breaks > isolation. > > Also, it is ugly to hide such a ipvlan-specific pointer for half of the RX > code > path. I think I have already mentioned, I'm adding RX code now and later I'll add TX code to use l3_dev to make it symmetric. This way all L3 (Tx/Rx) will use this device reference always.
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
> If I understand correctly (and as Cong already said), information are > leaking > between netns during the input phase. On the tx side, skb_scrub_packet() is > called, but not on the rx side. I think it's wrong. There should be an > explicit > boundary. > I think we used to do dev_forward_skb() in the RX path which used to do skb_scrub_packet(). When we added the optimization to avoid queuing second time just to deliver packet to the slave (by doing RX_HANDLER_ANOTHER), we lost the skb_scrub_packet() which we used to get automatically. I believe we can add that! Thanks for pointing it out. > Another small comment: maybe finding another name than l3_dev could help to > avoid confusion with the existing l3mdev. I have absolutely no preference for the name. I used it to indicate this is the *L3 device*. Please suggest.
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
On Thu, Mar 10, 2016 at 1:47 AM, Nicolas Dichtel wrote: > Le 09/03/2016 22:49, Mahesh Bandewar a écrit : >> >> From: Mahesh Bandewar >> >> One of the major request (for enhancement) that I have received >> from various users of IPvlan in L3 mode is its inability to handle >> IPtables. >> >> While looking at the code and how we handle ingress, the problem >> can be attributed to the asymmetry in the way packets get processed >> for IPvlan devices configured in L3 mode. L3 mode is supposed to >> be restrictive and all the L3 decisions need to be taken for the >> traffic in master's ns. This does happen as expected for egress >> traffic however on ingress traffic, the IPvlan packet-handler >> changes the skb->dev and this forces packet to be processed with >> the IPvlan slave and it's associated ns. This causes above mentioned >> problem and few other which are not yet reported / attempted. e.g. >> IPsec with L3 mode or even ingress routing. >> >> This could have been solved if we had a way to handover packet to >> slave and associated ns after completing the L3 phase. This is a >> non-trivial issue to fix especially looking at IPsec code. >> >> This patch series attempts to solve this problem by introducing the >> device pointer l3_dev which resides in net_device structure in the >> RX cache line. We initialize the l3_dev to self. This would mean >> there is no complex logic to when-and-how-to initialize it. Now >> the stack will use this dev pointer during the L3 phase. This should >> not alter any existing properties / behavior and also there should >> not be any additional penalties since it resides in the same RX >> cache line. > > If I understand correctly (and as Cong already said), information are > leaking > between netns during the input phase. On the tx side, skb_scrub_packet() is > called, but not on the rx side. I think it's wrong. There should be an > explicit > boundary. That is not what I am complaining about. I dislike the trick of switching skb->dev pointer with skb->dev->l3_dev. This is not how we switch netns, nor the way how netns works. Look at veth pair or dev_change_net_namespace(), each time when we switch netns, we need to do a full reregistration or a full reentrance, we never just switch some pointers to switch netns. This is why I said it breaks isolation. Also, it is ugly to hide such a ipvlan-specific pointer for half of the RX code path.
Re: [PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
Le 09/03/2016 22:49, Mahesh Bandewar a écrit : From: Mahesh Bandewar One of the major request (for enhancement) that I have received from various users of IPvlan in L3 mode is its inability to handle IPtables. While looking at the code and how we handle ingress, the problem can be attributed to the asymmetry in the way packets get processed for IPvlan devices configured in L3 mode. L3 mode is supposed to be restrictive and all the L3 decisions need to be taken for the traffic in master's ns. This does happen as expected for egress traffic however on ingress traffic, the IPvlan packet-handler changes the skb->dev and this forces packet to be processed with the IPvlan slave and it's associated ns. This causes above mentioned problem and few other which are not yet reported / attempted. e.g. IPsec with L3 mode or even ingress routing. This could have been solved if we had a way to handover packet to slave and associated ns after completing the L3 phase. This is a non-trivial issue to fix especially looking at IPsec code. This patch series attempts to solve this problem by introducing the device pointer l3_dev which resides in net_device structure in the RX cache line. We initialize the l3_dev to self. This would mean there is no complex logic to when-and-how-to initialize it. Now the stack will use this dev pointer during the L3 phase. This should not alter any existing properties / behavior and also there should not be any additional penalties since it resides in the same RX cache line. If I understand correctly (and as Cong already said), information are leaking between netns during the input phase. On the tx side, skb_scrub_packet() is called, but not on the rx side. I think it's wrong. There should be an explicit boundary. Another small comment: maybe finding another name than l3_dev could help to avoid confusion with the existing l3mdev.
[PATCH next v2 0/7] Introduce l3_dev pointer for L3 processing
From: Mahesh Bandewar One of the major request (for enhancement) that I have received from various users of IPvlan in L3 mode is its inability to handle IPtables. While looking at the code and how we handle ingress, the problem can be attributed to the asymmetry in the way packets get processed for IPvlan devices configured in L3 mode. L3 mode is supposed to be restrictive and all the L3 decisions need to be taken for the traffic in master's ns. This does happen as expected for egress traffic however on ingress traffic, the IPvlan packet-handler changes the skb->dev and this forces packet to be processed with the IPvlan slave and it's associated ns. This causes above mentioned problem and few other which are not yet reported / attempted. e.g. IPsec with L3 mode or even ingress routing. This could have been solved if we had a way to handover packet to slave and associated ns after completing the L3 phase. This is a non-trivial issue to fix especially looking at IPsec code. This patch series attempts to solve this problem by introducing the device pointer l3_dev which resides in net_device structure in the RX cache line. We initialize the l3_dev to self. This would mean there is no complex logic to when-and-how-to initialize it. Now the stack will use this dev pointer during the L3 phase. This should not alter any existing properties / behavior and also there should not be any additional penalties since it resides in the same RX cache line. Now coming back to IPvlan setup. The typical IPvlan L3 setup where master is in default-ns while the slaves are put inside different (slave) net-ns. During the initialization phase, the driver can make l3_dev for each slave to point to its master-device. This will solve the current IPT problem as well as make the packet processing symmetric from stack perspective. For all other packet processing since dev->l3_dev is same as dev, there should not be any behavioral or functional change. Mahesh Bandewar (7): dev: Add netif_get_l3_dev() helper dev: Update packet dispatcher to pass l3_dev as an input device ipv4: Use l3_dev for L3 ingress processing ipv6: Use l3_dev for L3 ingress processing raw: Use l3_dev for L3 ingress raw packet delivery xfrm: Use l3_dev for xfrm policy checks. ipvlan: Implement L3-symmetric mode. drivers/net/ipvlan/ipvlan_main.c | 16 +--- include/linux/netdevice.h| 6 ++ include/net/xfrm.h | 2 +- net/core/dev.c | 10 +++--- net/ipv4/ip_input.c | 12 +++- net/ipv4/ip_options.c| 3 ++- net/ipv4/raw.c | 11 +-- net/ipv6/ip6_input.c | 14 -- net/ipv6/raw.c | 2 +- net/ipv6/route.c | 7 --- net/xfrm/xfrm_policy.c | 4 ++-- 11 files changed, 52 insertions(+), 35 deletions(-) -- 2.7.0.rc3.207.g0ac5344