Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Tue, Dec 13, 2016 at 05:21:18PM -0800, Stephen Hemminger wrote:
> On Sat, 3 Dec 2016 10:22:28 +0100 (CET)
> Michał Mirosław wrote:
>
> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > intact through linux networking stack.
> >
> > Signed-off-by: Michał Mirosław
> > ---
> >
> > Dear NetDevs
> >
> > I guess this needs to be split to the prep..convert[]..finish sequence,
> > but if you like it as is, then it's ready.
> >
> > The biggest question is if the modified interface and vlan_present
> > is the way to go. This can be changed to use vlan_proto != 0 instead
> > of an extra flag bit.
> >
> > As I can't test most of the driver changes, please look at them carefully.
> > OVS and bridge eyes are especially welcome.
> >
> > Best Regards,
> > Michał Mirosław
>
> Is the motivation to support 802.1ad Drop Eligibility Indicator (DEI)?
>
> If so then you need to be more verbose in the commit log, and lots more
> work is needed. You need to rename fields and validate every place a
> driver is using DEI bit to make sure it really does the right thing
> on that hardware. It is not just a mechanical change.

There are not many mentions of the CFI bit in the Linux tree. Places that
used it as VLAN_TAG_PRESENT are fixed with this patchset. Other uses are:

 - VLAN code: ignored
 - ebt_vlan: ignored
 - OVS: cleared because of netlink API assumptions
 - DSA: transferred to/from (E)DSA tag
 - drivers: gianfar: uses properly in filtering rules
 - drivers: cnic: false-positive (uses only VLAN ID, CFI bit marks the
   field 'valid')
 - drivers: qedr: false-positive (like cnic)

So unless there is something hidden in the hardware, no driver does
anything special with the CFI bit.

After this patchset only OVS will need further modifications to be able
to support handling of the DEI bit.

Best Regards,
Michał Mirosław
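For readers following the list of CFI uses above, it may help to see where the bit sits in the 16-bit VLAN TCI. The following is only a userspace sketch: the mask values mirror those in include/linux/if_vlan.h, but the `SKETCH_`/`sketch_` names and the struct are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values mirror include/linux/if_vlan.h; they are redefined here
 * with a SKETCH_ prefix so the example builds outside the kernel. */
#define SKETCH_VLAN_PRIO_MASK  0xe000u /* Priority Code Point, 3 bits */
#define SKETCH_VLAN_PRIO_SHIFT 13
#define SKETCH_VLAN_CFI_MASK   0x1000u /* CFI, called DEI in 802.1ad */
#define SKETCH_VLAN_VID_MASK   0x0fffu /* VLAN ID, 12 bits */

/* Hypothetical container for the decoded fields. */
struct sketch_tci {
	unsigned int pcp;
	unsigned int dei;
	unsigned int vid;
};

/* Split a host-byte-order TCI into its three fields. */
static inline struct sketch_tci sketch_decode_tci(uint16_t tci)
{
	struct sketch_tci f;

	f.pcp = (tci & SKETCH_VLAN_PRIO_MASK) >> SKETCH_VLAN_PRIO_SHIFT;
	f.dei = (tci & SKETCH_VLAN_CFI_MASK) ? 1 : 0;
	f.vid = tci & SKETCH_VLAN_VID_MASK;
	return f;
}
```

Before the patch, bit 0x1000 of skb->vlan_tci doubled as VLAN_TAG_PRESENT, so the `dei` value decoded this way could not reflect what was on the wire.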
Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Tue, Dec 13, 2016 at 6:03 PM, Michał Mirosław wrote:
> On Tue, Dec 13, 2016 at 05:21:18PM -0800, Stephen Hemminger wrote:
>> On Sat, 3 Dec 2016 10:22:28 +0100 (CET)
>> Michał Mirosław wrote:
>> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
>> > intact through linux networking stack.
>> >
>> > Signed-off-by: Michał Mirosław
>> > ---
>> >
>> > Dear NetDevs
>> >
>> > I guess this needs to be split to the prep..convert[]..finish sequence,
>> > but if you like it as is, then it's ready.
>> >
>> > The biggest question is if the modified interface and vlan_present
>> > is the way to go. This can be changed to use vlan_proto != 0 instead
>> > of an extra flag bit.
>> >
>> > As I can't test most of the driver changes, please look at them carefully.
>> > OVS and bridge eyes are especially welcome.
>> >
>> > Best Regards,
>> > Michał Mirosław
>> Is the motivation to support 802.1ad Drop Eligibility Indicator (DEI)?
>>
>> If so then you need to be more verbose in the commit log, and lots more
>> work is needed. You need to rename fields and validate every place a
>> driver is using DEI bit to make sure it really does the right thing
>> on that hardware. It is not just a mechanical change.
>
> My main motivation is to be able to see the bit intact in tcpdump and be
> able to pass it untouched through at least a veth pair. It would be great
> if all devices didn't do something stupid with the bit, but it's not
> something I am able to make happen.

imo "be able to pass untouched through veth" is not good enough
justification for such invasive patches.
I'm still not sure that all of these changes don't affect user space.
Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Tue, Dec 13, 2016 at 05:21:18PM -0800, Stephen Hemminger wrote:
> On Sat, 3 Dec 2016 10:22:28 +0100 (CET)
> Michał Mirosław wrote:
> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > intact through linux networking stack.
> >
> > Signed-off-by: Michał Mirosław
> > ---
> >
> > Dear NetDevs
> >
> > I guess this needs to be split to the prep..convert[]..finish sequence,
> > but if you like it as is, then it's ready.
> >
> > The biggest question is if the modified interface and vlan_present
> > is the way to go. This can be changed to use vlan_proto != 0 instead
> > of an extra flag bit.
> >
> > As I can't test most of the driver changes, please look at them carefully.
> > OVS and bridge eyes are especially welcome.
> >
> > Best Regards,
> > Michał Mirosław
> Is the motivation to support 802.1ad Drop Eligibility Indicator (DEI)?
>
> If so then you need to be more verbose in the commit log, and lots more
> work is needed. You need to rename fields and validate every place a
> driver is using DEI bit to make sure it really does the right thing
> on that hardware. It is not just a mechanical change.

My main motivation is to be able to see the bit intact in tcpdump and be
able to pass it untouched through at least a veth pair. It would be great
if all devices didn't do something stupid with the bit, but it's not
something I am able to make happen.

Best Regards,
Michał Mirosław
Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Sat, 3 Dec 2016 10:22:28 +0100 (CET)
Michał Mirosław wrote:
> This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> intact through linux networking stack.
>
> Signed-off-by: Michał Mirosław
> ---
>
> Dear NetDevs
>
> I guess this needs to be split to the prep..convert[]..finish sequence,
> but if you like it as is, then it's ready.
>
> The biggest question is if the modified interface and vlan_present
> is the way to go. This can be changed to use vlan_proto != 0 instead
> of an extra flag bit.
>
> As I can't test most of the driver changes, please look at them carefully.
> OVS and bridge eyes are especially welcome.
>
> Best Regards,
> Michał Mirosław

Is the motivation to support 802.1ad Drop Eligibility Indicator (DEI)?

If so then you need to be more verbose in the commit log, and lots more
work is needed. You need to rename fields and validate every place a
driver is using DEI bit to make sure it really does the right thing
on that hardware. It is not just a mechanical change.
Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Mon, Dec 05, 2016 at 11:52:47PM +0100, Michał Mirosław wrote:
> On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> > On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > > > intact through linux networking stack.
> > > > This appears to change the established Open vSwitch userspace API. You
> > > > can see that simply from the way that it changes the documentation for
> > > > the userspace API. If I'm right about that, then this change will break
> > > > all userspace programs that use the Open vSwitch kernel module,
> > > > including Open vSwitch itself.
> > >
> > > If I understood the code correctly, it does change expected meaning for
> > > the (unlikely?) case of header truncated just before the VLAN TCI - it will
> > > be impossible to differentiate this case from the VLAN TCI == 0.
> > >
> > > I guess this is a problem with OVS API, because it doesn't directly show
> > > the "missing" state of elements, but relies on an "invalid" value.
> >
> > That particular corner case should not be a huge problem in any case.
> >
> > The real problem is that this appears to break the common case use of
> > VLANs in Open vSwitch. After this patch, parse_vlan() in
> > net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> > accelerated version of them or the version in the skb data) into
> > sw_flow_key members. OK, that's fine on its own. However, I don't see
> > any corresponding change to the code in flow_netlink.c to compensate for
> > the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> > was always required to be set to 1 in flow matches inside Netlink
> > messages sent from userspace, and the kernel always set it to 1 in
> > corresponding messages sent to userspace.
> >
> > In other words, if I'm reading this change correctly:
> >
> >     * With a kernel before this change, userspace always had to set
> >       VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
> >       reject the flow match.
> >
> >     * With a kernel after this change, userspace must not set
> >       VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
> >       match but nothing will ever match because packets do not actually
> >       have the CFI bit set.
> >
> > Take a look at this code that the patch deletes from
> > validate_vlan_from_nlattrs(), for example, and see how it insisted that
> > VLAN_TAG_PRESENT was set:
> >
> >	if (!(tci & htons(VLAN_TAG_PRESENT))) {
> >		if (tci) {
> >			OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
> >				  (inner) ? "C-VLAN" : "VLAN");
> >			return -EINVAL;
> >		} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> >			/* Corner case for truncated VLAN header. */
> >			OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> >				  (inner) ? "C-VLAN" : "VLAN");
> >			return -EINVAL;
> >		}
> >	}
> >
> > Please let me know if I'm overlooking something.
>
> Hmm. So the easiest change without disrupting current userspace, would be
> to flip the CFI bit on the way to/from OVS userspace. Does this seem
> correct?

That sounds correct. (The bit should not be flipped in the mask.)

Thanks,

Ben.
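The compromise agreed on above could look roughly like the sketch below. This is not code from the patch. It assumes that the kernel-internal key holds the real on-wire CFI/DEI bit, that the legacy userspace encoding has bit 0x1000 set in the key of every present tag, and that both TCI values are in network byte order; the struct and helper names are hypothetical. It also leaves aside the all-zero "truncated header" key, which would need separate handling.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLAN_CFI_MASK 0x1000u

/* Hypothetical key/mask pair as carried in a flow match. */
struct sketch_vlan_match {
	uint16_t tci;      /* network byte order */
	uint16_t tci_mask; /* network byte order */
};

/*
 * Translate between the legacy userspace encoding and the internal one.
 * Only the key has its CFI bit flipped; the mask is left alone because a
 * mask bit means "compare this bit", whichever value is being matched.
 * XOR is its own inverse, so the same helper serves both directions.
 */
static inline struct sketch_vlan_match
sketch_vlan_xlate(struct sketch_vlan_match m)
{
	m.tci ^= htons(SKETCH_VLAN_CFI_MASK);
	return m;
}
```

With this translation, a legacy match on key 0x1064 with mask 0xffff becomes an internal match on 0x0064 with the same mask, which is what a tagged packet with DEI clear would carry.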
Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > > intact through linux networking stack.
> > > This appears to change the established Open vSwitch userspace API. You
> > > can see that simply from the way that it changes the documentation for
> > > the userspace API. If I'm right about that, then this change will break
> > > all userspace programs that use the Open vSwitch kernel module,
> > > including Open vSwitch itself.
> >
> > If I understood the code correctly, it does change expected meaning for
> > the (unlikely?) case of header truncated just before the VLAN TCI - it will
> > be impossible to differentiate this case from the VLAN TCI == 0.
> >
> > I guess this is a problem with OVS API, because it doesn't directly show
> > the "missing" state of elements, but relies on an "invalid" value.
>
> That particular corner case should not be a huge problem in any case.
>
> The real problem is that this appears to break the common case use of
> VLANs in Open vSwitch. After this patch, parse_vlan() in
> net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> accelerated version of them or the version in the skb data) into
> sw_flow_key members. OK, that's fine on its own. However, I don't see
> any corresponding change to the code in flow_netlink.c to compensate for
> the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> was always required to be set to 1 in flow matches inside Netlink
> messages sent from userspace, and the kernel always set it to 1 in
> corresponding messages sent to userspace.
>
> In other words, if I'm reading this change correctly:
>
>     * With a kernel before this change, userspace always had to set
>       VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
>       reject the flow match.
>
>     * With a kernel after this change, userspace must not set
>       VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
>       match but nothing will ever match because packets do not actually
>       have the CFI bit set.
>
> Take a look at this code that the patch deletes from
> validate_vlan_from_nlattrs(), for example, and see how it insisted that
> VLAN_TAG_PRESENT was set:
>
>	if (!(tci & htons(VLAN_TAG_PRESENT))) {
>		if (tci) {
>			OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
>				  (inner) ? "C-VLAN" : "VLAN");
>			return -EINVAL;
>		} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
>			/* Corner case for truncated VLAN header. */
>			OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
>				  (inner) ? "C-VLAN" : "VLAN");
>			return -EINVAL;
>		}
>	}
>
> Please let me know if I'm overlooking something.

Hmm. So the easiest change without disrupting current userspace, would be
to flip the CFI bit on the way to/from OVS userspace. Does this seem
correct?

Best Regards,
Michał Mirosław
Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > intact through linux networking stack.
> > This appears to change the established Open vSwitch userspace API. You
> > can see that simply from the way that it changes the documentation for
> > the userspace API. If I'm right about that, then this change will break
> > all userspace programs that use the Open vSwitch kernel module,
> > including Open vSwitch itself.
>
> If I understood the code correctly, it does change expected meaning for
> the (unlikely?) case of header truncated just before the VLAN TCI - it will
> be impossible to differentiate this case from the VLAN TCI == 0.
>
> I guess this is a problem with OVS API, because it doesn't directly show
> the "missing" state of elements, but relies on an "invalid" value.

That particular corner case should not be a huge problem in any case.

The real problem is that this appears to break the common case use of
VLANs in Open vSwitch. After this patch, parse_vlan() in
net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
accelerated version of them or the version in the skb data) into
sw_flow_key members. OK, that's fine on its own. However, I don't see
any corresponding change to the code in flow_netlink.c to compensate for
the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
was always required to be set to 1 in flow matches inside Netlink
messages sent from userspace, and the kernel always set it to 1 in
corresponding messages sent to userspace.

In other words, if I'm reading this change correctly:

    * With a kernel before this change, userspace always had to set
      VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
      reject the flow match.

    * With a kernel after this change, userspace must not set
      VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
      match but nothing will ever match because packets do not actually
      have the CFI bit set.

Take a look at this code that the patch deletes from
validate_vlan_from_nlattrs(), for example, and see how it insisted that
VLAN_TAG_PRESENT was set:

	if (!(tci & htons(VLAN_TAG_PRESENT))) {
		if (tci) {
			OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
				  (inner) ? "C-VLAN" : "VLAN");
			return -EINVAL;
		} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
			/* Corner case for truncated VLAN header. */
			OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
				  (inner) ? "C-VLAN" : "VLAN");
			return -EINVAL;
		}
	}

Please let me know if I'm overlooking something.

Thanks,

Ben.
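To make the pre-patch rule concrete, the deleted check can be restated as a standalone userspace predicate. This is a simplified sketch rather than the kernel function: the logging is dropped, the encap attribute is reduced to a length argument, and the `sketch_`/`SKETCH_` names are invented for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit as VLAN_CFI_MASK; the pre-patch kernel reused it as a marker. */
#define SKETCH_VLAN_TAG_PRESENT 0x1000u

/*
 * tci_be:    VLAN TCI from the netlink attribute, network byte order.
 * encap_len: payload length of the encap attribute (0 if empty).
 * Returns 0 if the legacy rule accepts the key, -EINVAL otherwise.
 */
static inline int sketch_validate_legacy_tci(uint16_t tci_be, size_t encap_len)
{
	if (!(tci_be & htons(SKETCH_VLAN_TAG_PRESENT))) {
		/* VLAN bits given without the "present" marker. */
		if (tci_be)
			return -EINVAL;
		/* All-zero TCI means a truncated header; encap must be empty. */
		if (encap_len)
			return -EINVAL;
	}
	return 0;
}
```

Under this rule a key of 0x0064 is rejected while 0x1064 is accepted, which is why existing userspace always sets the bit.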
Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > intact through linux networking stack.
> >
> > Signed-off-by: Michał Mirosław
> > ---
> >
> > Dear NetDevs
> >
> > I guess this needs to be split to the prep..convert[]..finish sequence,
> > but if you like it as is, then it's ready.
> >
> > The biggest question is if the modified interface and vlan_present
> > is the way to go. This can be changed to use vlan_proto != 0 instead
> > of an extra flag bit.
> >
> > As I can't test most of the driver changes, please look at them carefully.
> > OVS and bridge eyes are especially welcome.
>
> This appears to change the established Open vSwitch userspace API. You
> can see that simply from the way that it changes the documentation for
> the userspace API. If I'm right about that, then this change will break
> all userspace programs that use the Open vSwitch kernel module,
> including Open vSwitch itself.

If I understood the code correctly, it does change expected meaning for
the (unlikely?) case of header truncated just before the VLAN TCI - it will
be impossible to differentiate this case from the VLAN TCI == 0.

I guess this is a problem with OVS API, because it doesn't directly show
the "missing" state of elements, but relies on an "invalid" value.

I can probably change the code to mask this change in the OVS code
(by keeping the CFI/DEI bit unusable). This somehow feels wrong, though.

Best Regards,
Michał Mirosław
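The ambiguity described in this message can be shown with two toy encoders. This is an illustrative sketch only: the enum, the function names, and the use of host byte order are assumptions made for the example, not anything taken from the OVS code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLAN_CFI_MASK 0x1000u

/* Hypothetical parser outcome for a frame with a VLAN ethertype. */
enum sketch_vlan_state {
	SKETCH_VLAN_TRUNCATED, /* frame ended before the TCI */
	SKETCH_VLAN_TAGGED     /* TCI was read successfully */
};

/* Legacy encoding: a present tag always carries the CFI bit, so the
 * value 0 is free to mean "truncated". */
static inline uint16_t sketch_encode_legacy(enum sketch_vlan_state s,
					    uint16_t tci)
{
	if (s == SKETCH_VLAN_TRUNCATED)
		return 0;
	return (uint16_t)(tci | SKETCH_VLAN_CFI_MASK);
}

/* Transparent encoding: the TCI is reported as seen on the wire, so a
 * genuine TCI of 0 collides with the "truncated" value. */
static inline uint16_t sketch_encode_transparent(enum sketch_vlan_state s,
						 uint16_t tci)
{
	if (s == SKETCH_VLAN_TRUNCATED)
		return 0;
	return tci;
}
```

The legacy encoder keeps the two states apart at the cost of hiding the real CFI/DEI value; the transparent one preserves the bit but needs some other way to signal a missing TCI.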
Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> intact through linux networking stack.
>
> Signed-off-by: Michał Mirosław
> ---
>
> Dear NetDevs
>
> I guess this needs to be split to the prep..convert[]..finish sequence,
> but if you like it as is, then it's ready.
>
> The biggest question is if the modified interface and vlan_present
> is the way to go. This can be changed to use vlan_proto != 0 instead
> of an extra flag bit.
>
> As I can't test most of the driver changes, please look at them carefully.
> OVS and bridge eyes are especially welcome.

This appears to change the established Open vSwitch userspace API. You
can see that simply from the way that it changes the documentation for
the userspace API. If I'm right about that, then this change will break
all userspace programs that use the Open vSwitch kernel module,
including Open vSwitch itself.
[PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
intact through linux networking stack.

Signed-off-by: Michał Mirosław
---

Dear NetDevs

I guess this needs to be split to the prep..convert[]..finish sequence,
but if you like it as is, then it's ready.

The biggest question is if the modified interface and vlan_present
is the way to go. This can be changed to use vlan_proto != 0 instead
of an extra flag bit.

As I can't test most of the driver changes, please look at them carefully.
OVS and bridge eyes are especially welcome.

Best Regards,
Michał Mirosław

---
 Documentation/networking/openvswitch.txt         | 14 --
 arch/arm/net/bpf_jit_32.c                        | 14 +++---
 arch/mips/net/bpf_jit.c                          | 17 +++
 arch/powerpc/net/bpf_jit_comp.c                  | 14 +++---
 arch/sparc/net/bpf_jit_comp.c                    | 14 +++---
 drivers/infiniband/hw/cxgb4/cm.c                 |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_cm.c           |  8 ++--
 drivers/net/ethernet/broadcom/cnic.c             |  2 +-
 drivers/net/ethernet/emulex/benet/be_main.c      |  4 +-
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  7 +--
 drivers/net/ethernet/ibm/ibmvnic.c               |  5 +-
 drivers/net/ethernet/marvell/sky2.c              |  4 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c   |  9 ++--
 drivers/net/hyperv/hyperv_net.h                  |  2 +-
 drivers/net/hyperv/netvsc_drv.c                  | 14 +++---
 drivers/net/hyperv/rndis_filter.c                |  5 +-
 include/linux/if_vlan.h                          | 23 ++---
 include/linux/skbuff.h                           | 10 +++-
 lib/test_bpf.c                                   | 14 +++---
 net/8021q/vlan_core.c                            |  2 +-
 net/bridge/br_netfilter_hooks.c                  | 14 +++---
 net/bridge/br_private.h                          |  2 +-
 net/bridge/br_vlan.c                             |  6 +--
 net/core/dev.c                                   |  8 ++--
 net/core/filter.c                                | 17 +++
 net/core/skbuff.c                                |  2 +-
 net/ipv4/ip_tunnel_core.c                        |  2 +-
 net/netfilter/nfnetlink_queue.c                  |  5 +-
 net/openvswitch/actions.c                        | 13 ++---
 net/openvswitch/flow.c                           |  2 +-
 net/openvswitch/flow.h                           |  4 +-
 net/openvswitch/flow_netlink.c                   | 61
 net/sched/act_vlan.c                             |  2 +-
 33 files changed, 152 insertions(+), 170 deletions(-)

diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt
index b3b9ac6..e7ca27d 100644
--- a/Documentation/networking/openvswitch.txt
+++ b/Documentation/networking/openvswitch.txt
@@ -219,20 +219,6 @@ this:
 
     eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
 
-As another example, consider a packet with an Ethernet type of 0x8100,
-indicating that a VLAN TCI should follow, but which is truncated just
-after the Ethernet type. The flow key for this packet would include
-an all-zero-bits vlan and an empty encap attribute, like this:
-
-    eth(...), eth_type(0x8100), vlan(0), encap()
-
-Unlike a TCP packet with source and destination ports 0, an
-all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
-VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
-attribute expressly to allow this situation to be distinguished.
-Thus, the flow key in this second example unambiguously indicates a
-missing or malformed VLAN TCI.
-
 Other rules
 ---
 
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 93d0b6d..aff9dfa 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -915,17 +915,17 @@ static int build_body(struct jit_ctx *ctx)
 			emit(ARM_LDR_I(r_A, r_skb, off), ctx);
 			break;
 		case BPF_ANC | SKF_AD_VLAN_TAG:
-		case BPF_ANC | SKF_AD_VLAN_TAG_PRESENT:
 			ctx->seen |= SEEN_SKB;
 			BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2);
 			off = offsetof(struct sk_buff, vlan_tci);
 			emit(ARM_LDRH_I(r_A, r_skb, off), ctx);
-			if (code == (BPF_ANC | SKF_AD_VLAN_TAG))
-				OP_IMM3(ARM_AND, r_A, r_A, ~VLAN_TAG_PRESENT, ctx);
-			else {
-				OP_IMM3(ARM_LSR, r_A, r_A, 12, ctx);
-				OP_IMM3(ARM_AND, r_A, r_A, 0x1, ctx);
-			}
+			break;
+		case BPF_ANC | SKF_AD_VLAN_TAG_PRESENT:
+			off = PKT_VLAN_PRESENT_OFFSET();
+			emit(ARM_LDRB_I(r_A, r_skb, off), ctx);
+			if