RE: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field
> hello Jiri and Yotam, > > On Thu, 2017-09-28 at 19:34 +0200, Jiri Pirko wrote: > > From: Yotam Gigi > > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > > index 19e64bf..ada8214 100644 > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -772,6 +772,7 @@ struct sk_buff { > > __u8remcsum_offload:1; > > #ifdef CONFIG_NET_SWITCHDEV > > __u8offload_fwd_mark:1; > > + __u8offload_mr_fwd_mark:1; > > I had a look at the pahole output: > > $ make allyesconfig > $ make net/core/skbuff.o > $ pahole net/core/skbuff.o | grep -C7 tc_from_ingress > > > __u8 ipvs_property:1; /* 147: 7 1 */ > __u8 inner_protocol_type:1; /* 147: 6 1 */ > __u8 remcsum_offload:1;/* 147: 5 1 */ > __u8 offload_fwd_mark:1; /* 147: 4 1 */ > __u8 tc_skip_classify:1; /* 147: 3 1 */ > __u8 tc_at_ingress:1; /* 147: 2 1 */ > __u8 tc_redirected:1; /* 147: 1 1 */ > __u8 tc_from_ingress:1;/* 147: 0 1 */ > __u16 tc_index; /* 148 2 */ > > /* XXX 2 bytes hole, try to pack */ > > union { > __wsum csum; /* 4 */ > struct { > > apparently there are no more spare bits to use at that offset: therefore, > adding 'offload_mr_fwd_mark' before 'tc_skip_classify' will make > 'tc_from_ingress' slip at offset 148, and tc_index at offset 150. > I think you can use that 2-bytes hole below tc_index, and also move the > offload_fwd_mark bit there, as we use both when > CONFIG_NET_SWITCHDEV is > enabled. This way we will also gain one spare bit, without changing the > struct size or worsening the cacheline alignments. > > what do you think? Your pahole output still shows a 2B hole until the following union which is 4B-aligned. While it's true tc_index moves to offset 150, the union will not move [I.e., stay at offset 152] so the layout doesn't really change [greatly] nor the size of the struct. And we have the benefit of all the bits remaining consecutive.
Re: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field
hello Jiri and Yotam, On Thu, 2017-09-28 at 19:34 +0200, Jiri Pirko wrote: > From: Yotam Gigi > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 19e64bf..ada8214 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -772,6 +772,7 @@ struct sk_buff { > __u8remcsum_offload:1; > #ifdef CONFIG_NET_SWITCHDEV > __u8offload_fwd_mark:1; > + __u8offload_mr_fwd_mark:1; I had a look at the pahole output: $ make allyesconfig $ make net/core/skbuff.o $ pahole net/core/skbuff.o | grep -C7 tc_from_ingress __u8 ipvs_property:1; /* 147: 7 1 */ __u8 inner_protocol_type:1; /* 147: 6 1 */ __u8 remcsum_offload:1;/* 147: 5 1 */ __u8 offload_fwd_mark:1; /* 147: 4 1 */ __u8 tc_skip_classify:1; /* 147: 3 1 */ __u8 tc_at_ingress:1; /* 147: 2 1 */ __u8 tc_redirected:1; /* 147: 1 1 */ __u8 tc_from_ingress:1;/* 147: 0 1 */ __u16 tc_index; /* 148 2 */ /* XXX 2 bytes hole, try to pack */ union { __wsum csum; /* 4 */ struct { apparently there are no more spare bits to use at that offset: therefore, adding 'offload_mr_fwd_mark' before 'tc_skip_classify' will make 'tc_from_ingress' slip at offset 148, and tc_index at offset 150. I think you can use that 2-bytes hole below tc_index, and also move the offload_fwd_mark bit there, as we use both when CONFIG_NET_SWITCHDEV is enabled. This way we will also gain one spare bit, without changing the struct size or worsening the cacheline alignments. what do you think? regards, -- davide
Re: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field
Thu, Sep 28, 2017 at 07:49:03PM CEST, and...@lunn.ch wrote: >On Thu, Sep 28, 2017 at 07:34:09PM +0200, Jiri Pirko wrote: >> From: Yotam Gigi >> >> Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is >> used to allow partial offloading of MFC multicast routes. > >> The reason why the already existing "offload_fwd_mark" bit cannot be used >> is that a switchdev driver would want to make the distinction between a >> packet that has already gone through L2 forwarding but did not go through >> multicast forwarding, and a packet that has already gone through both L2 >> and multicast forwarding. > >Hi Jiri > >So we are talking about l2 vs l3. So why not call this >offload_l3_fwd_mark? > >Is there anything really specific to multicast here? Currently it is, not sure if it is going to be used for anything else later on. In case it will be, it could be renamed very easily. > > Thanks > Andrew
Re: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field
On Thu, Sep 28, 2017 at 07:34:09PM +0200, Jiri Pirko wrote: > From: Yotam Gigi > > Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is > used to allow partial offloading of MFC multicast routes. > The reason why the already existing "offload_fwd_mark" bit cannot be used > is that a switchdev driver would want to make the distinction between a > packet that has already gone through L2 forwarding but did not go through > multicast forwarding, and a packet that has already gone through both L2 > and multicast forwarding. Hi Jiri So we are talking about l2 vs l3. So why not call this offload_l3_fwd_mark? Is there anything really specific to multicast here? Thanks Andrew
[patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field
From: Yotam Gigi Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is used to allow partial offloading of MFC multicast routes. Switchdev drivers can offload MFC multicast routes to the hardware by registering to the FIB notification chain. When one of the route output interfaces is not offload-able, i.e. has different parent ID, the route cannot be fully offloaded by the hardware. Examples to non-offload-able devices are a management NIC, dummy device, pimreg device, etc. Similar problem exists in the bridge module, as one bridge can hold interfaces with different parent IDs. At the bridge, the problem is solved by the offload_fwd_mark skb field. Currently, when a route cannot go through full offload, the only solution for a switchdev driver is not to offload it at all and let the packet go through slow path. Using the offload_mr_fwd_mark field, a driver can indicate that a packet was already forwarded by hardware to all the devices with the same parent ID as the input device. Further patches in this patch-set are going to enhance ipmr to skip multicast forwarding to devices with the same parent ID if a packets is marked with that field. The reason why the already existing "offload_fwd_mark" bit cannot be used is that a switchdev driver would want to make the distinction between a packet that has already gone through L2 forwarding but did not go through multicast forwarding, and a packet that has already gone through both L2 and multicast forwarding. For example: when a packet is ingressing from a switchport enslaved to a bridge, which is configured with multicast forwarding, the following scenarios are possible: - The packet can be trapped to the CPU due to exception while multicast forwarding (for example, MTU error). In that case, it had already gone through L2 forwarding in the hardware, thus A switchdev driver would want to set the skb->offload_fwd_mark and not the skb->offload_mr_fwd_mark. - The packet can also be trapped due to a pimreg/dummy device used as one of the output interfaces. In that case, it can go through both L2 and (partial) multicast forwarding inside the hardware, thus a switchdev driver would want to set both the skb->offload_fwd_mark and skb->offload_mr_fwd_mark. Signed-off-by: Yotam Gigi Reviewed-by: Ido Schimmel Signed-off-by: Jiri Pirko --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 19e64bf..ada8214 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -772,6 +772,7 @@ struct sk_buff { __u8remcsum_offload:1; #ifdef CONFIG_NET_SWITCHDEV __u8offload_fwd_mark:1; + __u8offload_mr_fwd_mark:1; #endif #ifdef CONFIG_NET_CLS_ACT __u8tc_skip_classify:1; -- 2.9.5