RE: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field

2017-09-29 Thread Yuval Mintz
> hello Jiri and Yotam,
> 
> On Thu, 2017-09-28 at 19:34 +0200, Jiri Pirko wrote:
> > From: Yotam Gigi 
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 19e64bf..ada8214 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -772,6 +772,7 @@ struct sk_buff {
> > __u8remcsum_offload:1;
> >  #ifdef CONFIG_NET_SWITCHDEV
> > __u8offload_fwd_mark:1;
> > +   __u8offload_mr_fwd_mark:1;
> 
> I had a look at the pahole output:
> 
> $ make allyesconfig
> $ make net/core/skbuff.o
> $ pahole net/core/skbuff.o | grep -C7 tc_from_ingress
> 
> 
> __u8   ipvs_property:1;  /*   147: 7  1 */
> __u8   inner_protocol_type:1; /*   147: 6  1 */
> __u8   remcsum_offload:1;/*   147: 5  1 */
> __u8   offload_fwd_mark:1;   /*   147: 4  1 */
> __u8   tc_skip_classify:1;   /*   147: 3  1 */
> __u8   tc_at_ingress:1;  /*   147: 2  1 */
> __u8   tc_redirected:1;  /*   147: 1  1 */
> __u8   tc_from_ingress:1;/*   147: 0  1 */
> __u16  tc_index; /*   148 2 */
> 
> /* XXX 2 bytes hole, try to pack */
> 
> union {
> __wsum csum; /*   4 */
> struct {
> 
> apparently there are no more spare bits to use at that offset: therefore,
> adding 'offload_mr_fwd_mark' before 'tc_skip_classify' will make
> 'tc_from_ingress' slip at offset 148, and tc_index at offset 150.
> I think you can use that 2-bytes hole below tc_index, and also move the
> offload_fwd_mark bit there, as we use both when
> CONFIG_NET_SWITCHDEV is
> enabled. This way we will also gain one spare bit, without changing the
> struct size or worsening the cacheline alignments.
> 
> what do you think?

Your pahole output still shows a 2B hole until the following union
which is 4B-aligned.
While it's true tc_index moves to offset 150, the union will not move 
[I.e., stay at offset 152] so the layout doesn't really change [greatly]
nor the size of the struct. And we have the benefit of all the bits
remaining consecutive.



Re: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field

2017-09-29 Thread Davide Caratti
hello Jiri and Yotam,

On Thu, 2017-09-28 at 19:34 +0200, Jiri Pirko wrote:
> From: Yotam Gigi 
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 19e64bf..ada8214 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -772,6 +772,7 @@ struct sk_buff {
>   __u8remcsum_offload:1;
>  #ifdef CONFIG_NET_SWITCHDEV
>   __u8offload_fwd_mark:1;
> + __u8offload_mr_fwd_mark:1;

I had a look at the pahole output:

$ make allyesconfig
$ make net/core/skbuff.o
$ pahole net/core/skbuff.o | grep -C7 tc_from_ingress


__u8   ipvs_property:1;  /*   147: 7  1 */
__u8   inner_protocol_type:1; /*   147: 6  1 */
__u8   remcsum_offload:1;/*   147: 5  1 */
__u8   offload_fwd_mark:1;   /*   147: 4  1 */
__u8   tc_skip_classify:1;   /*   147: 3  1 */
__u8   tc_at_ingress:1;  /*   147: 2  1 */
__u8   tc_redirected:1;  /*   147: 1  1 */
__u8   tc_from_ingress:1;/*   147: 0  1 */
__u16  tc_index; /*   148 2 */

/* XXX 2 bytes hole, try to pack */

union {
__wsum csum; /*   4 */
struct {

apparently there are no more spare bits to use at that offset: therefore,
adding 'offload_mr_fwd_mark' before 'tc_skip_classify' will make
'tc_from_ingress' slip at offset 148, and tc_index at offset 150.
I think you can use that 2-bytes hole below tc_index, and also move the
offload_fwd_mark bit there, as we use both when CONFIG_NET_SWITCHDEV is
enabled. This way we will also gain one spare bit, without changing the
struct size or worsening the cacheline alignments.

what do you think?

regards,
-- 
davide


Re: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field

2017-09-29 Thread Jiri Pirko
Thu, Sep 28, 2017 at 07:49:03PM CEST, and...@lunn.ch wrote:
>On Thu, Sep 28, 2017 at 07:34:09PM +0200, Jiri Pirko wrote:
>> From: Yotam Gigi 
>> 
>> Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is
>> used to allow partial offloading of MFC multicast routes.
>
>> The reason why the already existing "offload_fwd_mark" bit cannot be used
>> is that a switchdev driver would want to make the distinction between a
>> packet that has already gone through L2 forwarding but did not go through
>> multicast forwarding, and a packet that has already gone through both L2
>> and multicast forwarding.
>
>Hi Jiri
>
>So we are talking about l2 vs l3. So why not call this
>offload_l3_fwd_mark?
>
>Is there anything really specific to multicast here?

Currently it is, not sure if it is going to be used for anything else
later on. In case it will be, it could be renamed very easily.


>
>   Thanks
>  Andrew


Re: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field

2017-09-28 Thread Andrew Lunn
On Thu, Sep 28, 2017 at 07:34:09PM +0200, Jiri Pirko wrote:
> From: Yotam Gigi 
> 
> Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is
> used to allow partial offloading of MFC multicast routes.

> The reason why the already existing "offload_fwd_mark" bit cannot be used
> is that a switchdev driver would want to make the distinction between a
> packet that has already gone through L2 forwarding but did not go through
> multicast forwarding, and a packet that has already gone through both L2
> and multicast forwarding.

Hi Jiri

So we are talking about l2 vs l3. So why not call this
offload_l3_fwd_mark?

Is there anything really specific to multicast here?

   Thanks
  Andrew


[patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field

2017-09-28 Thread Jiri Pirko
From: Yotam Gigi 

Similarly to the offload_fwd_mark field, the offload_mr_fwd_mark field is
used to allow partial offloading of MFC multicast routes.

Switchdev drivers can offload MFC multicast routes to the hardware by
registering to the FIB notification chain. When one of the route output
interfaces is not offload-able, i.e. has different parent ID, the route
cannot be fully offloaded by the hardware. Examples to non-offload-able
devices are a management NIC, dummy device, pimreg device, etc.

Similar problem exists in the bridge module, as one bridge can hold
interfaces with different parent IDs. At the bridge, the problem is solved
by the offload_fwd_mark skb field.

Currently, when a route cannot go through full offload, the only solution
for a switchdev driver is not to offload it at all and let the packet go
through slow path.

Using the offload_mr_fwd_mark field, a driver can indicate that a packet
was already forwarded by hardware to all the devices with the same parent
ID as the input device. Further patches in this patch-set are going to
enhance ipmr to skip multicast forwarding to devices with the same parent
ID if a packets is marked with that field.

The reason why the already existing "offload_fwd_mark" bit cannot be used
is that a switchdev driver would want to make the distinction between a
packet that has already gone through L2 forwarding but did not go through
multicast forwarding, and a packet that has already gone through both L2
and multicast forwarding.

For example: when a packet is ingressing from a switchport enslaved to a
bridge, which is configured with multicast forwarding, the following
scenarios are possible:
 - The packet can be trapped to the CPU due to exception while multicast
   forwarding (for example, MTU error). In that case, it had already gone
   through L2 forwarding in the hardware, thus A switchdev driver would
   want to set the skb->offload_fwd_mark and not the
   skb->offload_mr_fwd_mark.
 - The packet can also be trapped due to a pimreg/dummy device used as one
   of the output interfaces. In that case, it can go through both L2 and
   (partial) multicast forwarding inside the hardware, thus a switchdev
   driver would want to set both the skb->offload_fwd_mark and
   skb->offload_mr_fwd_mark.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 include/linux/skbuff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 19e64bf..ada8214 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -772,6 +772,7 @@ struct sk_buff {
__u8remcsum_offload:1;
 #ifdef CONFIG_NET_SWITCHDEV
__u8offload_fwd_mark:1;
+   __u8offload_mr_fwd_mark:1;
 #endif
 #ifdef CONFIG_NET_CLS_ACT
__u8tc_skip_classify:1;
-- 
2.9.5