Re: [PATCH 1/2 net-next] net/mlx5e: TC: Fix IS_ERR() vs NULL checks

2020-09-28 Thread Ariel Levkovich
On Sep 28, 2020, at 13:42, Dan Carpenter  wrote:
> 
> The mlx5_tc_ct_init() function doesn't return error pointers it returns
> NULL.  Also we need to set the error codes on this path.
> 
> Fixes: aedd133d17bc ("net/mlx5e: Support CT offload for tc nic flows")
> Signed-off-by: Dan Carpenter 
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 104b1c339de0..438fbcf478d1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -5224,8 +5224,10 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
> 
>tc->ct = mlx5_tc_ct_init(priv, tc->chains, &priv->fs.tc.mod_hdr,
> MLX5_FLOW_NAMESPACE_KERNEL);
> -if (IS_ERR(tc->ct))
> +if (!tc->ct) {
> +err = -ENOMEM;
>goto err_ct;
> +}

Hi Dan,
That was implement like that on purpose. If mlx5_tc_init_ct returns NULL it 
means the device doesn’t support CT offload which can happen with older devices 
or old FW on the devices.
However, in this case we want to continue with the rest of the Tc 
initialization because we can still support other TC offloads. No need to fail 
the entire TC init in this case. Only if mlx5_tc_init_ct return err_ptr that 
means the tc init failed not because of lack of support but due to a real error 
and only then we want to fail the rest of the tc init.

Your change will break compatibility for devices/FW versions that don’t have CT 
offload support.

Ariel

> 
>tc->netdevice_nb.notifier_call = mlx5e_tc_netdev_event;
>err = register_netdevice_notifier_dev_net(priv->netdev,
> @@ -5300,8 +5300,10 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
>   esw_chains(esw),
>   &esw->offloads.mod_hdr,
>   MLX5_FLOW_NAMESPACE_FDB);
> -if (IS_ERR(uplink_priv->ct_priv))
> +if (!uplink_priv->ct_priv) {
> +err = -ENOMEM;
>goto err_ct;
> +}
> 
>mapping = mapping_create(sizeof(struct tunnel_match_key),
> TUNNEL_INFO_BITS_MASK, true);
> -- 
> 2.28.0
> 


[PATCH net-next v4 2/2] net/sched: cls_flower: Add hash info to flow classification

2020-07-22 Thread Ariel Levkovich
Adding new cls flower keys for hash value and hash
mask and dissect the hash info from the skb into
the flow key towards flow classication.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/cls_flower.c   | 16 
 2 files changed, 19 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 2fd93389d091..ef145320ee99 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -579,6 +579,9 @@ enum {
 
TCA_FLOWER_KEY_MPLS_OPTS,
 
+   TCA_FLOWER_KEY_HASH,/* u32 */
+   TCA_FLOWER_KEY_HASH_MASK,   /* u32 */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b2da37286082..ff739e0d86fc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -64,6 +64,7 @@ struct fl_flow_key {
};
} tp_range;
struct flow_dissector_key_ct ct;
+   struct flow_dissector_key_hash hash;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -318,6 +319,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
skb_flow_dissect_ct(skb, &mask->dissector, &skb_key,
fl_ct_info_to_flower_map,
ARRAY_SIZE(fl_ct_info_to_flower_map));
+   skb_flow_dissect_hash(skb, &mask->dissector, &skb_key);
skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
 
f = fl_mask_lookup(mask, &skb_key);
@@ -694,6 +696,9 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_CT_LABELS_MASK] = { .type = NLA_BINARY,
.len = 128 / BITS_PER_BYTE },
[TCA_FLOWER_FLAGS]  = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_HASH]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_HASH_MASK]  = { .type = NLA_U32 },
+
 };
 
 static const struct nla_policy
@@ -1625,6 +1630,10 @@ static int fl_set_key(struct net *net, struct nlattr 
**tb,
 
fl_set_key_ip(tb, true, &key->enc_ip, &mask->enc_ip);
 
+   fl_set_key_val(tb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+  &mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+  sizeof(key->hash.hash));
+
if (tb[TCA_FLOWER_KEY_ENC_OPTS]) {
ret = fl_set_enc_opt(tb, key, mask, extack);
if (ret)
@@ -1739,6 +1748,8 @@ static void fl_init_dissector(struct flow_dissector 
*dissector,
 FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 FLOW_DISSECTOR_KEY_CT, ct);
+   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+FLOW_DISSECTOR_KEY_HASH, hash);
 
skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -2959,6 +2970,11 @@ static int fl_dump_key(struct sk_buff *skb, struct net 
*net,
if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
goto nla_put_failure;
 
+   if (fl_dump_key_val(skb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+&mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+sizeof(key->hash.hash)))
+   goto nla_put_failure;
+
return 0;
 
 nla_put_failure:
-- 
2.25.2



[PATCH net-next v4 1/2] net/flow_dissector: add packet hash dissection

2020-07-22 Thread Ariel Levkovich
Retreive a hash value from the SKB and store it
in the dissector key for future matching.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/linux/skbuff.h   |  4 
 include/net/flow_dissector.h |  9 +
 net/core/flow_dissector.c| 17 +
 3 files changed, 30 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0c0377fc00c2..beb7fe2c7809 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1342,6 +1342,10 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 struct flow_dissector *flow_dissector,
 void *target_container);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container);
+
 static inline __u32 skb_get_hash(struct sk_buff *skb)
 {
if (!skb->l4_hash && !skb->sw_hash)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index a7eba43fe4e4..5cc0540ce3f7 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -243,6 +243,14 @@ struct flow_dissector_key_ct {
u32 ct_labels[4];
 };
 
+/**
+ * struct flow_dissector_key_hash:
+ * @hash: hash value
+ */
+struct flow_dissector_key_hash {
+   u32 hash;
+};
+
 enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -271,6 +279,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
FLOW_DISSECTOR_KEY_META, /* struct flow_dissector_key_meta */
FLOW_DISSECTOR_KEY_CT, /* struct flow_dissector_key_ct */
+   FLOW_DISSECTOR_KEY_HASH, /* struct flow_dissector_key_hash */
 
FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d02df0b6d0d9..c114f0e3ef4f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -392,6 +392,23 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 }
 EXPORT_SYMBOL(skb_flow_dissect_tunnel_info);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container)
+{
+   struct flow_dissector_key_hash *key;
+
+   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_HASH))
+   return;
+
+   key = skb_flow_dissector_target(flow_dissector,
+   FLOW_DISSECTOR_KEY_HASH,
+   target_container);
+
+   key->hash = skb_get_hash_raw(skb);
+}
+EXPORT_SYMBOL(skb_flow_dissect_hash);
+
 static enum flow_dissect_ret
 __skb_flow_dissect_mpls(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
-- 
2.25.2



[PATCH net-next v4 0/2] TC datapath hash api

2020-07-22 Thread Ariel Levkovich
Hash based packet classification allows user to set up rules that
provide load balancing of traffic across multiple vports and
for ECMP path selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on a their hash value and distribute
the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The series introduces an extention to the cls flower classifier
and allows user to add rules that match on the hash value that
is stored in skb->hash while assuming the value was set prior to
the classification.

Setting the skb->hash can be done in various ways and is not defined
in this series - for example:
1. By the device driver upon processing an rx packet.
2. Using tc action bpf with a program which computes and sets the
skb->hash value.

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

v3 -> v4:
 *Drop hash setting code leaving only the classidication parts.
  Setting the hash will be possible via existing tc action bpf.

v2 -> v3:
 *Split hash algorithm option into 2 different actions.
  Asym_l4 available via act_skbedit and bpf via new act_hash.

Ariel Levkovich (2):
  net/flow_dissector: add packet hash dissection
  net/sched: cls_flower: Add hash info to flow classification

 include/linux/skbuff.h   |  4 
 include/net/flow_dissector.h |  9 +
 include/uapi/linux/pkt_cls.h |  3 +++
 net/core/flow_dissector.c| 17 +
 net/sched/cls_flower.c   | 16 
 5 files changed, 49 insertions(+)

-- 
2.25.2



Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash

2020-07-16 Thread Ariel Levkovich

On 7/15/20 2:49 PM, Daniel Borkmann wrote:

On 7/15/20 3:30 PM, Ariel Levkovich wrote:

On 7/15/20 2:12 AM, Cong Wang wrote:
On Mon, Jul 13, 2020 at 8:17 PM Ariel Levkovich 
 wrote:

On 7/13/20 6:04 PM, Cong Wang wrote:
On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich 
 wrote:

Allow user to set a packet's hash value using a bpf program.

The user provided BPF program is required to compute and return
a hash value for the packet which is then stored in skb->hash.

Can be done by act_bpf, right?

Right. We already agreed on that.

Nevertheless, as I mentioned, act_bpf is not offloadable.

Device driver has no clue what the program does.

What about offloading act_skbedit? You care about offloading
the skb->hash computation, not about bpf.

Thanks.


That's true but act_skedit provides (according to the current design) 
hash


computation using a kernel implemented algorithm.

HW not necessarily can offload this kernel based jhash function and 
therefore


we introduce the bpf option. With bpf the user can provide an 
implemenation


of a hash function that the HW can actually offload and that way user

maintains consistency between SW hash calculation and HW.

For example, in cases where offload is added dynamically as traffic 
flows, like


in the OVS case, first packets will go to SW and hash will be 
calculated on them


using bpf that emulates the HW hash so that this packet will get the 
same hash


result that it will later get in HW when the flow is offloaded.


If there's a strong objection to adding a new action,

IMO, we can include the bpf option in act_skbedit - action skbedit 
hash bpf 


What do u think?


Please don't. From a BPF pov this is all very misleading since it 
might wrongly suggest
to the user that existing means aka {cls,act}_bpf in tc are not 
capable of already doing
this. They are capable for several years already though. Also, it is 
very confusing that
act_hash or 'skbedit hash bpf' can do everything that {cls,act}_bpf 
can do already, so
much beyond setting a hash value (e.g. you could set tunnel keys etc 
from there). Given
act_hash is only about offloading but nothing else, did you consider 
for the BPF alternative
to just use plain old classic BPF given you only need to parse the pkt 
and calc the hash

val but nothing more?


You can do almost everything with act_bpf and yet there are explicit 
actions to set a tunnel


key and add/remove MPLS header (and more...).

What do u mean by classic BPF? How will that help with the offload?

It will still go via act_bpf without any indication on what type of 
program is this, won't it?


Thanks,

Ariel




Re: [PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit

2020-07-15 Thread Ariel Levkovich

On 7/13/20 1:11 PM, Davide Caratti wrote:

On Sun, 2020-07-12 at 00:28 +0300, Ariel Levkovich wrote:

Extend act_skbedit api to allow writing into skb->hash
field.


[...]


Usage example:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action skbedit hash asym_l4 basis 5 \
action goto chain 2

hello Ariel, thanks for the patch!


Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
  include/net/tc_act/tc_skbedit.h|  2 ++
  include/uapi/linux/tc_act/tc_skbedit.h |  7 +
  net/sched/act_skbedit.c| 38 ++
  3 files changed, 47 insertions(+)

this diffstat is ok for l4 hash calculation :)


diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 00bfee70609e..44a8a4625556 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -18,6 +18,8 @@ struct tcf_skbedit_params {
u32 mask;
u16 queue_mapping;
u16 ptype;
+   u32 hash_alg;
+   u32 hash_basis;
struct rcu_head rcu;
  };
  
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h

index 800e93377218..5877811b093b 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -29,6 +29,11 @@
  #define SKBEDIT_F_PTYPE   0x8
  #define SKBEDIT_F_MASK0x10
  #define SKBEDIT_F_INHERITDSFIELD  0x20
+#define SKBEDIT_F_HASH 0x40
+
+enum {
+   TCA_SKBEDIT_HASH_ALG_ASYM_L4,
+};

nit:

it's a common practice, when specifying enums in the uAPI, to set the
first value  "UNSPEC", and last one as "MAX":

enum {
TCA_SKBEDIT_HASH_ALG_UNSPEC,
TCA_SKBEDIT_HASH_ALG_ASYM_L4,
__TCA_SKBEDIT_HASH_ALG_MAX
};

see below the rationale:



Agree. Missed that. Actual enums should start at 1.




  struct tc_skbedit {
tc_gen;
@@ -45,6 +50,8 @@ enum {
TCA_SKBEDIT_PTYPE,
TCA_SKBEDIT_MASK,
TCA_SKBEDIT_FLAGS,
+   TCA_SKBEDIT_HASH,
+   TCA_SKBEDIT_HASH_BASIS,
__TCA_SKBEDIT_MAX
  };
  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index b125b2be4467..2cc66c798afb 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -66,6 +66,20 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct 
tc_action *a,
}
if (params->flags & SKBEDIT_F_PTYPE)
skb->pkt_type = params->ptype;
+
+   if (params->flags & SKBEDIT_F_HASH) {
+   u32 hash;
+
+   hash = skb_get_hash(skb);
+   /* If a hash basis was provided, add it into
+* hash calculation here and re-set skb->hash
+* to the new result with sw_hash indication
+* and keeping the l4 hash indication.
+*/
+   hash = jhash_1word(hash, params->hash_basis);
+   __skb_set_sw_hash(skb, hash, skb->l4_hash);
+   }

in this way you don't need to define a value in 'flags'
(SKBEDIT_F_HASH), you just need to check if params->hash_alg is not
zero:
if (params->hash_alg) {

}


return action;
  
  err:

@@ -91,6 +105,8 @@ static const struct nla_policy 
skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
[TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) },
[TCA_SKBEDIT_MASK]  = { .len = sizeof(u32) },
[TCA_SKBEDIT_FLAGS] = { .len = sizeof(u64) },
+   [TCA_SKBEDIT_HASH]  = { .len = sizeof(u32) },
+   [TCA_SKBEDIT_HASH_BASIS]= { .len = sizeof(u32) },
  };
  
  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,

@@ -107,6 +123,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr 
*nla,
struct tcf_skbedit *d;
u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
u16 *queue_mapping = NULL, *ptype = NULL;
+   u32 hash_alg, hash_basis = 0;
bool exists = false;
int ret = 0, err;
u32 index;
@@ -156,6 +173,17 @@ static int tcf_skbedit_init(struct net *net, struct nlattr 
*nla,
flags |= SKBEDIT_F_INHERITDSFIELD;
}
  
+	if (tb[TCA_SKBEDIT_HASH] != NULL) {

+   hash_alg = nla_get_u32(tb[TCA_SKBEDIT_HASH]);
+   if (hash_alg > TCA_SKBEDIT_HASH_ALG_ASYM_L4)
+   return -EINVAL;

moreover, even without doing the strict validation, when somebody in the
future will extend the uAPI, he will not need to change the line above.
The following test will validate all good values of hash_alg:

if (!hash_alg || hash_alg >= __TCA_SKBEDIT_HASH_ALG_MAX) {
NL_SET_ERR_MSG_MOD(extack, "hash_alg is out of range");
return -EINVAL;
}

WDYT?

thanks!


I actually thought about it during implementation. Dropped it eventually.


Thanks for the comments.






Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash

2020-07-15 Thread Ariel Levkovich

On 7/15/20 2:12 AM, Cong Wang wrote:

On Mon, Jul 13, 2020 at 8:17 PM Ariel Levkovich  wrote:

On 7/13/20 6:04 PM, Cong Wang wrote:

On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich  wrote:

Allow user to set a packet's hash value using a bpf program.

The user provided BPF program is required to compute and return
a hash value for the packet which is then stored in skb->hash.

Can be done by act_bpf, right?

Right. We already agreed on that.

Nevertheless, as I mentioned, act_bpf is not offloadable.

Device driver has no clue what the program does.

What about offloading act_skbedit? You care about offloading
the skb->hash computation, not about bpf.

Thanks.



That's true but act_skedit provides (according to the current design) hash

computation using a kernel implemented algorithm.

HW not necessarily can offload this kernel based jhash function and 
therefore


we introduce the bpf option. With bpf the user can provide an implemenation

of a hash function that the HW can actually offload and that way user

maintains consistency between SW hash calculation and HW.

For example, in cases where offload is added dynamically as traffic 
flows, like


in the OVS case, first packets will go to SW and hash will be calculated 
on them


using bpf that emulates the HW hash so that this packet will get the 
same hash


result that it will later get in HW when the flow is offloaded.


If there's a strong objection to adding a new action,

IMO, we can include the bpf option in act_skbedit - action skbedit hash 
bpf 


What do u think?

Thanks,

Ariel




Re: [PATCH net-next v3 2/4] net/sched: Introduce action hash

2020-07-13 Thread Ariel Levkovich

On 7/13/20 6:04 PM, Cong Wang wrote:

On Sat, Jul 11, 2020 at 2:28 PM Ariel Levkovich  wrote:

Allow user to set a packet's hash value using a bpf program.

The user provided BPF program is required to compute and return
a hash value for the packet which is then stored in skb->hash.

Can be done by act_bpf, right?


Right. We already agreed on that.

Nevertheless, as I mentioned, act_bpf is not offloadable.

Device driver has no clue what the program does.




Using this action to set the skb->hash is an alternative to setting
it with act_skbedit and can be useful for future HW offload support
when the HW hash function is different then the kernel's hash
function.
By using a bpg program that emulates the HW hash function user
can ensure hash consistency between the SW and the HW.

It sounds weird that the sole reason to add a new action is
because of HW offloading. What prevents us extending the
existing actions to support HW offloading?

Thanks.
Something like adding a parameter to act_bpf that provides information 
on the program


and what it does?


Ariel




[PATCH net-next v3 2/4] net/sched: Introduce action hash

2020-07-11 Thread Ariel Levkovich
Allow user to set a packet's hash value using a bpf program.

The user provided BPF program is required to compute and return
a hash value for the packet which is then stored in skb->hash.

Using this action to set the skb->hash is an alternative to setting
it with act_skbedit and can be useful for future HW offload support
when the HW hash function is different then the kernel's hash
function.
By using a bpg program that emulates the HW hash function user
can ensure hash consistency between the SW and the HW.

Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash object-file  section \
action goto chain 2

Matching on the result:
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

v1 -> v2:
 *Handle egress case for bpf hash properly.
 *Check for valid bpf fd before referencing it.
 *Fixed missing unlocking of tcf_lock.

v2 -> v3:
 *Move hash algorithm asym_l4 to act_skbedit.
  This action only supports bpf option now.
 
Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/net/act_api.h   |   2 +
 include/net/tc_act/tc_hash.h|  20 ++
 include/uapi/linux/pkt_cls.h|   1 +
 include/uapi/linux/tc_act/tc_hash.h |  25 ++
 net/sched/Kconfig   |  11 +
 net/sched/Makefile  |   1 +
 net/sched/act_hash.c| 348 
 net/sched/cls_api.c |   1 +
 8 files changed, 409 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8c3934880670..b7e5d060bd2f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+#define ACT_BPF_NAME_LEN   256
+
 struct tcf_idrinfo {
struct mutexlock;
struct idr  action_idr;
diff --git a/include/net/tc_act/tc_hash.h b/include/net/tc_act/tc_hash.h
new file mode 100644
index ..4a9e36664813
--- /dev/null
+++ b/include/net/tc_act/tc_hash.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __NET_TC_HASH_H
+#define __NET_TC_HASH_H
+
+#include 
+#include 
+
+struct tcf_hash_params {
+   struct bpf_prog *prog;
+   const char *bpf_name;
+   struct rcu_head rcu;
+};
+
+struct tcf_hash {
+   struct tc_action common;
+   struct tcf_hash_params __rcu *hash_p;
+};
+#define to_hash(a) ((struct tcf_hash *)a)
+
+#endif /* __NET_TC_HASH_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 7576209d96f9..2fd93389d091 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -135,6 +135,7 @@ enum tca_id {
TCA_ID_MPLS,
TCA_ID_CT,
TCA_ID_GATE,
+   TCA_ID_HASH,
/* other actions go here */
__TCA_ID_MAX = 255
 };
diff --git a/include/uapi/linux/tc_act/tc_hash.h 
b/include/uapi/linux/tc_act/tc_hash.h
new file mode 100644
index ..08937f097ed7
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_hash.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef __LINUX_TC_HASH_H
+#define __LINUX_TC_HASH_H
+
+#include 
+
+enum {
+   TCA_HASH_UNSPEC,
+   TCA_HASH_TM,
+   TCA_HASH_PARMS,
+   TCA_HASH_BPF_FD,
+   TCA_HASH_BPF_NAME,
+   TCA_HASH_BPF_ID,
+   TCA_HASH_BPF_TAG,
+   TCA_HASH_PAD,
+   __TCA_HASH_MAX,
+};
+#define TCA_HASH_MAX (__TCA_HASH_MAX - 1)
+
+struct tc_hash {
+   tc_gen;
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 84badf00647e..e9725bb40f4f 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -993,6 +993,17 @@ config NET_ACT_GATE
  To compile this code as a module, choose M here: the
  module will be called act_gate.
 
+config NET_ACT_HASH
+   tristate "Hash calculation action"
+   depends on NET_CLS_ACT
+   help
+ Say Y here to perform hash calculation on packet headers.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_hash.
+
 config NET_IFE_SKBMARK
tristate "Support to encoding decoding skb mark on IFE action"
depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 66bbf9a98f9e..2d1415fb57db 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_NET_ACT_CONNMARK)+= act_connmark.o
 obj-$(CONFIG_NET_ACT_CTINFO)   += act_ctinfo.o
 obj-$(CONFIG_NET_ACT_SKBMOD)   += act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)  += act_ife.o
+obj-$(CONFIG_NET_ACT_HASH)  +

[PATCH net-next v3 3/4] net/flow_dissector: add packet hash dissection

2020-07-11 Thread Ariel Levkovich
Retreive a hash value from the SKB and store it
in the dissector key for future matching.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/linux/skbuff.h   |  4 
 include/net/flow_dissector.h |  9 +
 net/core/flow_dissector.c| 17 +
 3 files changed, 30 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0c0377fc00c2..beb7fe2c7809 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1342,6 +1342,10 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 struct flow_dissector *flow_dissector,
 void *target_container);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container);
+
 static inline __u32 skb_get_hash(struct sk_buff *skb)
 {
if (!skb->l4_hash && !skb->sw_hash)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index a7eba43fe4e4..5cc0540ce3f7 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -243,6 +243,14 @@ struct flow_dissector_key_ct {
u32 ct_labels[4];
 };
 
+/**
+ * struct flow_dissector_key_hash:
+ * @hash: hash value
+ */
+struct flow_dissector_key_hash {
+   u32 hash;
+};
+
 enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -271,6 +279,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
FLOW_DISSECTOR_KEY_META, /* struct flow_dissector_key_meta */
FLOW_DISSECTOR_KEY_CT, /* struct flow_dissector_key_ct */
+   FLOW_DISSECTOR_KEY_HASH, /* struct flow_dissector_key_hash */
 
FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d02df0b6d0d9..c114f0e3ef4f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -392,6 +392,23 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 }
 EXPORT_SYMBOL(skb_flow_dissect_tunnel_info);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container)
+{
+   struct flow_dissector_key_hash *key;
+
+   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_HASH))
+   return;
+
+   key = skb_flow_dissector_target(flow_dissector,
+   FLOW_DISSECTOR_KEY_HASH,
+   target_container);
+
+   key->hash = skb_get_hash_raw(skb);
+}
+EXPORT_SYMBOL(skb_flow_dissect_hash);
+
 static enum flow_dissect_ret
 __skb_flow_dissect_mpls(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
-- 
2.25.2



[PATCH net-next v3 1/4] net/sched: Add skb->hash field editing via act_skbedit

2020-07-11 Thread Ariel Levkovich
Extend act_skbedit api to allow writing into skb->hash
field.

To modify skb->hash user selects the hash algorithm
to use for the hash computation and can provide a
hash basis value to be used in the calculation.
The hash value will be calculated on the packet in the
datapath and will be set into skb->hash field.

Current implementation supports only the asymmetric l4 hash
algorithm that first checks whether the skb->hash was already
set with l4 hash value (possibly by the device driver) and uses
the existing value. If hash value wasn't set, it computes the
hash value in place using the kernel implementation of the
Jenkins hash algorithm.

Usage example:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action skbedit hash asym_l4 basis 5 \
action goto chain 2

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/net/tc_act/tc_skbedit.h|  2 ++
 include/uapi/linux/tc_act/tc_skbedit.h |  7 +
 net/sched/act_skbedit.c| 38 ++
 3 files changed, 47 insertions(+)

diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 00bfee70609e..44a8a4625556 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -18,6 +18,8 @@ struct tcf_skbedit_params {
u32 mask;
u16 queue_mapping;
u16 ptype;
+   u32 hash_alg;
+   u32 hash_basis;
struct rcu_head rcu;
 };
 
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h 
b/include/uapi/linux/tc_act/tc_skbedit.h
index 800e93377218..5877811b093b 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -29,6 +29,11 @@
 #define SKBEDIT_F_PTYPE0x8
 #define SKBEDIT_F_MASK 0x10
 #define SKBEDIT_F_INHERITDSFIELD   0x20
+#define SKBEDIT_F_HASH 0x40
+
+enum {
+   TCA_SKBEDIT_HASH_ALG_ASYM_L4,
+};
 
 struct tc_skbedit {
tc_gen;
@@ -45,6 +50,8 @@ enum {
TCA_SKBEDIT_PTYPE,
TCA_SKBEDIT_MASK,
TCA_SKBEDIT_FLAGS,
+   TCA_SKBEDIT_HASH,
+   TCA_SKBEDIT_HASH_BASIS,
__TCA_SKBEDIT_MAX
 };
 #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index b125b2be4467..2cc66c798afb 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -66,6 +66,20 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct 
tc_action *a,
}
if (params->flags & SKBEDIT_F_PTYPE)
skb->pkt_type = params->ptype;
+
+   if (params->flags & SKBEDIT_F_HASH) {
+   u32 hash;
+
+   hash = skb_get_hash(skb);
+   /* If a hash basis was provided, add it into
+* hash calculation here and re-set skb->hash
+* to the new result with sw_hash indication
+* and keeping the l4 hash indication.
+*/
+   hash = jhash_1word(hash, params->hash_basis);
+   __skb_set_sw_hash(skb, hash, skb->l4_hash);
+   }
+
return action;
 
 err:
@@ -91,6 +105,8 @@ static const struct nla_policy 
skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
[TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) },
[TCA_SKBEDIT_MASK]  = { .len = sizeof(u32) },
[TCA_SKBEDIT_FLAGS] = { .len = sizeof(u64) },
+   [TCA_SKBEDIT_HASH]  = { .len = sizeof(u32) },
+   [TCA_SKBEDIT_HASH_BASIS]= { .len = sizeof(u32) },
 };
 
 static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -107,6 +123,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr 
*nla,
struct tcf_skbedit *d;
u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
u16 *queue_mapping = NULL, *ptype = NULL;
+   u32 hash_alg, hash_basis = 0;
bool exists = false;
int ret = 0, err;
u32 index;
@@ -156,6 +173,17 @@ static int tcf_skbedit_init(struct net *net, struct nlattr 
*nla,
flags |= SKBEDIT_F_INHERITDSFIELD;
}
 
+   if (tb[TCA_SKBEDIT_HASH] != NULL) {
+   hash_alg = nla_get_u32(tb[TCA_SKBEDIT_HASH]);
+   if (hash_alg > TCA_SKBEDIT_HASH_ALG_ASYM_L4)
+   return -EINVAL;
+
+   flags |= SKBEDIT_F_HASH;
+
+   if (tb[TCA_SKBEDIT_HASH_BASIS])
+   hash_basis = nla_get_u32(tb[TCA_SKBEDIT_HASH_BASIS]);
+   }
+
parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
index = parm->index;
err = tcf_idr_check_alloc(tn, &index, a, bind);
@@ -213,6 +241,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr 
*nla,
params_new->mask = 0x;
if (flags & SKBEDIT_F_MASK)
params_new->mask = *mask;
+   if (flags & SKBEDIT_F_HASH) {
+   params_new->

[PATCH net-next v3 0/4] ] TC datapath hash api

2020-07-11 Thread Ariel Levkovich
Supporting datapath hash allows user to set up rules that provide
load balancing of traffic across multiple vports and for ECMP path
selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on hashing on the packet headers
and distribute the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The datapath hash functionality is achieved in two steps -
performing the hash action and then matching on the result, as
part of the packet's classification.

To compute the hash value, the api offers 2 methods:
1. Linux implementation of an asymmetric hash algorithm
which is performed on the L4 headers of the packet.
This method is usable via an extention to act_skbedit and
allows user to provide a basis value to be included in
the computation.

2. User provided bpf program that implements
a hash computation algorithm. This option is usable
via a new type of tc action - action_hash.

Through both methods, the hash value is calculated
and stored in the skb->hash field so it can be matched
later as a key in the cls flower classifier.
where the hash function can be standard asymetric hashing that Linux
offers or alternatively user can provide a bpf program that
performs hash calculation on a packet.

Usage is as follows:

For hash calculation:
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash object-file  section \
action goto chain 2

Or:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto udp \
action skbedit hash asym_l4 basis  \
action goto chain 2

Matching on hash result:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2


v2 -> v3:
 *Split hash algorithm option into 2 different actions.
  Asym_l4 available via act_skbedit and bpf via new act_hash.

Ariel Levkovich (4):
  net/sched: Add skb->hash field editing via act_skbedit
  net/sched: Introduce action hash
  net/flow_dissector: add packet hash dissection
  net/sched: cls_flower: Add hash info to flow classification

 include/linux/skbuff.h |   4 +
 include/net/act_api.h  |   2 +
 include/net/flow_dissector.h   |   9 +
 include/net/tc_act/tc_hash.h   |  20 ++
 include/net/tc_act/tc_skbedit.h|   2 +
 include/uapi/linux/pkt_cls.h   |   4 +
 include/uapi/linux/tc_act/tc_hash.h|  25 ++
 include/uapi/linux/tc_act/tc_skbedit.h |   7 +
 net/core/flow_dissector.c  |  17 ++
 net/sched/Kconfig  |  11 +
 net/sched/Makefile |   1 +
 net/sched/act_hash.c   | 348 +
 net/sched/act_skbedit.c|  38 +++
 net/sched/cls_api.c|   1 +
 net/sched/cls_flower.c |  16 ++
 15 files changed, 505 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

-- 
2.25.2



[PATCH net-next v3 4/4] net/sched: cls_flower: Add hash info to flow classification

2020-07-11 Thread Ariel Levkovich
Adding new cls flower keys for hash value and hash
mask and dissect the hash info from the skb into
the flow key towards flow classication.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/cls_flower.c   | 16 
 2 files changed, 19 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 2fd93389d091..ef145320ee99 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -579,6 +579,9 @@ enum {
 
TCA_FLOWER_KEY_MPLS_OPTS,
 
+   TCA_FLOWER_KEY_HASH,/* u32 */
+   TCA_FLOWER_KEY_HASH_MASK,   /* u32 */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b2da37286082..ff739e0d86fc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -64,6 +64,7 @@ struct fl_flow_key {
};
} tp_range;
struct flow_dissector_key_ct ct;
+   struct flow_dissector_key_hash hash;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -318,6 +319,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
skb_flow_dissect_ct(skb, &mask->dissector, &skb_key,
fl_ct_info_to_flower_map,
ARRAY_SIZE(fl_ct_info_to_flower_map));
+   skb_flow_dissect_hash(skb, &mask->dissector, &skb_key);
skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
 
f = fl_mask_lookup(mask, &skb_key);
@@ -694,6 +696,9 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_CT_LABELS_MASK] = { .type = NLA_BINARY,
.len = 128 / BITS_PER_BYTE },
[TCA_FLOWER_FLAGS]  = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_HASH]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_HASH_MASK]  = { .type = NLA_U32 },
+
 };
 
 static const struct nla_policy
@@ -1625,6 +1630,10 @@ static int fl_set_key(struct net *net, struct nlattr 
**tb,
 
fl_set_key_ip(tb, true, &key->enc_ip, &mask->enc_ip);
 
+   fl_set_key_val(tb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+  &mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+  sizeof(key->hash.hash));
+
if (tb[TCA_FLOWER_KEY_ENC_OPTS]) {
ret = fl_set_enc_opt(tb, key, mask, extack);
if (ret)
@@ -1739,6 +1748,8 @@ static void fl_init_dissector(struct flow_dissector 
*dissector,
 FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 FLOW_DISSECTOR_KEY_CT, ct);
+   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+FLOW_DISSECTOR_KEY_HASH, hash);
 
skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -2959,6 +2970,11 @@ static int fl_dump_key(struct sk_buff *skb, struct net 
*net,
if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
goto nla_put_failure;
 
+   if (fl_dump_key_val(skb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+&mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+sizeof(key->hash.hash)))
+   goto nla_put_failure;
+
return 0;
 
 nla_put_failure:
-- 
2.25.2



Re: [PATCH net-next v2 0/3] ] TC datapath hash api

2020-07-09 Thread Ariel Levkovich

Not sure it was sent so trying again...


On 7/5/20 8:28 PM, Cong Wang wrote:

On Sun, Jul 5, 2020 at 2:50 PM Jamal Hadi Salim  wrote:

BTW, nothing in skbedit is against computing what the new metadata
should be.

Yup.


IMO: A good arguement to not make it part of skbedit is if it adds
unnecessary complexity to skbedit or policy definitions.


TCA_HASH_ALG_L4 literally has 4 lines of code, has no way
to add any unnecessary complexity to skbedit. (The BPF algorithm
does not belong to skbedit, obviously.)

Thanks.


Moving TCA_HASH_ALG_L4 to skbedit is very simple, I agree.

However, supporting the bpf option via act_bpf is still problematic as 
it is not offloadable.


We need some kind of indication that this is a hash computation program 
and therefore


it requires specific identifier in the form of a new action.

Ariel


Re: [PATCH net-next v2 0/3] ] TC datapath hash api

2020-07-05 Thread Ariel Levkovich



On 7/3/20 7:22 AM, Jamal Hadi Salim wrote:

Hi,

Several comments:
1) I agree with previous comments that you should
look at incorporating this into skbedit.
Unless incorporating into skbedit introduces huge
complexity, IMO it belongs there.


Hi Jamal,

I agree that using skbedit makes some sense and can provide the same 
functionality.


However I believe that from a concept point of view, using it is wrong.

In my honest opinion, the concept here is to perform some calculation on 
the packet itself and its headers while the skb->hash field


is the storage location of the calculation result (in SW).

Furthermore, looking forward to HW offload support, the HW devices will 
be offloading the hash calculation and


not rewriting skb metadata fields. Therefore the action should be the 
hash, not skbedit.


Another thing that I can mention, which is kind of related to what I 
wrote above, is that for all existing skbedit supported fields,


user typically provides a desired value of his choosing to set to a skb 
metadata field.


Here, the value is unknown and probably not a real concern to the user.


To sum it up, I look at this as performing some operation on the packet 
rather then just


setting an skb metadata field and therefore it requires an explicit, new 
action.



What do you think?




2) I think it would make sense to create a skb hash classifier
instead of tying this entirely to flower i.e i should not
have to change u32 just so i can support hash classification.
So policy would be something of the sort:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action skbedit hash bpf object-file  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
handle 0x0 skbhash  flowid 1:11 mask 0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
handle 0x1 skbhash  flowid 1:11 mask 0xf  \
action mirred egress redirect dev ens1f0_2

IOW, we maintain current modularity as opposed
to dumping everything into flower.
Ive always wanted to write the skbhash classifier but
time was scarce. At one point i had some experiment
where I would copy skb hash into mark in the driver
and use fw classifier for further processing.
It was ugly.


I agree but perhaps we should make it a separate effort and not block 
this one.


I still think we should have support via flower. This is the HW offload 
path eventually.



Regards,

Ariel



cheers,
jamal

On 2020-07-01 2:47 p.m., Ariel Levkovich wrote:

Supporting datapath hash allows user to set up rules that provide
load balancing of traffic across multiple vports and for ECMP path
selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on hashing on the packet headers
and distribute the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The datapath hash functionality is achieved in two steps -
performing the hash action and then matching on the result, as
part of the packet's classification.

The api allows user to define a filter with a tc hash action
where the hash function can be standard asymetric hashing that Linux
offers or alternatively user can provide a bpf program that
performs hash calculation on a packet.

Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash bpf object-file  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto udp \
action hash bpf asym_l4 basis  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

Ariel Levkovich (3):
   net/sched: Introduce action hash
   net/flow_dissector: add packet hash dissection
   net/sched: cls_flower: Add hash info to flow classification

  include/linux/skbuff.h  |   4 +
  include/net/act_api.h   |   2 +
  include/net/flow_dissector.h    |   9 +
  include/net/tc_act/tc_hash.h    |  22 ++
  include/uapi/linux/pkt_cls.h    |   4 +
  include/uapi/linux/tc_act/tc_hash.h |  32 +++
  net/core/flow_dissector.c   |  17 ++
  net/sched/Kconfig   |  11 +
  net/sched/Makefile  |   1 +
  net/sched/act_hash.c    | 389 
  net/sched/cls_api.c |   1 +
  net/sched/cls_flower.c  |  16 ++
  12 files changed, 508 insertions(+)
  create mode 100644 include/net/tc_act/tc_hash.h
  create mode 100644 include/uapi/linux/tc_act/tc_hash.h
  create mode 100644 net/sched/act_hash.c





[PATCH net-next v2 0/3] ] TC datapath hash api

2020-07-01 Thread Ariel Levkovich
Supporting datapath hash allows user to set up rules that provide
load balancing of traffic across multiple vports and for ECMP path
selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on hashing on the packet headers
and distribute the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The datapath hash functionality is achieved in two steps -
performing the hash action and then matching on the result, as
part of the packet's classification.

The api allows user to define a filter with a tc hash action
where the hash function can be standard asymetric hashing that Linux
offers or alternatively user can provide a bpf program that
performs hash calculation on a packet.

Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash bpf object-file  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto udp \
action hash bpf asym_l4 basis  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

Ariel Levkovich (3):
  net/sched: Introduce action hash
  net/flow_dissector: add packet hash dissection
  net/sched: cls_flower: Add hash info to flow classification

 include/linux/skbuff.h  |   4 +
 include/net/act_api.h   |   2 +
 include/net/flow_dissector.h|   9 +
 include/net/tc_act/tc_hash.h|  22 ++
 include/uapi/linux/pkt_cls.h|   4 +
 include/uapi/linux/tc_act/tc_hash.h |  32 +++
 net/core/flow_dissector.c   |  17 ++
 net/sched/Kconfig   |  11 +
 net/sched/Makefile  |   1 +
 net/sched/act_hash.c| 389 
 net/sched/cls_api.c |   1 +
 net/sched/cls_flower.c  |  16 ++
 12 files changed, 508 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

-- 
2.25.2



[PATCH net-next v2 1/3] net/sched: Introduce action hash

2020-07-01 Thread Ariel Levkovich
Allow setting a hash value to a packet for a future match.

The action will determine the packet's hash result according to
the selected hash type.

The first option is to select a basic asymmetric l4 hash calculation
on the packet headers which will either use the skb->hash value as
if such was already calculated and set by the device driver, or it
will perform the kernel jenkins hash function on the packet which will
generate the result otherwise.

The other option is for user to provide an BPF program which is
dedicated to calculate the hash. In such case the program is loaded
and used by tc to perform the hash calculation and provide it to
the hash action to be stored in skb->hash field.

The BPF option can be useful for future HW offload support of the hash
calculation by emulating the HW hash function when it's different than
the kernel's but yet we want to maintain consistency between the SW and
the HW.

Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash bpf object-file  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto udp \
action hash asym_l4 basis  \
action goto chain 2

Matching on the result:
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

v1 -> v2:
 *Handle egress case for bpf hash properly.
 *Check for valid bpf fd before referencing it.
 *Fixed missing unlocking of tcf_lock.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/net/act_api.h   |   2 +
 include/net/tc_act/tc_hash.h|  22 ++
 include/uapi/linux/pkt_cls.h|   1 +
 include/uapi/linux/tc_act/tc_hash.h |  32 +++
 net/sched/Kconfig   |  11 +
 net/sched/Makefile  |   1 +
 net/sched/act_hash.c| 389 
 net/sched/cls_api.c |   1 +
 8 files changed, 459 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8c3934880670..b7e5d060bd2f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+#define ACT_BPF_NAME_LEN   256
+
 struct tcf_idrinfo {
struct mutexlock;
struct idr  action_idr;
diff --git a/include/net/tc_act/tc_hash.h b/include/net/tc_act/tc_hash.h
new file mode 100644
index ..f03bb19709e5
--- /dev/null
+++ b/include/net/tc_act/tc_hash.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __NET_TC_HASH_H
+#define __NET_TC_HASH_H
+
+#include 
+#include 
+
+struct tcf_hash_params {
+   enum tc_hash_alg alg;
+   u32 basis;
+   struct bpf_prog *prog;
+   const char *bpf_name;
+   struct rcu_head rcu;
+};
+
+struct tcf_hash {
+   struct tc_action common;
+   struct tcf_hash_params __rcu *hash_p;
+};
+#define to_hash(a) ((struct tcf_hash *)a)
+
+#endif /* __NET_TC_HASH_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 7576209d96f9..2fd93389d091 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -135,6 +135,7 @@ enum tca_id {
TCA_ID_MPLS,
TCA_ID_CT,
TCA_ID_GATE,
+   TCA_ID_HASH,
/* other actions go here */
__TCA_ID_MAX = 255
 };
diff --git a/include/uapi/linux/tc_act/tc_hash.h 
b/include/uapi/linux/tc_act/tc_hash.h
new file mode 100644
index ..ff3063f70ee0
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_hash.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef __LINUX_TC_HASH_H
+#define __LINUX_TC_HASH_H
+
+#include 
+
+enum tc_hash_alg {
+   TCA_HASH_ALG_L4,
+   TCA_HASH_ALG_BPF,
+};
+
+enum {
+   TCA_HASH_UNSPEC,
+   TCA_HASH_TM,
+   TCA_HASH_PARMS,
+   TCA_HASH_ALG,
+   TCA_HASH_BASIS,
+   TCA_HASH_BPF_FD,
+   TCA_HASH_BPF_NAME,
+   TCA_HASH_BPF_ID,
+   TCA_HASH_BPF_TAG,
+   TCA_HASH_PAD,
+   __TCA_HASH_MAX,
+};
+#define TCA_HASH_MAX (__TCA_HASH_MAX - 1)
+
+struct tc_hash {
+   tc_gen;
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 84badf00647e..e9725bb40f4f 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -993,6 +993,17 @@ config NET_ACT_GATE
  To compile this code as a module, choose M here: the
  module will be called act_gate.
 
+config NET_ACT_HASH
+   tristate "Hash calculation action"
+   depends on NET_CLS_ACT
+   help
+ Say Y here to perform hash calculation on packet headers.
+
+ If unsure, say N.
+
+ To compile this co

[PATCH net-next v2 2/3] net/flow_dissector: add packet hash dissection

2020-07-01 Thread Ariel Levkovich
Retreive a hash value from the SKB and store it
in the dissector key for future matching.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/linux/skbuff.h   |  4 
 include/net/flow_dissector.h |  9 +
 net/core/flow_dissector.c| 17 +
 3 files changed, 30 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0c0377fc00c2..beb7fe2c7809 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1342,6 +1342,10 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 struct flow_dissector *flow_dissector,
 void *target_container);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container);
+
 static inline __u32 skb_get_hash(struct sk_buff *skb)
 {
if (!skb->l4_hash && !skb->sw_hash)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index a7eba43fe4e4..5cc0540ce3f7 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -243,6 +243,14 @@ struct flow_dissector_key_ct {
u32 ct_labels[4];
 };
 
+/**
+ * struct flow_dissector_key_hash:
+ * @hash: hash value
+ */
+struct flow_dissector_key_hash {
+   u32 hash;
+};
+
 enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -271,6 +279,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
FLOW_DISSECTOR_KEY_META, /* struct flow_dissector_key_meta */
FLOW_DISSECTOR_KEY_CT, /* struct flow_dissector_key_ct */
+   FLOW_DISSECTOR_KEY_HASH, /* struct flow_dissector_key_hash */
 
FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d02df0b6d0d9..c114f0e3ef4f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -392,6 +392,23 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 }
 EXPORT_SYMBOL(skb_flow_dissect_tunnel_info);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container)
+{
+   struct flow_dissector_key_hash *key;
+
+   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_HASH))
+   return;
+
+   key = skb_flow_dissector_target(flow_dissector,
+   FLOW_DISSECTOR_KEY_HASH,
+   target_container);
+
+   key->hash = skb_get_hash_raw(skb);
+}
+EXPORT_SYMBOL(skb_flow_dissect_hash);
+
 static enum flow_dissect_ret
 __skb_flow_dissect_mpls(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
-- 
2.25.2



[PATCH net-next v2 3/3] net/sched: cls_flower: Add hash info to flow classification

2020-07-01 Thread Ariel Levkovich
Adding new cls flower keys for hash value and hash
mask and dissect the hash info from the skb into
the flow key towards flow classication.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/cls_flower.c   | 16 
 2 files changed, 19 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 2fd93389d091..ef145320ee99 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -579,6 +579,9 @@ enum {
 
TCA_FLOWER_KEY_MPLS_OPTS,
 
+   TCA_FLOWER_KEY_HASH,/* u32 */
+   TCA_FLOWER_KEY_HASH_MASK,   /* u32 */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b2da37286082..ff739e0d86fc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -64,6 +64,7 @@ struct fl_flow_key {
};
} tp_range;
struct flow_dissector_key_ct ct;
+   struct flow_dissector_key_hash hash;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -318,6 +319,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
skb_flow_dissect_ct(skb, &mask->dissector, &skb_key,
fl_ct_info_to_flower_map,
ARRAY_SIZE(fl_ct_info_to_flower_map));
+   skb_flow_dissect_hash(skb, &mask->dissector, &skb_key);
skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
 
f = fl_mask_lookup(mask, &skb_key);
@@ -694,6 +696,9 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_CT_LABELS_MASK] = { .type = NLA_BINARY,
.len = 128 / BITS_PER_BYTE },
[TCA_FLOWER_FLAGS]  = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_HASH]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_HASH_MASK]  = { .type = NLA_U32 },
+
 };
 
 static const struct nla_policy
@@ -1625,6 +1630,10 @@ static int fl_set_key(struct net *net, struct nlattr 
**tb,
 
fl_set_key_ip(tb, true, &key->enc_ip, &mask->enc_ip);
 
+   fl_set_key_val(tb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+  &mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+  sizeof(key->hash.hash));
+
if (tb[TCA_FLOWER_KEY_ENC_OPTS]) {
ret = fl_set_enc_opt(tb, key, mask, extack);
if (ret)
@@ -1739,6 +1748,8 @@ static void fl_init_dissector(struct flow_dissector 
*dissector,
 FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 FLOW_DISSECTOR_KEY_CT, ct);
+   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+FLOW_DISSECTOR_KEY_HASH, hash);
 
skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -2959,6 +2970,11 @@ static int fl_dump_key(struct sk_buff *skb, struct net 
*net,
if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
goto nla_put_failure;
 
+   if (fl_dump_key_val(skb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+&mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+sizeof(key->hash.hash)))
+   goto nla_put_failure;
+
return 0;
 
 nla_put_failure:
-- 
2.25.2



Re: [PATCH net-next 1/3] net/sched: Introduce action hash

2020-06-19 Thread Ariel Levkovich

On 6/19/20 12:13 PM, Davide Caratti wrote:

hello Ariel,

(I'm doing a resend because I suspect that my original reply was dropped
somewhere).

Thanks for your patch! some comments/questions below:

On Fri, 2020-06-19 at 01:15 +0300, Ariel Levkovich wrote:

Allow setting a hash value to a packet for a future match.

The action will determine the packet's hash result according to
the selected hash type.
The first option is to select a basic asymmetric l4 hash calculation
on the packet headers which will either use the skb->hash value as
if such was already calculated and set by the device driver, or it
will perform the kernel jenkins hash function on the packet which will
generate the result otherwise.

If I understand correctly, this new tc action is going to change the skb
metadata based on some operation done on the packet. Linux has already a
tc module that does this job, it's act_skbedit.

Wouldn't it be possible to extend act_skbedit instead of adding a new tc
action? that would save us from some bugs we already encountered in the
past (maybe I spotted a couple of them below), and we would also leverage on
the existing tests.


The other option is for user to provide an BPF program which is
dedicated to calculate the hash. In such case the program is loaded
and used by tc to perform the hash calculation and provide it to
the hash action to be stored in skb->hash field.

The BPF option can be useful for future HW offload support of the hash
calculation by emulating the HW hash function when it's different than
the kernel's but yet we want to maintain consistency between the SW and
the HW.

Like Daniel noticed, this can be done by act_bpf. Using 'jump'
or 'goto_chain' control actions, it should be possible to get to the same
result combining act_skbedit and act_bpf. WDYT?



Hi Davide and Daniel,

First of all, thanks for your review and comments.


I'll try to address both of your comments regarding existing 
alternatives to this new action


here so that we can have a single thread about it.

Act_bpf indeed can provide a similar functionality. Furthermore, there 
are already existing BPF helpers


to allow user to change skb->hash within the BPF program, so there's no 
need to perform act_skbedit


after act_bpf.


However, since we are trying to offer the user multiple methods to 
calculate the hash, and not only


using a BPF program, act_bpf on its own is not enough.

If we are looking at HW offload (as Daniel mentioned), like I mentioned 
in the cover letter,


it is important that SW will be able to get the same hash result as in 
HW for a certain packet.


When certain HW is not able to forward TC the hash result, using a BPF 
program that mimics the


HW hash function is useful to maintain consistency but there are cases 
where the HW can and


does forward the hash value via the received packet's metadata and the 
vendor driver already


fills in the skb->hash with this value. In such cases BPF program usage 
can be avoided.


So to sum it up, this api is offering user both ways to calculate the hash:

1. Use the value that is already there (If the vendor driver already set 
it. If not, calculate using Linux jhash).


2. Use a given BPF program to calculate the hash and to set skb->hash 
with it.



It's true, you can cover both cases with BPF - meaning, always use BPF 
even if HW/driver can provide hash


to TC in other means but we thought about giving an option to avoid 
writing and using BPF when


it's not necessary.


Appreciate your further comments and thoughts about this and of course, 
the code comments


will be addressed and fixed.


Ariel





Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash bpf object-file  \
action goto chain 2

[...]


diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8c3934880670..b7e5d060bd2f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -12,6 +12,8 @@
  #include 
  #include 
  
+#define ACT_BPF_NAME_LEN	256

+

(BTW, line above seems to be a leftover. Correct?)


  struct tcf_idrinfo {
struct mutexlock;
struct idr  action_idr;


[...]


new file mode 100644
index ..40a5c34f8745
--- /dev/null
+++ b/net/sched/act_hash.c
@@ -0,0 +1,376 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* -
+ * net/sched/act_hash.c  Hash calculation action
+ *
+ * Author:   Ariel Levkovich 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define ACT_HASH_BPF_NAME_LEN  256
+
+static unsigned int hash_net_id;
+static struct tc_action_ops act_hash_ops;
+
+static int tcf_hash_act(struct sk_buff *skb, const struct tc_action *a,
+   struct tcf_result *res)
+{
+   struct tcf_hash *h = to_hash(a);
+   struct tcf_has

[PATCH net-next 3/3] net/sched: cls_flower: Add hash info to flow classification

2020-06-18 Thread Ariel Levkovich
Adding new cls flower keys for hash value and hash
mask and dissect the hash info from the skb into
the flow key towards flow classication.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/cls_flower.c   | 16 
 2 files changed, 19 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 2fd93389d091..ef145320ee99 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -579,6 +579,9 @@ enum {
 
TCA_FLOWER_KEY_MPLS_OPTS,
 
+   TCA_FLOWER_KEY_HASH,/* u32 */
+   TCA_FLOWER_KEY_HASH_MASK,   /* u32 */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b2da37286082..ff739e0d86fc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -64,6 +64,7 @@ struct fl_flow_key {
};
} tp_range;
struct flow_dissector_key_ct ct;
+   struct flow_dissector_key_hash hash;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -318,6 +319,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
skb_flow_dissect_ct(skb, &mask->dissector, &skb_key,
fl_ct_info_to_flower_map,
ARRAY_SIZE(fl_ct_info_to_flower_map));
+   skb_flow_dissect_hash(skb, &mask->dissector, &skb_key);
skb_flow_dissect(skb, &mask->dissector, &skb_key, 0);
 
f = fl_mask_lookup(mask, &skb_key);
@@ -694,6 +696,9 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_CT_LABELS_MASK] = { .type = NLA_BINARY,
.len = 128 / BITS_PER_BYTE },
[TCA_FLOWER_FLAGS]  = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_HASH]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_HASH_MASK]  = { .type = NLA_U32 },
+
 };
 
 static const struct nla_policy
@@ -1625,6 +1630,10 @@ static int fl_set_key(struct net *net, struct nlattr 
**tb,
 
fl_set_key_ip(tb, true, &key->enc_ip, &mask->enc_ip);
 
+   fl_set_key_val(tb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+  &mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+  sizeof(key->hash.hash));
+
if (tb[TCA_FLOWER_KEY_ENC_OPTS]) {
ret = fl_set_enc_opt(tb, key, mask, extack);
if (ret)
@@ -1739,6 +1748,8 @@ static void fl_init_dissector(struct flow_dissector 
*dissector,
 FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 FLOW_DISSECTOR_KEY_CT, ct);
+   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+FLOW_DISSECTOR_KEY_HASH, hash);
 
skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -2959,6 +2970,11 @@ static int fl_dump_key(struct sk_buff *skb, struct net 
*net,
if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
goto nla_put_failure;
 
+   if (fl_dump_key_val(skb, &key->hash.hash, TCA_FLOWER_KEY_HASH,
+&mask->hash.hash, TCA_FLOWER_KEY_HASH_MASK,
+sizeof(key->hash.hash)))
+   goto nla_put_failure;
+
return 0;
 
 nla_put_failure:
-- 
2.25.2



[PATCH net-next 1/3] net/sched: Introduce action hash

2020-06-18 Thread Ariel Levkovich
Allow setting a hash value to a packet for a future match.

The action will determine the packet's hash result according to
the selected hash type.

The first option is to select a basic asymmetric l4 hash calculation
on the packet headers which will either use the skb->hash value as
if such was already calculated and set by the device driver, or it
will perform the kernel jenkins hash function on the packet which will
generate the result otherwise.

The other option is for user to provide an BPF program which is
dedicated to calculate the hash. In such case the program is loaded
and used by tc to perform the hash calculation and provide it to
the hash action to be stored in skb->hash field.

The BPF option can be useful for future HW offload support of the hash
calculation by emulating the HW hash function when it's different than
the kernel's but yet we want to maintain consistency between the SW and
the HW.

Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash bpf object-file  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto udp \
action hash asym_l4 basis  \
action goto chain 2

Matching on the result:
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/net/act_api.h   |   2 +
 include/net/tc_act/tc_hash.h|  22 ++
 include/uapi/linux/pkt_cls.h|   1 +
 include/uapi/linux/tc_act/tc_hash.h |  32 +++
 net/sched/Kconfig   |  11 +
 net/sched/Makefile  |   1 +
 net/sched/act_hash.c| 376 
 net/sched/cls_api.c |   1 +
 8 files changed, 446 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8c3934880670..b7e5d060bd2f 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+#define ACT_BPF_NAME_LEN   256
+
 struct tcf_idrinfo {
struct mutexlock;
struct idr  action_idr;
diff --git a/include/net/tc_act/tc_hash.h b/include/net/tc_act/tc_hash.h
new file mode 100644
index ..f03bb19709e5
--- /dev/null
+++ b/include/net/tc_act/tc_hash.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __NET_TC_HASH_H
+#define __NET_TC_HASH_H
+
+#include 
+#include 
+
+struct tcf_hash_params {
+   enum tc_hash_alg alg;
+   u32 basis;
+   struct bpf_prog *prog;
+   const char *bpf_name;
+   struct rcu_head rcu;
+};
+
+struct tcf_hash {
+   struct tc_action common;
+   struct tcf_hash_params __rcu *hash_p;
+};
+#define to_hash(a) ((struct tcf_hash *)a)
+
+#endif /* __NET_TC_HASH_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 7576209d96f9..2fd93389d091 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -135,6 +135,7 @@ enum tca_id {
TCA_ID_MPLS,
TCA_ID_CT,
TCA_ID_GATE,
+   TCA_ID_HASH,
/* other actions go here */
__TCA_ID_MAX = 255
 };
diff --git a/include/uapi/linux/tc_act/tc_hash.h 
b/include/uapi/linux/tc_act/tc_hash.h
new file mode 100644
index ..ff3063f70ee0
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_hash.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef __LINUX_TC_HASH_H
+#define __LINUX_TC_HASH_H
+
+#include 
+
+enum tc_hash_alg {
+   TCA_HASH_ALG_L4,
+   TCA_HASH_ALG_BPF,
+};
+
+enum {
+   TCA_HASH_UNSPEC,
+   TCA_HASH_TM,
+   TCA_HASH_PARMS,
+   TCA_HASH_ALG,
+   TCA_HASH_BASIS,
+   TCA_HASH_BPF_FD,
+   TCA_HASH_BPF_NAME,
+   TCA_HASH_BPF_ID,
+   TCA_HASH_BPF_TAG,
+   TCA_HASH_PAD,
+   __TCA_HASH_MAX,
+};
+#define TCA_HASH_MAX (__TCA_HASH_MAX - 1)
+
+struct tc_hash {
+   tc_gen;
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 84badf00647e..e9725bb40f4f 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -993,6 +993,17 @@ config NET_ACT_GATE
  To compile this code as a module, choose M here: the
  module will be called act_gate.
 
+config NET_ACT_HASH
+   tristate "Hash calculation action"
+   depends on NET_CLS_ACT
+   help
+ Say Y here to perform hash calculation on packet headers.
+
+ If unsure, say N.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_hash.
+
 config NET_IFE_SKBMARK
tristate "Support to enco

[PATCH net-next 2/3] net/flow_dissector: add packet hash dissection

2020-06-18 Thread Ariel Levkovich
Retreive a hash value from the SKB and store it
in the dissector key for future matching.

Signed-off-by: Ariel Levkovich 
Reviewed-by: Jiri Pirko 
---
 include/linux/skbuff.h   |  4 
 include/net/flow_dissector.h |  9 +
 net/core/flow_dissector.c| 17 +
 3 files changed, 30 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0c0377fc00c2..beb7fe2c7809 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1342,6 +1342,10 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 struct flow_dissector *flow_dissector,
 void *target_container);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container);
+
 static inline __u32 skb_get_hash(struct sk_buff *skb)
 {
if (!skb->l4_hash && !skb->sw_hash)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index a7eba43fe4e4..5cc0540ce3f7 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -243,6 +243,14 @@ struct flow_dissector_key_ct {
u32 ct_labels[4];
 };
 
+/**
+ * struct flow_dissector_key_hash:
+ * @hash: hash value
+ */
+struct flow_dissector_key_hash {
+   u32 hash;
+};
+
 enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -271,6 +279,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
FLOW_DISSECTOR_KEY_META, /* struct flow_dissector_key_meta */
FLOW_DISSECTOR_KEY_CT, /* struct flow_dissector_key_ct */
+   FLOW_DISSECTOR_KEY_HASH, /* struct flow_dissector_key_hash */
 
FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d02df0b6d0d9..c114f0e3ef4f 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -392,6 +392,23 @@ skb_flow_dissect_tunnel_info(const struct sk_buff *skb,
 }
 EXPORT_SYMBOL(skb_flow_dissect_tunnel_info);
 
+void skb_flow_dissect_hash(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container)
+{
+   struct flow_dissector_key_hash *key;
+
+   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_HASH))
+   return;
+
+   key = skb_flow_dissector_target(flow_dissector,
+   FLOW_DISSECTOR_KEY_HASH,
+   target_container);
+
+   key->hash = skb_get_hash_raw(skb);
+}
+EXPORT_SYMBOL(skb_flow_dissect_hash);
+
 static enum flow_dissect_ret
 __skb_flow_dissect_mpls(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
-- 
2.25.2



[PATCH net-next 0/3] TC datapath hash api

2020-06-18 Thread Ariel Levkovich
Supporting datapath hash allows user to set up rules that provide
load balancing of traffic across multiple vports and for ECMP path
selection while keeping the number of rule at minimum.

Instead of matching on exact flow spec, which requires a rule per
flow, user can define rules based on hashing on the packet headers
and distribute the flows to different buckets. The number of rules
in this case will be constant and equal to the number of buckets.

The datapath hash functionality is achieved in two steps -
performing the hash action and then matching on the result, as
part of the packet's classification.

The api allows user to define a filter with a tc hash action
where the hash function can be standard asymetric hashing that Linux
offers or alternatively user can provide a bpf program that
performs hash calculation on a packet.

Usage is as follows:

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp \
action hash bpf object-file  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto udp \
action hash bpf asym_l4 basis  \
action goto chain 2

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x0/0xf  \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower hash 0x1/0xf  \
action mirred egress redirect dev ens1f0_2

Ariel Levkovich (3):
  net/sched: Introduce action hash
  net/flow_dissector: add packet hash dissection
  net/sched: cls_flower: Add hash info to flow classification

 include/linux/skbuff.h  |   4 +
 include/net/act_api.h   |   2 +
 include/net/flow_dissector.h|   9 +
 include/net/tc_act/tc_hash.h|  22 ++
 include/uapi/linux/pkt_cls.h|   4 +
 include/uapi/linux/tc_act/tc_hash.h |  32 +++
 net/core/flow_dissector.c   |  17 ++
 net/sched/Kconfig   |  11 +
 net/sched/Makefile  |   1 +
 net/sched/act_hash.c| 376 
 net/sched/cls_api.c |   1 +
 net/sched/cls_flower.c  |  16 ++
 12 files changed, 495 insertions(+)
 create mode 100644 include/net/tc_act/tc_hash.h
 create mode 100644 include/uapi/linux/tc_act/tc_hash.h
 create mode 100644 net/sched/act_hash.c

-- 
2.25.2



Re: BPF for hash calculation

2020-05-19 Thread Ariel Levkovich
On May 19, 2020, at 22:13, Ariel Levkovich  wrote:
> 
> Hi Daniel, Aleksei
> 
> I’m working on a feature to add support for datapath hash actions and 
> matching via TC API.
> 
> One of the options we want to offer users there is to provide their own hash 
> calculation function, in the form of a BPF program.
> The program should accept struct __sk_buff and return a hash value.
> 
> After a little research and testing I noticed that some key parameters in 
> struct __sk_buff are sertricted to BPF_PROG_TYPE_SK_SKB program types while I 
> was planning to re-use the existing  BPF_PROG_TYPE_SCHED_ACT program type for 
> the act_hash purpose.
> The key parameters I’m referring to are fields like remote_ip4, local_ip4, 
> src/dst_port (flow key fields) that are most likely going to be relevant for 
> hash calculation on a packet.
> 
> So my question to you is basically what is the best option here? The way I 
> see it the options are:
> 1. Remove restrictions on these fields for SCHED_ACT program type in 
> kernel/bpf/verifier.c
> 2. Add new program type SCHED_ACT_HASH with permission to access these fields.
> 3. More of a question - is there another way to access the flow keys via 
> struct __sk_buff?
> 
> Appreciate your advice and thanks in advance.
> 
> Best Regards,
> 
> Ariel Levkovich
> Staff engineer, Mellanox SW

Forgot to CC relevant mailing list.
After further digging it seems there’s a pointer to a flow keys struct that is 
not restricted to SK_SKB prog types and should be our best option. The question 
remains whether we can reuse SCHED_ACT type or a new type for hash calculation 
should be added?




[PATCH net-next 1/2] net: bonding: Inherit MPLS features from slave devices

2019-06-03 Thread Ariel Levkovich
When setting the bonding interface net device features,
the kernel code doesn't address the slaves' MPLS features
and doesn't inherit them.

Therefore, HW offloads that enhance performance such as
checksumming and TSO are disabled for MPLS tagged traffic
flowing via the bonding interface.

The patch add the inheritance of the MPLS features from the
slave devices with a similar logic to setting the bonding device's
VLAN and encapsulation features.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
Signed-off-by: Ariel Levkovich 
---
 drivers/net/bonding/bond_main.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 062fa7e..46b5f6d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1077,12 +1077,16 @@ static netdev_features_t bond_fix_features(struct 
net_device *dev,
 #define BOND_ENC_FEATURES  (NETIF_F_HW_CSUM | NETIF_F_SG | \
 NETIF_F_RXCSUM | NETIF_F_ALL_TSO)
 
+#define BOND_MPLS_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | \
+NETIF_F_ALL_TSO)
+
 static void bond_compute_features(struct bonding *bond)
 {
unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |
IFF_XMIT_DST_RELEASE_PERM;
netdev_features_t vlan_features = BOND_VLAN_FEATURES;
netdev_features_t enc_features  = BOND_ENC_FEATURES;
+   netdev_features_t mpls_features  = BOND_MPLS_FEATURES;
struct net_device *bond_dev = bond->dev;
struct list_head *iter;
struct slave *slave;
@@ -1093,6 +1097,7 @@ static void bond_compute_features(struct bonding *bond)
if (!bond_has_slaves(bond))
goto done;
vlan_features &= NETIF_F_ALL_FOR_ALL;
+   mpls_features &= NETIF_F_ALL_FOR_ALL;
 
bond_for_each_slave(bond, slave, iter) {
vlan_features = netdev_increment_features(vlan_features,
@@ -1101,6 +1106,11 @@ static void bond_compute_features(struct bonding *bond)
enc_features = netdev_increment_features(enc_features,
 
slave->dev->hw_enc_features,
 BOND_ENC_FEATURES);
+
+   mpls_features = netdev_increment_features(mpls_features,
+ 
slave->dev->mpls_features,
+ BOND_MPLS_FEATURES);
+
dst_release_flag &= slave->dev->priv_flags;
if (slave->dev->hard_header_len > max_hard_header_len)
max_hard_header_len = slave->dev->hard_header_len;
@@ -1114,6 +1124,7 @@ static void bond_compute_features(struct bonding *bond)
bond_dev->vlan_features = vlan_features;
bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
NETIF_F_GSO_UDP_L4;
+   bond_dev->mpls_features = mpls_features;
bond_dev->gso_max_segs = gso_max_segs;
netif_set_gso_max_size(bond_dev, gso_max_size);
 
-- 
1.8.3.1



[PATCH net-next 2/2] net: vlan: Inherit MPLS features from parent device

2019-06-03 Thread Ariel Levkovich
During the creation of the VLAN interface net device,
the various device features and offloads are being set based
on the parent device's features.
The code initiates the basic, vlan and encapsulation features
but doesn't address the MPLS features set and they remain blank.
As a result, all device offloads that have significant performance
effect are disabled for MPLS traffic going via this VLAN device such
as checksumming and TSO.

This patch makes sure that MPLS features are also set for the
VLAN device based on the parent which will allow HW offloads of
checksumming and TSO to be performed on MPLS tagged packets.

Signed-off-by: Ariel Levkovich 
---
 net/8021q/vlan_dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f044ae5..585e73d 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -582,6 +582,7 @@ static int vlan_dev_init(struct net_device *dev)
 
dev->vlan_features = real_dev->vlan_features & ~NETIF_F_ALL_FCOE;
dev->hw_enc_features = vlan_tnl_features(real_dev);
+   dev->mpls_features = real_dev->mpls_features;
 
/* ipv6 shared card related stuff */
dev->dev_id = real_dev->dev_id;
-- 
1.8.3.1



[PATCH net-next 0/2] Support MPLS features in bonding and vlan net devices

2019-06-03 Thread Ariel Levkovich
Netdevice HW MPLS features are not passed from device driver's netdevice to
upper netdevice, specifically VLAN and bonding netdevice which are created
by the kernel when needed.

This prevents enablement and usage of HW offloads, such as TSO and checksumming
for MPLS tagged traffic when running via VLAN or bonding interface.

The patches introduce changes to the initialization steps of the VLAN and 
bonding
netdevices to inherit the MPLS features from lower netdevices to allow the HW
offloads.

Ariel Levkovich (2):
  net: bonding: Inherit MPLS features from slave devices
  net: vlan: Inherit MPLS features from parent device

 drivers/net/bonding/bond_main.c | 11 +++
 net/8021q/vlan_dev.c|  1 +
 2 files changed, 12 insertions(+)

-- 
1.8.3.1



[PATCH RFC 2/2] net: vlan: Inherit MPLS features from parent device

2019-05-24 Thread Ariel Levkovich
During the creation of the VLAN interface net device,
the various device features and offloads are being set based
on the parent device's features.
The code initiates the basic, vlan and encapsulation features
but doesn't address the MPLS features set and they remain blank.
As a result, all device offloads that have significant performance
effect are disabled for MPLS traffic going via this VLAN device such
as checksumming and TSO.

This patch makes sure that MPLS features are also set for the
VLAN device based on the parent which will allow HW offloads of
checksumming and TSO to be performed on MPLS tagged packets.

Signed-off-by: Ariel Levkovich 
---
 net/8021q/vlan_dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f044ae5..585e73d 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -582,6 +582,7 @@ static int vlan_dev_init(struct net_device *dev)
 
dev->vlan_features = real_dev->vlan_features & ~NETIF_F_ALL_FCOE;
dev->hw_enc_features = vlan_tnl_features(real_dev);
+   dev->mpls_features = real_dev->mpls_features;
 
/* ipv6 shared card related stuff */
dev->dev_id = real_dev->dev_id;
-- 
1.8.3.1



[PATCH RFC 0/2] Support MPLS features in bonding and vlan net devices

2019-05-24 Thread Ariel Levkovich
Netdevice HW MPLS features are not passed from device driver's netdevice to
upper netdevice, specifically VLAN and bonding netdevice which are created
by the kernel when needed.

This prevents enablement and usage of HW offloads, such as TSO and checksumming
for MPLS tagged traffic when running via VLAN or bonding interface.

The patches introduce changes to the initialization steps of the VLAN and 
bonding
netdevices to inherit the MPLS features from lower netdevices to allow the HW
offloads.

Ariel Levkovich (2):
  net: bonding: Inherit MPLS features from slave devices
  net: vlan: Inherit MPLS features from parent device

 drivers/net/bonding/bond_main.c | 11 +++
 net/8021q/vlan_dev.c|  1 +
 2 files changed, 12 insertions(+)

-- 
1.8.3.1



[PATCH RFC 1/2] net: bonding: Inherit MPLS features from slave devices

2019-05-24 Thread Ariel Levkovich
When setting the bonding interface net device features,
the kernel code doesn't address the slaves' MPLS features
and doesn't inherit them.

Therefore, HW offloads that enhance performance such as
checksumming and TSO are disabled for MPLS tagged traffic
flowing via the bonding interface.

The patch add the inheritance of the MPLS features from the
slave devices with a similar logic to setting the bonding device's
VLAN and encapsulation features.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
Signed-off-by: Ariel Levkovich 
---
 drivers/net/bonding/bond_main.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 062fa7e..46b5f6d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1077,12 +1077,16 @@ static netdev_features_t bond_fix_features(struct 
net_device *dev,
 #define BOND_ENC_FEATURES  (NETIF_F_HW_CSUM | NETIF_F_SG | \
 NETIF_F_RXCSUM | NETIF_F_ALL_TSO)
 
+#define BOND_MPLS_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | \
+NETIF_F_ALL_TSO)
+
 static void bond_compute_features(struct bonding *bond)
 {
unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |
IFF_XMIT_DST_RELEASE_PERM;
netdev_features_t vlan_features = BOND_VLAN_FEATURES;
netdev_features_t enc_features  = BOND_ENC_FEATURES;
+   netdev_features_t mpls_features  = BOND_MPLS_FEATURES;
struct net_device *bond_dev = bond->dev;
struct list_head *iter;
struct slave *slave;
@@ -1093,6 +1097,7 @@ static void bond_compute_features(struct bonding *bond)
if (!bond_has_slaves(bond))
goto done;
vlan_features &= NETIF_F_ALL_FOR_ALL;
+   mpls_features &= NETIF_F_ALL_FOR_ALL;
 
bond_for_each_slave(bond, slave, iter) {
vlan_features = netdev_increment_features(vlan_features,
@@ -1101,6 +1106,11 @@ static void bond_compute_features(struct bonding *bond)
enc_features = netdev_increment_features(enc_features,
 
slave->dev->hw_enc_features,
 BOND_ENC_FEATURES);
+
+   mpls_features = netdev_increment_features(mpls_features,
+ 
slave->dev->mpls_features,
+ BOND_MPLS_FEATURES);
+
dst_release_flag &= slave->dev->priv_flags;
if (slave->dev->hard_header_len > max_hard_header_len)
max_hard_header_len = slave->dev->hard_header_len;
@@ -1114,6 +1124,7 @@ static void bond_compute_features(struct bonding *bond)
bond_dev->vlan_features = vlan_features;
bond_dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL |
NETIF_F_GSO_UDP_L4;
+   bond_dev->mpls_features = mpls_features;
bond_dev->gso_max_segs = gso_max_segs;
netif_set_gso_max_size(bond_dev, gso_max_size);
 
-- 
1.8.3.1



RE: [PATCH net-next 6/9] net/mlx4_en: Adding support of turning off link autonegotiation via ethtool

2017-01-16 Thread Ariel Levkovich
+   MLX4_PTYS_AN_DISABLE_CAP   = 1 << 5,

It's there.

Best Regards,

Ariel Levkovich
Staff Engineer, SW
Mellanox Technologies
Beit Mellanox, Yokneam, P.O.Box 586, Israel 20692 
Office: +972-74723-7652, Mobile: +972-52-3947-704 
www.mellanox.com 
Follow us on Facebook and Twitter

> -Original Message-
> From: Or Gerlitz [mailto:gerlitz...@gmail.com]
> Sent: Monday, January 16, 2017 4:48 PM
> To: Tariq Toukan ; Ariel Levkovich
> 
> Cc: David S. Miller ; Linux Netdev List
> ; Eran Ben Elisha 
> Subject: Re: [PATCH net-next 6/9] net/mlx4_en: Adding support of turning
> off link autonegotiation via ethtool
> 
> On Mon, Jan 16, 2017 at 7:30 PM, Tariq Toukan 
> wrote:
> 0644
> > --- a/include/linux/mlx4/device.h
> > +++ b/include/linux/mlx4/device.h
> > @@ -1539,8 +1539,13 @@ enum mlx4_ptys_proto {
> > MLX4_PTYS_EN = 1<<2,
> >  };
> >
> > +enum mlx4_ptys_flags {
> > +   MLX4_PTYS_AN_DISABLE_CAP   = 1 << 5,
> > +   MLX4_PTYS_AN_DISABLE_ADMIN = 1 << 6, };
> > +
> >  struct mlx4_ptys_reg {
> > -   u8 resrvd1;
> > +   u8 flags;
> > u8 local_port;
> > u8 resrvd2;
> > u8 proto_mask;
> 
> Hi Ariel,
> 
> I didn't see any new dev cap bit, what happens with FW version which don't
> support this, is the result well defined? what is it?
> 
> Or.