Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-25 Thread Jamal Hadi Salim

On 16-02-25 04:46 PM, Daniel Borkmann wrote:

On 02/25/2016 01:20 PM, Jamal Hadi Salim wrote:



Let me think about it. Likely it will be subsequent patches - I just
want to get this set out first.


Yes, I mean one of the key motivation was "[...] to horizontally scale
packet processing at scope of a chasis or rack [...]".



By adding more processing points horizontally. Please read the paper;
I think we are getting to a point where this discussion is no longer
productive.


So for people
who don't have that NIC with embedded Cavium processor, they might
already hit scalability issues for encode/decode right there.



A lock on a policy that already matches is not a serious
scaling problem that you need to offload to an embedded NIC.
Do note the point here is to scale by adding more machines.
And this shit is deployed and has proven it does scale.

cheers,
jamal


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-25 Thread John Fastabend
On 16-02-25 01:46 PM, Daniel Borkmann wrote:
> On 02/25/2016 01:20 PM, Jamal Hadi Salim wrote:
>> On 16-02-24 12:37 PM, Daniel Borkmann wrote:
>>> On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:
 From: Jamal Hadi Salim 
>>> [...]
 +static const struct nla_policy ife_policy[TCA_IFE_MAX + 1] = {
 +[TCA_IFE_PARMS] = {.len = sizeof(struct tc_ife)},
 +[TCA_IFE_DMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},
 +[TCA_IFE_SMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},
>>>
>>> This is buggy btw ...
>>
>> I am sure i cutnpasted that from somewhere. Thanks for catching
>> it; I will remove NLA_BINARY ref.
> 
> Yeah, NLA_BINARY seems to be a bit of a misleading name. We should
> probably audit, if there are more such users already in the tree.
> 

At some point in the past (maybe a year ago?) I went through and
fixed a handful of these but yeah it seems to be a common error.

> [...]
>>> Maybe try to make this lockless in the fast path? Otherwise placing
>>> this on ingress / egress (f.e. clsact) doesn't really scale.
>>
>> Let me think about it. Likely it will be subsequent patches - I just
>> want to get this set out first.
> 
> Yes, I mean one of the key motivation was "[...] to horizontally scale
> packet processing at scope of a chasis or rack [...]". So for people
> who don't have that NIC with embedded Cavium processor, they might
> already hit scalability issues for encode/decode right there.
> 
> Thanks again,
> Daniel



Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-25 Thread Daniel Borkmann

On 02/25/2016 01:20 PM, Jamal Hadi Salim wrote:

On 16-02-24 12:37 PM, Daniel Borkmann wrote:

On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

[...]

+static const struct nla_policy ife_policy[TCA_IFE_MAX + 1] = {
+[TCA_IFE_PARMS] = {.len = sizeof(struct tc_ife)},
+[TCA_IFE_DMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},
+[TCA_IFE_SMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},


This is buggy btw ...


I am sure i cutnpasted that from somewhere. Thanks for catching
it; I will remove NLA_BINARY ref.


Yeah, NLA_BINARY seems to be a bit of a misleading name. We should
probably audit, if there are more such users already in the tree.

[...]

Maybe try to make this lockless in the fast path? Otherwise placing
this on ingress / egress (f.e. clsact) doesn't really scale.


Let me think about it. Likely it will be subsequent patches - I just
want to get this set out first.


Yes, I mean one of the key motivation was "[...] to horizontally scale
packet processing at scope of a chasis or rack [...]". So for people
who don't have that NIC with embedded Cavium processor, they might
already hit scalability issues for encode/decode right there.

Thanks again,
Daniel


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-25 Thread Jamal Hadi Salim

On 16-02-24 12:37 PM, Daniel Borkmann wrote:

On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

[...]

+static const struct nla_policy ife_policy[TCA_IFE_MAX + 1] = {
+[TCA_IFE_PARMS] = {.len = sizeof(struct tc_ife)},
+[TCA_IFE_DMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},
+[TCA_IFE_SMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},


This is buggy btw ...



I am sure i cutnpasted that from somewhere. Thanks for catching
it; I will remove NLA_BINARY ref.


+[TCA_IFE_TYPE] = {.type = NLA_U16},
+};


[...]


+if (parm->flags & IFE_ENCODE) {
+ife_type = *(u16 *) nla_data(tb[TCA_IFE_TYPE]);


( We have accessors for such things. Please also check coding style
   and white space things in your series, there's couple of things all
   over the place. )



Modern git tells you about white spaces - maybe i didnt stare long
enough ;-> I will use the accessor in next update.



Maybe try to make this lockless in the fast path? Otherwise placing
this on ingress / egress (f.e. clsact) doesn't really scale.




Let me think about it. Likely it will be subsequent patches - I just
want to get this set out first.

Thanks Daniel.

cheers,
jamal


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-24 Thread Cong Wang
On Wed, Feb 24, 2016 at 5:09 AM, Jamal Hadi Salim  wrote:
> On 16-02-23 04:44 PM, Cong Wang wrote:
>>
>> On Tue, Feb 23, 2016 at 4:49 AM, Jamal Hadi Salim 
>> wrote:
>
>
>>> +#define MODULE_ALIAS_IFE_META(metan)   MODULE_ALIAS("ifemeta"
>>> __stringify_1(metan))
>>> +
>>> +int get_meta_u32(struct sk_buff *skb, struct tcf_meta_info *mi);
>>> +int get_meta_u16(struct sk_buff *skb, struct tcf_meta_info *mi);
>>> +int tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void
>>> *dval);
>>> +int alloc_meta_u32(struct tcf_meta_info *mi, void *metaval);
>>> +int alloc_meta_u16(struct tcf_meta_info *mi, void *metaval);
>>> +int check_meta_u32(u32 metaval, struct tcf_meta_info *mi);
>>> +int encode_meta_u32(u32 metaval, void *skbdata, struct tcf_meta_info
>>> *mi);
>>> +int validate_meta_u32(void *val, int len);
>>> +int validate_meta_u16(void *val, int len);
>>> +void release_meta_gen(struct tcf_meta_info *mi);
>>> +int register_ife_op(struct tcf_meta_ops *mops);
>>> +int unregister_ife_op(struct tcf_meta_ops *mops);
>>
>>
>>
>> These are exported to other modules, please add some prefix to protect
>> them.
>>
>
> suggestion? I thought they seemed unique enough.
>

I would pick "ife_", for example, ife_get_meta_u32() ...


>
>>> + * copyright   Jamal Hadi Salim (2015)
>>
>>
>>
>> 2016? ;)
>>
>
> Yes. This code has been around since then. Actually earlier than that
> running. But i formatted it for kernel inclusion in about first week
> of January. Dave asked me to wait for the ethertype to get it in.
> The IETF still hasnt come through a year later and so i re-submitted
> with no default ethertype. I think 2015 is deserving here, no?
> I hope i dont spend another year discussing on the list ;-> /me runs

I thought it should be always the time when you submit the code, but
you are the author you have the right to make it whatever you want... ;)

[...]

>>> +   if ((parm->flags & IFE_ENCODE) && (parm->flags & IFE_DECODE)) {
>>> +   pr_info("Ambigous: Cant have both encode and decode\n");
>>> +   return -EINVAL;
>>> +   }
>>
>>
>>
>> Is it possible to fold them into one bit?
>
>
> As in the check you mean?
>

I meant to suggest to make IFE_ENCODE and IFE_DECODE share
one bit.


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-24 Thread Daniel Borkmann

On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

[...]

+static const struct nla_policy ife_policy[TCA_IFE_MAX + 1] = {
+   [TCA_IFE_PARMS] = {.len = sizeof(struct tc_ife)},
+   [TCA_IFE_DMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},
+   [TCA_IFE_SMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},


This is buggy btw ...


+   [TCA_IFE_TYPE] = {.type = NLA_U16},
+};


[...]


+   if (parm->flags & IFE_ENCODE) {
+   ife_type = *(u16 *) nla_data(tb[TCA_IFE_TYPE]);


( We have accessors for such things. Please also check coding style
  and white space things in your series, there's couple of things all
  over the place. )


+   if (tb[TCA_IFE_DMAC] != NULL)
+   daddr = nla_data(tb[TCA_IFE_DMAC]);
+   if (tb[TCA_IFE_SMAC] != NULL)
+   saddr = nla_data(tb[TCA_IFE_SMAC]);


... NLA_BINARY with len means: max length. But there's nothing
that checks a min length on this, so you potentially access out
of bounds since everything <= ETH_ALEN is allowed in your case.


+   }
+
+   spin_lock_bh(>tcf_lock);


Maybe try to make this lockless in the fast path? Otherwise placing
this on ingress / egress (f.e. clsact) doesn't really scale.


+   ife->tcf_action = parm->action;
+
+   if (parm->flags & IFE_ENCODE) {
+   if (daddr)
+   ether_addr_copy(ife->eth_dst, daddr);
+   else
+   eth_zero_addr(ife->eth_dst);
+
+   if (saddr)
+   ether_addr_copy(ife->eth_src, saddr);
+   else
+   eth_zero_addr(ife->eth_src);
+
+   ife->eth_type = ife_type;
+   }


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-24 Thread Jamal Hadi Salim

On 16-02-23 04:44 PM, Cong Wang wrote:

On Tue, Feb 23, 2016 at 4:49 AM, Jamal Hadi Salim  wrote:



+#define MODULE_ALIAS_IFE_META(metan)   MODULE_ALIAS("ifemeta" 
__stringify_1(metan))
+
+int get_meta_u32(struct sk_buff *skb, struct tcf_meta_info *mi);
+int get_meta_u16(struct sk_buff *skb, struct tcf_meta_info *mi);
+int tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval);
+int alloc_meta_u32(struct tcf_meta_info *mi, void *metaval);
+int alloc_meta_u16(struct tcf_meta_info *mi, void *metaval);
+int check_meta_u32(u32 metaval, struct tcf_meta_info *mi);
+int encode_meta_u32(u32 metaval, void *skbdata, struct tcf_meta_info *mi);
+int validate_meta_u32(void *val, int len);
+int validate_meta_u16(void *val, int len);
+void release_meta_gen(struct tcf_meta_info *mi);
+int register_ife_op(struct tcf_meta_ops *mops);
+int unregister_ife_op(struct tcf_meta_ops *mops);



These are exported to other modules, please add some prefix to protect them.



suggestion? I thought they seemed unique enough.



+ * copyright   Jamal Hadi Salim (2015)



2016? ;)



Yes. This code has been around since then. Actually earlier than that
running. But i formatted it for kernel inclusion in about first week
of January. Dave asked me to wait for the ethertype to get it in.
The IETF still hasnt come through a year later and so i re-submitted
with no default ethertype. I think 2015 is deserving here, no?
I hope i dont spend another year discussing on the list ;-> /me runs






+   return 8;



Why 8?



Magic number;-> I will add a comment.
It is a TLV that includes 4 bytes. I am going to wait for
comments to settle then make an update.


+



+
+int alloc_meta_u32(struct tcf_meta_info *mi, void *metaval)
+{
+   mi->metaval = kzalloc(sizeof(u32), GFP_KERNEL);
+   if (!mi->metaval)
+   return -ENOMEM;
+
+   memcpy(mi->metaval, metaval, 4);



kmemdup()?



Sure. I'll take care of the rest you pointed to.








No need to initi it since list_add_tail() is just one line below.



+   list_add_tail(>list, );
+   write_unlock(_mod_lock);
+   return 0;
+}
+
+int unregister_ife_op(struct tcf_meta_ops *mops)
+{
+   struct tcf_meta_ops *m;
+   int err = -ENOENT;
+
+   write_lock(_mod_lock);
+   list_for_each_entry(m, , list) {
+   if (m->metaid == mops->metaid) {
+   list_del(>list);
+   err = 0;
+   break;
+   }
+   }
+   write_unlock(_mod_lock);
+
+   return err;
+}
+
+EXPORT_SYMBOL_GPL(register_ife_op);
+EXPORT_SYMBOL_GPL(unregister_ife_op);


Move them next to their definitions.



+
+void print_ife_oplist(void)
+{
+   struct tcf_meta_ops *o;
+   int i = 0;
+
+   read_lock(_mod_lock);
+   list_for_each_entry(o, , list) {
+   pr_debug("%d: %s metaid %d\n", ++i, o->name, o->metaid);
+   }
+   read_unlock(_mod_lock);
+}
+
+/* called when adding new meta information
+ * under ife->tcf_lock
+*/
+int load_metaops_and_vet(struct tcf_ife_info *ife, u32 metaid,
+void *val, int len)



I failed to understand the name of this function.



It tries to load a metadata kernel module and vets (or validates)
the passed parameters from user space.
Suggestions?


+{
+   struct tcf_meta_ops *ops = find_ife_oplist(metaid);
+   int ret = 0;
+
+   if (!ops) {
+   ret = -ENOENT;
+#ifdef CONFIG_MODULES
+   spin_unlock_bh(>tcf_lock);
+   rtnl_unlock();
+   request_module("ifemeta%u", metaid);
+   rtnl_lock();
+   spin_lock_bh(>tcf_lock);
+   ops = find_ife_oplist(metaid);
+#endif
+   }
+
+   if (ops) {
+   ret = 0;
+
+   /* XXX: unfortunately cant use nla_policy at this point
+* because a length of 0 is valid in the case of
+* "allow". "use" semantics do enforce for proper
+* length and i couldve use nla_policy but it makes it hard
+* to use it just for that..
+*/
+   if (len) {
+   if (ops->validate) {
+   ret = ops->validate(val, len);
+   } else {
+   if (ops->metatype == NLA_U32) {
+   ret = validate_meta_u32(val, len);
+   } else if (ops->metatype == NLA_U16) {
+   ret = validate_meta_u16(val, len);
+   }
+   }
+   }



Sounds like this should be in a separated function.



Could be.



+int add_metainfo(struct tcf_ife_info *ife, u32 metaid, void *metaval, int len)
+{
+   struct tcf_meta_info *mi = NULL;
+   struct tcf_meta_ops *ops = find_ife_oplist(metaid);
+   int ret = 0;
+
+   if (!ops) {
+   

Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-24 Thread Jamal Hadi Salim

On 16-02-23 11:12 AM, Daniel Borkmann wrote:

On 02/23/2016 03:39 PM, Jamal Hadi Salim wrote:




My question was rather: should the kernel enforce the IDs and only
allow what the kernel dictates (and not in/out of tree modules)? If
yes, then there would be no need for a module parameter (and the
module param should be avoided in any case).



Maybe i should just take it out for now and assume whatever is
in the kernel are the only allowed metadata.

cheers,
jamal


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-24 Thread Jamal Hadi Salim

On 16-02-24 12:46 AM, Simon Horman wrote:

On Tue, Feb 23, 2016 at 05:12:34PM +0100, Daniel Borkmann wrote:




 From my point of view the test should be weather the encapsulation might
reasonably be expected to be used outside of the context of tc. If so then
it makes sense to use a netdev to allow sharing of infrastructure between
different kernel components.

I suspect the answer to that question is no and thus IMHO a netdev would be
nice to have rather than compelling.

With regards to overhead of netdevs: I think that could be mitigated to
some extent by using LWT or some other metadata-based approach to allow a
single netdev to be use by multiple tc action instances.


We actually have a use case where we offload this thing into
an embedded NIC.

In any case it doesnt make much sense to use a netdev for reasons i
specified. Just like it doesnt make sense when i want a policy which
pushes or pops vlans or vxlans to use netdevs either.
Yes it quacks like a duck(i.e has receive) and walks like a duck(has
stats) but it looks like an ostrich;->

cheers,
jamal





Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-23 Thread Simon Horman
On Tue, Feb 23, 2016 at 05:12:34PM +0100, Daniel Borkmann wrote:
> On 02/23/2016 03:39 PM, Jamal Hadi Salim wrote:
> >On 16-02-23 08:32 AM, Daniel Borkmann wrote:
> >>On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:
> >>>From: Jamal Hadi Salim 
> >>>
> >>>This action allows for a sending side to encapsulate arbitrary metadata
> >>>which is decapsulated by the receiving end.
> >>>The sender runs in encoding mode and the receiver in decode mode.
> >>>Both sender and receiver must specify the same ethertype.
> >>>At some point we hope to have a registered ethertype and we'll
> >>>then provide a default so the user doesnt have to specify it.
> >>>For now we enforce the user specify it.
> >>>
> >>>Lets show example usage where we encode icmp from a sender towards
> >>>a receiver with an skbmark of 17; both sender and receiver use
> >>>ethertype of 0xdead to interop.
> >>
> >>On a conceptual level, as this is an L2 encap with TLVs, why not having
> >>a normal device driver for this like we have in other cases that would
> >>encode/decode the meta data itself?
> >
> >netdevs dont scale for large number of policies. See why ipsec which
> >at one point was implemented using a netdev and why xfrm eventually
> >was chosen as the way forward. Or look at the recent lwt
> >effort.
> 
> Sure, I'm just saying that it could conceptionally be similar to the
> collect metadata idea just on L2 in your case. The encoding/decoding
> and transport of the information is actually not overly tc specific
> at least from the code that's shown so far, just a thought.

>From my point of view the test should be weather the encapsulation might
reasonably be expected to be used outside of the context of tc. If so then
it makes sense to use a netdev to allow sharing of infrastructure between
different kernel components.

I suspect the answer to that question is no and thus IMHO a netdev would be
nice to have rather than compelling.

With regards to overhead of netdevs: I think that could be mitigated to
some extent by using LWT or some other metadata-based approach to allow a
single netdev to be use by multiple tc action instances.

[snip]


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-23 Thread Cong Wang
On Tue, Feb 23, 2016 at 4:49 AM, Jamal Hadi Salim  wrote:
> From: Jamal Hadi Salim 
>
> This action allows for a sending side to encapsulate arbitrary metadata
> which is decapsulated by the receiving end.
> The sender runs in encoding mode and the receiver in decode mode.
> Both sender and receiver must specify the same ethertype.
> At some point we hope to have a registered ethertype and we'll
> then provide a default so the user doesnt have to specify it.
> For now we enforce the user specify it.
>
> Lets show example usage where we encode icmp from a sender towards
> a receiver with an skbmark of 17; both sender and receiver use
> ethertype of 0xdead to interop.
>
> : Lets start with Receiver-side policy config:
> xxx: add an ingress qdisc
> sudo tc qdisc add dev $ETH ingress
>
> xxx: any packets with ethertype 0xdead will be subjected to ife decoding
> xxx: we then restart the classification so we can match on icmp at prio 3
> sudo $TC filter add dev $ETH parent : prio 2 protocol 0xdead \
> u32 match u32 0 0 flowid 1:1 \
> action ife decode reclassify
>
> xxx: on restarting the classification from above if it was an icmp
> xxx: packet, then match it here and continue to the next rule at prio 4
> xxx: which will match based on skb mark of 17
> sudo tc filter add dev $ETH parent : prio 3 protocol ip \
> u32 match ip protocol 1 0xff flowid 1:1 \
> action continue
>
> xxx: match on skbmark of 0x11 (decimal 17) and accept
> sudo tc filter add dev $ETH parent : prio 4 protocol ip \
> handle 0x11 fw flowid 1:1 \
> action ok
>
> xxx: Lets show the decoding policy
> sudo tc -s filter ls dev $ETH parent : protocol 0xdead
> xxx:
> filter pref 2 u32
> filter pref 2 u32 fh 800: ht divisor 1
> filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule 
> hit 0 success 0)
>   match / at 0 (success 0 )
> action order 1: ife decode action reclassify
>  index 1 ref 1 bind 1 installed 14 sec used 14 sec
>  type: 0x0
>  Metadata: allow mark allow hash allow prio allow qmap
> Action statistics:
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> xxx:
> Observe that above lists all metadatum it can decode. Typically these
> submodules will already be compiled into a monolithic kernel or
> loaded as modules
>
> : Lets show the sender side now ..
>
> xxx: Add an egress qdisc on the sender netdev
> sudo tc qdisc add dev $ETH root handle 1: prio
> xxx:
> xxx: Match all icmp packets to 192.168.122.237/24, then
> xxx: tag the packet with skb mark of decimal 17, then
> xxx: Encode it with:
> xxx:ethertype 0xdead
> xxx:add skb->mark to whitelist of metadatum to send
> xxx:rewrite target dst MAC address to 02:15:15:15:15:15
> xxx:
> sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
> match ip dst 192.168.122.237/24 \
> match ip protocol 1 0xff \
> flowid 1:2 \
> action skbedit mark 17 \
> action ife encode \
> type 0xDEAD \
> allow mark \
> dst 02:15:15:15:15:15
>
> xxx: Lets show the encoding policy
> sudo tc -s filter ls dev $ETH parent 1: protocol ip
> xxx:
> filter pref 10 u32
> filter pref 10 u32 fh 800: ht divisor 1
> filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule 
> hit 0 success 0)
>   match c0a87aed/ at 16 (success 0 )
>   match 0001/00ff at 8 (success 0 )
>
> action order 1:  skbedit mark 17
>  index 6 ref 1 bind 1
> Action statistics:
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
>
> action order 2: ife encode action pipe
>  index 3 ref 1 bind 1
>  dst MAC: 02:15:15:15:15:15 type: 0xDEAD
>  Metadata: allow mark
> Action statistics:
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> xxx:
>
> test by sending ping from sender to destination
>
> Signed-off-by: Jamal Hadi Salim 
> ---
>  include/net/tc_act/tc_ife.h|  60 +++
>  include/uapi/linux/tc_act/tc_ife.h |  38 ++
>  net/sched/Kconfig  |  12 +
>  net/sched/Makefile |   1 +
>  net/sched/act_ife.c| 865 
> +
>  5 files changed, 976 insertions(+)
>  create mode 100644 include/net/tc_act/tc_ife.h
>  create mode 100644 include/uapi/linux/tc_act/tc_ife.h
>  create mode 100644 net/sched/act_ife.c
>
> diff --git a/include/net/tc_act/tc_ife.h b/include/net/tc_act/tc_ife.h
> new file mode 100644
> index 000..fbbc23c
> --- /dev/null
> +++ b/include/net/tc_act/tc_ife.h
> @@ -0,0 +1,60 @@
> +#ifndef __NET_TC_IFE_H
> +#define __NET_TC_IFE_H
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define IFE_METAHDRLEN 2
> +struct tcf_ife_info {
> +   struct tcf_common common;
> +   u8 eth_dst[ETH_ALEN];
> +   u8 

Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-23 Thread Cong Wang
On Tue, Feb 23, 2016 at 8:12 AM, Daniel Borkmann  wrote:
> On 02/23/2016 03:39 PM, Jamal Hadi Salim wrote:
>>
>> On 16-02-23 08:32 AM, Daniel Borkmann wrote:
>>>
>>> On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:

 From: Jamal Hadi Salim 

 This action allows for a sending side to encapsulate arbitrary metadata
 which is decapsulated by the receiving end.
 The sender runs in encoding mode and the receiver in decode mode.
 Both sender and receiver must specify the same ethertype.
 At some point we hope to have a registered ethertype and we'll
 then provide a default so the user doesnt have to specify it.
 For now we enforce the user specify it.

 Lets show example usage where we encode icmp from a sender towards
 a receiver with an skbmark of 17; both sender and receiver use
 ethertype of 0xdead to interop.
>>>
>>>
>>> On a conceptual level, as this is an L2 encap with TLVs, why not having
>>> a normal device driver for this like we have in other cases that would
>>> encode/decode the meta data itself?
>>
>>
>> netdevs dont scale for large number of policies. See why ipsec which
>> at one point was implemented using a netdev and why xfrm eventually
>> was chosen as the way forward. Or look at the recent lwt
>> effort.
>
>
> Sure, I'm just saying that it could conceptionally be similar to the
> collect metadata idea just on L2 in your case. The encoding/decoding
> and transport of the information is actually not overly tc specific
> at least from the code that's shown so far, just a thought.
>

TC offers more flexibility than netdev, but on the other hand, TC
actions sit too deep in TC, you know, netdev -> qdisc -> filter -> action...


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-23 Thread Daniel Borkmann

On 02/23/2016 03:39 PM, Jamal Hadi Salim wrote:

On 16-02-23 08:32 AM, Daniel Borkmann wrote:

On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.


On a conceptual level, as this is an L2 encap with TLVs, why not having
a normal device driver for this like we have in other cases that would
encode/decode the meta data itself?


netdevs dont scale for large number of policies. See why ipsec which
at one point was implemented using a netdev and why xfrm eventually
was chosen as the way forward. Or look at the recent lwt
effort.


Sure, I'm just saying that it could conceptionally be similar to the
collect metadata idea just on L2 in your case. The encoding/decoding
and transport of the information is actually not overly tc specific
at least from the code that's shown so far, just a thought.


If i was to implement this as a netdev - I would have to either
have actions to redirect to it or plumb it on top of parent
or child devices. The main point is i am extending the tc
graph; it doesnt make sense for me to create a device just
for that when i could implement it as yet another action.
And the most important reason of all: I like to implement it
as an action;->


Why does IFE_META_MAX need to be configurable as a module parameter?

Shouldn't the core kernel be in charge of the IFE_META_*?


I struggled with that earlier.
I cant think of a good way to limit the number of metadata
the kernel allows for decoding without putting an upper bound.
In order to allow people to write kernel modules without worrying
about what is currently is hardcoded in the header file the
only approach i could think of was to allow this number to be
reset.


My question was rather: should the kernel enforce the IDs and only
allow what the kernel dictates (and not in/out of tree modules)? If
yes, then there would be no need for a module parameter (and the
module param should be avoided in any case).


I have some discovery code i took out - will submit later
which looks at these sorts of parameters.


Thanks again,
Daniel


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-23 Thread Jamal Hadi Salim

On 16-02-23 08:32 AM, Daniel Borkmann wrote:

On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.


On a conceptual level, as this is an L2 encap with TLVs, why not having
a normal device driver for this like we have in other cases that would
encode/decode the meta data itself?



netdevs dont scale for large number of policies. See why ipsec which
at one point was implemented using a netdev and why xfrm eventually
was chosen as the way forward. Or look at the recent lwt
effort.
If i was to implement this as a netdev - I would have to either
have actions to redirect to it or plumb it on top of parent
or child devices. The main point is i am extending the tc
graph; it doesnt make sense for me to create a device just
for that when i could implement it as yet another action.
And the most important reason of all: I like to implement it
as an action;->


Why does IFE_META_MAX need to be configurable as a module parameter?

Shouldn't the core kernel be in charge of the IFE_META_*?



I struggled with that earlier.
I cant think of a good way to limit the number of metadata
the kernel allows for decoding without putting an upper bound.
In order to allow people to write kernel modules without worrying
about what is currently is hardcoded in the header file the
only approach i could think of was to allow this number to be
reset.
I have some discovery code i took out - will submit later
which looks at these sorts of parameters.

cheers,
jamal


Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-23 Thread Daniel Borkmann

On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.


On a conceptual level, as this is an L2 encap with TLVs, why not having
a normal device driver for this like we have in other cases that would
encode/decode the meta data itself?

[...]

+/*XXX: We need to encode the total number of bytes consumed */
+enum {
+   TCA_IFE_UNSPEC,
+   TCA_IFE_PARMS,
+   TCA_IFE_TM,
+   TCA_IFE_DMAC,
+   TCA_IFE_SMAC,
+   TCA_IFE_TYPE,
+   TCA_IFE_METALST,
+   __TCA_IFE_MAX
+};
+#define TCA_IFE_MAX (__TCA_IFE_MAX - 1)
+
+#define IFE_META_SKBMARK 1
+#define IFE_META_HASHID 2
+#defineIFE_META_PRIO 3
+#defineIFE_META_QMAP 4
+/*Can be overriden at runtime by module option*/
+#define__IFE_META_MAX 5
+#define IFE_META_MAX (__IFE_META_MAX - 1)
+
+#endif

[...]

+module_init(ife_init_module);
+module_exit(ife_cleanup_module);
+
+module_param(max_metacnt, int, 0);
+MODULE_AUTHOR("Jamal Hadi Salim(2015)");
+MODULE_DESCRIPTION("Inter-FE LFB action");
+MODULE_LICENSE("GPL");
+MODULE_PARM_DESC(max_metacnt, "Upper bound of metadata id");


Why does IFE_META_MAX need to be configurable as a module parameter?

Shouldn't the core kernel be in charge of the IFE_META_*?

Thanks,
Daniel


[net-next PATCH v2 1/5] introduce IFE action

2016-02-23 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent : prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent : prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent : prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent : protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 
0 success 0)
  match / at 0 (success 0 )
action order 1: ife decode action reclassify
 index 1 ref 1 bind 1 installed 14 sec used 14 sec
 type: 0x0
 Metadata: allow mark allow hash allow prio allow qmap
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules

: Lets show the sender side now ..

xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:ethertype 0xdead
xxx:add skb->mark to whitelist of metadatum to send
xxx:rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Lets show the encoding policy
sudo tc -s filter ls dev $ETH parent 1: protocol ip
xxx:
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule 
hit 0 success 0)
  match c0a87aed/ at 16 (success 0 )
  match 0001/00ff at 8 (success 0 )

action order 1:  skbedit mark 17
 index 6 ref 1 bind 1
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

action order 2: ife encode action pipe
 index 3 ref 1 bind 1
 dst MAC: 02:15:15:15:15:15 type: 0xDEAD
 Metadata: allow mark
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
xxx:

test by sending ping from sender to destination

Signed-off-by: Jamal Hadi Salim 
---
 include/net/tc_act/tc_ife.h|  60 +++
 include/uapi/linux/tc_act/tc_ife.h |  38 ++
 net/sched/Kconfig  |  12 +
 net/sched/Makefile |   1 +
 net/sched/act_ife.c| 865 +
 5 files changed, 976 insertions(+)
 create mode 100644 include/net/tc_act/tc_ife.h
 create mode 100644 include/uapi/linux/tc_act/tc_ife.h
 create mode 100644 net/sched/act_ife.c

diff --git a/include/net/tc_act/tc_ife.h b/include/net/tc_act/tc_ife.h
new file mode 100644
index 000..fbbc23c
--- /dev/null
+++ b/include/net/tc_act/tc_ife.h
@@ -0,0 +1,60 @@
+#ifndef __NET_TC_IFE_H
+#define __NET_TC_IFE_H
+
+#include 
+#include 
+#include 
+#include 
+
+#define IFE_METAHDRLEN 2
+struct tcf_ife_info {
+   struct tcf_common common;
+   u8 eth_dst[ETH_ALEN];
+   u8 eth_src[ETH_ALEN];
+   u16 eth_type;
+   u16 flags;
+   /* list of metaids allowed */
+   struct list_head metalist;
+};
+#define to_ife(a) \
+   container_of(a->priv, struct tcf_ife_info, common)
+
+struct tcf_meta_info {
+   const struct tcf_meta_ops *ops;
+   void *metaval;
+   u16 metaid;
+   struct list_head