Re: [PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation

2016-11-02 Thread Thomas Graf
On 2 November 2016 at 07:39, Roopa Prabhu wrote:
>> diff --git a/net/core/Makefile b/net/core/Makefile
>> index d6508c2..a675fd3 100644
>> --- a/net/core/Makefile
>> +++ b/net/core/Makefile
>> @@ -23,7 +23,7 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
>>  obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
>>  obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
>>  obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
>> -obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
>> +obj-$(CONFIG_LWTUNNEL) += lwtunnel.o lwt_bpf.o
>
> Any reason you want to keep lwt bpf under the main CONFIG_LWTUNNEL infra
> config? Since it is defined as yet another pluggable encap function, it
> seems like it would be better under a separate CONFIG_LWTUNNEL_BPF or
> CONFIG_LWT_BPF that depends on CONFIG_LWTUNNEL.

The code was so minimal with no additional dependencies that I didn't
see a need for a separate Kconfig. I'm fine adding that in the next
iteration though. No objections.


Re: [PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation

2016-11-02 Thread Roopa Prabhu
On 10/31/16, 5:37 PM, Thomas Graf wrote:
> Register three new BPF prog types, BPF_PROG_TYPE_LWT_IN,
> BPF_PROG_TYPE_LWT_OUT and BPF_PROG_TYPE_LWT_XMIT, which are invoked
> if a route contains a LWT redirection of type LWTUNNEL_ENCAP_BPF.
>
> The separate program types are required because manipulation of
> packet data is only allowed on the output and transmit path as
> the subsequent dst_input() call path assumes an IP header
> validated by ip_rcv(). The BPF programs will be handed an skb
> with the L3 header attached and may return one of the following
> return codes:
>
>  BPF_OK - Continue routing as per nexthop
>  BPF_DROP - Drop skb and return EPERM
>  BPF_REDIRECT - Redirect skb to device as per redirect() helper.
>                 (Only valid on lwtunnel_xmit() hook)
>
> The return codes are binary compatible with their TC_ACT_
> relatives to ease compatibility.
>
> A new helper bpf_skb_push() is added which allows prepending an
> L2 header in front of the skb, extending the existing L3 header, or
> both. This addresses a wide range of use cases:
>  - Optimize L2 header construction when L2 information is always
>    static to avoid ARP/NDisc lookup.
>  - Extend IP header to add additional IP options.
>  - Perform simple encapsulation where offload is of no concern.
>    (The existing functionality to attach a tunnel key to the skb
>    and redirect to a tunnel net_device to allow for offload
>    continues to work.)
>
> Signed-off-by: Thomas Graf 
> ---
>  
[snip]
> diff --git a/net/Kconfig b/net/Kconfig
> index 7b6cd34..7554f12 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -396,6 +396,7 @@ source "net/nfc/Kconfig"
>  
>  config LWTUNNEL
>   bool "Network light weight tunnels"
> + depends on IPV6 || IPV6=n
>   ---help---
> This feature provides an infrastructure to support light weight
> tunnels like mpls. There is no netdevice associated with a light
> diff --git a/net/core/Makefile b/net/core/Makefile
> index d6508c2..a675fd3 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -23,7 +23,7 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
>  obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
>  obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
>  obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
> -obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
> +obj-$(CONFIG_LWTUNNEL) += lwtunnel.o lwt_bpf.o

Any reason you want to keep lwt bpf under the main CONFIG_LWTUNNEL infra
config? Since it is defined as yet another pluggable encap function, it
seems like it would be better under a separate CONFIG_LWTUNNEL_BPF or
CONFIG_LWT_BPF that depends on CONFIG_LWTUNNEL.




Re: [PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation

2016-11-01 Thread Thomas Graf
On 1 November 2016 at 13:11, David Ahern wrote:
> On 10/31/16 6:37 PM, Thomas Graf wrote:
>>  - Perform simple encapsulation where offload is of no concern.
>>    (The existing functionality to attach a tunnel key to the skb
>>    and redirect to a tunnel net_device to allow for offload
>>    continues to work.)
>
> Have you tested adding headers with large packets that hit GSO and
> fragmentation? I am wondering whether the MTU is used properly when
> headers are added and lwt->headroom is not accounted for. See commit
> 14972cbd34ff668c390cbd2e6497323484c9e812.

Thanks for the pointer, David. I will look into this.

When adding L2 headers, the packet size is currently limited to:

    skb->dev->mtu + skb->dev->hard_header_len
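
As a rough illustration of the limit described above, the following
userspace C sketch checks whether a header of a given size can still be
pushed. The struct mock_net_device type and the push_fits() function are
hypothetical names used only for this example. They are not taken from
the patch, and the in-kernel check may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two net_device fields used by the limit. */
struct mock_net_device {
    unsigned int mtu;               /* largest L3 packet the device carries */
    unsigned short hard_header_len; /* size of the link-layer header */
};

/* Returns true if a packet of cur_len bytes may grow by push_len bytes
 * without exceeding mtu + hard_header_len.
 */
static bool push_fits(const struct mock_net_device *dev,
                      size_t cur_len, size_t push_len)
{
    size_t limit;

    assert(dev != NULL);
    limit = (size_t)dev->mtu + dev->hard_header_len;

    /* Written as a subtraction so cur_len + push_len cannot overflow. */
    return cur_len <= limit && push_len <= limit - cur_len;
}
```

With mtu = 1500 and hard_header_len = 14, a 1500 byte packet can take a
14 byte Ethernet header but not a 15 byte one.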


Re: [PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation

2016-11-01 Thread David Ahern
On 10/31/16 6:37 PM, Thomas Graf wrote:
> Register three new BPF prog types, BPF_PROG_TYPE_LWT_IN,
> BPF_PROG_TYPE_LWT_OUT and BPF_PROG_TYPE_LWT_XMIT, which are invoked
> if a route contains a LWT redirection of type LWTUNNEL_ENCAP_BPF.
> 
> The separate program types are required because manipulation of
> packet data is only allowed on the output and transmit path as
> the subsequent dst_input() call path assumes an IP header
> validated by ip_rcv(). The BPF programs will be handed an skb
> with the L3 header attached and may return one of the following
> return codes:
> 
>  BPF_OK - Continue routing as per nexthop
>  BPF_DROP - Drop skb and return EPERM
>  BPF_REDIRECT - Redirect skb to device as per redirect() helper.
>                 (Only valid on lwtunnel_xmit() hook)
> 
> The return codes are binary compatible with their TC_ACT_
> relatives to ease compatibility.
> 
> A new helper bpf_skb_push() is added which allows prepending an
> L2 header in front of the skb, extending the existing L3 header, or
> both. This addresses a wide range of use cases:
>  - Optimize L2 header construction when L2 information is always
>    static to avoid ARP/NDisc lookup.
>  - Extend IP header to add additional IP options.
>  - Perform simple encapsulation where offload is of no concern.
>    (The existing functionality to attach a tunnel key to the skb
>    and redirect to a tunnel net_device to allow for offload
>    continues to work.)

Have you tested adding headers with large packets that hit GSO and
fragmentation? I am wondering whether the MTU is used properly when
headers are added and lwt->headroom is not accounted for. See commit
14972cbd34ff668c390cbd2e6497323484c9e812.
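
To make the concern concrete: one possible way to account for
encapsulation headroom is to subtract it from the device MTU before
deciding whether a packet needs fragmentation. The sketch below is a
userspace model under that assumption. struct mock_lwt_state,
effective_mtu() and needs_fragmentation() are hypothetical names, and no
claim is made that the referenced commit or this patch works exactly
this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-route tunnel state. Only the headroom
 * that an encapsulation may add in front of the packet is modelled.
 */
struct mock_lwt_state {
    unsigned int headroom;
};

/* MTU left for the original packet once the encap headroom is reserved.
 * Falls back to the device MTU if there is no tunnel state or if the
 * headroom is not smaller than the MTU (an assumption of this sketch).
 */
static unsigned int effective_mtu(unsigned int dev_mtu,
                                  const struct mock_lwt_state *lwt)
{
    assert(dev_mtu > 0);
    if (lwt != NULL && lwt->headroom < dev_mtu)
        return dev_mtu - lwt->headroom;
    return dev_mtu;
}

/* A packet needs fragmentation if it exceeds the reduced MTU. */
static bool needs_fragmentation(unsigned int pkt_len, unsigned int dev_mtu,
                                const struct mock_lwt_state *lwt)
{
    return pkt_len > effective_mtu(dev_mtu, lwt);
}
```

In this model a 1500 byte packet on a 1500 byte MTU device fits without
tunnel state, but no longer fits once 20 bytes of headroom are reserved.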


Re: [PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation

2016-10-31 Thread kbuild test robot
Hi Thomas,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Thomas-Graf/bpf-BPF-for-lightweight-tunnel-encapsulation/20161101-084038
config: x86_64-randconfig-s0-11010954 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

   net/core/lwt_bpf.c: In function 'bpf_lwt_lookup6':
>> net/core/lwt_bpf.c:132: warning: initialized field with side-effects overwritten
   net/core/lwt_bpf.c:132: warning: (near initialization for 'fl6')

vim +132 net/core/lwt_bpf.c

   116          }
   117  
   118          return dst->lwtstate->orig_input(skb);
   119  }
   120  
   121  #if IS_ENABLED(CONFIG_IPV6)
   122  static struct dst_entry *bpf_lwt_lookup6(struct net *net, struct sk_buff *skb,
   123                                           struct bpf_lwt *bpf)
   124  {
   125          struct ipv6hdr *ip6h = ipv6_hdr(skb);
   126          struct dst_entry *dst;
   127          struct flowi6 fl6 = {
   128                  .daddr = ip6h->daddr,
   129                  .saddr = ip6h->saddr,
   130                  .flowlabel = ip6_flowinfo(ip6h),
   131                  .flowi6_mark = skb->mark,
 > 132                  .flowi6_proto = ip6h->nexthdr,
   133                  .flowi6_oif = skb->sk ? skb->sk->sk_bound_dev_if : 0,
   134          };
   135  
   136          dst = ip6_route_output(net, skb->sk, &fl6);
   137          if (unlikely(dst->error)) {
   138                  int err = dst->error;
   139                  dst_release(dst);
   140                  return ERR_PTR(err);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation

2016-10-31 Thread kbuild test robot
Hi Thomas,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Thomas-Graf/bpf-BPF-for-lightweight-tunnel-encapsulation/20161101-084038
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k

All errors (new ones prefixed by >>):

   net/built-in.o: In function `bpf_output':
>> lwt_bpf.c:(.text+0x2cff4): undefined reference to `ip6_route_output_flags'



[PATCH net-next v2 3/5] bpf: BPF for lightweight tunnel encapsulation

2016-10-31 Thread Thomas Graf
Register three new BPF prog types, BPF_PROG_TYPE_LWT_IN,
BPF_PROG_TYPE_LWT_OUT and BPF_PROG_TYPE_LWT_XMIT, which are invoked
if a route contains a LWT redirection of type LWTUNNEL_ENCAP_BPF.

The separate program types are required because manipulation of
packet data is only allowed on the output and transmit path as
the subsequent dst_input() call path assumes an IP header
validated by ip_rcv(). The BPF programs will be handed an skb
with the L3 header attached and may return one of the following
return codes:

 BPF_OK - Continue routing as per nexthop
 BPF_DROP - Drop skb and return EPERM
 BPF_REDIRECT - Redirect skb to device as per redirect() helper.
                (Only valid on lwtunnel_xmit() hook)

The return codes are binary compatible with their TC_ACT_
relatives to ease compatibility.
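
The return code handling described above can be modelled in a few lines
of userspace C. This is only a sketch: the BPF_* values are copied from
the uapi hunk of this patch and the TC_ACT_* values are the ones defined
in linux/pkt_cls.h, but enum lwt_hook, lwt_verdict(), its return
convention, and the use of -EINVAL for a redirect on a non-xmit hook are
assumptions made for illustration and are not taken from lwt_bpf.c.

```c
#include <assert.h>
#include <errno.h>

/* Values copied from enum bpf_ret_code in this patch. */
enum bpf_ret_code {
    BPF_OK = 0,
    BPF_DROP = 2,
    BPF_REDIRECT = 7,
};

/* TC_ACT_* values as defined in linux/pkt_cls.h, repeated here so the
 * example stays self-contained.
 */
enum {
    TC_ACT_OK = 0,
    TC_ACT_SHOT = 2,
    TC_ACT_REDIRECT = 7,
};

/* Binary compatibility with the TC_ACT_* relatives, checked at build time. */
static_assert(BPF_OK == TC_ACT_OK, "BPF_OK must match TC_ACT_OK");
static_assert(BPF_DROP == TC_ACT_SHOT, "BPF_DROP must match TC_ACT_SHOT");
static_assert(BPF_REDIRECT == TC_ACT_REDIRECT,
              "BPF_REDIRECT must match TC_ACT_REDIRECT");

/* Hypothetical hook identifiers, for illustration only. */
enum lwt_hook { LWT_HOOK_IN, LWT_HOOK_OUT, LWT_HOOK_XMIT };

/* Hypothetical verdict mapping:
 *   0        continue routing as per nexthop
 *   1        skb was redirected (xmit hook only)
 *   -EPERM   skb dropped by the program
 *   -EINVAL  return code not valid on this hook (assumption)
 */
static int lwt_verdict(enum lwt_hook hook, int ret)
{
    switch (ret) {
    case BPF_OK:
        return 0;
    case BPF_DROP:
        return -EPERM;
    case BPF_REDIRECT:
        return hook == LWT_HOOK_XMIT ? 1 : -EINVAL;
    default:
        return -EINVAL;
    }
}
```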

A new helper bpf_skb_push() is added which allows prepending an
L2 header in front of the skb, extending the existing L3 header, or
both. This addresses a wide range of use cases:
 - Optimize L2 header construction when L2 information is always
   static to avoid ARP/NDisc lookup.
 - Extend IP header to add additional IP options.
 - Perform simple encapsulation where offload is of no concern.
   (The existing functionality to attach a tunnel key to the skb
   and redirect to a tunnel net_device to allow for offload
   continues to work.)
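
To illustrate what pushing a header in front of the packet involves, and
why the helper documentation warns that the data pointer may change, here
is a simplified userspace model. struct mock_skb, mock_skb_init() and
mock_skb_push() are hypothetical and much simpler than the kernel's
sk_buff handling. The sketch only shows the headroom check, the
reallocation when headroom is short, and the resulting pointer move.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified packet buffer; not the kernel's sk_buff. */
struct mock_skb {
    unsigned char *head; /* start of the allocation */
    unsigned char *data; /* start of packet data; data - head is headroom */
    size_t len;          /* bytes of packet data */
};

/* Create a buffer with the given headroom and payload. Returns 0 or -1. */
static int mock_skb_init(struct mock_skb *skb, size_t headroom,
                         const void *payload, size_t len)
{
    size_t size = headroom + len;

    assert(skb != NULL);
    skb->head = malloc(size ? size : 1);
    if (!skb->head)
        return -1;
    skb->data = skb->head + headroom;
    skb->len = len;
    if (len)
        memcpy(skb->data, payload, len);
    return 0;
}

/* Make room for hdr_len bytes in front of the packet data. If the
 * headroom is too small, the buffer is reallocated, so any pointer into
 * the old buffer must be considered invalid afterwards. Returns 0 or -1.
 */
static int mock_skb_push(struct mock_skb *skb, size_t hdr_len)
{
    size_t headroom = (size_t)(skb->data - skb->head);

    if (headroom < hdr_len) {
        /* hdr_len is at least 1 here, so the size is never zero. */
        unsigned char *nhead = malloc(hdr_len + skb->len);

        if (!nhead)
            return -1;
        memcpy(nhead + hdr_len, skb->data, skb->len);
        free(skb->head);
        skb->head = nhead;
        skb->data = nhead + hdr_len;
    }
    skb->data -= hdr_len; /* new header space starts here */
    skb->len += hdr_len;
    return 0;
}
```

After a successful mock_skb_push(), the caller writes the new header at
skb->data and finds the previous packet contents hdr_len bytes further in.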

Signed-off-by: Thomas Graf 
---
 include/linux/filter.h        |   2 +-
 include/uapi/linux/bpf.h      |  37 +++-
 include/uapi/linux/lwtunnel.h |  21 ++
 kernel/bpf/verifier.c         |  16 +-
 net/Kconfig                   |   1 +
 net/core/Makefile             |   2 +-
 net/core/filter.c             | 148 -
 net/core/lwt_bpf.c            | 504 ++
 net/core/lwtunnel.c           |   1 +
 9 files changed, 725 insertions(+), 7 deletions(-)
 create mode 100644 net/core/lwt_bpf.c

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c52..aad7f81 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -438,7 +438,7 @@ struct xdp_buff {
 };
 
 /* compute the linear packet data range [data, data_end) which
- * will be accessed by cls_bpf and act_bpf programs
+ * will be accessed by cls_bpf, act_bpf and lwt programs
  */
 static inline void bpf_compute_data_end(struct sk_buff *skb)
 {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e2f38e0..c034a2d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -96,6 +96,9 @@ enum bpf_prog_type {
    BPF_PROG_TYPE_TRACEPOINT,
    BPF_PROG_TYPE_XDP,
    BPF_PROG_TYPE_PERF_EVENT,
+   BPF_PROG_TYPE_LWT_IN,
+   BPF_PROG_TYPE_LWT_OUT,
+   BPF_PROG_TYPE_LWT_XMIT,
 };
 
 #define BPF_PSEUDO_MAP_FD  1
@@ -383,6 +386,16 @@ union bpf_attr {
  *
  * int bpf_get_numa_node_id()
  * Return: Id of current NUMA node.
+ *
+ * int bpf_skb_push()
+ * Adds room to the beginning of the skb and adjusts the MAC header offset
+ * accordingly. Extends/reallocates the needed skb headroom automatically.
+ * May change skb data pointer and will thus invalidate any check done
+ * for direct packet access.
+ * @skb: pointer to skb
+ * @len: length of header to be pushed in front
+ * @flags: Flags (unused for now)
+ * Return: 0 on success or negative error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
    FN(unspec), \
@@ -427,7 +440,8 @@ union bpf_attr {
    FN(skb_pull_data),  \
    FN(csum_update),    \
    FN(set_hash_invalid),   \
-   FN(get_numa_node_id),
+   FN(get_numa_node_id),   \
+   FN(skb_push),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -511,6 +525,27 @@ struct bpf_tunnel_key {
    __u32 tunnel_label;
 };
 
+/* Generic BPF return codes which all BPF program types may support.
+ * The values are binary compatible with their TC_ACT_* counterparts to
+ * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
+ * programs.
+ *
+ * XDP is handled separately, see XDP_*.
+ */
+enum bpf_ret_code {
+   BPF_OK = 0,
+   /* 1 reserved */
+   BPF_DROP = 2,
+   /* 3-6 reserved */
+   BPF_REDIRECT = 7,
+   /* >127 are reserved for prog type specific return codes */
+};
+
+/* LWT specific return codes */
+enum bpf_lwt_ret_code {
+   BPF_LWT_REROUTE = 128,
+};
+
 /* User return codes for XDP prog type.
  * A valid XDP program must return one of these defined values. All other
  * return codes are reserved for future use. Unknown return codes will result
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index a478fe8..9354d997 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -9,6 +9,7 @@ enum lwtunnel_encap_types {
    LWTUNNEL_ENCAP_IP,
    LWTUNNEL_ENCAP_ILA,
    LWTUNNEL_ENCAP_IP6,
+   LWTUNNEL_ENCAP_BPF,
__LWTUNNEL_ENCAP_MAX,