Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring

2018-08-01 Thread Jesper Dangaard Brouer
On Wed, 1 Aug 2018 14:41:08 +0900
Toshiaki Makita  wrote:

> On 2018/07/31 21:46, Jesper Dangaard Brouer wrote:
> > On Tue, 31 Jul 2018 19:40:08 +0900
> > Toshiaki Makita  wrote:
> >   
> >> On 2018/07/31 19:26, Jesper Dangaard Brouer wrote:  
> >>>
> >>> Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP
> >>>
> >>> On Mon, 30 Jul 2018 19:43:44 +0900
> >>> Toshiaki Makita  wrote:
> >>> 
[...]
> >>>
> >>> Here you are adding an assumption that struct xdp_frame is always
> >>> located at the top of the packet-data area.  I tried hard not to add
> >>> such a dependency!  You can calculate the beginning of the frame from
> >>> the xdp_frame->data pointer.
> >>>
> >>> Why not add such a dependency?  Because for AF_XDP zero-copy, we cannot
> >>> make such an assumption.  
> >>>
> >>> Currently, when an RX-queue is in AF_XDP-ZC mode (MEM_TYPE_ZERO_COPY)
> >>> the packet will get dropped when calling convert_to_xdp_frame(), but as
> >>> the TODO comment in convert_to_xdp_frame() indicates, this is not the
> >>> end-goal.
> >>>
> >>> The comment in convert_to_xdp_frame() indicates we need a full
> >>> alloc+copy, but that is actually not necessary if we can just use
> >>> another memory area for struct xdp_frame, plus a pointer to data.  That
> >>> would allow devmap-redir to work with ZC and cpumap-redir to do the copy
> >>> on the remote CPU.
> >>
> >> Thanks for pointing this out.
> >> Seems you are saying the xdp_frame area is not reusable. That means we
> >> reduce the usable headroom on every REDIRECT. I wanted to avoid this,
> >> but it is actually impossible to avoid, right?
> > 
> > I'm not sure I understand fully...  does this have something to do with
> > the below memset?
> 
> Sorry for not being so clear...
> It has something to do with the memset as well, but mainly I was talking
> about XDP_TX and REDIRECT, introduced in patch 8. On REDIRECT,
> dev_map_enqueue() calls convert_to_xdp_frame(), so we use the headroom
> for struct xdp_frame on REDIRECT. If we don't reuse the xdp_frame region
> of the original xdp packet, we reduce the headroom size on each REDIRECT
> (by sizeof(struct xdp_frame), i.e. 32 bytes per hop). When ZC is used,
> in the future xdp_frame can be non-contiguous with the buffer, so we
> cannot reuse the xdp_frame region in convert_to_xdp_frame(), right? But
> the current convert_to_xdp_frame() implementation requires the xdp_frame
> region to be in the headroom, so I think I cannot avoid this dependency
> for now.
> 
> An SKB has a similar problem if we cannot reuse the region. It can be
> passed to a bridge and redirected to another veth which has driver XDP.
> In that case we need to reallocate the page if we have reduced the
> headroom, because sufficient headroom is currently required for XDP
> processing (can we remove this requirement, actually?).

Okay, now I understand.  Your changes allow multiple levels of
XDP_REDIRECT between/into other veth net_devices.  This is very
interesting and exciting stuff, but also a bit scary when thinking
about whether we got the life-time handling correct for the different
memory objects.

You have convinced me.  We should not sacrifice/reduce the headroom
this way.  I'll also fix up cpumap.

To avoid the performance penalty of the memset, I propose that we just
clear the xdp_frame->data pointer.  But let's implement it via a common
sanitize/scrub function.
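
Roughly something like this (just a sketch; the helper name and the
exact fields that need scrubbing are still to be decided):

	static inline void xdp_scrub_frame(struct xdp_frame *frame)
	{
		/* Only the kernel pointer stored in 'data' is really
		 * sensitive; clearing it is much cheaper than a memset
		 * of the whole struct.
		 */
		frame->data = NULL;
	}

Then veth and cpumap can both call the same helper instead of doing a
full memset.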


> > When cpumap generates an SKB for the netstack, we sacrifice/reduce
> > the available SKB headroom, because convert_to_xdp_frame() reduces the
> > headroom by the xdp_frame size:
> > 
> >  xdp_frame->headroom = headroom - sizeof(*xdp_frame)
> > 
> > Regarding avoiding such a memset of this area: we are actually only
> > worried about exposing the 'data' pointer, thus we could just clear
> > that.  (See commit 6dfb970d3dbd; this is because Alexei is planning to
> > move from CAP_SYS_ADMIN to the lesser privileged mode CAP_NET_ADMIN.)
> > 
> > See commits:
> >  97e19cce05e5 ("bpf: reserve xdp_frame size in xdp headroom")
> >  6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")
> 
> We have talked about that...
> https://patchwork.ozlabs.org/patch/903536/
> 
> The memset was introduced as per your feedback, but I'm still not sure
> if we need it. In general the headroom is not cleared after allocation
> in drivers, so unprivileged users should not be able to see it anyway,
> no matter whether it contains an xdp_frame or not...

I actually got this request from Alexei; that is why I implemented it.
Personally I don't think this clearing is really needed, until someone
actually moves the TC/cls_act BPF hook to CAP_NET_ADMIN.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring

2018-07-31 Thread Toshiaki Makita
On 2018/07/31 21:46, Jesper Dangaard Brouer wrote:
> On Tue, 31 Jul 2018 19:40:08 +0900
> Toshiaki Makita  wrote:
> 
>> On 2018/07/31 19:26, Jesper Dangaard Brouer wrote:
>>>
>>> Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP
>>>
>>> On Mon, 30 Jul 2018 19:43:44 +0900
>>> Toshiaki Makita  wrote:
>>>   
>>>> +static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
>>>> +				      int buflen)
>>>> +{
>>>> +	struct sk_buff *skb;
>>>> +
>>>> +	if (!buflen) {
>>>> +		buflen = SKB_DATA_ALIGN(headroom + len) +
>>>> +			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>>> +	}
>>>> +	skb = build_skb(head, buflen);
>>>> +	if (!skb)
>>>> +		return NULL;
>>>> +
>>>> +	skb_reserve(skb, headroom);
>>>> +	skb_put(skb, len);
>>>> +
>>>> +	return skb;
>>>> +}
>>>
>>>
>>> On Mon, 30 Jul 2018 19:43:46 +0900
>>> Toshiaki Makita  wrote:
>>>   
>>>> +static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
>>>> +					struct xdp_frame *frame)
>>>> +{
>>>> +	int len = frame->len, delta = 0;
>>>> +	struct bpf_prog *xdp_prog;
>>>> +	unsigned int headroom;
>>>> +	struct sk_buff *skb;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	xdp_prog = rcu_dereference(priv->xdp_prog);
>>>> +	if (likely(xdp_prog)) {
>>>> +		struct xdp_buff xdp;
>>>> +		u32 act;
>>>> +
>>>> +		xdp.data_hard_start = frame->data - frame->headroom;
>>>> +		xdp.data = frame->data;
>>>> +		xdp.data_end = frame->data + frame->len;
>>>> +		xdp.data_meta = frame->data - frame->metasize;
>>>> +		xdp.rxq = &priv->xdp_rxq;
>>>> +
>>>> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>> +
>>>> +		switch (act) {
>>>> +		case XDP_PASS:
>>>> +			delta = frame->data - xdp.data;
>>>> +			len = xdp.data_end - xdp.data;
>>>> +			break;
>>>> +		default:
>>>> +			bpf_warn_invalid_xdp_action(act);
>>>> +		case XDP_ABORTED:
>>>> +			trace_xdp_exception(priv->dev, xdp_prog, act);
>>>> +		case XDP_DROP:
>>>> +			goto err_xdp;
>>>> +		}
>>>> +	}
>>>> +	rcu_read_unlock();
>>>> +
>>>> +	headroom = frame->data - delta - (void *)frame;
>>>> +	skb = veth_build_skb(frame, headroom, len, 0);
>>>
>>> Here you are adding an assumption that struct xdp_frame is always
>>> located at the top of the packet-data area.  I tried hard not to add
>>> such a dependency!  You can calculate the beginning of the frame from
>>> the xdp_frame->data pointer.
>>>
>>> Why not add such a dependency?  Because for AF_XDP zero-copy, we cannot
>>> make such an assumption.  
>>>
>>> Currently, when an RX-queue is in AF_XDP-ZC mode (MEM_TYPE_ZERO_COPY)
>>> the packet will get dropped when calling convert_to_xdp_frame(), but as
>>> the TODO comment in convert_to_xdp_frame() indicates, this is not the
>>> end-goal.
>>>
>>> The comment in convert_to_xdp_frame() indicates we need a full
>>> alloc+copy, but that is actually not necessary if we can just use
>>> another memory area for struct xdp_frame, plus a pointer to data.  That
>>> would allow devmap-redir to work with ZC and cpumap-redir to do the copy
>>> on the remote CPU.  
>>
>> Thanks for pointing this out.
>> Seems you are saying the xdp_frame area is not reusable. That means we
>> reduce the usable headroom on every REDIRECT. I wanted to avoid this,
>> but it is actually impossible to avoid, right?
> 
> I'm not sure I understand fully...  does this have something to do with
> the below memset?

Sorry for not being so clear...
It has something to do with the memset as well, but mainly I was talking
about XDP_TX and REDIRECT, introduced in patch 8. On REDIRECT,
dev_map_enqueue() calls convert_to_xdp_frame(), so we use the headroom
for struct xdp_frame on REDIRECT. If we don't reuse the xdp_frame region
of the original xdp packet, we reduce the headroom size on each REDIRECT
(by sizeof(struct xdp_frame), i.e. 32 bytes per hop). When ZC is used,
in the future xdp_frame can be non-contiguous with the buffer, so we
cannot reuse the xdp_frame region in convert_to_xdp_frame(), right? But
the current convert_to_xdp_frame() implementation requires the xdp_frame
region to be in the headroom, so I think I cannot avoid this dependency
for now.

An SKB has a similar problem if we cannot reuse the region. It can be
passed to a bridge and redirected to another veth which has driver XDP.
In that case we need to reallocate the page if we have reduced the
headroom, because sufficient headroom is currently required for XDP
processing (can we remove this requirement, actually?).
Instead, I think I need to drop the packet (or reallocate the buffer and
copy the data) for ZC, according to this patch:
https://patchwork.codeaurora.org/patch/540887/
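
For reference, the conversion path currently does roughly this (a
paraphrase of convert_to_xdp_frame() in include/net/xdp.h, not a
verbatim copy):

	/* TODO: implement clone, copy, use "native" MEM_TYPE */
	if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
		return NULL;	/* caller has to drop the ZC packet */

So until a clone/copy path exists, ZC frames cannot be converted at all
and have to be dropped.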

> When cpumap generates an SKB for the netstack, we sacrifice/reduce
> the available SKB headroom, because convert_to_xdp_frame() reduces the
> headroom by the xdp_frame size:
> 
>  xdp_frame->headroom = headroom - sizeof(*xdp_frame)
> 
> Regarding avoiding such a memset of this area: we are actually only
> worried about exposing the 'data' pointer, thus we could just clear that.

Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring

2018-07-31 Thread Jesper Dangaard Brouer
On Tue, 31 Jul 2018 19:40:08 +0900
Toshiaki Makita  wrote:

> On 2018/07/31 19:26, Jesper Dangaard Brouer wrote:
> > 
> > Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP
> > 
> > On Mon, 30 Jul 2018 19:43:44 +0900
> > Toshiaki Makita  wrote:
> >   
> >> +static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
> >> +int buflen)
> >> +{
> >> +  struct sk_buff *skb;
> >> +
> >> +  if (!buflen) {
> >> +  buflen = SKB_DATA_ALIGN(headroom + len) +
> >> +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >> +  }
> >> +  skb = build_skb(head, buflen);
> >> +  if (!skb)
> >> +  return NULL;
> >> +
> >> +  skb_reserve(skb, headroom);
> >> +  skb_put(skb, len);
> >> +
> >> +  return skb;
> >> +}  
> > 
> > 
> > On Mon, 30 Jul 2018 19:43:46 +0900
> > Toshiaki Makita  wrote:
> >   
> >> +static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
> >> +  struct xdp_frame *frame)
> >> +{
> >> +  int len = frame->len, delta = 0;
> >> +  struct bpf_prog *xdp_prog;
> >> +  unsigned int headroom;
> >> +  struct sk_buff *skb;
> >> +
> >> +  rcu_read_lock();
> >> +  xdp_prog = rcu_dereference(priv->xdp_prog);
> >> +  if (likely(xdp_prog)) {
> >> +  struct xdp_buff xdp;
> >> +  u32 act;
> >> +
> >> +  xdp.data_hard_start = frame->data - frame->headroom;
> >> +  xdp.data = frame->data;
> >> +  xdp.data_end = frame->data + frame->len;
> >> +  xdp.data_meta = frame->data - frame->metasize;
> >> +  xdp.rxq = &priv->xdp_rxq;
> >> +
> >> +  act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >> +
> >> +  switch (act) {
> >> +  case XDP_PASS:
> >> +  delta = frame->data - xdp.data;
> >> +  len = xdp.data_end - xdp.data;
> >> +  break;
> >> +  default:
> >> +  bpf_warn_invalid_xdp_action(act);
> >> +  case XDP_ABORTED:
> >> +  trace_xdp_exception(priv->dev, xdp_prog, act);
> >> +  case XDP_DROP:
> >> +  goto err_xdp;
> >> +  }
> >> +  }
> >> +  rcu_read_unlock();
> >> +
> >> +  headroom = frame->data - delta - (void *)frame;
> >> +  skb = veth_build_skb(frame, headroom, len, 0);  
> > 
> > Here you are adding an assumption that struct xdp_frame is always
> > located at the top of the packet-data area.  I tried hard not to add
> > such a dependency!  You can calculate the beginning of the frame from
> > the xdp_frame->data pointer.
> > 
> > Why not add such a dependency?  Because for AF_XDP zero-copy, we cannot
> > make such an assumption.  
> > 
> > Currently, when an RX-queue is in AF_XDP-ZC mode (MEM_TYPE_ZERO_COPY)
> > the packet will get dropped when calling convert_to_xdp_frame(), but as
> > the TODO comment in convert_to_xdp_frame() indicates, this is not the
> > end-goal.
> > 
> > The comment in convert_to_xdp_frame() indicates we need a full
> > alloc+copy, but that is actually not necessary if we can just use
> > another memory area for struct xdp_frame, plus a pointer to data.  That
> > would allow devmap-redir to work with ZC and cpumap-redir to do the copy
> > on the remote CPU.  
> 
> Thanks for pointing this out.
> Seems you are saying the xdp_frame area is not reusable. That means we
> reduce the usable headroom on every REDIRECT. I wanted to avoid this,
> but it is actually impossible to avoid, right?

I'm not sure I understand fully...  does this have something to do with
the below memset?

When cpumap generates an SKB for the netstack, we sacrifice/reduce
the available SKB headroom, because convert_to_xdp_frame() reduces the
headroom by the xdp_frame size:

 xdp_frame->headroom = headroom - sizeof(*xdp_frame)

Regarding avoiding such a memset of this area: we are actually only
worried about exposing the 'data' pointer, thus we could just clear
that.  (See commit 6dfb970d3dbd; this is because Alexei is planning to
move from CAP_SYS_ADMIN to the lesser privileged mode CAP_NET_ADMIN.)

See commits:
 97e19cce05e5 ("bpf: reserve xdp_frame size in xdp headroom")
 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")
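
For reference, what 97e19cce05e5 does inside convert_to_xdp_frame() is
approximately this (paraphrased, so treat the exact lines as a sketch):

	headroom = xdp->data - xdp->data_hard_start;
	metasize = xdp->data - xdp->data_meta;
	if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
		return NULL;

	/* Store info in top of packet */
	xdp_frame = xdp->data_hard_start;
	xdp_frame->headroom = headroom - sizeof(*xdp_frame);

I.e. the struct is carved out of the top of the buffer, and the
advertised headroom shrinks by sizeof(*xdp_frame).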


> >> +  if (!skb) {
> >> +  xdp_return_frame(frame);
> >> +  goto err;
> >> +  }
> >> +
> >> +  memset(frame, 0, sizeof(*frame));

This memset can become a performance issue later if we change the size
of xdp_frame (e.g. I'm considering extending it with the DMA addr, but
I'm not sure about that scheme yet).

Currently sizeof(xdp_frame) == 32 bytes, and a memset of 32 bytes is
fast, for compiler reasons.  Above 32 bytes it gets more expensive,
because the compiler translates it into a "rep stos" operation, which
is slower, as it needs to save some registers (to allow it to be
interrupted). See [1] for experiments.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_memset.c
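
To illustrate the cutoff (assuming x86-64 and a typical gcc; the exact
threshold is compiler and arch dependent, so take this as a sketch):

	memset(frame, 0, 32);	/* inlined as a few plain stores */
	memset(frame, 0, 256);	/* may become "rep stos", which has a
				 * fixed setup cost before storing */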

> >> +  skb->protocol = eth_type_trans(skb, priv->dev);

Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring

2018-07-31 Thread Toshiaki Makita
On 2018/07/31 19:26, Jesper Dangaard Brouer wrote:
> 
> Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP
> 
> On Mon, 30 Jul 2018 19:43:44 +0900
> Toshiaki Makita  wrote:
> 
>> +static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
>> +  int buflen)
>> +{
>> +struct sk_buff *skb;
>> +
>> +if (!buflen) {
>> +buflen = SKB_DATA_ALIGN(headroom + len) +
>> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +}
>> +skb = build_skb(head, buflen);
>> +if (!skb)
>> +return NULL;
>> +
>> +skb_reserve(skb, headroom);
>> +skb_put(skb, len);
>> +
>> +return skb;
>> +}
> 
> 
> On Mon, 30 Jul 2018 19:43:46 +0900
> Toshiaki Makita  wrote:
> 
>> +static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
>> +struct xdp_frame *frame)
>> +{
>> +int len = frame->len, delta = 0;
>> +struct bpf_prog *xdp_prog;
>> +unsigned int headroom;
>> +struct sk_buff *skb;
>> +
>> +rcu_read_lock();
>> +xdp_prog = rcu_dereference(priv->xdp_prog);
>> +if (likely(xdp_prog)) {
>> +struct xdp_buff xdp;
>> +u32 act;
>> +
>> +xdp.data_hard_start = frame->data - frame->headroom;
>> +xdp.data = frame->data;
>> +xdp.data_end = frame->data + frame->len;
>> +xdp.data_meta = frame->data - frame->metasize;
>> +xdp.rxq = &priv->xdp_rxq;
>> +
>> +act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> +
>> +switch (act) {
>> +case XDP_PASS:
>> +delta = frame->data - xdp.data;
>> +len = xdp.data_end - xdp.data;
>> +break;
>> +default:
>> +bpf_warn_invalid_xdp_action(act);
>> +case XDP_ABORTED:
>> +trace_xdp_exception(priv->dev, xdp_prog, act);
>> +case XDP_DROP:
>> +goto err_xdp;
>> +}
>> +}
>> +rcu_read_unlock();
>> +
>> +headroom = frame->data - delta - (void *)frame;
>> +skb = veth_build_skb(frame, headroom, len, 0);
> 
> Here you are adding an assumption that struct xdp_frame is always
> located at the top of the packet-data area.  I tried hard not to add
> such a dependency!  You can calculate the beginning of the frame from
> the xdp_frame->data pointer.
> 
> Why not add such a dependency?  Because for AF_XDP zero-copy, we cannot
> make such an assumption.  
> 
> Currently, when an RX-queue is in AF_XDP-ZC mode (MEM_TYPE_ZERO_COPY)
> the packet will get dropped when calling convert_to_xdp_frame(), but as
> the TODO comment in convert_to_xdp_frame() indicates, this is not the
> end-goal.
> 
> The comment in convert_to_xdp_frame() indicates we need a full
> alloc+copy, but that is actually not necessary if we can just use
> another memory area for struct xdp_frame, plus a pointer to data.  That
> would allow devmap-redir to work with ZC and cpumap-redir to do the copy
> on the remote CPU.

Thanks for pointing this out.
Seems you are saying the xdp_frame area is not reusable. That means we
reduce the usable headroom on every REDIRECT. I wanted to avoid this,
but it is actually impossible to avoid, right?

>> +if (!skb) {
>> +xdp_return_frame(frame);
>> +goto err;
>> +}
>> +
>> +memset(frame, 0, sizeof(*frame));
>> +skb->protocol = eth_type_trans(skb, priv->dev);
>> +err:
>> +return skb;
>> +err_xdp:
>> +rcu_read_unlock();
>> +xdp_return_frame(frame);
>> +
>> +return NULL;
>> +}
> 
> 

-- 
Toshiaki Makita



Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring

2018-07-31 Thread Jesper Dangaard Brouer


Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP

On Mon, 30 Jul 2018 19:43:44 +0900
Toshiaki Makita  wrote:

> +static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
> +   int buflen)
> +{
> + struct sk_buff *skb;
> +
> + if (!buflen) {
> + buflen = SKB_DATA_ALIGN(headroom + len) +
> +  SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> + }
> + skb = build_skb(head, buflen);
> + if (!skb)
> + return NULL;
> +
> + skb_reserve(skb, headroom);
> + skb_put(skb, len);
> +
> + return skb;
> +}


On Mon, 30 Jul 2018 19:43:46 +0900
Toshiaki Makita  wrote:

> +static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
> + struct xdp_frame *frame)
> +{
> + int len = frame->len, delta = 0;
> + struct bpf_prog *xdp_prog;
> + unsigned int headroom;
> + struct sk_buff *skb;
> +
> + rcu_read_lock();
> + xdp_prog = rcu_dereference(priv->xdp_prog);
> + if (likely(xdp_prog)) {
> + struct xdp_buff xdp;
> + u32 act;
> +
> + xdp.data_hard_start = frame->data - frame->headroom;
> + xdp.data = frame->data;
> + xdp.data_end = frame->data + frame->len;
> + xdp.data_meta = frame->data - frame->metasize;
> + xdp.rxq = &priv->xdp_rxq;
> +
> + act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +
> + switch (act) {
> + case XDP_PASS:
> + delta = frame->data - xdp.data;
> + len = xdp.data_end - xdp.data;
> + break;
> + default:
> + bpf_warn_invalid_xdp_action(act);
> + case XDP_ABORTED:
> + trace_xdp_exception(priv->dev, xdp_prog, act);
> + case XDP_DROP:
> + goto err_xdp;
> + }
> + }
> + rcu_read_unlock();
> +
> + headroom = frame->data - delta - (void *)frame;
> + skb = veth_build_skb(frame, headroom, len, 0);

Here you are adding an assumption that struct xdp_frame is always
located at the top of the packet-data area.  I tried hard not to add
such a dependency!  You can calculate the beginning of the frame from
the xdp_frame->data pointer.

Why not add such a dependency?  Because for AF_XDP zero-copy, we cannot
make such an assumption.  

Currently, when an RX-queue is in AF_XDP-ZC mode (MEM_TYPE_ZERO_COPY)
the packet will get dropped when calling convert_to_xdp_frame(), but as
the TODO comment in convert_to_xdp_frame() indicates, this is not the
end-goal.

The comment in convert_to_xdp_frame() indicates we need a full
alloc+copy, but that is actually not necessary if we can just use
another memory area for struct xdp_frame, plus a pointer to data.  That
would allow devmap-redir to work with ZC and cpumap-redir to do the copy
on the remote CPU.
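
To make the first point concrete: instead of passing the xdp_frame
pointer itself as the buffer head, the head could be derived from the
data pointer.  A minimal sketch, assuming one page-aligned buffer per
page (not true for every driver, and exactly the kind of assumption
AF_XDP ZC breaks, so this is illustration only):

	/* Sketch: recover the buffer head without assuming that
	 * struct xdp_frame lives at the top of the buffer.
	 */
	void *head = (void *)((unsigned long)frame->data & PAGE_MASK);

	headroom = frame->data - delta - head;
	skb = veth_build_skb(head, headroom, len, 0);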


> + if (!skb) {
> + xdp_return_frame(frame);
> + goto err;
> + }
> +
> + memset(frame, 0, sizeof(*frame));
> + skb->protocol = eth_type_trans(skb, priv->dev);
> +err:
> + return skb;
> +err_xdp:
> + rcu_read_unlock();
> + xdp_return_frame(frame);
> +
> + return NULL;
> +}


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring

2018-07-30 Thread Toshiaki Makita
This is preparation for XDP TX and ndo_xdp_xmit.
This allows the napi handler to handle xdp_frames through the xdp ring
as well as sk_buffs.
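
The skb/xdp_frame distinction is made by tagging the low bit of the
pointer stored in the ptr_ring (VETH_XDP_FLAG below); this works
because both sk_buffs and xdp_frames are at least 2-byte aligned.
Conceptually (the enqueue side only arrives later in the series, so
the tagging line is illustrative):

	/* enqueue: tag the pointer */
	ptr = (void *)((unsigned long)frame | VETH_XDP_FLAG);
	/* dequeue: check the tag, then strip it */
	if (veth_is_xdp_frame(ptr))
		frame = veth_ptr_to_xdp(ptr);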

v3:
- Revert v2 change around rings and use a flag to differentiate skb and
  xdp_frame, since bulk skb xmit makes little performance difference
  for now.

v2:
- Use another ring instead of using flag to differentiate skb and
  xdp_frame. This approach makes bulk skb transmit possible in
  veth_xmit later.
- Clear xdp_frame fields in skb->head.
- Implement adjust_tail.

Signed-off-by: Toshiaki Makita 
Acked-by: John Fastabend 
---
 drivers/net/veth.c | 87 ++
 1 file changed, 82 insertions(+), 5 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9edf104..9de0e90 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -22,12 +22,12 @@
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/ptr_ring.h>
-#include <linux/skb_array.h>
 #include <linux/bpf_trace.h>
 
 #define DRV_NAME   "veth"
#define DRV_VERSION   "1.0"
 
+#define VETH_XDP_FLAG  BIT(0)
 #define VETH_RING_SIZE 256
 #define VETH_XDP_HEADROOM  (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
 
@@ -115,6 +115,24 @@ static void veth_get_ethtool_stats(struct net_device *dev,
 
 /* general routines */
 
+static bool veth_is_xdp_frame(void *ptr)
+{
+   return (unsigned long)ptr & VETH_XDP_FLAG;
+}
+
+static void *veth_ptr_to_xdp(void *ptr)
+{
+   return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG);
+}
+
+static void veth_ptr_free(void *ptr)
+{
+   if (veth_is_xdp_frame(ptr))
+   xdp_return_frame(veth_ptr_to_xdp(ptr));
+   else
+   kfree_skb(ptr);
+}
+
 static void __veth_xdp_flush(struct veth_priv *priv)
 {
/* Write ptr_ring before reading rx_notify_masked */
@@ -249,6 +267,61 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
return skb;
 }
 
+static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
+   struct xdp_frame *frame)
+{
+   int len = frame->len, delta = 0;
+   struct bpf_prog *xdp_prog;
+   unsigned int headroom;
+   struct sk_buff *skb;
+
+   rcu_read_lock();
+   xdp_prog = rcu_dereference(priv->xdp_prog);
+   if (likely(xdp_prog)) {
+   struct xdp_buff xdp;
+   u32 act;
+
+   xdp.data_hard_start = frame->data - frame->headroom;
+   xdp.data = frame->data;
+   xdp.data_end = frame->data + frame->len;
+   xdp.data_meta = frame->data - frame->metasize;
+   xdp.rxq = &priv->xdp_rxq;
+
+   act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+   switch (act) {
+   case XDP_PASS:
+   delta = frame->data - xdp.data;
+   len = xdp.data_end - xdp.data;
+   break;
+   default:
+   bpf_warn_invalid_xdp_action(act);
+   case XDP_ABORTED:
+   trace_xdp_exception(priv->dev, xdp_prog, act);
+   case XDP_DROP:
+   goto err_xdp;
+   }
+   }
+   rcu_read_unlock();
+
+   headroom = frame->data - delta - (void *)frame;
+   skb = veth_build_skb(frame, headroom, len, 0);
+   if (!skb) {
+   xdp_return_frame(frame);
+   goto err;
+   }
+
+   memset(frame, 0, sizeof(*frame));
+   skb->protocol = eth_type_trans(skb, priv->dev);
+err:
+   return skb;
+err_xdp:
+   rcu_read_unlock();
+   xdp_return_frame(frame);
+
+   return NULL;
+}
+
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
struct sk_buff *skb)
 {
@@ -359,12 +432,16 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget)
int i, done = 0;
 
for (i = 0; i < budget; i++) {
-   struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring);
+   void *ptr = __ptr_ring_consume(&priv->xdp_ring);
+   struct sk_buff *skb;
 
-   if (!skb)
+   if (!ptr)
break;
 
-   skb = veth_xdp_rcv_skb(priv, skb);
+   if (veth_is_xdp_frame(ptr))
+   skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr));
+   else
+   skb = veth_xdp_rcv_skb(priv, ptr);
 
if (skb)
napi_gro_receive(&priv->xdp_napi, skb);
@@ -417,7 +494,7 @@ static void veth_napi_del(struct net_device *dev)
napi_disable(&priv->xdp_napi);
netif_napi_del(&priv->xdp_napi);
priv->rx_notify_masked = false;
-   ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb);
+   ptr_ring_cleanup(&priv->xdp_ring, veth_ptr_free);
 }
 
 static int veth_enable_xdp(struct net_device *dev)
-- 
1.8.3.1