Re: [PATCH RFC 6/9] veth: Add ndo_xdp_xmit

Toshiaki Makita Tue, 01 May 2018 20:34:40 -0700

On 18/05/01 (Tue) 17:14, Jesper Dangaard Brouer wrote:

On Tue, 1 May 2018 10:02:01 +0900 Toshiaki Makita <[email protected]> wrote:
On 2018/05/01 2:27, Jesper Dangaard Brouer wrote:
On Thu, 26 Apr 2018 19:52:40 +0900 Toshiaki Makita <[email protected]> wrote:
On 2018/04/26 5:24, Jesper Dangaard Brouer wrote:
On Tue, 24 Apr 2018 23:39:20 +0900 Toshiaki Makita <[email protected]> wrote:
+static int veth_xdp_xmit(struct net_device *dev, struct xdp_frame *frame)
+{
+	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+	int headroom = frame->data - (void *)frame;
+	struct net_device *rcv;
+	int err = 0;
+
+	rcv = rcu_dereference(priv->peer);
+	if (unlikely(!rcv))
+		return -ENXIO;
+
+	rcv_priv = netdev_priv(rcv);
+	/* xdp_ring is initialized on receive side? */
+	if (rcu_access_pointer(rcv_priv->xdp_prog)) {
+		err = xdp_ok_fwd_dev(rcv, frame->len);
+		if (unlikely(err))
+			return err;
+
+		err = veth_xdp_enqueue(rcv_priv, veth_xdp_to_ptr(frame));
+	} else {
+		struct sk_buff *skb;
+
+		skb = veth_build_skb(frame, headroom, frame->len, 0);
+		if (unlikely(!skb))
+			return -ENOMEM;
+
+		/* Get page ref in case skb is dropped in netif_rx.
+		 * The caller is responsible for freeing the page on error.
+		 */
+		get_page(virt_to_page(frame->data));
I'm not sure you can make this assumption, that xdp_frames coming from another device driver use a refcnt based memory model. But maybe I'm confused, as this looks like an SKB receive path, but in the ndo_xdp_xmit().
I find this path similar to cpumap, which creates an skb from a redirected xdp frame. Once it is converted to an skb, the skb head is freed by page_frag_free, so I needed to get the
refcount here anyway, regardless of memory model.
Yes I know, I wrote cpumap ;-)
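
The refcount argument in this exchange can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_page`, `toy_netif_rx()` and `toy_xmit_as_skb()` are hypothetical stand-ins, not kernel APIs, and the success path (dropping the extra reference once the skb has been accepted) is an assumption, since the quoted patch is cut off right after get_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a refcounted page (hypothetical, not struct page). */
struct toy_page {
	int refcnt;
};

static void toy_get_page(struct toy_page *p)
{
	p->refcnt++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcnt > 0);
	p->refcnt--;
}

/* Models netif_rx(): on drop the skb is freed at once, which releases the
 * reference the skb head was holding. On success the skb keeps that
 * reference until the stack frees it later.
 */
static bool toy_netif_rx(struct toy_page *p, bool drop)
{
	if (drop) {
		toy_put_page(p);
		return false;
	}
	return true;
}

/* Models the skb branch of the quoted veth_xdp_xmit(). Returns 0 on success
 * and a negative value on error. On error the caller still owns one
 * reference and is responsible for freeing the page.
 */
static int toy_xmit_as_skb(struct toy_page *p, bool drop)
{
	toy_get_page(p);	/* extra ref so a drop cannot free the page */

	if (!toy_netif_rx(p, drop))
		return -1;	/* skb ref is gone, caller's ref remains */

	toy_put_page(p);	/* assumed: release the extra ref on success */
	return 0;
}
```

In this model the page always ends up with exactly one owner: the skb on success, the caller on error. The model takes a refcount-based page for granted, which is exactly the assumption questioned above for frames that come from another driver.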
First of all, I don't want to see such xdp_frame to SKB conversion code in every driver, because that increases the chances of errors. And when looking at the details, it seems that you have made the mistake of making it possible to leak xdp_frame info to the SKB (which cpumap takes into account).
Do you mean leaving xdp_frame in skb->head is leaking something? How?
Like commit 97e19cce05e5 ("bpf: reserve xdp_frame size in xdp headroom") and commit 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse").


Thanks for sharing the info.

But this time, the concern is a bpf_prog attached at the TC/bpf_cls level, which can adjust head via bpf_skb_change_head (for XDP it is bpf_xdp_adjust_head) into the area used by xdp_frame. As described in https://git.kernel.org/davem/net-next/c/6dfb970d3dbd it is not super critical at the moment, as this _currently_ runs as CAP_SYS_ADMIN, but we would like to move towards CAP_NET_ADMIN.


What I don't get is why special casing xdp_frame info. My assumption is
that the area above skb->mac_header is uninit kernel memory and should
not be readable by unprivileged users anyway. So I didn't clear the area
at this point.
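
For readers following the leak concern, here is a minimal userspace sketch of what "clearing the area" would mean. It assumes a hypothetical `toy_xdp_frame` descriptor stored at the start of the packet buffer; the struct layout and helper names are illustrative only and are not the kernel's. Whether such scrubbing is actually needed on this path is the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor kept in the headroom at the start of the buffer. */
struct toy_xdp_frame {
	void *data;		/* kernel pointer into the buffer */
	unsigned short len;
	unsigned short headroom;
	void *dev_rx;		/* another kernel pointer that must not leak */
};

/* Store the descriptor at the very start of the buffer. */
static void toy_store_frame(unsigned char *hard_start,
			    const struct toy_xdp_frame *f)
{
	memcpy(hard_start, f, sizeof(*f));
}

/* Zero the descriptor area so a later head adjustment that exposes it
 * cannot reveal the stored pointers.
 */
static void toy_scrub_frame(unsigned char *hard_start)
{
	assert(hard_start != NULL);
	memset(hard_start, 0, sizeof(struct toy_xdp_frame));
}

/* Returns 1 if the descriptor area contains only zero bytes. */
static int toy_area_is_clear(const unsigned char *hard_start)
{
	size_t i;

	for (i = 0; i < sizeof(struct toy_xdp_frame); i++)
		if (hard_start[i])
			return 0;
	return 1;
}
```

The scrub touches only the descriptor-sized region, so the cost is a small fixed-size memset per converted frame.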

Second, I think the refcnt scheme here is wrong. The xdp_frame should be "owned" by XDP and have the proper refcnt to deliver
it directly to the network stack.
Third, if we choose that we want a fallback, in case XDP is not enabled on the egress dev (but it has an ndo_xdp_xmit), then it should be placed in the generic/core code. E.g. __bpf_tx_xdp_map() could look at the return code from dev->netdev_ops->ndo_xdp() and create an SKB. (Hint, this would make it easy to implement TX bulking towards the dev).
Right, this is a much cleaner way.
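
The core-level fallback proposed in the third point can be sketched in userspace as follows. All names here (`toy_dev`, `toy_core_xmit()`, `toy_skb_fallback()`) are hypothetical, and treating every error from the driver hook as a reason to fall back is an assumption of this sketch, not something the thread settles.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_frame {
	int len;
};

struct toy_dev {
	/* Stand-in for the driver's XDP xmit hook; NULL if not implemented. */
	int (*xdp_xmit)(struct toy_dev *dev, struct toy_frame *frame);
	int xdp_tx_ready;	/* are the XDP TX resources set up? */
	int native_sent;	/* frames sent via the driver hook */
	int skb_sent;		/* frames sent via the skb fallback */
};

/* Example driver hook: refuses frames until its XDP TX side is ready. */
static int toy_driver_xmit(struct toy_dev *dev, struct toy_frame *frame)
{
	(void)frame;
	if (!dev->xdp_tx_ready)
		return -ENXIO;
	dev->native_sent++;
	return 0;
}

/* Generic fallback: pretend to build an skb and hand it to the stack. */
static int toy_skb_fallback(struct toy_dev *dev, struct toy_frame *frame)
{
	if (frame->len <= 0)
		return -EINVAL;
	dev->skb_sent++;
	return 0;
}

/* Core-side transmit: try the driver hook first and fall back to the skb
 * path on any error, so no driver needs its own conversion code.
 */
static int toy_core_xmit(struct toy_dev *dev, struct toy_frame *frame)
{
	int err = -EOPNOTSUPP;

	assert(dev != NULL && frame != NULL);
	if (dev->xdp_xmit)
		err = dev->xdp_xmit(dev, frame);
	if (!err)
		return 0;

	return toy_skb_fallback(dev, frame);
}
```

Keeping the decision in one core function also gives a single place where frames could later be batched before being handed to the device.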
Although I feel like we should add this fallback for veth because it requires something which is different from other drivers (enabling XDP on the peer device of the egress device),
(This is why I Cc'ed Tariq...)

This is actually a general problem with the xdp "xmit" side (and not
specific to the veth driver). The problem exists for other drivers as well.
The problem is that a driver can implement ndo_xdp_xmit(), but the driver might not have allocated the necessary XDP TX-queue resources
yet (or it might not be possible due to system resource limits).
The current "hack" is to load an XDP prog on the egress device, and then assume that this causes the driver to also allocate the XDP ndo_xdp_xmit side HW resources. This is IMHO a wrong assumption.

We need a more generic way to test if a net_device is "ready/enabled"
for handling xdp_frames via ndo_xdp_xmit.  And Tariq had some ideas
on how to implement this...
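
To make the difference concrete, the following userspace sketch contrasts the current heuristic with an explicit readiness test. The struct fields and function names are hypothetical and do not describe any existing or proposed kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the device state relevant to XDP transmit. */
struct toy_netdev {
	bool has_xdp_xmit_op;	/* driver implements the xmit hook */
	bool xdp_prog_attached;	/* an XDP program is loaded on the device */
	int xdp_txq_allocated;	/* XDP TX queues actually set up */
};

/* Current "hack": infer TX readiness from an attached XDP program. */
static bool toy_ready_by_prog_heuristic(const struct toy_netdev *d)
{
	assert(d != NULL);
	return d->has_xdp_xmit_op && d->xdp_prog_attached;
}

/* Explicit test: ready only if TX resources were really allocated. */
static bool toy_ready_explicit(const struct toy_netdev *d)
{
	assert(d != NULL);
	return d->has_xdp_xmit_op && d->xdp_txq_allocated > 0;
}
```

A device running only an XDP_DROP filter, with no TX queues allocated, passes the heuristic but fails the explicit test. A device that has TX queues allocated but no program attached behaves the other way around.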


My assumption on REDIRECT requirement came from this.
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=306da4e685b

I guess you are saying things are changing, and having an XDP program attached on the egress device is no longer generally sufficient. Looking forward to Tariq's solution.


Toshiaki Makita

My opinion is that it is a waste of (HW/mem) resources to always allocate resources for ndo_xdp_xmit when loading an XDP program. Because what if my use-cases are an XDP_DROP DDoS filter, or a CPUMAP redirect load-balancer, then I don't want/need ndo_xdp_xmit. E.g. today using ixgbe on machines with more than 96 CPUs will fail due to limited TX queue resources. Thus, blocking the mentioned use-cases.
I'll drop the part for now. It should not be resolved in the driver
code.
Thank you.
