On Mon, 23 Jul 2018 11:39:36 +0200, Björn Töpel wrote:
> Den fre 20 juli 2018 kl 22:08 skrev Jakub Kicinski:
> > On Fri, 20 Jul 2018 10:18:21 -0700, Martin KaFai Lau wrote:  
> > > On Sat, Jul 21, 2018 at 01:04:45AM +0900, Taehee Yoo wrote:  
> > > > rhashtable_lookup() can return NULL, so a NULL pointer
> > > > check should be added.
> > > >
> > > > Fixes: 02b55e5657c3 ("xdp: add MEM_TYPE_ZERO_COPY")
> > > > Signed-off-by: Taehee Yoo <ap420...@gmail.com>
> > > > ---
> > > >  net/core/xdp.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > > index 9d1f220..1c12bc7 100644
> > > > --- a/net/core/xdp.c
> > > > +++ b/net/core/xdp.c
> > > > @@ -345,7 +345,8 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
> > > >             rcu_read_lock();
> > > >             /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
> > > >             xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> > > > -           xa->zc_alloc->free(xa->zc_alloc, handle);
> > > > +           if (xa)
> > > > +                   xa->zc_alloc->free(xa->zc_alloc, handle);  
> > > Hmm... it is not clear to me that the "!xa" case doesn't need to be handled.
> >
> > Actually I have a more fundamental question about this interface I've
> > been meaning to ask.
> >
> > IIUC free() can happen on any CPU at any time, when whatever device,
> > socket or CPU this got redirected to has completed the TX.  IOW there
> > may be multiple producers.  Would drivers need to take a spin lock à la
> > the a9744f7ca200 ("xsk: fix potential race in SKB TX completion code") fix?
> >  
> 
> Jakub, apologies for the slow response. I'm still in
> "holiday/hammock&beer mode", but will be back in a week. :-P

Ah, sorry to interrupt! :)

> The idea with the xdp_return_* functions is that an xdp_buff and
> xdp_frame can have custom allocation schemes. The difference between
> struct xdp_buff and struct xdp_frame is lifetime. The xdp_buff
> lifetime is within the napi context, whereas an xdp_frame can have a
> lifetime longer than/outside the napi context. E.g. for an XDP_REDIRECT
> scenario an xdp_buff is converted to an xdp_frame. The conversion is
> done in include/net/xdp.h:convert_to_xdp_frame.
> 
> Currently, the zero-copy MEM_TYPE_ZERO_COPY memtype can *only* be used
> for xdp_buff, meaning that the lifetime is constrained to a napi
> context. Further, given an xdp_buff with memtype MEM_TYPE_ZERO_COPY,
> doing XDP_REDIRECT to a target that is *not* an AF_XDP socket would
> mean converting the xdp_buff to an xdp_frame. The xdp_frame can then
> be free'd on any CPU.
> 
> Note that the xsk_rcv* functions are always called from a napi
> context, and therefore use the xdp_return_buff calls.
> 
> To answer your question -- no, this fix is *not* needed, because the
> xdp_buff is napi-constrained, and the xdp_buff will only be free'd on
> one CPU.

Oh, thanks, I missed the check in convert_to_xdp_frame(), so the only
frames which can come back via the free path are out of the error path
in __xsk_rcv_zc()?

That path looks a little surprising too, isn't the expectation that if
xdp_do_redirect() returns an error the driver retains the ownership of
the buffer? 
 
static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
{
        int err = xskq_produce_batch_desc(xs->rx, (u64)xdp->handle, len);

        if (err) {
                xdp_return_buff(xdp);
                xs->rx_dropped++;
        }

        return err;
}

This seems to call xdp_return_buff() *and* return an error.
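A toy model of the convention the question assumes (hypothetical model_* names, not kernel code): the receive function reports the error but does not give the buffer back, and the caller alone returns it. Each buffer records whether it was returned, so a double return trips an assert.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model, not kernel code: a tiny RX ring plus a per-buffer flag so
 * that returning the same buffer twice is caught. */
#define MODEL_RING_SIZE 2

struct model_buf {
	bool returned;              /* set once the buffer is given back */
};

struct model_ring {
	struct model_buf *slots[MODEL_RING_SIZE];
	unsigned int count;
	unsigned long rx_dropped;
};

static void model_return_buf(struct model_buf *b)
{
	assert(!b->returned);       /* a second return == double free */
	b->returned = true;
}

/* Assumed convention: on error the callee only reports it and does NOT
 * return the buffer; ownership stays with the caller. */
static int model_rcv(struct model_ring *r, struct model_buf *b)
{
	if (r->count == MODEL_RING_SIZE) {
		r->rx_dropped++;
		return -ENOSPC;         /* caller still owns b */
	}
	r->slots[r->count++] = b;   /* the ring now owns b */
	return 0;
}

/* Driver side: if the redirect fails, the driver recycles the buffer. */
static void model_driver_rx(struct model_ring *r, struct model_buf *b)
{
	if (model_rcv(r, b) != 0)
		model_return_buf(b);
}
```

If model_rcv() also called model_return_buf() before returning -ENOSPC, which is what __xsk_rcv_zc() does with xdp_return_buff(), the driver's own return on the error path would trip the assert.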

> > We need some form of internal kernel circulation which would be MPSC.
> > I'm currently hacking up the XSK code to tell me whether the frame was
> > consumed by the correct XSK, and always clone the frame otherwise
> > (claiming to be the "traditional" MEM_TYPE_PAGE_ORDER0).
> >
> > I feel like I'm missing something about the code.  Is redirect of
> > ZC/UMEM frame outside the xsk not possible and the only returns we will
> > see are from net/xdp/xsk.c?  That would work, but I don't see such a
> > check.  Help would be appreciated.
> >  
> 
> Right now, this is the case (refer to the TODO in
> convert_to_xdp_frame), i.e. you cannot redirect a ZC/UMEM-allocated
> xdp_buff to a target that is not an xsk. This must, obviously, change
> so that an xdp_buff (of MEM_TYPE_ZERO_COPY) can be converted to an
> xdp_frame. The xdp_frame must be able to be free'd from multiple CPUs,
> so here a more sophisticated allocation scheme is required.
> 
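A rough userspace sketch of what "free'd from multiple CPUs" implies for the return path, in the spirit of the a9744f7ca200 spin lock mentioned above: any number of producers push handles under one lock, and the owning driver drains them. All names are hypothetical, the C11 atomic_flag spin loop only stands in for spin_lock(), and a real scheme might well want something less contended than one shared lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model with hypothetical names: an MPSC return path where
 * any CPU may give a handle back and the owning driver drains them.
 * The atomic_flag spin loop stands in for the kernel's spin_lock(). */
#define FL_SIZE 64

struct model_free_list {
	atomic_flag lock;
	unsigned long handles[FL_SIZE];
	size_t n;                   /* protected by lock */
};

static void fl_lock(struct model_free_list *fl)
{
	while (atomic_flag_test_and_set_explicit(&fl->lock,
						 memory_order_acquire))
		;                   /* spin until the holder releases */
}

static void fl_unlock(struct model_free_list *fl)
{
	atomic_flag_clear_explicit(&fl->lock, memory_order_release);
}

/* Producer side: safe to call concurrently from many threads/CPUs. */
static int fl_free(struct model_free_list *fl, unsigned long handle)
{
	int ret = 0;

	fl_lock(fl);
	if (fl->n < FL_SIZE)
		fl->handles[fl->n++] = handle;
	else
		ret = -1;           /* full: caller must handle the overflow */
	fl_unlock(fl);
	return ret;
}

/* Consumer side: the driver takes back up to max handles for reuse
 * (LIFO order, which is fine for a free list). */
static size_t fl_drain(struct model_free_list *fl, unsigned long *out,
		       size_t max)
{
	size_t i = 0;

	assert(out != NULL || max == 0);
	fl_lock(fl);
	while (i < max && fl->n > 0)
		out[i++] = fl->handles[--fl->n];
	fl_unlock(fl);
	return i;
}
```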
> > Also the fact that XSK bufs can't be freed, only completed, adds to the
> > pain of implementing AF_XDP, we'd certainly need some form of "give
> > back the frame, but I may need it later" SPSC mechanism, otherwise
> > driver writers will have a tough time.  Unless, again, I'm missing
> > something about the code :)
> >  
> 
> Yup, moving the recycling scheme from driver to "generic" is a good
> idea! I need to finish up those i40e zerocopy patches first though...

Interesting, FWIW I wasn't necessarily thinking about full recycling,
although that would be the holy grail.  Just a generic way of giving up
buffers for example when user changes ring sizes or brings the device
down.
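A minimal sketch of such a give-back path, assuming one producer and one consumer (matching the SPSC mechanism mentioned earlier in the thread): a lock-free ring where buffers are parked on ring shrink or ifdown and taken back on ring grow or ifup. model_stash and the stash_* functions are hypothetical, not an existing kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "give back now, may need it later" stash: a lock-free
 * single-producer/single-consumer ring of UMEM handles. Not a kernel API. */
#define STASH_SIZE 8                /* must be a power of two */

struct model_stash {
	unsigned long handles[STASH_SIZE];
	_Atomic size_t head;            /* advanced only by the producer */
	_Atomic size_t tail;            /* advanced only by the consumer */
};

static void stash_init(struct model_stash *s)
{
	assert((STASH_SIZE & (STASH_SIZE - 1)) == 0);
	atomic_init(&s->head, 0);
	atomic_init(&s->tail, 0);
}

/* Producer: e.g. the driver parking buffers on ring shrink or ifdown. */
static bool stash_give(struct model_stash *s, unsigned long handle)
{
	size_t head = atomic_load_explicit(&s->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&s->tail, memory_order_acquire);

	if (head - tail == STASH_SIZE)
		return false;               /* full */
	s->handles[head & (STASH_SIZE - 1)] = handle;
	/* release: the slot write is visible before the new head */
	atomic_store_explicit(&s->head, head + 1, memory_order_release);
	return true;
}

/* Consumer: e.g. refilling the RX ring on ring grow or ifup. */
static bool stash_take(struct model_stash *s, unsigned long *handle)
{
	size_t tail = atomic_load_explicit(&s->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&s->head, memory_order_acquire);

	if (head == tail)
		return false;               /* empty */
	*handle = s->handles[tail & (STASH_SIZE - 1)];
	/* release: the slot read completes before the producer may reuse it */
	atomic_store_explicit(&s->tail, tail + 1, memory_order_release);
	return true;
}
```

With a single producer and a single consumer, head and tail each have exactly one writer, so acquire/release ordering on the indices is enough and no lock is needed; the multi-producer free path is the case that needs one.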

> (...and I'm very excited that you're doing nfp support for AF_XDP!!!)

Thanks, I'm still way out in the weeds but it's interesting work :)
