HI Jesper, Ivan,

> On Wed,  3 Jul 2019 12:37:50 +0200
> Jose Abreu <jose.ab...@synopsys.com> wrote:
> 
> > @@ -3547,6 +3456,9 @@ static int stmmac_rx(struct stmmac_priv *priv, int 
> > limit, u32 queue)
> >  
> >                     napi_gro_receive(&ch->rx_napi, skb);
> >  
> > +                   page_pool_recycle_direct(rx_q->page_pool, buf->page);
> 
> This doesn't look correct.
> 
> The page_pool DMA mapping cannot be "kept" when page traveling into the
> network stack attached to an SKB.  (Ilias and I have a long term plan[1]
> to allow this, but you cannot do it ATM).
> 
> You will have to call:
>   page_pool_release_page(rx_q->page_pool, buf->page);
> 
> This will do a DMA-unmap, and you will likely loose your performance
> gain :-(
> 
> 
> > +                   buf->page = NULL;
> > +
> >                     priv->dev->stats.rx_packets++;
> >                     priv->dev->stats.rx_bytes += frame_len;
> >             }
> 
> Also remember that the page_pool requires you driver to do the DMA-sync
> operation.  I see a dma_sync_single_for_cpu(), but I didn't see a
> dma_sync_single_for_device() (well, I noticed one getting removed).
> (For some HW Ilias tells me that the dma_sync_single_for_device can be
> elided, so maybe this can still be correct for you).
On our case (and in the page_pool API in general) you have to track buffers when
both .ndo_xdp_xmit() and XDP_TX are used.
So the lifetime of a packet might be 

1. page pool allocs packet. The API doesn't sync but i *think* you don't have to
explicitly since the CPU won't touch that buffer until the NAPI handler kicks
in. On the napi handler you need to dma_sync_single_for_cpu() and process the
packet.
2a) no XDP is required so the packet is unmapped and free'd
2b) .ndo_xdp_xmit is called so tyhe buffer need to be mapped/unmapped
2c) XDP_TX is called. In that case we re-use an Rx buffer so we need to
dma_sync_single_for_device()
2a and 2b won't cause any issues
In 2c the buffer will be recycled and fed back to the device with a *correct*
sync (for_device) and all those buffers are allocated as DMA_BIDIRECTIONAL.

So bvottom line i *think* we can skip the dma_sync_single_for_device() on the
initial allocation *only*. If am terribly wrong please let me know :)

Thanks
/Ilias
> 
> 
> [1] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool02_SKB_return_callback.org
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

Reply via email to