On Thu, Jan 19, 2017 at 9:25 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2017-01-19 at 21:14 +0200, Saeed Mahameed wrote:
>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > From: Eric Dumazet <eduma...@google.com>
>> >
>> > A driver using dev_alloc_page() must not reuse a page allocated from
>> > the emergency memory reserve.
>> >
>> > Otherwise all packets using this page will be immediately dropped,
>> > except for the very specific sockets that have the SOCK_MEMALLOC bit
>> > set.
>> >
>> > This issue might be hard to debug, because only a fraction of the
>> > received packets would be dropped.
>>
>> Hi Eric,
>>
>> When you say reuse, do you mean pointing to the same page from several
>> SKBs?
>>
>> Because in our page cache implementation we don't reuse pages that were
>> already passed to the stack; we just keep them in the page cache until
>> the ref count drops back to one, and then we recycle them (i.e. they
>> will be re-used only when no one else is using them).
>
> Look at the page_is_pfmemalloc() calls in other network drivers.
>
> The point is:
>
> We do not want to reuse a page that was initially allocated from the
> emergency reserve.
>
> Chances are high that the packet will simply be dropped.
>
> If your cache is not refilled with such problematic pages, future
> dev_alloc_page() calls will likely give you sane pages after the stress
> has ended.
>
> By definition, emergency reserves are small; we do not want to keep
> pages from the reserve in another cache, depleting the reserve for too
> long.
Got it, thanks for the explanation. I got confused since the term "reuse"
can also imply "reuse the same page for another RX packet"; I guess the
correct term for the mlx5 page cache is "recycle".

> dev_alloc_page() in mlx5 was introduced in commit
> bc77b240b3c57236cdcc08d64ca390655d3a16ff
> ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
>
> Maybe my Fixes: tag was slightly wrong.
>

No, I think you got it right in the first place: you are fixing the RX
cache so that it does not keep hold of such pages. Before the RX cache we
didn't have this issue.

>>
>> On mlx5e RX re-post to the HW buffer we will call
>> mlx5e_page_alloc_mapped->mlx5e_rx_cache_get, and mlx5e_rx_cache_get
>> will only return a page that has already been freed by all stack users;
>> otherwise it will allocate a brand new one.
>>
>> A page from the mlx5e page cache can be in one of 2 states:
>> 1. passed to the stack (not free)
>> 2. free (already used and freed by the stack)
>>
>> A non-free page will never get reused for another SKB.
>
> Of course.
>
>>
>> This is true for both mlx5 RX modes (striding and non-striding),
>>
>> but in striding mode, regardless of the page cache, small packets can
>> always fit in one page and this page can get pointed to by several
>> SKBs - should we be concerned about this case?
>>
>> Anyway, if you meant by this patch to avoid keeping such special pages
>> in the mlx5 page cache, then it looks very good to me.
>>
>> Thanks,
>> Saeed.
>>
>>
>> >
>> > Fixes: 4415a0319f92 ("net/mlx5e: Implement RX mapped page cache for page recycle")
>> > Signed-off-by: Eric Dumazet <eduma...@google.com>
>> > Cc: Tariq Toukan <tar...@mellanox.com>
>> > Cc: Saeed Mahameed <sae...@mellanox.com>
>> > ---
>> >  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > index 0e2fb3ed1790900ccdc0bbbef8bc5c33267df092..ce46409bde2736ea4d1246ca97855098f052f271 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > @@ -193,6 +193,9 @@ static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq,
>> >  		return false;
>> >  	}
>> >
>> > +	if (unlikely(page_is_pfmemalloc(dma_info->page)))
>> > +		return false;
>> > +
>> >  	cache->page_cache[cache->tail] = *dma_info;
>> >  	cache->tail = tail_next;
>> >  	return true;
>> >
>>
>