On Thu, Jan 19, 2017 at 9:25 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2017-01-19 at 21:14 +0200, Saeed Mahameed wrote:
>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > From: Eric Dumazet <eduma...@google.com>
>> >
>> > A driver using dev_alloc_page() must not reuse a page allocated from
>> > emergency memory reserve.
>> >
>> > Otherwise all packets using this page will be immediately dropped,
>> > unless for very specific sockets having SOCK_MEMALLOC bit set.
>> >
>> > This issue might be hard to debug, because only a fraction of received
>> > packets would be dropped.
>>
>> Hi Eric,
>>
>> When you say reuse, do you mean pointing to the same page from several SKBs?
>>
>> I ask because in our page cache implementation we don't reuse pages that
>> have already been passed to the stack; we just keep them in the page cache
>> until the ref count drops back to one and then recycle them (i.e. they are
>> re-used only when no one else is using them).
>
> Look at other page_is_pfmemalloc() calls in other network drivers.
>
> The point is :
>
> We do not want to reuse a page that was initially allocated from
> emergency reserve.
>
> Chances are high that the packet will be simply dropped.
>
> If your cache is not refilled with such problematic pages, future
> dev_alloc_page() calls will likely give you sane pages once the stress
> has ended.
>
> By definition, emergency reserves are small, we do not want to keep
> pages from the reserve in another cache, depleting the reserve for too
> long.

Got it, thanks for the explanation. I was confused because the term "reuse"
can also mean "reuse the same page for another RX packet"; the more accurate
term for what the mlx5 page cache does is "recycle".
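
To make the terminology concrete, here is a minimal sketch of that recycle
idea (the helper below is hypothetical, not the actual mlx5 code): a cached
page is handed out again only once its refcount has dropped back to one,
and, with your patch, it is additionally rejected when it came from the
pfmemalloc emergency reserve.

#include <linux/mm.h>

/* Hypothetical helper, for illustration only: the real logic lives in
 * mlx5e_rx_cache_get()/mlx5e_rx_cache_put() in en_rx.c.
 */
static bool rx_cache_page_is_reusable(struct page *page)
{
        /* still referenced by an SKB somewhere up the stack? */
        if (page_ref_count(page) != 1)
                return false;

        /* allocated from the emergency (pfmemalloc) reserve? don't cache it */
        if (page_is_pfmemalloc(page))
                return false;

        return true;
}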

>
> dev_alloc_page() in mlx5 was introduced in commit
> bc77b240b3c57236cdcc08d64ca390655d3a16ff
> ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
>
> Maybe my Fixes: tag was slightly wrong.
>

No, I think you got it right in the first place: you are fixing the RX
cache so it does not keep hold of such pages. Before the RX cache was
added we didn't have this issue.
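
For reference, the drop Eric describes happens in the core stack, not in the
driver. Roughly paraphrasing (this is a sketch of the check done in
sk_filter_trim_cap(), not a verbatim copy), an skb carried in pfmemalloc
pages is refused unless the receiving socket has SOCK_MEMALLOC set:

#include <net/sock.h>

/* Sketch only: ordinary sockets never get to consume packets built from
 * pages of the emergency reserve; only SOCK_MEMALLOC sockets (e.g. swap
 * over network storage) may use them.
 */
static bool pfmemalloc_skb_would_be_dropped(const struct sock *sk,
                                            const struct sk_buff *skb)
{
        return skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC);
}

That also explains why only a fraction of the received traffic is dropped:
only the packets that happen to land in a pfmemalloc page are affected.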

>>
>> On the mlx5e RX re-post to the HW buffer we call
>> mlx5e_page_alloc_mapped -> mlx5e_rx_cache_get, and mlx5e_rx_cache_get
>> will only return a page that has already been freed by all stack users;
>> otherwise it allocates a brand new one.
>>
>> A page from the mlx5e page cache can be in one of two states:
>> 1. passed to the stack (not free)
>> 2. free (already used and freed by the stack)
>>
>> A non-free page will never get reused for another SKB.
>
>
> Of course.
>
>>
>> This is true for both mlx5 RX modes (striding and non-striding).
>>
>> But in striding mode, regardless of the page cache, several small
>> packets can fit in one page, so that page can end up pointed to by
>> several SKBs - should we be concerned about this case?
>>
>> Anyway, if the intent of this patch is to avoid keeping such special
>> pages in the mlx5 page cache, then it looks very good to me.
>>
>> Thanks,
>> Saeed.
>>
>>
>> >
>> > Fixes: 4415a0319f92 ("net/mlx5e: Implement RX mapped page cache for page recycle")
>> > Signed-off-by: Eric Dumazet <eduma...@google.com>
>> > Cc: Tariq Toukan <tar...@mellanox.com>
>> > Cc: Saeed Mahameed <sae...@mellanox.com>
>> > ---
>> >  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |    3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > index 0e2fb3ed1790900ccdc0bbbef8bc5c33267df092..ce46409bde2736ea4d1246ca97855098f052f271 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > @@ -193,6 +193,9 @@ static inline bool mlx5e_rx_cache_put(struct mlx5e_rq *rq,
>> >                 return false;
>> >         }
>> >
>> > +       if (unlikely(page_is_pfmemalloc(dma_info->page)))
>> > +               return false;
>> > +
>> >         cache->page_cache[cache->tail] = *dma_info;
>> >         cache->tail = tail_next;
>> >         return true;
>> >
>> >
>
>
