On Mon, Feb 13, 2017 at 4:57 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>> Alex, be assured that I implemented the full thing, of course.
>
> Patch was :
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index aa074e57ce06fb2842fa1faabd156c3cd2fe10f5..0ae1b544668d26c24044dbdefdd9b12253596ff9 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -68,6 +68,7 @@ static int mlx4_alloc_page(struct mlx4_en_priv *priv,
>          frag->page = page;
>          frag->dma = dma;
>          frag->page_offset = priv->rx_headroom;
> +        frag->pagecnt_bias = 1;
>          return 0;
>  }
>
> @@ -97,7 +98,7 @@ static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
>          if (frag->page) {
>                  dma_unmap_page(priv->ddev, frag->dma,
>                                 PAGE_SIZE, priv->dma_dir);
> -                __free_page(frag->page);
> +                __page_frag_cache_drain(frag->page, frag->pagecnt_bias);
>          }
>          /* We need to clear all fields, otherwise a change of priv->log_rx_info
>           * could lead to see garbage later in frag->page.
> @@ -470,6 +471,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
>  {
>          const struct mlx4_en_frag_info *frag_info = priv->frag_info;
>          unsigned int truesize = 0;
> +        unsigned int pagecnt_bias;
>          int nr, frag_size;
>          struct page *page;
>          dma_addr_t dma;
> @@ -491,9 +493,10 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
>                                                frag_size);
>
>                  truesize += frag_info->frag_stride;
> +                pagecnt_bias = frags->pagecnt_bias--;
>                  if (frag_info->frag_stride == PAGE_SIZE / 2) {
>                          frags->page_offset ^= PAGE_SIZE / 2;
> -                        release = page_count(page) != 1 ||
> +                        release = page_count(page) != pagecnt_bias ||
>                                    page_is_pfmemalloc(page) ||
>                                    page_to_nid(page) != numa_mem_id();
>                  } else {
> @@ -504,9 +507,13 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
>                  }
>                  if (release) {
>                          dma_unmap_page(priv->ddev, dma, PAGE_SIZE, priv->dma_dir);
> +                        __page_frag_cache_drain(page, --pagecnt_bias);
>                          frags->page = NULL;
>                  } else {
> -                        page_ref_inc(page);
> +                        if (pagecnt_bias == 1) {
> +                                page_ref_add(page, USHRT_MAX);
> +                                frags->pagecnt_bias = USHRT_MAX;
> +                        }
>                  }
>
>                  nr++;

You might want to examine the code while running perf. What you should see is page_ref_inc going from eating a significant amount of time before the patch to something negligible after it. If page_ref_inc isn't adding much pressure to begin with, that may be why the patch didn't provide any significant gain on mlx4.

It's possible that the mlx4 code is different enough that it is effectively running in a different environment. For example, there might not be any MMIO traffic putting serious pressure on the atomic op, so it is processed more quickly.

Also, back when I was hammering on this I was mostly focused on routing and micro-benchmarks. Odds are it is one of those things that won't show up unless you are really looking for it, so there is no need to worry about addressing it now.

- Alex