On Tue, 2017-02-07 at 08:06 -0800, Eric Dumazet wrote:
> /*
> * make sure we read the CQE after we read the ownership bit
> */
> dma_rmb();
> + prefetch(frags[0].page);
Note that I would instead like to do a prefetch(frags[1].page).
So I will probably change how ring->rx_info is allocated.
Wasting all that space and forcing vmalloc() is silly:
tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
sizeof(struct mlx4_en_rx_alloc));
ring->rx_info = vzalloc_node(tmp, node);
In most cases, using exactly 12 bytes per slot would allow better
packing. Only one CPU is using this area, so there is no need to force
strange alignments just for the sake of avoiding a multiply!