> -----Original Message-----
> From: Stephen Hemminger <step...@networkplumber.org>
> Sent: Tuesday, August 22, 2023 9:59 PM
> To: Feifei Wang <feifei.wa...@arm.com>
> Cc: dev@dpdk.org; nd <n...@arm.com>
> Subject: Re: [PATCH v11 0/4] Recycle mbufs from Tx queue into Rx queue
> 
> On Tue, 22 Aug 2023 15:27:06 +0800
> Feifei Wang <feifei.wa...@arm.com> wrote:
> 
> >   Currently, the transmit side frees the buffers into the lcore cache
> > and the receive side allocates buffers from the lcore cache. The
> > transmit side typically frees 32 buffers resulting in 32*8=256B of
> > stores to lcore cache. The receive side allocates 32 buffers and
> > stores them in the receive side software ring, resulting in 32*8=256B
> > of stores and 256B of load from the lcore cache.
> >
> > This patch proposes a mechanism to avoid freeing to/allocating from
> > the lcore cache. i.e. the receive side will free the buffers from
> > transmit side directly into its software ring. This will avoid the
> > 256B of loads and stores introduced by the lcore cache. It also frees
> > up the cache lines used by the lcore cache. And we can call this mode
> > as mbufs recycle mode.
> 
> Isn't the recycle ring just another cache? Why is the lcore cache slower?
> Could we fix the general case there?

Here "lcore cache" means the mempool per-lcore cache for each lcore:
mp->local_cache[lcore_id];
For each buffer allocated from or freed into the mempool, the thread will
first try to do a memory copy from or into this lcore cache.

We are not saying the lcore cache itself is slower; we mean that the memory
copies from and into the lcore cache cost CPU cycles, and mbufs recycle mode
can bypass these copies.

For the generic case, we tried to optimize with zero-copy, but its performance
is still worse than mbufs recycle:

For the general path:
                Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
                Tx: 32 pkts memcpy from tx_sw_ring to a temporary variable + 32 pkts memcpy from the temporary variable to mempool cache
For the ZC API used in mempool:
                Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
                Tx: 32 pkts memcpy from tx_sw_ring to the zero-copy mempool cache
                Refer link: http://patches.dpdk.org/project/dpdk/patch/20230221055205.22984-2-kamalakshitha.alig...@arm.com/
For mbufs recycle:
                Rx/Tx: 32 pkts memcpy from tx_sw_ring to rx_sw_ring


