> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Friday, March 27, 2015 10:26 AM
> To: Wiles, Keith
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mbuf: optimize refcnt handling during free
>
> On Thu, Mar 26, 2015 at 09:00:33PM +0000, Wiles, Keith wrote:
> >
> > On 3/26/15, 1:10 PM, "Zoltan Kiss" <zoltan.kiss at linaro.org> wrote:
> >
> > >The current way is not the most efficient: if m->refcnt is 1, the second
> > >condition never evaluates, and we set it to 0. If refcnt > 1, the 2nd
> > >condition fails again, although the code suggest otherwise to branch
> > >prediction. Instead we should keep the second condition only, and remove
> > >the duplicate set to zero.
> > >
> > >Signed-off-by: Zoltan Kiss <zoltan.kiss at linaro.org>
> > >---
> > > lib/librte_mbuf/rte_mbuf.h | 5 +----
> > > 1 file changed, 1 insertion(+), 4 deletions(-)
> > >
> > >diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > >index 17ba791..3ec4024 100644
> > >--- a/lib/librte_mbuf/rte_mbuf.h
> > >+++ b/lib/librte_mbuf/rte_mbuf.h
> > >@@ -764,10 +764,7 @@ __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > {
> > > 	__rte_mbuf_sanity_check(m, 0);
> > >
> > >-	if (likely (rte_mbuf_refcnt_read(m) == 1) ||
> > >-			likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
> > >-
> > >-		rte_mbuf_refcnt_set(m, 0);
> > >+	if (likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
> > >
> > > 		/* if this is an indirect mbuf, then
> > > 		 * - detach mbuf
> >
> > I fell for this one too, but read Bruce's email
> > http://dpdk.org/ml/archives/dev/2015-March/014481.html
>
> This is still the right thing to do though, Bruce's reasoning is erroneous.
No, it is not. I believe Bruce's comment is absolutely correct here.

> Just because the return from rte_mbuf_refcnt_read returns 1, doesn't mean you

It does.

> are the last user of the mbuf, you are only guaranteed that if the update
> operation returns zero.
>
> In other words:
> rte_mbuf_refcnt_update(m, -1)
>
> is an atomic operation
>
> if (likely (rte_mbuf_refcnt_read(m) == 1) ||
>     likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
>
> is not.
>
> To illustrate, on two cpus, this might occur:
>
> CPU0                      CPU1
> rte_mbuf_refcnt_read      ...
> returns 1                 rte_mbuf_refcnt_read
> ...                       returns 1
> execute if clause         execute if clause

If you have an mbuf with refcnt==N and try to call free() for it N+1 times,
it is a bug in your code. Such code wouldn't work properly no matter whether
we use:

if (likely (rte_mbuf_refcnt_read(m) == 1) ||
    likely (rte_mbuf_refcnt_update(m, -1) == 0))

or just:

if (likely (rte_mbuf_refcnt_update(m, -1) == 0))

To illustrate it with your example, suppose m1.refcnt==1.

CPU0 executes:

rte_pktmbuf_free(m1);
/* rte_mbuf_refcnt_update(m1, -1) returns 0, so we reset its refcnt and
 * next pointer and put the mbuf back to the pool. */

m2 = rte_pktmbuf_alloc(pool);
/* As m1 is 'free', alloc could return the same mbuf here, i.e. m2 == m1.
 * m2 refcnt == 1; start using m2. */

CPU1 executes:

rte_pktmbuf_free(m1);
/* rte_mbuf_refcnt_update(m1, -1) returns 0, so we reset its refcnt and
 * next pointer and put the mbuf back to the pool. */

We have just returned to the pool an mbuf that is still in use, and caused
silent corruption of the mbuf's content.

> In the above scenario both cpus fell into the if clause because they both
> held a pointer to the same buffer and both got a return value of one, so
> they skipped the update portion of the if clause and both executed the
> internal block of the conditional expression.
> You might be tempted to think that's ok, since that block just sets the
> refcnt to zero, and doing so twice isn't harmful, but the entire purpose
> of that if conditional above was to ensure that only one execution context
> ever executed the conditional for a given buffer. Look at what else happens
> in that conditional:
>
> static inline struct rte_mbuf* __attribute__((always_inline))
> __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> {
> 	__rte_mbuf_sanity_check(m, 0);
>
> 	if (likely (rte_mbuf_refcnt_read(m) == 1) ||
> 			likely (rte_mbuf_refcnt_update(m, -1) == 0)) {
>
> 		rte_mbuf_refcnt_set(m, 0);
>
> 		/* if this is an indirect mbuf, then
> 		 * - detach mbuf
> 		 * - free attached mbuf segment
> 		 */
> 		if (RTE_MBUF_INDIRECT(m)) {
> 			struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);
> 			rte_pktmbuf_detach(m);
> 			if (rte_mbuf_refcnt_update(md, -1) == 0)
> 				__rte_mbuf_raw_free(md);
> 		}
> 		return(m);
> 	}
> 	return (NULL);
> }
>
> If the buffer is indirect, another refcnt update occurs to the buf_addr
> mbuf, and in the scenario I outlined above, that refcnt will underflow,
> likely causing a buffer leak. Additionally, the return code of this
> function is designed to indicate to the caller if they were the last user
> of the buffer. In the above scenario, two execution contexts will be told
> that they were, which is wrong.
>
> Zoltan's patch is a good fix

I don't think so.

> Acked-by: Neil Horman <nhorman at tuxdriver.com>

NACKed-by: Konstantin Ananyev <konstantin.ananyev at intel.com>