Hi,
On 2017/10/19 18:02, Ananyev, Konstantin wrote:
Hi Jia,
Hi
On 10/13/2017 9:02 AM, Jia He Wrote:
Hi Jerin
On 10/13/2017 1:23 AM, Jerin Jacob Wrote:
-----Original Message-----
Date: Thu, 12 Oct 2017 17:05:50 +0000
[...]
On the same lines,
Jia He, jie2.liu, bing.zhao,
Is this patch based on code review, or did you see this issue on any
arm/ppc target? arm64 will have a performance impact with this change.
Sorry, I missed one important piece of information.
Our platform is an aarch64 server with 46 CPUs.
If we reduce the number of CPUs involved, the bug occurs less frequently.
Yes, the mb barrier impacts performance, but correctness is more
important, isn't it ;-)
Maybe we can find some other lightweight barrier here?
Cheers,
Jia
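To make the "lightweight barrier" question concrete, here is a minimal sketch in portable C11 atomics, not the rte_ring code and not the patch under discussion. It assumes the concern is the ordering of two index loads on a weakly ordered CPU such as aarch64, and it uses an acquire fence instead of a full barrier. The struct, field, and function names (sketch_ring, prod_tail, cons_head, sketch_cons_avail) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified ring indices; this is not the rte_ring layout. */
struct sketch_ring {
	_Atomic uint32_t prod_tail; /* last slot published by producers */
	_Atomic uint32_t cons_head; /* next slot a consumer will claim */
};

/*
 * Return how many entries a consumer may claim (at most `want`) and
 * store the observed consumer head in *old_head.
 */
static uint32_t
sketch_cons_avail(struct sketch_ring *r, uint32_t want, uint32_t *old_head)
{
	uint32_t head = atomic_load_explicit(&r->cons_head,
					     memory_order_relaxed);

	/*
	 * Acquire fence: the prod_tail load below may not be reordered
	 * before the cons_head load above. This is weaker (and usually
	 * cheaper) than a full barrier, which also orders stores.
	 */
	atomic_thread_fence(memory_order_acquire);

	uint32_t tail = atomic_load_explicit(&r->prod_tail,
					     memory_order_relaxed);
	uint32_t entries = tail - head; /* unsigned wraparound is intended */

	*old_head = head;
	return want < entries ? want : entries;
}
```

Whether a load-ordering fence like this is sufficient for the failure reported here would still have to be confirmed on the aarch64 machine; the sketch only shows what a lighter alternative to a full mb could look like.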
Based on mbuf_autotest, the rte_panic will be invoked in seconds.
PANIC in test_refcnt_iter():
(lcore=0, iter=0): after 10s only 61 of 64 mbufs left free
1: [./test(rte_dump_stack+0x38) [0x58d868]]
Aborted (core dumped)
So is it only reproducible with mbuf refcnt test?
Could it be reproduced with some 'pure' ring test
(no mempools/mbufs refcnt, etc.)?
The reason I am asking - in that test we also have mbuf refcnt updates
(that's what that test was created for) and we are doing some optimizations
there too, to avoid excessive atomic updates.
BTW, if the problem is not reproducible without mbuf refcnt,
can I suggest to extend the test with:
- add a check that enqueue() operation was successful
- walk through the pool and check/printf refcnt of each mbuf.
Hopefully that would give us some extra information about what is going wrong here.
Konstantin
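The two checks suggested above could look roughly like the following standalone sketch. It deliberately uses hypothetical stand-in types (sketch_mbuf, sketch_queue) rather than the real test code; in the actual test the same idea would presumably be built on rte_ring_enqueue(), which returns 0 on success, and on rte_mempool_obj_iter() with rte_mbuf_refcnt_read() for the pool walk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an mbuf: only the reference count matters. */
struct sketch_mbuf {
	uint16_t refcnt;
};

/* Hypothetical fixed-size queue standing in for the ring. */
#define SKETCH_QSZ 4
struct sketch_queue {
	void *slot[SKETCH_QSZ];
	unsigned count;
};

/* Checked enqueue: returns 0 on success, -1 when the queue is full. */
static int
sketch_enqueue(struct sketch_queue *q, void *obj)
{
	if (q->count == SKETCH_QSZ)
		return -1;
	q->slot[q->count++] = obj;
	return 0;
}

/*
 * Walk the pool, print the refcnt of every mbuf, and return how many
 * of them differ from the expected value.
 */
static unsigned
sketch_dump_refcnt(const struct sketch_mbuf *pool, unsigned n,
		   uint16_t expected)
{
	unsigned bad = 0;

	for (unsigned i = 0; i < n; i++) {
		int mismatch = pool[i].refcnt != expected;

		printf("mbuf[%u]: refcnt=%u%s\n", i,
		       (unsigned)pool[i].refcnt,
		       mismatch ? " (unexpected)" : "");
		bad += mismatch;
	}
	return bad;
}
```

A caller would treat a non-zero return from the enqueue as a test failure instead of ignoring it, and would run the dump when the "mbufs left free" check times out, so the log shows which mbufs are still referenced.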
Currently, the issue has only been found in this case, on the ARM
platform; we are not sure how it behaves on the x86_64 platform. In another
mail of this thread, we made a simple test based on this, captured
some information, and pasted it there (along with the patch :-)). And
it seems that Juhamatti & Jacob found a revert related to this several
months ago.
BR. Bing