20/10/2020 23:49, Honnappa Nagarahalli:
> <snip>
>
> >
> > Honnappa?
> >
> > 07/10/2020 11:55, Diogo Behrens:
> > > Hi Thomas,
> > >
> > > we are still waiting for the comments from Honnappa. In our
> > > understanding, the missing barrier is a bug according to the model. We
> > > reproduced the scenario in herd7, which represents the authoritative
> > > memory model:
> > > https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
> > >
> > > Here is a litmus code that shows that the XCHG (when compiled to LDAXR
> > > and STLXR) is not atomic wrt memory updates to other locations:
> > > -----
> > > AArch64 XCHG-nonatomic
> > > {
> > > 0:X1=locked; 0:X3=next;
> > > 1:X1=locked; 1:X3=next; 1:X5=tail;
> > > }
> > >  P0           | P1                  ;
> > >  LDR W0, [X3] | MOV W0, #1          ;
> > >  CBZ W0, end  | STR W0, [X1]        ; (* init locked *)
> > >  MOV W2, #2   | MOV W2, #0          ;
> > >  STR W2, [X1] | xchg:               ;
> > >  end:         | LDAXR W6, [X5]      ;
> > >  NOP          | STLXR W4, W0, [X5]  ;
> > >  NOP          | CBNZ W4, xchg       ;
> > >  NOP          | STR W0, [X3]        ; (* set next *)
> > > exists
> > > (0:X2=2 /\ locked=1)
> > > -----
> > > (web version of herd7: http://diy.inria.fr/www/?record=aarch64)
> > >
> > > P1 is trying to acquire the lock:
> > > - initializes locked
> > > - does the xchg on the tail of the mcslock
> > > - sets the next
> > >
> > > P0 is releasing the lock:
> > > - if next is not set, just terminates
> > > - if next is set, stores 2 in locked
> > >
> > > The initialization of locked should never overwrite the store of 2 to
> > > locked, but it does.
> > > To avoid that reordering, one should make the last store of P1 have a
> > > "release" barrier, i.e., STLR.
> > >
> > > This is equivalent to the reordering occurring in the mcslock of
> > > librte_eal.
> > >
> > > Best regards,
> > > -Diogo
> > >
> > > -----Original Message-----
> > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > Sent: Tuesday, October 6, 2020 11:50 PM
> > > To: Phil Yang <phil.y...@arm.com>; Diogo Behrens
> > > <diogo.behr...@huawei.com>; Honnappa Nagarahalli
> > > <honnappa.nagaraha...@arm.com>
> > > Cc: dev@dpdk.org; nd <n...@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH] librte_eal: fix mcslock hang on weak
> > > memory
> > >
> > > 31/08/2020 20:45, Honnappa Nagarahalli:
> > > >
> > > > Hi Diogo,
> > > >
> > > > Thanks for your explanation.
> > > >
> > > > As documented in
> > > > https://developer.arm.com/documentation/ddi0487/fc B2.9.5
> > > > Load-Exclusive and Store-Exclusive instruction usage restrictions:
> > > > "Between the Load-Exclusive and the Store-Exclusive, there are no
> > > > explicit memory accesses, preloads, direct or indirect System
> > > > register writes, address translation instructions, cache or TLB
> > > > maintenance instructions, exception generating instructions,
> > > > exception returns, or indirect branches."
> > > > [Honnappa] This is a requirement on the software, not on the
> > > > micro-architecture.
> > > > We are having a few discussions internally, will get back soon.
> > > >
> > > > So it is not allowed to insert (1) & (4) between (2, 3). The cmpxchg
> > > > operation is atomic.
> > >
> > > Please, what is the conclusion?
> Apologies for not updating on this sooner.
>
> Unfortunately, memory ordering questions are hard topics. I have been
> discussing this internally with a few experts and it is still ongoing, hope
> to conclude soon.
>
> My focus has been to replace __atomic_exchange_n(msl, me, __ATOMIC_ACQ_REL)
> with __atomic_exchange_n(msl, me, __ATOMIC_SEQ_CST). However, the generated
> code is the same in the second case as well (for load-store exclusives),
> and I am not sure whether that is correct.
>
> I think we have 2 choices here:
> 1) Accept the patch - when my internal discussion concludes, I can make the
> change and backport according to the conclusion.
> 2) Wait till the discussion is over - it might take another couple of weeks

One month has passed since this last update.
I guess we are keeping this issue in DPDK 20.11.0.
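
For readers following the thread, here is a minimal C11 sketch of the ordering Diogo describes. It is not the librte_eal code and not the patch under discussion; the names mcs_node_t, mcs_lock_t, mcs_lock and mcs_unlock are hypothetical. The store that links the new node into its predecessor ("set next" in the litmus test) is written as a release store, on the assumption that this is what keeps the earlier initialization of locked from becoming visible after the node is reachable through next.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration only (not the DPDK definitions). */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_int locked;
} mcs_node_t;

typedef _Atomic(mcs_node_t *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *tail, mcs_node_t *me)
{
    /* "init locked": prepare our own node before publishing it. */
    atomic_store_explicit(&me->locked, 1, memory_order_relaxed);
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);

    /* "xchg": swap ourselves in as the new tail of the queue. */
    mcs_node_t *prev = atomic_exchange_explicit(tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty, the lock is ours */

    /* "set next": release store, so the initialization of me->locked
     * above is ordered before the predecessor can reach this node. */
    atomic_store_explicit(&prev->next, me, memory_order_release);

    /* Spin until the predecessor hands the lock over. */
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

static void mcs_unlock(mcs_lock_t *tail, mcs_node_t *me)
{
    /* Acquire load pairs with the release store to prev->next above. */
    mcs_node_t *next = atomic_load_explicit(&me->next, memory_order_acquire);
    if (next == NULL) {
        mcs_node_t *expected = me;
        /* No visible successor: try to reset the tail to empty. */
        if (atomic_compare_exchange_strong_explicit(tail, &expected, NULL,
                memory_order_release, memory_order_relaxed))
            return;
        /* A successor did the xchg but has not set next yet: wait. */
        while ((next = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand over the lock (the store by P0 in the litmus test). */
    atomic_store_explicit(&next->locked, 0, memory_order_release);
}
```

With the release store paired with the acquire load of next in mcs_unlock, the successor's own locked = 1 is ordered before the hand-off store. If that store were relaxed, the C11 model would give no such ordering, which matches the outcome the litmus test in Diogo's message reports for AArch64.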