On Wed, Oct 16, 2019 at 11:04 AM Phil Yang (Arm Technology China) <phil.y...@arm.com> wrote: > > > -----Original Message----- > > From: David Marchand <david.march...@redhat.com> > > Sent: Tuesday, October 15, 2019 8:16 PM > > To: Phil Yang (Arm Technology China) <phil.y...@arm.com> > > Cc: tho...@monjalon.net; jer...@marvell.com; Gage Eads > > <gage.e...@intel.com>; dev <dev@dpdk.org>; hemant.agra...@nxp.com; > > Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Gavin Hu (Arm > > Technology China) <gavin...@arm.com>; nd <n...@arm.com> > > Subject: Re: [dpdk-dev] [PATCH v9 1/3] eal/arm64: add 128-bit atomic > > compare exchange > > > > On Tue, Oct 15, 2019 at 1:32 PM Phil Yang (Arm Technology China) > > <phil.y...@arm.com> wrote: > > > > -----Original Message----- > > > > From: David Marchand <david.march...@redhat.com> > > > > If LSE is available, we expose __rte_cas_XX (explicitely) *non* > > > > inlined functions, while without LSE, we expose inlined __rte_ldr_XX > > > > and __rte_stx_XX functions. > > > > So we have a first disparity with non-inlined vs inlined functions > > > > depending on a #ifdef. > > > > You did not comment on the inline / no inline part and I still see > > this in the v10. > > Is this __rte_noinline on the CAS function intentional? > > Apologize for missing this item. Yes, it is to avoid ABI break. > Please check > 5b40ec6b966260e0ff66a8a2c689664f75d6a0e6 ("mempool/octeontx2: fix possible > arm64 ABI break")
Looked at the kernel parts on LSE CAS (thanks for the pointer) but I see inlines are used: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/include/asm/atomic_lse.h#n365?h=v5.4-rc3 What is special in the kernel or in dpdk that makes this different? > > > > > > > > > Then, we have a second disparity with two sets of "apis" depending on > > > > this #ifdef. > > > > > > > > And we expose those sets with a rte_ prefix, meaning people will try > > > > to use them, but those are not part of a public api. > > > > > > > > Can't we do without them ? (see below [2] for a proposal with ldr/stx, > > > > cas should be the same) > > > > > > No, it doesn't work. > > > Because we need to verify the return value at the end of the loop for > > > these > > macros. > > > > Do you mean the return value for the stores? > > It is my bad. I missed the ret option in the macro. This approach works. Ok, thanks for confirming. > > However, I suggest to keep them as static inline functions rather than a > piece of macro in the rte_atomic128_cmp_exchange API. > One reason is APIs name can indicate the memory ordering of these operations. API? Those inlines are not part of a public API and we agree this patch is not about adding 128 bits load/store apis. My proposal gives us small code that looks like: if (ldx_mo == __ATOMIC_RELAXED) __READ_128("ldxp", dst, old); else __READ_128("ldaxp", dst, old); I am not a memory order guru, but with this, I can figure the asm instruction depends on it. And, since we are looking at internals of an implementation, this is mainly for people looking at/maintaining these low level details. > Moreover, it uses the register type to pass the value in the inline function, > so it should not have too much cost comparing with the macro. This is not a problem of cost, this is about hiding architecture details from the final user. If you expose something, you can expect someone will start using it and will complain later if you break it. > I also think these 128bit load and store functions can be used in other > places, once it has been proved valuable in rte_atomic128_cmp_exchange API. > But let's keep them private for the current stage. Yes I agree this could be introduced in the future. > BTW, Linux kernel implemented in the same way. > https://github.com/torvalds/linux/blob/master/arch/arm64/include/asm/atomic_lse.h#L19 Ok kernel exposes its internals, but I think kernel developpers are more vigilant than dpdk developpers on what is part of the public API and what is internal. -- David Marchand