Sorry to say, but I see the ARM disclaimer at the bottom even though I requested that the disclaimer not be added to this specific email (there is a trick for that). I need to figure out what went wrong.
--
Ola

On 4 April 2017 at 22:38, Ola Liljedahl <ola.liljed...@arm.com> wrote:
>
>
> On 04/04/2017, 22:14, "Dmitry Eremin-Solenikov"
> <dmitry.ereminsoleni...@linaro.org> wrote:
>
>> On 04.04.2017 21:48, Brian Brooks wrote:
>>> Signed-off-by: Ola Liljedahl <ola.liljed...@arm.com>
>>> Reviewed-by: Brian Brooks <brian.bro...@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
>>> ---
>>>  platform/linux-generic/include/odp_llsc.h | 332 ++++++++++++++++++++++++++++++
>>>  1 file changed, 332 insertions(+)
>>>  create mode 100644 platform/linux-generic/include/odp_llsc.h
>>>
>>> diff --git a/platform/linux-generic/include/odp_llsc.h b/platform/linux-generic/include/odp_llsc.h
>>> new file mode 100644
>>> index 00000000..ea60c54b
>>> --- /dev/null
>>> +++ b/platform/linux-generic/include/odp_llsc.h
>>> @@ -0,0 +1,332 @@
>>> +/* Copyright (c) 2017, ARM Limited
>>> + * All rights reserved.
>>> + *
>>> + * SPDX-License-Identifier:BSD-3-Clause
>>> + */
>>> +
>>> +#ifndef ODP_LLSC_H_
>>> +#define ODP_LLSC_H_
>>> +
>>> +#include <odp_config_internal.h>
>>> +
>>> +/******************************************************************************
>>> + * LL/SC primitives
>>> + *****************************************************************************/
>>> +
>>> +#if defined __ARM_ARCH
>>> +#if __ARM_ARCH == 7 || (__ARM_ARCH == 8 && __ARM_64BIT_STATE == 0)
>>> +static inline void dmb(void)
>>> +{
>>> +	__asm__ volatile("dmb" : : : "memory");
>>> +}
>>> +
>>> +static inline uint32_t ll8(uint8_t *var, int mm)
>>> +{
>>> +	uint8_t old;
>>> +
>>> +	__asm__ volatile("ldrexb %0, [%1]"
>>> +			 : "=&r" (old)
>>> +			 : "r" (var)
>>> +			 : );
>>> +	/* Barrier after an acquiring load */
>>> +	if (mm == __ATOMIC_ACQUIRE)
>>> +		dmb();
>>> +	return old;
>>> +}
>>
>> Hmm, I remember Ola's story about ipfrag and stdatomic not providing
>> enough support for 128-bit atomics. But why do you need to define
>> load/store for 8- and 32-bit variables?
>> Why can you not use the stdatomic
>> interface here?

> The usage here is actually not to perform atomic updates.
>
> Load-exclusive is used in ARMv8 to load the local "monitor" (which is used
> to check for atomicity in load-exclusive/store-exclusive sections).
> When the monitor is "lost" (there is a better formal word for that but I
> don't remember it now) because some other CPU obtained exclusive ownership
> of the cache line in order to write to it, an event is generated. This is
> the event that wakes up the CPU when it is sleeping in WFE (wait for
> event).
>
> This is all explained in the ARMv8 documentation. In ARMv7a, you had to
> perform DSB+SEV in order to signal/wake up waiting CPUs; this is much
> slower (you can still do it on ARMv8 CPUs).

>> Not to mention that the ll/ll8/etc macro names are not _that_ easy to
>> understand without any additional comments. Please expand the names.
>>
>>> +#endif
>>> +
>>> +#if __ARM_ARCH == 8 && __ARM_64BIT_STATE == 1
>>
>> #elif here please.

> Brian, this one is for you! :-)

>>> +static inline void sevl(void)
>>> +{
>>> +#if defined __ARM_ARCH
>>> +	__asm__ volatile("sevl" : : : );
>>> +#endif
>>> +}
>>> +
>>> +static inline void sev(void)
>>> +{
>>> +#if defined __ARM_ARCH
>>> +	__asm__ volatile("sev" : : : "memory");
>>> +#endif
>>> +}
>>> +
>>> +static inline int wfe(void)
>>> +{
>>> +#if defined __ARM_ARCH
>>> +	__asm__ volatile("wfe" : : : "memory");
>>> +#endif
>>> +	return 1;
>>> +}
>>
>> Ugh. And what about other architectures?

> I don't know if any other architectures support something like the ARM
> event mechanism. I have never seen anything like it in e.g. MIPS or POWER.
>
> We could remove the inline functions and have the WFE() macro translate
> directly to the inline asm statement. Better?
>
>>
>>> +
>>> +#ifdef ODP_CONFIG_DMBSTR
>>> +
>>> +#if defined __ARM_ARCH && __ARM_ARCH == 8
>>> +/* Only ARMv8 supports DMB ISHLD */
>>> +/* A load-only barrier is much cheaper than a full barrier */
>>> +#define _odp_release_barrier(ro) \
>>> +do { \
>>> +	if (ro) \
>>> +		__asm__ volatile("dmb ishld" ::: "memory"); \
>>> +	else \
>>> +		__asm__ volatile("dmb ish" ::: "memory"); \
>>> +} while (0)
>>> +#else
>>> +#define _odp_release_barrier(ro) \
>>> +	__atomic_thread_fence(__ATOMIC_RELEASE)
>>> +#endif
>>> +
>>> +#define atomic_store_release(loc, val, ro) \
>>> +do { \
>>> +	_odp_release_barrier(ro); \
>>> +	__atomic_store_n(loc, val, __ATOMIC_RELAXED); \
>>> +} while (0)
>>> +
>>> +#else
>>> +
>>> +#define atomic_store_release(loc, val, ro) \
>>> +	__atomic_store_n(loc, val, __ATOMIC_RELEASE)
>>> +
>>> +#endif
>>> +
>>> +#ifdef ODP_CONFIG_USE_WFE
>>
>> Lack of documentation on those abbreviations is a very, very nice thing.

> Which abbreviations are we talking about here? The different build
> configurations (e.g. ODP_CONFIG_USE_WFE) are described in
> odp_config_internal.h. Do you think there should be some documentation
> here as well?

>>
>>> +#define SEVL() sevl()
>>> +#define WFE() wfe()
>>> +#define SEV() do { __asm__ volatile("dsb ish" ::: "memory"); sev(); } while (0)
>>
>> Soooo, if you need dsb here, why isn't it part of sev()?

> We actually only need SEV (and DSB if there were any stores that need to
> be visible, which is the normal case) on ARMv7a. I assume that we need to
> ensure ARMv7a support at some point (although 32-bit support doesn't
> necessarily mean ARMv7a; ARMv8 AArch32 code can still use ARMv8
> mechanisms). This macro could probably be written in a better (or at least
> different) way, e.g. combining DSB and SEV in one inline asm statement.
>
> So I am a little bit torn here. Remove the ARMv7a stuff until it is really
> needed, or keep it (but possibly change it when it is actually used)?
> Suggestions?
> (now I start to feel that dead-ish code should be removed).

>>
>>> +#if defined __ARM_ARCH && __ARM_ARCH == 8 && __ARM_64BIT_STATE == 1
>>> +#define LDXR128(addr, mo) lld((addr), (mo))
>>> +#endif
>>
>> And who will provide LDXR128 in the other case?

> There are no other cases AFAIK.
> Only ARMv8 supports the event mechanism and 128-bit words.
> On ARMv7a, a double-word (e.g. two pointers) will only be 64 bits. If we
> need to do wait-for-event on e.g. a double-word location, ll64() will
> suffice.
>
> In practice, it might not matter which size (e.g. ll8..ll128) is used in
> the load-exclusive that arms the local monitor, because in practice the
> mechanism is implemented using the cache coherency protocol. Doing a
> load-exclusive on any word in the cache line would probably work, but that
> feels misleading.
>
> On non-ARM architectures, the macros in the #else branch below will be
> used and we will spin-wait the generic way (relaxed loads and whatever
> NOP/pause instruction odp_cpu_pause() translates to).

>>
>>> +#define LDXR64(addr, mo) ll64((addr), (mo))
>>> +#define LDXR32(addr, mo) ll32((addr), (mo))
>>> +#define LDXR8(addr, mo) ll8((addr), (mo))
>>> +/* When using WFE do not stall the pipeline using other means */
>>> +#define DOZE() (void)0
>>> +#else
>>> +#define SEVL() (void)0
>>> +#define WFE() 1
>>> +#define SEV() (void)0
>>> +#define LDXR128(addr, mo) __atomic_load_n((addr), (mo))
>>> +#define LDXR64(addr, mo) __atomic_load_n((addr), (mo))
>>> +#define LDXR32(addr, mo) __atomic_load_n((addr), (mo))
>>> +#define LDXR8(addr, mo) __atomic_load_n((addr), (mo))
>>> +#define DOZE() odp_cpu_pause()
>>> +#endif
>>> +
>>> +#endif
>>
>> --
>> With best wishes
>> Dmitry

> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged.
> If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.