On Tue, Apr 4, 2017 at 3:38 PM, Ola Liljedahl <ola.liljed...@arm.com> wrote:
>
> On 04/04/2017, 22:14, "Dmitry Eremin-Solenikov"
> <dmitry.ereminsoleni...@linaro.org> wrote:
>
> >On 04.04.2017 21:48, Brian Brooks wrote:
> >> Signed-off-by: Ola Liljedahl <ola.liljed...@arm.com>
> >> Reviewed-by: Brian Brooks <brian.bro...@arm.com>
> >> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> >> ---
> >>  platform/linux-generic/include/odp_llsc.h | 332 ++++++++++++++++++++++++++++++
> >>  1 file changed, 332 insertions(+)
> >>  create mode 100644 platform/linux-generic/include/odp_llsc.h
> >>
> >> diff --git a/platform/linux-generic/include/odp_llsc.h b/platform/linux-generic/include/odp_llsc.h
> >> new file mode 100644
> >> index 00000000..ea60c54b
> >> --- /dev/null
> >> +++ b/platform/linux-generic/include/odp_llsc.h
> >> @@ -0,0 +1,332 @@
> >> +/* Copyright (c) 2017, ARM Limited
> >> + * All rights reserved.
> >> + *
> >> + * SPDX-License-Identifier:BSD-3-Clause
> >> + */
> >> +
> >> +#ifndef ODP_LLSC_H_
> >> +#define ODP_LLSC_H_
> >> +
> >> +#include <odp_config_internal.h>
> >> +
> >>
> >>+/***********************************************************************
> >>*******
> >> + * LL/SC primitives
> >> +
> >>*************************************************************************
> >>****/
> >> +
> >> +#if defined __ARM_ARCH
> >> +#if __ARM_ARCH == 7 || (__ARM_ARCH == 8 && __ARM_64BIT_STATE == 0)
> >> +static inline void dmb(void)
> >> +{
> >> +__asm__ volatile("dmb" : : : "memory");
> >> +}
> >> +
> >> +static inline uint32_t ll8(uint8_t *var, int mm)
> >> +{
> >> +uint8_t old;
> >> +
> >> +__asm__ volatile("ldrexb %0, [%1]"
> >> + : "=&r" (old)
> >> + : "r" (var)
> >> + : );
> >> +/* Barrier after an acquiring load */
> >> +if (mm == __ATOMIC_ACQUIRE)
> >> +dmb();
> >> +return old;
> >> +}
> >
> >
> >Hmm, I remember Ola's story about ipfrag and stdatomic not providing
> >enough support for 128-bit atomics. But why do you need to define
> >load/store for 8- and 32-bit variables? Why can not you use stdatomic
> >interface here?
> The usage here is actually not to perform atomic updates.
>
> Load-exclusive is used in ARMv8 to load the local "monitor" (which is used
> to check for atomicity in load-exclusive/store-exclusive sections).
> When the monitor is "lost" (there is a better formal word for that but I
> don't remember it now) because some other CPU obtained exclusive ownership
> of the cache line in order to write to it, an event is generated. This is
> the event that wakes up the CPU when it is sleeping in WFE (wait for
> event).
>
> This is all explained in the ARMv8 documentation. In ARMv7a, you had to
> perform DSB+SEV in order to signal/wake up waiting CPUs, this is much
> slower (you can still do it on ARMv8 CPUs).
>
> >
> >Not to mention that ll/ll8/etc macro names are not _that_ easy to
> >understand without any additional comments. Please expand the names.
> >
> >> +#endif
> >> +
> >> +#if __ARM_ARCH == 8 && __ARM_64BIT_STATE == 1
> >
> >#elif here please.
> Brian this one is for you! :-)

I am not sure where you are requesting the #elif, Dmitry. The first
block is for ARMv7 and AArch32, and the second block is for AArch64.
Each block is wrapped in a #if XYZ ... #endif.  It's symmetrical.
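(A stripped-down sketch of the guard layout being discussed -- two independent, symmetrical `#if ... #endif` blocks rather than an `#if`/`#elif` chain; the comments are placeholders, not the patch's code:)

```c
/* AArch32 (ARMv7, or ARMv8 in 32-bit state): ldrexb/ldrex/ldrexd */
#if defined(__ARM_ARCH) && \
	(__ARM_ARCH == 7 || (__ARM_ARCH == 8 && __ARM_64BIT_STATE == 0))
/* ... ll8/ll32/ll64 definitions ... */
#endif

/* AArch64: ldxrb/ldxr/ldxp */
#if defined(__ARM_ARCH) && __ARM_ARCH == 8 && __ARM_64BIT_STATE == 1
/* ... ll8/ll32/ll64/lld definitions ... */
#endif
```

Since the two conditions are mutually exclusive, the blocks behave like an `#if`/`#elif` pair either way; the difference is purely stylistic.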

> >
> >> +static inline void sevl(void)
> >> +{
> >> +#if defined __ARM_ARCH
> >> +__asm__ volatile("sevl" : : : );
> >> +#endif
> >> +}
> >> +
> >> +static inline void sev(void)
> >> +{
> >> +#if defined __ARM_ARCH
> >> +__asm__ volatile("sev" : : : "memory");
> >> +#endif
> >> +}
> >> +
> >> +static inline int wfe(void)
> >> +{
> >> +#if defined __ARM_ARCH
> >> +__asm__ volatile("wfe" : : : "memory");
> >> +#endif
> >> +return 1;
> >> +}
> >
> >Ugh. And what about other architectures?
> I don't know if any other architectures support something like the ARM
> event mechanism. I have never seen anything like it in e.g. MIPS or POWER.
>
> We could remove the inline functions and have the WFE() macro translate
> directly to the inline asm statement. Better?
>
> >
> >> +
> >> +#ifdef ODP_CONFIG_DMBSTR
> >> +
> >> +#if defined __ARM_ARCH && __ARM_ARCH == 8
> >> +/* Only ARMv8 supports DMB ISHLD */
> >> +/* A load only barrier is much cheaper than full barrier */
> >> +#define _odp_release_barrier(ro) \
> >> +do {     \
> >> +if (ro)     \
> >> +__asm__ volatile("dmb ishld" ::: "memory");  \
> >> +else     \
> >> +__asm__ volatile("dmb ish" ::: "memory");    \
> >> +} while (0)
> >> +#else
> >> +#define _odp_release_barrier(ro) \
> >> +__atomic_thread_fence(__ATOMIC_RELEASE)
> >> +#endif
> >> +
> >> +#define atomic_store_release(loc, val, ro)\
> >> +do {\
> >> +_odp_release_barrier(ro);\
> >> +__atomic_store_n(loc, val, __ATOMIC_RELAXED);   \
> >> +} while (0)
> >> +
> >> +#else
> >> +
> >> +#define atomic_store_release(loc, val, ro) \
> >> +__atomic_store_n(loc, val, __ATOMIC_RELEASE)
> >> +
> >> +#endif
> >> +
> >> +#ifdef ODP_CONFIG_USE_WFE
> >
> >Lack of documentation on those abbreviations is a very, very nice thing.
> Which abbreviations are we talking about here? The different build
> configurations (e.g. ODP_CONFIG_USE_WFE) are described in
> odp_config_internal.h. Do you think there should be some documentation
> here as well?
>
> >
> >> +#define SEVL() sevl()
> >> +#define WFE() wfe()
> >> +#define SEV() do { __asm__ volatile("dsb ish" ::: "memory"); sev(); } while (0)
> >
> >Soooo, if you need dsb here, why isn't it a part of sev()?
> We actually only need SEV (and DSB if there were any stores that need to
> be visible which is the normal case) on ARMv7a. I assume that we need to
> ensure ARMv7a support at some time (although 32-bit support doesn't
> necessarily mean ARMv7a, ARMv8 AArch32 code can still use ARMv8
> mechanisms). This macro could probably be written in a better (or at least
> different) way, e.g. combine DSB and SEV in one inline asm statement.
>
> So I am a little bit torn here. Remove the ARMv7a stuff until it is really
> needed or keep it (but possibly change it when it is actually used)?
> Suggestions? (now I start to feel that dead-ish code should be removed).
>
> >
> >> +#if defined __ARM_ARCH && __ARM_ARCH == 8 && __ARM_64BIT_STATE == 1
> >> +#define LDXR128(addr, mo) lld((addr), (mo))
> >> +#endif
> >
> >And who will provide LDXR128 in the other case?
> There are no other cases AFAIK.
> Only ARMv8 supports the event mechanism and 128-bit words.
> On ARMv7a, a double-word (e.g. two pointers) will only be 64-bits. If we
> need to do wait-for-event on e.g. a double-word location, ll64() will
> suffice.
>
> In practice, it might not matter which size (e.g. ll8..ll128) is used in
> load-exclusive in order to arm the local monitor because in practice the
> mechanism is implemented using the cache coherency protocol. Doing a
> load-exclusive on any word in the cache line would probably work but that
> feels misleading.
>
> On non-ARM architectures, the macros in the #else branch below will be
> used and we will spin wait the generic way (relaxed loads and using
> whatever NOP/pause instruction odp_cpu_pause() translates to).
>
> >
> >> +#define LDXR64(addr, mo) ll64((addr), (mo))
> >> +#define LDXR32(addr, mo) ll32((addr), (mo))
> >> +#define LDXR8(addr, mo) ll8((addr), (mo))
> >> +/* When using WFE do not stall the pipeline using other means */
> >> +#define DOZE() (void)0
> >> +#else
> >> +#define SEVL() (void)0
> >> +#define WFE() 1
> >> +#define SEV() (void)0
> >> +#define LDXR128(addr, mo) __atomic_load_n((addr), (mo))
> >> +#define LDXR64(addr, mo) __atomic_load_n((addr), (mo))
> >> +#define LDXR32(addr, mo) __atomic_load_n((addr), (mo))
> >> +#define LDXR8(addr, mo) __atomic_load_n((addr), (mo))
> >> +#define DOZE() odp_cpu_pause()
> >> +#endif
> >> +
> >> +#endif
> >
> >
> >--
> >With best wishes
> >Dmitry
>
