On 04/04/2017, 22:14, "Dmitry Eremin-Solenikov"
<dmitry.ereminsoleni...@linaro.org> wrote:

>On 04.04.2017 21:48, Brian Brooks wrote:
>> Signed-off-by: Ola Liljedahl <ola.liljed...@arm.com>
>> Reviewed-by: Brian Brooks <brian.bro...@arm.com>
>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
>> ---
>>  platform/linux-generic/include/odp_llsc.h | 332 ++++++++++++++++++++++++++++++
>>  1 file changed, 332 insertions(+)
>>  create mode 100644 platform/linux-generic/include/odp_llsc.h
>>
>> diff --git a/platform/linux-generic/include/odp_llsc.h b/platform/linux-generic/include/odp_llsc.h
>> new file mode 100644
>> index 00000000..ea60c54b
>> --- /dev/null
>> +++ b/platform/linux-generic/include/odp_llsc.h
>> @@ -0,0 +1,332 @@
>> +/* Copyright (c) 2017, ARM Limited
>> + * All rights reserved.
>> + *
>> + * SPDX-License-Identifier:BSD-3-Clause
>> + */
>> +
>> +#ifndef ODP_LLSC_H_
>> +#define ODP_LLSC_H_
>> +
>> +#include <odp_config_internal.h>
>> +
>>
>> +/******************************************************************************
>> + * LL/SC primitives
>> + *****************************************************************************/
>> +
>> +#if defined __ARM_ARCH
>> +#if __ARM_ARCH == 7 || (__ARM_ARCH == 8 && __ARM_64BIT_STATE == 0)
>> +static inline void dmb(void)
>> +{
>> +	__asm__ volatile("dmb" : : : "memory");
>> +}
>> +
>> +static inline uint32_t ll8(uint8_t *var, int mm)
>> +{
>> +	uint8_t old;
>> +
>> +	__asm__ volatile("ldrexb %0, [%1]"
>> +			 : "=&r" (old)
>> +			 : "r" (var)
>> +			 : );
>> +	/* Barrier after an acquiring load */
>> +	if (mm == __ATOMIC_ACQUIRE)
>> +		dmb();
>> +	return old;
>> +}
>
>
>Hmm, I remember Ola's story about ipfrag and stdatomic not providing
>enough support for 128-bit atomics. But why do you need to define
>load/store for 8- and 32-bit variables? Why can not you use stdatomic
>interface here?
The usage here is actually not to perform atomic updates.

Load-exclusive is used on ARMv8 to arm the local "monitor" (which is used
to check for atomicity in load-exclusive/store-exclusive sections).
When the monitor is "cleared" (the formal term) because some other CPU
obtained exclusive ownership of the cache line in order to write to it,
an event is generated. This is the event that wakes up a CPU that is
sleeping in WFE (wait for event).

This is all explained in the ARMv8 documentation. On ARMv7a, you had to
perform DSB+SEV in order to signal/wake up waiting CPUs; this is much
slower (you can still do it on ARMv8 CPUs).
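The pattern described above can be sketched roughly like this (the function name and shape are mine, not code from the patch; the non-ARM branch is a plain relaxed/acquire spin as a stand-in):

```c
#include <stdint.h>

/* Sketch of the monitor + WFE wait pattern described above.
 * On AArch64, the load-exclusive (LDXR) arms the local monitor,
 * so a write by another CPU to the monitored cache line generates
 * the event that wakes us from WFE. Elsewhere we just spin. */
static inline uint32_t wait_until_set(uint32_t *addr)
{
	uint32_t val;
#if defined __ARM_ARCH && __ARM_ARCH == 8 && __ARM_64BIT_STATE == 1
	/* Prime the event register so the first WFE falls through */
	__asm__ volatile("sevl");
	do {
		__asm__ volatile("wfe" ::: "memory");
		/* Load-exclusive: reads the value and arms the monitor */
		__asm__ volatile("ldxr %w0, [%1]"
				 : "=&r" (val)
				 : "r" (addr)
				 : "memory");
	} while (val == 0);
#else
	/* Generic fallback: ordinary atomic spin loop */
	while ((val = __atomic_load_n(addr, __ATOMIC_ACQUIRE)) == 0)
		;
#endif
	return val;
}
```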

>
>Not to mention that ll/ll8/etc macto names are not _that_ easy to
>understand without any additional comments. Please expand the names.
>
>> +#endif
>> +
>> +#if __ARM_ARCH == 8 && __ARM_64BIT_STATE == 1
>
>#elif here please.
Brian, this one is for you! :-)

>
>> +static inline void sevl(void)
>> +{
>> +#if defined __ARM_ARCH
>> +	__asm__ volatile("sevl" : : : );
>> +#endif
>> +}
>> +
>> +static inline void sev(void)
>> +{
>> +#if defined __ARM_ARCH
>> +	__asm__ volatile("sev" : : : "memory");
>> +#endif
>> +}
>> +
>> +static inline int wfe(void)
>> +{
>> +#if defined __ARM_ARCH
>> +	__asm__ volatile("wfe" : : : "memory");
>> +#endif
>> +	return 1;
>> +}
>
>Ugh. And what about other architectures?
I don't know of any other architecture that supports something like the ARM
event mechanism. I have never seen anything like it in e.g. MIPS or POWER.

We could remove the inline functions and have the WFE() macro translate
directly to the inline asm statement. Better?
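That alternative could look like this (a sketch using a GCC statement expression so WFE() still yields a value for use in loop conditions; this is my suggestion, not code from the patch):

```c
/* Sketch: WFE() expanding directly to the inline asm via a GCC
 * statement expression, instead of going through an inline function.
 * On non-ARM targets it just evaluates to 1, as in the patch. */
#if defined __ARM_ARCH
#define WFE() ({ __asm__ volatile("wfe" ::: "memory"); 1; })
#else
#define WFE() 1
#endif
```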

>
>> +
>> +#ifdef ODP_CONFIG_DMBSTR
>> +
>> +#if defined __ARM_ARCH && __ARM_ARCH == 8
>> +/* Only ARMv8 supports DMB ISHLD */
>> +/* A load only barrier is much cheaper than full barrier */
>> +#define _odp_release_barrier(ro) \
>> +do {							\
>> +	if (ro)						\
>> +		__asm__ volatile("dmb ishld" ::: "memory"); \
>> +	else						\
>> +		__asm__ volatile("dmb ish" ::: "memory");   \
>> +} while (0)
>> +#else
>> +#define _odp_release_barrier(ro) \
>> +	__atomic_thread_fence(__ATOMIC_RELEASE)
>> +#endif
>> +
>> +#define atomic_store_release(loc, val, ro)		\
>> +do {							\
>> +	_odp_release_barrier(ro);			\
>> +	__atomic_store_n(loc, val, __ATOMIC_RELAXED);	\
>> +} while (0)
>> +
>> +#else
>> +
>> +#define atomic_store_release(loc, val, ro) \
>> +__atomic_store_n(loc, val, __ATOMIC_RELEASE)
>> +
>> +#endif
>> +
>> +#ifdef ODP_CONFIG_USE_WFE
>
>Lack of documentation on those abbreviations is a very, very nice thing.
Which abbreviations are we talking about here? The different build
configurations (e.g. ODP_CONFIG_USE_WFE) are described in
odp_config_internal.h. Do you think there should be some documentation
here as well?

>
>> +#define SEVL() sevl()
>> +#define WFE() wfe()
>> +#define SEV() do { __asm__ volatile("dsb ish" ::: "memory"); sev(); } while (0)
>
>Soooo, if you need dsb here, why isn't it a part of sev()?
We actually only need SEV (and DSB, if there were any stores that need to
be visible, which is the normal case) on ARMv7a. I assume that we need to
ensure ARMv7a support at some point (although 32-bit support doesn't
necessarily mean ARMv7a; ARMv8 AArch32 code can still use ARMv8
mechanisms). This macro could probably be written in a better (or at least
different) way, e.g. combining DSB and SEV in one inline asm statement.

So I am a little bit torn here. Remove the ARMv7a stuff until it is really
needed, or keep it (but possibly change it when it is actually used)?
Suggestions? (Now I start to feel that dead-ish code should be removed.)
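Combining them could look roughly like this (a sketch; the function name and the non-ARM fence are stand-ins of mine, not from the patch):

```c
/* Sketch: DSB + SEV folded into a single asm statement, so the
 * compiler cannot place anything between the barrier and the event.
 * ARMv8 can use the weaker "dsb ish"; ARMv7a uses a full "dsb".
 * On non-ARM targets a full fence stands in for illustration. */
static inline void barrier_and_signal(void)
{
#if defined __ARM_ARCH && __ARM_ARCH == 8
	__asm__ volatile("dsb ish\n\tsev" ::: "memory");
#elif defined __ARM_ARCH
	__asm__ volatile("dsb\n\tsev" ::: "memory");
#else
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}
```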

>
>> +#if defined __ARM_ARCH && __ARM_ARCH == 8 && __ARM_64BIT_STATE == 1
>> +#define LDXR128(addr, mo) lld((addr), (mo))
>> +#endif
>
>And who will provide LDXR128 in the other case?
There are no other cases AFAIK.
Only ARMv8 supports the event mechanism and 128-bit words.
On ARMv7a, a double-word (e.g. two pointers) will only be 64-bits. If we
need to do wait-for-event on e.g. a double-word location, ll64() will
suffice.

In practice, it might not matter which size (e.g. ll8..ll128) is used in
the load-exclusive in order to arm the local monitor, because the
mechanism is typically implemented using the cache coherency protocol.
Doing a load-exclusive on any word in the cache line would probably work,
but that feels misleading.

On non-ARM architectures, the macros in the #else branch below will be
used and we will spin-wait the generic way (relaxed loads plus whatever
NOP/pause instruction odp_cpu_pause() translates to).
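On such architectures the wait loop effectively degenerates to the following (a sketch of the #else path; the loop function and the no-op DOZE() stand-in are mine):

```c
#include <stdint.h>

/* Sketch of the generic (non-WFE) path: SEVL()/WFE() compile away,
 * LDXR32 becomes a plain atomic load, and DOZE() is the pause hint
 * (here a no-op standing in for odp_cpu_pause()). */
#define SEVL()           (void)0
#define WFE()            1
#define LDXR32(addr, mo) __atomic_load_n((addr), (mo))
#define DOZE()           (void)0

static inline uint32_t spin_until_set(uint32_t *flag)
{
	uint32_t val;

	SEVL();
	while (WFE() && (val = LDXR32(flag, __ATOMIC_ACQUIRE)) == 0)
		DOZE();
	return val;
}
```

The same source loop thus works in both configurations: with ODP_CONFIG_USE_WFE the macros expand to the event mechanism, without it to a conventional pause-based spin.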

>
>> +#define LDXR64(addr, mo) ll64((addr), (mo))
>> +#define LDXR32(addr, mo) ll32((addr), (mo))
>> +#define LDXR8(addr, mo) ll8((addr), (mo))
>> +/* When using WFE do not stall the pipeline using other means */
>> +#define DOZE() (void)0
>> +#else
>> +#define SEVL() (void)0
>> +#define WFE() 1
>> +#define SEV() (void)0
>> +#define LDXR128(addr, mo) __atomic_load_n((addr), (mo))
>> +#define LDXR64(addr, mo) __atomic_load_n((addr), (mo))
>> +#define LDXR32(addr, mo) __atomic_load_n((addr), (mo))
>> +#define LDXR8(addr, mo) __atomic_load_n((addr), (mo))
>> +#define DOZE() odp_cpu_pause()
>> +#endif
>> +
>> +#endif
>
>
>--
>With best wishes
>Dmitry
