On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote: > The important thing is: if we compile with __aarch64__ as a target: > - Will the compiler emit the intended instructions from the intrinsics > without extra flags?
My testing with GCC and Clang did not require any extra flags. GCC appears to enable it by default for aarch64 [0]. AFAICT this is the case for Clang as well, but that is based on the code and my testing (I couldn't find any documentation for this). > - Can a user on ARM64 ever get a runtime fault if the machine attempts > to execute NEON instructions? IIUC yes, although I'm not sure how likely it is in practice. > "I have been able to compile for > __aarch64__ without __ARM_NEON" doesn't really answer that question -- > what exactly did this entail? Compiling with something like -march=armv8-a+nosimd prevents defining __ARM_NEON. Interestingly, Clang still defines __ARM_NEON__ even when +nosimd is specified. > I took a quick look around at Debian code search, *BSD, Apple, and a > few other places, and I can't find it. Then, I looked at the > discussions around commit 5c7603c318872a42e "Add ARM64 (aarch64) > support to s_lock.h", and the proposed patch [1] only had __aarch64__ > . When it was committed, the platform was vaporware and I suppose we > included "__aarch64" as a prophylactic measure because no other reason > was given. It doesn't seem to exist anywhere, so unless someone can > demonstrate otherwise, I'm going to rip it out soon. This is what I found, too, so +1. I've attached a patch for this. [0] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h index f7cd0f6f20..b14ce832bf 100644 --- a/src/include/port/atomics.h +++ b/src/include/port/atomics.h @@ -63,8 +63,7 @@ * compiler barrier. * */ -#if defined(__arm__) || defined(__arm) || \ - defined(__aarch64__) || defined(__aarch64) +#if defined(__arm__) || defined(__arm) || defined(__aarch64__) #include "port/atomics/arch-arm.h" #elif defined(__i386__) || defined(__i386) || defined(__x86_64__) #include "port/atomics/arch-x86.h" diff --git a/src/include/port/atomics/arch-arm.h b/src/include/port/atomics/arch-arm.h index 9fe8f1b95f..7449f8404a 100644 --- a/src/include/port/atomics/arch-arm.h +++ b/src/include/port/atomics/arch-arm.h @@ -21,7 +21,7 @@ * 64 bit atomics on ARM32 are implemented using kernel fallbacks and thus * might be slow, so disable entirely. On ARM64 that problem doesn't exist. */ -#if !defined(__aarch64__) && !defined(__aarch64) +#if !defined(__aarch64__) #define PG_DISABLE_64_BIT_ATOMICS #else /* @@ -29,4 +29,4 @@ * general purpose register is atomic. */ #define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY -#endif /* __aarch64__ || __aarch64 */ +#endif /* __aarch64__ */ diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h index cc83d561b2..65aa66c598 100644 --- a/src/include/storage/s_lock.h +++ b/src/include/storage/s_lock.h @@ -256,7 +256,7 @@ spin_delay(void) * We use the int-width variant of the builtin because it works on more chips * than other widths. */ -#if defined(__arm__) || defined(__arm) || defined(__aarch64__) || defined(__aarch64) +#if defined(__arm__) || defined(__arm) || defined(__aarch64__) #ifdef HAVE_GCC__SYNC_INT32_TAS #define HAS_TEST_AND_SET @@ -277,7 +277,7 @@ tas(volatile slock_t *lock) * high-core-count ARM64 processors. It seems mostly a wash for smaller gear, * and ISB doesn't exist at all on pre-v7 ARM chips. */ -#if defined(__aarch64__) || defined(__aarch64) +#if defined(__aarch64__) #define SPIN_DELAY() spin_delay() @@ -288,9 +288,9 @@ spin_delay(void) " isb; \n"); } -#endif /* __aarch64__ || __aarch64 */ +#endif /* __aarch64__ */ #endif /* HAVE_GCC__SYNC_INT32_TAS */ -#endif /* __arm__ || __arm || __aarch64__ || __aarch64 */ +#endif /* __arm__ || __arm || __aarch64__ */ /*