On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote:
> The important thing is: if we compile with __aarch64__ as a target:
> - Will the compiler emit the intended instructions from the intrinsics
> without extra flags?

My testing with GCC and Clang did not require any extra flags.  GCC appears
to enable it by default for aarch64 [0].  AFAICT this is the case for Clang
as well, but that is based on the code and my testing (I couldn't find any
documentation for this).

> - Can a user on ARM64 ever get a runtime fault if the machine attempts
> to execute NEON instructions?

IIUC yes, although I'm not sure how likely it is in practice.

> "I have been able to compile for
> __aarch64__ without __ARM_NEON" doesn't really answer that question --
> what exactly did this entail?

Compiling with something like -march=armv8-a+nosimd prevents defining
__ARM_NEON.  Interestingly, Clang still defines __ARM_NEON__ even when
+nosimd is specified.

> I took a quick look around at Debian code search, *BSD, Apple, and a
> few other places, and I can't find it. Then, I looked at the
> discussions around commit 5c7603c318872a42e "Add ARM64 (aarch64)
> support to s_lock.h", and the proposed patch [1] only had __aarch64__
> . When it was committed, the platform was vaporware and I suppose we
> included "__aarch64" as a prophylactic measure because no other reason
> was given. It doesn't seem to exist anywhere, so unless someone can
> demonstrate otherwise, I'm going to rip it out soon.

This is what I found, too, so +1.  I've attached a patch for this.

[0] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index f7cd0f6f20..b14ce832bf 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -63,8 +63,7 @@
  * compiler barrier.
  *
  */
-#if defined(__arm__) || defined(__arm) || \
-	defined(__aarch64__) || defined(__aarch64)
+#if defined(__arm__) || defined(__arm) || defined(__aarch64__)
 #include "port/atomics/arch-arm.h"
 #elif defined(__i386__) || defined(__i386) || defined(__x86_64__)
 #include "port/atomics/arch-x86.h"
diff --git a/src/include/port/atomics/arch-arm.h b/src/include/port/atomics/arch-arm.h
index 9fe8f1b95f..7449f8404a 100644
--- a/src/include/port/atomics/arch-arm.h
+++ b/src/include/port/atomics/arch-arm.h
@@ -21,7 +21,7 @@
  * 64 bit atomics on ARM32 are implemented using kernel fallbacks and thus
  * might be slow, so disable entirely. On ARM64 that problem doesn't exist.
  */
-#if !defined(__aarch64__) && !defined(__aarch64)
+#if !defined(__aarch64__)
 #define PG_DISABLE_64_BIT_ATOMICS
 #else
 /*
@@ -29,4 +29,4 @@
  * general purpose register is atomic.
  */
 #define PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY
-#endif  /* __aarch64__ || __aarch64 */
+#endif  /* __aarch64__ */
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index cc83d561b2..65aa66c598 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -256,7 +256,7 @@ spin_delay(void)
  * We use the int-width variant of the builtin because it works on more chips
  * than other widths.
  */
-#if defined(__arm__) || defined(__arm) || defined(__aarch64__) || defined(__aarch64)
+#if defined(__arm__) || defined(__arm) || defined(__aarch64__)
 #ifdef HAVE_GCC__SYNC_INT32_TAS
 #define HAS_TEST_AND_SET
 
@@ -277,7 +277,7 @@ tas(volatile slock_t *lock)
  * high-core-count ARM64 processors.  It seems mostly a wash for smaller gear,
  * and ISB doesn't exist at all on pre-v7 ARM chips.
  */
-#if defined(__aarch64__) || defined(__aarch64)
+#if defined(__aarch64__)
 
 #define SPIN_DELAY() spin_delay()
 
@@ -288,9 +288,9 @@ spin_delay(void)
 		" isb;				\n");
 }
 
-#endif	 /* __aarch64__ || __aarch64 */
+#endif	 /* __aarch64__ */
 #endif	 /* HAVE_GCC__SYNC_INT32_TAS */
-#endif	 /* __arm__ || __arm || __aarch64__ || __aarch64 */
+#endif	 /* __arm__ || __arm || __aarch64__ */
 
 
 /*

Reply via email to