Hi Stefan;

On Sun, Feb 5, 2023 at 9:44 PM Stefan Fritsch <s...@sfritsch.de> wrote:
>
>
> On powerpc we got a segfault:
[]
> Fixed in http://svn.apache.org/viewvc?view=revision&revision=1907442 .
[]
>
> On armel, we got a link error:
>
> /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to
> `__atomic_fetch_sub_8'
> /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to `__atomic_load_8'
> /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to
> `__atomic_exchange_8'
> /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to `__atomic_store_8'
> /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to
> `__atomic_fetch_add_8'
> /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to
> `__atomic_compare_exchange_8'
>
> Build log at
> https://buildd.debian.org/status/logs.php?pkg=apr&arch=armel&suite=sid
> Fixed in http://svn.apache.org/viewvc?view=revision&revision=1907441

Thanks!

>
>
> However, even with both fixes, there still seem to be problems. On
> powerpc, there was the same test failure as in PR 66457:
> testatomic: Line 908: Unexpected value
>
> Log at
> https://buildd.debian.org/status/fetch.php?pkg=apr&arch=powerpc&ver=1.7.2-2&stamp=1675512084&raw=0

Fixed in http://svn.apache.org/r1907541

>
> If I am not mistaken, on 32 bit architectures you always need to ensure
> manually that both halves of a 64 bit value are written atomically. So
> code like this
>
> apr_atomic_set64()
> ...
>     *mem = val;
>
> seems wrong.
>
> Also, this construct
>
> #if defined(__arm__) || defined(__powerpc__) || defined(__powerpc64__)
> #define WEAK_MEMORY_ORDERING 1
> #else
> #define WEAK_MEMORY_ORDERING 0
> #endif
>
> seems to be very fragile.

I think we have several issues here:
1. we should set WEAK_MEMORY_ORDERING for anything but __i386__,
__x86_64__, __s390__ and __s390x__
2. we should not use direct 64bit load/store on 32bit systems for
atomic operations
3. on 32bit systems the alignment for an uint64_t is usually 4 (not 8
like on a 64bit systems), which hurts (at best) or prevents atomicity
even with builtins.

For 1. and 2. the generic/mutex case was addressed in r1907541
(apr_atomic_set64() was doing the right thing already), and the
builtins case is addressed in r1907637 (hopefully).
Windows code is also concerned but I believe it's doing the right
thing already by checking _M_X64 (x86_64 CPU) or should we check for
_WIN64 too (or APR_SIZEOF_VOIDP >= 8) for WIN32 running on x64? What
about the alignment of uint64_t on WIN32, the Interlocked*64()
functions seem to require 8-byte alignment so should we fall back to
generic/mutex on WIN32 too?

For 3., r1907642 should now take care of the alignment to define (or
not) HAVE_ATOMIC_BUILTINS64 and HAVE__ATOMIC_BUILTINS64, with a
fallback to USE_ATOMICS_GENERIC for cases where it matters. This was
tested by looking at the (32bit) assembly generated for different
compilers/options/builtins here: https://godbolt.org/z/r4daf6b4v
(comment in/out some "__attribute__((aligned(8)))" and/or "&& 0" to
see what helps or not).
Notably, while both gcc and clang seem to implement the same kind of
compare-and-swap loop for _read64 and _set64 with -m32 and the legacy
__sync_* builtins, they seem to disagree on whether __atomic builtins
should defer to libatomic (thus possibly mutex) when an uint64_t is
not 8-byte aligned. gcc will use some x87/FPU instructions
(fild/fistp) regardless of the alignment whereas clang will issue
calls to libatomic whenever apr_uint64_t is not forcibly 8-byte
aligned.
Dunno who's correct here (whether x87 instructions work with 4-byte
alignment, CPU cacheline crossing and so on..), but this seems to beg
for a forced "typedef uint64_t apr_uint64_t
__attribute__((aligned(8)));" (an ABI change) or a new API for 64bit
atomics (with properly aligned apr_atomic_u64_t) on the APR side
anyway to do the right thing on 32bit systems..
For now (after r1907642) this means that compiling a 32bit APR will
USE_ATOMICS_BUILTINS64 with gcc and NEED_ATOMICS_GENERIC64 with clang.

Does this all work for you?

Finally it seems that both -march=i[56]86 (but not i[34]86) provide
the same atomic builtins too (with gcc), so maybe we could use them
instead of the forced generic/mutex implementation (currently) if it's
how distros build 32bit APR (per
https://bz.apache.org/bugzilla/show_bug.cgi?id=66457). So I think
something like this would be fine:

Index: configure.in
===================================================================
--- configure.in    (revision 1907642)
+++ configure.in    (working copy)
@@ -566,6 +566,9 @@ if test "$ap_cv_atomic_builtins" = "yes" -o "$ap_c
     if test "$ap_cv__atomic_builtins" = "yes"; then
         AC_DEFINE(HAVE__ATOMIC_BUILTINS, 1, [Define if compiler
provides 32bit __atomic builtins])
     fi
+    has_atomic_builtins=yes
+else
+    has_atomic_builtins=no
 fi

 AC_CACHE_CHECK([whether the compiler provides 64bit atomic builtins],
[ap_cv_atomic_builtins64],
@@ -829,10 +832,15 @@ AC_ARG_ENABLE(nonportable-atomics,
    force_generic_atomics=yes
  fi
 ],
-[case $host_cpu in
-   i[[456]]86) force_generic_atomics=yes ;;
-   *) force_generic_atomics=no
-      case $host in
+[force_generic_atomics=no
+case $host_cpu in
+   i[[34]]86) force_generic_atomics=yes;;
+   i[[56]]86)
+      if test "$has_atomic_builtins" != "yes"; then
+        force_generic_atomics=yes
+      fi
+      ;;
+   *) case $host in
          *solaris2.10*)
             AC_TRY_COMPILE(
                 [#include <atomic.h>],
?


Regards;
Yann.

Reply via email to