Hi Stefan; On Sun, Feb 5, 2023 at 9:44 PM Stefan Fritsch <s...@sfritsch.de> wrote: > > > On powerpc we got a segfault: [] > Fixed in http://svn.apache.org/viewvc?view=revision&revision=1907442 . [] > > On armel, we got a link error: > > /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to > `__atomic_fetch_sub_8' > /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to `__atomic_load_8' > /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to > `__atomic_exchange_8' > /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to `__atomic_store_8' > /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to > `__atomic_fetch_add_8' > /usr/bin/ld: ../.libs/libapr-1.so: undefined reference to > `__atomic_compare_exchange_8' > > Build log at > https://buildd.debian.org/status/logs.php?pkg=apr&arch=armel&suite=sid > Fixed in http://svn.apache.org/viewvc?view=revision&revision=1907441
Thanks! > > > However, even with both fixes, there still seem to be problems. On > powerpc, there was the same test failure as in PR 66457: > testatomic: Line 908: Unexpected value > > Log at > https://buildd.debian.org/status/fetch.php?pkg=apr&arch=powerpc&ver=1.7.2-2&stamp=1675512084&raw=0 Fixed in http://svn.apache.org/r1907541 > > If I am not mistaken, on 32 bit architectures you always need to ensure > manually that both halves of a 64 bit value are written atomically. So > code like this > > apr_atomic_set64() > ... > *mem = val; > > seems wrong. > > Also, this construct > > #if defined(__arm__) || defined(__powerpc__) || defined(__powerpc64__) > #define WEAK_MEMORY_ORDERING 1 > #else > #define WEAK_MEMORY_ORDERING 0 > #endif > > seems to be very fragile. I think we have several issues here: 1. we should set WEAK_MEMORY_ORDERING for anything but __i386__, __x86_64__, __s390__ and __s390x__ 2. we should not use direct 64bit load/store on 32bit systems for atomic operations 3. on 32bit systems the alignment for an uint64_t is usually 4 (not 8 like on a 64bit systems), which hurts (at best) or prevents atomicity even with builtins. For 1. and 2. the generic/mutex case was addressed in r1907541 (apr_atomic_set64() was doing the right thing already), and the builtins case is addressed in r1907637 (hopefully). Windows code is also concerned but I believe it's doing the right thing already by checking _M_X64 (x86_64 CPU) or should we check for _WIN64 too (or APR_SIZEOF_VOIDP >= 8) for WIN32 running on x64? What about the alignment of uint64_t on WIN32, the Interlocked*64() functions seem to require 8-byte alignment so should we fall back to generic/mutex on WIN32 too? For 3., r1907642 should now take care of the alignment to define (or not) HAVE_ATOMIC_BUILTINS64 and HAVE__ATOMIC_BUILTINS64, with a fallback to USE_ATOMICS_GENERIC for cases where it matters. This was tested by looking at the (32bit) assembly generated for different compilers/options/builtins here: https://godbolt.org/z/r4daf6b4v (comment in/out some "__attribute__((aligned(8)))" and/or "&& 0" to see what helps or not). Notably, while both gcc and clang seem to implement the same kind of compare-and-swap loop for _read64 and _set64 with -m32 and the legacy __sync_* builtins, they seem to disagree on whether __atomic builtins should defer to libatomic (thus possibly mutex) when an uint64_t is not 8-byte aligned. gcc will use some x87/FPU instructions (fild/fistp) regardless of the alignment whereas clang will issue calls to libatomic whenever apr_uint64_t is not forcibly 8-byte aligned. Dunno who's correct here (whether x87 instructions work with 4-byte alignment, CPU cacheline crossing and so on..), but this seems to beg for a forced "typedef uint64_t apr_uint64_t __attribute__((aligned(8)));" (an ABI change) or a new API for 64bit atomics (with properly aligned apr_atomic_u64_t) on the APR side anyway to do the right thing on 32bit systems.. For now (after r1907642) this means that compiling a 32bit APR will USE_ATOMICS_BUILTINS64 with gcc and NEED_ATOMICS_GENERIC64 with clang. Does this all work for you? Finally it seems that both -march=i[56]86 (but not i[34]86) provide the same atomic builtins too (with gcc), so maybe we could use them instead of the forced generic/mutex implementation (currently) if it's how distros build 32bit APR (per https://bz.apache.org/bugzilla/show_bug.cgi?id=66457). So I think something like this would be fine: Index: configure.in =================================================================== --- configure.in (revision 1907642) +++ configure.in (working copy) @@ -566,6 +566,9 @@ if test "$ap_cv_atomic_builtins" = "yes" -o "$ap_c if test "$ap_cv__atomic_builtins" = "yes"; then AC_DEFINE(HAVE__ATOMIC_BUILTINS, 1, [Define if compiler provides 32bit __atomic builtins]) fi + has_atomic_builtins=yes +else + has_atomic_builtins=no fi AC_CACHE_CHECK([whether the compiler provides 64bit atomic builtins], [ap_cv_atomic_builtins64], @@ -829,10 +832,15 @@ AC_ARG_ENABLE(nonportable-atomics, force_generic_atomics=yes fi ], -[case $host_cpu in - i[[456]]86) force_generic_atomics=yes ;; - *) force_generic_atomics=no - case $host in +[force_generic_atomics=no +case $host_cpu in + i[[34]]86) force_generic_atomics=yes;; + i[[56]]86) + if test "$has_atomic_builtins" != "yes"; then + force_generic_atomics=yes + fi + ;; + *) case $host in *solaris2.10*) AC_TRY_COMPILE( [#include <atomic.h>], ? Regards; Yann.