On Wed, Aug 12, 2020 at 01:41:58PM +0200, Tijl Coosemans wrote:
> On Wed, 12 Aug 2020 09:44:25 +0400 Gleb Popov <arr...@freebsd.org> wrote:
> > On Wed, Aug 12, 2020 at 9:21 AM Gleb Popov <arr...@freebsd.org> wrote:
> >> Indeed, this looks like a culprit! When compiling using first command line
> >> (the long one) I get following warnings:
> >>
> >> /wrkdirs/usr/ports/lang/ghc/work/ghc-8.10.1/libraries/ghc-prim/cbits/atomic.c:369:10:
> >> warning: misaligned atomic operation may incur significant performance
> >> penalty [-Watomic-alignment]
> >>   return __atomic_load_n((StgWord64 *) x, __ATOMIC_SEQ_CST);
> >>          ^
> >> /wrkdirs/usr/ports/lang/ghc/work/ghc-8.10.1/libraries/ghc-prim/cbits/atomic.c:417:3:
> >> warning: misaligned atomic operation may incur significant performance
> >> penalty [-Watomic-alignment]
> >>   __atomic_store_n((StgWord64 *) x, (StgWord64) val, __ATOMIC_SEQ_CST);
> >>   ^
> >> 2 warnings generated.
> >>
> >> I guess this basically means "I'm emitting a call there". So, what's the
> >> correct fix in this case?
> >
> > I just noticed that Clang emits these warnings (and the call instruction)
> > only for functions handling StgWord64 type. For the same code with
> > StgWord32, like
> >
> > StgWord
> > hs_atomicread32(StgWord x)
> > {
> > #if HAVE_C11_ATOMICS
> >   return __atomic_load_n((StgWord32 *) x, __ATOMIC_SEQ_CST);
> > #else
> >   return __sync_add_and_fetch((StgWord32 *) x, 0);
> > #endif
> > }
> >
> > no warning is emitted as well as no call.
> >
> > How does clang infer alignment in these cases? What's so special about
> > StgWord64?
>
> StgWord64 is uint64_t which is unsigned long long which is 4 byte
> aligned on i386. Clang wants 8 byte alignment to use the fildll
> instruction.

This all is very strange.
How could the code use fildll to load 8 bytes as a bit value? FILDLL
converts single- and double-precision FP into long-double FP, so it
would change the bit value.

Also, both the ISA and the x86 psABI require only 4-byte alignment for
double-precision FP variables. If the variable spans two cache lines,
the SDM states that the access might not be atomic, but I believe this
cannot happen on any existing CPU. It might be slow. For some future
CPUs, Intel provides a control that causes such accesses to trap.

>
> You could change the definition of the StgWord64 type to look like:
>
> typedef uint64_t StgWord64 __attribute__((aligned(8)));
>
> But this only works if all calls to hs_atomicread64 pass a StgWord64
> as argument and not some other 64 bit value.
>
>
> Another solution I already mentioned in a previous message: replace
> HAVE_C11_ATOMICS with 0 in hs_atomicread64 so it uses
> __sync_add_and_fetch instead of __atomic_load_n. That uses the
> cmpxchg8b instruction which doesn't care about alignment. It's much
> slower but I guess 64 bit atomic loads are rare enough that this
> doesn't matter much.

_______________________________________________
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"