On Tue, 15 Oct 2019 at 01:16, Yann Ylavic <ylavic....@gmail.com> wrote: > > On Tue, Oct 8, 2019 at 1:10 PM <i...@apache.org> wrote: > > > > Author: ivan > > Date: Tue Oct 8 11:10:32 2019 > > New Revision: 1868129 > > > > URL: http://svn.apache.org/viewvc?rev=1868129&view=rev > > Log: > > apr_atomic_read64(): Fix non-atomic read on 32-bit Windows. > [] > > --- apr/apr/trunk/atomic/win32/apr_atomic64.c (original) > > +++ apr/apr/trunk/atomic/win32/apr_atomic64.c Tue Oct 8 11:10:32 2019 > > @@ -51,7 +51,9 @@ APR_DECLARE(void) apr_atomic_set64(volat > > > > APR_DECLARE(apr_uint64_t) apr_atomic_read64(volatile apr_uint64_t *mem) > > { > > - return *mem; > > + /* 64-bit read is not atomic on 32-bit platform: use > > InterlockedCompareExchange > > + to perform atomic read. */ > > + return InterlockedCompareExchange64((volatile LONG64 *)mem, 0, 0); > > } > > Shouldn't we make this #ifdef'd on WIN32/WIN64? Despite the win32 > dirname, I think both Windows(es) share the same atomic/ code (same > applies to apr_atomic_set64() I think, where *mem = val should be fine > on WIN64). > > Then I don't know if we care enough about WIN32, but if so the > InterlockedCompareExchange64() call here is maybe a bit of a hammer > for usual apr_atomic_read64() usage. For things like lightly check for > the current 64bit shared value and do the heavy CAS work for some > special/unlikely condition only, this change is going to kill > performances. Since we provide _cas64() already, I fell that > read64/set64() should always be lightweight (when/if possible). > I considered this optimization, but decided not to implement it to avoid additional #ifdef.
Do you have any evidence that CAS would be significantly slower than just read on x64? As I see in disassembly InterlockedCompareExchange() is compiled to just three instructions on x64: [[[ xor ebx,ebx xor eax,eax lock cmpxchg qword ptr [src],rbx ]]] Another small argument is that .NET always uses InterlockedCompareExchange() for Interlocked.Read() [1] But I can add special code for x64 if you think it's worth it, along the lines of the attached patch. > I know that [2] says "Simple reads and writes to properly aligned > 64-bit variables are atomic on 64-bit Windows. Reads and writes to > 64-bit values are not guaranteed to be atomic on 32-bit Windows.", but > it looks like in practice for CPUs that Windows cares about, an > aligned 8-bytes (eg. our apr_uint64_t type) load/store can always be > atomic. The load/store are likely compiled using x87 registers or xmm > or sse or at worse lock cmpxchg8b (i.e. the above), and supposedly > some 64bit or more registers available on ARM too. > > I'm no Windows expert, but wonder if we could use inline > ReadAcquire64/WriteRelease64() functions from winnt.h/wdm.h. They seem > to pair with Interlocked* builtins (same API), and have the semantics > we want. I found them by naïvely (and unsuccessfully) searching for > InterlockedRead/InterlockedWrite() functions, and falling into the > discussion in [1]. It seems that there is no such functions in the > windows API because .. there are not needed (well, IMHO, if atomic > load64/store64 should be implemented using cas64, they look needed to > me..). > This seems to be confirmed by the "winnt.h" file I downloaded from the > net, for the CPUs handled (and cared about) there, > ReadAcquire64/WriteRelease64() are implemented using volatile access > (i.e. *(volatile LONG64*)mem [= val]), plus a barrier for arm(s?), so > relying on the native acquire/release semantics from the CPU/cache. > > So possibly r1868129 is not necessary, or if > ReadAcquire64/WriteRelease64() can be used easily could we do that > instead? (having a look at the WIN32 assembly code for > apr_atomic_read64/set64() using ReadAcquire64/WriteRelease64() could > be instructive, though) > The r1868129 is necessary: there is test in r1868129 that fails on x86, without fix. While ReadAcquire64()/WriteRelease64() look promising, 1. They doesn't provide atomic read: test added in r1868129 fails if I replace InterlockedCompareExchange64() to ReadAcquire64() in apr_atomic_read64().. 2. They are undocumented. -- Ivan
Index: atomic/win32/apr_atomic64.c =================================================================== --- atomic/win32/apr_atomic64.c (revision 1868468) +++ atomic/win32/apr_atomic64.c (working copy) @@ -46,14 +46,28 @@ APR_DECLARE(void) apr_atomic_set64(volatile apr_uint64_t *mem, apr_uint64_t val) { +#if defined(_WIN64) + /* https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access + * "Simple reads and writes to properly aligned 64-bit variables are atomic on 64-bit Windows."*/ + *mem = val; +#else + /* 64-bit write is not atomic on 32-bit platform: use InterlockedExchange64 + to perform atomic write. */ InterlockedExchange64((volatile LONG64 *)mem, val); +#endif } APR_DECLARE(apr_uint64_t) apr_atomic_read64(volatile apr_uint64_t *mem) { +#if defined(_WIN64) + /* https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access + * "Simple reads and writes to properly aligned 64-bit variables are atomic on 64-bit Windows."*/ + return *mem; +#else /* 64-bit read is not atomic on 32-bit platform: use InterlockedCompareExchange to perform atomic read. */ return InterlockedCompareExchange64((volatile LONG64 *)mem, 0, 0); +#endif } APR_DECLARE(apr_uint64_t) apr_atomic_cas64(volatile apr_uint64_t *mem, apr_uint64_t with,