On 26/12/14 22:49, Matt Godbolt wrote:
> On Fri, Dec 26, 2014 at 4:26 PM, Andrew Haley wrote:
>> On 26/12/14 20:32, Matt Godbolt wrote:
>>> Is there a reason why (in principle) the volatile increment can't be
>>> made into a single add? Clang and ICC both emit the same code for the
>>> volatile and non-volatile case.
>>
>> Yes. Under the "as if" rule, volatile accesses are observable
>> behaviour, so every memory access must happen as written. A volatile
>> increment is defined as a load, an increment, and a store.
>
> That makes sense to me from a logical point of view. My
> understanding, though, is that the volatile keyword was mainly
> intended for memory-mapped devices, where loads and stores cannot be
> elided. A single-instruction read-modify-write like
> "increment [addr]" satisfies those constraints even though it is a
> single instruction. I realise my understanding could be wrong here!
> If it is, though, both clang and icc are taking a shortcut that may
> put them in a non-compliant state.
It's hard to be certain. The language used by the standard is very
unhelpful: it requires all accesses to be as written, but does not
define exactly what constitutes an access.
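To make the distinction concrete, here is a minimal C sketch (the counter and function names are hypothetical) of what a volatile increment means at the source level: `counter++` on a volatile object is one volatile read and one volatile write, spelled out explicitly in the second function. Whether a compiler may fuse those two accesses into a single read-modify-write instruction is exactly the point the standard leaves to the implementation.

```c
#include <stdint.h>

/* Hypothetical counter used only for illustration. */
static volatile uint64_t counter;

/* Shorthand form: one volatile read and one volatile write. */
void increment_shorthand(void) {
    counter++;
}

/* The same operation with both accesses written out. */
void increment_spelled_out(void) {
    uint64_t tmp = counter; /* volatile read (load)   */
    tmp = tmp + 1;          /* ordinary arithmetic    */
    counter = tmp;          /* volatile write (store) */
}

/* A single volatile read of the counter. */
uint64_t read_counter(void) {
    return counter;
}
```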
>> If you want a single atomic increment, atomics are what you
>> should use. If you want an increment to be written to memory, use a
>> store barrier after the increment.
>
> Thanks. I realise I was unclear in my original email. I'm really
> looking for a way to say "do a non-lock-prefixed increment".
Why?
> Atomics are too strong and enforce a bus lock. Doing a store
> barrier after the increment also appears heavy-handed: while I wish
> for eventual consistency with memory, I do not require it. I do
> however need the compiler to not move or elide my increment.
You could just use a compiler barrier, asm volatile("" ::: "memory"),
but this is good only for x86 and a few others. Everyone else needs a
real store barrier.
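A minimal sketch of the compiler-barrier approach, assuming GCC or Clang on a target such as x86 where no hardware fence is needed for this purpose. The empty asm with a "memory" clobber makes the compiler assume any memory may have been read or written, so it has to store `num_done` on every iteration instead of folding the loop into `num_done += num_work`. It emits no instruction and gives no hardware ordering. `process_work` is a hypothetical stand-in, and without atomics a concurrent reader is still a data race as far as the C standard is concerned.

```c
#include <stdint.h>

/* GCC/Clang compiler-only barrier: emits no instruction, but the compiler
   must assume any memory may have been read or written at this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

uint64_t num_done = 0;

/* Hypothetical stand-in for the real work. */
static void process_work(void) {}

void worker_thread(int num_work) {
    for (int i = 0; i < num_work; ++i) {
        process_work();
        num_done++;
        /* The updated num_done must be stored before this point on each
           iteration rather than accumulated and stored once after the loop. */
        COMPILER_BARRIER();
    }
}
```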
> At the moment I think the best I can do is to use an inline assembly
> version of the increment which prevents GCC from doing any
> optimisation upon it. That seems rather ugly though, and if anyone has
> any better suggestions I'd be very grateful.
Well, that's the problem: do you want a barrier or not? With no
barrier there is no guarantee that the data will ever be written to
memory. Do you only care about x86 processors?
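For the x86-only case, the inline-assembly version of the increment mentioned above might look like this sketch. It assumes GCC or Clang with AT&T syntax on x86-64: the "+m" constraint makes the counter an in-memory read-write operand, so the compiler emits exactly one non-lock-prefixed `addq` to memory and cannot elide or merge it. The volatile fallback for other targets is an assumption of this sketch, not something from the thread. Neither branch is a barrier, so nothing here guarantees when, or whether, another core observes the new value.

```c
#include <stdint.h>

uint64_t num_done = 0;

/* Increment *p with a single non-locked read-modify-write where possible. */
static inline void plain_increment(uint64_t *p) {
#if defined(__x86_64__)
    /* "+m": *p is read and written in memory; "cc": flags are clobbered. */
    __asm__ __volatile__("addq $1, %0" : "+m"(*p) : : "cc");
#else
    /* Hypothetical fallback for other targets: an increment through a
       volatile lvalue, which still forbids eliding the access. */
    (*(volatile uint64_t *)p)++;
#endif
}
```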
> To give a concrete example:
>
> #include <inttypes.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <unistd.h>
>
> uint64_t num_done = 0;
> void process_work() { /* does something somewhat expensive */ }
> void worker_thread(int num_work) {
>     for (int i = 0; i < num_work; ++i) {
>         process_work();
>         num_done++; // ideally a relaxed atomic increment here
>     }
> }
>
> void reporting_thread() {
>     while (true) {
>         sleep(60);
>         printf("worker has done %" PRIu64 "\n", num_done); // ideally a relaxed read here
>     }
> }
>
>
> In the non-atomic case above, no locked instructions are used. Given
> enough information about what process_work() does, the compiler can
> realise that num_done can be updated once outside the loop
> (num_done += num_work), which is the part I'd like to avoid. By making
> num_done atomic and using relaxed ordering, I get this guarantee, but
> at the cost of a "lock addl".
OK, I get that, but not why. If you care about a particular x86
instruction, you can use it in inline asm. I'm not at all sure what
you want, really.
Andrew.
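One portable way to get what the example asks for, sketched under the assumption that `worker_thread` is the only thread that ever writes the counter: keep `num_done` atomic, but replace the atomic read-modify-write (`atomic_fetch_add_explicit`, which x86 compilers typically implement with a lock-prefixed add) with a relaxed load followed by a relaxed store. Each access is still atomic, so the reporting thread never sees a torn value, and GCC and Clang do not in practice merge the per-iteration stores, although the standard does not strictly forbid it. On x86-64 the pair typically compiles to plain moves and an add. The increment as a whole is no longer atomic, so this is only correct with a single writer; `process_work` and the helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Written by exactly one thread; may be read by any number of threads. */
static _Atomic uint64_t num_done = 0;

/* Single-writer increment: a relaxed atomic load plus a relaxed atomic
   store, not an atomic read-modify-write, so no lock prefix is required.
   Incorrect if more than one thread calls this concurrently. */
static inline void single_writer_increment(void) {
    uint64_t cur = atomic_load_explicit(&num_done, memory_order_relaxed);
    atomic_store_explicit(&num_done, cur + 1, memory_order_relaxed);
}

/* Relaxed read for the reporting thread: never torn, possibly stale. */
static inline uint64_t read_num_done(void) {
    return atomic_load_explicit(&num_done, memory_order_relaxed);
}

/* Hypothetical stand-in for the real work. */
static void process_work(void) {}

void worker_thread(int num_work) {
    for (int i = 0; i < num_work; ++i) {
        process_work();
        single_writer_increment();
    }
}
```

The reporting thread would then print `read_num_done()` using `PRIu64` from `<inttypes.h>`.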