On Tuesday, 23 October 2012 12:14:37, Till Oliver Knoll wrote:
> - Use "QAtomicInt" instead of a "volatile bool" (for the simple "Stop
> thread" use case) or

The problems with volatile bool:

1) volatile wasn't designed for threading.

It was designed for memory-mapped I/O. Its purpose is to make sure that there
are no more and no fewer reads from the variable and writes to it than what
the code does. If I write:
        a = 1;
        a = 1;
I want the compiler to store 1 twice. If this is MMIO, then I might need that
value of 0x01 sent twice over my I/O device.

For threading, however, that's irrelevant. Storing the same value twice,
especially sequentially like that, makes no sense. I won't bother explaining
why because you can see it with little thought.

What's more, CPU architectures don't work like that either. Writes are cached
and then sent to the main RAM and other CPUs later, in bursts. Writing twice
to memory, especially sequentially, will almost certainly result in RAM being
written to only once. And besides, there's no way to detect that a location in
memory has been overwritten with the same value.

For those reasons, the semantics of volatile don't match the needs of
threading.
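
For the simple "stop thread" case from the quoted message, here's a rough sketch of what the flag looks like with QAtomicInt instead of a volatile bool (Qt 5 spellings; the class and member names are mine, purely for illustration):

        #include <QAtomicInt>
        #include <QThread>

        class Worker : public QThread
        {
        public:
            // Called from another thread to ask the worker to stop.
            void requestStop() { m_stop.storeRelease(1); }

        protected:
            void run()
            {
                while (!m_stop.loadAcquire()) {
                    // ... do one unit of work ...
                }
            }

        private:
            QAtomicInt m_stop;   // initialised to 0: "keep running"
        };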

2) volatile isn't atomic.
 a) for all types

All CPU architectures I know of have at least one size that they can read and
write in a single operation. It's the machine word, which usually corresponds
to the register size.

Complex and modern CPUs are often able to read and write data types of
different sizes in atomic operations, but there are many examples of CPUs that
can't. On such CPUs, the only way to store an 8-bit value is to load the entire
word where that byte is located, merge the new value in and then store the full
word back. A read-modify-write sequence is definitely not an atomic store.

The C++ bool type is 1 byte in size, so it suffers from this problem. Hence a
first conclusion: you'd never use volatile bool; at most you'd use volatile
sig_atomic_t (a type that POSIX requires to have atomic loads and stores).
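
For reference, this is the pattern sig_atomic_t was actually designed for: a flag set from a signal handler and polled by the interrupted code. A minimal sketch (note that this guarantees atomicity with respect to signals, not visibility between threads):

        #include <csignal>

        static volatile std::sig_atomic_t got_sigint = 0;

        extern "C" void on_sigint(int)
        {
            got_sigint = 1;   // plain store, atomic with respect to the interrupted code
        }

        int main()
        {
            std::signal(SIGINT, on_sigint);
            while (!got_sigint) {
                // ... do work, checking the flag between steps ...
            }
        }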

 b) for all operations

Even if you follow the POSIX recommendations and use a sig_atomic_t for your
variable, most other operations aren't atomic. On most architectures,
incrementing and decrementing aren't atomic. And if you're trying to do thread
synchronisation, you often need higher-level operations like fetch-and-add,
compare-and-swap or a simple swap.
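
A rough sketch of those operations with QAtomicInt (these member functions exist in both Qt 4 and Qt 5; the "Ordered" variants imply a full memory barrier; the variable name is only illustrative):

        #include <QAtomicInt>

        QAtomicInt counter(0);

        void example()
        {
            counter.fetchAndAddOrdered(1);                   // atomic fetch-and-add
            counter.testAndSetOrdered(1, 42);                // compare-and-swap: 1 -> 42 only if it was 1
            int previous = counter.fetchAndStoreOrdered(0);  // atomic swap, returns the old value
            (void) previous;
        }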

3) volatile does not (usually) generate memory barriers

There are two types of memory barriers: compiler and processor ones. Take the
following code:

        value = 123456;
        spinlock = 0;

Where spinlock is a volatile int. Things can go wrong there at two levels:
first, since there's no compiler barrier, the compiler might generate code that
stores the 0 to the spinlock (unlocking it) before the code that stores the
more complex value to the other variable.

I'm not even talking about hypotheticals or obscure architectures. This is what
the compiler generated for me on ARMv7:

        movw    r1, #57920
        mov     r0, #0
        movt    r1, 1
        str     r0, [r2, #0]
        str     r1, [r3, #0]

This example was intentional: I knew that ARM can't load a large immediate into
a register in a single instruction. Loading 123456 requires two instructions
(move wide and move top). So I expected the compiler to schedule the store of
the 0 before the store of the more complex value, and it did.

And even when the compiler does schedule things in the correct order, the
memory barrier might be missing. Taking the ARMv7 example again, saving a zero
to "value" and unlocking the spinlock:

        mov     r1, #0
        str     r1, [r2, #0]
        str     r1, [r3, #0]

The ARMv7 architecture, unlike x86, *does* allow the processor to write to
main RAM in any order. That means another core could see the spinlock being
unlocked *before* the new value is stored, even if the compiler generated the
instructions in the proper order. What's missing is the memory-barrier
instruction.

The Qt 4 QAtomicInt API does not offer a load-acquire or a store-release
operation. All reads and writes are non-atomic and may be problematic -- you
can work around that by using a fetch-and-add of zero for load or a fetch-and-
store for store.
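
Spelled out, that workaround looks roughly like this (a sketch; "flag" is just an illustrative name):

        #include <QAtomicInt>

        QAtomicInt flag(0);

        int loadWorkaround()
        {
            // No loadAcquire() in Qt 4: a fetch-and-add of zero is an atomic
            // load that comes with the barrier.
            return flag.fetchAndAddOrdered(0);
        }

        void storeWorkaround(int value)
        {
            // Likewise no storeRelease(): fetch-and-store writes atomically
            // and with the barrier; the returned old value is simply ignored.
            flag.fetchAndStoreOrdered(value);
        }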

The Qt 5 API does offer the right functions and even requires you to think
about it.
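
For example, the spinlock from the barrier discussion above could be written like this with Qt 5's QAtomicInt (a sketch only; the class name and layout are mine, not an official API):

        #include <QAtomicInt>

        class SpinLock
        {
        public:
            void lock()
            {
                // 0 -> 1 with acquire semantics; spin until we win the race.
                while (!m_locked.testAndSetAcquire(0, 1))
                    ;   // busy-wait
            }

            void unlock()
            {
                // Release semantics: writes made while holding the lock
                // (e.g. "value = 123456") become visible before the unlock does.
                m_locked.storeRelease(0);
            }

        private:
            QAtomicInt m_locked;   // 0 = unlocked, 1 = locked
        };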

The reason I said "usually" is that there is one architecture whose ABI
requires acquire semantics for volatile loads and release semantics for
volatile stores. That's IA-64, an architecture that was introduced after
multithreading became mainstream and which has a specific "load acquire"
instruction anyway. The IA-64 manual explaining memory ordering and barriers is
one of the references I use to study the subject.

4) compilers have bugs

In this case, there's little we can do but work around them. This problem was
found by the kernel developers in GCC. They had a structure like:

        int field1;
        volatile int field2;

On a 64-bit architecture, to modify "field1", the compiler generated a
read-modify-write of the full 64-bit word, including overwriting the volatile
field. In other words, the compiler was clearly violating the volatile
specification, since it generated a write to a volatile object that didn't
exist in the source code.

In this particular case, QAtomicInt wouldn't protect you.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center

