On Tuesday, 23 October 2012 12:14:37, Till Oliver Knoll wrote:
> - Use "QAtomicInt" instead of a "volatile bool" (for the simple "Stop
> thread" use case) or
The problems with volatile bool:

1) volatile wasn't designed for threading

It was designed for memory-mapped I/O. Its purpose is to make sure that there are no more and no fewer reads from the variable, and writes to it, than what the code does. If I write:

    a = 1;
    a = 1;

I want the compiler to store 1 twice. If this is MMIO, then I might need that value of 0x01 sent twice over my I/O device. For threading, however, that's irrelevant: storing the same value twice, especially sequentially like that, makes no sense. I won't bother explaining why because you can see it with little thought.

What's more, CPU architectures don't work like that either. Writes are cached and then sent to main RAM and the other CPUs later, in bursts. Writing twice to memory, especially sequentially, will almost certainly result in RAM being written only once. And besides, there's no way to detect that a location in memory has been overwritten with the same value.

For those reasons, the semantics of volatile don't match the needs of threading.

2) volatile isn't atomic

a) for all types

All CPU architectures I know of have at least one size that they can read and write in a single operation. It's the machine word, which usually corresponds to the register size. Complex and modern CPUs are often able to read and write data types of other sizes in atomic operations, but there are many examples of CPUs that can't do it. On those, the only way to store an 8-bit value is to load the entire word where that byte is located, merge it in, and then store the full word back. A read-modify-write sequence is definitely not an atomic store. The C++ bool type is 1 byte in size, so it suffers from this problem.

So here we have a conclusion: you'd never use volatile bool, you'd use volatile sig_atomic_t (a type that POSIX requires to have atomic loads and stores).

b) for all operations

Even if you follow the POSIX recommendation and use a sig_atomic_t for your variable, most other operations aren't atomic.
On most architectures, incrementing and decrementing isn't atomic. And if you're trying to do thread synchronisation, you often need higher-level operations like fetch-and-add, compare-and-swap or simple swap.

3) volatile does not (usually) generate memory barriers

There are two types of memory barriers: compiler barriers and processor barriers. Take the following code, where spinlock is a volatile int:

    value = 123456;
    spinlock = 0;

Two levels of things might go wrong there. First, since there's no compiler barrier, the compiler might generate code that stores the 0 to the spinlock (unlocking it) before it generates the code that saves the more complex value to the other variable. I'm not even talking hypotheticals or obscure architectures. This is what the ARMv7 compiler generated for me:

    movw r1, #57920
    mov  r0, #0
    movt r1, 1
    str  r0, [r2, #0]
    str  r1, [r3, #0]

This example was intentional because I knew that ARM can't load a large value into a register in a single instruction. Loading 123456 requires two instructions (move and move-top). So I expected the compiler to schedule the saving of 0 to before the saving of the more complex value, and it did.

And even when the compiler does schedule things in the correct order, the memory barrier might be missing. Taking again the example of ARMv7, saving a zero to "value" and unlocking the spinlock:

    mov r1, #0
    str r1, [r2, #0]
    str r1, [r3, #0]

The ARMv7 architecture, unlike x86, *does* allow the processor to write to main RAM in any order. That means another core could see the spinlock being unlocked *before* the new value is stored, even if the compiler generated the proper instructions. The memory barrier instruction is missing.

The Qt 4 QAtomicInt API does not offer a load-acquire or a store-release operation. All reads and writes are non-atomic and may be problematic -- you can work around that by using a fetch-and-add of zero for a load, or a fetch-and-store for a store.
The Qt 5 API does offer the right functions, and even requires you to think about it.

The reason I said "usually" is that there is one architecture whose ABI requires acquire semantics for volatile loads and release semantics for volatile stores: IA-64, an architecture that was introduced after multithreading became mainstream and has a specific "load acquire" instruction anyway. The IA-64 manual explaining the memory ordering and barriers is one of the references I use to study the subject.

4) compilers have bugs

In this case, there's little we can do but work around them. This problem was found by the kernel developers in GCC. They had a structure like:

    int field1;
    volatile int field2;

On a 64-bit architecture, to modify "field1", the compiler generated a full read-modify-write of the entire 64-bit word, including the overwriting of the volatile field. In other words, the compiler was clearly violating the volatile specs, since it generated a write to a volatile that didn't exist in the source code. In this particular case, QAtomicInt wouldn't protect you.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
_______________________________________________
Interest mailing list
Interest@qt-project.org
http://lists.qt-project.org/mailman/listinfo/interest