On 18/06/2013 18:38, Torvald Riegel wrote:
> I don't think that this is the conclusion here. I strongly suggest to
> just go with the C11/C++11 model, instead of rolling your own or trying
> to replicate the Java model. That would also allow you to just point to
> the C11 model and any information / tutorials about it instead of having
> to document your own (see the patch), and you can make use of any
> (future) tool support (e.g., race detectors).
I'm definitely not rolling my own, but I think there is some value in using the Java model. Warning: the explanation came out quite verbose... tl;dr at the end.

One reason is that implementing SC for POWER is quite expensive, while this is not the case for Java volatile (which I'm still not convinced is merely acq-rel, because it also orders a volatile store with respect to a later volatile load). People working on QEMU are often used to manually placed barriers, as in Linux, and Linux barriers do not fully give you seq-cst semantics. They give you something much closer to the Java model. The Java model gives good performance and is easier to understand than the non-seq-cst modes of the atomic builtins. It is pretty much impossible to understand the latter without a formal model; I see the importance of a formal model, but at the same time it is hard not to appreciate the detailed-but-practical style of the Linux documentation.

Second, the Java model has very good "practical" documentation from sources I trust. Note the part about trust: I found way too many Java tutorials, newsgroup posts, and blogs that say Java is SC, when it is not. Paul's Linux docs are a source I trust, and so is the JSR-133 FAQ/cookbook (especially now that Richard and Paul have completed my understanding of them). There are substantially fewer practical documents for C11/C++11 that are similarly authoritative. I obviously trust Cambridge for C11/C++11, but their material is very terse or just refers to the formal model. The formal model is not what I want when my question is simply, for example, "why is lwsync good for acquire and release, but not for seq-cst?". And the papers sometimes refer to "private communication" between the authors and other people, which can be annoying. Hans Boehm and Herb Sutter have good posters and slides, but they do not have the same level of completeness as Paul's Linux documentation. Paul _really_ has spoiled us "pure practitioners"...
Third, we must support old GCC (even as old as 4.2), so we need hand-written assembly for atomics anyway. This again goes back to documentation and the JSR-133 cookbook. It not only gives you instructions on how to implement the model (which is also true for the Cambridge web pages on C11/C++11), but is also a good basis for writing our own documentation. It helped me understand existing code that uses barriers, optimize it, and put this knowledge into words. I just couldn't find anything as useful for C11/C++11.

In short, the C11/C++11 model is not what most developers here are used to, hardware is not 100% mature for it (for example, ARMv8 has seq-cst load/store instructions; perhaps POWER will grow them in time), it is harder to optimize, and it has (as of 2013) less "practical" documentation from sources I trust.

Besides, since what I'm using is weaker than SC, there's always the possibility of switching to SC in the future when enough of these issues are solved. In case you really need SC _now_, it is easy to get it using fetch-and-add (for loads) or xchg (for stores).

>> I will just not use __atomic_load/__atomic_store to implement the
>> primitives, and always express them in terms of memory barriers.
>
> Why? (If there's some QEMU-specific reason, just let me know; I know
> little about QEMU..)

I guess I mentioned the QEMU-specific reasons above.

> I would assume that using the __atomic* builtins is just fine if they're
> available.

It would implement slightly different semantics depending on the compiler version, so I think it's dangerous.

Paolo