On 23/07/18 12:01, Andrew Dinn wrote:
> Hi Florian,
>
> Thank you for the feedback. Comments inline below (extensive comments in
> the case of the memory model question - it is complex).

Having written up what I assumed was needed, of course I then went back
to have another look at the Intel code and as a result I think I have
  i) corrected my mistaken understanding of what it does

 ii) come to the conclusion that libpmem is doing the right thing

No surprise ;-]

Firstly, I was mistaken in thinking that the libpmem code was executing
a memory barrier before the writeback operations (I was confused by talk
of pre-drain, hw-drain and drain at various points in the code). The
definition of the critical function pmem_persist is as follows

  void
  pmem_persist(const void *addr, size_t len)
  {
          pmem_flush(addr, len);
          pmem_drain();
  }

where

  pmem_flush is implemented via clflush/clflushopt/clwb or dc(CVAC)

  pmem_drain is implemented as an sfence or dmb(ISH)

So, libpmem is emitting a memory barrier /after/ writeback and this
barrier serves to ensure that writeback is complete before any
subsequent writes can proceed to completion and become visible. That
deals with the data + control update issue I was concerned about.

On the other hand, libpmem does /not/ execute any memory barriers prior
to executing the writeback instructions. I had assumed that a pre-sync
was also necessary to ensure that prior writes completed before the
writeback was initiated. However, looking at the details provided in
the documentation this seems to be a spurious requirement of my own
invention.

On x86_64, writeback operations proceeding via CLFLUSHOPT or CLWB on
some specific cache line are ordered with respect to writes on that
same cache line. The relevant text is:

  "Executions of the CLFLUSHOPT instruction are [...] ordered with
  respect to the following accesses to the cache line being
  invalidated: writes, executions of CLFLUSH, and executions of
  CLFLUSHOPT."

  "CLWB is implicitly ordered with older stores executed by the logical
  processor to the same address"

So, there is no need to execute a memory barrier to avoid the
possibility of writing back a cache line before it has been updated by
an in-progress write to that same cache line.
Of course, neither is there a need to be concerned about in-progress
writes to cache lines that are not going to be flushed. They won't have
any effect on the correctness of the writeback. So, a pre-barrier is
unnecessary.

The same considerations seem to apply for AArch64.

  "All data cache instructions, other than DC ZVA, that specify an
  address:

  - Execute in program order relative to loads or stores that access an
    address in Normal memory with either Inner Write Through or Inner
    Write Back attributes within the same cache line of minimum size,
    as indicated by CTR_EL0.DMinLine"

So, once again a DMB is required after but not before writeback is
initiated.

I'm also now unsure that a DMB(ISHST) is actually adequate. The
relevant text from the ARM ARM is at the end of the text I previously
cited.

  "In all cases, where the text in this section refers to a DMB or a
  DSB, this means a DMB or DSB whose required access type is both loads
  and stores."

I think that means that the pmem_drain barrier has to be a DMB(ISH) not
a DMB(ISHST). I'm not really sure why but it does seem to be the
implication.

So, to sum up

  Unsafe.writebackPreSync0 should be implemented as

    a no-op on x86_64
    a no-op on AArch64

  Unsafe.writebackPostSync0 should be implemented as

    a no-op on x86_64 when using clflush
    an sfence on x86_64 when using clflushopt or clwb
    a dmb(ISH) on AArch64

I think it may still be worth retaining Unsafe.writebackPreSync0 (even
though it currently translates to a no-op) in case new hardware
requires some extra preparatory operation before writeback commences.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander