On Tue, Jul 12, 2022 at 06:25:30PM -0700, Dan Williams wrote:
> > This goes back to my question from years ago: why do we ever
> > do this deep flush in the Linux nvdimm stack to start with?
>
> The rationale is to push the data to smaller failure domain. Similar to
> flushing disk write-caches.

Flushing disk caches is not about a smaller failure domain. Flushing
disk caches is about making data durable _at all_.

> Otherwise, if you trust your memory power
> supplies like you trust your disks then just rely on them to take care
> of the data.

Well, all the benchmarketing schemes around pmem seem to trust it. Why
would kernel block I/O be different from device dax or MAP_SYNC?

> Otherwise, by default the kernel should default to taking as much care
> as possible to push writes to the smallest failure domain possible.

In which case we need to remove the device dax direct map and MAP_SYNC.

Reducing the failure domain is not what fsync or REQ_OP_FLUSH are about;
they are about making changes durable. How durable is up to your device
implementation. But if you trust it only a little, you should not offer
that halfway option to start with.
