On Thu, Sep 08, 2016 at 04:56:36PM -0600, Ross Zwisler wrote:
> On Wed, Sep 07, 2016 at 09:32:36PM -0700, Dan Williams wrote:
> > My understanding is that it is looking for the VM_MIXEDMAP flag which
> > is already ambiguous for determining if DAX is enabled even if this
> > dynamic listing issue is fixed. XFS has arranged for DAX to be a
> > per-inode capability and has an XFS-specific inode flag. We can make
> > that a common inode flag, but it seems we should have a way to
> > interrogate the mapping itself in the case where the inode is unknown
> > or unavailable. I'm thinking extensions to mincore to have flags for
> > DAX and possibly whether the page is part of a pte, pmd, or pud
> > mapping. Just floating that idea before starting to look into the
> > implementation, comments or other ideas welcome...
>
> I think this goes back to our previous discussion about support for the PMEM
> programming model. Really I think what NVML needs isn't a way to tell if it
> is getting a DAX mapping, but whether it is getting a DAX mapping on a
> filesystem that fully supports the PMEM programming model. This of course is
> defined to be a filesystem where it can do all of its flushes from userspace
> safely and never call fsync/msync, and that allocations that happen in page
> faults will be synchronized to media before the page fault completes.
>
> IIUC this is what NVML needs - a way to decide "do I use fsync/msync for
> everything or can I rely fully on flushes from userspace?"

"need fsync/msync" is a dynamic state of an inode, not a static
property. i.e. users can do things that change an inode behind the
back of a mapping, even if they are not aware that this might
happen. As such, a filesystem can invalidate an existing mapping
at any time and userspace won't notice because it will simply fault
in a new mapping on the next access...

> For all existing implementations, I think the answer is "you need to use
> fsync/msync" because we don't yet have proper support for the PMEM programming
> model.

Yes, that is correct.

FWIW, I don't think it will ever be possible to support this ....
wonderful "PMEM programming model" from any current or future kernel
filesystem without a very specific set of restrictions on what can
be done to a file. e.g.

    1. the file has to be fully allocated and zeroed before
       use. Preallocation/zeroing via unwritten extents is not
       allowed. Sparse files are not allowed. Shared extents are
       not allowed.
    2. set the "PMEM_IMMUTABLE" inode flag - filesystem must
       check the file is fully allocated before allowing it to
       be set, and caller must have CAP_LINUX_IMMUTABLE.
    3. Inode metadata is now immutable, and file data can only
       be accessed and/or modified via mmap().
    4. All non-mmap methods of inode data modification
       will now fail with EPERM.
    5. all methods of inode metadata modification will now fail
       with EPERM, timestamp updates will be ignored.
    6. PMEM_IMMUTABLE flag can only be removed if the file is
       not currently mapped and caller has CAP_LINUX_IMMUTABLE.

A flag like this /should/ make it possible to avoid fsync/msync() on
a file for existing filesystems, but it also means that such files
have significant management issues (hence the need for
CAP_LINUX_IMMUTABLE to cover its use).

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
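
For illustration only, the six rules listed above can be modelled as a small userspace state machine. This is a rough sketch and not kernel code: PMEM_IMMUTABLE is only a proposal in this thread, and struct pmem_file, every pmem_* function, and the -EINVAL and -EBUSY return values are hypothetical names and choices made up for the example (the mail itself only specifies EPERM for rules 4 and 5).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a file under the proposed rules; not a kernel API. */
struct pmem_file {
    size_t size;         /* file size in bytes */
    size_t allocated;    /* bytes backed by written, zeroed extents */
    bool has_unwritten;  /* any preallocated-but-unwritten extents */
    bool has_shared;     /* any shared extents */
    unsigned mappings;   /* number of active mmap()s of the file */
    bool pmem_immutable; /* the proposed PMEM_IMMUTABLE flag */
};

/* Rule 1: no holes, no unwritten extents, no shared extents. */
static bool fully_allocated(const struct pmem_file *f)
{
    return f->allocated == f->size && !f->has_unwritten && !f->has_shared;
}

/* Rule 2: needs the capability and a fully allocated file.
 * -EINVAL for the allocation check is an assumption of this sketch. */
int pmem_set_immutable(struct pmem_file *f, bool cap_linux_immutable)
{
    if (!cap_linux_immutable)
        return -EPERM;
    if (!fully_allocated(f))
        return -EINVAL;
    f->pmem_immutable = true;
    return 0;
}

/* Rule 6: needs the capability and no active mappings.
 * -EBUSY for the mapped case is an assumption of this sketch. */
int pmem_clear_immutable(struct pmem_file *f, bool cap_linux_immutable)
{
    if (!cap_linux_immutable)
        return -EPERM;
    if (f->mappings > 0)
        return -EBUSY;
    f->pmem_immutable = false;
    return 0;
}

/* Rules 3-5: any data or metadata change that does not go through
 * mmap() fails with EPERM while the flag is set. Timestamp updates,
 * which the proposal ignores rather than fails, are not modelled. */
int pmem_modify_non_mmap(const struct pmem_file *f)
{
    return f->pmem_immutable ? -EPERM : 0;
}

/* The payoff: fsync/msync can only be skipped while the flag is set. */
bool pmem_needs_msync(const struct pmem_file *f)
{
    return !f->pmem_immutable;
}
```

In this model a library would ask pmem_needs_msync() once the flag is set instead of guessing from the mapping, and the flag cannot change underneath it because rule 6 refuses removal while a mapping exists.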