Re: [PATCH v6 4/5] fs, xfs: introduce MAP_DIRECT for creating block-map-atomic file ranges

2017-08-24 Thread Dan Williams
On Thu, Aug 24, 2017 at 9:39 AM, Christoph Hellwig  wrote:
> On Thu, Aug 24, 2017 at 09:31:17AM -0700, Dan Williams wrote:
>> External agent is a DMA device, or a hypervisor like Xen. In the DMA
>> case perhaps we can use the fcntl lease mechanism, I'll investigate.
>> In the Xen case it actually would need to use fiemap() to discover the
>> physical addresses that back the file to setup their M2P tables.
>> Here's the discussion where we discovered that physical address
>> dependency:
>>
>> https://lists.xen.org/archives/html/xen-devel/2017-04/msg00419.html
>
> fiemap does not work to discover physical addresses.  If they want
> to do anything involving physical address they will need a kernel
> driver.

True, it's broken with respect to multi-device filesystems and these
patches do nothing to fix that problem. Ok, I'm fine to let that use
case depend on a kernel driver and just focus on fixing the DMA case.
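
For readers following along, here is a minimal sketch of what a fiemap
query looks like from userspace. It is not from the patch set, and the
helper name query_extents() is hypothetical. The point it illustrates:
the fe_physical field the ioctl returns is a byte offset relative to the
filesystem's backing device, not a system physical address, so it cannot
be fed into something like an M2P table.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper (not part of the patch): ask the filesystem for up
 * to max_extents extents covering the whole file.
 *
 * Returns the number of extents mapped, or -1 with errno set. On success
 * the caller owns *out and must free() it. *out is untouched on failure.
 */
int query_extents(int fd, unsigned int max_extents, struct fiemap **out)
{
	size_t sz = sizeof(struct fiemap) +
		    (size_t)max_extents * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);

	if (!fm)
		return -1;

	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush dirty data first */
	fm->fm_extent_count = max_extents;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		free(fm);
		return -1;
	}

	/*
	 * fm->fm_extents[i].fe_physical is an offset on the backing
	 * device. With more than one device behind the filesystem there
	 * is no field saying which device it refers to.
	 */
	*out = fm;
	return (int)fm->fm_mapped_extents;
}
```

A caller would walk fm->fm_extents[0 .. n-1] and read fe_logical,
fe_physical and fe_length from each entry. Nothing in that result pins
the extents, so the layout may already have changed by the time
userspace looks at it.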
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v6 4/5] fs, xfs: introduce MAP_DIRECT for creating block-map-atomic file ranges

2017-08-24 Thread Christoph Hellwig
On Thu, Aug 24, 2017 at 09:31:17AM -0700, Dan Williams wrote:
> External agent is a DMA device, or a hypervisor like Xen. In the DMA
> case perhaps we can use the fcntl lease mechanism, I'll investigate.
> In the Xen case it actually would need to use fiemap() to discover the
> physical addresses that back the file to setup their M2P tables.
> Here's the discussion where we discovered that physical address
> dependency:
> 
> https://lists.xen.org/archives/html/xen-devel/2017-04/msg00419.html

fiemap does not work to discover physical addresses.  If they want
to do anything involving physical address they will need a kernel
driver.


Re: [PATCH v6 4/5] fs, xfs: introduce MAP_DIRECT for creating block-map-atomic file ranges

2017-08-24 Thread Dan Williams
[ adding Xen ]

On Thu, Aug 24, 2017 at 9:11 AM, Christoph Hellwig  wrote:
> I still can't make any sense of this description.  What is an external
> agent?  Userspace obviously can't ever see a change in the extent
> map, so it can't be meant.

External agent is a DMA device, or a hypervisor like Xen. In the DMA
case perhaps we can use the fcntl lease mechanism, I'll investigate.
In the Xen case it actually would need to use fiemap() to discover the
physical addresses that back the file to setup their M2P tables.
Here's the discussion where we discovered that physical address
dependency:

https://lists.xen.org/archives/html/xen-devel/2017-04/msg00419.html

> It would help a lot if you could come up with a concrete user for this,
> including example code.

Will do.


Re: [PATCH v6 4/5] fs, xfs: introduce MAP_DIRECT for creating block-map-atomic file ranges

2017-08-24 Thread Christoph Hellwig
I still can't make any sense of this description.  What is an external
agent?  Userspace obviously can't ever see a change in the extent
map, so it can't be meant.

It would help a lot if you could come up with a concrete user for this,
including example code.


[PATCH v6 4/5] fs, xfs: introduce MAP_DIRECT for creating block-map-atomic file ranges

2017-08-23 Thread Dan Williams
MAP_DIRECT is an mmap(2) flag with the following semantics:

  MAP_DIRECT
  When specified with MAP_SHARED a successful fault in this range
  indicates that the kernel is maintaining the block map (user linear
  address to file offset to physical address relationship) in a manner
  that no external agent can observe any inconsistent changes. In other
  words, the block map of the mapping is effectively pinned, or the kernel
  is otherwise able to exchange a new physical extent atomically with
  respect to any hardware / software agent. As implied by this definition
  a successful fault in a MAP_DIRECT range bypasses kernel indirections
  like the page-cache, and all updates are carried directly through to the
  underlying file physical-address blocks (modulo cpu cache effects).

  ETXTBSY may be returned to any third party operation on the file that
  attempts to update the block map (allocate blocks / convert unwritten
  extents / break shared extents). However, whether a filesystem returns
  ETXTBSY for a certain state of the block relative to a MAP_DIRECT mapping
  is filesystem and kernel version dependent.

  Some filesystems may extend these operation restrictions outside the
  mapped range and return ETXTBSY to any file operations that might mutate
  the block map. MAP_DIRECT faults may fail with a SIGBUS if the
  filesystem needs to write the block map to satisfy the fault. For
  example, if the mapping was established over a hole in a sparse file.

  ERRORS
  EACCES A MAP_DIRECT mapping was requested and PROT_WRITE was not set,
  or the requesting process is missing CAP_LINUX_IMMUTABLE.

  EINVAL MAP_ANONYMOUS or MAP_PRIVATE was specified with MAP_DIRECT.

  EOPNOTSUPP The filesystem explicitly does not support the flag.

  SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that
 might require block-map updates.
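
As a rough illustration of the semantics above, a userspace caller might
look like the sketch below. The helper names map_direct() and
map_direct_strerror() are hypothetical and not part of this patch.
MAP_DIRECT is only defined by headers from a kernel carrying this series,
so the sketch guards on it and otherwise fails with EOPNOTSUPP rather
than guessing a flag value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map the first len bytes of fd with MAP_DIRECT.
 * Returns MAP_FAILED with errno set on failure, like mmap(2).
 */
void *map_direct(int fd, size_t len)
{
#ifdef MAP_DIRECT
	/*
	 * Per the ERRORS list: PROT_WRITE is mandatory, and MAP_PRIVATE
	 * or MAP_ANONYMOUS are rejected, so always ask for a shared,
	 * writable, file-backed mapping.
	 */
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_DIRECT, fd, 0);
#else
	/* Headers without this series: report the flag as unsupported. */
	(void)fd;
	(void)len;
	errno = EOPNOTSUPP;
	return MAP_FAILED;
#endif
}

/* Translate the documented mmap() failures into a short hint. */
const char *map_direct_strerror(int err)
{
	switch (err) {
	case EACCES:
		return "need PROT_WRITE and CAP_LINUX_IMMUTABLE";
	case EINVAL:
		return "MAP_DIRECT requires MAP_SHARED on a regular file";
	case EOPNOTSUPP:
		return "filesystem does not support MAP_DIRECT";
	default:
		return "unexpected error";
	}
}
```

A successful mmap() is not the whole story: a later store can still
raise SIGBUS if the fault would need a block-map update, for example
over a hole. One way a caller might reduce that risk, as an assumption
rather than a guarantee, is to fully allocate and write the range
before mapping it.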

Cc: Jan Kara 
Cc: Jeff Moyer 
Cc: Christoph Hellwig 
Cc: Dave Chinner 
Cc: Alexander Viro 
Cc: "Darrick J. Wong" 
Cc: Ross Zwisler 
Signed-off-by: Dan Williams 
---
 fs/xfs/xfs_file.c   |  115 ++-
 fs/xfs/xfs_inode.h  |1 
 fs/xfs/xfs_super.c  |1 
 include/linux/mman.h|6 ++
 include/uapi/asm-generic/mman.h |1 
 mm/mmap.c   |   23 
 6 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index cacc0162a41a..f82bf9416200 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -40,6 +40,7 @@
 #include "xfs_iomap.h"
 #include "xfs_reflink.h"
 
+#include 
 #include 
 #include 
 #include 
@@ -1001,6 +1002,25 @@ xfs_file_llseek(
return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
 }
 
+static const struct vm_operations_struct xfs_file_vm_direct_ops;
+
+STATIC int
+xfs_vma_checks(
+   struct vm_area_struct   *vma,
+   struct inode*inode)
+{
+   if (vma->vm_ops != &xfs_file_vm_direct_ops)
+   return 0;
+
+   if (xfs_is_reflink_inode(XFS_I(inode)))
+   return VM_FAULT_SIGBUS;
+
+   if (!IS_DAX(inode))
+   return VM_FAULT_SIGBUS;
+
+   return 0;
+}
+
 /*
  * Locking for serialisation of IO during page faults. This results in a lock
  * ordering of:
@@ -1031,6 +1051,10 @@ xfs_filemap_page_mkwrite(
file_update_time(vmf->vma->vm_file);
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
+   ret = xfs_vma_checks(vmf->vma, inode);
+   if (ret)
+   goto out_unlock;
+
if (IS_DAX(inode)) {
ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
} else {
@@ -1038,6 +1062,7 @@ xfs_filemap_page_mkwrite(
ret = block_page_mkwrite_return(ret);
}
 
+out_unlock:
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
sb_end_pagefault(inode->i_sb);
 
@@ -1058,10 +1083,15 @@ xfs_filemap_fault(
return xfs_filemap_page_mkwrite(vmf);
 
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+   ret = xfs_vma_checks(vmf->vma, inode);
+   if (ret)
+   goto out_unlock;
+
if (IS_DAX(inode))
ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
else
ret = filemap_fault(vmf);
+out_unlock:
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
return ret;
@@ -1094,7 +1124,9 @@ xfs_filemap_huge_fault(
}
 
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-   ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops);
+   ret = xfs_vma_checks(vmf->vma, inode);
+   if (ret == 0)
+   ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
if (vmf->flags & FAULT_FLAG_WRITE)
@@ -1137,6 +1169,61 @@ xfs_filemap_pfn_mkwrite(
 
 }