MAP_DIRECT is an mmap(2) flag with the following semantics:
MAP_DIRECT
When specified with MAP_SHARED, a successful fault in this range
indicates that the kernel is maintaining the block map (user linear
address to file offset to physical address relationship) in such a
manner that no external agent can observe any inconsistent changes. In
other words, the block map of the mapping is effectively pinned, or the
kernel is otherwise able to exchange a new physical extent atomically
with respect to any hardware / software agent. As implied by this
definition, a successful fault in a MAP_DIRECT range bypasses kernel
indirections like the page cache, and all updates are carried directly
through to the underlying file physical-address blocks (modulo CPU cache
effects).
ETXTBSY may be returned to any third-party operation on the file that
attempts to update the block map (allocate blocks / convert unwritten
extents / break shared extents). However, whether a filesystem returns
ETXTBSY for a given state of a block relative to a MAP_DIRECT mapping
is filesystem and kernel version dependent.
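On the other side of the mapping, a process that changes the file's block map should be ready for ETXTBSY. The following is a minimal sketch, assuming posix_fallocate(3) as the block-allocating call; `allocate_range()`, `classify_alloc_errno()` and `enum alloc_result` are hypothetical names, not part of this patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical outcomes of a block-allocating request. */
enum alloc_result { ALLOC_OK, ALLOC_BUSY, ALLOC_ERROR };

/*
 * Classify the error number from a call that may change the block map.
 * ETXTBSY is singled out because a filesystem may return it while a
 * MAP_DIRECT mapping pins the block map; whether it actually does is
 * filesystem and kernel version dependent.
 */
static enum alloc_result classify_alloc_errno(int err)
{
	if (err == 0)
		return ALLOC_OK;
	if (err == ETXTBSY)
		return ALLOC_BUSY;	/* block map pinned: back off, retry later */
	return ALLOC_ERROR;
}

/*
 * Hypothetical helper: allocate blocks for [off, off + len) of fd.
 * posix_fallocate() returns an error number directly instead of
 * setting errno.
 */
static enum alloc_result allocate_range(int fd, off_t off, off_t len)
{
	return classify_alloc_errno(posix_fallocate(fd, off, len));
}
```

Treating ETXTBSY as "busy, retry later" rather than as a hard failure is only one possible policy. Because the exact conditions are filesystem and kernel version dependent, a caller cannot count on ETXTBSY ever being returned.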
Some filesystems may extend these operation restrictions outside the
mapped range and return ETXTBSY to any file operations that might mutate
the block map. MAP_DIRECT faults may fail with SIGBUS if the filesystem
needs to write the block map to satisfy the fault, for example, if the
mapping was established over a hole in a sparse file.
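A fault that needs a block-map update (over a hole, or over an unwritten extent left by preallocation) may raise SIGBUS. A MAP_DIRECT user would therefore likely want the range backed by written blocks before mapping it. The sketch below works under that assumption. `create_written_file()` is a hypothetical helper. Writing explicit zeros instead of calling fallocate(2) is a deliberate choice: preallocation on filesystems such as XFS produces unwritten extents, and those still need conversion on first write.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or truncate) path, fill [0, len) with
 * explicit zero writes, and fsync() the result, so the range is backed by
 * written blocks instead of holes or unwritten extents. On filesystems
 * with delayed allocation, the fsync() also forces real block allocation
 * before the file is mapped.
 * Returns an O_RDWR descriptor, or -1 with errno set.
 */
static int create_written_file(const char *path, off_t len)
{
	static const char zeros[4096];	/* static storage: all bytes zero */
	off_t done = 0;
	int saved;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;

	while (done < len) {
		size_t chunk = sizeof(zeros);
		ssize_t n;

		if (len - done < (off_t)chunk)
			chunk = (size_t)(len - done);
		n = pwrite(fd, zeros, chunk, done);
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted before any data was written */
		if (n < 0)
			goto fail;
		if (n == 0) {
			errno = EIO;	/* no progress; avoid looping forever */
			goto fail;
		}
		done += n;
	}
	if (fsync(fd) < 0)
		goto fail;
	return fd;

fail:
	saved = errno;
	close(fd);
	errno = saved;
	return -1;
}
```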
ERRORS
EACCES A MAP_DIRECT mapping was requested and PROT_WRITE was not set,
or the requesting process is missing CAP_LINUX_IMMUTABLE.
EINVAL MAP_ANONYMOUS or MAP_PRIVATE was specified with MAP_DIRECT.
EOPNOTSUPP The filesystem explicitly does not support the flag.
SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that
might require block-map updates.
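Using the ERRORS list above, a caller that can live without MAP_DIRECT might request it and fall back to a plain MAP_SHARED mapping on EACCES (missing CAP_LINUX_IMMUTABLE) or EOPNOTSUPP (no filesystem support). The sketch below rests on those assumptions. `map_file_maybe_direct()` is hypothetical. The sketch assumes no numeric value for MAP_DIRECT, so the MAP_DIRECT path is compiled only when the system headers define the flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first len bytes of path shared and
 * read/write, asking for MAP_DIRECT only when the system headers define
 * it. Falls back to plain MAP_SHARED on EACCES or EOPNOTSUPP.
 * *is_direct reports which mapping was made.
 * Returns MAP_FAILED with errno set on failure.
 */
static void *map_file_maybe_direct(const char *path, size_t len, int *is_direct)
{
	/* MAP_DIRECT without PROT_WRITE fails with EACCES, so always request it. */
	const int prot = PROT_READ | PROT_WRITE;
	void *p;
	int saved;
	int fd = open(path, O_RDWR);

	*is_direct = 0;
	if (fd < 0)
		return MAP_FAILED;
#ifdef MAP_DIRECT
	p = mmap(NULL, len, prot, MAP_SHARED | MAP_DIRECT, fd, 0);
	if (p != MAP_FAILED)
		*is_direct = 1;
	else if (errno == EACCES || errno == EOPNOTSUPP)
		p = mmap(NULL, len, prot, MAP_SHARED, fd, 0);	/* fall back */
#else
	p = mmap(NULL, len, prot, MAP_SHARED, fd, 0);
#endif
	saved = errno;
	close(fd);	/* the mapping remains valid after close() */
	errno = saved;
	return p;
}
```

EINVAL is deliberately not a fallback case. Per the list above, it means MAP_PRIVATE or MAP_ANONYMOUS was combined with MAP_DIRECT, which is a caller bug rather than a missing feature.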
Cc: Jan Kara
Cc: Jeff Moyer
Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: Alexander Viro
Cc: "Darrick J. Wong"
Cc: Ross Zwisler
Signed-off-by: Dan Williams
---
fs/xfs/xfs_file.c               | 115 ++-
fs/xfs/xfs_inode.h              |   1
fs/xfs/xfs_super.c              |   1
include/linux/mman.h            |   6 ++
include/uapi/asm-generic/mman.h |   1
mm/mmap.c                       |  23
6 files changed, 142 insertions(+), 5 deletions(-)
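The fault-time gate that the xfs_file.c changes below add, xfs_vma_checks(), can be modelled in user space to make its decision table explicit. In this sketch, the structs, their field names and MODEL_VM_FAULT_SIGBUS are hypothetical stand-ins for the kernel's vm_area_struct, the inode flags and VM_FAULT_SIGBUS. The comments on the rejected cases are our reading of the rationale, not the author's.

```c
#include <stdbool.h>

/* Placeholder for the kernel's VM_FAULT_SIGBUS; the value is arbitrary. */
#define MODEL_VM_FAULT_SIGBUS 0x2

/* Hypothetical stand-in for the inode properties xfs_vma_checks() reads. */
struct model_inode {
	bool is_reflink;	/* xfs_is_reflink_inode(XFS_I(inode)) */
	bool is_dax;		/* IS_DAX(inode) */
};

/* Hypothetical stand-in for the vma property xfs_vma_checks() reads. */
struct model_vma {
	bool map_direct;	/* vma->vm_ops == &xfs_file_vm_direct_ops */
};

/* Mirrors the branch order of xfs_vma_checks(): 0 lets the fault proceed. */
static int model_vma_checks(const struct model_vma *vma,
			    const struct model_inode *inode)
{
	if (!vma->map_direct)
		return 0;			/* ordinary mappings: unchanged */
	if (inode->is_reflink)
		return MODEL_VM_FAULT_SIGBUS;	/* writes may need to break shared extents */
	if (!inode->is_dax)
		return MODEL_VM_FAULT_SIGBUS;	/* would go through the page cache */
	return 0;
}
```

Only mappings whose vm_ops are xfs_file_vm_direct_ops are checked. Ordinary shared mappings of the same file therefore fault exactly as before.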
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index cacc0162a41a..f82bf9416200 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -40,6 +40,7 @@
#include "xfs_iomap.h"
#include "xfs_reflink.h"
+#include
#include
#include
#include
@@ -1001,6 +1002,25 @@ xfs_file_llseek(
 	return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
 }
+static const struct vm_operations_struct xfs_file_vm_direct_ops;
+
+STATIC int
+xfs_vma_checks(
+	struct vm_area_struct	*vma,
+	struct inode		*inode)
+{
+	if (vma->vm_ops != &xfs_file_vm_direct_ops)
+		return 0;
+
+	if (xfs_is_reflink_inode(XFS_I(inode)))
+		return VM_FAULT_SIGBUS;
+
+	if (!IS_DAX(inode))
+		return VM_FAULT_SIGBUS;
+
+	return 0;
+}
+
 /*
  * Locking for serialisation of IO during page faults. This results in a lock
  * ordering of:
@@ -1031,6 +1051,10 @@ xfs_filemap_page_mkwrite(
 	file_update_time(vmf->vma->vm_file);
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+	ret = xfs_vma_checks(vmf->vma, inode);
+	if (ret)
+		goto out_unlock;
+
 	if (IS_DAX(inode)) {
 		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
 	} else {
@@ -1038,6 +1062,7 @@ xfs_filemap_page_mkwrite(
 		ret = block_page_mkwrite_return(ret);
 	}
+out_unlock:
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 	sb_end_pagefault(inode->i_sb);
@@ -1058,10 +1083,15 @@ xfs_filemap_fault(
 		return xfs_filemap_page_mkwrite(vmf);
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+	ret = xfs_vma_checks(vmf->vma, inode);
+	if (ret)
+		goto out_unlock;
+
 	if (IS_DAX(inode))
 		ret = dax_iomap_fault(vmf, PE_SIZE_PTE, &xfs_iomap_ops);
 	else
 		ret = filemap_fault(vmf);
+out_unlock:
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 	return ret;
@@ -1094,7 +1124,9 @@ xfs_filemap_huge_fault(
 	}
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
-	ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops);
+	ret = xfs_vma_checks(vmf->vma, inode);
+	if (ret == 0)
+		ret = dax_iomap_fault(vmf, pe_size, &xfs_iomap_ops);
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 	if (vmf->flags & FAULT_FLAG_WRITE)
@@ -1137,6 +1169,61 @@ xfs_filemap_pfn_mkwrite(
}