devirtualize kernel access to DAX

2021-12-08 Thread Christoph Hellwig
Hi Dan,

this series cleans up a few loose ends and then removes the
copy_from_iter and copy_to_iter dax_operations methods in favor of
straight calls.

Diffstat:
 drivers/dax/bus.c |3 +
 drivers/dax/super.c   |   40 ++---
 drivers/md/dm-linear.c|   20 --
 drivers/md/dm-log-writes.c|   80 --
 drivers/md/dm-stripe.c|   20 --
 drivers/md/dm.c   |   52 ---
 drivers/nvdimm/pmem.c |   27 +-
 drivers/s390/block/dcssblk.c  |   18 +
 fs/dax.c  |5 --
 fs/fuse/virtio_fs.c   |   20 +-
 include/linux/dax.h   |   28 +++---
 include/linux/device-mapper.h |4 --
 include/linux/uio.h   |   20 --
 13 files changed, 44 insertions(+), 293 deletions(-)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/5] dax: simplify dax_synchronous and set_dax_synchronous

2021-12-08 Thread Christoph Hellwig
Remove the pointless wrappers.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c |  8 
 include/linux/dax.h | 12 ++--
 2 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e7152a6c4cc40..e18155f43a635 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -208,17 +208,17 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
-bool __dax_synchronous(struct dax_device *dax_dev)
+bool dax_synchronous(struct dax_device *dax_dev)
 {
    return test_bit(DAXDEV_SYNC, &dax_dev->flags);
 }
-EXPORT_SYMBOL_GPL(__dax_synchronous);
+EXPORT_SYMBOL_GPL(dax_synchronous);
 
-void __set_dax_synchronous(struct dax_device *dax_dev)
+void set_dax_synchronous(struct dax_device *dax_dev)
 {
    set_bit(DAXDEV_SYNC, &dax_dev->flags);
 }
-EXPORT_SYMBOL_GPL(__set_dax_synchronous);
+EXPORT_SYMBOL_GPL(set_dax_synchronous);
 
 bool dax_alive(struct dax_device *dax_dev)
 {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 87ae4c9b1d65b..3bd1fdb5d5f4b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -48,16 +48,8 @@ void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
-bool __dax_synchronous(struct dax_device *dax_dev);
-static inline bool dax_synchronous(struct dax_device *dax_dev)
-{
-   return  __dax_synchronous(dax_dev);
-}
-void __set_dax_synchronous(struct dax_device *dax_dev);
-static inline void set_dax_synchronous(struct dax_device *dax_dev)
-{
-   __set_dax_synchronous(dax_dev);
-}
+bool dax_synchronous(struct dax_device *dax_dev);
+void set_dax_synchronous(struct dax_device *dax_dev);
 /*
  * Check if given mapping is supported by the file / underlying device.
  */
-- 
2.30.2
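
For readers without a kernel tree at hand, the shape of the two helpers after
this patch can be sketched in user space. This is only an illustration: the
struct, the `_sketch` names and the bit number are made up here, and plain bit
operations stand in for the kernel's atomic test_bit()/set_bit().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the DAXDEV_SYNC bit; the number is illustrative. */
enum { DAXDEV_SYNC_SKETCH = 2 };

/* Hypothetical stand-in for struct dax_device; only the flags word matters. */
struct dax_device_sketch {
	unsigned long flags;
};

/* One exported function, no inline wrapper forwarding to a __ variant. */
bool dax_synchronous_sketch(const struct dax_device_sketch *dax_dev)
{
	assert(dax_dev);
	return (dax_dev->flags >> DAXDEV_SYNC_SKETCH) & 1UL;
}

/* Non-atomic here; the kernel version uses set_bit(). */
void set_dax_synchronous_sketch(struct dax_device_sketch *dax_dev)
{
	assert(dax_dev);
	dax_dev->flags |= 1UL << DAXDEV_SYNC_SKETCH;
}
```

Callers keep using the same two names as before; only the extra level of
indirection in the header goes away.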



[PATCH 28/29] iomap: build the block based code conditionally

2021-11-29 Thread Christoph Hellwig
Only build the block based iomap code if CONFIG_BLOCK is set.  Currently
that is always the case, but it will change soon.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/Kconfig| 4 ++--
 fs/iomap/Makefile | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index a6313a969bc5f..6d608330a096e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -15,11 +15,11 @@ config VALIDATE_FS_PARSER
  Enable this to perform validation of the parameter description for a
  filesystem when it is registered.
 
-if BLOCK
-
 config FS_IOMAP
bool
 
+if BLOCK
+
 source "fs/ext2/Kconfig"
 source "fs/ext4/Kconfig"
 source "fs/jbd2/Kconfig"
diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index 4143a3ff89dbc..fc070184b7faa 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -9,9 +9,9 @@ ccflags-y += -I $(srctree)/$(src)   # needed for trace events
 obj-$(CONFIG_FS_IOMAP) += iomap.o
 
 iomap-y+= trace.o \
-  buffered-io.o \
+  iter.o
+iomap-$(CONFIG_BLOCK)  += buffered-io.o \
   direct-io.o \
   fiemap.o \
-  iter.o \
   seek.o
 iomap-$(CONFIG_SWAP)   += swapfile.o
-- 
2.30.2



[PATCH 29/29] fsdax: don't require CONFIG_BLOCK

2021-11-29 Thread Christoph Hellwig
The file system DAX code no longer requires the block code.  So allow
building a kernel with fuse DAX but without the block layer.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 6d608330a096e..7a2b11c0b8036 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -42,6 +42,8 @@ source "fs/nilfs2/Kconfig"
 source "fs/f2fs/Kconfig"
 source "fs/zonefs/Kconfig"
 
+endif # BLOCK
+
 config FS_DAX
bool "File system based Direct Access (DAX) support"
depends on MMU
@@ -89,8 +91,6 @@ config FS_DAX_PMD
 config FS_DAX_LIMITED
bool
 
-endif # BLOCK
-
 # Posix ACL utility routines
 #
 # Note: Posix ACLs can be implemented without these helpers.  Never use
-- 
2.30.2



[PATCH 25/29] dax: return the partition offset from fs_dax_get_by_bdev

2021-11-29 Thread Christoph Hellwig
Prepare for the removal of the block_device from the DAX I/O path by
returning the partition offset from fs_dax_get_by_bdev so that the file
systems have it at hand for use during I/O.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
---
 drivers/dax/super.c | 9 ++---
 drivers/md/dm.c | 4 ++--
 fs/erofs/internal.h | 2 ++
 fs/erofs/super.c| 4 ++--
 fs/ext2/ext2.h  | 1 +
 fs/ext2/super.c | 2 +-
 fs/ext4/ext4.h  | 1 +
 fs/ext4/super.c | 2 +-
 fs/xfs/xfs_buf.c| 2 +-
 fs/xfs/xfs_buf.h| 1 +
 include/linux/dax.h | 6 --
 11 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 45d931aefd063..e7152a6c4cc40 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -69,17 +69,20 @@ EXPORT_SYMBOL_GPL(dax_remove_host);
 /**
  * fs_dax_get_by_bdev() - temporary lookup mechanism for filesystem-dax
  * @bdev: block device to find a dax_device for
+ * @start_off: returns the byte offset into the dax_device that @bdev starts
  */
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off)
 {
struct dax_device *dax_dev;
+   u64 part_size;
int id;
 
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
-   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
-   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+   *start_off = get_start_sect(bdev) * SECTOR_SIZE;
+   part_size = bdev_nr_sectors(bdev) * SECTOR_SIZE;
+   if (*start_off % PAGE_SIZE || part_size % PAGE_SIZE) {
pr_info("%pg: error: unaligned partition for dax\n", bdev);
return NULL;
}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4eba27e75c230..4e997c02bb0a0 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -637,7 +637,7 @@ static int open_table_device(struct table_device *td, dev_t 
dev,
 struct mapped_device *md)
 {
struct block_device *bdev;
-
+   u64 part_off;
int r;
 
BUG_ON(td->dm_dev.bdev);
@@ -653,7 +653,7 @@ static int open_table_device(struct table_device *td, dev_t 
dev,
}
 
td->dm_dev.bdev = bdev;
-   td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev);
+   td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev, &part_off);
return 0;
 }
 
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 3265688af7f9f..c1e65346e9f15 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -51,6 +51,7 @@ struct erofs_device_info {
char *path;
struct block_device *bdev;
struct dax_device *dax_dev;
+   u64 dax_part_off;
 
u32 blocks;
u32 mapped_blkaddr;
@@ -109,6 +110,7 @@ struct erofs_sb_info {
 #endif /* CONFIG_EROFS_FS_ZIP */
struct erofs_dev_context *devs;
struct dax_device *dax_dev;
+   u64 dax_part_off;
u64 total_blocks;
u32 primarydevice_blocks;
 
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 0aed886473c8d..71efce16024d9 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -312,7 +312,7 @@ static int erofs_init_devices(struct super_block *sb,
goto err_out;
}
dif->bdev = bdev;
-   dif->dax_dev = fs_dax_get_by_bdev(bdev);
+   dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
dif->blocks = le32_to_cpu(dis->blocks);
dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr);
sbi->total_blocks += dif->blocks;
@@ -644,7 +644,7 @@ static int erofs_fc_fill_super(struct super_block *sb, 
struct fs_context *fc)
 
sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
-   sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
+   sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
sbi->devs = ctx->devs;
ctx->devs = NULL;
 
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 3be9dd6412b78..d4f306aa5aceb 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -118,6 +118,7 @@ struct ext2_sb_info {
spinlock_t s_lock;
struct mb_cache *s_ea_block_cache;
struct dax_device *s_daxdev;
+   u64 s_dax_part_off;
 };
 
 static inline spinlock_t *
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 7e23482862e69..94f1fbd7d3ac2 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -831,7 +831,7 @@ static int ext2_fill_super(struct super_block *sb, void 
*data, int silent)
}
sb->s_fs_info = sbi;
sbi->s_sb_block = sb_block;
-   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->s_dax_part_off);
 
    spin_lock_init(&sbi->s_lock);
ret = -EINVAL;
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 404d
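
The offset computation and alignment check that fs_dax_get_by_bdev performs
in this patch can be tried out in user space. The sketch below is an
assumption-laden illustration: the function name and the `SKETCH_` constants
are invented here, the page size is fixed at 4096 bytes, and sector counts are
passed in directly instead of being read from a block_device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u   /* same value as the kernel's SECTOR_SIZE */
#define SKETCH_PAGE_SIZE   4096u  /* assumed; PAGE_SIZE depends on the arch */

/*
 * Convert the partition start to a byte offset, hand it back through
 * start_off, and report whether both the start and the size of the
 * partition are page aligned, as DAX requires.
 */
bool dax_partition_offset_sketch(uint64_t start_sect, uint64_t nr_sectors,
				 uint64_t *start_off)
{
	uint64_t part_size = nr_sectors * SKETCH_SECTOR_SIZE;

	assert(start_off);
	*start_off = start_sect * SKETCH_SECTOR_SIZE;
	if (*start_off % SKETCH_PAGE_SIZE || part_size % SKETCH_PAGE_SIZE)
		return false;	/* unaligned partition, no DAX */
	return true;
}
```

As in the patch, the byte offset is written even when the check fails, so a
caller should look at the return value before trusting it.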

[PATCH 27/29] dax: fix up some of the block device related ifdefs

2021-11-29 Thread Christoph Hellwig
The DAX device <-> block device association is only enabled if
CONFIG_BLOCK is enabled.  Update dax.h to account for that and use
the right conditions for the fs_put_dax stub as well.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 include/linux/dax.h | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index f6f353382cc90..87ae4c9b1d65b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -108,24 +108,15 @@ static inline bool daxdev_mapping_supported(struct 
vm_area_struct *vma,
 #endif
 
 struct writeback_control;
-#if IS_ENABLED(CONFIG_FS_DAX)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
 void dax_remove_host(struct gendisk *disk);
-
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
+   u64 *start_off);
 static inline void fs_put_dax(struct dax_device *dax_dev)
 {
put_dax(dax_dev);
 }
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
-   u64 *start_off);
-int dax_writeback_mapping_range(struct address_space *mapping,
-   struct dax_device *dax_dev, struct writeback_control *wbc);
-
-struct page *dax_layout_busy_page(struct address_space *mapping);
-struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
-dax_entry_t dax_lock_page(struct page *page);
-void dax_unlock_page(struct page *page, dax_entry_t cookie);
 #else
 static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk 
*disk)
 {
@@ -134,17 +125,25 @@ static inline int dax_add_host(struct dax_device 
*dax_dev, struct gendisk *disk)
 static inline void dax_remove_host(struct gendisk *disk)
 {
 }
-
-static inline void fs_put_dax(struct dax_device *dax_dev)
-{
-}
-
 static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
u64 *start_off)
 {
return NULL;
 }
+static inline void fs_put_dax(struct dax_device *dax_dev)
+{
+}
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
+#if IS_ENABLED(CONFIG_FS_DAX)
+int dax_writeback_mapping_range(struct address_space *mapping,
+   struct dax_device *dax_dev, struct writeback_control *wbc);
+
+struct page *dax_layout_busy_page(struct address_space *mapping);
+struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
+dax_entry_t dax_lock_page(struct page *page);
+void dax_unlock_page(struct page *page, dax_entry_t cookie);
+#else
 static inline struct page *dax_layout_busy_page(struct address_space *mapping)
 {
return NULL;
-- 
2.30.2
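
The declaration-or-stub pattern that this patch rearranges in dax.h looks
roughly like the sketch below. The `SKETCH_CONFIG_*` macros and the function
name are placeholders invented for this example; neither macro is defined
here, so the stub branch is the one that gets compiled.

```c
#include <assert.h>
#include <stddef.h>

struct dax_device;	/* opaque, as in the real header */

#if defined(SKETCH_CONFIG_BLOCK) && defined(SKETCH_CONFIG_FS_DAX)
/* Both options enabled: the real lookup is provided by another file. */
struct dax_device *lookup_dax_sketch(int bdev_id);
#else
/* Either option disabled: callers still compile and simply get no device. */
static inline struct dax_device *lookup_dax_sketch(int bdev_id)
{
	(void)bdev_id;
	return NULL;
}
#endif /* SKETCH_CONFIG_BLOCK && SKETCH_CONFIG_FS_DAX */
```

Keeping the stub under the same condition as the declaration is the point of
the patch: a lookup that needs a block_device only exists when the block
layer does.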



[PATCH 26/29] fsdax: shift partition offset handling into the file systems

2021-11-29 Thread Christoph Hellwig
Remove the last user of ->bdev in dax.c by requiring the file system to
pass in an address that already includes the DAX offset.  As part of that,
only set ->bdev or ->daxdev when actually required in the ->iomap_begin
methods.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Gao Xiang  [erofs]
---
 fs/dax.c|  6 +-
 fs/erofs/data.c | 11 +--
 fs/erofs/internal.h |  1 +
 fs/ext2/inode.c |  8 ++--
 fs/ext4/inode.c | 16 +++-
 fs/xfs/xfs_iomap.c  | 10 --
 6 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 148e8b0967f35..e0eecd8e3a8f8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -711,11 +711,7 @@ int dax_invalidate_mapping_entry_sync(struct address_space 
*mapping,
 
 static pgoff_t dax_iomap_pgoff(const struct iomap *iomap, loff_t pos)
 {
-   phys_addr_t paddr = iomap->addr + (pos & PAGE_MASK) - iomap->offset;
-
-   if (iomap->bdev)
-   paddr += (get_start_sect(iomap->bdev) << SECTOR_SHIFT);
-   return PHYS_PFN(paddr);
+   return PHYS_PFN(iomap->addr + (pos & PAGE_MASK) - iomap->offset);
 }
 
 static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter 
*iter)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 0e35ef3f9f3d7..9b1bb177ce303 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -159,6 +159,7 @@ int erofs_map_dev(struct super_block *sb, struct 
erofs_map_dev *map)
/* primary device by default */
map->m_bdev = sb->s_bdev;
map->m_daxdev = EROFS_SB(sb)->dax_dev;
+   map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
 
if (map->m_deviceid) {
    down_read(&devs->rwsem);
@@ -169,6 +170,7 @@ int erofs_map_dev(struct super_block *sb, struct 
erofs_map_dev *map)
}
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
+   map->m_dax_part_off = dif->dax_part_off;
    up_read(&devs->rwsem);
} else if (devs->extra_devices) {
    down_read(&devs->rwsem);
@@ -185,6 +187,7 @@ int erofs_map_dev(struct super_block *sb, struct 
erofs_map_dev *map)
map->m_pa -= startoff;
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
+   map->m_dax_part_off = dif->dax_part_off;
break;
}
}
@@ -215,9 +218,13 @@ static int erofs_iomap_begin(struct inode *inode, loff_t 
offset, loff_t length,
if (ret)
return ret;
 
-   iomap->bdev = mdev.m_bdev;
-   iomap->dax_dev = mdev.m_daxdev;
iomap->offset = map.m_la;
+   if (flags & IOMAP_DAX) {
+   iomap->dax_dev = mdev.m_daxdev;
+   iomap->offset += mdev.m_dax_part_off;
+   } else {
+   iomap->bdev = mdev.m_bdev;
+   }
iomap->length = map.m_llen;
iomap->flags = 0;
iomap->private = NULL;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index c1e65346e9f15..5c2a83876220c 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -438,6 +438,7 @@ static inline int z_erofs_map_blocks_iter(struct inode 
*inode,
 struct erofs_map_dev {
struct block_device *m_bdev;
struct dax_device *m_daxdev;
+   u64 m_dax_part_off;
 
erofs_off_t m_pa;
unsigned int m_deviceid;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 01d69618277de..602578b72d8c5 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -817,9 +817,11 @@ static int ext2_iomap_begin(struct inode *inode, loff_t 
offset, loff_t length,
return ret;
 
iomap->flags = 0;
-   iomap->bdev = inode->i_sb->s_bdev;
iomap->offset = (u64)first_block << blkbits;
-   iomap->dax_dev = sbi->s_daxdev;
+   if (flags & IOMAP_DAX)
+   iomap->dax_dev = sbi->s_daxdev;
+   else
+   iomap->bdev = inode->i_sb->s_bdev;
 
if (ret == 0) {
iomap->type = IOMAP_HOLE;
@@ -828,6 +830,8 @@ static int ext2_iomap_begin(struct inode *inode, loff_t 
offset, loff_t length,
} else {
iomap->type = IOMAP_MAPPED;
iomap->addr = (u64)bno << blkbits;
+   if (flags & IOMAP_DAX)
+   iomap->addr += sbi->s_dax_part_off;
iomap->length = (u64)ret << blkbits;
iomap->flags |= IOMAP_F_MERGED;
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 89c4a174bd393..ccafcbc146d3e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3272,7 +3272,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
 
 static void ext4_set_iomap(struct inode *inode, struct
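
The division of labour after this patch can be sketched in user space: the
file system folds the partition offset into the address it reports for DAX
operations, and the core page-offset helper no longer looks at a block
device. The sketch follows the ext2 hunk above; all `_sketch` names, the flag
value and the 4096-byte page size are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_IOMAP_DAX  0x1u			/* illustrative flag value */

/* Hypothetical cut-down struct iomap: just the two fields used here. */
struct iomap_sketch {
	uint64_t addr;		/* byte address on the device */
	uint64_t offset;	/* byte offset in the file */
};

/* File system side: add the partition offset only for DAX operations. */
void fill_iomap_sketch(struct iomap_sketch *iomap, uint64_t file_off,
		       uint64_t dev_addr, uint64_t dax_part_off,
		       unsigned int flags)
{
	assert(iomap);
	iomap->offset = file_off;
	iomap->addr = dev_addr;
	if (flags & SKETCH_IOMAP_DAX)
		iomap->addr += dax_part_off;
}

/* Core side: a pure function of the mapping and the file position. */
uint64_t dax_pgoff_sketch(const struct iomap_sketch *iomap, uint64_t pos)
{
	assert(iomap);
	return (iomap->addr + (pos & SKETCH_PAGE_MASK) - iomap->offset)
		>> SKETCH_PAGE_SHIFT;
}
```

With a 1 MiB partition offset, a mapping at device address 8192 and a file
position of 5000, the DAX case resolves to page 259 of the dax_device, while
the same mapping without the DAX flag resolves to page 3.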

[PATCH 24/29] iomap: add a IOMAP_DAX flag

2021-11-29 Thread Christoph Hellwig
Add a flag so that the file system can easily detect DAX operations
based just on the iomap operation requested instead of looking at
inode state using IS_DAX.  This will be needed to apply the to-be-added
partition offset only for operations that actually use DAX,
but not things like fiemap that are based on the block device.
In the long run it should also allow turning the bdev, dax_dev
and inline_data into a union.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
---
 fs/dax.c  | 7 ---
 fs/ext4/inode.c   | 4 ++--
 fs/xfs/xfs_iomap.c| 7 ---
 fs/xfs/xfs_iomap.h| 3 ++-
 fs/xfs/xfs_pnfs.c | 2 +-
 include/linux/iomap.h | 5 +
 6 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 43d58b4219fd0..148e8b0967f35 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1180,7 +1180,7 @@ int dax_zero_range(struct inode *inode, loff_t pos, 
loff_t len, bool *did_zero,
.inode  = inode,
.pos= pos,
.len= len,
-   .flags  = IOMAP_ZERO,
+   .flags  = IOMAP_DAX | IOMAP_ZERO,
};
int ret;
 
@@ -1308,6 +1308,7 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
.inode  = iocb->ki_filp->f_mapping->host,
.pos= iocb->ki_pos,
.len= iov_iter_count(iter),
+   .flags  = IOMAP_DAX,
};
loff_t done = 0;
int ret;
@@ -1461,7 +1462,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault 
*vmf, pfn_t *pfnp,
.inode  = mapping->host,
.pos= (loff_t)vmf->pgoff << PAGE_SHIFT,
.len= PAGE_SIZE,
-   .flags  = IOMAP_FAULT,
+   .flags  = IOMAP_DAX | IOMAP_FAULT,
};
vm_fault_t ret = 0;
void *entry;
@@ -1570,7 +1571,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault 
*vmf, pfn_t *pfnp,
struct iomap_iter iter = {
.inode  = mapping->host,
.len= PMD_SIZE,
-   .flags  = IOMAP_FAULT,
+   .flags  = IOMAP_DAX | IOMAP_FAULT,
};
vm_fault_t ret = VM_FAULT_FALLBACK;
pgoff_t max_pgoff;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d316a2009489b..89c4a174bd393 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3349,8 +3349,8 @@ static int ext4_iomap_alloc(struct inode *inode, struct 
ext4_map_blocks *map,
 * DAX and direct I/O are the only two operations that are currently
 * supported with IOMAP_WRITE.
 */
-   WARN_ON(!IS_DAX(inode) && !(flags & IOMAP_DIRECT));
-   if (IS_DAX(inode))
+   WARN_ON(!(flags & (IOMAP_DAX | IOMAP_DIRECT)));
+   if (flags & IOMAP_DAX)
m_flags = EXT4_GET_BLOCKS_CREATE_ZERO;
/*
 * We use i_size instead of i_disksize here because delalloc writeback
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d6beb1502f8bc..0ed3e7674353b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -188,6 +188,7 @@ xfs_iomap_write_direct(
struct xfs_inode*ip,
xfs_fileoff_t   offset_fsb,
xfs_fileoff_t   count_fsb,
+   unsigned intflags,
struct xfs_bmbt_irec*imap)
 {
struct xfs_mount*mp = ip->i_mount;
@@ -229,7 +230,7 @@ xfs_iomap_write_direct(
 * the reserve block pool for bmbt block allocation if there is no space
 * left but we need to do unwritten extent conversion.
 */
-   if (IS_DAX(VFS_I(ip))) {
+   if (flags & IOMAP_DAX) {
bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
if (imap->br_state == XFS_EXT_UNWRITTEN) {
force = true;
@@ -620,7 +621,7 @@ imap_needs_alloc(
imap->br_startblock == DELAYSTARTBLOCK)
return true;
/* we convert unwritten extents before copying the data for DAX */
-   if (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN)
+   if ((flags & IOMAP_DAX) && imap->br_state == XFS_EXT_UNWRITTEN)
return true;
return false;
 }
@@ -826,7 +827,7 @@ xfs_direct_write_iomap_begin(
xfs_iunlock(ip, lockmode);
 
error = xfs_iomap_write_direct(ip, offset_fsb, end_fsb - offset_fsb,
-   );
+   flags, );
if (error)
return error;
 
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 657cc02290f22..e88dc162c785e 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -12,7 +12,8 @@ struct xfs_inode;
 struct xfs_bmbt_irec;
 
 int xfs_iomap_write_direct(struct xfs_inode *ip, xfs_fileoff_t offset_fsb,
-   xfs_fileoff_t cou
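
The switch from inode state to operation flags in the ext4 hunk above can be
illustrated with two small predicates. This is a sketch only: the bit values
and function names are invented here and do not match include/linux/iomap.h,
which is not shown in full in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit assignments, not the kernel's. */
#define SKETCH_IOMAP_WRITE  (1u << 0)
#define SKETCH_IOMAP_DIRECT (1u << 1)
#define SKETCH_IOMAP_DAX    (1u << 2)

/* Mirrors the new WARN_ON condition: the request must be DAX or direct I/O. */
bool write_mode_supported_sketch(unsigned int flags)
{
	return (flags & (SKETCH_IOMAP_DAX | SKETCH_IOMAP_DIRECT)) != 0;
}

/* Mirrors the new test that selects zeroed block allocation for DAX. */
bool needs_zeroed_blocks_sketch(unsigned int flags)
{
	return (flags & SKETCH_IOMAP_DAX) != 0;
}
```

Because the decision depends only on the flags of the request, an operation
such as fiemap on a DAX inode, which does not set the DAX flag, is treated as
a block device based operation.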

[PATCH 23/29] xfs: pass the mapping flags to xfs_bmbt_to_iomap

2021-11-29 Thread Christoph Hellwig
To prepare for looking at the IOMAP_DAX flag in xfs_bmbt_to_iomap pass in
the input mapping flags to xfs_bmbt_to_iomap.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/libxfs/xfs_bmap.c |  4 ++--
 fs/xfs/xfs_aops.c|  2 +-
 fs/xfs/xfs_iomap.c   | 35 ---
 fs/xfs/xfs_iomap.h   |  5 +++--
 fs/xfs/xfs_pnfs.c|  2 +-
 5 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 4dccd4d90622d..74198dd82b035 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4551,7 +4551,7 @@ xfs_bmapi_convert_delalloc(
 * the extent.  Just return the real extent at this offset.
 */
if (!isnullstartblock(bma.got.br_startblock)) {
-   xfs_bmbt_to_iomap(ip, iomap, &bma.got, flags);
+   xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags);
*seq = READ_ONCE(ifp->if_seq);
goto out_trans_cancel;
}
@@ -4598,7 +4598,7 @@ xfs_bmapi_convert_delalloc(
XFS_STATS_INC(mp, xs_xstrat_quick);
 
ASSERT(!isnullstartblock(bma.got.br_startblock));
-   xfs_bmbt_to_iomap(ip, iomap, &bma.got, flags);
+   xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags);
*seq = READ_ONCE(ifp->if_seq);
 
if (whichfork == XFS_COW_FORK)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index c8c15c3c31471..6ac3449a68ba0 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -359,7 +359,7 @@ xfs_map_blocks(
isnullstartblock(imap.br_startblock))
goto allocate_blocks;
 
-   xfs_bmbt_to_iomap(ip, >iomap, , 0);
+   xfs_bmbt_to_iomap(ip, >iomap, , 0, 0);
trace_xfs_map_blocks_found(ip, offset, count, whichfork, );
return 0;
 allocate_blocks:
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 9b7f92c6aef33..d6beb1502f8bc 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -53,7 +53,8 @@ xfs_bmbt_to_iomap(
struct xfs_inode*ip,
struct iomap*iomap,
struct xfs_bmbt_irec*imap,
-   u16 flags)
+   unsigned intmapping_flags,
+   u16 iomap_flags)
 {
struct xfs_mount*mp = ip->i_mount;
struct xfs_buftarg  *target = xfs_inode_buftarg(ip);
@@ -79,7 +80,7 @@ xfs_bmbt_to_iomap(
iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
iomap->bdev = target->bt_bdev;
iomap->dax_dev = target->bt_daxdev;
-   iomap->flags = flags;
+   iomap->flags = iomap_flags;
 
if (xfs_ipincount(ip) &&
(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
@@ -799,7 +800,7 @@ xfs_direct_write_iomap_begin(
 
xfs_iunlock(ip, lockmode);
trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, );
-   return xfs_bmbt_to_iomap(ip, iomap, , iomap_flags);
+   return xfs_bmbt_to_iomap(ip, iomap, , flags, iomap_flags);
 
 allocate_blocks:
error = -EAGAIN;
@@ -830,18 +831,19 @@ xfs_direct_write_iomap_begin(
return error;
 
trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, );
-   return xfs_bmbt_to_iomap(ip, iomap, , iomap_flags | IOMAP_F_NEW);
+   return xfs_bmbt_to_iomap(ip, iomap, , flags,
+iomap_flags | IOMAP_F_NEW);
 
 out_found_cow:
xfs_iunlock(ip, lockmode);
length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, );
if (imap.br_startblock != HOLESTARTBLOCK) {
-   error = xfs_bmbt_to_iomap(ip, srcmap, , 0);
+   error = xfs_bmbt_to_iomap(ip, srcmap, , flags, 0);
if (error)
return error;
}
-   return xfs_bmbt_to_iomap(ip, iomap, , IOMAP_F_SHARED);
+   return xfs_bmbt_to_iomap(ip, iomap, , flags, IOMAP_F_SHARED);
 
 out_unlock:
if (lockmode)
@@ -1051,23 +1053,24 @@ xfs_buffered_write_iomap_begin(
 */
xfs_iunlock(ip, XFS_ILOCK_EXCL);
trace_xfs_iomap_alloc(ip, offset, count, allocfork, );
-   return xfs_bmbt_to_iomap(ip, iomap, , IOMAP_F_NEW);
+   return xfs_bmbt_to_iomap(ip, iomap, , flags, IOMAP_F_NEW);
 
 found_imap:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
-   return xfs_bmbt_to_iomap(ip, iomap, , 0);
+   return xfs_bmbt_to_iomap(ip, iomap, , flags, 0);
 
 found_cow:
xfs_iunlock(ip, XFS_ILOCK_EXCL);
if (imap.br_startoff <= offset_fsb) {
-   error = xfs_bmbt_to_iomap(ip, srcmap, , 0);
+   error = xfs_bmbt_to_iomap(ip, srcmap, , flags, 0);
if (error)
return error;
-   return xfs_bmbt_to_iomap(ip, iomap, , IOMAP_F_SHARED);
+   return xfs_bmbt_to_iomap(ip, iomap, , flags,
+IOMAP_F_SHARED);
}
 
   

[PATCH 20/29] ext4: cleanup the dax handling in ext4_fill_super

2021-11-29 Thread Christoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove
the need for the dax_dev local variable.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/ext4/super.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index fd3d68f10ee55..8d7e3449c6472 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3878,7 +3878,6 @@ static void ext4_setup_csum_trigger(struct super_block 
*sb,
 
 static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 {
-   struct dax_device *dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
char *orig_data = kstrdup(data, GFP_KERNEL);
struct buffer_head *bh, **group_desc;
struct ext4_super_block *es = NULL;
@@ -3909,12 +3908,12 @@ static int ext4_fill_super(struct super_block *sb, void 
*data, int silent)
if ((data && !orig_data) || !sbi)
goto out_free_base;
 
-   sbi->s_daxdev = dax_dev;
sbi->s_blockgroup_lock =
kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
if (!sbi->s_blockgroup_lock)
goto out_free_base;
 
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
sb->s_fs_info = sbi;
sbi->s_sb = sb;
sbi->s_inode_readahead_blks = EXT4_DEF_INODE_READAHEAD_BLKS;
@@ -4299,7 +4298,7 @@ static int ext4_fill_super(struct super_block *sb, void 
*data, int silent)
goto failed_mount;
}
 
-   if (dax_dev) {
+   if (sbi->s_daxdev) {
if (blocksize == PAGE_SIZE)
    set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
else
@@ -5095,10 +5094,10 @@ static int ext4_fill_super(struct super_block *sb, void 
*data, int silent)
 out_fail:
sb->s_fs_info = NULL;
kfree(sbi->s_blockgroup_lock);
+   fs_put_dax(sbi->s_daxdev);
 out_free_base:
kfree(sbi);
kfree(orig_data);
-   fs_put_dax(dax_dev);
return err ? err : ret;
 }
 
-- 
2.30.2



[PATCH 22/29] xfs: use xfs_direct_write_iomap_ops for DAX zeroing

2021-11-29 Thread Christoph Hellwig
While the buffered write iomap ops do work due to the fact that zeroing
never allocates blocks, the DAX zeroing should use the direct ops just
like actual DAX I/O.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_iomap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6a0c3b307bd73..9b7f92c6aef33 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1322,7 +1322,7 @@ xfs_zero_range(
 
if (IS_DAX(inode))
return dax_zero_range(inode, pos, len, did_zero,
-   &xfs_buffered_write_iomap_ops);
+   &xfs_direct_write_iomap_ops);
    return iomap_zero_range(inode, pos, len, did_zero,
    &xfs_buffered_write_iomap_ops);
 }
@@ -1337,7 +1337,7 @@ xfs_truncate_page(
 
if (IS_DAX(inode))
return dax_truncate_page(inode, pos, did_zero,
-   &xfs_buffered_write_iomap_ops);
+   &xfs_direct_write_iomap_ops);
    return iomap_truncate_page(inode, pos, did_zero,
    &xfs_buffered_write_iomap_ops);
 }
-- 
2.30.2



[PATCH 21/29] xfs: move dax device handling into xfs_{alloc, free}_buftarg

2021-11-29 Thread Christoph Hellwig
Hide the DAX device lookup from the xfs_super.c code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
Reviewed-by: Dan Williams 
---
 fs/xfs/xfs_buf.c   |  8 
 fs/xfs/xfs_buf.h   |  4 ++--
 fs/xfs/xfs_super.c | 26 +-
 3 files changed, 11 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 631c5a61d89b7..4d4553ffa7050 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1892,6 +1892,7 @@ xfs_free_buftarg(
    list_lru_destroy(&btp->bt_lru);
 
blkdev_issue_flush(btp->bt_bdev);
+   fs_put_dax(btp->bt_daxdev);
 
kmem_free(btp);
 }
@@ -1932,11 +1933,10 @@ xfs_setsize_buftarg_early(
return xfs_setsize_buftarg(btp, bdev_logical_block_size(bdev));
 }
 
-xfs_buftarg_t *
+struct xfs_buftarg *
 xfs_alloc_buftarg(
struct xfs_mount*mp,
-   struct block_device *bdev,
-   struct dax_device   *dax_dev)
+   struct block_device *bdev)
 {
xfs_buftarg_t   *btp;
 
@@ -1945,7 +1945,7 @@ xfs_alloc_buftarg(
btp->bt_mount = mp;
btp->bt_dev =  bdev->bd_dev;
btp->bt_bdev = bdev;
-   btp->bt_daxdev = dax_dev;
+   btp->bt_daxdev = fs_dax_get_by_bdev(bdev);
 
/*
 * Buffer IO error rate limiting. Limit it to no more than 10 messages
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 6b0200b8007d1..bd7f709f0d232 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -338,8 +338,8 @@ xfs_buf_update_cksum(struct xfs_buf *bp, unsigned long 
cksum_offset)
 /*
  * Handling of buftargs.
  */
-extern struct xfs_buftarg *xfs_alloc_buftarg(struct xfs_mount *,
-   struct block_device *, struct dax_device *);
+struct xfs_buftarg *xfs_alloc_buftarg(struct xfs_mount *mp,
+   struct block_device *bdev);
 extern void xfs_free_buftarg(struct xfs_buftarg *);
 extern void xfs_buftarg_wait(struct xfs_buftarg *);
 extern void xfs_buftarg_drain(struct xfs_buftarg *);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index c4297206f4834..3584cfc3c5930 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -391,26 +391,19 @@ STATIC void
 xfs_close_devices(
struct xfs_mount*mp)
 {
-   struct dax_device *dax_ddev = mp->m_ddev_targp->bt_daxdev;
-
if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
struct block_device *logdev = mp->m_logdev_targp->bt_bdev;
-   struct dax_device *dax_logdev = mp->m_logdev_targp->bt_daxdev;
 
xfs_free_buftarg(mp->m_logdev_targp);
xfs_blkdev_put(logdev);
-   fs_put_dax(dax_logdev);
}
if (mp->m_rtdev_targp) {
struct block_device *rtdev = mp->m_rtdev_targp->bt_bdev;
-   struct dax_device *dax_rtdev = mp->m_rtdev_targp->bt_daxdev;
 
xfs_free_buftarg(mp->m_rtdev_targp);
xfs_blkdev_put(rtdev);
-   fs_put_dax(dax_rtdev);
}
xfs_free_buftarg(mp->m_ddev_targp);
-   fs_put_dax(dax_ddev);
 }
 
 /*
@@ -428,8 +421,6 @@ xfs_open_devices(
struct xfs_mount*mp)
 {
struct block_device *ddev = mp->m_super->s_bdev;
-   struct dax_device   *dax_ddev = fs_dax_get_by_bdev(ddev);
-   struct dax_device   *dax_logdev = NULL, *dax_rtdev = NULL;
struct block_device *logdev = NULL, *rtdev = NULL;
int error;
 
@@ -439,8 +430,7 @@ xfs_open_devices(
if (mp->m_logname) {
    error = xfs_blkdev_get(mp, mp->m_logname, &logdev);
if (error)
-   goto out;
-   dax_logdev = fs_dax_get_by_bdev(logdev);
+   return error;
}
 
if (mp->m_rtname) {
@@ -454,25 +444,24 @@ xfs_open_devices(
error = -EINVAL;
goto out_close_rtdev;
}
-   dax_rtdev = fs_dax_get_by_bdev(rtdev);
}
 
/*
 * Setup xfs_mount buffer target pointers
 */
error = -ENOMEM;
-   mp->m_ddev_targp = xfs_alloc_buftarg(mp, ddev, dax_ddev);
+   mp->m_ddev_targp = xfs_alloc_buftarg(mp, ddev);
if (!mp->m_ddev_targp)
goto out_close_rtdev;
 
if (rtdev) {
-   mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev, dax_rtdev);
+   mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev);
if (!mp->m_rtdev_targp)
goto out_free_ddev_targ;
}
 
if (logdev && logdev != ddev) {
-   mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev, dax_logdev);
+   mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev);
if (!mp->m_logdev_targp)
goto out_free_rtdev_targ;
} e

[PATCH 16/29] fsdax: simplify the offset check in dax_iomap_zero

2021-11-29 Thread Christoph Hellwig
The file relative offset must have the same alignment as the storage
offset, so use that and get rid of the call to iomap_sector.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/dax.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5364549d67a48..d7a923d152240 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1123,7 +1123,6 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
-   sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
void *kaddr;
@@ -1131,8 +1130,7 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
unsigned offset = offset_in_page(pos);
unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
-   if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
-   (size == PAGE_SIZE))
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
page_aligned = true;
 
id = dax_read_lock();
-- 
2.30.2
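
The equivalence behind the simplified check can be sketched in user-space C. This is only an illustration, assuming 4 KiB pages and a mapping whose storage address and file offset are both page aligned; `toy_iomap` and every `toy_*` helper are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))
#define TOY_SECTOR_SHIFT 9

/* Hypothetical stand-in for the two struct iomap fields the check uses. */
struct toy_iomap {
	uint64_t addr;   /* storage byte address of the mapping */
	uint64_t offset; /* file byte offset of the mapping */
};

/* Bytes to zero in the page containing pos, capped by length. */
static uint64_t toy_chunk_size(uint64_t pos, uint64_t length)
{
	uint64_t room = TOY_PAGE_SIZE - (pos & (TOY_PAGE_SIZE - 1));

	return length < room ? length : room;
}

/* Old form: derive a sector from the page-rounded position, test its alignment. */
static bool toy_old_check(const struct toy_iomap *m, uint64_t pos, uint64_t length)
{
	uint64_t sector = (m->addr + (pos & TOY_PAGE_MASK) - m->offset)
				>> TOY_SECTOR_SHIFT;

	return ((sector << TOY_SECTOR_SHIFT) % TOY_PAGE_SIZE == 0) &&
	       toy_chunk_size(pos, length) == TOY_PAGE_SIZE;
}

/* New form: test the file position directly. */
static bool toy_new_check(uint64_t pos, uint64_t length)
{
	return (pos % TOY_PAGE_SIZE == 0) &&
	       toy_chunk_size(pos, length) == TOY_PAGE_SIZE;
}

/* True if both predicates agree for every pos in [0, limit). */
static bool toy_checks_agree(const struct toy_iomap *m, uint64_t limit,
			     uint64_t length)
{
	for (uint64_t pos = 0; pos < limit; pos++)
		if (toy_old_check(m, pos, length) != toy_new_check(pos, length))
			return false;
	return true;
}
```

Under the stated alignment assumption the two predicates agree for every position, which is what lets the patch drop the iomap_sector call.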



[PATCH 19/29] ext2: cleanup the dax handling in ext2_fill_super

2021-11-29 Thread Christoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove
the need for the dax_dev local variable.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/ext2/super.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index a964066a80aa7..7e23482862e69 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -802,7 +802,6 @@ static unsigned long descriptor_loc(struct super_block *sb,
 
 static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 {
-   struct dax_device *dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
struct buffer_head * bh;
struct ext2_sb_info * sbi;
struct ext2_super_block * es;
@@ -822,17 +821,17 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
if (!sbi)
-   goto failed;
+   return -ENOMEM;
 
sbi->s_blockgroup_lock =
kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
if (!sbi->s_blockgroup_lock) {
kfree(sbi);
-   goto failed;
+   return -ENOMEM;
}
sb->s_fs_info = sbi;
sbi->s_sb_block = sb_block;
-   sbi->s_daxdev = dax_dev;
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
 
spin_lock_init(&sbi->s_lock);
ret = -EINVAL;
@@ -946,7 +945,7 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
 
if (test_opt(sb, DAX)) {
-   if (!dax_dev) {
+   if (!sbi->s_daxdev) {
ext2_msg(sb, KERN_ERR,
"DAX unsupported by block device. Turning off DAX.");
clear_opt(sbi->s_mount_opt, DAX);
@@ -1201,11 +1200,10 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 failed_mount:
brelse(bh);
 failed_sbi:
+   fs_put_dax(sbi->s_daxdev);
sb->s_fs_info = NULL;
kfree(sbi->s_blockgroup_lock);
kfree(sbi);
-failed:
-   fs_put_dax(dax_dev);
return ret;
 }
 
-- 
2.30.2



[PATCH 13/29] fsdax: use a saner calling convention for copy_cow_page_dax

2021-11-29 Thread Christoph Hellwig
Just pass the vm_fault and iomap_iter structures, and figure out the rest
locally.  Note that this requires moving dax_iomap_sector up in the file.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/dax.c | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 73bd1439d8089..e51b4129d1b65 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,26 +709,31 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
 }
 
-static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev,
-sector_t sector, struct page *to, unsigned long vaddr)
+static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
 {
+   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
+}
+
+static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter *iter)
+{
+   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
void *vto, *kaddr;
pgoff_t pgoff;
long rc;
int id;
 
-   rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
+   rc = bdev_dax_pgoff(iter->iomap.bdev, sector, PAGE_SIZE, &pgoff);
if (rc)
return rc;
 
id = dax_read_lock();
-   rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+   rc = dax_direct_access(iter->iomap.dax_dev, pgoff, 1, &kaddr, NULL);
if (rc < 0) {
dax_read_unlock(id);
return rc;
}
-   vto = kmap_atomic(to);
-   copy_user_page(vto, kaddr, vaddr, to);
+   vto = kmap_atomic(vmf->cow_page);
+   copy_user_page(vto, kaddr, vmf->address, vmf->cow_page);
kunmap_atomic(vto);
dax_read_unlock(id);
return 0;
@@ -1005,11 +1010,6 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 
-static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
-{
-   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
-}
-
 static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
 pfn_t *pfnp)
 {
@@ -1332,19 +1332,16 @@ static vm_fault_t dax_fault_synchronous_pfnp(pfn_t *pfnp, pfn_t pfn)
 static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf,
const struct iomap_iter *iter)
 {
-   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
-   unsigned long vaddr = vmf->address;
vm_fault_t ret;
int error = 0;
 
switch (iter->iomap.type) {
case IOMAP_HOLE:
case IOMAP_UNWRITTEN:
-   clear_user_highpage(vmf->cow_page, vaddr);
+   clear_user_highpage(vmf->cow_page, vmf->address);
break;
case IOMAP_MAPPED:
-   error = copy_cow_page_dax(iter->iomap.bdev, iter->iomap.dax_dev,
- sector, vmf->cow_page, vaddr);
+   error = copy_cow_page_dax(vmf, iter);
break;
default:
WARN_ON_ONCE(1);
-- 
2.30.2



[PATCH 18/29] fsdax: decouple zeroing from the iomap buffered I/O code

2021-11-29 Thread Christoph Hellwig
Unshare the DAX and iomap buffered I/O page zeroing code.  This code
previously did an IS_DAX check deep inside the iomap code, which in
fact was the only DAX check in the code.  Instead move these checks
into the callers.  Most callers already have DAX special casing anyway
and XFS will need it for reflink support as well.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
---
 fs/dax.c   | 77 ++
 fs/ext2/inode.c|  7 ++--
 fs/ext4/inode.c|  5 +--
 fs/iomap/buffered-io.c | 35 +++
 fs/xfs/xfs_iomap.c |  7 +++-
 include/linux/dax.h|  7 +++-
 6 files changed, 94 insertions(+), 44 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index d5db1297a0bb6..43d58b4219fd0 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1135,24 +1135,73 @@ static int dax_memzero(struct dax_device *dax_dev, pgoff_t pgoff,
return ret;
 }
 
-s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
+static s64 dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
 {
-   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
-   long rc, id;
-   unsigned offset = offset_in_page(pos);
-   unsigned size = min_t(u64, PAGE_SIZE - offset, length);
+   const struct iomap *iomap = &iter->iomap;
+   const struct iomap *srcmap = iomap_iter_srcmap(iter);
+   loff_t pos = iter->pos;
+   u64 length = iomap_length(iter);
+   s64 written = 0;
+
+   /* already zeroed?  we're done. */
+   if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+   return length;
+
+   do {
+   unsigned offset = offset_in_page(pos);
+   unsigned size = min_t(u64, PAGE_SIZE - offset, length);
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
+   long rc;
+   int id;
 
-   id = dax_read_lock();
-   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
-   rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
-   else
-   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
-   dax_read_unlock(id);
+   id = dax_read_lock();
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
+   rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
+   else
+   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
+   dax_read_unlock(id);
 
-   if (rc < 0)
-   return rc;
-   return size;
+   if (rc < 0)
+   return rc;
+   pos += size;
+   length -= size;
+   written += size;
+   if (did_zero)
+   *did_zero = true;
+   } while (length > 0);
+
+   return written;
+}
+
+int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
+   const struct iomap_ops *ops)
+{
+   struct iomap_iter iter = {
+   .inode  = inode,
+   .pos= pos,
+   .len= len,
+   .flags  = IOMAP_ZERO,
+   };
+   int ret;
+
+   while ((ret = iomap_iter(&iter, ops)) > 0)
+   iter.processed = dax_zero_iter(&iter, did_zero);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(dax_zero_range);
+
+int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
+   const struct iomap_ops *ops)
+{
+   unsigned int blocksize = i_blocksize(inode);
+   unsigned int off = pos & (blocksize - 1);
+
+   /* Block boundary? Nothing to do */
+   if (!off)
+   return 0;
+   return dax_zero_range(inode, pos, blocksize - off, did_zero, ops);
 }
+EXPORT_SYMBOL_GPL(dax_truncate_page);
 
 static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
struct iov_iter *iter)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 333fa62661d56..01d69618277de 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext2.h"
 #include "acl.h"
 #include "xattr.h"
@@ -1297,9 +1298,9 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
inode_dio_wait(inode);
 
if (IS_DAX(inode)) {
-   error = iomap_zero_range(inode, newsize,
-PAGE_ALIGN(newsize) - newsize, NULL,
-&ext2_iomap_ops);
+   error = dax_zero_range(inode, newsize,
+  PAGE_ALIGN(newsize) - newsize, NULL,
+  &ext2_iomap_ops);
} else if (test_opt(inode->i_sb, NOBH))
error = nobh_truncate_page(inode->i_mapping,
newsize, ext2_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bfd3545f1e5d9..d316a2009489b 100644
--- a/fs/ext
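
The arithmetic in dax_truncate_page and the per-page splitting in dax_zero_iter can be sketched in user-space C. This is a rough model, assuming 4 KiB pages and a power-of-two block size; the `toy_*` names and the statistics struct are hypothetical, and the loop only counts bytes instead of touching memory. Unlike the do/while loop in the patch, it also accepts a zero length.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL

/*
 * Bytes from pos to the end of its block, or 0 when pos already sits on
 * a block boundary. Assumes blocksize is a power of two.
 */
static uint64_t toy_truncate_len(uint64_t pos, uint64_t blocksize)
{
	uint64_t off = pos & (blocksize - 1);

	return off ? blocksize - off : 0;
}

/* Hypothetical bookkeeping for the sketch. */
struct toy_zero_stats {
	uint64_t written;        /* total bytes covered */
	unsigned int chunks;     /* loop iterations */
	unsigned int full_pages; /* chunks that would use dax_zero_page_range */
};

/* Walk [pos, pos + length) one page-bounded chunk at a time. */
static struct toy_zero_stats toy_zero_range(uint64_t pos, uint64_t length)
{
	struct toy_zero_stats st = { 0, 0, 0 };

	while (length > 0) {
		uint64_t offset = pos & (TOY_PAGE_SIZE - 1);
		uint64_t room = TOY_PAGE_SIZE - offset;
		uint64_t size = length < room ? length : room;

		/* Whole aligned page: the fast path; otherwise the memzero path. */
		if (offset == 0 && size == TOY_PAGE_SIZE)
			st.full_pages++;

		pos += size;
		length -= size;
		st.written += size;
		st.chunks++;
	}
	return st;
}
```

For example, a range that starts 100 bytes into a page and spans 8192 bytes splits into a partial head, one whole page, and a partial tail.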

[PATCH 17/29] fsdax: factor out a dax_memzero helper

2021-11-29 Thread Christoph Hellwig
Factor out a helper for the "manual" zeroing of a DAX range to clean
up dax_iomap_zero a lot.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/dax.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index d7a923d152240..d5db1297a0bb6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1121,34 +1121,36 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 }
 #endif /* CONFIG_FS_DAX_PMD */
 
+static int dax_memzero(struct dax_device *dax_dev, pgoff_t pgoff,
+   unsigned int offset, size_t size)
+{
+   void *kaddr;
+   long ret;
+
+   ret = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+   if (ret > 0) {
+   memset(kaddr + offset, 0, size);
+   dax_flush(dax_dev, kaddr + offset, size);
+   }
+   return ret;
+}
+
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
-   void *kaddr;
-   bool page_aligned = false;
unsigned offset = offset_in_page(pos);
unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
-   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
-   page_aligned = true;
-
id = dax_read_lock();
-
-   if (page_aligned)
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
else
-   rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
-   if (rc < 0) {
-   dax_read_unlock(id);
-   return rc;
-   }
-
-   if (!page_aligned) {
-   memset(kaddr + offset, 0, size);
-   dax_flush(iomap->dax_dev, kaddr + offset, size);
-   }
+   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
dax_read_unlock(id);
+
+   if (rc < 0)
+   return rc;
return size;
 }
 
-- 
2.30.2



[PATCH 14/29] fsdax: simplify the pgoff calculation

2021-11-29 Thread Christoph Hellwig
Replace the two steps of dax_iomap_sector and bdev_dax_pgoff with a
single dax_iomap_pgoff helper that avoids lots of cumbersome sector
conversions.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 drivers/dax/super.c | 14 --
 fs/dax.c| 35 ++-
 include/linux/dax.h |  1 -
 3 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 90b5733f5a709..45d931aefd063 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -66,20 +66,6 @@ void dax_remove_host(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(dax_remove_host);
 
-int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
-   pgoff_t *pgoff)
-{
-   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
-   phys_addr_t phys_off = (start_sect + sector) * 512;
-
-   if (pgoff)
-   *pgoff = PHYS_PFN(phys_off);
-   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
-   return -EINVAL;
-   return 0;
-}
-EXPORT_SYMBOL(bdev_dax_pgoff);
-
 /**
  * fs_dax_get_by_bdev() - temporary lookup mechanism for filesystem-dax
  * @bdev: block device to find a dax_device for
diff --git a/fs/dax.c b/fs/dax.c
index e51b4129d1b65..5364549d67a48 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,23 +709,22 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
 }
 
-static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
+static pgoff_t dax_iomap_pgoff(const struct iomap *iomap, loff_t pos)
 {
-   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
+   phys_addr_t paddr = iomap->addr + (pos & PAGE_MASK) - iomap->offset;
+
+   if (iomap->bdev)
+   paddr += (get_start_sect(iomap->bdev) << SECTOR_SHIFT);
+   return PHYS_PFN(paddr);
 }
 
 static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter *iter)
 {
-   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
+   pgoff_t pgoff = dax_iomap_pgoff(&iter->iomap, iter->pos);
void *vto, *kaddr;
-   pgoff_t pgoff;
long rc;
int id;
 
-   rc = bdev_dax_pgoff(iter->iomap.bdev, sector, PAGE_SIZE, &pgoff);
-   if (rc)
-   return rc;
-
id = dax_read_lock();
rc = dax_direct_access(iter->iomap.dax_dev, pgoff, 1, &kaddr, NULL);
if (rc < 0) {
@@ -1013,14 +1012,10 @@ EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
 pfn_t *pfnp)
 {
-   const sector_t sector = dax_iomap_sector(iomap, pos);
-   pgoff_t pgoff;
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
int id, rc;
long length;
 
-   rc = bdev_dax_pgoff(iomap->bdev, sector, size, &pgoff);
-   if (rc)
-   return rc;
id = dax_read_lock();
length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
   NULL, pfnp);
@@ -1129,7 +1124,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
-   pgoff_t pgoff;
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
void *kaddr;
bool page_aligned = false;
@@ -1140,10 +1135,6 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
(size == PAGE_SIZE))
page_aligned = true;
 
-   rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
-   if (rc)
-   return rc;
-
id = dax_read_lock();
 
if (page_aligned)
@@ -1169,7 +1160,6 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
const struct iomap *iomap = &iomi->iomap;
loff_t length = iomap_length(iomi);
loff_t pos = iomi->pos;
-   struct block_device *bdev = iomap->bdev;
struct dax_device *dax_dev = iomap->dax_dev;
loff_t end = pos + length, done = 0;
ssize_t ret = 0;
@@ -1203,9 +1193,8 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
while (pos < end) {
unsigned offset = pos & (PAGE_SIZE - 1);
const size_t size = ALIGN(length + offset, PAGE_SIZE);
-   const sector_t sector = dax_iomap_sector(iomap, pos);
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
ssize_t map_len;
-   pgoff_t pgoff;
void *kaddr;
 
if (fatal_signal_pending(current)) {
@@ -1213,10 +1202,6 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
break;
}
 
-   ret = bdev_dax_pgoff(bdev, sector,
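
The pgoff calculation this patch introduces can be compared with the older two-step form in a small user-space sketch. It assumes 4 KiB pages and 512-byte sectors; `toy_iomap` (with the partition start folded in as `start_sect`) and the `toy_*` functions are hypothetical names, and the old path's alignment error return is left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK (~((1ULL << TOY_PAGE_SHIFT) - 1))
#define TOY_SECTOR_SHIFT 9

/* Hypothetical stand-in: mapping fields plus the block device start sector. */
struct toy_iomap {
	uint64_t addr;       /* storage byte address of the mapping */
	uint64_t offset;     /* file byte offset of the mapping */
	uint64_t start_sect; /* first sector of the partition, 0 if none */
};

/* Old shape: position -> sector -> physical offset -> page offset. */
static uint64_t toy_pgoff_two_step(const struct toy_iomap *m, uint64_t pos)
{
	uint64_t sector = (m->addr + (pos & TOY_PAGE_MASK) - m->offset)
				>> TOY_SECTOR_SHIFT;
	uint64_t phys_off = (m->start_sect + sector) * 512;

	return phys_off >> TOY_PAGE_SHIFT;
}

/* New shape: stay in bytes, then convert to a page frame number once. */
static uint64_t toy_pgoff_direct(const struct toy_iomap *m, uint64_t pos)
{
	uint64_t paddr = m->addr + (pos & TOY_PAGE_MASK) - m->offset;

	paddr += m->start_sect << TOY_SECTOR_SHIFT;
	return paddr >> TOY_PAGE_SHIFT;
}
```

Both forms give the same page offset as long as the mapping address and file offset are sector aligned, so the byte-based version only removes the round trip through sectors.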

[PATCH 10/29] dm-log-writes: add a log_writes_dax_pgoff helper

2021-11-29 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
Reviewed-by: Dan Williams 
---
 drivers/md/dm-log-writes.c | 42 +++---
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 3155875d4e5b0..cdb22e7a1d0da 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -947,17 +947,21 @@ static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
return 0;
 }
 
+static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
+   pgoff_t *pgoff)
+{
+   struct log_writes_c *lc = ti->private;
+
+   *pgoff += (get_start_sect(lc->dev->bdev) >> PAGE_SECTORS_SHIFT);
+   return lc->dev->dax_dev;
+}
+
 static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 long nr_pages, void **kaddr, pfn_t 
*pfn)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
-   int ret;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
-   return dax_direct_access(lc->dev->dax_dev, pgoff, nr_pages, kaddr, pfn);
+   return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
@@ -966,11 +970,9 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
 {
struct log_writes_c *lc = ti->private;
sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
int err;
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
-
/* Don't bother doing anything if logging has been disabled */
if (!lc->logging_enabled)
goto dax_copy;
@@ -981,34 +983,24 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
return 0;
}
 dax_copy:
-   return dax_copy_from_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
  pgoff_t pgoff, void *addr, size_t 
bytes,
  struct iov_iter *i)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
-   return dax_copy_to_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages << PAGE_SHIFT,
-&pgoff);
-   if (ret)
-   return ret;
-   return dax_zero_page_range(lc->dev->dax_dev, pgoff,
-  nr_pages << PAGE_SHIFT);
+   return dax_zero_page_range(dax_dev, pgoff, nr_pages << PAGE_SHIFT);
 }
 
 #else
-- 
2.30.2
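
The remapping done by log_writes_dax_pgoff amounts to adding the underlying device's start, expressed in pages, to the incoming page offset. A user-space sketch, assuming 4 KiB pages and 512-byte sectors; `toy_target`, its integer `dax_dev_id`, and `toy_dax_pgoff` are hypothetical stand-ins for the device-mapper structures:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_SECTOR_SHIFT 9
/* Number of bits to shift a sector count to get a page count. */
#define TOY_PAGE_SECTORS_SHIFT (TOY_PAGE_SHIFT - TOY_SECTOR_SHIFT)

/* Hypothetical stand-in for the target's underlying device. */
struct toy_target {
	uint64_t start_sect; /* first sector of the underlying block device */
	int dax_dev_id;      /* placeholder for the struct dax_device pointer */
};

/*
 * Shift *pgoff by the device start and hand back the DAX device to use.
 * No alignment checks here: the sketch, like the helper in the patch,
 * relies on the submitter having validated alignment already.
 */
static int toy_dax_pgoff(const struct toy_target *t, uint64_t *pgoff)
{
	*pgoff += t->start_sect >> TOY_PAGE_SECTORS_SHIFT;
	return t->dax_dev_id;
}
```

With a device starting at sector 2048 (1 MiB), page offset 10 in the target becomes page offset 266 on the underlying DAX device.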



[PATCH 15/29] xfs: add xfs_zero_range and xfs_truncate_page helpers

2021-11-29 Thread Christoph Hellwig
From: Shiyang Ruan 

Add helpers to prepare for using different DAX operations.

Signed-off-by: Shiyang Ruan 
[hch: split from a larger patch + slight cleanups]
Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_bmap_util.c |  7 +++
 fs/xfs/xfs_file.c  |  3 +--
 fs/xfs/xfs_iomap.c | 25 +
 fs/xfs/xfs_iomap.h |  4 
 fs/xfs/xfs_iops.c  |  7 +++
 fs/xfs/xfs_reflink.c   |  3 +--
 6 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 73a36b7be3bd1..797ea0c8b14e1 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1001,7 +1001,7 @@ xfs_free_file_space(
 
/*
 * Now that we've unmap all full blocks we'll have to zero out any
-* partial block at the beginning and/or end.  iomap_zero_range is smart
+* partial block at the beginning and/or end.  xfs_zero_range is smart
 * enough to skip any holes, including those we just created, but we
 * must take care not to zero beyond EOF and enlarge i_size.
 */
@@ -1009,15 +1009,14 @@ xfs_free_file_space(
return 0;
if (offset + len > XFS_ISIZE(ip))
len = XFS_ISIZE(ip) - offset;
-   error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
-   &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, offset, len, NULL);
if (error)
return error;
 
/*
 * If we zeroed right up to EOF and EOF straddles a page boundary we
 * must make sure that the post-EOF area is also zeroed because the
-* page could be mmap'd and iomap_zero_range doesn't do that for us.
+* page could be mmap'd and xfs_zero_range doesn't do that for us.
 * Writeback of the eof page will do this, albeit clumsily.
 */
if (offset + len >= XFS_ISIZE(ip) && offset_in_page(offset + len) > 0) {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 27594738b0d18..8d4c5ca261bd7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -437,8 +437,7 @@ xfs_file_write_checks(
}
 
trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
-   error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
-   NULL, &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
if (error)
return error;
} else
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 093758440ad53..d6d71ae9f2ae4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1311,3 +1311,28 @@ xfs_xattr_iomap_begin(
 const struct iomap_ops xfs_xattr_iomap_ops = {
.iomap_begin= xfs_xattr_iomap_begin,
 };
+
+int
+xfs_zero_range(
+   struct xfs_inode*ip,
+   loff_t  pos,
+   loff_t  len,
+   bool*did_zero)
+{
+   struct inode*inode = VFS_I(ip);
+
+   return iomap_zero_range(inode, pos, len, did_zero,
+   &xfs_buffered_write_iomap_ops);
+}
+
+int
+xfs_truncate_page(
+   struct xfs_inode*ip,
+   loff_t  pos,
+   bool*did_zero)
+{
+   struct inode*inode = VFS_I(ip);
+
+   return iomap_truncate_page(inode, pos, did_zero,
+  &xfs_buffered_write_iomap_ops);
+}
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 7d3703556d0e0..f1a281ab9328c 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -20,6 +20,10 @@ xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode *ip,
 int xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
struct xfs_bmbt_irec *, u16);
 
+int xfs_zero_range(struct xfs_inode *ip, loff_t pos, loff_t len,
+   bool *did_zero);
+int xfs_truncate_page(struct xfs_inode *ip, loff_t pos, bool *did_zero);
+
 static inline xfs_filblks_t
 xfs_aligned_fsb_count(
xfs_fileoff_t   offset_fsb,
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a607d6aca5c4d..ab5ef52b2a9ff 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -911,8 +911,8 @@ xfs_setattr_size(
 */
if (newsize > oldsize) {
trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
-   error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-   &did_zeroing, &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, oldsize, newsize - oldsize,
+   &did_zeroing);
} else {
/*
 * iomap won't detect a dirty page over an unwritten block (or a
@@ -924,8 +924,7 @@ xfs_setattr_size(
 newsize);
if (error)
 

[PATCH 11/29] dm-stripe: add a stripe_dax_pgoff helper

2021-11-29 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
Reviewed-by: Dan Williams 
---
 drivers/md/dm-stripe.c | 63 ++
 1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index f084607220293..50dba3f39274c 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
-static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-   long nr_pages, void **kaddr, pfn_t *pfn)
+static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
struct block_device *bdev;
+   sector_t dev_sector;
uint32_t stripe;
-   long ret;
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
+   stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector);
dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
bdev = sc->stripe[stripe].dev->bdev;
 
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT;
+   return sc->stripe[stripe].dev->dax_dev;
+}
+
+static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+   long nr_pages, void **kaddr, pfn_t *pfn)
+{
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
-
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2



[PATCH 12/29] fsdax: remove a pointless __force cast in copy_cow_page_dax

2021-11-29 Thread Christoph Hellwig
Despite its name, copy_user_page expects kernel addresses, which is what
we already have.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 4e3e5a283a916..73bd1439d8089 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -728,7 +728,7 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
return rc;
}
vto = kmap_atomic(to);
-   copy_user_page(vto, (void __force *)kaddr, vaddr, to);
+   copy_user_page(vto, kaddr, vaddr, to);
kunmap_atomic(vto);
dax_read_unlock(id);
return 0;
-- 
2.30.2



[PATCH 07/29] xfs: factor out a xfs_setup_dax_always helper

2021-11-29 Thread Christoph Hellwig
Factor out another DAX setup helper to simplify future changes.  Also
move the experimental warning after the checks to not clutter the log
too much if the setup failed.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 fs/xfs/xfs_super.c | 47 +++---
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e21459f9923a8..875fd3151d6c9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -340,6 +340,32 @@ xfs_buftarg_is_dax(
bdev_nr_sectors(bt->bt_bdev));
 }
 
+static int
+xfs_setup_dax_always(
+   struct xfs_mount*mp)
+{
+   struct super_block  *sb = mp->m_super;
+
+   if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
+  (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
+   xfs_alert(mp,
+   "DAX unsupported by block device. Turning off DAX.");
+   goto disable_dax;
+   }
+
+   if (xfs_has_reflink(mp)) {
+   xfs_alert(mp, "DAX and reflink cannot be used together!");
+   return -EINVAL;
+   }
+
+   xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+   return 0;
+
+disable_dax:
+   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
+   return 0;
+}
+
 STATIC int
 xfs_blkdev_get(
xfs_mount_t *mp,
@@ -1593,26 +1619,9 @@ xfs_fs_fill_super(
sb->s_flags |= SB_I_VERSION;
 
if (xfs_has_dax_always(mp)) {
-   bool rtdev_is_dax = false, datadev_is_dax;
-
-   xfs_warn(mp,
-   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
-
-   datadev_is_dax = xfs_buftarg_is_dax(sb, mp->m_ddev_targp);
-   if (mp->m_rtdev_targp)
-   rtdev_is_dax = xfs_buftarg_is_dax(sb,
-   mp->m_rtdev_targp);
-   if (!rtdev_is_dax && !datadev_is_dax) {
-   xfs_alert(mp,
-   "DAX unsupported by block device. Turning off DAX.");
-   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
-   }
-   if (xfs_has_reflink(mp)) {
-   xfs_alert(mp,
-   "DAX and reflink cannot be used together!");
-   error = -EINVAL;
+   error = xfs_setup_dax_always(mp);
+   if (error)
goto out_filestream_unmount;
-   }
}
 
if (xfs_has_discard(mp)) {
-- 
2.30.2
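
The order of the checks in xfs_setup_dax_always can be modelled as a small decision function. This is a user-space sketch only; `toy_mount`, its boolean fields, and `toy_setup_dax_always` are hypothetical stand-ins for the XFS mount state, and the log messages are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_dax_mode { TOY_DAX_ALWAYS, TOY_DAX_NEVER };

/* Hypothetical summary of the mount state the helper inspects. */
struct toy_mount {
	bool datadev_is_dax; /* data device supports DAX */
	bool has_rtdev;      /* a realtime device is configured */
	bool rtdev_is_dax;   /* realtime device supports DAX */
	bool has_reflink;    /* reflink feature enabled */
	enum toy_dax_mode mode;
};

/*
 * Mirrors the order of checks in the helper: unsupported devices turn
 * DAX off and succeed; otherwise reflink is rejected; otherwise DAX
 * stays enabled.
 */
static int toy_setup_dax_always(struct toy_mount *mp)
{
	bool rt_dax = mp->has_rtdev && mp->rtdev_is_dax;

	if (!mp->datadev_is_dax && !rt_dax) {
		mp->mode = TOY_DAX_NEVER;
		return 0;
	}
	if (mp->has_reflink)
		return -EINVAL;
	return 0;
}
```

In this model a mount with no DAX-capable device returns success with DAX disabled before the reflink check is reached, matching the `goto disable_dax` path in the helper.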



[PATCH 03/29] dax: remove CONFIG_DAX_DRIVER

2021-11-29 Thread Christoph Hellwig
CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
---
 drivers/dax/Kconfig| 4 
 drivers/nvdimm/Kconfig | 2 +-
 drivers/s390/block/Kconfig | 2 +-
 fs/fuse/Kconfig| 2 +-
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d2834c2cfa10d..954ab14ba7778 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,8 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-config DAX_DRIVER
-   select DAX
-   bool
-
 menuconfig DAX
tristate "DAX: direct access to differentiated memory"
select SRCU
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb38b27d4..347fe7afa5830 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -22,7 +22,7 @@ if LIBNVDIMM
 config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
-   select DAX_DRIVER
+   select DAX
select ND_BTT if BTT
select ND_PFN if NVDIMM_PFN
help
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index d0416dbd0cd81..e3710a762abae 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -5,7 +5,7 @@ comment "S/390 block device drivers"
 config DCSSBLK
def_tristate m
select FS_DAX_LIMITED
-   select DAX_DRIVER
+   select DAX
prompt "DCSSBLK support"
depends on S390 && BLOCK
help
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 40ce9a1c12e5d..038ed0b9aaa5d 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -45,7 +45,7 @@ config FUSE_DAX
select INTERVAL_TREE
depends on VIRTIO_FS
depends on FS_DAX
-   depends on DAX_DRIVER
+   depends on DAX
help
  This allows bypassing guest page cache and allows mapping host page
  cache directly in guest address space.
-- 
2.30.2



[PATCH 04/29] dax: simplify the dax_device <-> gendisk association

2021-11-29 Thread Christoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicit calls from the block drivers that
want to associate their gendisk with a dax_device.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/dax/bus.c|   6 +-
 drivers/dax/super.c  | 109 +--
 drivers/md/dm.c  |   6 +-
 drivers/nvdimm/pmem.c|  10 +++-
 drivers/s390/block/dcssblk.c |  11 +++-
 fs/fuse/virtio_fs.c  |   2 +-
 include/linux/dax.h  |  19 --
 7 files changed, 66 insertions(+), 97 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6cc4da4c713d9..bd7af2f7c5b0a 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1323,10 +1323,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
}
 
/*
-* No 'host' or dax_operations since there is no access to this
-* device outside of mmap of the resulting character device.
+* No dax_operations since there is no access to this device outside of
+* mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+   dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err_alloc_dax;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e20d0cef10a18..bf77c3da5d56d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -7,10 +7,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -21,15 +19,12 @@
  * struct dax_device - anchor object for dax services
  * @inode: core vfs
  * @cdev: optional character interface for "device dax"
- * @host: optional name for lookups where the device path is not available
  * @private: dax driver private data
  * @flags: state and boolean properties
  */
 struct dax_device {
-   struct hlist_node list;
struct inode inode;
struct cdev cdev;
-   const char *host;
void *private;
unsigned long flags;
const struct dax_operations *ops;
@@ -42,10 +37,6 @@ static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
 
-#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
-static struct hlist_head dax_host_list[DAX_HASH_SIZE];
-static DEFINE_SPINLOCK(dax_host_lock);
-
 int dax_read_lock(void)
 {
return srcu_read_lock(&dax_srcu);
@@ -58,13 +49,22 @@ void dax_read_unlock(int id)
 }
 EXPORT_SYMBOL_GPL(dax_read_unlock);
 
-static int dax_host_hash(const char *host)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
+#include 
+
+static DEFINE_XARRAY(dax_hosts);
+
+int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
 {
-   return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
+   return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL);
 }
+EXPORT_SYMBOL_GPL(dax_add_host);
 
-#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
-#include 
+void dax_remove_host(struct gendisk *disk)
+{
+   xa_erase(&dax_hosts, (unsigned long)disk);
+}
+EXPORT_SYMBOL_GPL(dax_remove_host);
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
pgoff_t *pgoff)
@@ -81,41 +81,24 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
 EXPORT_SYMBOL(bdev_dax_pgoff);
 
 /**
- * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
- * @host: alternate name for the device registered by a dax driver
+ * fs_dax_get_by_bdev() - temporary lookup mechanism for filesystem-dax
+ * @bdev: block device to find a dax_device for
  */
-static struct dax_device *dax_get_by_host(const char *host)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
 {
-   struct dax_device *dax_dev, *found = NULL;
-   int hash, id;
+   struct dax_device *dax_dev;
+   int id;
 
-   if (!host)
+   if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
-   hash = dax_host_hash(host);
-
id = dax_read_lock();
-   spin_lock(&dax_host_lock);
-   hlist_for_each_entry(dax_dev, &dax_host_list[hash], list) {
-   if (!dax_alive(dax_dev)
-   || strcmp(host, dax_dev->host) != 0)
-   continue;
-
-   if (igrab(&dax_dev->inode))
-   found = dax_dev;
-   break;
-   }
-   spin_unlock(&dax_host_lock);
+   dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
+   if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
+   dax_dev = NULL;
dax_read_unlock(id);
 
-   return found;
-}
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
-{
-   if (!blk_queue_
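
The association described in the commit message (one dax_device per gendisk,
looked up by the gendisk's pointer value) can be illustrated outside the
kernel.  The following is a userspace sketch under stated assumptions: a
small fixed-size table stands in for the xarray, and hosts_add(),
hosts_lookup() and hosts_remove() are hypothetical names, not kernel
functions.  Like xa_insert(), the add operation refuses to replace an
existing entry and reports -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HOSTS 16	/* arbitrary capacity chosen for the sketch */

struct host_entry {
	uintptr_t key;	/* pointer value of the (hypothetical) disk object */
	void *dax_dev;	/* associated device, NULL while the slot is free */
};

static struct host_entry hosts[MAX_HOSTS];

/* Register dax_dev for disk; fails if the disk already has an entry. */
int hosts_add(const void *disk, void *dax_dev)
{
	struct host_entry *free_slot = NULL;

	assert(disk != NULL && dax_dev != NULL);
	for (size_t i = 0; i < MAX_HOSTS; i++) {
		if (hosts[i].dax_dev && hosts[i].key == (uintptr_t)disk)
			return -EBUSY;
		if (!hosts[i].dax_dev && !free_slot)
			free_slot = &hosts[i];
	}
	if (!free_slot)
		return -ENOMEM;
	free_slot->key = (uintptr_t)disk;
	free_slot->dax_dev = dax_dev;
	return 0;
}

/* Return the device registered for disk, or NULL if there is none. */
void *hosts_lookup(const void *disk)
{
	for (size_t i = 0; i < MAX_HOSTS; i++)
		if (hosts[i].dax_dev && hosts[i].key == (uintptr_t)disk)
			return hosts[i].dax_dev;
	return NULL;
}

/* Drop the association for disk, if any. */
void hosts_remove(const void *disk)
{
	for (size_t i = 0; i < MAX_HOSTS; i++)
		if (hosts[i].dax_dev && hosts[i].key == (uintptr_t)disk)
			hosts[i].dax_dev = NULL;
}
```

The point of keying on the pointer value is that the lookup needs no string
compare and no separate name for the device, which is what lets the patch
drop the host string and the hash list.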

[PATCH 08/29] dax: remove dax_capable

2021-11-29 Thread Christoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
Reviewed-by: Gao Xiang  [erofs]
Reviewed-by: Dan Williams 
Reviewed-by: Darrick J. Wong 
---
 drivers/dax/super.c  | 36 
 drivers/md/dm-table.c| 22 +++---
 drivers/md/dm.c  | 21 -
 drivers/md/dm.h  |  4 
 drivers/nvdimm/pmem.c|  1 -
 drivers/s390/block/dcssblk.c |  1 -
 fs/erofs/super.c | 11 +++
 fs/ext2/super.c  |  6 --
 fs/ext4/super.c  |  9 ++---
 fs/xfs/xfs_super.c   | 21 -
 include/linux/dax.h  | 14 --
 11 files changed, 36 insertions(+), 110 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index f2cef47bdeafd..90b5733f5a709 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -107,42 +107,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
-
-bool generic_fsdax_supported(struct dax_device *dax_dev,
-   struct block_device *bdev, int blocksize, sector_t start,
-   sector_t sectors)
-{
-   if (blocksize != PAGE_SIZE) {
-   pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
-   return false;
-   }
-
-   if (!dax_dev) {
-   pr_debug("%pg: error: dax unsupported by block device\n", bdev);
-   return false;
-   }
-
-   return true;
-}
-EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-
-bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
-   int blocksize, sector_t start, sector_t len)
-{
-   bool ret = false;
-   int id;
-
-   if (!dax_dev)
-   return false;
-
-   id = dax_read_lock();
-   if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
-   ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
- start, len);
-   dax_read_unlock(id);
-   return ret;
-}
-EXPORT_SYMBOL_GPL(dax_supported);
 #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index aa173f5bdc3dd..e43096cfe9e22 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -806,12 +806,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
+static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
 {
-   int blocksize = *(int *) data;
+   if (dev->dax_dev)
+   return false;
 
-   return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
+   DMDEBUG("%pg: error: dax unsupported by block device", dev->bdev);
+   return true;
 }
 
 /* Check devices support synchronous DAX */
@@ -821,8 +823,8 @@ static int device_not_dax_synchronous_capable(struct dm_target *ti, struct dm_de
return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
 }
 
-bool dm_table_supports_dax(struct dm_table *t,
-  iterate_devices_callout_fn iterate_fn, int *blocksize)
+static bool dm_table_supports_dax(struct dm_table *t,
+  iterate_devices_callout_fn iterate_fn)
 {
struct dm_target *ti;
unsigned i;
@@ -835,7 +837,7 @@ bool dm_table_supports_dax(struct dm_table *t,
return false;
 
if (!ti->type->iterate_devices ||
-   ti->type->iterate_devices(ti, iterate_fn, blocksize))
+   ti->type->iterate_devices(ti, iterate_fn, NULL))
return false;
}
 
@@ -862,7 +864,6 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
-   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -906,7 +907,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
+   if (dm_table_supports_dax(t, device_not_dax_capable) ||
(list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;

[PATCH 02/29] dm: make the DAX support depend on CONFIG_FS_DAX

2021-11-29 Thread Christoph Hellwig
The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax.  Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER.  This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/dax/super.c| 6 ++
 drivers/md/dm-linear.c | 2 +-
 drivers/md/dm-log-writes.c | 2 +-
 drivers/md/dm-stripe.c | 2 +-
 drivers/md/dm-writecache.c | 2 +-
 drivers/md/dm.c| 2 +-
 6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index b882cf8106ea3..e20d0cef10a18 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -63,7 +63,7 @@ static int dax_host_hash(const char *host)
return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
 }
 
-#ifdef CONFIG_BLOCK
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 #include 
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
@@ -80,7 +80,6 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
 }
 EXPORT_SYMBOL(bdev_dax_pgoff);
 
-#if IS_ENABLED(CONFIG_FS_DAX)
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @host: alternate name for the device registered by a dax driver
@@ -219,8 +218,7 @@ bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
return ret;
 }
 EXPORT_SYMBOL_GPL(dax_supported);
-#endif /* CONFIG_FS_DAX */
-#endif /* CONFIG_BLOCK */
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
/* !alive + rcu grace period == no new operations / mappings */
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 66ba16713f696..0a260c35aeeed 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -162,7 +162,7 @@ static int linear_iterate_devices(struct dm_target *ti,
return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 0b3ef977ceeba..3155875d4e5b0 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -901,7 +901,7 @@ static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limit
limits->io_min = limits->physical_block_size;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
   struct iov_iter *i)
 {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 6660b6b53d5bf..f084607220293 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -300,7 +300,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 4b8991cde223d..4f31591d2d25e 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -38,7 +38,7 @@
 #define BITMAP_GRANULARITY PAGE_SIZE
 #endif
 
-#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_FS_DAX)
 #define DM_WRITECACHE_HAS_PMEM
 #endif
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index acc84dc1bded5..b93fcc91176e5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1783,7 +1783,7 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
 
-   if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
+   if (IS_ENABLED(CONFIG_FS_DAX)) {
md->dax_dev = alloc_dax(md, md->disk->disk_name,
&dm_dax_ops, 0);
if (IS_ERR(md->dax_dev)) {
-- 
2.30.2



[PATCH 06/29] dax: move the partition alignment check into fs_dax_get_by_bdev

2021-11-29 Thread Christoph Hellwig
fs_dax_get_by_bdev is the primary interface to find a dax device for a
block device, so move the partition alignment check there instead of
wiring it up through ->dax_supported.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
---
 drivers/dax/super.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index c8500b7e2d8a2..f2cef47bdeafd 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -92,6 +92,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
+   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
+   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+   pr_info("%pg: error: unaligned partition for dax\n", bdev);
+   return NULL;
+   }
+
id = dax_read_lock();
dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
@@ -106,10 +112,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   pgoff_t pgoff, pgoff_end;
-   sector_t last_page;
-   int err;
-
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
return false;
@@ -120,19 +122,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
-   last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
-   err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2
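
The check added to fs_dax_get_by_bdev can be tried out in userspace.  What
follows is a minimal sketch, not the kernel code: the function name is made
up, and a 4 KiB page size is assumed (the kernel uses the architecture's
PAGE_SIZE).  It takes the partition start and length in 512-byte sectors and
reports whether both are page aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE		512u
#define PAGE_SIZE_ASSUMED	4096u	/* assumption: 4 KiB pages */

static_assert((PAGE_SIZE_ASSUMED & (PAGE_SIZE_ASSUMED - 1)) == 0,
	      "page size must be a power of two");

/* True if both the start and the size of the partition are page aligned. */
bool partition_is_dax_aligned(uint64_t start_sect, uint64_t nr_sectors)
{
	return (start_sect * SECTOR_SIZE) % PAGE_SIZE_ASSUMED == 0 &&
	       (nr_sectors * SECTOR_SIZE) % PAGE_SIZE_ASSUMED == 0;
}
```

With these assumptions a partition starting at sector 2048 passes, while the
classic start at sector 63 does not, since 63 * 512 is not a multiple of
4096.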



[PATCH 01/29] dm: fix alloc_dax error handling in alloc_dev

2021-11-29 Thread Christoph Hellwig
Make sure ->dax_dev is NULL on error so that the cleanup path doesn't
trip over an ERR_PTR.

Reported-by: Dan Williams 
Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 662742a310cbb..acc84dc1bded5 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1786,8 +1786,10 @@ static struct mapped_device *alloc_dev(int minor)
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
md->dax_dev = alloc_dax(md, md->disk->disk_name,
&dm_dax_ops, 0);
-   if (IS_ERR(md->dax_dev))
+   if (IS_ERR(md->dax_dev)) {
+   md->dax_dev = NULL;
goto bad;
+   }
}
 
format_dev_t(md->name, MKDEV(_major, minor));
-- 
2.30.2
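
To illustrate why the pointer has to be reset, here is a userspace model of
the error-pointer convention.  It is a sketch only: err_ptr() and is_err()
imitate the kernel's ERR_PTR()/IS_ERR() helpers, and dev_model,
alloc_dax_model(), cleanup_model() and setup_model() are hypothetical names.
The cleanup routine only understands NULL or a valid pointer, so an encoded
errno left in the field would be handed to free().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR() and IS_ERR(). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct dev_model {
	void *dax_dev;	/* NULL, or a pointer owned by this object */
};

/* Hypothetical allocator that can be forced to fail with -ENOMEM. */
void *alloc_dax_model(int fail)
{
	return fail ? err_ptr(-ENOMEM) : malloc(1);
}

/* Cleanup path: must never be handed an encoded error pointer. */
void cleanup_model(struct dev_model *md)
{
	assert(!is_err(md->dax_dev));
	free(md->dax_dev);
	md->dax_dev = NULL;
}

int setup_model(struct dev_model *md, int fail)
{
	md->dax_dev = alloc_dax_model(fail);
	if (is_err(md->dax_dev)) {
		md->dax_dev = NULL;	/* the fix: do not leave the error behind */
		cleanup_model(md);
		return -ENOMEM;
	}
	return 0;
}
```

Without the reset in setup_model(), the assertion in cleanup_model() would
fire, which is the userspace analogue of the cleanup path tripping over an
ERR_PTR.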



[PATCH 09/29] dm-linear: add a linear_dax_pgoff helper

2021-11-29 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
Reviewed-by: Dan Williams 
---
 drivers/md/dm-linear.c | 49 +-
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 0a260c35aeeed..90de42f6743ac 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -163,63 +163,44 @@ static int linear_iterate_devices(struct dm_target *ti,
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
+static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
+{
+   struct linear_c *lc = ti->private;
+   sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT);
+
+   *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> PAGE_SECTORS_SHIFT;
+   return lc->dev->dax_dev;
+}
+
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
-   long ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2
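
The arithmetic done by the new helper can be followed with a small userspace
model.  This is a sketch under assumptions: 4 KiB pages and 512-byte sectors
are hard-coded, struct linear_model and linear_pgoff_model() are hypothetical,
and the mapping step is my reading of what linear_map_sector does (offset
within the target plus the start offset on the underlying device).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED	12	/* assumption: 4 KiB pages */
#define SECTOR_SHIFT		9	/* 512-byte sectors */
#define PAGE_SECTORS_SHIFT	(PAGE_SHIFT_ASSUMED - SECTOR_SHIFT)

/* Hypothetical stand-in for the state the real helper reads. */
struct linear_model {
	uint64_t target_begin;	/* first sector of the target in the dm device */
	uint64_t dev_start;	/* start offset on the underlying device, in sectors */
	uint64_t part_start;	/* start sector of the underlying partition */
};

/* Translate a page offset in the dm device to one on the backing disk. */
uint64_t linear_pgoff_model(const struct linear_model *lc, uint64_t pgoff)
{
	uint64_t sector = pgoff << PAGE_SECTORS_SHIFT;
	uint64_t mapped;

	assert(lc != NULL && sector >= lc->target_begin);
	mapped = lc->dev_start + (sector - lc->target_begin);
	return (lc->part_start + mapped) >> PAGE_SECTORS_SHIFT;
}
```

No alignment check appears in the model for the same reason the commit
message gives: the submitting file system is expected to have verified
alignment already.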



[PATCH 05/29] dax: remove the pgmap sanity checks in generic_fsdax_supported

2021-11-29 Thread Christoph Hellwig
Drivers that register a dax_dev should make sure it works, no need
to double check from the file system.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
---
 drivers/dax/super.c | 49 +
 1 file changed, 1 insertion(+), 48 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index bf77c3da5d56d..c8500b7e2d8a2 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -106,13 +106,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   bool dax_enabled = false;
pgoff_t pgoff, pgoff_end;
-   void *kaddr, *end_kaddr;
-   pfn_t pfn, end_pfn;
sector_t last_page;
-   long len, len2;
-   int err, id;
+   int err;
 
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
@@ -137,49 +133,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   id = dax_read_lock();
-   len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
-   len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
-
-   if (len < 1 || len2 < 1) {
-   pr_info("%pg: error: dax access failed (%ld)\n",
-   bdev, len < 1 ? len : len2);
-   dax_read_unlock(id);
-   return false;
-   }
-
-   if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
-   /*
-* An arch that has enabled the pmem api should also
-* have its drivers support pfn_t_devmap()
-*
-* This is a developer warning and should not trigger in
-* production. dax_flush() will crash since it depends
-* on being able to do (page_address(pfn_to_page())).
-*/
-   WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
-   dax_enabled = true;
-   } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
-   struct dev_pagemap *pgmap, *end_pgmap;
-
-   pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
-   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
-   if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX
-   && pfn_t_to_page(pfn)->pgmap == pgmap
-   && pfn_t_to_page(end_pfn)->pgmap == pgmap
-   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
-   && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr)))
-   dax_enabled = true;
-   put_dev_pagemap(pgmap);
-   put_dev_pagemap(end_pgmap);
-
-   }
-   dax_read_unlock(id);
-
-   if (!dax_enabled) {
-   pr_info("%pg: error: dax support not enabled\n", bdev);
-   return false;
-   }
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



decouple DAX from block devices v2

2021-11-29 Thread Christoph Hellwig
Hi Dan,

this series decouples DAX from the block layer so that the
block_device is not needed at all for the DAX I/O path.

Changes since v1:
 - rebase on latest v5.16-rc
 - ensure the new dax zeroing helpers are always declared
 - fix a dax_dev leak in pmem_attach_disk
 - remove '\n' from an xfs format string
 - fix a pre-existing error handling bug in alloc_dev
 - fix a few whitespace issues
 - tighten an error check
 - use s64/u64 a little more
 - improve a few commit messages
 - add a CONFIG_FS_DAX ifdef to stub out IOMAP_DAX
 - improve how IOMAP_DAX is introduced and better document why it is
   added


Re: [PATCH 23/29] xfs: use IOMAP_DAX to check for DAX mappings

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 03:01:24PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 09, 2021 at 09:33:03AM +0100, Christoph Hellwig wrote:
> > Use the explicit DAX flag instead of checking the inode flag in the
> > iomap code.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> Any particular reason to pass this in as a flag vs. querying the inode?

Same reason as the addition of IOMAP_DAX.  But I think I'll redo this
a bit to do the XFS parameter passing first and then actually check
IOMAP_DAX together with introducing it to make it all a little more clear.


Re: [PATCH 22/29] iomap: add a IOMAP_DAX flag

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 06:47:10PM -0800, Dan Williams wrote:
> On Tue, Nov 9, 2021 at 12:34 AM Christoph Hellwig  wrote:
> >
> > Add a flag so that the file system can easily detect DAX operations.
> 
> Looks ok, but I would have preferred a quick note about the rationale
> here before needing to read other patches to figure that out.

The reason is to only apply the DAX partition offsets to actual DAX
operations, and not to e.g. fiemap.  I'll document that more clearly.
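
As a rough illustration of that rationale, consider a mapping routine that
adds the partition offset only when the caller asked for a DAX operation.
This is a sketch, not iomap code: the MODEL_* flag names and values and
model_iomap_addr() are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values; the real iomap flags are not modelled here. */
#define MODEL_IOMAP_REPORT	(1u << 2)	/* e.g. a fiemap-style query */
#define MODEL_IOMAP_DAX		(1u << 8)	/* a DAX operation */

static_assert((MODEL_IOMAP_REPORT & MODEL_IOMAP_DAX) == 0,
	      "model flags must not overlap");

/*
 * Return the address to report for a mapping.  Only DAX operations get the
 * partition offset applied; everything else sees the plain address.
 */
uint64_t model_iomap_addr(uint64_t fs_addr, uint64_t part_off,
			  unsigned int flags)
{
	if (flags & MODEL_IOMAP_DAX)
		return fs_addr + part_off;
	return fs_addr;
}
```

A reporting caller and a DAX caller passing the same fs_addr therefore get
different answers, which is the behaviour the flag exists to select.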


Re: [PATCH 25/29] dax: return the partition offset from fs_dax_get_by_bdev

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 06:56:29PM -0800, Dan Williams wrote:
> On Tue, Nov 9, 2021 at 12:34 AM Christoph Hellwig  wrote:
> >
> > Prepare from removing the block_device from the DAX I/O path by returning
> 
> s/from removing/for the removal of/

Fixed.

> > td->dm_dev.bdev = bdev;
> > -   td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev);
> > +   td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev, &part_off);
> 
> Perhaps allow NULL as an argument for callers that do not care about
> the start offset?

All callers currently care, dm just has another way to get at the
information.  So for now I'd like to not add the NULL special case,
but we can reconsider that as needed if/when more callers show up.


Re: [PATCH 21/29] xfs: move dax device handling into xfs_{alloc,free}_buftarg

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 06:40:47PM -0800, Dan Williams wrote:
> On Tue, Nov 9, 2021 at 12:34 AM Christoph Hellwig  wrote:
> >
> > Hide the DAX device lookup from the xfs_super.c code.
> >
> > Reviewed-by: Christoph Hellwig 
> 
> That's an interesting spelling of "Signed-off-by", but patch looks
> good to me too. I would have expected a robot to complain about
> missing sign-off?

Hah.  I'll fix it up.


Re: [PATCH 20/29] ext4: cleanup the dax handling in ext4_fill_super

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:54:30PM -0800, Darrick J. Wong wrote:
> Nit: no space before the paren  ^ here.

Fixed.


Re: [PATCH 18/29] fsdax: decouple zeroing from the iomap buffered I/O code

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:53:15PM -0800, Darrick J. Wong wrote:
> > -s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
> > +static loff_t dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
> 
> Shouldn't this return value remain s64 to match iomap_iter.processed?

I'll switch it over.  Given that loff_t is always the same as s64
it shouldn't really matter.

(same for the others)


Re: [PATCH 18/29] fsdax: decouple zeroing from the iomap buffered I/O code

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 01:46:35PM -0800, Dan Williams wrote:
> > +   const struct iomap_ops *ops)
> > +{
> > +   unsigned int blocksize = i_blocksize(inode);
> > +   unsigned int off = pos & (blocksize - 1);
> > +
> > +   /* Block boundary? Nothing to do */
> > +   if (!off)
> > +   return 0;
> 
> It took me a moment to figure out why this was correct. I see it was
> also copied from iomap_truncate_page(). It makes sense for DAX where
> blocksize >= PAGE_SIZE so it's always the case that the amount of
> capacity to zero relative to a page is from @pos to the end of the
> block. Is there something else that protects the blocksize < PAGE_SIZE
> case outside of DAX?
> 
> Nothing to change for this patch, just a question I had while reviewing.

This is a helper for truncate ->setattr, where everything outside the
block is deallocated.  So zeroing is only needed inside the block.
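
Put as arithmetic: for a truncate to pos, only the tail of the block that
contains pos needs zeroing.  A minimal userspace sketch of that calculation
follows; truncate_zero_len() is a hypothetical name, and the block size is
assumed to be a power of two as in the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes to zero when truncating to pos: from pos to the end of
 * its block, or nothing if pos already sits on a block boundary.
 */
uint64_t truncate_zero_len(uint64_t pos, unsigned int blocksize)
{
	unsigned int off;

	/* The mask trick below only works for power-of-two block sizes. */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	off = pos & (blocksize - 1);
	if (!off)
		return 0;	/* block boundary: nothing to do */
	return blocksize - off;
}
```

For example, with 4096-byte blocks a truncate to offset 4100 zeroes the 4092
bytes up to offset 8192, and everything beyond that block is deallocated
rather than zeroed.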


Re: [PATCH 17/29] fsdax: factor out a dax_memzero helper

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 01:22:13PM -0800, Dan Williams wrote:
> On Tue, Nov 9, 2021 at 12:34 AM Christoph Hellwig  wrote:
> >
> > Factor out a helper for the "manual" zeroing of a DAX range to clean
> > up dax_iomap_zero a lot.
> >
> 
> Small / optional fixup below:

Incorporated.


Re: [PATCH 14/29] fsdax: simplify the pgoff calculation

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:36:42PM -0800, Darrick J. Wong wrote:
> > -   phys_addr_t phys_off = (start_sect + sector) * 512;
> > -
> > -   if (pgoff)
> > -   *pgoff = PHYS_PFN(phys_off);
> > -   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
> 
> AFAICT, we're relying on fs_dax_get_by_bdev to have validated this
> previously, which is why the error return stuff goes away?

Exactly.


Re: [PATCH 08/29] dax: remove dax_capable

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:31:23PM -0800, Darrick J. Wong wrote:
> > -   struct super_block  *sb = mp->m_super;
> > -
> > -   if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
> > -  (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
> > +   if (!mp->m_ddev_targp->bt_daxdev &&
> > +  (!mp->m_rtdev_targp || !mp->m_rtdev_targp->bt_daxdev)) {
> 
> Nit: This  ^ paren should be indented one more column because it's a
> sub-clause of the if() test.

Done.

> Nit: xfs_alert() already adds a newline to the end of the format string.

Already done in the current tree.


Re: [PATCH 06/29] dax: move the partition alignment check into fs_dax_get_by_bdev

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:25:55PM -0800, Darrick J. Wong wrote:
> > +   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
> > +   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
> 
> Do we have to be careful about 64-bit division here, or do we not
> support DAX on 32-bit?

I can't find anything in the Kconfig limiting DAX to 64-bit.  But
then again the existing code has divisions like this, so the compiler
is probably smart enough to turn them into shifts.

> > +   pr_info("%pg: error: unaligned partition for dax\n", bdev);
> 
> I also wonder if this should be ratelimited...?

This happens once (or maybe three times for XFS with rt and log devices)
at mount time, so I see no need for a ratelimit.
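For readers following the alignment question, here is a minimal userspace sketch of the check under discussion. It is not the kernel code: EXAMPLE_PAGE_SIZE and partition_is_page_aligned() are hypothetical names, and a 4096-byte page is assumed. It only illustrates that, for unsigned 64-bit values, a modulo by a power-of-two constant is equivalent to masking the low bits, which is why no 64-bit division helper should be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE        512u   /* bytes per sector, as in the patch */
#define EXAMPLE_PAGE_SIZE  4096u  /* assumed page size for this sketch */

/*
 * Return true when both the start and the length of a partition, given in
 * 512-byte sectors, fall on page boundaries.
 */
static bool partition_is_page_aligned(uint64_t start_sect, uint64_t nr_sects)
{
    uint64_t start_off = start_sect * SECTOR_SIZE;
    uint64_t part_size = nr_sects * SECTOR_SIZE;

    /* Modulo by a power-of-two constant matches a mask of the low bits. */
    assert(((start_off % EXAMPLE_PAGE_SIZE) == 0) ==
           ((start_off & (EXAMPLE_PAGE_SIZE - 1)) == 0));

    return (start_off % EXAMPLE_PAGE_SIZE) == 0 &&
           (part_size % EXAMPLE_PAGE_SIZE) == 0;
}
```

A partition starting at sector 2048 with 8 sectors passes the check, while one starting at sector 63 does not.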


Re: [PATCH 04/29] dax: simplify the dax_device <-> gendisk association

2021-11-22 Thread Christoph Hellwig
On Mon, Nov 22, 2021 at 07:33:06PM -0800, Dan Williams wrote:
> Is it time to add a "DAX" symbol namespace?

What would be the benefit?


Re: [PATCH 02/29] dm: make the DAX support dependend on CONFIG_FS_DAX

2021-11-22 Thread Christoph Hellwig
On Mon, Nov 22, 2021 at 06:54:09PM -0800, Dan Williams wrote:
> On Thu, Nov 18, 2021 at 10:55 PM Christoph Hellwig  wrote:
> >
> > On Wed, Nov 17, 2021 at 09:23:44AM -0800, Dan Williams wrote:
> > > Applied, fixed the spelling of 'dependent' in the subject and picked
> > > up Mike's Ack from the previous send:
> > >
> > > https://lore.kernel.org/r/yyasbvuorceds...@redhat.com
> > >
> > > Christoph, any particular reason you did not pick up the tags from the
> > > last posting?
> >
> > I thought I did, but apparently I've missed some.
> 
> I'll reply with the ones I see missing that need carrying over and add
> my own reviewed-by then you can send me a pull request when ready,
> deal?

Ok.


Re: [PATCH 01/29] nvdimm/pmem: move dax_attribute_group from dax to pmem

2021-11-18 Thread Christoph Hellwig
On Wed, Nov 17, 2021 at 09:44:25AM -0800, Dan Williams wrote:
> On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig  wrote:
> >
> > dax_attribute_group is only used by the pmem driver, and can avoid the
> > completely pointless lookup by the disk name if moved there.  This
> > leaves just a single caller of dax_get_by_host, so move dax_get_by_host
> > into the same ifdef block as that caller.
> >
> > Signed-off-by: Christoph Hellwig 
> > Reviewed-by: Dan Williams 
> > Link: https://lore.kernel.org/r/20210922173431.2454024-3-...@lst.de
> > Signed-off-by: Dan Williams 
> 
> This one already made v5.16-rc1.

Yes, but 5.16-rc1 did not exist yet when I posted the series.

Note that the series also has a conflict against 5.16-rc1 in pmem.c,
and buildbot pointed out the file systems need explicit dax.h
includes in a few files for some configurations.

The current branch is here, I just did not bother to repost without
any comments:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dax-block-cleanup

No functional changes.


Re: [PATCH 02/29] dm: make the DAX support dependend on CONFIG_FS_DAX

2021-11-18 Thread Christoph Hellwig
On Wed, Nov 17, 2021 at 09:23:44AM -0800, Dan Williams wrote:
> Applied, fixed the spelling of 'dependent' in the subject and picked
> up Mike's Ack from the previous send:
> 
> https://lore.kernel.org/r/yyasbvuorceds...@redhat.com
> 
> Christoph, any particular reason you did not pick up the tags from the
> last posting?

I thought I did, but apparently I've missed some.


[PATCH 27/29] dax: fix up some of the block device related ifdefs

2021-11-09 Thread Christoph Hellwig
The DAX device <-> block device association is only enabled if
CONFIG_BLOCK is enabled.  Update dax.h to account for that and use
the right conditions for the fs_put_dax stub as well.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dax.h | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 90f95deff504d..5568d3dca941b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -108,28 +108,15 @@ static inline bool daxdev_mapping_supported(struct 
vm_area_struct *vma,
 #endif
 
 struct writeback_control;
-#if IS_ENABLED(CONFIG_FS_DAX)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
 void dax_remove_host(struct gendisk *disk);
-
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
+   u64 *start_off);
 static inline void fs_put_dax(struct dax_device *dax_dev)
 {
put_dax(dax_dev);
 }
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
-   u64 *start_off);
-int dax_writeback_mapping_range(struct address_space *mapping,
-   struct dax_device *dax_dev, struct writeback_control *wbc);
-
-struct page *dax_layout_busy_page(struct address_space *mapping);
-struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
-dax_entry_t dax_lock_page(struct page *page);
-void dax_unlock_page(struct page *page, dax_entry_t cookie);
-int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
-   const struct iomap_ops *ops);
-int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-   const struct iomap_ops *ops);
 #else
 static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
 {
@@ -138,17 +125,29 @@ static inline int dax_add_host(struct dax_device 
*dax_dev, struct gendisk *disk)
 static inline void dax_remove_host(struct gendisk *disk)
 {
 }
-
-static inline void fs_put_dax(struct dax_device *dax_dev)
-{
-}
-
 static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
u64 *start_off)
 {
return NULL;
 }
+static inline void fs_put_dax(struct dax_device *dax_dev)
+{
+}
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
+
+#if IS_ENABLED(CONFIG_FS_DAX)
+int dax_writeback_mapping_range(struct address_space *mapping,
+   struct dax_device *dax_dev, struct writeback_control *wbc);
 
+struct page *dax_layout_busy_page(struct address_space *mapping);
+struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
+dax_entry_t dax_lock_page(struct page *page);
+void dax_unlock_page(struct page *page, dax_entry_t cookie);
+int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
+   const struct iomap_ops *ops);
+int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
+   const struct iomap_ops *ops);
+#else
 static inline struct page *dax_layout_busy_page(struct address_space *mapping)
 {
return NULL;
-- 
2.30.2
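
The header layout in this patch follows the usual declaration-or-stub pattern. A minimal sketch of that pattern, under stated assumptions: EXAMPLE_BLOCK and EXAMPLE_FS_DAX are hypothetical stand-ins for CONFIG_BLOCK and CONFIG_FS_DAX, and every example_* name is invented for illustration rather than taken from dax.h.

```c
#include <stddef.h>

struct dax_device;      /* opaque in this sketch */
struct block_device;    /* opaque in this sketch */

/*
 * EXAMPLE_BLOCK and EXAMPLE_FS_DAX are hypothetical stand-ins for
 * CONFIG_BLOCK and CONFIG_FS_DAX; leave them undefined to get the stubs.
 */
#if defined(EXAMPLE_BLOCK) && defined(EXAMPLE_FS_DAX)
struct dax_device *example_dax_get_by_bdev(struct block_device *bdev,
                                           unsigned long long *start_off);
void example_put_dax(struct dax_device *dax_dev);
#else
static inline struct dax_device *
example_dax_get_by_bdev(struct block_device *bdev,
                        unsigned long long *start_off)
{
    (void)bdev;
    (void)start_off;
    return NULL;        /* no block layer, so no DAX device can be found */
}

static inline void example_put_dax(struct dax_device *dax_dev)
{
    (void)dax_dev;      /* nothing was taken, so nothing to release */
}
#endif

/* A caller compiles unchanged in both configurations. */
static int example_mount_uses_dax(struct block_device *bdev)
{
    unsigned long long start_off = 0;
    struct dax_device *dax_dev = example_dax_get_by_bdev(bdev, &start_off);
    int has_dax = dax_dev != NULL;

    example_put_dax(dax_dev);
    return has_dax;
}
```

With both macros undefined the caller sees the stubs and reports that DAX is unavailable, without any ifdef at the call site.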



[PATCH 28/29] iomap: build the block based code conditionally

2021-11-09 Thread Christoph Hellwig
Only build the block based iomap code if CONFIG_BLOCK is set.  Currently
that is always the case, but it will change soon.

Signed-off-by: Christoph Hellwig 
---
 fs/Kconfig| 4 ++--
 fs/iomap/Makefile | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index a6313a969bc5f..6d608330a096e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -15,11 +15,11 @@ config VALIDATE_FS_PARSER
  Enable this to perform validation of the parameter description for a
  filesystem when it is registered.
 
-if BLOCK
-
 config FS_IOMAP
bool
 
+if BLOCK
+
 source "fs/ext2/Kconfig"
 source "fs/ext4/Kconfig"
 source "fs/jbd2/Kconfig"
diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index 4143a3ff89dbc..fc070184b7faa 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -9,9 +9,9 @@ ccflags-y += -I $(srctree)/$(src)   # needed for 
trace events
 obj-$(CONFIG_FS_IOMAP) += iomap.o
 
 iomap-y+= trace.o \
-  buffered-io.o \
+  iter.o
+iomap-$(CONFIG_BLOCK)  += buffered-io.o \
   direct-io.o \
   fiemap.o \
-  iter.o \
   seek.o
 iomap-$(CONFIG_SWAP)   += swapfile.o
-- 
2.30.2



[PATCH 29/29] fsdax: don't require CONFIG_BLOCK

2021-11-09 Thread Christoph Hellwig
The file system DAX code no longer requires the block code.  So allow
building a kernel with fuse DAX but without the block layer.

Signed-off-by: Christoph Hellwig 
---
 fs/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 6d608330a096e..7a2b11c0b8036 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -42,6 +42,8 @@ source "fs/nilfs2/Kconfig"
 source "fs/f2fs/Kconfig"
 source "fs/zonefs/Kconfig"
 
+endif # BLOCK
+
 config FS_DAX
bool "File system based Direct Access (DAX) support"
depends on MMU
@@ -89,8 +91,6 @@ config FS_DAX_PMD
 config FS_DAX_LIMITED
bool
 
-endif # BLOCK
-
 # Posix ACL utility routines
 #
 # Note: Posix ACLs can be implemented without these helpers.  Never use
-- 
2.30.2



[PATCH 26/29] fsdax: shift partition offset handling into the file systems

2021-11-09 Thread Christoph Hellwig
Remove the last user of ->bdev in dax.c by requiring the file system to
pass in an address that already includes the DAX offset.  As part of that,
only set ->bdev or ->dax_dev when actually required in the ->iomap_begin
methods.
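
The division of labour described above can be sketched in userspace as follows. This is an illustration only, not the fs/dax.c or file system code: struct example_iomap, example_fill_iomap() and example_iomap_pgoff() are hypothetical names, and a 4096-byte page is assumed. The file system side folds the partition offset into the address for DAX mappings, so the DAX side can compute the page offset without looking at a block device.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT 12   /* assumed 4096-byte pages */
#define EXAMPLE_PAGE_SIZE  (1ULL << EXAMPLE_PAGE_SHIFT)
#define EXAMPLE_PAGE_MASK  (~(EXAMPLE_PAGE_SIZE - 1))

/* Cut-down stand-in for struct iomap (hypothetical). */
struct example_iomap {
    uint64_t addr;      /* byte address on the device */
    uint64_t offset;    /* file offset where the mapping starts */
};

/* File system side: add the partition offset only for DAX mappings. */
static void example_fill_iomap(struct example_iomap *iomap, uint64_t blkno,
                               unsigned int blkbits, uint64_t file_off,
                               uint64_t dax_part_off, int is_dax)
{
    iomap->offset = file_off;
    iomap->addr = blkno << blkbits;
    if (is_dax)
        iomap->addr += dax_part_off;
}

/* DAX side: page offset derived from the mapping alone. */
static uint64_t example_iomap_pgoff(const struct example_iomap *iomap,
                                    uint64_t pos)
{
    return (iomap->addr + (pos & EXAMPLE_PAGE_MASK) - iomap->offset)
            >> EXAMPLE_PAGE_SHIFT;
}
```

For block 10 with 4096-byte blocks on a partition that starts 1 MiB into the device, a DAX access at file position 5000 lands in device page 267, while the same mapping without the partition offset would give page 11.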

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c |  6 +-
 fs/erofs/data.c  | 11 --
 fs/erofs/internal.h  |  1 +
 fs/ext2/inode.c  |  8 +--
 fs/ext4/inode.c  | 16 +-
 fs/xfs/libxfs/xfs_bmap.c |  4 ++--
 fs/xfs/xfs_aops.c|  2 +-
 fs/xfs/xfs_iomap.c   | 45 +---
 fs/xfs/xfs_iomap.h   |  5 +++--
 fs/xfs/xfs_pnfs.c|  2 +-
 10 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 0bd6cdcbacfc4..2c13c681edf09 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -711,11 +711,7 @@ int dax_invalidate_mapping_entry_sync(struct address_space 
*mapping,
 
 static pgoff_t dax_iomap_pgoff(const struct iomap *iomap, loff_t pos)
 {
-   phys_addr_t paddr = iomap->addr + (pos & PAGE_MASK) - iomap->offset;
-
-   if (iomap->bdev)
-   paddr += (get_start_sect(iomap->bdev) << SECTOR_SHIFT);
-   return PHYS_PFN(paddr);
+   return PHYS_PFN(iomap->addr + (pos & PAGE_MASK) - iomap->offset);
 }
 
 static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter *iter)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 0e35ef3f9f3d7..9b1bb177ce303 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -159,6 +159,7 @@ int erofs_map_dev(struct super_block *sb, struct 
erofs_map_dev *map)
/* primary device by default */
map->m_bdev = sb->s_bdev;
map->m_daxdev = EROFS_SB(sb)->dax_dev;
+   map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
 
if (map->m_deviceid) {
down_read(&devs->rwsem);
@@ -169,6 +170,7 @@ int erofs_map_dev(struct super_block *sb, struct 
erofs_map_dev *map)
}
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
+   map->m_dax_part_off = dif->dax_part_off;
up_read(&devs->rwsem);
} else if (devs->extra_devices) {
down_read(&devs->rwsem);
@@ -185,6 +187,7 @@ int erofs_map_dev(struct super_block *sb, struct 
erofs_map_dev *map)
map->m_pa -= startoff;
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
+   map->m_dax_part_off = dif->dax_part_off;
break;
}
}
@@ -215,9 +218,13 @@ static int erofs_iomap_begin(struct inode *inode, loff_t 
offset, loff_t length,
if (ret)
return ret;
 
-   iomap->bdev = mdev.m_bdev;
-   iomap->dax_dev = mdev.m_daxdev;
iomap->offset = map.m_la;
+   if (flags & IOMAP_DAX) {
+   iomap->dax_dev = mdev.m_daxdev;
+   iomap->offset += mdev.m_dax_part_off;
+   } else {
+   iomap->bdev = mdev.m_bdev;
+   }
iomap->length = map.m_llen;
iomap->flags = 0;
iomap->private = NULL;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index c1e65346e9f15..5c2a83876220c 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -438,6 +438,7 @@ static inline int z_erofs_map_blocks_iter(struct inode 
*inode,
 struct erofs_map_dev {
struct block_device *m_bdev;
struct dax_device *m_daxdev;
+   u64 m_dax_part_off;
 
erofs_off_t m_pa;
unsigned int m_deviceid;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index ae9993018a015..da4c301b43051 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -816,9 +816,11 @@ static int ext2_iomap_begin(struct inode *inode, loff_t 
offset, loff_t length,
return ret;
 
iomap->flags = 0;
-   iomap->bdev = inode->i_sb->s_bdev;
iomap->offset = (u64)first_block << blkbits;
-   iomap->dax_dev = sbi->s_daxdev;
+   if (flags & IOMAP_DAX)
+   iomap->dax_dev = sbi->s_daxdev;
+   else
+   iomap->bdev = inode->i_sb->s_bdev;
 
if (ret == 0) {
iomap->type = IOMAP_HOLE;
@@ -827,6 +829,8 @@ static int ext2_iomap_begin(struct inode *inode, loff_t 
offset, loff_t length,
} else {
iomap->type = IOMAP_MAPPED;
iomap->addr = (u64)bno << blkbits;
+   if (flags & IOMAP_DAX)
+   iomap->addr += sbi->s_dax_part_off;
iomap->length = (u64)ret << blkbits;
iomap->flags |= IOMAP_F_MERGED;
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8c443b753b815..6cbecd7ff9383 100644
--- a/fs/ext4/in

[PATCH 25/29] dax: return the partition offset from fs_dax_get_by_bdev

2021-11-09 Thread Christoph Hellwig
Prepare for removing the block_device from the DAX I/O path by returning
the partition offset from fs_dax_get_by_bdev so that the file systems
have it at hand for use during I/O.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 9 ++---
 drivers/md/dm.c | 4 ++--
 fs/erofs/internal.h | 2 ++
 fs/erofs/super.c| 4 ++--
 fs/ext2/ext2.h  | 1 +
 fs/ext2/super.c | 2 +-
 fs/ext4/ext4.h  | 1 +
 fs/ext4/super.c | 2 +-
 fs/xfs/xfs_buf.c| 2 +-
 fs/xfs/xfs_buf.h| 1 +
 include/linux/dax.h | 6 --
 11 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index c0910687fbcb2..cc32dcf71c116 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -70,17 +70,20 @@ EXPORT_SYMBOL_GPL(dax_remove_host);
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @bdev: block device to find a dax_device for
+ * @start_off: returns the byte offset into the dax_device that @bdev starts
  */
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off)
 {
struct dax_device *dax_dev;
+   u64 part_size;
int id;
 
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
-   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
-   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+   *start_off = get_start_sect(bdev) * SECTOR_SIZE;
+   part_size = bdev_nr_sectors(bdev) * SECTOR_SIZE;
+   if (*start_off % PAGE_SIZE || part_size % PAGE_SIZE) {
pr_info("%pg: error: unaligned partition for dax\n", bdev);
return NULL;
}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 282008afc465f..5ea6115d19bdc 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -637,7 +637,7 @@ static int open_table_device(struct table_device *td, dev_t 
dev,
 struct mapped_device *md)
 {
struct block_device *bdev;
-
+   u64 part_off;
int r;
 
BUG_ON(td->dm_dev.bdev);
@@ -653,7 +653,7 @@ static int open_table_device(struct table_device *td, dev_t 
dev,
}
 
td->dm_dev.bdev = bdev;
-   td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev);
+   td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev, &part_off);
return 0;
 }
 
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 3265688af7f9f..c1e65346e9f15 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -51,6 +51,7 @@ struct erofs_device_info {
char *path;
struct block_device *bdev;
struct dax_device *dax_dev;
+   u64 dax_part_off;
 
u32 blocks;
u32 mapped_blkaddr;
@@ -109,6 +110,7 @@ struct erofs_sb_info {
 #endif /* CONFIG_EROFS_FS_ZIP */
struct erofs_dev_context *devs;
struct dax_device *dax_dev;
+   u64 dax_part_off;
u64 total_blocks;
u32 primarydevice_blocks;
 
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 0aed886473c8d..71efce16024d9 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -312,7 +312,7 @@ static int erofs_init_devices(struct super_block *sb,
goto err_out;
}
dif->bdev = bdev;
-   dif->dax_dev = fs_dax_get_by_bdev(bdev);
+   dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
dif->blocks = le32_to_cpu(dis->blocks);
dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr);
sbi->total_blocks += dif->blocks;
@@ -644,7 +644,7 @@ static int erofs_fc_fill_super(struct super_block *sb, 
struct fs_context *fc)
 
sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
-   sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
+   sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
sbi->devs = ctx->devs;
ctx->devs = NULL;
 
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 3be9dd6412b78..d4f306aa5aceb 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -118,6 +118,7 @@ struct ext2_sb_info {
spinlock_t s_lock;
struct mb_cache *s_ea_block_cache;
struct dax_device *s_daxdev;
+   u64 s_dax_part_off;
 };
 
 static inline spinlock_t *
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 7e23482862e69..94f1fbd7d3ac2 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -831,7 +831,7 @@ static int ext2_fill_super(struct super_block *sb, void 
*data, int silent)
}
sb->s_fs_info = sbi;
sbi->s_sb_block = sb_block;
-   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->s_dax_part_off);
 
spin_lock_init(&sbi->s_lock);
ret = -EINVAL;
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3825195539d74..6f01994a1d52f 100644
--- a/fs/ext4/ext

[PATCH 23/29] xfs: use IOMAP_DAX to check for DAX mappings

2021-11-09 Thread Christoph Hellwig
Use the explicit DAX flag instead of checking the inode flag in the
iomap code.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_iomap.c | 7 ---
 fs/xfs/xfs_iomap.h | 3 ++-
 fs/xfs/xfs_pnfs.c  | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 604000b6243ec..8cef3b68cba78 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -188,6 +188,7 @@ xfs_iomap_write_direct(
struct xfs_inode*ip,
xfs_fileoff_t   offset_fsb,
xfs_fileoff_t   count_fsb,
+   unsigned intflags,
struct xfs_bmbt_irec*imap)
 {
struct xfs_mount*mp = ip->i_mount;
@@ -229,7 +230,7 @@ xfs_iomap_write_direct(
 * the reserve block pool for bmbt block allocation if there is no space
 * left but we need to do unwritten extent conversion.
 */
-   if (IS_DAX(VFS_I(ip))) {
+   if (flags & IOMAP_DAX) {
bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
if (imap->br_state == XFS_EXT_UNWRITTEN) {
force = true;
@@ -620,7 +621,7 @@ imap_needs_alloc(
imap->br_startblock == DELAYSTARTBLOCK)
return true;
/* we convert unwritten extents before copying the data for DAX */
-   if (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN)
+   if ((flags & IOMAP_DAX) && imap->br_state == XFS_EXT_UNWRITTEN)
return true;
return false;
 }
@@ -826,7 +827,7 @@ xfs_direct_write_iomap_begin(
xfs_iunlock(ip, lockmode);
 
error = xfs_iomap_write_direct(ip, offset_fsb, end_fsb - offset_fsb,
-   &imap);
+   flags, &imap);
if (error)
return error;
 
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index f1a281ab9328c..5648262a71736 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -12,7 +12,8 @@ struct xfs_inode;
 struct xfs_bmbt_irec;
 
 int xfs_iomap_write_direct(struct xfs_inode *ip, xfs_fileoff_t offset_fsb,
-   xfs_fileoff_t count_fsb, struct xfs_bmbt_irec *imap);
+   xfs_fileoff_t count_fsb, unsigned int flags,
+   struct xfs_bmbt_irec *imap);
 int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool);
 xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode *ip,
xfs_fileoff_t end_fsb);
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index 5e1d29d8b2e73..e188e1cf97cc5 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -155,7 +155,7 @@ xfs_fs_map_blocks(
xfs_iunlock(ip, lock_flags);
 
error = xfs_iomap_write_direct(ip, offset_fsb,
-   end_fsb - offset_fsb, &imap);
+   end_fsb - offset_fsb, 0, &imap);
if (error)
goto out_unlock;
 
-- 
2.30.2



[PATCH 24/29] xfs: use xfs_direct_write_iomap_ops for DAX zeroing

2021-11-09 Thread Christoph Hellwig
While the buffered write iomap ops do work due to the fact that zeroing
never allocates blocks, the DAX zeroing should use the direct ops just
like actual DAX I/O.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_iomap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 8cef3b68cba78..704292c6ce0c7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1324,7 +1324,7 @@ xfs_zero_range(
 
if (IS_DAX(inode))
return dax_zero_range(inode, pos, len, did_zero,
- &xfs_buffered_write_iomap_ops);
+ &xfs_direct_write_iomap_ops);
return iomap_zero_range(inode, pos, len, did_zero,
&xfs_buffered_write_iomap_ops);
 }
@@ -1339,7 +1339,7 @@ xfs_truncate_page(
 
if (IS_DAX(inode))
return dax_truncate_page(inode, pos, did_zero,
-   &xfs_buffered_write_iomap_ops);
+   &xfs_direct_write_iomap_ops);
return iomap_truncate_page(inode, pos, did_zero,
   &xfs_buffered_write_iomap_ops);
 }
-- 
2.30.2



[PATCH 22/29] iomap: add a IOMAP_DAX flag

2021-11-09 Thread Christoph Hellwig
Add a flag so that the file system can easily detect DAX operations.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c  | 7 ---
 include/linux/iomap.h | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5b52b878124ac..0bd6cdcbacfc4 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1180,7 +1180,7 @@ int dax_zero_range(struct inode *inode, loff_t pos, 
loff_t len, bool *did_zero,
.inode  = inode,
.pos= pos,
.len= len,
-   .flags  = IOMAP_ZERO,
+   .flags  = IOMAP_DAX | IOMAP_ZERO,
};
int ret;
 
@@ -1308,6 +1308,7 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
.inode  = iocb->ki_filp->f_mapping->host,
.pos= iocb->ki_pos,
.len= iov_iter_count(iter),
+   .flags  = IOMAP_DAX,
};
loff_t done = 0;
int ret;
@@ -1461,7 +1462,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault 
*vmf, pfn_t *pfnp,
.inode  = mapping->host,
.pos= (loff_t)vmf->pgoff << PAGE_SHIFT,
.len= PAGE_SIZE,
-   .flags  = IOMAP_FAULT,
+   .flags  = IOMAP_DAX | IOMAP_FAULT,
};
vm_fault_t ret = 0;
void *entry;
@@ -1570,7 +1571,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault 
*vmf, pfn_t *pfnp,
struct iomap_iter iter = {
.inode  = mapping->host,
.len= PMD_SIZE,
-   .flags  = IOMAP_FAULT,
+   .flags  = IOMAP_DAX | IOMAP_FAULT,
};
vm_fault_t ret = VM_FAULT_FALLBACK;
pgoff_t max_pgoff;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6d1b08d0ae930..146a7e3e3ea11 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -141,6 +141,7 @@ struct iomap_page_ops {
 #define IOMAP_NOWAIT   (1 << 5) /* do not block */
 #define IOMAP_OVERWRITE_ONLY   (1 << 6) /* only pure overwrites allowed */
 #define IOMAP_UNSHARE  (1 << 7) /* unshare_file_range */
+#define IOMAP_DAX  (1 << 8) /* DAX mapping */
 
 struct iomap_ops {
/*
-- 
2.30.2
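
To illustrate how a file system might act on such a flag, here is a small hedged sketch. Only the bit position of the DAX flag, (1 << 8), comes from the patch above; EX_IOMAP_FAULT is given an illustrative value, and struct ex_sb, struct ex_iomap and ex_iomap_begin() are hypothetical stand-ins, not kernel interfaces.

```c
#include <stddef.h>

#define EX_IOMAP_FAULT  (1u << 3)   /* illustrative value for this sketch */
#define EX_IOMAP_DAX    (1u << 8)   /* same bit as IOMAP_DAX in the patch */

/* Hypothetical per-superblock device handles. */
struct ex_sb {
    const void *bdev;
    const void *dax_dev;
};

/* Hypothetical cut-down mapping result. */
struct ex_iomap {
    const void *bdev;
    const void *dax_dev;
};

/*
 * Decide from the operation flags, not from inode state, which device
 * handle the mapping should carry.
 */
static void ex_iomap_begin(const struct ex_sb *sb, unsigned int flags,
                           struct ex_iomap *iomap)
{
    iomap->bdev = NULL;
    iomap->dax_dev = NULL;

    if (flags & EX_IOMAP_DAX)
        iomap->dax_dev = sb->dax_dev;
    else
        iomap->bdev = sb->bdev;
}
```

A DAX page fault would pass EX_IOMAP_DAX | EX_IOMAP_FAULT and receive only the DAX device, while any other caller receives only the block device.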



[PATCH 20/29] ext4: cleanup the dax handling in ext4_fill_super

2021-11-09 Thread Christoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove
the need for the dax_dev local variable.

Signed-off-by: Christoph Hellwig 
---
 fs/ext4/super.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index eb4df43abd76e..b60401bb1c310 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3879,7 +3879,6 @@ static void ext4_setup_csum_trigger(struct super_block 
*sb,
 
 static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 {
-   struct dax_device *dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
char *orig_data = kstrdup(data, GFP_KERNEL);
struct buffer_head *bh, **group_desc;
struct ext4_super_block *es = NULL;
@@ -3910,12 +3909,12 @@ static int ext4_fill_super(struct super_block *sb, void 
*data, int silent)
if ((data && !orig_data) || !sbi)
goto out_free_base;
 
-   sbi->s_daxdev = dax_dev;
sbi->s_blockgroup_lock =
kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
if (!sbi->s_blockgroup_lock)
goto out_free_base;
 
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
sb->s_fs_info = sbi;
sbi->s_sb = sb;
sbi->s_inode_readahead_blks = EXT4_DEF_INODE_READAHEAD_BLKS;
@@ -4300,7 +4299,7 @@ static int ext4_fill_super(struct super_block *sb, void 
*data, int silent)
goto failed_mount;
}
 
-   if (dax_dev) {
+   if (sbi->s_daxdev) {
if (blocksize == PAGE_SIZE)
set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
else
@@ -5096,10 +5095,10 @@ static int ext4_fill_super(struct super_block *sb, void 
*data, int silent)
 out_fail:
sb->s_fs_info = NULL;
kfree(sbi->s_blockgroup_lock);
+   fs_put_dax(sbi->s_daxdev);
 out_free_base:
kfree(sbi);
kfree(orig_data);
-   fs_put_dax(dax_dev);
return err ? err : ret;
 }
 
-- 
2.30.2



[PATCH 21/29] xfs: move dax device handling into xfs_{alloc, free}_buftarg

2021-11-09 Thread Christoph Hellwig
Hide the DAX device lookup from the xfs_super.c code.

Reviewed-by: Christoph Hellwig 
---
 fs/xfs/xfs_buf.c   |  8 
 fs/xfs/xfs_buf.h   |  4 ++--
 fs/xfs/xfs_super.c | 26 +-
 3 files changed, 11 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 631c5a61d89b7..4d4553ffa7050 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1892,6 +1892,7 @@ xfs_free_buftarg(
list_lru_destroy(&btp->bt_lru);
 
blkdev_issue_flush(btp->bt_bdev);
+   fs_put_dax(btp->bt_daxdev);
 
kmem_free(btp);
 }
@@ -1932,11 +1933,10 @@ xfs_setsize_buftarg_early(
return xfs_setsize_buftarg(btp, bdev_logical_block_size(bdev));
 }
 
-xfs_buftarg_t *
+struct xfs_buftarg *
 xfs_alloc_buftarg(
struct xfs_mount*mp,
-   struct block_device *bdev,
-   struct dax_device   *dax_dev)
+   struct block_device *bdev)
 {
xfs_buftarg_t   *btp;
 
@@ -1945,7 +1945,7 @@ xfs_alloc_buftarg(
btp->bt_mount = mp;
btp->bt_dev =  bdev->bd_dev;
btp->bt_bdev = bdev;
-   btp->bt_daxdev = dax_dev;
+   btp->bt_daxdev = fs_dax_get_by_bdev(bdev);
 
/*
 * Buffer IO error rate limiting. Limit it to no more than 10 messages
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 6b0200b8007d1..bd7f709f0d232 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -338,8 +338,8 @@ xfs_buf_update_cksum(struct xfs_buf *bp, unsigned long 
cksum_offset)
 /*
  * Handling of buftargs.
  */
-extern struct xfs_buftarg *xfs_alloc_buftarg(struct xfs_mount *,
-   struct block_device *, struct dax_device *);
+struct xfs_buftarg *xfs_alloc_buftarg(struct xfs_mount *mp,
+   struct block_device *bdev);
 extern void xfs_free_buftarg(struct xfs_buftarg *);
 extern void xfs_buftarg_wait(struct xfs_buftarg *);
 extern void xfs_buftarg_drain(struct xfs_buftarg *);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 3a45d5caa28d5..7262716afb215 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -391,26 +391,19 @@ STATIC void
 xfs_close_devices(
struct xfs_mount*mp)
 {
-   struct dax_device *dax_ddev = mp->m_ddev_targp->bt_daxdev;
-
if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
struct block_device *logdev = mp->m_logdev_targp->bt_bdev;
-   struct dax_device *dax_logdev = mp->m_logdev_targp->bt_daxdev;
 
xfs_free_buftarg(mp->m_logdev_targp);
xfs_blkdev_put(logdev);
-   fs_put_dax(dax_logdev);
}
if (mp->m_rtdev_targp) {
struct block_device *rtdev = mp->m_rtdev_targp->bt_bdev;
-   struct dax_device *dax_rtdev = mp->m_rtdev_targp->bt_daxdev;
 
xfs_free_buftarg(mp->m_rtdev_targp);
xfs_blkdev_put(rtdev);
-   fs_put_dax(dax_rtdev);
}
xfs_free_buftarg(mp->m_ddev_targp);
-   fs_put_dax(dax_ddev);
 }
 
 /*
@@ -428,8 +421,6 @@ xfs_open_devices(
struct xfs_mount*mp)
 {
struct block_device *ddev = mp->m_super->s_bdev;
-   struct dax_device   *dax_ddev = fs_dax_get_by_bdev(ddev);
-   struct dax_device   *dax_logdev = NULL, *dax_rtdev = NULL;
struct block_device *logdev = NULL, *rtdev = NULL;
int error;
 
@@ -439,8 +430,7 @@ xfs_open_devices(
if (mp->m_logname) {
error = xfs_blkdev_get(mp, mp->m_logname, &logdev);
if (error)
-   goto out;
-   dax_logdev = fs_dax_get_by_bdev(logdev);
+   return error;
}
 
if (mp->m_rtname) {
@@ -454,25 +444,24 @@ xfs_open_devices(
error = -EINVAL;
goto out_close_rtdev;
}
-   dax_rtdev = fs_dax_get_by_bdev(rtdev);
}
 
/*
 * Setup xfs_mount buffer target pointers
 */
error = -ENOMEM;
-   mp->m_ddev_targp = xfs_alloc_buftarg(mp, ddev, dax_ddev);
+   mp->m_ddev_targp = xfs_alloc_buftarg(mp, ddev);
if (!mp->m_ddev_targp)
goto out_close_rtdev;
 
if (rtdev) {
-   mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev, dax_rtdev);
+   mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev);
if (!mp->m_rtdev_targp)
goto out_free_ddev_targ;
}
 
if (logdev && logdev != ddev) {
-   mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev, dax_logdev);
+   mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev);
if (!mp->m_logdev_targp)
goto out_free_rtdev_targ;
} else {
@@ -488,14 +477,9 @@ xfs_open_devices(
   

[PATCH 19/29] ext2: cleanup the dax handling in ext2_fill_super

2021-11-09 Thread Christoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove
the need for the dax_dev local variable.

Signed-off-by: Christoph Hellwig 
---
 fs/ext2/super.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index a964066a80aa7..7e23482862e69 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -802,7 +802,6 @@ static unsigned long descriptor_loc(struct super_block *sb,
 
 static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 {
-   struct dax_device *dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
struct buffer_head * bh;
struct ext2_sb_info * sbi;
struct ext2_super_block * es;
@@ -822,17 +821,17 @@ static int ext2_fill_super(struct super_block *sb, void 
*data, int silent)
 
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
if (!sbi)
-   goto failed;
+   return -ENOMEM;
 
sbi->s_blockgroup_lock =
kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
if (!sbi->s_blockgroup_lock) {
kfree(sbi);
-   goto failed;
+   return -ENOMEM;
}
sb->s_fs_info = sbi;
sbi->s_sb_block = sb_block;
-   sbi->s_daxdev = dax_dev;
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
 
spin_lock_init(&sbi->s_lock);
ret = -EINVAL;
@@ -946,7 +945,7 @@ static int ext2_fill_super(struct super_block *sb, void 
*data, int silent)
blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
 
if (test_opt(sb, DAX)) {
-   if (!dax_dev) {
+   if (!sbi->s_daxdev) {
ext2_msg(sb, KERN_ERR,
"DAX unsupported by block device. Turning off DAX.");
clear_opt(sbi->s_mount_opt, DAX);
@@ -1201,11 +1200,10 @@ static int ext2_fill_super(struct super_block *sb, void 
*data, int silent)
 failed_mount:
brelse(bh);
 failed_sbi:
+   fs_put_dax(sbi->s_daxdev);
sb->s_fs_info = NULL;
kfree(sbi->s_blockgroup_lock);
kfree(sbi);
-failed:
-   fs_put_dax(dax_dev);
return ret;
 }
 
-- 
2.30.2



[PATCH 17/29] fsdax: factor out a dax_memzero helper

2021-11-09 Thread Christoph Hellwig
Factor out a helper for the "manual" zeroing of a DAX range to clean
up dax_iomap_zero a lot.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index d7a923d152240..dc9ebeff850ab 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1121,34 +1121,36 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state 
*xas, struct vm_fault *vmf,
 }
 #endif /* CONFIG_FS_DAX_PMD */
 
+static int dax_memzero(struct dax_device *dax_dev, pgoff_t pgoff,
+   unsigned int offset, size_t size)
+{
+   void *kaddr;
+   long rc;
+
+   rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+   if (rc >= 0) {
+   memset(kaddr + offset, 0, size);
+   dax_flush(dax_dev, kaddr + offset, size);
+   }
+   return rc;
+}
+
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
-   void *kaddr;
-   bool page_aligned = false;
unsigned offset = offset_in_page(pos);
unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
-   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
-   page_aligned = true;
-
id = dax_read_lock();
-
-   if (page_aligned)
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
else
-   rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
-   if (rc < 0) {
-   dax_read_unlock(id);
-   return rc;
-   }
-
-   if (!page_aligned) {
-   memset(kaddr + offset, 0, size);
-   dax_flush(iomap->dax_dev, kaddr + offset, size);
-   }
+   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
dax_read_unlock(id);
+
+   if (rc < 0)
+   return rc;
return size;
 }
 
-- 
2.30.2
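
The split between the whole-page and partial-page paths can be checked with a
small userspace model.  This is only a sketch, not kernel code: it assumes
4 KiB pages, and the model_* names are made up for illustration.  It mirrors
the offset/size arithmetic in dax_iomap_zero and the condition that selects
dax_zero_page_range over dax_memzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Byte offset of pos within its page (models offset_in_page()). */
static unsigned model_offset_in_page(uint64_t pos)
{
	return (unsigned)(pos & (MODEL_PAGE_SIZE - 1));
}

/* Bytes handled by one call: never crosses the end of the current page. */
static unsigned model_zero_size(uint64_t pos, uint64_t length)
{
	uint64_t room = MODEL_PAGE_SIZE - model_offset_in_page(pos);

	return (unsigned)(length < room ? length : room);
}

/* True when the whole page is covered, i.e. the dax_zero_page_range path. */
static bool model_whole_page(uint64_t pos, uint64_t length)
{
	return model_offset_in_page(pos) == 0 &&
	       model_zero_size(pos, length) == MODEL_PAGE_SIZE;
}

/* Worked examples; call from main() to check the arithmetic. */
void zero_size_examples(void)
{
	/* Page-aligned start with at least a page to go: whole-page path. */
	assert(model_zero_size(4096, 8192) == 4096);
	assert(model_whole_page(4096, 8192));

	/* Start 4000 bytes into a page: only 96 bytes remain in that page. */
	assert(model_zero_size(4000, 1000) == 96);
	assert(!model_whole_page(4000, 1000));

	/* Aligned start but short length: still the memset path. */
	assert(!model_whole_page(0, 100));
}
```

A range that starts mid-page or ends before the page boundary takes the
dax_memzero path, and a single call never zeroes past the end of the current
page.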



[PATCH 18/29] fsdax: decouple zeroing from the iomap buffered I/O code

2021-11-09 Thread Christoph Hellwig
Unshare the DAX and iomap buffered I/O page zeroing code.  This code
previously did an IS_DAX check deep inside the iomap code, which in
fact was the only DAX check in the code.  Instead move these checks
into the callers.  Most callers already have DAX special casing anyway
and XFS will need it for reflink support as well.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c   | 77 ++
 fs/ext2/inode.c|  6 ++--
 fs/ext4/inode.c|  4 +--
 fs/iomap/buffered-io.c | 35 +++
 fs/xfs/xfs_iomap.c |  6 
 include/linux/dax.h|  6 +++-
 6 files changed, 91 insertions(+), 43 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index dc9ebeff850ab..5b52b878124ac 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1135,24 +1135,73 @@ static int dax_memzero(struct dax_device *dax_dev, 
pgoff_t pgoff,
return rc;
 }
 
-s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
+static loff_t dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
 {
-   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
-   long rc, id;
-   unsigned offset = offset_in_page(pos);
-   unsigned size = min_t(u64, PAGE_SIZE - offset, length);
+   const struct iomap *iomap = &iter->iomap;
+   const struct iomap *srcmap = iomap_iter_srcmap(iter);
+   loff_t pos = iter->pos;
+   loff_t length = iomap_length(iter);
+   loff_t written = 0;
+
+   /* already zeroed?  we're done. */
+   if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+   return length;
+
+   do {
+   unsigned offset = offset_in_page(pos);
+   unsigned size = min_t(u64, PAGE_SIZE - offset, length);
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
+   long rc;
+   int id;
 
-   id = dax_read_lock();
-   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
-   rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
-   else
-   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
-   dax_read_unlock(id);
+   id = dax_read_lock();
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
+   rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
+   else
+   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
+   dax_read_unlock(id);
 
-   if (rc < 0)
-   return rc;
-   return size;
+   if (rc < 0)
+   return rc;
+   pos += size;
+   length -= size;
+   written += size;
+   if (did_zero)
+   *did_zero = true;
+   } while (length > 0);
+
+   return written;
+}
+
+int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
+   const struct iomap_ops *ops)
+{
+   struct iomap_iter iter = {
+   .inode  = inode,
+   .pos= pos,
+   .len= len,
+   .flags  = IOMAP_ZERO,
+   };
+   int ret;
+
+   while ((ret = iomap_iter(&iter, ops)) > 0)
+   iter.processed = dax_zero_iter(&iter, did_zero);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(dax_zero_range);
+
+int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
+   const struct iomap_ops *ops)
+{
+   unsigned int blocksize = i_blocksize(inode);
+   unsigned int off = pos & (blocksize - 1);
+
+   /* Block boundary? Nothing to do */
+   if (!off)
+   return 0;
+   return dax_zero_range(inode, pos, blocksize - off, did_zero, ops);
 }
+EXPORT_SYMBOL_GPL(dax_truncate_page);
 
 static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
struct iov_iter *iter)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 333fa62661d56..ae9993018a015 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1297,9 +1297,9 @@ static int ext2_setsize(struct inode *inode, loff_t 
newsize)
inode_dio_wait(inode);
 
if (IS_DAX(inode)) {
-   error = iomap_zero_range(inode, newsize,
-PAGE_ALIGN(newsize) - newsize, NULL,
-   &ext2_iomap_ops);
+   error = dax_zero_range(inode, newsize,
+  PAGE_ALIGN(newsize) - newsize, NULL,
+   &ext2_iomap_ops);
} else if (test_opt(inode->i_sb, NOBH))
error = nobh_truncate_page(inode->i_mapping,
newsize, ext2_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0f06305167d5a..8c443b753b815 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3783,8 +3783,8 @@ static int ext4_block_zero_page_range(handle_t *handle,
length = max;
 
if (IS_DAX(inode)) {
-  
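
The page-at-a-time walk in dax_zero_iter and the length computed by
dax_truncate_page can be modelled in userspace.  This is a sketch under
assumptions, not the kernel code: it assumes 4 KiB pages, a non-zero length
and a power-of-two block size, and the model_* names are hypothetical.  It
only reproduces the arithmetic; no memory is zeroed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Walk [pos, pos + length) one page at a time, as the do/while loop in
 * dax_zero_iter does.  Returns the bytes covered; *chunks counts iterations.
 * Assumes length > 0.
 */
static uint64_t model_zero_walk(uint64_t pos, uint64_t length, unsigned *chunks)
{
	uint64_t written = 0;

	*chunks = 0;
	do {
		unsigned offset = (unsigned)(pos & (MODEL_PAGE_SIZE - 1));
		uint64_t room = MODEL_PAGE_SIZE - offset;
		uint64_t size = length < room ? length : room;

		pos += size;
		length -= size;
		written += size;
		(*chunks)++;
	} while (length > 0);

	return written;
}

/* Length dax_truncate_page passes on: the tail of the block holding pos. */
static unsigned model_truncate_len(uint64_t pos, unsigned blocksize)
{
	unsigned off = (unsigned)(pos & (blocksize - 1));

	return off ? blocksize - off : 0;
}

/* Worked examples; call from main() to check the arithmetic. */
void zero_walk_examples(void)
{
	unsigned chunks;

	/* 100 bytes into page 0, 8192 bytes long: 3996 + 4096 + 100. */
	assert(model_zero_walk(100, 8192, &chunks) == 8192);
	assert(chunks == 3);

	/* pos 5000 in a 4096-byte block: off = 904, so 3192 bytes remain. */
	assert(model_truncate_len(5000, 4096) == 3192);

	/* On a block boundary there is nothing to do. */
	assert(model_truncate_len(8192, 4096) == 0);
}
```

Only the first and last chunks of a walk can be partial; every chunk in
between is a full page and would take the dax_zero_page_range path.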

[PATCH 14/29] fsdax: simplify the pgoff calculation

2021-11-09 Thread Christoph Hellwig
Replace the two steps of dax_iomap_sector and bdev_dax_pgoff with a
single dax_iomap_pgoff helper that avoids lots of cumbersome sector
conversions.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 14 --
 fs/dax.c| 35 ++-
 include/linux/dax.h |  1 -
 3 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 803942586d1b6..c0910687fbcb2 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -67,20 +67,6 @@ void dax_remove_host(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(dax_remove_host);
 
-int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
-   pgoff_t *pgoff)
-{
-   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
-   phys_addr_t phys_off = (start_sect + sector) * 512;
-
-   if (pgoff)
-   *pgoff = PHYS_PFN(phys_off);
-   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
-   return -EINVAL;
-   return 0;
-}
-EXPORT_SYMBOL(bdev_dax_pgoff);
-
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @bdev: block device to find a dax_device for
diff --git a/fs/dax.c b/fs/dax.c
index e51b4129d1b65..5364549d67a48 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,23 +709,22 @@ int dax_invalidate_mapping_entry_sync(struct 
address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
 }
 
-static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
+static pgoff_t dax_iomap_pgoff(const struct iomap *iomap, loff_t pos)
 {
-   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
+   phys_addr_t paddr = iomap->addr + (pos & PAGE_MASK) - iomap->offset;
+
+   if (iomap->bdev)
+   paddr += (get_start_sect(iomap->bdev) << SECTOR_SHIFT);
+   return PHYS_PFN(paddr);
 }
 
 static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter 
*iter)
 {
-   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
+   pgoff_t pgoff = dax_iomap_pgoff(&iter->iomap, iter->pos);
void *vto, *kaddr;
-   pgoff_t pgoff;
long rc;
int id;
 
-   rc = bdev_dax_pgoff(iter->iomap.bdev, sector, PAGE_SIZE, &pgoff);
-   if (rc)
-   return rc;
-
id = dax_read_lock();
rc = dax_direct_access(iter->iomap.dax_dev, pgoff, 1, &kaddr, NULL);
if (rc < 0) {
@@ -1013,14 +1012,10 @@ EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
 pfn_t *pfnp)
 {
-   const sector_t sector = dax_iomap_sector(iomap, pos);
-   pgoff_t pgoff;
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
int id, rc;
long length;
 
-   rc = bdev_dax_pgoff(iomap->bdev, sector, size, &pgoff);
-   if (rc)
-   return rc;
id = dax_read_lock();
length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
   NULL, pfnp);
@@ -1129,7 +1124,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, 
struct vm_fault *vmf,
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
-   pgoff_t pgoff;
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
void *kaddr;
bool page_aligned = false;
@@ -1140,10 +1135,6 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap 
*iomap)
(size == PAGE_SIZE))
page_aligned = true;
 
-   rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
-   if (rc)
-   return rc;
-
id = dax_read_lock();
 
if (page_aligned)
@@ -1169,7 +1160,6 @@ static loff_t dax_iomap_iter(const struct iomap_iter 
*iomi,
const struct iomap *iomap = &iomi->iomap;
loff_t length = iomap_length(iomi);
loff_t pos = iomi->pos;
-   struct block_device *bdev = iomap->bdev;
struct dax_device *dax_dev = iomap->dax_dev;
loff_t end = pos + length, done = 0;
ssize_t ret = 0;
@@ -1203,9 +1193,8 @@ static loff_t dax_iomap_iter(const struct iomap_iter 
*iomi,
while (pos < end) {
unsigned offset = pos & (PAGE_SIZE - 1);
const size_t size = ALIGN(length + offset, PAGE_SIZE);
-   const sector_t sector = dax_iomap_sector(iomap, pos);
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
ssize_t map_len;
-   pgoff_t pgoff;
void *kaddr;
 
if (fatal_signal_pending(current)) {
@@ -1213,10 +1202,6 @@ static loff_t dax_iomap_iter(const struct iomap_iter 
*iomi,
break;
}
 
-   ret = bdev_dax_pgoff(bdev, sector, size, &pgoff);
-   if (ret)
-  
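
The single-step calculation in dax_iomap_pgoff can be followed with concrete
numbers in a userspace model.  This is a sketch, not the kernel helper: it
assumes 4 KiB pages and 512-byte sectors, and struct model_iomap with its
start_sect field is a hypothetical stand-in for struct iomap plus the
get_start_sect() lookup on iomap->bdev.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT   12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE    (1ull << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK    (~(MODEL_PAGE_SIZE - 1))
#define MODEL_SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical stand-in for the fields the helper reads. */
struct model_iomap {
	uint64_t addr;		/* byte address of the extent on the device */
	uint64_t offset;	/* file offset the extent maps */
	uint64_t start_sect;	/* partition start, 0 for a whole device */
};

/* Page offset into the dax_device for file position pos. */
static uint64_t model_iomap_pgoff(const struct model_iomap *m, uint64_t pos)
{
	uint64_t paddr = m->addr + (pos & MODEL_PAGE_MASK) - m->offset;

	paddr += m->start_sect << MODEL_SECTOR_SHIFT;
	return paddr >> MODEL_PAGE_SHIFT;	/* what PHYS_PFN() does */
}

/* Worked examples; call from main() to check the arithmetic. */
void pgoff_examples(void)
{
	/* Extent at 1 MiB on a partition that starts at sector 2048 (1 MiB). */
	struct model_iomap m = {
		.addr = 1 << 20, .offset = 0, .start_sect = 2048,
	};

	/* 1 MiB + 1 MiB = 2 MiB, which is page 512. */
	assert(model_iomap_pgoff(&m, 0) == 512);

	/* pos 5000 rounds down to 4096, one page further on. */
	assert(model_iomap_pgoff(&m, 5000) == 513);
}
```

Everything stays in bytes until the final shift, so the round trip through
sectors that bdev_dax_pgoff required is not needed.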

[PATCH 15/29] xfs: add xfs_zero_range and xfs_truncate_page helpers

2021-11-09 Thread Christoph Hellwig
From: Shiyang Ruan 

Add helpers to prepare for using different DAX operations.

Signed-off-by: Shiyang Ruan 
[hch: split from a larger patch + slight cleanups]
Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_bmap_util.c |  7 +++
 fs/xfs/xfs_file.c  |  3 +--
 fs/xfs/xfs_iomap.c | 25 +
 fs/xfs/xfs_iomap.h |  4 
 fs/xfs/xfs_iops.c  |  7 +++
 fs/xfs/xfs_reflink.c   |  3 +--
 6 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 73a36b7be3bd1..797ea0c8b14e1 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1001,7 +1001,7 @@ xfs_free_file_space(
 
/*
 * Now that we've unmap all full blocks we'll have to zero out any
-* partial block at the beginning and/or end.  iomap_zero_range is smart
+* partial block at the beginning and/or end.  xfs_zero_range is smart
 * enough to skip any holes, including those we just created, but we
 * must take care not to zero beyond EOF and enlarge i_size.
 */
@@ -1009,15 +1009,14 @@ xfs_free_file_space(
return 0;
if (offset + len > XFS_ISIZE(ip))
len = XFS_ISIZE(ip) - offset;
-   error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
-   &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, offset, len, NULL);
if (error)
return error;
 
/*
 * If we zeroed right up to EOF and EOF straddles a page boundary we
 * must make sure that the post-EOF area is also zeroed because the
-* page could be mmap'd and iomap_zero_range doesn't do that for us.
+* page could be mmap'd and xfs_zero_range doesn't do that for us.
 * Writeback of the eof page will do this, albeit clumsily.
 */
if (offset + len >= XFS_ISIZE(ip) && offset_in_page(offset + len) > 0) {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 27594738b0d18..8d4c5ca261bd7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -437,8 +437,7 @@ xfs_file_write_checks(
}
 
trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
-   error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
-   NULL, &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
if (error)
return error;
} else
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 093758440ad53..d6d71ae9f2ae4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1311,3 +1311,28 @@ xfs_xattr_iomap_begin(
 const struct iomap_ops xfs_xattr_iomap_ops = {
.iomap_begin= xfs_xattr_iomap_begin,
 };
+
+int
+xfs_zero_range(
+   struct xfs_inode*ip,
+   loff_t  pos,
+   loff_t  len,
+   bool*did_zero)
+{
+   struct inode*inode = VFS_I(ip);
+
+   return iomap_zero_range(inode, pos, len, did_zero,
+   &xfs_buffered_write_iomap_ops);
+}
+
+int
+xfs_truncate_page(
+   struct xfs_inode*ip,
+   loff_t  pos,
+   bool*did_zero)
+{
+   struct inode*inode = VFS_I(ip);
+
+   return iomap_truncate_page(inode, pos, did_zero,
+   &xfs_buffered_write_iomap_ops);
+}
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 7d3703556d0e0..f1a281ab9328c 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -20,6 +20,10 @@ xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode 
*ip,
 int xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
struct xfs_bmbt_irec *, u16);
 
+int xfs_zero_range(struct xfs_inode *ip, loff_t pos, loff_t len,
+   bool *did_zero);
+int xfs_truncate_page(struct xfs_inode *ip, loff_t pos, bool *did_zero);
+
 static inline xfs_filblks_t
 xfs_aligned_fsb_count(
xfs_fileoff_t   offset_fsb,
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a607d6aca5c4d..ab5ef52b2a9ff 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -911,8 +911,8 @@ xfs_setattr_size(
 */
if (newsize > oldsize) {
trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
-   error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-   &did_zeroing, &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, oldsize, newsize - oldsize,
+   &did_zeroing);
} else {
/*
 * iomap won't detect a dirty page over an unwritten block (or a
@@ -924,8 +924,7 @@ xfs_setattr_size(
 newsize);
if (error)
return error;
-   erro

[PATCH 12/29] fsdax: remove a pointless __force cast in copy_cow_page_dax

2021-11-09 Thread Christoph Hellwig
Despite its name, copy_user_page expects kernel addresses, which is what
we already have.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 4e3e5a283a916..73bd1439d8089 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -728,7 +728,7 @@ static int copy_cow_page_dax(struct block_device *bdev, 
struct dax_device *dax_d
return rc;
}
vto = kmap_atomic(to);
-   copy_user_page(vto, (void __force *)kaddr, vaddr, to);
+   copy_user_page(vto, kaddr, vaddr, to);
kunmap_atomic(vto);
dax_read_unlock(id);
return 0;
-- 
2.30.2



[PATCH 16/29] fsdax: simplify the offset check in dax_iomap_zero

2021-11-09 Thread Christoph Hellwig
The file relative offset must have the same alignment as the storage
offset, so use that and get rid of the call to iomap_sector.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5364549d67a48..d7a923d152240 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1123,7 +1123,6 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, 
struct vm_fault *vmf,
 
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
-   sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
void *kaddr;
@@ -1131,8 +1130,7 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap 
*iomap)
unsigned offset = offset_in_page(pos);
unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
-   if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
-   (size == PAGE_SIZE))
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
page_aligned = true;
 
id = dax_read_lock();
-- 
2.30.2



[PATCH 11/29] dm-stripe: add a stripe_dax_pgoff helper

2021-11-09 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/md/dm-stripe.c | 63 ++
 1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index f084607220293..50dba3f39274c 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio 
*bio)
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
-static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-   long nr_pages, void **kaddr, pfn_t *pfn)
+static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t 
*pgoff)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
struct block_device *bdev;
+   sector_t dev_sector;
uint32_t stripe;
-   long ret;
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
+   stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector);
dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
bdev = sc->stripe[stripe].dev->bdev;
 
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT;
+   return sc->stripe[stripe].dev->dax_dev;
+}
+
+static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+   long nr_pages, void **kaddr, pfn_t *pfn)
+{
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
-
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2
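
The remapping done by stripe_dax_pgoff can be traced with a userspace model.
This is a sketch under assumptions, not the dm-stripe code: it assumes 4 KiB
pages and 512-byte sectors, the struct and the model_* names are hypothetical,
and model_stripe_map_sector is a simplified guess at what stripe_map_sector
computes (round-robin chunk placement, target starting at sector 0).  Unlike
the real helper it returns the stripe index rather than a dax_device.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SECTORS_SHIFT 3	/* assumption: 4 KiB pages, 512-byte sectors */

/* Hypothetical, simplified stand-in for struct stripe_c. */
struct model_stripe_set {
	uint64_t chunk_size;		/* sectors per chunk */
	uint32_t stripes;		/* number of member devices */
	uint64_t physical_start[4];	/* per-device start sector */
	uint64_t part_start[4];		/* get_start_sect() of each bdev */
};

/*
 * Assumed behaviour of stripe_map_sector(): split the sector into
 * (chunk, offset), pick the device round-robin by chunk, and compact the
 * chunk number onto that device.
 */
static void model_stripe_map_sector(const struct model_stripe_set *sc,
				    uint64_t sector, uint32_t *stripe,
				    uint64_t *result)
{
	uint64_t chunk = sector / sc->chunk_size;
	uint64_t chunk_offset = sector % sc->chunk_size;

	*stripe = (uint32_t)(chunk % sc->stripes);
	*result = (chunk / sc->stripes) * sc->chunk_size + chunk_offset;
}

/* Mirrors stripe_dax_pgoff(): rewrites *pgoff, returns the chosen stripe. */
static uint32_t model_stripe_pgoff(const struct model_stripe_set *sc,
				   uint64_t *pgoff)
{
	uint64_t dev_sector;
	uint32_t stripe;

	model_stripe_map_sector(sc, *pgoff << MODEL_PAGE_SECTORS_SHIFT,
				&stripe, &dev_sector);
	dev_sector += sc->physical_start[stripe];
	*pgoff = (sc->part_start[stripe] + dev_sector) >>
		 MODEL_PAGE_SECTORS_SHIFT;
	return stripe;
}

/* Worked example; call from main() to check the arithmetic. */
void stripe_pgoff_examples(void)
{
	/* Two stripes, 8-sector (one page) chunks, second device offset by 64. */
	struct model_stripe_set sc = {
		.chunk_size = 8, .stripes = 2,
		.physical_start = { 0, 64 }, .part_start = { 0, 0 },
	};
	uint64_t pgoff = 5;
	uint32_t stripe = model_stripe_pgoff(&sc, &pgoff);

	/* Page 5 is chunk 5: device 1, chunk 2 there, sector 16 + 64 = 80. */
	assert(stripe == 1);
	assert(pgoff == 10);
}
```

The helper both selects the member device and rewrites the page offset, which
is why each dax operation in the patch shrinks to a lookup followed by a
single forwarding call.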



[PATCH 09/29] dm-linear: add a linear_dax_pgoff helper

2021-11-09 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/md/dm-linear.c | 49 +-
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 0a260c35aeeed..90de42f6743ac 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -163,63 +163,44 @@ static int linear_iterate_devices(struct dm_target *ti,
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
+static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t 
*pgoff)
+{
+   struct linear_c *lc = ti->private;
+   sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT);
+
+   *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> PAGE_SECTORS_SHIFT;
+   return lc->dev->dax_dev;
+}
+
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
-   long ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2
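
The page offset translation in linear_dax_pgoff can be followed with a
userspace model.  This is a sketch, not the dm-linear code: it assumes 4 KiB
pages and 512-byte sectors, struct model_linear and its fields are
hypothetical, and linear_map_sector is assumed to compute
start + (sector - begin) as dm-linear usually does.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SECTORS_SHIFT 3	/* assumption: 4 KiB pages, 512-byte sectors */

/* Hypothetical stand-in for the state the helper consults. */
struct model_linear {
	uint64_t begin;		/* first sector of the target in the dm device */
	uint64_t start;		/* first sector used on the underlying device */
	uint64_t part_start;	/* get_start_sect() of the underlying bdev */
};

/* Page offset in the dm device -> page offset in the underlying dax_device. */
static uint64_t model_linear_pgoff(const struct model_linear *lc, uint64_t pgoff)
{
	uint64_t sector = pgoff << MODEL_PAGE_SECTORS_SHIFT;
	/* assumed behaviour of linear_map_sector() */
	uint64_t dev_sector = lc->start + (sector - lc->begin);

	return (lc->part_start + dev_sector) >> MODEL_PAGE_SECTORS_SHIFT;
}

/* Worked example; call from main() to check the arithmetic. */
void linear_pgoff_examples(void)
{
	/* Target maps to sector 2048 of a partition starting at sector 4096. */
	struct model_linear lc = { .begin = 0, .start = 2048, .part_start = 4096 };

	/* Page 10 is sector 80; 80 + 2048 + 4096 = 6224 sectors = page 778. */
	assert(model_linear_pgoff(&lc, 10) == 778);
}
```

No alignment check appears here because, as the commit message says, the
submitting file system has already performed it.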



[PATCH 13/29] fsdax: use a saner calling convention for copy_cow_page_dax

2021-11-09 Thread Christoph Hellwig
Just pass the vm_fault and iomap_iter structures, and figure out the rest
locally.  Note that this requires moving dax_iomap_sector up in the file.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 73bd1439d8089..e51b4129d1b65 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,26 +709,31 @@ int dax_invalidate_mapping_entry_sync(struct 
address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
 }
 
-static int copy_cow_page_dax(struct block_device *bdev, struct dax_device 
*dax_dev,
-sector_t sector, struct page *to, unsigned long 
vaddr)
+static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
 {
+   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
+}
+
+static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter 
*iter)
+{
+   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
void *vto, *kaddr;
pgoff_t pgoff;
long rc;
int id;
 
-   rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
+   rc = bdev_dax_pgoff(iter->iomap.bdev, sector, PAGE_SIZE, &pgoff);
if (rc)
return rc;
 
id = dax_read_lock();
-   rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+   rc = dax_direct_access(iter->iomap.dax_dev, pgoff, 1, &kaddr, NULL);
if (rc < 0) {
dax_read_unlock(id);
return rc;
}
-   vto = kmap_atomic(to);
-   copy_user_page(vto, kaddr, vaddr, to);
+   vto = kmap_atomic(vmf->cow_page);
+   copy_user_page(vto, kaddr, vmf->address, vmf->cow_page);
kunmap_atomic(vto);
dax_read_unlock(id);
return 0;
@@ -1005,11 +1010,6 @@ int dax_writeback_mapping_range(struct address_space 
*mapping,
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 
-static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
-{
-   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
-}
-
 static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
 pfn_t *pfnp)
 {
@@ -1332,19 +1332,16 @@ static vm_fault_t dax_fault_synchronous_pfnp(pfn_t 
*pfnp, pfn_t pfn)
 static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf,
const struct iomap_iter *iter)
 {
-   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
-   unsigned long vaddr = vmf->address;
vm_fault_t ret;
int error = 0;
 
switch (iter->iomap.type) {
case IOMAP_HOLE:
case IOMAP_UNWRITTEN:
-   clear_user_highpage(vmf->cow_page, vaddr);
+   clear_user_highpage(vmf->cow_page, vmf->address);
break;
case IOMAP_MAPPED:
-   error = copy_cow_page_dax(iter->iomap.bdev, iter->iomap.dax_dev,
- sector, vmf->cow_page, vaddr);
+   error = copy_cow_page_dax(vmf, iter);
break;
default:
WARN_ON_ONCE(1);
-- 
2.30.2



[PATCH 10/29] dm-log-writes: add a log_writes_dax_pgoff helper

2021-11-09 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/md/dm-log-writes.c | 42 +++---
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 524bc536922eb..df3cd78223fb2 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -949,17 +949,21 @@ static int log_dax(struct log_writes_c *lc, sector_t 
sector, size_t bytes,
return 0;
 }
 
+static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
+   pgoff_t *pgoff)
+{
+   struct log_writes_c *lc = ti->private;
+
+   *pgoff += (get_start_sect(lc->dev->bdev) >> PAGE_SECTORS_SHIFT);
+   return lc->dev->dax_dev;
+}
+
 static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 long nr_pages, void **kaddr, pfn_t 
*pfn)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
-   int ret;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages * PAGE_SIZE, 
&pgoff);
-   if (ret)
-   return ret;
-   return dax_direct_access(lc->dev->dax_dev, pgoff, nr_pages, kaddr, pfn);
+   return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
@@ -968,11 +972,9 @@ static size_t log_writes_dax_copy_from_iter(struct 
dm_target *ti,
 {
struct log_writes_c *lc = ti->private;
sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
int err;
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), 
&pgoff))
-   return 0;
-
/* Don't bother doing anything if logging has been disabled */
if (!lc->logging_enabled)
goto dax_copy;
@@ -983,34 +985,24 @@ static size_t log_writes_dax_copy_from_iter(struct 
dm_target *ti,
return 0;
}
 dax_copy:
-   return dax_copy_from_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
  pgoff_t pgoff, void *addr, size_t 
bytes,
  struct iov_iter *i)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), 
&pgoff))
-   return 0;
-   return dax_copy_to_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages << PAGE_SHIFT,
-   &pgoff);
-   if (ret)
-   return ret;
-   return dax_zero_page_range(lc->dev->dax_dev, pgoff,
-  nr_pages << PAGE_SHIFT);
+   return dax_zero_page_range(dax_dev, pgoff, nr_pages << PAGE_SHIFT);
 }
 
 #else
-- 
2.30.2



[PATCH 07/29] xfs: factor out a xfs_setup_dax_always helper

2021-11-09 Thread Christoph Hellwig
Factor out another DAX setup helper to simplify future changes.  Also
move the experimental warning after the checks to not clutter the log
too much if the setup failed.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_super.c | 47 +++---
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e21459f9923a8..875fd3151d6c9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -340,6 +340,32 @@ xfs_buftarg_is_dax(
bdev_nr_sectors(bt->bt_bdev));
 }
 
+static int
+xfs_setup_dax_always(
+   struct xfs_mount*mp)
+{
+   struct super_block  *sb = mp->m_super;
+
+   if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
+  (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
+   xfs_alert(mp,
+   "DAX unsupported by block device. Turning off DAX.");
+   goto disable_dax;
+   }
+
+   if (xfs_has_reflink(mp)) {
+   xfs_alert(mp, "DAX and reflink cannot be used together!");
+   return -EINVAL;
+   }
+
+   xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+   return 0;
+
+disable_dax:
+   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
+   return 0;
+}
+
 STATIC int
 xfs_blkdev_get(
xfs_mount_t *mp,
@@ -1593,26 +1619,9 @@ xfs_fs_fill_super(
sb->s_flags |= SB_I_VERSION;
 
if (xfs_has_dax_always(mp)) {
-   bool rtdev_is_dax = false, datadev_is_dax;
-
-   xfs_warn(mp,
-   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
-
-   datadev_is_dax = xfs_buftarg_is_dax(sb, mp->m_ddev_targp);
-   if (mp->m_rtdev_targp)
-   rtdev_is_dax = xfs_buftarg_is_dax(sb,
-   mp->m_rtdev_targp);
-   if (!rtdev_is_dax && !datadev_is_dax) {
-   xfs_alert(mp,
-   "DAX unsupported by block device. Turning off DAX.");
-   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
-   }
-   if (xfs_has_reflink(mp)) {
-   xfs_alert(mp,
-   "DAX and reflink cannot be used together!");
-   error = -EINVAL;
+   error = xfs_setup_dax_always(mp);
+   if (error)
goto out_filestream_unmount;
-   }
}
 
if (xfs_has_discard(mp)) {
-- 
2.30.2



[PATCH 04/29] dax: simplify the dax_device <-> gendisk association

2021-11-09 Thread Christoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicit calls from the block drivers that
want to associate their gendisk with a dax_device.
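
As a rough userspace illustration of the scheme described above (not the
kernel implementation), the sketch below keys a small table by the pointer
value of an opaque "disk" object, much as the patch keys an xarray by the
gendisk pointer.  The names sketch_add_host, sketch_remove_host and
sketch_get_by_disk and the fixed-size table are hypothetical stand-ins; the
real code uses xa_insert(), xa_erase() and xa_load().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size stand-in for the kernel xarray. */
#define SKETCH_MAX_HOSTS 16

struct sketch_slot {
	uintptr_t key;	/* pointer value of the "disk" */
	void *value;	/* associated "dax device"; NULL marks a free slot */
};

static struct sketch_slot sketch_hosts[SKETCH_MAX_HOSTS];

/* Associate dax_dev with disk; an existing entry is refused with -EBUSY. */
static int sketch_add_host(void *dax_dev, const void *disk)
{
	uintptr_t key = (uintptr_t)disk;
	struct sketch_slot *free_slot = NULL;
	size_t i;

	/* NULL marks a free slot, so it cannot be stored as a value. */
	assert(dax_dev != NULL);

	for (i = 0; i < SKETCH_MAX_HOSTS; i++) {
		if (sketch_hosts[i].value && sketch_hosts[i].key == key)
			return -EBUSY;
		if (!sketch_hosts[i].value && !free_slot)
			free_slot = &sketch_hosts[i];
	}
	if (!free_slot)
		return -ENOMEM;

	free_slot->key = key;
	free_slot->value = dax_dev;
	return 0;
}

/* Drop the association for disk, if there is one. */
static void sketch_remove_host(const void *disk)
{
	uintptr_t key = (uintptr_t)disk;
	size_t i;

	for (i = 0; i < SKETCH_MAX_HOSTS; i++) {
		if (sketch_hosts[i].value && sketch_hosts[i].key == key) {
			sketch_hosts[i].value = NULL;
			sketch_hosts[i].key = 0;
			return;
		}
	}
}

/* Look up the "dax device" registered for disk, or NULL if none. */
static void *sketch_get_by_disk(const void *disk)
{
	uintptr_t key = (uintptr_t)disk;
	size_t i;

	for (i = 0; i < SKETCH_MAX_HOSTS; i++) {
		if (sketch_hosts[i].value && sketch_hosts[i].key == key)
			return sketch_hosts[i].value;
	}
	return NULL;
}
```

The point of keying by pointer is that the lookup needs neither a string
comparison nor a separate hash function, and a disk without an explicit
sketch_add_host() call simply has no entry.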

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/dax/bus.c|   6 +-
 drivers/dax/super.c  | 106 +--
 drivers/md/dm.c  |   6 +-
 drivers/nvdimm/pmem.c|   8 ++-
 drivers/s390/block/dcssblk.c |  11 +++-
 fs/fuse/virtio_fs.c  |   2 +-
 include/linux/dax.h  |  19 +--
 7 files changed, 62 insertions(+), 96 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6cc4da4c713d9..bd7af2f7c5b0a 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1323,10 +1323,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
}
 
/*
-* No 'host' or dax_operations since there is no access to this
-* device outside of mmap of the resulting character device.
+* No dax_operations since there is no access to this device outside of
+* mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+   dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err_alloc_dax;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e20d0cef10a18..9383c11b21853 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -7,10 +7,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -26,10 +24,8 @@
  * @flags: state and boolean properties
  */
 struct dax_device {
-   struct hlist_node list;
struct inode inode;
struct cdev cdev;
-   const char *host;
void *private;
unsigned long flags;
const struct dax_operations *ops;
@@ -42,10 +38,6 @@ static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
 
-#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
-static struct hlist_head dax_host_list[DAX_HASH_SIZE];
-static DEFINE_SPINLOCK(dax_host_lock);
-
 int dax_read_lock(void)
 {
return srcu_read_lock(&dax_srcu);
@@ -58,13 +50,22 @@ void dax_read_unlock(int id)
 }
 EXPORT_SYMBOL_GPL(dax_read_unlock);
 
-static int dax_host_hash(const char *host)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
+#include 
+
+static DEFINE_XARRAY(dax_hosts);
+
+int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
 {
-   return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
+   return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL);
 }
+EXPORT_SYMBOL_GPL(dax_add_host);
 
-#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
-#include 
+void dax_remove_host(struct gendisk *disk)
+{
+   xa_erase(&dax_hosts, (unsigned long)disk);
+}
+EXPORT_SYMBOL_GPL(dax_remove_host);
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
pgoff_t *pgoff)
@@ -82,40 +83,23 @@ EXPORT_SYMBOL(bdev_dax_pgoff);
 
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
- * @host: alternate name for the device registered by a dax driver
+ * @bdev: block device to find a dax_device for
  */
-static struct dax_device *dax_get_by_host(const char *host)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
 {
-   struct dax_device *dax_dev, *found = NULL;
-   int hash, id;
+   struct dax_device *dax_dev;
+   int id;
 
-   if (!host)
+   if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
-   hash = dax_host_hash(host);
-
id = dax_read_lock();
-   spin_lock(&dax_host_lock);
-   hlist_for_each_entry(dax_dev, &dax_host_list[hash], list) {
-   if (!dax_alive(dax_dev)
-   || strcmp(host, dax_dev->host) != 0)
-   continue;
-
-   if (igrab(&dax_dev->inode))
-   found = dax_dev;
-   break;
-   }
-   spin_unlock(&dax_host_lock);
+   dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
+   if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
+   dax_dev = NULL;
dax_read_unlock(id);
 
-   return found;
-}
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
-{
-   if (!blk_queue_dax(bdev->bd_disk->queue))
-   return NULL;
-   return dax_get_by_host(bdev->bd_disk->disk_name);
+   return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
 
@@ -361,12 +345,7 @@ void kill_dax(struct dax_device *dax_dev)
return;
 
clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
-
synchronize_srcu(&dax_srcu);
-
-   spin_lock(&dax_host_lock);
-  

[PATCH 08/29] dax: remove dax_capable

2021-11-09 Thread Christoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/dax/super.c  | 36 
 drivers/md/dm-table.c| 22 +++---
 drivers/md/dm.c  | 21 -
 drivers/md/dm.h  |  4 
 drivers/nvdimm/pmem.c|  1 -
 drivers/s390/block/dcssblk.c |  1 -
 fs/erofs/super.c | 11 +++
 fs/ext2/super.c  |  6 --
 fs/ext4/super.c  |  9 ++---
 fs/xfs/xfs_super.c   | 21 -
 include/linux/dax.h  | 14 --
 11 files changed, 36 insertions(+), 110 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 482fe775324a4..803942586d1b6 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
-
-bool generic_fsdax_supported(struct dax_device *dax_dev,
-   struct block_device *bdev, int blocksize, sector_t start,
-   sector_t sectors)
-{
-   if (blocksize != PAGE_SIZE) {
-   pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
-   return false;
-   }
-
-   if (!dax_dev) {
-   pr_debug("%pg: error: dax unsupported by block device\n", bdev);
-   return false;
-   }
-
-   return true;
-}
-EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-
-bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
-   int blocksize, sector_t start, sector_t len)
-{
-   bool ret = false;
-   int id;
-
-   if (!dax_dev)
-   return false;
-
-   id = dax_read_lock();
-   if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
-   ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
- start, len);
-   dax_read_unlock(id);
-   return ret;
-}
-EXPORT_SYMBOL_GPL(dax_supported);
 #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index bcddc5effd155..f4915a7d5dc84 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -806,12 +806,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
+static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
 {
-   int blocksize = *(int *) data;
+   if (dev->dax_dev)
+   return false;
 
-   return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
+   DMDEBUG("%pg: error: dax unsupported by block device", dev->bdev);
+   return true;
 }
 
 /* Check devices support synchronous DAX */
@@ -821,8 +823,8 @@ static int device_not_dax_synchronous_capable(struct dm_target *ti, struct dm_de
return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
 }
 
-bool dm_table_supports_dax(struct dm_table *t,
-  iterate_devices_callout_fn iterate_fn, int *blocksize)
+static bool dm_table_supports_dax(struct dm_table *t,
+  iterate_devices_callout_fn iterate_fn)
 {
struct dm_target *ti;
unsigned i;
@@ -835,7 +837,7 @@ bool dm_table_supports_dax(struct dm_table *t,
return false;
 
if (!ti->type->iterate_devices ||
-   ti->type->iterate_devices(ti, iterate_fn, blocksize))
+   ti->type->iterate_devices(ti, iterate_fn, NULL))
return false;
}
 
@@ -862,7 +864,6 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
-   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -906,7 +907,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
+   if (dm_table_supports_dax(t, device_not_dax_capable) ||
(list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
}
@@ -1976,7 +1977,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct 
re

[PATCH 01/29] nvdimm/pmem: move dax_attribute_group from dax to pmem

2021-11-09 Thread Christoph Hellwig
dax_attribute_group is only used by the pmem driver, and can avoid the
completely pointless lookup by the disk name if moved there.  This
leaves just a single caller of dax_get_by_host, so move dax_get_by_host
into the same ifdef block as that caller.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Link: https://lore.kernel.org/r/20210922173431.2454024-3-...@lst.de
Signed-off-by: Dan Williams 
---
 drivers/dax/super.c   | 100 --
 drivers/nvdimm/pmem.c |  43 ++
 include/linux/dax.h   |   2 -
 3 files changed, 61 insertions(+), 84 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index fc89e91beea7c..b882cf8106ea3 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -63,6 +63,24 @@ static int dax_host_hash(const char *host)
return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
 }
 
+#ifdef CONFIG_BLOCK
+#include 
+
+int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
+   pgoff_t *pgoff)
+{
+   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
+   phys_addr_t phys_off = (start_sect + sector) * 512;
+
+   if (pgoff)
+   *pgoff = PHYS_PFN(phys_off);
+   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
+   return -EINVAL;
+   return 0;
+}
+EXPORT_SYMBOL(bdev_dax_pgoff);
+
+#if IS_ENABLED(CONFIG_FS_DAX)
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @host: alternate name for the device registered by a dax driver
@@ -94,24 +112,6 @@ static struct dax_device *dax_get_by_host(const char *host)
return found;
 }
 
-#ifdef CONFIG_BLOCK
-#include 
-
-int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
-   pgoff_t *pgoff)
-{
-   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
-   phys_addr_t phys_off = (start_sect + sector) * 512;
-
-   if (pgoff)
-   *pgoff = PHYS_PFN(phys_off);
-   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
-   return -EINVAL;
-   return 0;
-}
-EXPORT_SYMBOL(bdev_dax_pgoff);
-
-#if IS_ENABLED(CONFIG_FS_DAX)
 struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
 {
if (!blk_queue_dax(bdev->bd_disk->queue))
@@ -231,70 +231,6 @@ enum dax_device_flags {
DAXDEV_SYNC,
 };
 
-static ssize_t write_cache_show(struct device *dev,
-   struct device_attribute *attr, char *buf)
-{
-   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
-   ssize_t rc;
-
-   WARN_ON_ONCE(!dax_dev);
-   if (!dax_dev)
-   return -ENXIO;
-
-   rc = sprintf(buf, "%d\n", !!dax_write_cache_enabled(dax_dev));
-   put_dax(dax_dev);
-   return rc;
-}
-
-static ssize_t write_cache_store(struct device *dev,
-   struct device_attribute *attr, const char *buf, size_t len)
-{
-   bool write_cache;
-   int rc = strtobool(buf, &write_cache);
-   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
-
-   WARN_ON_ONCE(!dax_dev);
-   if (!dax_dev)
-   return -ENXIO;
-
-   if (rc)
-   len = rc;
-   else
-   dax_write_cache(dax_dev, write_cache);
-
-   put_dax(dax_dev);
-   return len;
-}
-static DEVICE_ATTR_RW(write_cache);
-
-static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
-{
-   struct device *dev = container_of(kobj, typeof(*dev), kobj);
-   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
-
-   WARN_ON_ONCE(!dax_dev);
-   if (!dax_dev)
-   return 0;
-
-#ifndef CONFIG_ARCH_HAS_PMEM_API
-   if (a == &dev_attr_write_cache.attr)
-   return 0;
-#endif
-   return a->mode;
-}
-
-static struct attribute *dax_attributes[] = {
-   &dev_attr_write_cache.attr,
-   NULL,
-};
-
-struct attribute_group dax_attribute_group = {
-   .name = "dax",
-   .attrs = dax_attributes,
-   .is_visible = dax_visible,
-};
-EXPORT_SYMBOL_GPL(dax_attribute_group);
-
 /**
  * dax_direct_access() - translate a device pgoff to an absolute pfn
  * @dax_dev: a dax_device instance representing the logical memory range
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c74d7bceb2224..9cc0d0ebfad16 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -327,6 +327,49 @@ static const struct dax_operations pmem_dax_ops = {
.zero_page_range = pmem_dax_zero_page_range,
 };
 
+static ssize_t write_cache_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct pmem_device *pmem = dev_to_disk(dev)->private_data;
+
+   return sprintf(buf, "%d\n", !!dax_write_cache_enabled(pmem->dax_dev));
+}
+
+static ssize_t write_cache_store(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t len)
+{
+   struct pmem_device *pmem = d

[PATCH 02/29] dm: make the DAX support dependend on CONFIG_FS_DAX

2021-11-09 Thread Christoph Hellwig
The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax.  Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER.  This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c| 6 ++
 drivers/md/dm-linear.c | 2 +-
 drivers/md/dm-log-writes.c | 2 +-
 drivers/md/dm-stripe.c | 2 +-
 drivers/md/dm-writecache.c | 2 +-
 drivers/md/dm.c| 2 +-
 6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index b882cf8106ea3..e20d0cef10a18 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -63,7 +63,7 @@ static int dax_host_hash(const char *host)
return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
 }
 
-#ifdef CONFIG_BLOCK
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 #include 
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
@@ -80,7 +80,6 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
 }
 EXPORT_SYMBOL(bdev_dax_pgoff);
 
-#if IS_ENABLED(CONFIG_FS_DAX)
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @host: alternate name for the device registered by a dax driver
@@ -219,8 +218,7 @@ bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
return ret;
 }
 EXPORT_SYMBOL_GPL(dax_supported);
-#endif /* CONFIG_FS_DAX */
-#endif /* CONFIG_BLOCK */
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
/* !alive + rcu grace period == no new operations / mappings */
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 66ba16713f696..0a260c35aeeed 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -162,7 +162,7 @@ static int linear_iterate_devices(struct dm_target *ti,
return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 46de085a96709..524bc536922eb 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -903,7 +903,7 @@ static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limit
limits->io_min = limits->physical_block_size;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
   struct iov_iter *i)
 {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 6660b6b53d5bf..f084607220293 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -300,7 +300,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 017806096b91e..0af464a863fe6 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -38,7 +38,7 @@
 #define BITMAP_GRANULARITY PAGE_SIZE
 #endif
 
-#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_FS_DAX)
 #define DM_WRITECACHE_HAS_PMEM
 #endif
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 63aa522636585..893fca738a3d8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1783,7 +1783,7 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
 
-   if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
+   if (IS_ENABLED(CONFIG_FS_DAX)) {
md->dax_dev = alloc_dax(md, md->disk->disk_name,
&dm_dax_ops, 0);
if (IS_ERR(md->dax_dev))
-- 
2.30.2



[PATCH 05/29] dax: remove the pgmap sanity checks in generic_fsdax_supported

2021-11-09 Thread Christoph Hellwig
Drivers that register a dax_dev should make sure it works; there is no
need to double check from the file system.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 49 +
 1 file changed, 1 insertion(+), 48 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 9383c11b21853..04fc680542e8d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -107,13 +107,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   bool dax_enabled = false;
pgoff_t pgoff, pgoff_end;
-   void *kaddr, *end_kaddr;
-   pfn_t pfn, end_pfn;
sector_t last_page;
-   long len, len2;
-   int err, id;
+   int err;
 
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
@@ -138,49 +134,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   id = dax_read_lock();
-   len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
-   len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
-
-   if (len < 1 || len2 < 1) {
-   pr_info("%pg: error: dax access failed (%ld)\n",
-   bdev, len < 1 ? len : len2);
-   dax_read_unlock(id);
-   return false;
-   }
-
-   if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
-   /*
-* An arch that has enabled the pmem api should also
-* have its drivers support pfn_t_devmap()
-*
-* This is a developer warning and should not trigger in
-* production. dax_flush() will crash since it depends
-* on being able to do (page_address(pfn_to_page())).
-*/
-   WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
-   dax_enabled = true;
-   } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
-   struct dev_pagemap *pgmap, *end_pgmap;
-
-   pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
-   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
-   if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX
-   && pfn_t_to_page(pfn)->pgmap == pgmap
-   && pfn_t_to_page(end_pfn)->pgmap == pgmap
-   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
-   && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr)))
-   dax_enabled = true;
-   put_dev_pagemap(pgmap);
-   put_dev_pagemap(end_pgmap);
-
-   }
-   dax_read_unlock(id);
-
-   if (!dax_enabled) {
-   pr_info("%pg: error: dax support not enabled\n", bdev);
-   return false;
-   }
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



[PATCH 03/29] dax: remove CONFIG_DAX_DRIVER

2021-11-09 Thread Christoph Hellwig
CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/Kconfig| 4 
 drivers/nvdimm/Kconfig | 2 +-
 drivers/s390/block/Kconfig | 2 +-
 fs/fuse/Kconfig| 2 +-
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d2834c2cfa10d..954ab14ba7778 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,8 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-config DAX_DRIVER
-   select DAX
-   bool
-
 menuconfig DAX
tristate "DAX: direct access to differentiated memory"
select SRCU
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb38b27d4..347fe7afa5830 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -22,7 +22,7 @@ if LIBNVDIMM
 config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
-   select DAX_DRIVER
+   select DAX
select ND_BTT if BTT
select ND_PFN if NVDIMM_PFN
help
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index d0416dbd0cd81..e3710a762abae 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -5,7 +5,7 @@ comment "S/390 block device drivers"
 config DCSSBLK
def_tristate m
select FS_DAX_LIMITED
-   select DAX_DRIVER
+   select DAX
prompt "DCSSBLK support"
depends on S390 && BLOCK
help
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 40ce9a1c12e5d..038ed0b9aaa5d 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -45,7 +45,7 @@ config FUSE_DAX
select INTERVAL_TREE
depends on VIRTIO_FS
depends on FS_DAX
-   depends on DAX_DRIVER
+   depends on DAX
help
  This allows bypassing guest page cache and allows mapping host page
  cache directly in guest address space.
-- 
2.30.2



[PATCH 06/29] dax: move the partition alignment check into fs_dax_get_by_bdev

2021-11-09 Thread Christoph Hellwig
fs_dax_get_by_bdev is the primary interface to find a dax device for a
block device, so move the partition alignment check there instead of
wiring it up through ->dax_supported.
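
For illustration, the alignment rule that the new check enforces can be
written as a small userspace sketch: both the partition start and the
partition length, converted to bytes, must be multiples of the page size.
The function name and the SKETCH_* constants are hypothetical, and a 4 KiB
page size is assumed; the kernel uses SECTOR_SIZE and the architecture's
PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE	512u	/* bytes per sector */
#define SKETCH_PAGE_SIZE	4096u	/* assumed page size */

/*
 * Return true if a partition that starts at start_sect and spans
 * nr_sectors begins and ends on a page boundary.
 */
static bool sketch_partition_dax_aligned(uint64_t start_sect,
					 uint64_t nr_sectors)
{
	/* Guard the byte conversions below against 64-bit overflow. */
	assert(start_sect <= UINT64_MAX / SKETCH_SECTOR_SIZE);
	assert(nr_sectors <= UINT64_MAX / SKETCH_SECTOR_SIZE);

	if ((start_sect * SKETCH_SECTOR_SIZE) % SKETCH_PAGE_SIZE)
		return false;
	if ((nr_sectors * SKETCH_SECTOR_SIZE) % SKETCH_PAGE_SIZE)
		return false;
	return true;
}
```

With these assumptions a partition starting at sector 2048 with a length
that is a multiple of 8 sectors passes, while one starting at sector 2049
does not.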

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 04fc680542e8d..482fe775324a4 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -93,6 +93,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
+   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
+   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+   pr_info("%pg: error: unaligned partition for dax\n", bdev);
+   return NULL;
+   }
+
id = dax_read_lock();
dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
@@ -107,10 +113,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   pgoff_t pgoff, pgoff_end;
-   sector_t last_page;
-   int err;
-
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
return false;
@@ -121,19 +123,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
-   last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
-   err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



decouple DAX from block devices

2021-11-09 Thread Christoph Hellwig
Hi Dan,

this series decouples DAX from the block layer so that the
block_device is not needed at all for the DAX I/O path.


Re: futher decouple DAX from block devices

2021-11-04 Thread Christoph Hellwig
On Thu, Nov 04, 2021 at 10:34:17AM -0700, Darrick J. Wong wrote:
> /me wonders, are block devices going away?  Will mkfs.xfs have to learn
> how to talk to certain chardevs?  I guess jffs2 and others already do
> that kind of thing... but I suppose I can wait for the real draft to
> show up to ramble further. ;)

Right now I've mostly been looking into the kernel side.  And no, I
do not expect /dev/pmem* to go away, as you'll still need it for a
non-DAX-aware file system and/or application (such as mkfs initially).

But yes, just pointing mkfs to the chardev should be doable with very
little work.  We can point it to a regular file after all.


Re: futher decouple DAX from block devices

2021-11-04 Thread Christoph Hellwig
On Wed, Nov 03, 2021 at 12:59:31PM -0500, Eric Sandeen wrote:
> Christoph, can I ask what the end game looks like, here? If dax is completely
> decoupled from block devices, are there user-visible changes?

Yes.

> If I want to
> run fs-dax on a pmem device - what do I point mkfs at, if not a block device?

The rough plan is to use the device dax character devices.  I'll hopefully
have a draft version in the next days.


Re: [PATCH] virtio_blk: corrent types for status handling

2021-10-25 Thread Christoph Hellwig
On Mon, Oct 25, 2021 at 11:24:57AM +0300, Max Gurtovoy wrote:
> Maybe we can compare the returned status to BLK_STS_OK. But I see we don't 
> do it also in NVMe subsystem so I guess we can assume BLK_STS_OK == 0 
> forever.

Yes, BLK_STS_OK == 0 is an intentionally allowed shortcut.  It is not
just a block layer design, but part of how the sparse __bitwise__
annotations work.
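
To make the point above concrete, here is a minimal userspace sketch of a
sparse-style bitwise status type whose success value is plain 0.  All
sketch_* and SKETCH_* names are hypothetical, and the non-zero error value
is arbitrary; only sparse (which defines __CHECKER__) interprets the
attributes, so a normal compiler sees an ordinary uint8_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only sparse understands these attributes; hide them from the compiler. */
#ifdef __CHECKER__
#define sketch_bitwise	__attribute__((bitwise))
#define sketch_force	__attribute__((force))
#else
#define sketch_bitwise
#define sketch_force
#endif

typedef uint8_t sketch_bitwise sketch_status_t;

/* Success is plain 0, so it needs no cast to the restricted type. */
#define SKETCH_STS_OK		0
/* Any other status has to be forced into the restricted type. */
#define SKETCH_STS_IOERR	((sketch_force sketch_status_t)10)

static_assert(SKETCH_STS_OK == 0, "success must be the zero value");

/* Because success is 0, "status != 0" is the same as "status is an error". */
static bool sketch_request_failed(sketch_status_t status)
{
	return status != 0;
}
```

Testing the status against 0 (or just writing "if (status)") therefore stays
clean under the checker, while comparing it against an unrelated integer
constant would be flagged.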


Re: [PATCH 06/11] xfs: factor out a xfs_setup_dax helper

2021-10-19 Thread Christoph Hellwig
On Mon, Oct 18, 2021 at 09:43:51AM -0700, Darrick J. Wong wrote:
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -339,6 +339,32 @@ xfs_buftarg_is_dax(
> > bdev_nr_sectors(bt->bt_bdev));
> >  }
> >  
> > +static int
> > +xfs_setup_dax(
> 
> /me wonders if this should be named xfs_setup_dax_always, since this
> doesn't handle the dax=inode mode?

Sure, why not.

> The only reason I bring that up is that Eric reminded me a while ago
> that we don't actually print any kind of EXPERIMENTAL warning for the
> auto-detection behavior.

Yes, I actually noticed that as well when preparing this series.


[PATCH 11/11] dax: move bdev_dax_pgoff to fs/dax.c

2021-10-17 Thread Christoph Hellwig
No functional change, but this will allow for a tighter integration
with the iomap code, including possibly passing the partition offset
in the iomap in the future.  For now it mostly avoids growing more
callers outside of fs/dax.c.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 14 --
 fs/dax.c| 13 +
 include/linux/dax.h |  1 -
 3 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 803942586d1b6..c0910687fbcb2 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -67,20 +67,6 @@ void dax_remove_host(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(dax_remove_host);
 
-int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
-   pgoff_t *pgoff)
-{
-   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
-   phys_addr_t phys_off = (start_sect + sector) * 512;
-
-   if (pgoff)
-   *pgoff = PHYS_PFN(phys_off);
-   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
-   return -EINVAL;
-   return 0;
-}
-EXPORT_SYMBOL(bdev_dax_pgoff);
-
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @bdev: block device to find a dax_device for
diff --git a/fs/dax.c b/fs/dax.c
index 4e3e5a283a916..eb715363fd667 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,6 +709,19 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
 }
 
+static int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
+   pgoff_t *pgoff)
+{
+   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
+   phys_addr_t phys_off = (start_sect + sector) * 512;
+
+   if (pgoff)
+   *pgoff = PHYS_PFN(phys_off);
+   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
+   return -EINVAL;
+   return 0;
+}
+
 static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev,
                              sector_t sector, struct page *to, unsigned long vaddr)
 {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 439c3c70e347b..324363b798ecd 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -107,7 +107,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 #endif
 
 struct writeback_control;
-int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
 #if IS_ENABLED(CONFIG_FS_DAX)
 int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
 void dax_remove_host(struct gendisk *disk);
-- 
2.30.2



[PATCH 10/11] dm-stripe: add a stripe_dax_pgoff helper

2021-10-17 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.
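
A small userspace sketch of why the open-coded form is equivalent: shifting
the absolute sector number right by PAGE_SECTORS_SHIFT gives the same page
offset as converting to bytes first and then to a page number, which is
what bdev_dax_pgoff computes before its alignment checks.  The sketch_*
names are hypothetical and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT		9	/* 512-byte sectors */
#define SKETCH_PAGE_SHIFT		12	/* assumed 4 KiB pages */
#define SKETCH_PAGE_SECTORS_SHIFT	(SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT)

/* Convert to a byte offset first, then to a page number. */
static uint64_t sketch_pgoff_via_bytes(uint64_t start_sect, uint64_t sector)
{
	uint64_t phys_off;

	/* The byte offset must fit in 64 bits for this form to be valid. */
	assert(start_sect + sector <= (UINT64_MAX >> SKETCH_SECTOR_SHIFT));

	phys_off = (start_sect + sector) << SKETCH_SECTOR_SHIFT;
	return phys_off >> SKETCH_PAGE_SHIFT;
}

/* Open-coded form: shift the absolute sector number directly. */
static uint64_t sketch_pgoff_via_shift(uint64_t start_sect, uint64_t sector)
{
	return (start_sect + sector) >> SKETCH_PAGE_SECTORS_SHIFT;
}
```

Both helpers round down for unaligned sectors; the open-coded form just
skips the -EINVAL return for those, relying on the checks already done by
the submitting file system.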

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-stripe.c | 63 ++
 1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index f084607220293..50dba3f39274c 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
-static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-   long nr_pages, void **kaddr, pfn_t *pfn)
+static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t 
*pgoff)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
struct block_device *bdev;
+   sector_t dev_sector;
uint32_t stripe;
-   long ret;
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
+   stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector);
dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
bdev = sc->stripe[stripe].dev->bdev;
 
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT;
+   return sc->stripe[stripe].dev->dax_dev;
+}
+
+static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+   long nr_pages, void **kaddr, pfn_t *pfn)
+{
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
-
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2



[PATCH 09/11] dm-log-writes: add a log_writes_dax_pgoff helper

2021-10-17 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.
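
For this target the remapping is a pure offset, so the helper can add the partition start directly in page units. The userspace sketch below (hypothetical sketch_* names, 4 KiB pages and 512-byte sectors assumed) shows why that matches the longer route through sectors: a page offset converted to sectors is always a multiple of the sectors-per-page count, so the shift distributes over the sum.

```c
#include <stdint.h>

#define SKETCH_PAGE_SECTORS_SHIFT 3	/* assumes 4 KiB pages, 512-byte sectors */

/* Shift the partition start down to pages, then add it to the page offset. */
static uint64_t sketch_pgoff_add(uint64_t start_sect, uint64_t pgoff)
{
	return pgoff + (start_sect >> SKETCH_PAGE_SECTORS_SHIFT);
}

/* Reference form: go through sectors, add the start, shift back to pages. */
static uint64_t sketch_pgoff_via_sectors(uint64_t start_sect, uint64_t pgoff)
{
	uint64_t sector = pgoff << SKETCH_PAGE_SECTORS_SHIFT;

	return (start_sect + sector) >> SKETCH_PAGE_SECTORS_SHIFT;
}
```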

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-log-writes.c | 42 +++---
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 6d694526881d0..5aac60c1b774c 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -949,17 +949,21 @@ static int log_dax(struct log_writes_c *lc, sector_t 
sector, size_t bytes,
return 0;
 }
 
+static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
+   pgoff_t *pgoff)
+{
+   struct log_writes_c *lc = ti->private;
+
+   *pgoff += (get_start_sect(lc->dev->bdev) >> PAGE_SECTORS_SHIFT);
+   return lc->dev->dax_dev;
+}
+
 static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 long nr_pages, void **kaddr, pfn_t 
*pfn)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
-   int ret;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
-   return dax_direct_access(lc->dev->dax_dev, pgoff, nr_pages, kaddr, pfn);
+   return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
@@ -968,11 +972,9 @@ static size_t log_writes_dax_copy_from_iter(struct 
dm_target *ti,
 {
struct log_writes_c *lc = ti->private;
sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
int err;
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
-
/* Don't bother doing anything if logging has been disabled */
if (!lc->logging_enabled)
goto dax_copy;
@@ -983,34 +985,24 @@ static size_t log_writes_dax_copy_from_iter(struct 
dm_target *ti,
return 0;
}
 dax_copy:
-   return dax_copy_from_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
  pgoff_t pgoff, void *addr, size_t 
bytes,
  struct iov_iter *i)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
-   return dax_copy_to_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages << PAGE_SHIFT,
-   &pgoff);
-   if (ret)
-   return ret;
-   return dax_zero_page_range(lc->dev->dax_dev, pgoff,
-  nr_pages << PAGE_SHIFT);
+   return dax_zero_page_range(dax_dev, pgoff, nr_pages << PAGE_SHIFT);
 }
 
 #else
-- 
2.30.2



[PATCH 06/11] xfs: factor out a xfs_setup_dax helper

2021-10-17 Thread Christoph Hellwig
Factor out another DAX setup helper to simplify future changes.  Also
move the experimental warning after the checks to not clutter the log
too much if the setup failed.
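
The three outcomes of the new helper can be modelled in userspace as below. This is a sketch only: the sketch_* names and the boolean inputs are hypothetical stand-ins for the xfs_buftarg_is_dax() and xfs_has_reflink() checks that appear in the diff.

```c
#include <stdbool.h>

enum sketch_dax_result {
	SKETCH_DAX_ENABLED,	/* keep DAX, emit the experimental warning */
	SKETCH_DAX_DISABLED,	/* no DAX-capable device: fall back, mount continues */
	SKETCH_DAX_REJECTED,	/* DAX plus reflink: fail the mount */
};

static enum sketch_dax_result sketch_setup_dax(bool datadev_is_dax,
					       bool has_rtdev, bool rtdev_is_dax,
					       bool has_reflink)
{
	/* Neither the data device nor an optional realtime device supports DAX. */
	if (!datadev_is_dax && (!has_rtdev || !rtdev_is_dax))
		return SKETCH_DAX_DISABLED;
	/* Checked only once DAX is known to be usable. */
	if (has_reflink)
		return SKETCH_DAX_REJECTED;
	return SKETCH_DAX_ENABLED;
}
```

Note that in this model, as in the helper as the diff reads, a reflink file system on a device without DAX falls back to non-DAX operation instead of failing.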

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_super.c | 47 +++---
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index c4e0cd1c1c8ca..d07020a8eb9e3 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -339,6 +339,32 @@ xfs_buftarg_is_dax(
bdev_nr_sectors(bt->bt_bdev));
 }
 
+static int
+xfs_setup_dax(
+   struct xfs_mount*mp)
+{
+   struct super_block  *sb = mp->m_super;
+
+   if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
+  (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
+   xfs_alert(mp,
+   "DAX unsupported by block device. Turning off DAX.");
+   goto disable_dax;
+   }
+
+   if (xfs_has_reflink(mp)) {
+   xfs_alert(mp, "DAX and reflink cannot be used together!");
+   return -EINVAL;
+   }
+
+   xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own 
risk");
+   return 0;
+
+disable_dax:
+   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
+   return 0;
+}
+
 STATIC int
 xfs_blkdev_get(
xfs_mount_t *mp,
@@ -1592,26 +1618,9 @@ xfs_fs_fill_super(
sb->s_flags |= SB_I_VERSION;
 
if (xfs_has_dax_always(mp)) {
-   bool rtdev_is_dax = false, datadev_is_dax;
-
-   xfs_warn(mp,
-   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
-
-   datadev_is_dax = xfs_buftarg_is_dax(sb, mp->m_ddev_targp);
-   if (mp->m_rtdev_targp)
-   rtdev_is_dax = xfs_buftarg_is_dax(sb,
-   mp->m_rtdev_targp);
-   if (!rtdev_is_dax && !datadev_is_dax) {
-   xfs_alert(mp,
-   "DAX unsupported by block device. Turning off DAX.");
-   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
-   }
-   if (xfs_has_reflink(mp)) {
-   xfs_alert(mp,
-   "DAX and reflink cannot be used together!");
-   error = -EINVAL;
+   error = xfs_setup_dax(mp);
+   if (error)
goto out_filestream_unmount;
-   }
}
 
if (xfs_has_discard(mp)) {
-- 
2.30.2



[PATCH 07/11] dax: remove dax_capable

2021-10-17 Thread Christoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers.
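
What remains for a caller to check is small. A minimal userspace sketch of the two open-coded conditions, with a hypothetical sketch_dax_device type and an assumed 4 KiB page size:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

struct sketch_dax_device {
	int unused;	/* placeholder; the real structure is opaque to callers */
};

/* DAX is usable only with a DAX device and a block size equal to the page size. */
static bool sketch_can_use_dax(const struct sketch_dax_device *dax_dev,
			       unsigned int blocksize)
{
	if (!dax_dev)
		return false;
	return blocksize == SKETCH_PAGE_SIZE;
}
```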

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c  | 36 
 drivers/md/dm-table.c| 22 +++---
 drivers/md/dm.c  | 21 -
 drivers/md/dm.h  |  4 
 drivers/nvdimm/pmem.c|  1 -
 drivers/s390/block/dcssblk.c |  1 -
 fs/erofs/super.c | 11 +++
 fs/ext2/super.c  |  6 --
 fs/ext4/super.c  |  9 ++---
 fs/xfs/xfs_super.c   | 21 -
 include/linux/dax.h  | 14 --
 11 files changed, 36 insertions(+), 110 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 482fe775324a4..803942586d1b6 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device 
*bdev)
return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
-
-bool generic_fsdax_supported(struct dax_device *dax_dev,
-   struct block_device *bdev, int blocksize, sector_t start,
-   sector_t sectors)
-{
-   if (blocksize != PAGE_SIZE) {
-   pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
-   return false;
-   }
-
-   if (!dax_dev) {
-   pr_debug("%pg: error: dax unsupported by block device\n", bdev);
-   return false;
-   }
-
-   return true;
-}
-EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-
-bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
-   int blocksize, sector_t start, sector_t len)
-{
-   bool ret = false;
-   int id;
-
-   if (!dax_dev)
-   return false;
-
-   id = dax_read_lock();
-   if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
-   ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
- start, len);
-   dax_read_unlock(id);
-   return ret;
-}
-EXPORT_SYMBOL_GPL(dax_supported);
 #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 1fa4d5582dca5..4ae671c2168ea 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -807,12 +807,14 @@ void dm_table_set_type(struct dm_table *t, enum 
dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
+static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
 {
-   int blocksize = *(int *) data;
+   if (dev->dax_dev)
+   return false;
 
-   return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
+   pr_debug("%pg: error: dax unsupported by block device\n", dev->bdev);
+   return true;
 }
 
 /* Check devices support synchronous DAX */
@@ -822,8 +824,8 @@ static int device_not_dax_synchronous_capable(struct 
dm_target *ti, struct dm_de
return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
 }
 
-bool dm_table_supports_dax(struct dm_table *t,
-  iterate_devices_callout_fn iterate_fn, int 
*blocksize)
+static bool dm_table_supports_dax(struct dm_table *t,
+  iterate_devices_callout_fn iterate_fn)
 {
struct dm_target *ti;
unsigned i;
@@ -836,7 +838,7 @@ bool dm_table_supports_dax(struct dm_table *t,
return false;
 
if (!ti->type->iterate_devices ||
-   ti->type->iterate_devices(ti, iterate_fn, blocksize))
+   ti->type->iterate_devices(ti, iterate_fn, NULL))
return false;
}
 
@@ -863,7 +865,6 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
-   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -907,7 +908,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
+   if (dm_table_supports_dax(t, device_not_dax_capable) ||
(list_empty(devices) && live_md_type == 
DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
}
@@ -1981,7 +1982,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct 
request_queue *

[PATCH 08/11] dm-linear: add a linear_dax_pgoff helper

2021-10-17 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-linear.c | 49 +-
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 32fbab11bf90c..bf03f73fd0f36 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -164,63 +164,44 @@ static int linear_iterate_devices(struct dm_target *ti,
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
+static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t 
*pgoff)
+{
+   struct linear_c *lc = ti->private;
+   sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT);
+
+   *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> PAGE_SECTORS_SHIFT;
+   return lc->dev->dax_dev;
+}
+
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
-   long ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2



[PATCH 05/11] dax: move the partition alignment check into fs_dax_get_by_bdev

2021-10-17 Thread Christoph Hellwig
fs_dax_get_by_bdev is the primary interface to find a dax device for a
block device, so move the partition alignment check there instead of
wiring it up through ->dax_supported.
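
The check being moved tests that both the start and the length of the partition are whole pages. A userspace sketch of the same arithmetic, with hypothetical sketch_* names and assumed 512-byte sectors and 4 KiB pages:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512ULL	/* assumed sector size */
#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* True when both the partition start and its length are page aligned. */
static bool sketch_partition_aligned(uint64_t start_sect, uint64_t nr_sectors)
{
	return (start_sect * SKETCH_SECTOR_SIZE) % SKETCH_PAGE_SIZE == 0 &&
	       (nr_sectors * SKETCH_SECTOR_SIZE) % SKETCH_PAGE_SIZE == 0;
}
```

With these sizes a partition starting at sector 2048 passes, while the legacy start at sector 63 does not.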

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 04fc680542e8d..482fe775324a4 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -93,6 +93,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device 
*bdev)
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
+   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
+   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+   pr_info("%pg: error: unaligned partition for dax\n", bdev);
+   return NULL;
+   }
+
id = dax_read_lock();
dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
@@ -107,10 +113,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   pgoff_t pgoff, pgoff_end;
-   sector_t last_page;
-   int err;
-
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
return false;
@@ -121,19 +123,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
-   last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
-   err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



[PATCH 01/11] dm: make the DAX support dependent on CONFIG_FS_DAX

2021-10-17 Thread Christoph Hellwig
The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax.  Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER.  This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c| 6 ++
 drivers/md/dm-linear.c | 2 +-
 drivers/md/dm-log-writes.c | 2 +-
 drivers/md/dm-stripe.c | 2 +-
 drivers/md/dm-writecache.c | 2 +-
 drivers/md/dm.c| 2 +-
 6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index b882cf8106ea3..e20d0cef10a18 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -63,7 +63,7 @@ static int dax_host_hash(const char *host)
return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
 }
 
-#ifdef CONFIG_BLOCK
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 #include 
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
@@ -80,7 +80,6 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t 
sector, size_t size,
 }
 EXPORT_SYMBOL(bdev_dax_pgoff);
 
-#if IS_ENABLED(CONFIG_FS_DAX)
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @host: alternate name for the device registered by a dax driver
@@ -219,8 +218,7 @@ bool dax_supported(struct dax_device *dax_dev, struct 
block_device *bdev,
return ret;
 }
 EXPORT_SYMBOL_GPL(dax_supported);
-#endif /* CONFIG_FS_DAX */
-#endif /* CONFIG_BLOCK */
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
/* !alive + rcu grace period == no new operations / mappings */
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 679b4c0a2eea1..32fbab11bf90c 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -163,7 +163,7 @@ static int linear_iterate_devices(struct dm_target *ti,
return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index d93a4db235124..6d694526881d0 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -903,7 +903,7 @@ static void log_writes_io_hints(struct dm_target *ti, 
struct queue_limits *limit
limits->io_min = limits->physical_block_size;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
   struct iov_iter *i)
 {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 6660b6b53d5bf..f084607220293 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -300,7 +300,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 18320444fb0a9..4c3a6e33604d3 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -38,7 +38,7 @@
 #define BITMAP_GRANULARITY PAGE_SIZE
 #endif
 
-#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_FS_DAX)
 #define DM_WRITECACHE_HAS_PMEM
 #endif
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7870e6460633f..79737aee516b1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1783,7 +1783,7 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
 
-   if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
+   if (IS_ENABLED(CONFIG_FS_DAX)) {
md->dax_dev = alloc_dax(md, md->disk->disk_name,
&dm_dax_ops, 0);
if (IS_ERR(md->dax_dev))
-- 
2.30.2



[PATCH 04/11] dax: remove the pgmap sanity checks in generic_fsdax_supported

2021-10-17 Thread Christoph Hellwig
Drivers that register a dax_dev should make sure it works; there is no
need to double check from the file system.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 49 +
 1 file changed, 1 insertion(+), 48 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 9383c11b21853..04fc680542e8d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -107,13 +107,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   bool dax_enabled = false;
pgoff_t pgoff, pgoff_end;
-   void *kaddr, *end_kaddr;
-   pfn_t pfn, end_pfn;
sector_t last_page;
-   long len, len2;
-   int err, id;
+   int err;
 
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
@@ -138,49 +134,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   id = dax_read_lock();
-   len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
-   len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
-
-   if (len < 1 || len2 < 1) {
-   pr_info("%pg: error: dax access failed (%ld)\n",
-   bdev, len < 1 ? len : len2);
-   dax_read_unlock(id);
-   return false;
-   }
-
-   if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
-   /*
-* An arch that has enabled the pmem api should also
-* have its drivers support pfn_t_devmap()
-*
-* This is a developer warning and should not trigger in
-* production. dax_flush() will crash since it depends
-* on being able to do (page_address(pfn_to_page())).
-*/
-   WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
-   dax_enabled = true;
-   } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
-   struct dev_pagemap *pgmap, *end_pgmap;
-
-   pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
-   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
-   if (pgmap && pgmap == end_pgmap && pgmap->type == 
MEMORY_DEVICE_FS_DAX
-   && pfn_t_to_page(pfn)->pgmap == pgmap
-   && pfn_t_to_page(end_pfn)->pgmap == pgmap
-   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
-   && pfn_t_to_pfn(end_pfn) == 
PHYS_PFN(__pa(end_kaddr)))
-   dax_enabled = true;
-   put_dev_pagemap(pgmap);
-   put_dev_pagemap(end_pgmap);
-
-   }
-   dax_read_unlock(id);
-
-   if (!dax_enabled) {
-   pr_info("%pg: error: dax support not enabled\n", bdev);
-   return false;
-   }
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



[PATCH 02/11] dax: remove CONFIG_DAX_DRIVER

2021-10-17 Thread Christoph Hellwig
CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/Kconfig| 4 
 drivers/nvdimm/Kconfig | 2 +-
 drivers/s390/block/Kconfig | 2 +-
 fs/fuse/Kconfig| 2 +-
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d2834c2cfa10d..954ab14ba7778 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,8 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-config DAX_DRIVER
-   select DAX
-   bool
-
 menuconfig DAX
tristate "DAX: direct access to differentiated memory"
select SRCU
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb38b27d4..347fe7afa5830 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -22,7 +22,7 @@ if LIBNVDIMM
 config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
-   select DAX_DRIVER
+   select DAX
select ND_BTT if BTT
select ND_PFN if NVDIMM_PFN
help
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index d0416dbd0cd81..e3710a762abae 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -5,7 +5,7 @@ comment "S/390 block device drivers"
 config DCSSBLK
def_tristate m
select FS_DAX_LIMITED
-   select DAX_DRIVER
+   select DAX
prompt "DCSSBLK support"
depends on S390 && BLOCK
help
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 40ce9a1c12e5d..038ed0b9aaa5d 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -45,7 +45,7 @@ config FUSE_DAX
select INTERVAL_TREE
depends on VIRTIO_FS
depends on FS_DAX
-   depends on DAX_DRIVER
+   depends on DAX
help
  This allows bypassing guest page cache and allows mapping host page
  cache directly in guest address space.
-- 
2.30.2



further decouple DAX from block devices

2021-10-17 Thread Christoph Hellwig
Hi Dan,

this series cleans up and simplifies the association between DAX and block
devices in preparation for allowing file systems to be mounted directly on
DAX devices without a detour through block devices.

Diffstat:
 drivers/dax/Kconfig  |4 
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |  220 +--
 drivers/md/dm-linear.c   |   51 +++--
 drivers/md/dm-log-writes.c   |   44 +++-
 drivers/md/dm-stripe.c   |   65 +++-
 drivers/md/dm-table.c|   22 ++--
 drivers/md/dm-writecache.c   |2 
 drivers/md/dm.c  |   29 -
 drivers/md/dm.h  |4 
 drivers/nvdimm/Kconfig   |2 
 drivers/nvdimm/pmem.c|9 -
 drivers/s390/block/Kconfig   |2 
 drivers/s390/block/dcssblk.c |   12 +-
 fs/dax.c |   13 ++
 fs/erofs/super.c |   11 +-
 fs/ext2/super.c  |6 -
 fs/ext4/super.c  |9 +
 fs/fuse/Kconfig  |2 
 fs/fuse/virtio_fs.c  |2 
 fs/xfs/xfs_super.c   |   54 +-
 include/linux/dax.h  |   30 ++---
 22 files changed, 185 insertions(+), 410 deletions(-)


[PATCH 03/11] dax: simplify the dax_device <-> gendisk association

2021-10-17 Thread Christoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicit calls from the block drivers that
want to associate their gendisk with a dax_device.
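
To illustrate the shape of the new association, here is a userspace model of a table keyed by the disk pointer, with explicit add, remove and lookup calls. A plain array scan stands in for the kernel xarray, and every sketch_* name is hypothetical; only the calling pattern is meant to mirror dax_add_host, dax_remove_host and the lookup in fs_dax_get_by_bdev.

```c
#include <stddef.h>

#define SKETCH_SLOTS 16	/* arbitrary capacity for this sketch */

struct sketch_host_map {
	const void *key[SKETCH_SLOTS];	/* disk pointer, NULL when the slot is free */
	void *val[SKETCH_SLOTS];	/* associated dax device */
};

/* Return the slot holding @key, or -1 if no slot holds it. */
static int sketch_find(const struct sketch_host_map *m, const void *key)
{
	for (int i = 0; i < SKETCH_SLOTS; i++)
		if (m->key[i] == key)
			return i;
	return -1;
}

/* Returns 0 on success, -1 if the disk is already registered or the map is full. */
static int sketch_add_host(struct sketch_host_map *m, const void *disk,
			   void *dax_dev)
{
	int slot;

	if (!disk || sketch_find(m, disk) >= 0)
		return -1;
	slot = sketch_find(m, NULL);	/* first free slot */
	if (slot < 0)
		return -1;
	m->key[slot] = disk;
	m->val[slot] = dax_dev;
	return 0;
}

static void sketch_remove_host(struct sketch_host_map *m, const void *disk)
{
	int slot = disk ? sketch_find(m, disk) : -1;

	if (slot >= 0) {
		m->key[slot] = NULL;
		m->val[slot] = NULL;
	}
}

/* Returns the registered dax device for @disk, or NULL if there is none. */
static void *sketch_get_host(const struct sketch_host_map *m, const void *disk)
{
	int slot = disk ? sketch_find(m, disk) : -1;

	return slot >= 0 ? m->val[slot] : NULL;
}
```

Keying on the pointer avoids the string hashing and comparison of the old host-name scheme; the price is that drivers have to register and unregister the association themselves.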

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/bus.c|   2 +-
 drivers/dax/super.c  | 106 +--
 drivers/md/dm.c  |   6 +-
 drivers/nvdimm/pmem.c|   8 ++-
 drivers/s390/block/dcssblk.c |  11 +++-
 fs/fuse/virtio_fs.c  |   2 +-
 include/linux/dax.h  |  19 +--
 7 files changed, 60 insertions(+), 94 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6cc4da4c713d9..6d91b0186e3be 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1326,7 +1326,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data 
*data)
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+   dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err_alloc_dax;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e20d0cef10a18..9383c11b21853 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -7,10 +7,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -26,10 +24,8 @@
  * @flags: state and boolean properties
  */
 struct dax_device {
-   struct hlist_node list;
struct inode inode;
struct cdev cdev;
-   const char *host;
void *private;
unsigned long flags;
const struct dax_operations *ops;
@@ -42,10 +38,6 @@ static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
 
-#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
-static struct hlist_head dax_host_list[DAX_HASH_SIZE];
-static DEFINE_SPINLOCK(dax_host_lock);
-
 int dax_read_lock(void)
 {
return srcu_read_lock(&dax_srcu);
@@ -58,13 +50,22 @@ void dax_read_unlock(int id)
 }
 EXPORT_SYMBOL_GPL(dax_read_unlock);
 
-static int dax_host_hash(const char *host)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
+#include 
+
+static DEFINE_XARRAY(dax_hosts);
+
+int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
 {
-   return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
+   return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL);
 }
+EXPORT_SYMBOL_GPL(dax_add_host);
 
-#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
-#include 
+void dax_remove_host(struct gendisk *disk)
+{
+   xa_erase(&dax_hosts, (unsigned long)disk);
+}
+EXPORT_SYMBOL_GPL(dax_remove_host);
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
pgoff_t *pgoff)
@@ -82,40 +83,23 @@ EXPORT_SYMBOL(bdev_dax_pgoff);
 
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
- * @host: alternate name for the device registered by a dax driver
+ * @bdev: block device to find a dax_device for
  */
-static struct dax_device *dax_get_by_host(const char *host)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
 {
-   struct dax_device *dax_dev, *found = NULL;
-   int hash, id;
+   struct dax_device *dax_dev;
+   int id;
 
-   if (!host)
+   if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
-   hash = dax_host_hash(host);
-
id = dax_read_lock();
-   spin_lock(&dax_host_lock);
-   hlist_for_each_entry(dax_dev, &dax_host_list[hash], list) {
-   if (!dax_alive(dax_dev)
-   || strcmp(host, dax_dev->host) != 0)
-   continue;
-
-   if (igrab(&dax_dev->inode))
-   found = dax_dev;
-   break;
-   }
-   spin_unlock(&dax_host_lock);
+   dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
+   if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
+   dax_dev = NULL;
dax_read_unlock(id);
 
-   return found;
-}
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
-{
-   if (!blk_queue_dax(bdev->bd_disk->queue))
-   return NULL;
-   return dax_get_by_host(bdev->bd_disk->disk_name);
+   return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
 
@@ -361,12 +345,7 @@ void kill_dax(struct dax_device *dax_dev)
return;
 
clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
-
synchronize_srcu(&dax_srcu);
-
-   spin_lock(&dax_host_lock);
-   hlist_del_init(&dax_dev->list);
-   spin_unlock(&dax_host_lock);
 }
 EXPORT_SYMBOL_GPL(kill_dax);
 
@@ -398,8 +377,6 @@ static struct dax_device *to_dax_dev(struct inode *inode)
 static void da

Re: [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer

2021-10-12 Thread Christoph Hellwig
The whole series looks good to me:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Christoph Hellwig
On Mon, Oct 11, 2021 at 03:09:09PM -0400, Michael S. Tsirkin wrote:
> The reason we have trouble is that it's not clear what the API means
> outside the realm of TDX.
> If we really, truly want an API that says "ioremap and it's a hardened
> driver" then I guess ioremap_hardened_driver is what you want.

Yes.  And why would we ioremap the BIOS anyway?  It is not I/O memory
in any of the senses we generally use ioremap for.


Re: [PATCH v5] virtio-blk: Add validation for block size in config space

2021-10-11 Thread Christoph Hellwig
On Tue, Oct 05, 2021 at 06:42:43AM -0400, Michael S. Tsirkin wrote:
> Stefan also pointed out this duplicates the logic from 
> 
> if (blksize < 512 || blksize > PAGE_SIZE || !is_power_of_2(blksize))
> return -EINVAL;
> 
> 
> and a bunch of other places.
> 
> 
> Would it be acceptable for blk layer to validate the input
> instead of having each driver do its own thing?
> Maybe inside blk_queue_logical_block_size?

I'm pretty sure we went down that road before.  Let's just add a helper
for that check for now as part of this series.  Actually validating
it in blk_queue_logical_block_size seems like a good idea, but returning
errors from that has a long tail.
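
For illustration only, a minimal user-space sketch of the kind of helper
suggested above, wrapping the quoted check in one place.  The names
sketch_validate_block_size(), sketch_is_power_of_2() and SKETCH_PAGE_SIZE
are hypothetical and not taken from the series; the 4 KiB page size is an
assumption standing in for the kernel's PAGE_SIZE.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; kernel code would use PAGE_SIZE instead. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for the kernel's is_power_of_2(): nonzero, one bit set. */
static bool sketch_is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: returns 0 when blksize is acceptable as a logical
 * block size, -EINVAL otherwise.  The condition is the one quoted in the
 * thread: at least 512 bytes, at most one page, and a power of two.
 */
static int sketch_validate_block_size(unsigned int blksize)
{
	if (blksize < 512 || blksize > SKETCH_PAGE_SIZE ||
	    !sketch_is_power_of_2(blksize))
		return -EINVAL;
	return 0;
}
```

A driver would call such a helper once on the value read from config space
and fail the probe on a non-zero return, instead of open-coding the
comparison.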


Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Christoph Hellwig
Just as last time:  This does not make any sense.  ioremap is shared
by definition.



Re: [PATCH v3 1/1] virtio-blk: avoid preallocating big SGL for data

2021-09-27 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/1] virtio-blk: avoid preallocating big SGL for data

2021-09-27 Thread Christoph Hellwig
On Mon, Sep 27, 2021 at 12:53:14PM +0100, Christoph Hellwig wrote:
> Looks good,
> 
> Reviewed-by: Christoph Hellwig 

Err, sorry.  This was supposed to go to the latest iteration, I'll
add it there.


Re: [PATCH 1/1] virtio-blk: avoid preallocating big SGL for data

2021-09-27 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 

