Re: [PATCH 05/26] block: Allow using bio_attempt_back_merge() internally

2024-02-02 Thread Bart Van Assche

On 2/1/24 23:30, Damien Le Moal wrote:

Remove the static definition of bio_attempt_back_merge() to allow using

  
To me this suggests that the function definition is removed entirely but
that is not what this patch does ...

Otherwise this patch looks good to me.

Thanks,

Bart.



Re: [PATCH 03/26] block: Introduce bio_straddle_zones() and bio_offset_from_zone_start()

2024-02-02 Thread Bart Van Assche

On 2/1/24 23:30, Damien Le Moal wrote:

Implement the inline helper functions bio_straddle_zones() and
bio_offset_from_zone_start() to respectively test if a BIO crosses a
zone boundary (the start sector and last sector belong to different
zones) and to obtain the oofset from a zone starting sector of a BIO.


oofset -> offset

+static inline bool bio_straddle_zones(struct bio *bio)
+{
+   return bio_zone_no(bio) !=
+   disk_zone_no(bio->bi_bdev->bd_disk, bio_end_sector(bio) - 1);
+}


It seems to me that the above code is not intended to handle the case
where bi_size == 0, as is the case for an empty flush request. Should a
comment be added above this function or do we perhaps need to add a
WARN_ON_ONCE() statement?

Thanks,

Bart.



Re: [PATCH v5 00/14] dm-raid/md/raid: fix v6.7 regressions

2024-02-02 Thread Benjamin Marzinski
On Thu, Feb 01, 2024 at 05:25:45PM +0800, Yu Kuai wrote:
> From: Yu Kuai 
> I apply this patchset on top of v6.8-rc1, and run lvm2 tests suite with
> folling cmd for 24 round(for about 2 days):
> 
> for t in `ls test/shell`; do
> if cat test/shell/$t | grep raid &> /dev/null; then
> make check T=shell/$t
> fi
> done
> 
> failed count failed test
>   1 ###   failed: [ndev-vanilla] shell/dmsecuretest.sh
>   1 ###   failed: [ndev-vanilla] shell/dmsetup-integrity-keys.sh
>   1 ###   failed: [ndev-vanilla] shell/dmsetup-keyring.sh
>   5 ###   failed: [ndev-vanilla] shell/duplicate-pvs-md0.sh
>   1 ###   failed: [ndev-vanilla] shell/duplicate-vgid.sh
>   2 ###   failed: [ndev-vanilla] shell/duplicate-vgnames.sh
>   1 ###   failed: [ndev-vanilla] shell/fsadm-crypt.sh
>   1 ###   failed: [ndev-vanilla] shell/integrity.sh
>   6 ###   failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
>   2 ###   failed: [ndev-vanilla] shell/lvchange-rebuild-raid.sh
>   5 ###   failed: [ndev-vanilla] 
> shell/lvconvert-raid-reshape-stripes-load-reload.sh
>   4 ###   failed: [ndev-vanilla] 
> shell/lvconvert-raid-restripe-linear.sh
>   1 ###   failed: [ndev-vanilla] 
> shell/lvconvert-raid1-split-trackchanges.sh
>  20 ###   failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
>  20 ###   failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
>  24 ###   failed: [ndev-vanilla] shell/lvextend-raid.sh
> 
> And I ramdomly pick some tests verified by hand that these test will
> fail in v6.6 as well(not all tests):
> 
> shell/lvextend-raid.sh
> shell/lvcreate-large-raid.sh
> shell/lvconvert-repair-raid.sh
> shell/lvchange-rebuild-raid.sh
> shell/lvchange-raid1-writemostly.sh

In my testing with this patchset on top of the head of linus's tree
(5c24e4e9e708) I am seeing failures in
shell/lvconvert-raid-reshape-stripes-load-reload.sh and
shell/lvconvert-repair-raid.sh in about 20% of my runs. I have never
seen either of these these fail running on the 6.6 kernel (ffc253263a13).

lvconvert-repair-raid.sh creates a raid array and then disables one if
its drives before there's enough time to finish the initial sync and
tries to repair it. This is supposed to fail (it uses dm-delay devices
to slow down the sync). When the test succeeds, I see things like this:

[ 0:13.469] #lvconvert-repair-raid.sh:161+ lvcreate --type raid10 -m 1 -i 2 -L 
64 -n LV1 LVMTEST191946vg 
/tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv1 
/tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv2 
/tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv3 
/tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv4
[ 0:13.469]   Using default stripesize 64.00 KiB.
[ 0:13.483]   Logical volume "LV1" created.
[ 0:14.042] 6,8908,1194343108,-;device-mapper: raid: Superblocks created for 
new raid set
[ 0:14.042] 5,8909,1194348704,-;md/raid10:mdX: not clean -- starting background 
reconstruction
[ 0:14.042] 6,8910,1194349443,-;md/raid10:mdX: active with 4 out of 4 devices
[ 0:14.042] 4,8911,1194459161,-;mdX: bitmap file is out of date, doing full 
recovery
[ 0:14.042] 6,8912,1194563810,-;md: resync of RAID array mdX
[ 0:14.042]   WARNING: This metadata update is NOT backed up.
[ 0:14.042] aux disable_dev "$dev4"
[ 0:14.058] #lvconvert-repair-raid.sh:163+ aux disable_dev 
/tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv4
[ 0:14.058] Disabling device 
/tmp/LVMTEST191946.ImUMG6dyqB/dev/mapper/LVMTEST191946pv4 (253:5)
[ 0:14.101] not lvconvert -y --repair $vg/$lv1

When it fails, I see:

[ 0:13.831] #lvconvert-repair-raid.sh:161+ lvcreate --type raid10 -m 1 -i 2 -L 
64 -n LV1 LVMTEST192248vg 
/tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv1 
/tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv2 
/tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv3 
/tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192248pv4
[ 0:13.831]   Using default stripesize 64.00 KiB.
[ 0:13.847]   Logical volume "LV1" created.
[ 0:14.499]   WARNING: This metadata update is NOT backed up.
[ 0:14.499] 6,8925,1187444256,-;device-mapper: raid: Superblocks created for 
new raid set
[ 0:14.499] 5,8926,1187449525,-;md/raid10:mdX: not clean -- starting background 
reconstruction
[ 0:14.499] 6,8927,1187450148,-;md/raid10:mdX: active with 4 out of 4 devices
[ 0:14.499] 6,8928,1187452472,-;md: resync of RAID array mdX
[ 0:14.499] 6,8929,1187453016,-;md: mdX: resync done.
[ 0:14.499] 4,8930,1187555486,-;mdX: bitmap file is out of date, doing full 
recovery
[ 0:14.499] aux disable_dev "$dev4"
[ 0:14.515] #lvconvert-repair-raid.sh:163+ aux disable_dev /tmp/LVMTEST192248.AT
cecgSGfE/dev/mapper/LVMTEST192248pv4
[ 0:14.515] Disabling device /tmp/LVMTEST192248.ATcecgSGfE/dev/mapper/LVMTEST192
248pv4 (253:5)
[ 0:14.554] not lvconvert -y --repair $vg/$lv1

To me the important looking difference (and I admi

[RFC PATCH v4 11/12] dcssblk: Cleanup alloc_dax() error handling

2024-02-02 Thread Mathieu Desnoyers
Now that alloc_dax() returns ERR_PTR(-EOPNOTSUPP) rather than NULL,
the callers do not have to handle NULL return values anymore.

Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
Cc: linux-s...@vger.kernel.org
---
 drivers/s390/block/dcssblk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index a3010849bfed..f363c1d51d9a 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -679,8 +679,8 @@ dcssblk_add_store(struct device *dev, struct 
device_attribute *attr, const char
goto put_dev;
 
dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
-   if (IS_ERR_OR_NULL(dax_dev)) {
-   rc = IS_ERR(dax_dev) ? PTR_ERR(dax_dev) : -EOPNOTSUPP;
+   if (IS_ERR(dax_dev)) {
+   rc = PTR_ERR(dax_dev);
goto put_dev;
}
set_dax_synchronous(dax_dev);
-- 
2.39.2




[RFC PATCH v4 12/12] virtio: Cleanup alloc_dax() error handling

2024-02-02 Thread Mathieu Desnoyers
Now that alloc_dax() returns ERR_PTR(-EOPNOTSUPP) rather than NULL,
the callers do not have to handle NULL return values anymore.

Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
Cc: linux-s...@vger.kernel.org
---
 fs/fuse/virtio_fs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 621b1bca2d55..a28466c2da71 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -809,8 +809,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, 
struct virtio_fs *fs)
return 0;
 
dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
-   if (IS_ERR_OR_NULL(dax_dev)) {
-   int rc = IS_ERR(dax_dev) ? PTR_ERR(dax_dev) : -EOPNOTSUPP;
+   if (IS_ERR(dax_dev)) {
+   int rc = PTR_ERR(dax_dev);
return rc == -EOPNOTSUPP ? 0 : rc;
}
 
-- 
2.39.2




[RFC PATCH v4 09/12] nvdimm/pmem: Cleanup alloc_dax() error handling

2024-02-02 Thread Mathieu Desnoyers
Now that alloc_dax() returns ERR_PTR(-EOPNOTSUPP) rather than NULL,
the callers do not have to handle NULL return values anymore.

Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 drivers/nvdimm/pmem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f1d9f5c6dbac..e9898457a7bd 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -558,8 +558,8 @@ static int pmem_attach_disk(struct device *dev,
disk->bb = &pmem->bb;
 
dax_dev = alloc_dax(pmem, &pmem_dax_ops);
-   if (IS_ERR_OR_NULL(dax_dev)) {
-   rc = IS_ERR(dax_dev) ? PTR_ERR(dax_dev) : -EOPNOTSUPP;
+   if (IS_ERR(dax_dev)) {
+   rc = PTR_ERR(dax_dev);
if (rc != -EOPNOTSUPP)
goto out;
} else {
-- 
2.39.2




[RFC PATCH v4 10/12] dm: Cleanup alloc_dax() error handling

2024-02-02 Thread Mathieu Desnoyers
Now that alloc_dax() returns ERR_PTR(-EOPNOTSUPP) rather than NULL,
the callers do not have to handle NULL return values anymore.

Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 drivers/md/dm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 2fc22cae9089..acdc00bc05be 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2124,8 +2124,8 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
dax_dev = alloc_dax(md, &dm_dax_ops);
-   if (IS_ERR_OR_NULL(dax_dev)) {
-   if (IS_ERR(dax_dev) && PTR_ERR(dax_dev) != -EOPNOTSUPP)
+   if (IS_ERR(dax_dev)) {
+   if (PTR_ERR(dax_dev) != -EOPNOTSUPP)
goto bad;
} else {
set_dax_nocache(dax_dev);
-- 
2.39.2




[RFC PATCH v4 07/12] Introduce cpu_dcache_is_aliasing() across all architectures

2024-02-02 Thread Mathieu Desnoyers
Introduce a generic way to query whether the data cache is virtually
aliased on all architectures. Its purpose is to ensure that subsystems
which are incompatible with virtually aliased data caches (e.g. FS_DAX)
can reliably query this.

For data cache aliasing, there are three scenarios dependending on the
architecture. Here is a breakdown based on my understanding:

A) The data cache is always aliasing:

* arc
* csky
* m68k (note: shared memory mappings are incoherent ? SHMLBA is missing there.)
* sh
* parisc

B) The data cache aliasing is statically known or depends on querying CPU
   state at runtime:

* arm (cache_is_vivt() || cache_is_vipt_aliasing())
* mips (cpu_has_dc_aliases)
* nios2 (NIOS2_DCACHE_SIZE > PAGE_SIZE)
* sparc32 (vac_cache_size > PAGE_SIZE)
* sparc64 (L1DCACHE_SIZE > PAGE_SIZE)
* xtensa (DCACHE_WAY_SIZE > PAGE_SIZE)

C) The data cache is never aliasing:

* alpha
* arm64 (aarch64)
* hexagon
* loongarch (but with incoherent write buffers, which are disabled since
 commit d23b7795 ("LoongArch: Change SHMLBA from SZ_64K to 
PAGE_SIZE"))
* microblaze
* openrisc
* powerpc
* riscv
* s390
* um
* x86

Require architectures in A) and B) to select ARCH_HAS_CPU_CACHE_ALIASING and
implement "cpu_dcache_is_aliasing()".

Architectures in C) don't select ARCH_HAS_CPU_CACHE_ALIASING, and thus
cpu_dcache_is_aliasing() simply evaluates to "false".

Note that this leaves "cpu_icache_is_aliasing()" to be implemented as future
work. This would be useful to gate features like XIP on architectures
which have aliasing CPU dcache-icache but not CPU dcache-dcache.

Use "cpu_dcache" and "cpu_cache" rather than just "dcache" and "cache"
to clarify that we really mean "CPU data cache" and "CPU cache" to
eliminate any possible confusion with VFS "dentry cache" and "page
cache".

Link: https://lore.kernel.org/lkml/20030910210416.ga24...@mail.jlokier.co.uk/
Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")
Signed-off-by: Mathieu Desnoyers 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 arch/arc/Kconfig|  1 +
 arch/arc/include/asm/cachetype.h|  9 +
 arch/arm/Kconfig|  1 +
 arch/arm/include/asm/cachetype.h|  2 ++
 arch/csky/Kconfig   |  1 +
 arch/csky/include/asm/cachetype.h   |  9 +
 arch/m68k/Kconfig   |  1 +
 arch/m68k/include/asm/cachetype.h   |  9 +
 arch/mips/Kconfig   |  1 +
 arch/mips/include/asm/cachetype.h   |  9 +
 arch/nios2/Kconfig  |  1 +
 arch/nios2/include/asm/cachetype.h  | 10 ++
 arch/parisc/Kconfig |  1 +
 arch/parisc/include/asm/cachetype.h |  9 +
 arch/sh/Kconfig |  1 +
 arch/sh/include/asm/cachetype.h |  9 +
 arch/sparc/Kconfig  |  1 +
 arch/sparc/include/asm/cachetype.h  | 14 ++
 arch/xtensa/Kconfig |  1 +
 arch/xtensa/include/asm/cachetype.h | 10 ++
 include/linux/cacheinfo.h   |  6 ++
 mm/Kconfig  |  6 ++
 22 files changed, 112 insertions(+)
 create mode 100644 arch/arc/include/asm/cachetype.h
 create mode 100644 arch/csky/include/asm/cachetype.h
 create mode 100644 arch/m68k/include/asm/cachetype.h
 create mode 100644 arch/mips/include/asm/cachetype.h
 create mode 100644 arch/nios2/include/asm/cachetype.h
 create mode 100644 arch/parisc/include/asm/cachetype.h
 create mode 100644 arch/sh/include/asm/cachetype.h
 create mode 100644 arch/sparc/include/asm/cachetype.h
 create mode 100644 arch/xtensa/include/asm/cachetype.h

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 1b0483c51cc1..7d294a3242a4 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -6,6 +6,7 @@
 config ARC
def_bool y
select ARC_TIMERS
+   select ARCH_HAS_CPU_CACHE_ALIASING
select ARCH_HAS_CACHE_LINE_SIZE
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DMA_PREP_COHERENT
diff --git a/arch/arc/include/asm/cachetype.h b/arch/arc/include/asm/cachetype.h
new file mode 100644
index ..05fc7ed59712
--- /dev/null
+++ b/arch/arc/include/asm/cachetype.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_ARC_CACHETYPE_H
+#define __ASM_ARC_CACHETYPE_H
+
+#include 
+
+#define cpu_dcache_is_aliasing()   true
+
+#endif
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f8567e95f98b..cd13b1788973 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -5,6 +5,7 @@ config ARM
select ARCH_32BIT_OFF_T
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE if HAVE_KRETPROBES && 
FRAME_POINTER && !ARM_UNWIND
  

[RFC PATCH v4 08/12] dax: Fix incorrect list of data cache aliasing architectures

2024-02-02 Thread Mathieu Desnoyers
commit d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")
prevents DAX from building on architectures with virtually aliased
dcache with:

  depends on !(ARM || MIPS || SPARC)

This check is too broad (e.g. recent ARMv7 don't have virtually aliased
dcaches), and also misses many other architectures with virtually
aliased data cache.

This is a regression introduced in the v4.0 Linux kernel where the
dax mount option is removed for 32-bit ARMv7 boards which have no data
cache aliasing, and therefore should work fine with FS_DAX.

This was turned into the following check in alloc_dax() by a preparatory
change:

if (ops && (IS_ENABLED(CONFIG_ARM) ||
IS_ENABLED(CONFIG_MIPS) ||
IS_ENABLED(CONFIG_SPARC)))
return NULL;

Use cpu_dcache_is_aliasing() instead to figure out whether the environment
has aliasing data caches.

Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")
Signed-off-by: Mathieu Desnoyers 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 drivers/dax/super.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index ce5bffa86bba..a21a7c262382 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dax-private.h"
 
 /**
@@ -455,9 +456,7 @@ struct dax_device *alloc_dax(void *private, const struct 
dax_operations *ops)
 * except for device-dax (NULL operations pointer), which does
 * not use aliased mappings from the kernel.
 */
-   if (ops && (IS_ENABLED(CONFIG_ARM) ||
-   IS_ENABLED(CONFIG_MIPS) ||
-   IS_ENABLED(CONFIG_SPARC)))
+   if (ops && cpu_dcache_is_aliasing())
return ERR_PTR(-EOPNOTSUPP);
 
if (WARN_ON_ONCE(ops && !ops->zero_page_range))
-- 
2.39.2




[RFC PATCH v4 05/12] virtio: Treat alloc_dax() -EOPNOTSUPP failure as non-fatal

2024-02-02 Thread Mathieu Desnoyers
In preparation for checking whether the architecture has data cache
aliasing within alloc_dax(), modify the error handling of virtio
virtio_fs_setup_dax() to treat alloc_dax() -EOPNOTSUPP failure as
non-fatal.

For the transition, consider that alloc_dax() returning NULL is the
same as returning -EOPNOTSUPP.

Co-developed-by: Dan Williams 
Signed-off-by: Dan Williams 
Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")
Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 fs/fuse/virtio_fs.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 5f1be1da92ce..621b1bca2d55 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "fuse_i.h"
 
@@ -795,8 +796,11 @@ static void virtio_fs_cleanup_dax(void *data)
put_dax(dax_dev);
 }
 
+DEFINE_FREE(cleanup_dax, struct dax_dev *, if (!IS_ERR_OR_NULL(_T)) 
virtio_fs_cleanup_dax(_T))
+
 static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs 
*fs)
 {
+   struct dax_device *dax_dev __free(cleanup_dax) = NULL;
struct virtio_shm_region cache_reg;
struct dev_pagemap *pgmap;
bool have_cache;
@@ -804,6 +808,12 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, 
struct virtio_fs *fs)
if (!IS_ENABLED(CONFIG_FUSE_DAX))
return 0;
 
+   dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
+   if (IS_ERR_OR_NULL(dax_dev)) {
+   int rc = IS_ERR(dax_dev) ? PTR_ERR(dax_dev) : -EOPNOTSUPP;
+   return rc == -EOPNOTSUPP ? 0 : rc;
+   }
+
/* Get cache region */
have_cache = virtio_get_shm_region(vdev, &cache_reg,
   (u8)VIRTIO_FS_SHMCAP_ID_CACHE);
@@ -849,10 +859,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, 
struct virtio_fs *fs)
dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len 
0x%llx\n",
__func__, fs->window_kaddr, cache_reg.addr, cache_reg.len);
 
-   fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
-   if (IS_ERR(fs->dax_dev))
-   return PTR_ERR(fs->dax_dev);
-
+   fs->dax_dev = no_free_ptr(dax_dev);
return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
fs->dax_dev);
 }
-- 
2.39.2




[RFC PATCH v4 06/12] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Mathieu Desnoyers
Replace the following fs/Kconfig:FS_DAX dependency:

  depends on !(ARM || MIPS || SPARC)

By a runtime check within alloc_dax(). This runtime check returns
ERR_PTR(-EOPNOTSUPP) if the @ops parameter is non-NULL (which means
the kernel is using an aliased mapping) on an architecture which
has data cache aliasing.

Change the return value from NULL to PTR_ERR(-EOPNOTSUPP) for
CONFIG_DAX=n for consistency.

This is done in preparation for using cpu_dcache_is_aliasing() in a
following change which will properly support architectures which detect
data cache aliasing at runtime.

Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")
Signed-off-by: Mathieu Desnoyers 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 drivers/dax/super.c | 15 +++
 fs/Kconfig  |  1 -
 include/linux/dax.h |  6 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0da9232ea175..ce5bffa86bba 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -319,6 +319,11 @@ EXPORT_SYMBOL_GPL(dax_alive);
  * that any fault handlers or operations that might have seen
  * dax_alive(), have completed.  Any operations that start after
  * synchronize_srcu() has run will abort upon seeing !dax_alive().
+ *
+ * Note, because alloc_dax() returns an ERR_PTR() on error, callers
+ * typically store its result into a local variable in order to check
+ * the result. Therefore, care must be taken to populate the struct
+ * device dax_dev field make sure the dax_dev is not leaked.
  */
 void kill_dax(struct dax_device *dax_dev)
 {
@@ -445,6 +450,16 @@ struct dax_device *alloc_dax(void *private, const struct 
dax_operations *ops)
dev_t devt;
int minor;
 
+   /*
+* Unavailable on architectures with virtually aliased data caches,
+* except for device-dax (NULL operations pointer), which does
+* not use aliased mappings from the kernel.
+*/
+   if (ops && (IS_ENABLED(CONFIG_ARM) ||
+   IS_ENABLED(CONFIG_MIPS) ||
+   IS_ENABLED(CONFIG_SPARC)))
+   return ERR_PTR(-EOPNOTSUPP);
+
if (WARN_ON_ONCE(ops && !ops->zero_page_range))
return ERR_PTR(-EINVAL);
 
diff --git a/fs/Kconfig b/fs/Kconfig
index 42837617a55b..e5efdb3b276b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -56,7 +56,6 @@ endif # BLOCK
 config FS_DAX
bool "File system based Direct Access (DAX) support"
depends on MMU
-   depends on !(ARM || MIPS || SPARC)
depends on ZONE_DEVICE || FS_DAX_LIMITED
select FS_IOMAP
select DAX
diff --git a/include/linux/dax.h b/include/linux/dax.h
index b463502b16e1..df2d52b8a245 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -86,11 +86,7 @@ static inline void *dax_holder(struct dax_device *dax_dev)
 static inline struct dax_device *alloc_dax(void *private,
const struct dax_operations *ops)
 {
-   /*
-* Callers should check IS_ENABLED(CONFIG_DAX) to know if this
-* NULL is an error or expected.
-*/
-   return NULL;
+   return ERR_PTR(-EOPNOTSUPP);
 }
 static inline void put_dax(struct dax_device *dax_dev)
 {
-- 
2.39.2




[RFC PATCH v4 04/12] dcssblk: Handle alloc_dax() -EOPNOTSUPP failure

2024-02-02 Thread Mathieu Desnoyers
In preparation for checking whether the architecture has data cache
aliasing within alloc_dax(), modify the error handling of dcssblk
dcssblk_add_store() to handle alloc_dax() -EOPNOTSUPP failures.

Considering that s390 is not a data cache aliasing architecture,
and considering that DCSSBLK selects DAX, a return value of -EOPNOTSUPP
from alloc_dax() should make dcssblk_add_store() fail.

For the transition, consider that alloc_dax() returning NULL is the
same as returning -EOPNOTSUPP.

Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")
Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
Cc: linux-s...@vger.kernel.org
---
 drivers/s390/block/dcssblk.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 4b7ecd4fd431..a3010849bfed 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -549,6 +549,7 @@ dcssblk_add_store(struct device *dev, struct 
device_attribute *attr, const char
int rc, i, j, num_of_segments;
struct dcssblk_dev_info *dev_info;
struct segment_info *seg_info, *temp;
+   struct dax_device *dax_dev;
char *local_buf;
unsigned long seg_byte_size;
 
@@ -677,13 +678,13 @@ dcssblk_add_store(struct device *dev, struct 
device_attribute *attr, const char
if (rc)
goto put_dev;
 
-   dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
-   if (IS_ERR(dev_info->dax_dev)) {
-   rc = PTR_ERR(dev_info->dax_dev);
-   dev_info->dax_dev = NULL;
+   dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops);
+   if (IS_ERR_OR_NULL(dax_dev)) {
+   rc = IS_ERR(dax_dev) ? PTR_ERR(dax_dev) : -EOPNOTSUPP;
goto put_dev;
}
-   set_dax_synchronous(dev_info->dax_dev);
+   set_dax_synchronous(dax_dev);
+   dev_info->dax_dev = dax_dev;
rc = dax_add_host(dev_info->dax_dev, dev_info->gd);
if (rc)
goto out_dax;
-- 
2.39.2




[RFC PATCH v4 00/12] Introduce cpu_dcache_is_aliasing() to fix DAX regression

2024-02-02 Thread Mathieu Desnoyers
This commit introduced in v4.0 prevents building FS_DAX on 32-bit ARM,
even on ARMv7 which does not have virtually aliased data caches:

commit d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")

It used to work fine before: I have customers using DAX over pmem on
ARMv7, but this regression will likely prevent them from upgrading their
kernel.

The root of the issue here is the fact that DAX was never designed to
handle virtually aliasing data caches (VIVT and VIPT with aliasing data
cache). It touches the pages through their linear mapping, which is not
consistent with the userspace mappings with virtually aliasing data
caches.

This patch series introduces cpu_dcache_is_aliasing() with the new
Kconfig option ARCH_HAS_CPU_CACHE_ALIASING and implements it for all
architectures. The implementation of cpu_dcache_is_aliasing() is either
evaluated to a constant at compile-time or a runtime check, which is
what is needed on ARM.

With this we can basically narrow down the list of architectures which
are unsupported by DAX to those which are really affected.

Testing done so far:

- Compile allyesconfig on x86-64,
- Compile allyesconfig on x86-64, with FS_DAX=n.
- Compile allyesconfig on x86-64, with DAX=n.
- Boot test after modifying alloc_dax() to force returning -EOPNOTSUPP
  even on x86-64, thus simulating the behavior expected on an
  architecture with data cache aliasing.

There are many more axes to test however. I would welcome Tested-by for:

- affected architectures,
- affected drivers,
- affected filesytems.

Thanks,

Mathieu

Changes since v3:
- Fix a leak on dax_add_host() failure in nvdimm/pmem.
- Split the series into a bissectable sequence of changes.
- Ensure that device-dax use-cases still works on data cache
  aliasing architectures.

Changes since v2:
- Move DAX supported runtime check to alloc_dax(),
- Modify DM to handle alloc_dax() error as non-fatal,
- Remove all filesystem modifications, since the check is now done by
  alloc_dax(),
- rename "dcache" and "cache" to "cpu dcache" and "cpu cache" to
  eliminate confusion with VFS terminology.

Changes since v1:
- The order of the series was completely changed based on the
  feedback received on v1,
- cache_is_aliasing() is renamed to dcache_is_aliasing(),
- ARCH_HAS_CACHE_ALIASING_DYNAMIC is gone,
- dcache_is_aliasing() vs ARCH_HAS_CACHE_ALIASING relationship is
  simplified,
- the dax_is_supported() check was moved to its rightful place in all
  filesystems.

Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
Cc: linux-s...@vger.kernel.org

Mathieu Desnoyers (12):
  nvdimm/pmem: Fix leak on dax_add_host() failure
  nvdimm/pmem: Treat alloc_dax() -EOPNOTSUPP failure as non-fatal
  dm: Treat alloc_dax() -EOPNOTSUPP failure as non-fatal
  dcssblk: Handle alloc_dax() -EOPNOTSUPP failure
  virtio: Treat alloc_dax() -EOPNOTSUPP failure as non-fatal
  dax: Check for data cache aliasing at runtime
  Introduce cpu_dcache_is_aliasing() across all architectures
  dax: Fix incorrect list of data cache aliasing architectures
  nvdimm/pmem: Cleanup alloc_dax() error handling
  dm: Cleanup alloc_dax() error handling
  dcssblk: Cleanup alloc_dax() error handling
  virtio: Cleanup alloc_dax() error handling

 arch/arc/Kconfig|  1 +
 arch/arc/include/asm/cachetype.h|  9 +
 arch/arm/Kconfig|  1 +
 arch/arm/include/asm/cachetype.h|  2 ++
 arch/csky/Kconfig   |  1 +
 arch/csky/include/asm/cachetype.h   |  9 +
 arch/m68k/Kconfig   |  1 +
 arch/m68k/include/asm/cachetype.h   |  9 +
 arch/mips/Kconfig   |  1 +
 arch/mips/include/asm/cachetype.h   |  9 +
 arch/nios2/Kconfig  |  1 +
 arch/nios2/include/asm/cachetype.h  | 10 ++
 arch/parisc/Kconfig |  1 +
 arch/parisc/include/asm/cachetype.h |  9 +
 arch/sh/Kconfig |  1 +
 arch/sh/include/asm/cachetype.h |  9 +
 arch/sparc/Kconfig  |  1 +
 arch/sparc/include/asm/cachetype.h  | 14 ++
 arch/xtensa/Kconfig |  1 +
 arch/xtensa/include/asm/cachetype.h | 10 ++
 drivers/dax/super.c | 14 ++
 drivers/md/dm.c | 17 +
 drivers/nvdimm/pmem.c   | 23 ---
 drivers/s390/block/dcssblk.c| 11 ++-
 fs/Kconfig  |  1 -
 fs/fuse/virtio_fs.c | 15 +++
 include/linux/cacheinfo.h   |  6 ++
 include/linux/dax.h |  6 +-
 mm/Kconfig  |  6 ++
 29 files changed, 165 insertions(+), 34 deleti

[RFC PATCH v4 03/12] dm: Treat alloc_dax() -EOPNOTSUPP failure as non-fatal

2024-02-02 Thread Mathieu Desnoyers
In preparation for checking whether the architecture has data cache
aliasing within alloc_dax(), modify the error handling of dm alloc_dev()
to treat alloc_dax() -EOPNOTSUPP failure as non-fatal.

For the transition, consider that alloc_dax() returning NULL is the
same as returning -EOPNOTSUPP.

Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")
Suggested-by: Dan Williams 
Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 drivers/md/dm.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 23c32cd1f1d8..2fc22cae9089 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2054,6 +2054,7 @@ static void cleanup_mapped_device(struct mapped_device 
*md)
 static struct mapped_device *alloc_dev(int minor)
 {
int r, numa_node_id = dm_get_numa_node();
+   struct dax_device *dax_dev;
struct mapped_device *md;
void *old_md;
 
@@ -2122,15 +2123,15 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
 
-   if (IS_ENABLED(CONFIG_FS_DAX)) {
-   md->dax_dev = alloc_dax(md, &dm_dax_ops);
-   if (IS_ERR(md->dax_dev)) {
-   md->dax_dev = NULL;
+   dax_dev = alloc_dax(md, &dm_dax_ops);
+   if (IS_ERR_OR_NULL(dax_dev)) {
+   if (IS_ERR(dax_dev) && PTR_ERR(dax_dev) != -EOPNOTSUPP)
goto bad;
-   }
-   set_dax_nocache(md->dax_dev);
-   set_dax_nomc(md->dax_dev);
-   if (dax_add_host(md->dax_dev, md->disk))
+   } else {
+   set_dax_nocache(dax_dev);
+   set_dax_nomc(dax_dev);
+   md->dax_dev = dax_dev;
+   if (dax_add_host(dax_dev, md->disk))
goto bad;
}
 
-- 
2.39.2




[RFC PATCH v4 02/12] nvdimm/pmem: Treat alloc_dax() -EOPNOTSUPP failure as non-fatal

2024-02-02 Thread Mathieu Desnoyers
In preparation for checking whether the architecture has data cache
aliasing within alloc_dax(), modify the error handling of nvdimm/pmem
pmem_attach_disk() to treat alloc_dax() -EOPNOTSUPP failure as non-fatal.

For the transition, consider that alloc_dax() returning NULL is the
same as returning -EOPNOTSUPP.

Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing 
caches")
Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 drivers/nvdimm/pmem.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9fe358090720..f1d9f5c6dbac 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -558,19 +558,21 @@ static int pmem_attach_disk(struct device *dev,
disk->bb = &pmem->bb;
 
dax_dev = alloc_dax(pmem, &pmem_dax_ops);
-   if (IS_ERR(dax_dev)) {
-   rc = PTR_ERR(dax_dev);
-   goto out;
+   if (IS_ERR_OR_NULL(dax_dev)) {
+   rc = IS_ERR(dax_dev) ? PTR_ERR(dax_dev) : -EOPNOTSUPP;
+   if (rc != -EOPNOTSUPP)
+   goto out;
+   } else {
+   set_dax_nocache(dax_dev);
+   set_dax_nomc(dax_dev);
+   if (is_nvdimm_sync(nd_region))
+   set_dax_synchronous(dax_dev);
+   pmem->dax_dev = dax_dev;
+   rc = dax_add_host(dax_dev, disk);
+   if (rc)
+   goto out_cleanup_dax;
+   dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
}
-   set_dax_nocache(dax_dev);
-   set_dax_nomc(dax_dev);
-   if (is_nvdimm_sync(nd_region))
-   set_dax_synchronous(dax_dev);
-   pmem->dax_dev = dax_dev;
-   rc = dax_add_host(dax_dev, disk);
-   if (rc)
-   goto out_cleanup_dax;
-   dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
rc = device_add_disk(dev, disk, pmem_attribute_groups);
if (rc)
goto out_remove_host;
-- 
2.39.2




[RFC PATCH v4 01/12] nvdimm/pmem: Fix leak on dax_add_host() failure

2024-02-02 Thread Mathieu Desnoyers
Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done
before setting pmem->dax_dev, which therefore issues the two following
calls on NULL pointers:

out_cleanup_dax:
kill_dax(pmem->dax_dev);
put_dax(pmem->dax_dev);

Signed-off-by: Mathieu Desnoyers 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: Mikulas Patocka 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: Matthew Wilcox 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: dm-devel@lists.linux.dev
Cc: nvd...@lists.linux.dev
---
 drivers/nvdimm/pmem.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4e8fdcb3f1c8..9fe358090720 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -566,12 +566,11 @@ static int pmem_attach_disk(struct device *dev,
set_dax_nomc(dax_dev);
if (is_nvdimm_sync(nd_region))
set_dax_synchronous(dax_dev);
+   pmem->dax_dev = dax_dev;
rc = dax_add_host(dax_dev, disk);
if (rc)
goto out_cleanup_dax;
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
-   pmem->dax_dev = dax_dev;
-
rc = device_add_disk(dev, disk, pmem_attribute_groups);
if (rc)
goto out_remove_host;
-- 
2.39.2




Re: [git pull] device mapper fixes for 6.8-rc3

2024-02-02 Thread pr-tracker-bot
The pull request you sent on Fri, 2 Feb 2024 13:00:38 -0500:

> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git 
> tags/for-6.8/dm-fixes

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6897cea7183762b4bc97b0ed1b75274ece9d518b

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Mathieu Desnoyers

On 2024-02-02 15:14, Dan Williams wrote:

Mathieu Desnoyers wrote:
[..]

Thanks for that. All of those need to be done before the fs goes live
later in virtio_device_ready(), but before that point nothing should be
calling into virtio_fs_dax_ops, so as far as I can see it is safe to
change the order.


Sounds good, I'll do that.

I will soon be ready to send out a RFC v4, which is still only
compiled-tested. Do you happen to have some kind of test suite
you can use to automate some of the runtime testing ?


There is a test suite for the pmem, dm, and dax changes
(https://github.com/pmem/ndctl?tab=readme-ov-file#unit-tests), but not
automated unfortunately. The NVDIMM maintainer team will run that before
pushing patches out to the fixes branch if you just want to lean on
that. For the rest I think we will need to depend on tested-by's from
s390 + virtio_fs folks, and / or sufficient soak time in linux-next.


I suspect this will be necessary. There are just so many combinations
of architectures, drivers and filesystems involved here that I don't
think it is realistic to try to do all this testing manually on my own.

I prefer to voice this up front, so there are no misplaced expectations
about testing.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Dan Williams
Mathieu Desnoyers wrote:
[..]
> > Thanks for that. All of those need to be done before the fs goes live
> > later in virtio_device_ready(), but before that point nothing should be
> > calling into virtio_fs_dax_ops, so as far as I can see it is safe to
> > change the order.
> 
> Sounds good, I'll do that.
> 
> I will soon be ready to send out a RFC v4, which is still only
> compiled-tested. Do you happen to have some kind of test suite
> you can use to automate some of the runtime testing ?

There is a test suite for the pmem, dm, and dax changes
(https://github.com/pmem/ndctl?tab=readme-ov-file#unit-tests), but not
automated unfortunately. The NVDIMM maintainer team will run that before
pushing patches out to the fixes branch if you just want to lean on
that. For the rest I think we will need to depend on tested-by's from
s390 + virtio_fs folks, and / or sufficient soak time in linux-next.



Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Mathieu Desnoyers

On 2024-02-02 14:41, Dan Williams wrote:

Mathieu Desnoyers wrote:

On 2024-02-02 12:37, Dan Williams wrote:

Mathieu Desnoyers wrote:

[...]





The alternative route I intend to take is to audit all callers
of alloc_dax() and make sure they all save the alloc_dax() return
value in a struct dax_device * local variable first for the sake
of checking for IS_ERR(). This will leave the xyz->dax_dev pointer
initialized to NULL in the error case and simplify the rest of
error checking.


I could maybe get on board with that, but it needs a comment somewhere
about the asymmetric subtlety.


Is this "somewhere" at every alloc_dax() call site, or do you have
something else in mind ?


At least kill_dax() should mention the asymmetry I think.


Here is what I intend to add:

 * Note, because alloc_dax() returns an ERR_PTR() on error, callers
 * typically store its result into a local variable in order to check
 * the result. Therefore, care must be taken to populate the struct
 * device dax_dev field make sure the dax_dev is not leaked.




The real question is what to do about device-dax. I *think* it is not
affected by cpu_dcache aliasing because it never accesses user mappings
through a kernel alias. I doubt device-dax is in use on these platforms,
but we might need another fixup for that if someone screams about the
alloc_dax() behavior change making them lose device-dax access.


By "device-dax", I understand you mean drivers/dax/Kconfig:DEV_DAX.

Based on your analysis, is alloc_dax() still the right spot where
to place this runtime check ? Which call sites are responsible
for invoking alloc_dax() for device-dax ?


That is in devm_create_dev_dax().


If we know which call sites do not intend to use the kernel linear
mapping, we could introduce a flag (or a new variant of the alloc_dax()
API) that would either enforce or skip the check.


Hmmm, it looks like there is already a natural flag for that. If
alloc_dax() is passed a NULL operations pointer it means there are no
kernel usages of the aliased mapping. That actually fits rather nicely.


Good, I was reaching the same conclusion when I received your reply.
I'll do that. It ends up being:

/*
 * Unavailable on architectures with virtually aliased data caches,
 * except for device-dax (NULL operations pointer), which does
 * not use aliased mappings from the kernel.
 */
if (ops && cpu_dcache_is_aliasing())
return ERR_PTR(-EOPNOTSUPP);



[..]

@@ -804,6 +808,15 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, 
struct virtio_fs *fs)
if (!IS_ENABLED(CONFIG_FUSE_DAX))
return 0;

+	dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);

+   if (IS_ERR(dax_dev)) {
+   int rc = PTR_ERR(dax_dev);
+
+   if (rc == -EOPNOTSUPP)
+   return 0;
+   return rc;
+   }


What is gained by moving this allocation here ?


The gain is to fail early in virtio_fs_setup_dax() since the fundamental
dependency of alloc_dax() success is not met. For example why let the
setup progress to devm_memremap_pages() when alloc_dax() is going to
return ERR_PTR(-EOPNOTSUPP).


What I don't know is whether there is a dependency requiring to do
devm_request_mem_region(), devm_kzalloc(), devm_memremap_pages()
before calling alloc_dax() ?

Those 3 calls are used to populate:

  fs->window_phys_addr = (phys_addr_t) cache_reg.addr;
  fs->window_len = (phys_addr_t) cache_reg.len;

and then alloc_dax() takes "fs" as private data parameter. So it's
unclear to me whether we can swap the invocation order. I suspect
that it is not an issue because it is only used to populate
dax_dev->private, but I prefer to confirm this with you just to be
on the safe side.


Thanks for that. All of those need to be done before the fs goes live
later in virtio_device_ready(), but before that point nothing should be
calling into virtio_fs_dax_ops, so as far as I can see it is safe to
change the order.


Sounds good, I'll do that.

I will soon be ready to send out a RFC v4, which is still only
compiled-tested. Do you happen to have some kind of test suite
you can use to automate some of the runtime testing ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Dan Williams
Mathieu Desnoyers wrote:
> On 2024-02-02 12:37, Dan Williams wrote:
> > Mathieu Desnoyers wrote:
> [...]
> >>
> > 
> >> The alternative route I intend to take is to audit all callers
> >> of alloc_dax() and make sure they all save the alloc_dax() return
> >> value in a struct dax_device * local variable first for the sake
> >> of checking for IS_ERR(). This will leave the xyz->dax_dev pointer
> >> initialized to NULL in the error case and simplify the rest of
> >> error checking.
> > 
> > I could maybe get on board with that, but it needs a comment somewhere
> > about the asymmetric subtlety.
> 
> Is this "somewhere" at every alloc_dax() call site, or do you have
> something else in mind ?

At least kill_dax() should mention the asymmetry I think.

> 
> > 
> >>
> >>
> >>>   return;
> >>>
> >>>   if (dax_dev->holder_data != NULL)
> >>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> >>> index 4e8fdcb3f1c8..b69c9e442cf4 100644
> >>> --- a/drivers/nvdimm/pmem.c
> >>> +++ b/drivers/nvdimm/pmem.c
> >>> @@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev,
> >>>   dax_dev = alloc_dax(pmem, &pmem_dax_ops);
> >>>   if (IS_ERR(dax_dev)) {
> >>>   rc = PTR_ERR(dax_dev);
> >>> - goto out;
> >>> + if (rc != -EOPNOTSUPP)
> >>> + goto out;
> >>
> >> If I compare the before / after this change, if previously
> >> pmem_attach_disk() was called in a configuration with FS_DAX=n, it would
> >> result in a NULL pointer dereference.
> > 
> > No, alloc_dax() only returns NULL CONFIG_DAX=n case, not the
> > CONFIG_FS_DAX=n case.
> 
> Indeed, I was wrong there.
> 
> > So that means that pmem devices on ARM have been
> > possible without FS_DAX. So, in order for alloc_dax() returning
> > ERR_PTR(-EOPNOTSUPP) to not regress pmem device availability this error
> > path needs to be changed.
> Good point. We're moving the depends on !(ARM || MIPS |PARC) from FS_DAX
> Kconfig to a runtime check in alloc_dax(), which is used whenever DAX=y,
> which includes configurations that had FS_DAX=n previously.
> 
> I'll change the error path in pmem_attack_disk to treat -EOPNOTSUPP
> alloc_dax() return value as non-fatal.
> 
> > 
> >> This would be an error handling fix all by itself. Do we really want
> >> to return successfully if dax is unsupported, or should we return
> >> an error here ?
> > 
> > Per above, there is no error handling fix, and pmem block device
> > available should not depend on alloc_dax() succeeding.
> 
> I agree on treating alloc_dax() failure as non-fatal. There is
> however one error handling fix to nvdimm/pmem which I plan to
> introduce as an initial patch before this change:
> 
>  nvdimm/pmem: Fix leak on dax_add_host() failure
>  
>  Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done
>  before setting pmem->dax_dev, which therefore issues the two following
>  calls on NULL pointers:
>  
>  out_cleanup_dax:
>  kill_dax(pmem->dax_dev);
>  put_dax(pmem->dax_dev);
>  
>  Signed-off-by: Mathieu Desnoyers 
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 4e8fdcb3f1c8..9fe358090720 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -566,12 +566,11 @@ static int pmem_attach_disk(struct device *dev,
>   set_dax_nomc(dax_dev);
>   if (is_nvdimm_sync(nd_region))
>   set_dax_synchronous(dax_dev);
> + pmem->dax_dev = dax_dev;
>   rc = dax_add_host(dax_dev, disk);
>   if (rc)
>   goto out_cleanup_dax;
>   dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
> - pmem->dax_dev = dax_dev;
> -
>   rc = device_add_disk(dev, disk, pmem_attribute_groups);
>   if (rc)
>   goto out_remove_host;

Yup, looks good.

> > The real question is what to do about device-dax. I *think* it is not
> > affected by cpu_dcache aliasing because it never accesses user mappings
> > through a kernel alias. I doubt device-dax is in use on these platforms,
> > but we might need another fixup for that if someone screams about the
> > alloc_dax() behavior change making them lose device-dax access.
> 
> By "device-dax", I understand you mean drivers/dax/Kconfig:DEV_DAX.
> 
> Based on your analysis, is alloc_dax() still the right spot where
> to place this runtime check ? Which call sites are responsible
> for invoking alloc_dax() for device-dax ?

That is in devm_create_dev_dax().

> If we know which call sites do not intend to use the kernel linear
> mapping, we could introduce a flag (or a new variant of the alloc_dax()
> API) that would either enforce or skip the check.

Hmmm, it looks like there is already a natural flag for that. If
alloc_dax() is passed a NULL operations pointer it means there are no
kernel usages of the aliased mapping. That actually fits rather nicely.

[..]
> >>> @@ -804,6 +808,15 @@ static int virtio_fs_setup_dax(struct 

Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Mathieu Desnoyers

On 2024-02-02 12:37, Dan Williams wrote:

Mathieu Desnoyers wrote:

[...]





The alternative route I intend to take is to audit all callers
of alloc_dax() and make sure they all save the alloc_dax() return
value in a struct dax_device * local variable first for the sake
of checking for IS_ERR(). This will leave the xyz->dax_dev pointer
initialized to NULL in the error case and simplify the rest of
error checking.


I could maybe get on board with that, but it needs a comment somewhere
about the asymmetric subtlety.


Is this "somewhere" at every alloc_dax() call site, or do you have
something else in mind ?







return;
   
   	if (dax_dev->holder_data != NULL)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4e8fdcb3f1c8..b69c9e442cf4 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev,
dax_dev = alloc_dax(pmem, &pmem_dax_ops);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
-   goto out;
+   if (rc != -EOPNOTSUPP)
+   goto out;


If I compare the before / after this change, if previously
pmem_attach_disk() was called in a configuration with FS_DAX=n, it would
result in a NULL pointer dereference.


No, alloc_dax() only returns NULL CONFIG_DAX=n case, not the
CONFIG_FS_DAX=n case.


Indeed, I was wrong there.


So that means that pmem devices on ARM have been
possible without FS_DAX. So, in order for alloc_dax() returning
ERR_PTR(-EOPNOTSUPP) to not regress pmem device availability this error
path needs to be changed.

Good point. We're moving the depends on !(ARM || MIPS |PARC) from FS_DAX
Kconfig to a runtime check in alloc_dax(), which is used whenever DAX=y,
which includes configurations that had FS_DAX=n previously.

I'll change the error path in pmem_attack_disk to treat -EOPNOTSUPP
alloc_dax() return value as non-fatal.




This would be an error handling fix all by itself. Do we really want
to return successfully if dax is unsupported, or should we return
an error here ?


Per above, there is no error handling fix, and pmem block device
available should not depend on alloc_dax() succeeding.


I agree on treating alloc_dax() failure as non-fatal. There is
however one error handling fix to nvdimm/pmem which I plan to
introduce as an initial patch before this change:

nvdimm/pmem: Fix leak on dax_add_host() failure

Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done

before setting pmem->dax_dev, which therefore issues the two following
calls on NULL pointers:

out_cleanup_dax:

kill_dax(pmem->dax_dev);
put_dax(pmem->dax_dev);

Signed-off-by: Mathieu Desnoyers 


diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4e8fdcb3f1c8..9fe358090720 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -566,12 +566,11 @@ static int pmem_attach_disk(struct device *dev,
set_dax_nomc(dax_dev);
if (is_nvdimm_sync(nd_region))
set_dax_synchronous(dax_dev);
+   pmem->dax_dev = dax_dev;
rc = dax_add_host(dax_dev, disk);
if (rc)
goto out_cleanup_dax;
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
-   pmem->dax_dev = dax_dev;
-
rc = device_add_disk(dev, disk, pmem_attribute_groups);
if (rc)
goto out_remove_host;



The real question is what to do about device-dax. I *think* it is not
affected by cpu_dcache aliasing because it never accesses user mappings
through a kernel alias. I doubt device-dax is in use on these platforms,
but we might need another fixup for that if someone screams about the
alloc_dax() behavior change making them lose device-dax access.


By "device-dax", I understand you mean drivers/dax/Kconfig:DEV_DAX.

Based on your analysis, is alloc_dax() still the right spot where
to place this runtime check ? Which call sites are responsible
for invoking alloc_dax() for device-dax ?

If we know which call sites do not intend to use the kernel linear
mapping, we could introduce a flag (or a new variant of the alloc_dax()
API) that would either enforce or skip the check.

[...]



Here what I'm seeing so far:

- devm_release_mem_region() is never called after devm_request_mem_region(). Not
on error, neither on teardown,


devm_release_mem_region() is called from virtio_fs_probe() context. That


I guess you mean "devm_request_mem_region()" here.


means that when virtio_fs_probe() returns an error the driver core will
automatically call devm_request_mem_region().


And "devm_release_mem_region()" here.




- pgmap is never freed on error after devm_kzalloc.


That is what the "devm_" in devm_kzalloc() does, free the memory on
driver-probe failure, or after the driver remove callback is invoked.


Got it.






   {
+   struct dax_device *dax_dev __free(cleanup_dax) = NULL;
struct vi

[git pull] device mapper fixes for 6.8-rc3

2024-02-02 Thread Mike Snitzer
Hi Linus,

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:

  Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git 
tags/for-6.8/dm-fixes

for you to fetch changes up to 0a9bab391e336489169b95cb0d4553d921302189:

  dm-crypt, dm-verity: disable tasklets (2024-02-02 12:33:50 -0500)

Please pull, thanks.
Mike


- Fix DM ioctl interface to avoid INT_MAX overflow warnings from
  kvmalloc by limiting the number of targets and parameter size area.

- Fix DM stats to avoid INT_MAX overflow warnings from kvmalloc by
  limiting the number of entries supported.

- Fix DM writecache to support mapping devices larger than 1 TiB by
  switching from using kvmalloc_array to vmalloc_array -- which avoids
  INT_MAX overflow in kvmalloc_node and associated warnings.

- Remove the (ab)use of tasklets from both the DM crypt and verity
  targets. They will be converted to use BH workqueue in future.
-BEGIN PGP SIGNATURE-

iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmW9LCsACgkQxSPxCi2d
A1qILwgAmL9XOtavSKJ/8o9scJygutYpNSLE0f6mdkdCgJB2nknJ1vR38bXyDpNr
X3s6QC5TqKTtG7DtRTfnZc8zgtBHajjUZTFBu1NUF9kgNQcrjG3jW+quZ51pxkV0
1rvzOiYts6ca8csbFViMPS9FJVq1h3PnAyrkhI0SUS7+jEvDZy/QIX4DP20ye9SX
wKguOSK544haSLHPNYuZqqCEoTBF+Vh1k1gDkxr594NwjsIJJK0+HGelamjzN/96
/jr88P4bm/6OIVdvwTUefnpIhNIum1Dfa8QWciKOzuct0jqsub65+SUSoTLmoiY4
/3AZDvp0ZMEwpMAvCIyvHnm81K72MA==
=sioN
-END PGP SIGNATURE-


Mikulas Patocka (4):
  dm: limit the number of targets and parameter size area
  dm stats: limit the number of entries
  dm writecache: allow allocations larger than 2GiB
  dm-crypt, dm-verity: disable tasklets

 drivers/md/dm-core.h  |  2 ++
 drivers/md/dm-crypt.c | 38 ++
 drivers/md/dm-ioctl.c |  3 ++-
 drivers/md/dm-stats.c |  9 +
 drivers/md/dm-table.c |  9 +++--
 drivers/md/dm-verity-target.c | 26 ++
 drivers/md/dm-verity.h|  1 -
 drivers/md/dm-writecache.c|  8 
 8 files changed, 28 insertions(+), 68 deletions(-)



Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Dan Williams
Mathieu Desnoyers wrote:
[..]
> The change for -EOPNOTSUPP in header and implementation would look like
> this (more questions below):
> 
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index b463502b16e1..df2d52b8a245 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -86,11 +86,7 @@ static inline void *dax_holder(struct dax_device *dax_dev)
>   static inline struct dax_device *alloc_dax(void *private,
>   const struct dax_operations *ops)
>   {
> - /*
> -  * Callers should check IS_ENABLED(CONFIG_DAX) to know if this
> -  * NULL is an error or expected.
> -  */
> - return NULL;
> + return ERR_PTR(-EOPNOTSUPP);
>   }
>   static inline void put_dax(struct dax_device *dax_dev)
>   {
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 8c3a6e8e6334..c1cf6f0bbe12 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -448,7 +448,7 @@ struct dax_device *alloc_dax(void *private, const struct 
> dax_operations *ops)
>   
>   /* Unavailable on architectures with virtually aliased data caches. */
>   if (cpu_dcache_is_aliasing())
> - return NULL;
> + return ERR_PTR(-EOPNOTSUPP);
>   
>   if (WARN_ON_ONCE(ops && !ops->zero_page_range))
>   return ERR_PTR(-EINVAL);
> 
> > 
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index f4b635526345..254d3b1e420e 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -322,7 +322,7 @@ EXPORT_SYMBOL_GPL(dax_alive);
> >*/
> >   void kill_dax(struct dax_device *dax_dev)
> >   {
> > -   if (!dax_dev)
> > +   if (IS_ERR_OR_NULL(dax_dev))
> 
> I am tempted to just leave the "if (!dax_dev)" check here, because
> many other functions of this API are only protected by a NULL
> pointer check. I would hate to forget to convert one check in this
> change, and I don't think it simplifies anything.

It's not that it simplifies anything, but mistakes fail safely as a
memory leak rather than a crash.

> The alternative route I intend to take is to audit all callers
> of alloc_dax() and make sure they all save the alloc_dax() return
> value in a struct dax_device * local variable first for the sake
> of checking for IS_ERR(). This will leave the xyz->dax_dev pointer
> initialized to NULL in the error case and simplify the rest of
> error checking.

I could maybe get on board with that, but it needs a comment somewhere
about the asymmetric subtlety.

> 
> 
> > return;
> >   
> > if (dax_dev->holder_data != NULL)
> > diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> > index 4e8fdcb3f1c8..b69c9e442cf4 100644
> > --- a/drivers/nvdimm/pmem.c
> > +++ b/drivers/nvdimm/pmem.c
> > @@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev,
> > dax_dev = alloc_dax(pmem, &pmem_dax_ops);
> > if (IS_ERR(dax_dev)) {
> > rc = PTR_ERR(dax_dev);
> > -   goto out;
> > +   if (rc != -EOPNOTSUPP)
> > +   goto out;
> 
> If I compare the before / after this change, if previously
> pmem_attach_disk() was called in a configuration with FS_DAX=n, it would
> result in a NULL pointer dereference.

No, alloc_dax() only returns NULL CONFIG_DAX=n case, not the
CONFIG_FS_DAX=n case. So that means that pmem devices on ARM have been
possible without FS_DAX. So, in order for alloc_dax() returning
ERR_PTR(-EOPNOTSUPP) to not regress pmem device availability this error
path needs to be changed.

> This would be an error handling fix all by itself. Do we really want
> to return successfully if dax is unsupported, or should we return
> an error here ?

Per above, there is no error handling fix, and pmem block device
available should not depend on alloc_dax() succeeding.

The real question is what to do about device-dax. I *think* it is not
affected by cpu_dcache aliasing because it never accesses user mappings
through a kernel alias. I doubt device-dax is in use on these platforms,
but we might need another fixup for that if someone screams about the
alloc_dax() behavior change making them lose device-dax access.

> > +   } else {
> > +   set_dax_nocache(dax_dev);
> > +   set_dax_nomc(dax_dev);
> > +   if (is_nvdimm_sync(nd_region))
> > +   set_dax_synchronous(dax_dev);
> > +   rc = dax_add_host(dax_dev, disk);
> > +   if (rc)
> > +   goto out_cleanup_dax;
> > +   dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
> > +   pmem->dax_dev = dax_dev;
> > }
> > -   set_dax_nocache(dax_dev);
> > -   set_dax_nomc(dax_dev);
> > -   if (is_nvdimm_sync(nd_region))
> > -   set_dax_synchronous(dax_dev);
> > -   rc = dax_add_host(dax_dev, disk);
> > -   if (rc)
> > -   goto out_cleanup_dax;
> > -   dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
> > -   pmem->dax_dev = dax_dev;
> >   
> > rc = device_add_disk(dev, disk, pmem_attribute_groups);
> > if (rc)
> > d

Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Mathieu Desnoyers

On 2024-02-01 10:44, Mathieu Desnoyers wrote:

On 2024-01-31 17:18, Dan Williams wrote:


[...]



diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 5f1be1da92ce..11053a70f5ab 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -16,6 +16,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include "fuse_i.h"
@@ -795,8 +796,11 @@ static void virtio_fs_cleanup_dax(void *data)
  put_dax(dax_dev);
  }
+DEFINE_FREE(cleanup_dax, struct dax_dev *, if (!IS_ERR_OR_NULL(_T)) 
virtio_fs_cleanup_dax(_T))

+
  static int virtio_fs_setup_dax(struct virtio_device *vdev, struct 
virtio_fs *fs)


So either I'm completely missing how ownership works in this function, or
we should be really concerned about the fact that it does no actual
cleanup of anything on any error.

[...]


Here what I'm seeing so far:

- devm_release_mem_region() is never called after 
devm_request_mem_region(). Not

   on error, neither on teardown,
- pgmap is never freed on error after devm_kzalloc.


I was indeed missing something: the devm_ family of functions
keeps ownership at the device level, so we would not need explicit
teardown.




  {
+    struct dax_device *dax_dev __free(cleanup_dax) = NULL;
  struct virtio_shm_region cache_reg;
  struct dev_pagemap *pgmap;
  bool have_cache;
@@ -804,6 +808,15 @@ static int virtio_fs_setup_dax(struct 
virtio_device *vdev, struct virtio_fs *fs)

  if (!IS_ENABLED(CONFIG_FUSE_DAX))
  return 0;
+    dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
+    if (IS_ERR(dax_dev)) {
+    int rc = PTR_ERR(dax_dev);
+
+    if (rc == -EOPNOTSUPP)
+    return 0;
+    return rc;
+    }


What is gained by moving this allocation here ?


I'm still concerned about moving the call to alloc_dax() before
the setup of the memory region it will use. Are those completely
independent ?




+
  /* Get cache region */
  have_cache = virtio_get_shm_region(vdev, &cache_reg,
 (u8)VIRTIO_FS_SHMCAP_ID_CACHE);
@@ -849,10 +862,7 @@ static int virtio_fs_setup_dax(struct 
virtio_device *vdev, struct virtio_fs *fs)
  dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len 
0x%llx\n",

  __func__, fs->window_kaddr, cache_reg.addr, cache_reg.len);
-    fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
-    if (IS_ERR(fs->dax_dev))
-    return PTR_ERR(fs->dax_dev);
-
+    fs->dax_dev = no_free_ptr(dax_dev);
  return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
  fs->dax_dev);
  }




[...]

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime

2024-02-02 Thread Mathieu Desnoyers

On 2024-02-01 10:44, Mathieu Desnoyers wrote:
[...]

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4e8fdcb3f1c8..b69c9e442cf4 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev,
  dax_dev = alloc_dax(pmem, &pmem_dax_ops);
  if (IS_ERR(dax_dev)) {
  rc = PTR_ERR(dax_dev);
-    goto out;
+    if (rc != -EOPNOTSUPP)
+    goto out;


If I compare the before / after this change, if previously
pmem_attach_disk() was called in a configuration with FS_DAX=n, it would
result in a NULL pointer dereference.


I was wrong. drivers/nvdimm/Kconfig has:

config BLK_DEV_PMEM
select DAX

and

drivers/nvdimm/Makefile has:

obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
nd_pmem-y := pmem.o

which means that anything in pmem.c can assume that alloc_dax() is
implemented.

[...]

diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 4b7ecd4fd431..f911e58a24dd 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -681,12 +681,14 @@ dcssblk_add_store(struct device *dev, struct 
device_attribute *attr, const char

  if (IS_ERR(dev_info->dax_dev)) {
  rc = PTR_ERR(dev_info->dax_dev);
  dev_info->dax_dev = NULL;
-    goto put_dev;
+    if (rc != -EOPNOTSUPP)
+    goto put_dev;


config DCSSBLK selects FS_DAX_LIMITED and DAX.

I'm not sure what selecting DAX is trying to achieve here, because the
Kconfig option is "FS_DAX".

So depending on the real motivation behind this select, we may want to
consider failure rather than success in the -EOPNOTSUPP case.



I missed that alloc_dax() is implemented as not supported based on
CONFIG_DAX (not CONFIG_FS_DAX).

Therefore DCSSBLK Kconfig does the right thing and always selects DAX,
and thus an implemented version of alloc_dax().

This takes care of two of my open questions at least. :)

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com