Re: [PATCH v2 2/2] dax,pmem: Implement pmem based dax data recovery

2021-11-05 Thread Darrick J. Wong
On Fri, Nov 05, 2021 at 07:16:38PM -0600, Jane Chu wrote:
> For /dev/pmem based dax, enable DAX_OP_RECOVERY mode for
> dax_direct_access to translate 'kaddr' over a range that
> may contain poison(s); and enable dax_copy_to_iter to
> read as much data as possible up till a poisoned page is
> encountered; and enable dax_copy_from_iter to clear poison
> among a page-aligned range, and then write the good data over.
> 
> Signed-off-by: Jane Chu 
> ---
>  drivers/md/dm.c   |  2 ++
>  drivers/nvdimm/pmem.c | 75 ---
>  fs/dax.c  | 24 +++---
>  3 files changed, 92 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index dc354db22ef9..9b3dac916f22 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1043,6 +1043,7 @@ static size_t dm_dax_copy_from_iter(struct dax_device 
> *dax_dev, pgoff_t pgoff,
>   if (!ti)
>   goto out;
>   if (!ti->type->dax_copy_from_iter) {
> + WARN_ON(mode == DAX_OP_RECOVERY);
>   ret = copy_from_iter(addr, bytes, i);
>   goto out;
>   }
> @@ -1067,6 +1068,7 @@ static size_t dm_dax_copy_to_iter(struct dax_device 
> *dax_dev, pgoff_t pgoff,
>   if (!ti)
>   goto out;
>   if (!ti->type->dax_copy_to_iter) {
> + WARN_ON(mode == DAX_OP_RECOVERY);

Maybe just return -EOPNOTSUPP here?

Warnings are kinda loud.

>   ret = copy_to_iter(addr, bytes, i);
>   goto out;
>   }
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 3dc99e0bf633..8ae6aa678c51 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -260,7 +260,7 @@ __weak long __pmem_direct_access(struct pmem_device 
> *pmem, pgoff_t pgoff,
>   resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
>  
>   if (unlikely(is_bad_pmem(>bb, PFN_PHYS(pgoff) / 512,
> - PFN_PHYS(nr_pages
> +  PFN_PHYS(nr_pages)) && mode == DAX_OP_NORMAL))
>   return -EIO;
>  
>   if (kaddr)
> @@ -303,20 +303,85 @@ static long pmem_dax_direct_access(struct dax_device 
> *dax_dev,
>  }
>  
>  /*
> - * Use the 'no check' versions of copy_from_iter_flushcache() and
> - * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
> - * checking, both file offset and device offset, is handled by
> - * dax_iomap_actor()
> + * Even though the 'no check' versions of copy_from_iter_flushcache()
> + * and copy_mc_to_iter() are used to bypass HARDENED_USERCOPY overhead,
> + * 'read'/'write' aren't always safe when poison is consumed. They happen
> + * to be safe because the 'read'/'write' range has been guaranteed
> + * be free of poison(s) by a prior call to dax_direct_access() on the
> + * caller stack.
> + * But on a data recovery code path, the 'read'/'write' range is expected
> + * to contain poison(s), and so poison(s) is explicit checked, such that
> + * 'read' can fetch data from clean page(s) up till the first poison is
> + * encountered, and 'write' requires the range be page aligned in order
> + * to restore the poisoned page's memory type back to "rw" after clearing
> + * the poison(s).
> + * In the event of poison related failure, (size_t) -EIO is returned and
> + * caller may check the return value after casting it to (ssize_t).
> + *
> + * TODO: add support for CPUs that support MOVDIR64B instruction for
> + * faster poison clearing, and possibly smaller error blast radius.

I get that it's still early days yet for whatever pmem stuff is going on
for 5.17, but I feel like this ought to be a separate function called by
pmem_copy_from_iter, with this huge comment attached to that recovery
function.

>   */
>  static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
>   void *addr, size_t bytes, struct iov_iter *i, int mode)
>  {
> + phys_addr_t pmem_off;
> + size_t len, lead_off;
> + struct pmem_device *pmem = dax_get_private(dax_dev);
> + struct device *dev = pmem->bb.dev;
> +
> + if (unlikely(mode == DAX_OP_RECOVERY)) {
> + lead_off = (unsigned long)addr & ~PAGE_MASK;
> + len = PFN_PHYS(PFN_UP(lead_off + bytes));
> + if (is_bad_pmem(>bb, PFN_PHYS(pgoff) / 512, len)) {
> + if (lead_off || !(PAGE_ALIGNED(bytes))) {
> + dev_warn(dev, "Found poison, but addr(%p) 
> and/or bytes(%#lx) not page aligned\n",
> + addr, bytes);
> + return (size_t) -EIO;
> + }
> + pmem_off = PFN_PHYS(pgoff) + pmem->data_offset;
> + if (pmem_clear_poison(pmem, pmem_off, bytes) !=
> + BLK_STS_OK)
> + return (size_t) -EIO;

Looks reasonable enough to me, though you might want to restructure this
to reduce the 

Re: [PATCH v2 1/2] dax: Introduce normal and recovery dax operation modes

2021-11-05 Thread Darrick J. Wong
On Fri, Nov 05, 2021 at 07:16:37PM -0600, Jane Chu wrote:
> Introduce DAX_OP_NORMAL and DAX_OP_RECOVERY operation modes to
> {dax_direct_access, dax_copy_from_iter, dax_copy_to_iter}.
> DAX_OP_NORMAL is the default or the existing mode, and
> DAX_OP_RECOVERY is a new mode for data recovery purpose.
> 
> When dax-FS suspects dax media error might be encountered
> on a read or write, it can enact the recovery mode read or write
> by setting DAX_OP_RECOVERY in the aforementioned APIs. A read
> in recovery mode attempts to fetch as much data as possible
> until the first poisoned page is encountered. A write in recovery
> mode attempts to clear poison(s) in a page-aligned range and
> then write the user provided data over.
> 
> DAX_OP_NORMAL should be used for all non-recovery code path.
> 
> Signed-off-by: Jane Chu 
> ---
>  drivers/dax/super.c | 15 +--
>  drivers/md/dm-linear.c  | 14 --
>  drivers/md/dm-log-writes.c  | 19 +++
>  drivers/md/dm-stripe.c  | 14 --
>  drivers/md/dm-target.c  |  2 +-
>  drivers/md/dm-writecache.c  |  8 +---
>  drivers/md/dm.c | 14 --
>  drivers/nvdimm/pmem.c   | 11 ++-
>  drivers/nvdimm/pmem.h   |  2 +-
>  drivers/s390/block/dcssblk.c| 13 -
>  fs/dax.c| 14 --
>  fs/fuse/dax.c   |  4 ++--
>  fs/fuse/virtio_fs.c | 12 
>  include/linux/dax.h | 18 +++---
>  include/linux/device-mapper.h   |  5 +++--
>  tools/testing/nvdimm/pmem-dax.c |  2 +-
>  16 files changed, 98 insertions(+), 69 deletions(-)
> 



> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 324363b798ec..931586df2905 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -9,6 +9,10 @@
>  /* Flag for synchronous flush */
>  #define DAXDEV_F_SYNC (1UL << 0)
>  
> +/* dax operation mode dynamically set by caller */
> +#define  DAX_OP_NORMAL   0
> +#define  DAX_OP_RECOVERY 1

Mostly looks ok to me, but since this is an operation mode, should this
be an enum instead of an int?

Granted I also think six arguments is a lot... though I don't really
see any better way to do this.

(Dunno, I spent all day running internal patches through the process
gauntlet so this is the remaining 2% of my brain speaking...)

--D

> +
>  typedef unsigned long dax_entry_t;
>  
>  struct dax_device;
> @@ -22,8 +26,8 @@ struct dax_operations {
>* logical-page-offset into an absolute physical pfn. Return the
>* number of pages available for DAX at that pfn.
>*/
> - long (*direct_access)(struct dax_device *, pgoff_t, long,
> - void **, pfn_t *);
> + long (*direct_access)(struct dax_device *, pgoff_t, long, int,
> + void **, pfn_t *);
>   /*
>* Validate whether this device is usable as an fsdax backing
>* device.
> @@ -32,10 +36,10 @@ struct dax_operations {
>   sector_t, sector_t);
>   /* copy_from_iter: required operation for fs-dax direct-i/o */
>   size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
> - struct iov_iter *);
> + struct iov_iter *, int);
>   /* copy_to_iter: required operation for fs-dax direct-i/o */
>   size_t (*copy_to_iter)(struct dax_device *, pgoff_t, void *, size_t,
> - struct iov_iter *);
> + struct iov_iter *, int);
>   /* zero_page_range: required operation. Zero page range   */
>   int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
>  };
> @@ -186,11 +190,11 @@ static inline void dax_read_unlock(int id)
>  bool dax_alive(struct dax_device *dax_dev);
>  void *dax_get_private(struct dax_device *dax_dev);
>  long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long 
> nr_pages,
> - void **kaddr, pfn_t *pfn);
> + int mode, void **kaddr, pfn_t *pfn);
>  size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void 
> *addr,
> - size_t bytes, struct iov_iter *i);
> + size_t bytes, struct iov_iter *i, int mode);
>  size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void 
> *addr,
> - size_t bytes, struct iov_iter *i);
> + size_t bytes, struct iov_iter *i, int mode);
>  int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
>   size_t nr_pages);
>  void dax_flush(struct dax_device *dax_dev, void *addr, size_t size);
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index a7df155ea49b..6596a8e0ceed 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -146,9 +146,10 @@ typedef int (*dm_busy_fn) (struct dm_target *ti);
>   * >= 0 : the number of bytes accessible at the 

[PATCH v2 1/2] dax: Introduce normal and recovery dax operation modes

2021-11-05 Thread Jane Chu
Introduce DAX_OP_NORMAL and DAX_OP_RECOVERY operation modes to
{dax_direct_access, dax_copy_from_iter, dax_copy_to_iter}.
DAX_OP_NORMAL is the default or the existing mode, and
DAX_OP_RECOVERY is a new mode for data recovery purpose.

When dax-FS suspects dax media error might be encountered
on a read or write, it can enact the recovery mode read or write
by setting DAX_OP_RECOVERY in the aforementioned APIs. A read
in recovery mode attempts to fetch as much data as possible
until the first poisoned page is encountered. A write in recovery
mode attempts to clear poison(s) in a page-aligned range and
then write the user provided data over.

DAX_OP_NORMAL should be used for all non-recovery code path.

Signed-off-by: Jane Chu 
---
 drivers/dax/super.c | 15 +--
 drivers/md/dm-linear.c  | 14 --
 drivers/md/dm-log-writes.c  | 19 +++
 drivers/md/dm-stripe.c  | 14 --
 drivers/md/dm-target.c  |  2 +-
 drivers/md/dm-writecache.c  |  8 +---
 drivers/md/dm.c | 14 --
 drivers/nvdimm/pmem.c   | 11 ++-
 drivers/nvdimm/pmem.h   |  2 +-
 drivers/s390/block/dcssblk.c| 13 -
 fs/dax.c| 14 --
 fs/fuse/dax.c   |  4 ++--
 fs/fuse/virtio_fs.c | 12 
 include/linux/dax.h | 18 +++---
 include/linux/device-mapper.h   |  5 +++--
 tools/testing/nvdimm/pmem-dax.c |  2 +-
 16 files changed, 98 insertions(+), 69 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index c0910687fbcb..90cae9d84b9c 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -110,6 +110,7 @@ enum dax_device_flags {
  * @dax_dev: a dax_device instance representing the logical memory range
  * @pgoff: offset in pages from the start of the device to translate
  * @nr_pages: number of consecutive pages caller can handle relative to @pfn
+ * @mode: indicate whether dax operation is in normal or recovery mode
  * @kaddr: output parameter that returns a virtual address mapping of pfn
  * @pfn: output parameter that returns an absolute pfn translation of @pgoff
  *
@@ -117,7 +118,7 @@ enum dax_device_flags {
  * pages accessible at the device relative @pgoff.
  */
 long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long 
nr_pages,
-   void **kaddr, pfn_t *pfn)
+   int mode, void **kaddr, pfn_t *pfn)
 {
long avail;
 
@@ -131,7 +132,7 @@ long dax_direct_access(struct dax_device *dax_dev, pgoff_t 
pgoff, long nr_pages,
return -EINVAL;
 
avail = dax_dev->ops->direct_access(dax_dev, pgoff, nr_pages,
-   kaddr, pfn);
+   mode, kaddr, pfn);
if (!avail)
return -ERANGE;
return min(avail, nr_pages);
@@ -139,22 +140,24 @@ long dax_direct_access(struct dax_device *dax_dev, 
pgoff_t pgoff, long nr_pages,
 EXPORT_SYMBOL_GPL(dax_direct_access);
 
 size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void 
*addr,
-   size_t bytes, struct iov_iter *i)
+   size_t bytes, struct iov_iter *i, int mode)
 {
if (!dax_alive(dax_dev))
return 0;
 
-   return dax_dev->ops->copy_from_iter(dax_dev, pgoff, addr, bytes, i);
+   return dax_dev->ops->copy_from_iter(dax_dev, pgoff, addr, bytes, i,
+   mode);
 }
 EXPORT_SYMBOL_GPL(dax_copy_from_iter);
 
 size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
-   size_t bytes, struct iov_iter *i)
+   size_t bytes, struct iov_iter *i, int mode)
 {
if (!dax_alive(dax_dev))
return 0;
 
-   return dax_dev->ops->copy_to_iter(dax_dev, pgoff, addr, bytes, i);
+   return dax_dev->ops->copy_to_iter(dax_dev, pgoff, addr, bytes, i,
+   mode);
 }
 EXPORT_SYMBOL_GPL(dax_copy_to_iter);
 
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 90de42f6743a..c73ac6b98801 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -173,27 +173,29 @@ static struct dax_device *linear_dax_pgoff(struct 
dm_target *ti, pgoff_t *pgoff)
 }
 
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-   long nr_pages, void **kaddr, pfn_t *pfn)
+   long nr_pages, int mode, void **kaddr, pfn_t *pfn)
 {
struct dax_device *dax_dev = linear_dax_pgoff(ti, );
 
-   return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
+   return dax_direct_access(dax_dev, pgoff, nr_pages, mode, kaddr, pfn);
 }
 
 static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
-   void *addr, size_t bytes, struct iov_iter *i)
+   void *addr, size_t bytes, struct iov_iter *i,
+   int mode)
 {
struct 

[PATCH v2 2/2] dax,pmem: Implement pmem based dax data recovery

2021-11-05 Thread Jane Chu
For /dev/pmem based dax, enable DAX_OP_RECOVERY mode for
dax_direct_access to translate 'kaddr' over a range that
may contain poison(s); and enable dax_copy_to_iter to
read as much data as possible up till a poisoned page is
encountered; and enable dax_copy_from_iter to clear poison
among a page-aligned range, and then write the good data over.

Signed-off-by: Jane Chu 
---
 drivers/md/dm.c   |  2 ++
 drivers/nvdimm/pmem.c | 75 ---
 fs/dax.c  | 24 +++---
 3 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dc354db22ef9..9b3dac916f22 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1043,6 +1043,7 @@ static size_t dm_dax_copy_from_iter(struct dax_device 
*dax_dev, pgoff_t pgoff,
if (!ti)
goto out;
if (!ti->type->dax_copy_from_iter) {
+   WARN_ON(mode == DAX_OP_RECOVERY);
ret = copy_from_iter(addr, bytes, i);
goto out;
}
@@ -1067,6 +1068,7 @@ static size_t dm_dax_copy_to_iter(struct dax_device 
*dax_dev, pgoff_t pgoff,
if (!ti)
goto out;
if (!ti->type->dax_copy_to_iter) {
+   WARN_ON(mode == DAX_OP_RECOVERY);
ret = copy_to_iter(addr, bytes, i);
goto out;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 3dc99e0bf633..8ae6aa678c51 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -260,7 +260,7 @@ __weak long __pmem_direct_access(struct pmem_device *pmem, 
pgoff_t pgoff,
resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
 
if (unlikely(is_bad_pmem(>bb, PFN_PHYS(pgoff) / 512,
-   PFN_PHYS(nr_pages
+PFN_PHYS(nr_pages)) && mode == DAX_OP_NORMAL))
return -EIO;
 
if (kaddr)
@@ -303,20 +303,85 @@ static long pmem_dax_direct_access(struct dax_device 
*dax_dev,
 }
 
 /*
- * Use the 'no check' versions of copy_from_iter_flushcache() and
- * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
- * checking, both file offset and device offset, is handled by
- * dax_iomap_actor()
+ * Even though the 'no check' versions of copy_from_iter_flushcache()
+ * and copy_mc_to_iter() are used to bypass HARDENED_USERCOPY overhead,
+ * 'read'/'write' aren't always safe when poison is consumed. They happen
+ * to be safe because the 'read'/'write' range has been guaranteed
+ * be free of poison(s) by a prior call to dax_direct_access() on the
+ * caller stack.
+ * But on a data recovery code path, the 'read'/'write' range is expected
+ * to contain poison(s), and so poison(s) is explicit checked, such that
+ * 'read' can fetch data from clean page(s) up till the first poison is
+ * encountered, and 'write' requires the range be page aligned in order
+ * to restore the poisoned page's memory type back to "rw" after clearing
+ * the poison(s).
+ * In the event of poison related failure, (size_t) -EIO is returned and
+ * caller may check the return value after casting it to (ssize_t).
+ *
+ * TODO: add support for CPUs that support MOVDIR64B instruction for
+ * faster poison clearing, and possibly smaller error blast radius.
  */
 static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i, int mode)
 {
+   phys_addr_t pmem_off;
+   size_t len, lead_off;
+   struct pmem_device *pmem = dax_get_private(dax_dev);
+   struct device *dev = pmem->bb.dev;
+
+   if (unlikely(mode == DAX_OP_RECOVERY)) {
+   lead_off = (unsigned long)addr & ~PAGE_MASK;
+   len = PFN_PHYS(PFN_UP(lead_off + bytes));
+   if (is_bad_pmem(>bb, PFN_PHYS(pgoff) / 512, len)) {
+   if (lead_off || !(PAGE_ALIGNED(bytes))) {
+   dev_warn(dev, "Found poison, but addr(%p) 
and/or bytes(%#lx) not page aligned\n",
+   addr, bytes);
+   return (size_t) -EIO;
+   }
+   pmem_off = PFN_PHYS(pgoff) + pmem->data_offset;
+   if (pmem_clear_poison(pmem, pmem_off, bytes) !=
+   BLK_STS_OK)
+   return (size_t) -EIO;
+   }
+   }
+
return _copy_from_iter_flushcache(addr, bytes, i);
 }
 
 static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i, int mode)
 {
+   int num_bad;
+   size_t len, lead_off;
+   unsigned long bad_pfn;
+   bool bad_pmem = false;
+   size_t adj_len = bytes;
+   sector_t sector, first_bad;
+   struct pmem_device *pmem = dax_get_private(dax_dev);
+   struct device *dev = pmem->bb.dev;
+
+   if (unlikely(mode == 

[PATCH v2 0/2] Dax poison recovery

2021-11-05 Thread Jane Chu
Up till now, the method commonly used for data recovery from
dax media error has been a combination of hole-punch followed
by a pwrite(2). The downside of this method is that it causes
fragmentation of the pmem backend, which brings a host of
very challenging issues.

This patch is an attempt to provide an alternative way for
dax users to perform data recovery from pmem media error without
fragmenting the pmem backend.

Dax media error may be manifested to user process via SIGBUS
with .si_code BUS_MCEERR_AR when a load instruction consumes
a poison in the media, or SIGBUS with .si_code BUS_ADRERR when
a page fault handler fails to resolve due to existing poison
record, or IO error returned from a block read or write.

With the patch in place, the way for user process to recover
the data can be just a pwrite(2) to a page aligned range without
the need to punch-a-hole. In case of BUS_MCEERR_AR, the range
is incidated by the signal payload: .si_addr and .si_addr_lsb.
In case of BUS_ADRERR, the range is the user mapping page size
starting from .si_addr. If the clue of media error come from
block IO error, the range is a stretch of the block IO range
to page aligned range.

Changes from v1:
- instead of giving user the say to start dax data recovery,
  let dax-filesystem decide when to enter data recovery mode.
  Hence 99% of the non-dax usage of pwrite and its variants
  aren't aware of dax specific recovering.
- Instead of exporting dax_clear_error() API that does one thing,
  combine dax {poison-clearing, error-record-update, writing-good-data}
  in one tight operation to better protect data integrity.
- some semantics and format fixes

v1: 
https://lore.kernel.org/lkml/20211029223233.gb449...@dread.disaster.area/T/
  
Jane Chu (2):
  dax: Introduce normal and recovery dax operation modes
  dax,pmem: Implement pmem based dax data recovery

 drivers/dax/super.c | 15 +++---
 drivers/md/dm-linear.c  | 14 +++---
 drivers/md/dm-log-writes.c  | 19 +---
 drivers/md/dm-stripe.c  | 14 +++---
 drivers/md/dm-target.c  |  2 +-
 drivers/md/dm-writecache.c  |  8 +--
 drivers/md/dm.c | 16 +++---
 drivers/nvdimm/pmem.c   | 86 +
 drivers/nvdimm/pmem.h   |  2 +-
 drivers/s390/block/dcssblk.c| 13 +++--
 fs/dax.c| 32 +---
 fs/fuse/dax.c   |  4 +-
 fs/fuse/virtio_fs.c | 12 +++--
 include/linux/dax.h | 18 ---
 include/linux/device-mapper.h   |  5 +-
 tools/testing/nvdimm/pmem-dax.c |  2 +-
 16 files changed, 187 insertions(+), 75 deletions(-)

-- 
2.18.4




Re: [dm-devel] [PATCH] libmultipath: add path wildcard "%I" for init state

2021-11-05 Thread Benjamin Marzinski
On Fri, Nov 05, 2021 at 12:03:12PM +0100, mwi...@suse.com wrote:
> From: Martin Wilck 
Reviewed-by: Benjamin Marzinski 
> 
> Enable printing pp->initialized with 'multipathd show paths format "%I"'.
> This is supposed to go on top of Ben's "multipathd: remove udev settle 
> dependency" series, to simplify checking multipathd's state.
> 
> ---
>  libmultipath/print.c   | 21 +
>  libmultipath/structs.h |  1 +
>  2 files changed, 22 insertions(+)
> 
> diff --git a/libmultipath/print.c b/libmultipath/print.c
> index 2fb9f4e..b5b9905 100644
> --- a/libmultipath/print.c
> +++ b/libmultipath/print.c
> @@ -504,6 +504,26 @@ snprint_dm_path_state (struct strbuf *buff, const struct 
> path * pp)
>   }
>  }
>  
> +static int snprint_initialized(struct strbuf *buff, const struct path * pp)
> +{
> + static const char *init_state_name[] = {
> + [INIT_NEW] = "new",
> + [INIT_FAILED] = "failed",
> + [INIT_MISSING_UDEV] = "udev-missing",
> + [INIT_REQUESTED_UDEV] = "udev-requested",
> + [INIT_OK] = "ok",
> + [INIT_REMOVED] = "removed",
> + [INIT_PARTIAL] = "partial",
> + };
> + const char *str;
> +
> + if (pp->initialized < INIT_NEW || pp->initialized >= __INIT_LAST)
> + str = "undef";
> + else
> + str = init_state_name[pp->initialized];
> + return append_strbuf_str(buff, str);
> +}
> +
>  static int
>  snprint_vpr (struct strbuf *buff, const struct path * pp)
>  {
> @@ -804,6 +824,7 @@ struct path_data pd[] = {
>   {'g', "vpd page data", 0, snprint_path_vpd_data},
>   {'0', "failures",  0, snprint_path_failures},
>   {'P', "protocol",  0, snprint_path_protocol},
> + {'I', "init_st",   0, snprint_initialized},
>   {0, NULL, 0 , NULL}
>  };
>  
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index 69409fd..d8c24b5 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -206,6 +206,7 @@ enum initialized_states {
>* change uevent is received.
>*/
>   INIT_PARTIAL,
> + __INIT_LAST,
>  };
>  
>  enum prkey_sources {
> -- 
> 2.33.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH] multipath: fix exit status of multipath -T

2021-11-05 Thread Benjamin Marzinski
On Fri, Nov 05, 2021 at 12:01:27PM +0100, mwi...@suse.com wrote:
> From: Martin Wilck 
> 
> We must set the return value in configure().
Reviewed-by: Benjamin Marzinski 
> ---
>  multipath/main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/multipath/main.c b/multipath/main.c
> index 65ece83..b2d300e 100644
> --- a/multipath/main.c
> +++ b/multipath/main.c
> @@ -560,6 +560,7 @@ configure (struct config *conf, enum mpath_cmds cmd,
>  
>   dump_config(conf, hwes, curmp);
>   vector_free(hwes);
> + r = RTVL_OK;
>   goto out;
>   }
>  
> -- 
> 2.33.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm

2021-11-05 Thread Benjamin Marzinski
On Fri, Nov 05, 2021 at 10:55:11AM +, Martin Wilck wrote:
> Hi Ben,
> 
> On Thu, 2021-11-04 at 23:10 +0100, Martin Wilck wrote:
> > On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > > ...
> > 
> > > 
> > > Multipath now has a new state to deal with these devices,
> > > INIT_PARTIAL.
> > > Devices in this state are treated mostly like INIT_OK devices, but
> > > when "multipathd add path" is called or an add/change uevent
> > > happens
> > > on these devices, multipathd will finish initializing them, and
> > > remove
> > > them if necessary.
> > > 
> > > Signed-off-by: Benjamin Marzinski 
> > 
> > Nice. Somehow in my mind the issue always look much more complex.
> > I like this, but I want to give it some testing before I finally ack
> > it.
> 
> I noted that running "multipathd add path $path" for a path in
> INIT_PARTIAL state changes this paths's init state to INIT_OK, even if
> the udev still has no information about it (*).

Ah. Didn't think about that.
 
> The reason is that pp->wwid is set, and thus pathinfo() won't even try
> to retrieve the WWID, although for INIT_PARTIAL paths the origin of the
> WWID is not 100% trustworthy (being just copied from pp->mpp-
> >wwid). pathinfo() will return PATHINFO_OK without having retrieved the
> WWID. 
> 
> I suppose we could apply a similar logic as in uev_update_path() here,
> clearing pp->wwid before calling pathinfo(). If udev was still unaware
> of the path, this would mean a transition from INIT_PARTIAL to
> INIT_MISSING_UDEV. OTOH, we'd now need to be prepared to have to remove
> the path from the map if the WWID doesn't match after the call to
> pathinfo(). This means we'd basically need to reimplement the "changed
> WWID" logic from uev_update_path(), and thus we'd need to unify both
> code paths.
> 
> In general, the difference between INIT_MISSING_UDEV and INIT_PARTIAL
> is subtle. Correct me if I'm wrong, but INIT_PARTIAL means "udev has no
> information about this device" and INIT_MISSING_UDEV currently means
> "we weren't able to figure out a WWID for the path" (meaning that
> INIT_MISSING_UDEV is a misnomer - when fallback are allowed,
> INIT_MISSING_UDEV is actually independent of the device's state in the
> udev db. We should rename this to INIT_MISSING_WWID, perhaps).

Yeah. makes sense.

> The
> semantics of the two states are very different though:
> INIT_MISSING_UDEV will trigger an attempt to regenerate an uevent,
> whereas INIT_PARTIAL will just stick passively. IMO it'd make sense to
> trigger an uevent in the INIT_PARTIAL case, too, albeit perhaps not
> immediately (after the default uevent timeout - 180s?). 

Sure. We do want to wait long enough to be fairly sure that udev isn't
just being a little bit slow.

This would also handle the case where multipathd simply wasn't
tracking the path for some reason. If the device doesn't exist in the
udev database at all, do with send an "add" event instead of a "change"
event?

> Also, we currently don't track the udev state of devices cleanly. We
> set INIT_MISSING_UDEV if we can't obtain uid_attribute, which doesn't
> necessarily mean that udev is unaware of the device. I believe we
> should e.g. check the USEC_INITIALIZED property - it is non-NULL if and
> only if the device is present in the udev db. If uid_attribute isn't
> set regardless, we could conclude that the udev rules just don't set
> it. It might make sense to retrigger *one* uevent in that case, but no
> more.

IIRC, the initial reason why INIT_MISSING_UDEV was added was because
sometimes udev workers timed out, causing them to not run all the rules.
Do you know offhand if USEC_INITIALIZED is set in this case? If we could
differentiate between the following states:

1: udev hasn't gotten an event for a device
2: udev got an event but timed out or failed while processing it
3: udev successfully processed the event for the device, but multipath
   isn't seeing the attributes it was expecting.

We could react more sensibly.

-Ben

> IMO we need a clear definition of the INIT_xyz states and their
> transitions. We won't get along with intuitive logic any more.
> 
> Cheers,
> Martin
> 
> (*) my test procedure is simply to delete the paths' files from
> /run/udev/data and to restart multipathd.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm

2021-11-05 Thread Benjamin Marzinski
On Thu, Nov 04, 2021 at 10:10:18PM +, Martin Wilck wrote:
> On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > When paths are added by update_pathvec_from_dm(), udev may not have
> > initialized them. This means that it's possible that they are
> > supposed
> > to be blacklisted by udev properties, but weren't.  Also, in order to
> > avoid doing potentially stalling IO, update_pathvec_from_dm() doesn't
> > get all the path information in pathinfo(). These paths end up in the
> > unexpected state of INIT_MISSING_UDEV or INIT_NEW, but with their mpp
> > and wwid set.
> > 
> > If udev has already initialized the path, but multipathed wasn't
> > monitoring it, the blacklist checks and wwid determination in
> > update_pathvec_from_dm() will work correctly, so paths added by it
> > are
> > safe, but not completely initialized. The most likely reason why this
> > would happen is if the path was manually removed from multipathd
> > monitoring with "multipathd del path".
> 
> Not quite getting this - normally the path would be removed from maps,
> too. Are you referring to the REMOVE_PATH_DELAY case?

Yeah, but if the user then runs "multipath", it will add that no longer
tracked path back to the multipath device. Like I said in 0/7, I'm not
sure that how much we need to proatively try to initalize these paths,
or if we can just ignore them on the grounds that sometimes when people
shoot themselves in the foot, it hurts.

-Ben

> 
> 
> >  The most common time when
> > uninitialized paths would in be part of multipath devices is during
> > boot, after the pivot root, but before the udev coldplug happens.
> > These
> > paths are not necessarily safe. It's possible that
> > /etc/multipath.conf
> > in the initramfs and regular filesystem differ, and they should now
> > be
> > either blacklisted by udev_property, or have a different wwid.
> > However
> > an "add" event should appear for them shortly.
> > 
> > Multipath now has a new state to deal with these devices,
> > INIT_PARTIAL.
> > Devices in this state are treated mostly like INIT_OK devices, but
> > when "multipathd add path" is called or an add/change uevent happens
> > on these devices, multipathd will finish initializing them, and
> > remove
> > them if necessary.
> > 
> > Signed-off-by: Benjamin Marzinski 
> 
> Nice. Somehow in my mind the issue always look much more complex.
> I like this, but I want to give it some testing before I finally ack
> it.
> 
> Regards
> Martin
> 
> 
> 
> 
> > ---
> >  libmultipath/structs.h |  6 +
> >  libmultipath/structs_vec.c |  5 ++--
> >  multipathd/cli_handlers.c  |  4 
> >  multipathd/main.c  | 48 ++--
> > --
> >  multipathd/main.h  |  1 +
> >  5 files changed, 58 insertions(+), 6 deletions(-)
> > 
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index d0b266b7..69409fd4 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -200,6 +200,12 @@ enum initialized_states {
> >  * mapped by some multipath map because of map reload
> > failure.
> >  */
> > INIT_REMOVED,
> > +   /*
> > +    * INIT_PARTIAL: paths added by update_pathvec_from_dm() will
> > not
> > +    * be fully initialized. This will be handled when an add or
> > +    * change uevent is received.
> > +    */
> > +   INIT_PARTIAL,
> >  };
> >  
> >  enum prkey_sources {
> > diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> > index fb26437a..1de9175e 100644
> > --- a/libmultipath/structs_vec.c
> > +++ b/libmultipath/structs_vec.c
> > @@ -194,6 +194,7 @@ bool update_pathvec_from_dm(vector pathvec,
> > struct multipath *mpp,
> > }
> > condlog(2, "%s: adding new
> > path %s",
> > mpp->alias, pp->dev);
> > +   pp->initialized =
> > INIT_PARTIAL;
> > store_path(pathvec, pp);
> > pp->tick = 1;
> > }
> > @@ -392,12 +393,12 @@ extract_hwe_from_path(struct multipath * mpp)
> > condlog(4, "%s: searching paths for valid hwe", mpp->alias);
> > /* doing this in two passes seems like paranoia to me */
> > vector_foreach_slot(mpp->paths, pp, i) {
> > -   if (pp->state == PATH_UP &&
> > +   if (pp->state == PATH_UP && pp->initialized !=
> > INIT_PARTIAL &&
> >     pp->initialized != INIT_REMOVED && pp->hwe)
> > goto done;
> > }
> > vector_foreach_slot(mpp->paths, pp, i) {
> > -   if (pp->state != PATH_UP &&
> > +   if ((pp->state != PATH_UP || pp->initialized ==
> > INIT_PARTIAL) &&
> >     pp->initialized != INIT_REMOVED && pp->hwe)
> > goto done;
> > 

Re: [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency

2021-11-05 Thread Benjamin Marzinski
On Thu, Nov 04, 2021 at 10:00:11PM +, Martin Wilck wrote:
> On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > So, it turns out that commit 4ef67017 "libmultipath: add
> > update_pathvec_from_dm()" already does most of the hard work of making
> > multipath handle the uninitialized paths that exist during boot, after
> > the switch root, but before the all the coldplug uevents have been
> > processed. The only problem is that this code leaves the paths in a
> > partially initialized state, which doesn't completely get corrected
> > until
> > a reconfigure happens. 
> > 
> > This patchset makes multipath track these partially initailized paths,
> > and makes sure that they get fully initialized, or removed, as
> > necessary.
> > 
> > The first patch is a tangentially related bug I found when trying
> > (unsuccessfully) to recreate multipathd's problem with dropping
> > uninitialized paths. Multipathd was not removing completely missing
> > paths from maps that it found while starting up. The solution I chose
> > was just to make sure that a full reload will happen on these maps,
> > even
> > if a weak reconfigure was requested. However, this means that multipath
> > still completely ignores these missing paths. A side effect of this is
> > that "multipath -l" does not show these paths, even though they exist
> > as
> > part of the multipath table. The full reloads are necessary, to take
> > care of issues that update_pathvec_from_dm() can run into, but we might
> > also want to think about marking these missing paths as INIT_REMOVED,
> > instead of not adding them at all. This would make "multipath -l" still
> > show them, until they actually get removed from the table.
> 
> Yes, that might be a good thing. Completely missing paths (not existing
> in sysfs) in a map represent a dangerous condition; it would be good if
> multipath -l was able to tell the user about it.

Sure, but I can do that as a separate patch?
 
> > 
> > Patch 0005 makes sure to fully initialize the paths when their coldplug
> > event occurs, but if the path is already fully initialized in udev, but
> > multipathd finds out about it from update_pathvec_from_dm(), multipath
> > doesn't do anything to finish initializing the path and moving it to
> > INIT_OK, besides waiting for a uevent or a reconfigure. This is
> > probably
> > fine, since the only way I can see for a path to be in this state is
> > for
> > someone to manually remove the path from multipathd monitoring. But if
> > I'm missing some other way that paths could end up in this state, we
> > could try proactively finishing the initalization of INIT_PARTIAL paths
> > that have all their udev information.
> 
> One option would be to try finishing the initialization in the path
> checker.

Yep. Something like that is what I was thinking.  But I'm still trying
to come up with a way that paths could get into this state without
someone running something like:

# multipathd del path sda
# mutipath

And I'm not sure how much code we want to add just to handle something
contrived like this.

> What about the checker, in general?  Should we take some care that
> partially initialized paths aren't mistakenly set as failed? I'm not
> sure if libudev is able to return a proper fd from
> udev_device_get_devnode() if the device isn't yet initialized in the
> udev db. Without a proper fd, the checker is doomed to fail. And other
> failure modes are possible too without proper udev initialization, I
> suppose?
> 

Since this is a case where the device nodes do exist, since they were
already made into multipath devices, I didn't run into any problems with
checking INIT_PARTIAL paths.

-Ben

> > 
> > I'm also kind of on the fence about patch 0006. There is no reason
> > why
> > we have to remove INIT_PARTIAL paths if the multipath device goes
> > away.
> > But if a path is in INIT_PARTIAL for long enough that the multipath
> > device gets removed, then it's likely not a path we want to be
> > monitoring, and if a uevent occurs, we'll add it back, anyway. Also,
> > knowing that INIT_PARTIAL paths are always part of multipath devices
> > does make the code simpler.
> > 
> > I've tested these patches both by rebooting with necessary and
> > unnecessary multipath devices in the initramfs and multipathd.service
> > set to make multipathd start up at various points after the switch
> > root,
> > and by manually sending remove uevents to unintialize some paths, and
> > then starting multipathd to test specific combinations of path
> > states.
> > But more testing is always welcome.
> 
> I'll try to give this code a test shortly.
> 
> Cheers,
> Martin
> 
> 
> > 
> > Benjamin Marzinski (7):
> >   multipathd: remove missing paths on startup
> >   libmultipath: skip unneeded steps to get path name
> >   libmultipath: don't use fallback wwid in update_pathvec_from_dm
> >   libmultipath: always set INIT_REMOVED in set_path_removed
> >   multipathd: fully initialize paths added by 

Re: [dm-devel] [PATCH 8/8] libmultipath: cleanup invalid config handling

2021-11-05 Thread Benjamin Marzinski
On Thu, Nov 04, 2021 at 09:11:34PM +, Martin Wilck wrote:
> On Wed, 2021-10-06 at 15:04 -0500, Benjamin Marzinski wrote:
> > Add error reporting to the remaining config handlers. If the value is
> > invalid, do not change the existing config option's value.
> 
> Like for the previous patch, I'm unsure if this is wise. You rely on a
> reasonable default being set before the function is called. I suppose
> that's the case, but I like seeing the "invalid" value substituted
> right there where the validity is checked. That saves us from searching
> the code for the default value.
> 
> Maybe I overlooked an important rationale for not touching the values
> in the case of invalid input, please explain.

Since these handlers are only called if people put the corresponding
option in the config files, we had better have sensible defaults if
they're not called (or if they don't set anything).

I admit that I should take a look for cases were we cap an out-of-range
value, to see if it would make more sense to treat it as an invalid
value instead. Also, instead of accepting strings that are simply a
number, we should convert the string, and the check the actual number.
But I don't see any harm in simply ignoring the invalid values. It's no
different than if the user didn't put the invalid line into
multipath.conf

Not setting the values on garbage input makes the handlers more general.
If you have two options that work the same except that they have
different defaults, then by not explicitly setting the value to the
default when you have invalid input, one handler can be used for both
options. set_yes_no() is a good example.  Without my patch, it always
set the value to something, even if the input was garbage. But the
default value it set was "no". That had nothing to do with the default
value of the options that were using it. You could do extra work to make
sure that it would correctly use the option's default value, but you get
the same outcome, with simpler code, just by not changing the default if
you have a garbage value.

Also, many of the handlers never set the value on invalid input. I'm just
making that consistent across all of the handlers.

-Ben

> Cheers,
> Martin
> 
> >  Also print
> > an error whenever 0 is returned for an invalid value. When the
> > handler
> > returns 1, config processing already fails with an error message.
> > 
> > Signed-off-by: Benjamin Marzinski 
> > ---
> >  libmultipath/dict.c | 73 +++
> > --
> >  1 file changed, 51 insertions(+), 22 deletions(-)
> > 
> > diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> > index e79fcdd7..8c3b5f72 100644
> > --- a/libmultipath/dict.c
> > +++ b/libmultipath/dict.c
> > @@ -199,8 +199,11 @@ set_yes_no(vector strvec, void *ptr, const char
> > *file, int line_nr)
> >  
> > if (strcmp(buff, "yes") == 0 || strcmp(buff, "1") == 0)
> > *int_ptr = YN_YES;
> > -   else
> > +   else if (strcmp(buff, "no") == 0 || strcmp(buff, "0") == 0)
> > *int_ptr = YN_NO;
> > +   else
> > +   condlog(1, "%s line %d, invalid value for %s:
> > \"%s\"",
> > +   file, line_nr, (char*)VECTOR_SLOT(strvec, 0),
> > buff);
> >  
> > FREE(buff);
> > return 0;
> > @@ -221,7 +224,8 @@ set_yes_no_undef(vector strvec, void *ptr, const
> > char *file, int line_nr)
> > else if (strcmp(buff, "yes") == 0 || strcmp(buff, "1") == 0)
> > *int_ptr = YNU_YES;
> > else
> > -   *int_ptr = YNU_UNDEF;
> > +   condlog(1, "%s line %d, invalid value for %s:
> > \"%s\"",
> > +   file, line_nr, (char*)VECTOR_SLOT(strvec, 0),
> > buff);
> >  
> > FREE(buff);
> > return 0;
> > @@ -471,9 +475,6 @@ def_find_multipaths_handler(struct config *conf,
> > vector strvec,
> > char *buff;
> > int i;
> >  
> > -   if (set_yes_no_undef(strvec, >find_multipaths, file,
> > line_nr) == 0 && conf->find_multipaths != FIND_MULTIPATHS_UNDEF)
> > -   return 0;
> > -
> > buff = set_value(strvec);
> > if (!buff)
> > return 1;
> > @@ -486,9 +487,14 @@ def_find_multipaths_handler(struct config *conf,
> > vector strvec,
> > }
> > }
> >  
> > -   if (conf->find_multipaths == YNU_UNDEF) {
> > -   condlog(0, "illegal value for find_multipaths: %s",
> > buff);
> > -   conf->find_multipaths = DEFAULT_FIND_MULTIPATHS;
> > +   if (i >= __FIND_MULTIPATHS_LAST) {
> > +   if (strcmp(buff, "no") == 0 || strcmp(buff, "0") ==
> > 0)
> > +   conf->find_multipaths = FIND_MULTIPATHS_OFF;
> > +   else if (strcmp(buff, "yes") == 0 || strcmp(buff,
> > "1") == 0)
> > +   conf->find_multipaths = FIND_MULTIPATHS_ON;
> > +   else
> > +   condlog(1, "%s line %d, invalid value for
> > 

Re: [dm-devel] [PATCH 7/8] libmultipath: split set_int to enable reuse

2021-11-05 Thread Benjamin Marzinski
On Thu, Nov 04, 2021 at 08:54:11PM +, Martin Wilck wrote:
> On Wed, 2021-10-06 at 15:04 -0500, Benjamin Marzinski wrote:
> > Split the code that does the actual value parsing out of set_int(),
> > into
> > a helper function, do_set_int(), so that it can be used by other
> > handlers. These functions no longer set the config value at all, when
> > they have invalid input.
> 
> Not sure about that, do_set_int() sets the value to the cap (see below)

Sorry for the confustion. That's not what I meant.  I meant that if
do_set_int() returns failure, we won't override the existing value in
the config.

> 
> > 
> > Signed-off-by: Benjamin Marzinski 
> > ---
> >  libmultipath/dict.c | 82 +--
> > --
> >  1 file changed, 46 insertions(+), 36 deletions(-)
> > 
> > diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> > index 91333068..e79fcdd7 100644
> > --- a/libmultipath/dict.c
> > +++ b/libmultipath/dict.c
> > @@ -31,17 +31,12 @@
> >  #include "strbuf.h"
> >  
> >  static int
> > -set_int(vector strvec, void *ptr, int min, int max, const char
> > *file,
> > -   int line_nr)
> > +do_set_int(vector strvec, void *ptr, int min, int max, const char
> > *file,
> > +   int line_nr, char *buff)
> >  {
> > int *int_ptr = (int *)ptr;
> > -   char *buff, *eptr;
> > +   char *eptr;
> > long res;
> > -   int rc;
> > -
> > -   buff = set_value(strvec);
> > -   if (!buff)
> > -   return 1;
> >  
> > res = strtol(buff, , 10);
> > if (eptr > buff)
> > @@ -50,17 +45,30 @@ set_int(vector strvec, void *ptr, int min, int
> > max, const char *file,
> > if (*buff == '\0' || *eptr != '\0') {
> > condlog(1, "%s line %d, invalid value for %s:
> > \"%s\"",
> > file, line_nr, (char*)VECTOR_SLOT(strvec, 0),
> > buff);
> > -   rc = 1;
> > -   } else {
> > -   if (res > max || res < min) {
> > -   res = (res > max) ? max : min;
> > -   condlog(1, "%s line %d, value for %s too %s,
> > capping at %ld",
> > +   return 1;
> > +   }
> > +   if (res > max || res < min) {
> > +   res = (res > max) ? max : min;
> > +   condlog(1, "%s line %d, value for %s too %s, capping
> > at %ld",
> > file, line_nr, (char*)VECTOR_SLOT(strvec, 0),
> > -   (res == max)? "large" : "small", res);
> > -   }
> > -   rc = 0;
> > -   *int_ptr = res;
> > +   (res == max)? "large" : "small", res);
> > }
> > +   *int_ptr = res;
> > +   return 0;
> > +}
> > +
> > +static int
> > +set_int(vector strvec, void *ptr, int min, int max, const char
> > *file,
> > +   int line_nr)
> > +{
> > +   char *buff;
> > +   int rc;
> > +
> > +   buff = set_value(strvec);
> > +   if (!buff)
> > +   return 1;
> > +
> > +   rc = do_set_int(strvec, ptr, min, max, file, line_nr, buff);
> >  
> > FREE(buff);
> > return rc;
> > @@ -918,6 +926,7 @@ declare_mp_attr_snprint(gid, print_gid)
> >  static int
> >  set_undef_off_zero(vector strvec, void *ptr, const char *file, int
> > line_nr)
> >  {
> > +   int rc = 0;
> > char * buff;
> > int *int_ptr = (int *)ptr;
> >  
> > @@ -927,10 +936,10 @@ set_undef_off_zero(vector strvec, void *ptr,
> > const char *file, int line_nr)
> >  
> > if (strcmp(buff, "off") == 0)
> > *int_ptr = UOZ_OFF;
> > -   else if (sscanf(buff, "%d", int_ptr) != 1 ||
> > -    *int_ptr < UOZ_ZERO)
> > -   *int_ptr = UOZ_UNDEF;
> > -   else if (*int_ptr == 0)
> > +   else
> > +   rc = do_set_int(strvec, int_ptr, 0, INT_MAX, file,
> > line_nr,
> > +   buff);
> > +   if (rc == 0 && *int_ptr == 0)
> > *int_ptr = UOZ_ZERO;
> >  
> > FREE(buff);
> > @@ -1082,14 +1091,12 @@ max_fds_handler(struct config *conf, vector
> > strvec, const char *file,
> > /* Assume safe limit */
> > max_fds = 4096;
> > }
> > -   if (strlen(buff) == 3 &&
> > -   !strcmp(buff, "max"))
> > -   conf->max_fds = max_fds;
> > -   else
> > -   conf->max_fds = atoi(buff);
> > -
> > -   if (conf->max_fds > max_fds)
> > +   if (!strcmp(buff, "max")) {
> > conf->max_fds = max_fds;
> > +   r = 0;
> > +   } else
> > +   r = do_set_int(strvec, >max_fds, 0, max_fds,
> > file,
> > +  line_nr, buff);
> >  
> > FREE(buff);
> >  
> > @@ -1158,6 +1165,7 @@ declare_mp_snprint(rr_weight, print_rr_weight)
> >  static int
> >  set_pgfailback(vector strvec, void *ptr, const char *file, int
> > line_nr)
> >  {
> > +   int rc = 0;
> > int *int_ptr = (int *)ptr;
> > char * buff;
> > 

Re: [dm-devel] [PATCH 6/8] libmultipath: improve checks for set_str

2021-11-05 Thread Benjamin Marzinski
On Fri, Nov 05, 2021 at 06:59:11AM +, Martin Wilck wrote:
> Hi Ben,
> 
> On Thu, 2021-11-04 at 21:34 +0100, Martin Wilck wrote:
> > On Wed, 2021-10-06 at 15:04 -0500, Benjamin Marzinski wrote:
> > > multipath always requires absolute pathnames, so make sure all file
> > > and
> > > directory names start with a slash.  Also check that the
> > > directories
> > > exist.  Finally, some strings, like the alias, will be used in
> > > paths.
> > > These must not contain the slash character '/', since it is a
> > > forbidden
> > > character in file/directory names. This patch adds seperate
> > > handlers
> > > for
> > > these three cases. If a config line is invalid, these handlers
> > > retain
> > > the existing config string, if any.
> > > 
> > > Signed-off-by: Benjamin Marzinski 
> 
> I've changed my mind on this one. The options for directories and paths
> should be turned into buildtime options instead. If we do that, we
> don't need this sort of checks any more, except the "noslash" part.

That seems reasonable. I do want to ask around a little bit to see if I
can find anyone who is actually setting the directories. The only one I
really worry about is "config_dir". I worry that people might do
something like stick that on shared storage, to make it possible to
change the multipath config on a bunch of machines in one place.

-Ben

> 
> Regards,
> Martin
> 
> > 
> > Mostly OK, see remarks below. I'm a bit wary that when we start this,
> > we might need to do other checks as well. For example, as multipathd
> > is
> > running as root, we may want to check that the paths to files it
> > writes
> > to (bindings_file etc.) don't contain symlinks and have proper
> > permissions... But that'd be another patch.
> > 
> > Regards,
> > Martin
> > 
> > 
> > > ---
> > >  libmultipath/dict.c | 88 +++--
> > > --
> > > --
> > >  1 file changed, 78 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> > > index 1758bd26..91333068 100644
> > > --- a/libmultipath/dict.c
> > > +++ b/libmultipath/dict.c
> > > @@ -5,6 +5,8 @@
> > >   * Copyright (c) 2005 Kiyoshi Ueda, NEC
> > >   */
> > >  #include 
> > > +#include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include "checkers.h"
> > > @@ -111,6 +113,72 @@ set_str(vector strvec, void *ptr, const char
> > > *file, int line_nr)
> > > return 0;
> > >  }
> > >  
> > > +static int
> > > +set_dir(vector strvec, void *ptr, const char *file, int line_nr)
> > > +{
> > > +   char **str_ptr = (char **)ptr;
> > > +   char *old_str = *str_ptr;
> > > +   struct stat sb;
> > > +
> > > +   *str_ptr = set_value(strvec);
> > > +   if (!*str_ptr) {
> > > +   free(old_str);
> > > +   return 1;
> > > +   }
> > > +   if ((*str_ptr)[0] != '/'){
> > > +   condlog(1, "%s line %d, %s is not an absolute
> > > directory path. Ignoring", file, line_nr, *str_ptr);
> > > +   *str_ptr = old_str;
> > > +   } else {
> > > +   if (stat(*str_ptr, ) == 0 &&
> > > S_ISDIR(sb.st_mode))
> > > +   free(old_str);
> > > +   else {
> > > +   condlog(1, "%s line %d, %s is not an
> > > existing
> > > directory. Ignoring", file, line_nr, *str_ptr);
> > > +   *str_ptr = old_str;
> > > +   }
> > > +   }
> > > +   return 0;
> > > +}
> > > +
> > > +static int
> > > +set_path(vector strvec, void *ptr, const char *file, int line_nr)
> > > +{
> > > +   char **str_ptr = (char **)ptr;
> > > +   char *old_str = *str_ptr;
> > > +
> > > +   *str_ptr = set_value(strvec);
> > > +   if (!*str_ptr) {
> > > +   free(old_str);
> > > +   return 1;
> > > +   }
> > > +   if ((*str_ptr)[0] != '/'){
> > > +   condlog(1, "%s line %d, %s is not an absolute path.
> > > Ignoring",
> > > +   file, line_nr, *str_ptr);
> > > +   *str_ptr = old_str;
> > > +   } else
> > > +   free(old_str);
> > > +   return 0;
> > > +}
> > 
> > Once you go down this route, you might as well test that the dirname
> > of
> > the path is an existing directory.
> > 
> > 
> > 
> > > +
> > > +static int
> > > +set_str_noslash(vector strvec, void *ptr, const char *file, int
> > > line_nr)
> > > +{
> > > +   char **str_ptr = (char **)ptr;
> > > +   char *old_str = *str_ptr;
> > > +
> > > +   *str_ptr = set_value(strvec);
> > > +   if (!*str_ptr) {
> > > +   free(old_str);
> > > +   return 1;
> > > +   }
> > > +   if (strchr(*str_ptr, '/')) {
> > > +   condlog(1, "%s line %d, %s cannot contain a slash.
> > > Ignoring",
> > > +   file, line_nr, *str_ptr);
> > > +   *str_ptr = old_str;
> > > +   } else
> > > +   free(old_str);
> > > +   return 0;
> > > +}
> > > +
> > >  

Re: [dm-devel] [PATCH 6/8] libmultipath: improve checks for set_str

2021-11-05 Thread Benjamin Marzinski
On Thu, Nov 04, 2021 at 08:34:20PM +, Martin Wilck wrote:
> On Wed, 2021-10-06 at 15:04 -0500, Benjamin Marzinski wrote:
> > multipath always requires absolute pathnames, so make sure all file
> > and
> > directory names start with a slash.  Also check that the directories
> > exist.  Finally, some strings, like the alias, will be used in paths.
> > These must not contain the slash character '/', since it is a
> > forbidden
> > character in file/directory names. This patch adds seperate handlers
> > for
> > these three cases. If a config line is invalid, these handlers retain
> > the existing config string, if any.
> > 
> > Signed-off-by: Benjamin Marzinski 
> 
> Mostly OK, see remarks below. I'm a bit wary that when we start this,
> we might need to do other checks as well. For example, as multipathd is
> running as root, we may want to check that the paths to files it writes
> to (bindings_file etc.) don't contain symlinks and have proper
> permissions... But that'd be another patch.
> 
> Regards,
> Martin
> 
> 
> > ---
> >  libmultipath/dict.c | 88 +++
> > --
> >  1 file changed, 78 insertions(+), 10 deletions(-)
> > 
> > diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> > index 1758bd26..91333068 100644
> > --- a/libmultipath/dict.c
> > +++ b/libmultipath/dict.c
> > @@ -5,6 +5,8 @@
> >   * Copyright (c) 2005 Kiyoshi Ueda, NEC
> >   */
> >  #include 
> > +#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include "checkers.h"
> > @@ -111,6 +113,72 @@ set_str(vector strvec, void *ptr, const char
> > *file, int line_nr)
> > return 0;
> >  }
> >  
> > +static int
> > +set_dir(vector strvec, void *ptr, const char *file, int line_nr)
> > +{
> > +   char **str_ptr = (char **)ptr;
> > +   char *old_str = *str_ptr;
> > +   struct stat sb;
> > +
> > +   *str_ptr = set_value(strvec);
> > +   if (!*str_ptr) {
> > +   free(old_str);
> > +   return 1;
> > +   }
> > +   if ((*str_ptr)[0] != '/'){
> > +   condlog(1, "%s line %d, %s is not an absolute
> > directory path. Ignoring", file, line_nr, *str_ptr);
> > +   *str_ptr = old_str;
> > +   } else {
> > +   if (stat(*str_ptr, ) == 0 && S_ISDIR(sb.st_mode))
> > +   free(old_str);
> > +   else {
> > +   condlog(1, "%s line %d, %s is not an existing
> > directory. Ignoring", file, line_nr, *str_ptr);
> > +   *str_ptr = old_str;
> > +   }
> > +   }
> > +   return 0;
> > +}
> > +
> > +static int
> > +set_path(vector strvec, void *ptr, const char *file, int line_nr)
> > +{
> > +   char **str_ptr = (char **)ptr;
> > +   char *old_str = *str_ptr;
> > +
> > +   *str_ptr = set_value(strvec);
> > +   if (!*str_ptr) {
> > +   free(old_str);
> > +   return 1;
> > +   }
> > +   if ((*str_ptr)[0] != '/'){
> > +   condlog(1, "%s line %d, %s is not an absolute path.
> > Ignoring",
> > +   file, line_nr, *str_ptr);
> > +   *str_ptr = old_str;
> > +   } else
> > +   free(old_str);
> > +   return 0;
> > +}
> 
> Once you go down this route, you might as well test that the dirname of
> the path is an existing directory.
> 

But they don't need to exist, since the multipath code will create the
missing directories.
 
> 
> > +
> > +static int
> > +set_str_noslash(vector strvec, void *ptr, const char *file, int
> > line_nr)
> > +{
> > +   char **str_ptr = (char **)ptr;
> > +   char *old_str = *str_ptr;
> > +
> > +   *str_ptr = set_value(strvec);
> > +   if (!*str_ptr) {
> > +   free(old_str);
> > +   return 1;
> > +   }
> > +   if (strchr(*str_ptr, '/')) {
> > +   condlog(1, "%s line %d, %s cannot contain a slash.
> > Ignoring",
> > +   file, line_nr, *str_ptr);
> > +   *str_ptr = old_str;
> > +   } else
> > +   free(old_str);
> > +   return 0;
> > +}
> > +
> >  static int
> >  set_yes_no(vector strvec, void *ptr, const char *file, int line_nr)
> >  {
> > @@ -353,13 +421,13 @@ declare_def_snprint(verbosity, print_int)
> >  declare_def_handler(reassign_maps, set_yes_no)
> >  declare_def_snprint(reassign_maps, print_yes_no)
> >  
> > -declare_def_handler(multipath_dir, set_str)
> > +declare_def_handler(multipath_dir, set_dir)
> >  declare_def_snprint(multipath_dir, print_str)
> >  
> >  static int def_partition_delim_handler(struct config *conf, vector
> > strvec,
> >    const char *file, int line_nr)
> >  {
> > -   int rc = set_str(strvec, >partition_delim, file,
> > line_nr);
> > +   int rc = set_str_noslash(strvec, >partition_delim,
> > file, line_nr);
> >  
> > if (rc != 0)
> > return rc;
> > @@ -490,11 +558,11 @@ declare_hw_snprint(prio_name, print_str)

Re: [dm-devel] [PATCH 5/8] libmultipath: make set_int take a range for valid values

2021-11-05 Thread Benjamin Marzinski
On Thu, Nov 04, 2021 at 08:34:33PM +, Martin Wilck wrote:
> On Wed, 2021-10-06 at 15:04 -0500, Benjamin Marzinski wrote:
> > If a value outside of the valid range is passed to set_int, it caps
> > the
> > value at appropriate limit, and issues a warning.
> > 
> > Signed-off-by: Benjamin Marzinski 
> 
> One nitpick below.
> 
> It's a lot of code changes for just two cases where we have nontrivial
> values for min and max. I guess for uint the count of nontrivial cases
> was zero?

Yeah. all the uint cases accepted the entire UINT range.  As you've
seen, I end up using the int range checking for more functions later.
 
> Yet it's an improvement, so I'm going to ack when the nit is fixed.
> 
> Martin
> 
> 
> 
> > ---
> >  libmultipath/dict.c | 121 +++---
> > --
> >  1 file changed, 75 insertions(+), 46 deletions(-)
> > 
> > diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> > index eb2c44c0..1758bd26 100644
> > --- a/libmultipath/dict.c
> > +++ b/libmultipath/dict.c
> > ...
> > @@ -298,7 +347,7 @@ declare_def_snprint(checkint, print_int)
> >  declare_def_handler(max_checkint, set_uint)
> >  declare_def_snprint(max_checkint, print_int)
> >  
> > -declare_def_handler(verbosity, set_int)
> > +declare_def_range_handler(verbosity, 0, 4)
> 
> You should use MAX_VERBOSITY here.

Sure.

-Ben

> 
> >  declare_def_snprint(verbosity, print_int)
> >  
> >  declare_def_handler(reassign_maps, set_yes_no)
> > @@ -473,22 +522,22 @@ declare_ovr_snprint(checker_name, print_str)
> >  declare_hw_handler(checker_name, set_str)
> >  declare_hw_snprint(checker_name, print_str)
> >  
> > -declare_def_handler(minio, set_int)
> > +declare_def_range_handler(minio, 0, INT_MAX)
> >  declare_def_snprint_defint(minio, print_int, DEFAULT_MINIO)
> > -declare_ovr_handler(minio, set_int)
> > +declare_ovr_range_handler(minio, 0, INT_MAX)
> >  declare_ovr_snprint(minio, print_nonzero)
> > -declare_hw_handler(minio, set_int)
> > +declare_hw_range_handler(minio, 0, INT_MAX)
> >  declare_hw_snprint(minio, print_nonzero)
> > -declare_mp_handler(minio, set_int)
> > +declare_mp_range_handler(minio, 0, INT_MAX)
> >  declare_mp_snprint(minio, print_nonzero)
> >  
> > -declare_def_handler(minio_rq, set_int)
> > +declare_def_range_handler(minio_rq, 0, INT_MAX)
> >  declare_def_snprint_defint(minio_rq, print_int, DEFAULT_MINIO_RQ)
> > -declare_ovr_handler(minio_rq, set_int)
> > +declare_ovr_range_handler(minio_rq, 0, INT_MAX)
> >  declare_ovr_snprint(minio_rq, print_nonzero)
> > -declare_hw_handler(minio_rq, set_int)
> > +declare_hw_range_handler(minio_rq, 0, INT_MAX)
> >  declare_hw_snprint(minio_rq, print_nonzero)
> > -declare_mp_handler(minio_rq, set_int)
> > +declare_mp_range_handler(minio_rq, 0, INT_MAX)
> >  declare_mp_snprint(minio_rq, print_nonzero)
> >  
> >  declare_def_handler(queue_without_daemon, set_yes_no)
> > @@ -512,7 +561,7 @@ snprint_def_queue_without_daemon(struct config
> > *conf, struct strbuf *buff,
> > return append_strbuf_quoted(buff, qwd);
> >  }
> >  
> > -declare_def_handler(checker_timeout, set_int)
> > +declare_def_range_handler(checker_timeout, 0, INT_MAX)
> >  declare_def_snprint(checker_timeout, print_nonzero)
> >  
> >  declare_def_handler(allow_usb_devices, set_yes_no)
> > @@ -583,13 +632,13 @@ declare_hw_snprint(deferred_remove,
> > print_yes_no_undef)
> >  declare_mp_handler(deferred_remove, set_yes_no_undef)
> >  declare_mp_snprint(deferred_remove, print_yes_no_undef)
> >  
> > -declare_def_handler(retrigger_tries, set_int)
> > +declare_def_range_handler(retrigger_tries, 0, INT_MAX)
> >  declare_def_snprint(retrigger_tries, print_int)
> >  
> > -declare_def_handler(retrigger_delay, set_int)
> > +declare_def_range_handler(retrigger_delay, 0, INT_MAX)
> >  declare_def_snprint(retrigger_delay, print_int)
> >  
> > -declare_def_handler(uev_wait_timeout, set_int)
> > +declare_def_range_handler(uev_wait_timeout, 0, INT_MAX)
> >  declare_def_snprint(uev_wait_timeout, print_int)
> >  
> >  declare_def_handler(strict_timing, set_yes_no)
> > @@ -616,19 +665,19 @@ static int
> > snprint_def_disable_changed_wwids(struct config *conf,
> > return print_ignored(buff);
> >  }
> >  
> > -declare_def_handler(remove_retries, set_int)
> > +declare_def_range_handler(remove_retries, 0, INT_MAX)
> >  declare_def_snprint(remove_retries, print_int)
> >  
> > -declare_def_handler(max_sectors_kb, set_int)
> > +declare_def_range_handler(max_sectors_kb, 0, INT_MAX)
> >  declare_def_snprint(max_sectors_kb, print_nonzero)
> > -declare_ovr_handler(max_sectors_kb, set_int)
> > +declare_ovr_range_handler(max_sectors_kb, 0, INT_MAX)
> >  declare_ovr_snprint(max_sectors_kb, print_nonzero)
> > -declare_hw_handler(max_sectors_kb, set_int)
> > +declare_hw_range_handler(max_sectors_kb, 0, INT_MAX)
> >  declare_hw_snprint(max_sectors_kb, print_nonzero)
> > -declare_mp_handler(max_sectors_kb, set_int)
> > +declare_mp_range_handler(max_sectors_kb, 0, INT_MAX)
> >  declare_mp_snprint(max_sectors_kb, 

Re: [dm-devel] [PATCH 1/8] libmulitpath: add section name to invalid keyword output

2021-11-05 Thread Benjamin Marzinski
On Thu, Nov 04, 2021 at 08:55:39PM +, Martin Wilck wrote:
> On Wed, 2021-10-06 at 15:04 -0500, Benjamin Marzinski wrote:
> > If users forget the closing brace for a section in multipath.conf,
> > multipath has no way to detect that. When it sees the keyword at the
> > start of the next section, it will complain that there is an invalid
> > keyword, because that keyword doesn't belong in previous section
> > (which
> > was never ended with a closing brace). This can confuse users. To
> > make
> > this easier to understand, when multipath prints and invalid keyword
> > message, it now also prints the current section name, which can give
> > users a hint that they didn't end the previous section.
> > 
> > Signed-off-by: Benjamin Marzinski 
> 
> Apart from typo in the subject:

Oops. I'll fix that.

-Ben

> 
> Reviewed-by: Martin Wilck 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH 4/8] libmultipath: pass file and line number to keyword handlers

2021-11-05 Thread Benjamin Marzinski
On Thu, Nov 04, 2021 at 08:55:02PM +, Martin Wilck wrote:
> On Wed, 2021-10-06 at 15:04 -0500, Benjamin Marzinski wrote:
> > This will make it possible for the keyword handlers to print more
> > useful
> > warning messages. It will be used by future patches.
> > 
> > Signed-off-by: Benjamin Marzinski 
> 
> Nit: There's one very long line (349).
> 
> Apart from that, ack.

It gets removed by "libmultipath: cleanup invalid config handling", but
I can fix it in this patch, if you'd like.

-Ben

> 
> 
> 
> > ---
> >  libmultipath/dict.c   | 143 +-
> >  libmultipath/parser.c |   3 +-
> >  libmultipath/parser.h |   2 +-
> >  3 files changed, 90 insertions(+), 58 deletions(-)
> > 
> > diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> > index 7a727389..eb2c44c0 100644
> > --- a/libmultipath/dict.c
> > +++ b/libmultipath/dict.c
> > @@ -29,7 +29,7 @@
> >  #include "strbuf.h"
> >  
> >  static int
> > -set_int(vector strvec, void *ptr)
> > +set_int(vector strvec, void *ptr, const char *file, int line_nr)
> >  {
> > int *int_ptr = (int *)ptr;
> > char *buff, *eptr;
> > @@ -58,7 +58,7 @@ set_int(vector strvec, void *ptr)
> >  }
> >  
> >  static int
> > -set_uint(vector strvec, void *ptr)
> > +set_uint(vector strvec, void *ptr, const char *file, int line_nr)
> >  {
> > unsigned int *uint_ptr = (unsigned int *)ptr;
> > char *buff, *eptr, *p;
> > @@ -90,7 +90,7 @@ set_uint(vector strvec, void *ptr)
> >  }
> >  
> >  static int
> > -set_str(vector strvec, void *ptr)
> > +set_str(vector strvec, void *ptr, const char *file, int line_nr)
> >  {
> > char **str_ptr = (char **)ptr;
> >  
> > @@ -105,7 +105,7 @@ set_str(vector strvec, void *ptr)
> >  }
> >  
> >  static int
> > -set_yes_no(vector strvec, void *ptr)
> > +set_yes_no(vector strvec, void *ptr, const char *file, int line_nr)
> >  {
> > char * buff;
> > int *int_ptr = (int *)ptr;
> > @@ -124,7 +124,7 @@ set_yes_no(vector strvec, void *ptr)
> >  }
> >  
> >  static int
> > -set_yes_no_undef(vector strvec, void *ptr)
> > +set_yes_no_undef(vector strvec, void *ptr, const char *file, int
> > line_nr)
> >  {
> > char * buff;
> > int *int_ptr = (int *)ptr;
> > @@ -187,9 +187,10 @@ static int print_yes_no_undef(struct strbuf *buff,
> > long v)
> >  
> >  #define declare_def_handler(option,
> > function)  \
> >  static
> > int \
> > -def_ ## option ## _handler (struct config *conf, vector
> > strvec)\
> > +def_ ## option ## _handler (struct config *conf, vector
> > strvec,\
> > +   const char *file, int
> > line_nr)  \
> >  { 
> >  \
> > -   return function (strvec, 
> > >option);\
> > +   return function (strvec, >option, file,
> > line_nr); \
> >  }
> >  
> >  #define declare_def_snprint(option,
> > function)  \
> > @@ -224,12 +225,13 @@ snprint_def_ ## option (struct config *conf,
> > struct strbuf *buff, \
> >  
> >  #define declare_hw_handler(option,
> > function)   \
> >  static
> > int \
> > -hw_ ## option ## _handler (struct config *conf, vector
> > strvec) \
> > +hw_ ## option ## _handler (struct config *conf, vector
> > strvec, \
> > +  const char *file, int
> > line_nr)   \
> >  { 
> >  \
> > struct hwentry * hwe = VECTOR_LAST_SLOT(conf-
> > >hwtable); \
> > if
> > (!hwe)   \
> > return
> > 1;   \
> > -   return function (strvec, 
> > >option); \
> > +   return function (strvec, >option, file,
> > line_nr);  \
> >  }
> >  
> >  #define declare_hw_snprint(option,
> > function)   \
> > @@ -243,11 +245,12 @@ snprint_hw_ ## option (struct config *conf,
> > struct strbuf *buff,  \
> >  
> >  #define declare_ovr_handler(option,
> > function)  \
> >  static
> > int \
> > -ovr_ ## option ## _handler (struct config *conf, vector
> > strvec)\
> > +ovr_ ## option ## _handler (struct config *conf, vector
> > strvec,\
> > +   const char *file, int
> > line_nr)  \
> >  { 
> >  \
> > if (!conf-
> > >overrides)   \
> > return
> > 1;   \
> > -   return 

Re: [dm-devel] [RFC PATCH 0/8] block: add support for REQ_OP_VERIFY

2021-11-05 Thread jav...@javigon.com

On 04.11.2021 15:37, Keith Busch wrote:

On Thu, Nov 04, 2021 at 06:34:31PM +0100, Christoph Hellwig wrote:

On Thu, Nov 04, 2021 at 10:32:35AM -0700, Darrick J. Wong wrote:
> I also wonder if it would be useful (since we're already having a
> discussion elsewhere about data integrity syscalls for pmem) to be able
> to call this sort of thing against files?  In which case we'd want
> another preadv2 flag or something, and then plumb all that through the
> vfs/iomap as needed?

IFF we do this (can't answer if there is a need) we should not
overload read with it.  It is an operation that does not return
data but just a status, so let's not get into that mess.


If there is a need for this, a new io_uring opcode seems like the
appropirate user facing interface for it.


+1 to this. I was looking at the patchset yesterday and this was one of
the questions I had. Any reasons to not do it this way Chaitanya?

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [RFC PATCH 0/8] block: add support for REQ_OP_VERIFY

2021-11-05 Thread Keith Busch
On Thu, Nov 04, 2021 at 06:34:31PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 04, 2021 at 10:32:35AM -0700, Darrick J. Wong wrote:
> > I also wonder if it would be useful (since we're already having a
> > discussion elsewhere about data integrity syscalls for pmem) to be able
> > to call this sort of thing against files?  In which case we'd want
> > another preadv2 flag or something, and then plumb all that through the
> > vfs/iomap as needed?
> 
> IFF we do this (can't answer if there is a need) we should not
> overload read with it.  It is an operation that does not return
> data but just a status, so let's not get into that mess.

If there is a need for this, a new io_uring opcode seems like the
appropirate user facing interface for it.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [RFC PATCH 3/8] nvme: add support for the Verify command

2021-11-05 Thread Keith Busch
On Wed, Nov 03, 2021 at 11:46:29PM -0700, Chaitanya Kulkarni wrote:
> +static inline blk_status_t nvme_setup_verify(struct nvme_ns *ns,
> + struct request *req, struct nvme_command *cmnd)
> +{

Due to recent driver changes, you need a "memset(cmnd, 0, sizeof(*cmnd))"
prior to setting up the rest of the command, or you need to set each
command dword individually.

> + cmnd->verify.opcode = nvme_cmd_verify;
> + cmnd->verify.nsid = cpu_to_le32(ns->head->ns_id);
> + cmnd->verify.slba =
> + cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
> + cmnd->verify.length =
> + cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
> + cmnd->verify.control = 0;
> + return BLK_STS_OK;
> +}

> +static void nvme_config_verify(struct gendisk *disk, struct nvme_ns *ns)
> +{
> + u64 max_blocks;
> +
> + if (!(ns->ctrl->oncs & NVME_CTRL_ONCS_VERIFY))
> + return;
> +
> + if (ns->ctrl->max_hw_sectors == UINT_MAX)
> + max_blocks = (u64)USHRT_MAX + 1;
> + else
> + max_blocks = ns->ctrl->max_hw_sectors + 1;

If supported by the controller, this maximum is defined in the non-mdts
command limits in NVM command set specific Identify Controller VSL field.

> +
> + /* keep same as discard */
> + if (blk_queue_flag_test_and_set(QUEUE_FLAG_VERIFY, disk->queue))
> + return;
> +
> + blk_queue_max_verify_sectors(disk->queue,
> +  nvme_lba_to_sect(ns, max_blocks));
> +
> +}

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel