On Tue, Jun 09, 2026 at 08:07:14PM +0800, Li Chen wrote:
> Hi,

Hi Li Chen,

In case you missed it, a Sashiko AI review of this set has posted
feedback. Please take a look.

https://sashiko.dev/#/patchset/20260609120726.1714780-1-me%40linux.beauty

-- Alison

> 
> The nvdimm flush helper currently converts any non-zero provider flush
> callback error to -EIO. That hides useful errno values from providers. For
> example, virtio-pmem may fail child flush bio allocation with -ENOMEM, but
> that is currently reported as -EIO by nvdimm_flush().
> 
> The raw failure seen in the local mkfs sanity test was:
> 
>   wipefs: /dev/pmem0: cannot flush modified buffers: Input/output error
>   mkfs.ext4: Input/output error while writing out and closing file system
>   nd_region region0: dbg: nvdimm_flush rc=-5
> 
> The first two patches keep provider flush errors intact and make the
> virtio-pmem child flush bio allocation use GFP_NOIO. async_pmem_flush() can
> allocate a child flush bio from filesystem flush and writeback paths;
> GFP_NOIO is a better fit than GFP_ATOMIC there because the allocation may
> sleep but must not recurse into filesystem I/O reclaim. With these changes,
> the same mkfs sanity test reached mkfs_rc=0, mount_rc=0, and umount_rc=0.
> 
> The rest of the series addresses virtio-pmem request lifetime and broken
> virtqueue handling. The virtio-pmem flush path uses a virtqueue cookie/token
> to carry a per-request context through completion. Under broken virtqueue /
> notify failure conditions, the submitter can return and free the request
> object while the host/backend may still complete the published request. The
> IRQ completion handler then dereferences freed memory when waking waiters,
> which is reported by KASAN as a slab-use-after-free and may manifest as lock
> corruption (e.g. "BUG: spinlock already unlocked") without KASAN.
> 
> In addition, the flush path has two wait sites: one for virtqueue descriptor
> availability (-ENOSPC from virtqueue_add_sgs()) and one for request
> completion. If the virtqueue becomes broken, forward progress is no longer
> guaranteed and these waiters may sleep indefinitely unless the driver
> converges the failure and wakes all wait sites.
> 
> This series addresses these issues:
> 
> 1/7 nvdimm: preserve flush callback errors
> Return provider flush callback errors directly from nvdimm_flush().
> 
> 2/7 nvdimm: virtio_pmem: use GFP_NOIO for child flush bio
> Use GFP_NOIO for the child flush bio allocation.
> 
> 3/7 nvdimm: virtio_pmem: always wake -ENOSPC waiters
> Wake one -ENOSPC waiter for each reclaimed used buffer, decoupled from
> token completion.
> 
> 4/7 nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
> Use READ_ONCE()/WRITE_ONCE() for the wait_event() flags (done and
> wq_buf_avail).
> 
> 5/7 nvdimm: virtio_pmem: refcount requests for token lifetime
> Refcount request objects so the token lifetime spans the window where it is
> reachable through the virtqueue until completion/drain drops the virtqueue
> reference.
> 
> 6/7 nvdimm: virtio_pmem: converge broken virtqueue to -EIO
> Track a device-level broken state to converge broken/notify failures to -EIO:
> wake all waiters and drain/detach outstanding requests to complete them with
> an error, and fail-fast new requests.
> 
> 7/7 nvdimm: virtio_pmem: drain requests in freeze
> Drain outstanding requests in freeze() before tearing down virtqueues so
> waiters do not sleep indefinitely.
> 
> Testing was done on QEMU x86_64 with a virtio-pmem device exported as
> /dev/pmem0. This v4 series applies to v7.1-rc7, builds with
> CONFIG_VIRTIO_PMEM=m, passes checkpatch, and passed the local repro checks
> with a local-only virtqueue_kick() fault injection. I also checked that it
> applies cleanly to next/master at 6e845bcb78c9 ("Add linux-next specific
> files for 20260605").
> 
> Thanks,
> Li Chen
> 
> Changelog:
> v3->v4:
> - Rebased the series onto v7.1-rc7 so it applies cleanly to Linux 7.1-rc7.
> - Update the allocation site in 6/7 from kmalloc(sizeof(*req_data),
>   GFP_KERNEL) to kmalloc_obj(*req_data) to match current nvdimm code.
> - Add 1/7 to preserve provider flush callback errors in nvdimm_flush().
> - Include the GFP_NOIO child flush bio allocation fix as 2/7.
> - Renumber the old request lifetime and broken virtqueue fixes after the two
>   new flush error patches.
> v2->v3:
> - Split patch 1 as suggested by Pankaj Gupta: keep the waiter wakeup
>   ordering change in 1/5 and move READ_ONCE()/WRITE_ONCE() updates to
>   2/5 (no functional change intended).
> - Add log report to commit msg
> - Fold the export fix into 4/5 to keep the series bisectable when
>   CONFIG_VIRTIO_PMEM=m.
> v1->v2: add the export patch to fix compile issue.
> 
> Links:
> v3: https://lore.kernel.org/all/[email protected]/#t
> v2: https://lore.kernel.org/all/[email protected]/
> v1: https://www.spinics.net/lists/kernel/msg5974818.html
> 
> Li Chen (7):
>   nvdimm: preserve flush callback errors
>   nvdimm: virtio_pmem: use GFP_NOIO for child flush bio
>   nvdimm: virtio_pmem: always wake -ENOSPC waiters
>   nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
>   nvdimm: virtio_pmem: refcount requests for token lifetime
>   nvdimm: virtio_pmem: converge broken virtqueue to -EIO
>   nvdimm: virtio_pmem: drain requests in freeze
> 
>  drivers/nvdimm/nd_virtio.c   | 139 +++++++++++++++++++++++++++++------
>  drivers/nvdimm/region_devs.c |   6 +-
>  drivers/nvdimm/virtio_pmem.c |  14 ++++
>  drivers/nvdimm/virtio_pmem.h |   6 ++
>  4 files changed, 139 insertions(+), 26 deletions(-)
> -- 
> 2.52.0

Reply via email to