+CC Dan's correct email address and MST's email.
> Hi,
>
> This series started as a virtio-pmem request lifetime and broken virtqueue
> fix, but the rerolls have picked up several related flush-path fixes found
> during local testing and review. Since the series is now broader than the
> original lifetime bug, this cover letter calls out where the patches came
> from.
>
> The nvdimm flush helper maps provider flush failures to -EIO. That should
> remain the default for provider/backend failures because host-side errors are
> still best reported as generic I/O errors to the guest. However, virtio-pmem
> may also fail a guest-local flush request allocation with -ENOMEM before any
> request is submitted to the host. Reporting that resource failure as -EIO
> makes memory pressure look like media failure.
>
> The raw failure seen in the local mkfs sanity test was:
>
>   wipefs: /dev/pmem0: cannot flush modified buffers: Input/output error
>   mkfs.ext4: Input/output error while writing out and closing file system
>   nd_region region0: dbg: nvdimm_flush rc=-5
>
> Patch 1 comes from that local failure, with the error policy narrowed after
> Pankaj pointed out that host/backend provider errors should not all be exposed
> directly to the guest. It now preserves only -ENOMEM and keeps other provider
> flush failures mapped to -EIO.
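
For readers following along, the error policy described above can be sketched
in userspace C. This is only an illustration under stated assumptions, not the
code from patch 1: flush_rc_to_errno() is a hypothetical name, and the sketch
ignores the NVDIMM_FLUSH_ASYNC return that patch 4 introduces.

```c
#include <errno.h>

/*
 * Hypothetical helper (not taken from the series): map the result of a
 * provider flush callback to the error reported to the caller.
 */
static int flush_rc_to_errno(int provider_rc)
{
	if (provider_rc == 0)
		return 0;		/* flush succeeded */
	if (provider_rc == -ENOMEM)
		return -ENOMEM;		/* guest-local allocation failure */
	return -EIO;			/* any other provider/backend failure */
}
```

With this mapping, memory pressure surfaces as -ENOMEM while host-side
failures such as -ENOSPC still reach the guest as a generic -EIO.
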
>
> Patches 2 and 3 come from review of the pmem flush path. Patch 2 keeps a
> failed REQ_PREFLUSH from being overwritten after data copy, and patch 3 is the
> dataless-bio guard added after the Sashiko review. Patch 4 comes from the
> local child flush bio allocation failure, but v7 reworks the v6 synchronous
> FUA approach after Pankaj noted that the old child flush bio path completed
> asynchronously. This version removes the child bio while keeping parent bio
> completion asynchronous: the provider returns NVDIMM_FLUSH_ASYNC, queues
> ordered WQ_MEM_RECLAIM work, and completes the parent bio after
> virtio_pmem_flush() finishes. Patch 5 is the remaining allocation-policy
> follow-up for the actual virtio-pmem flush request object, not for a child
> bio.
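
The ordering that patches 2 and 3 aim for can be sketched as below. This is a
userspace illustration only: struct sketch_bio and sketch_submit() are
hypothetical stand-ins, not the kernel's struct bio or pmem_submit_bio(), and
the flush and copy results are passed in as plain integers to keep the example
self-contained.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a bio; not the kernel's struct bio. */
struct sketch_bio {
	bool preflush;		/* models REQ_PREFLUSH */
	bool has_data;		/* false for a dataless (flush-only) bio */
	bool data_copied;	/* set once the data loop has run */
};

/*
 * flush_rc and copy_rc are supplied by the caller; real code would call
 * the flush and copy routines here instead.
 */
static int sketch_submit(struct sketch_bio *bio, int flush_rc, int copy_rc)
{
	/* A failed preflush fails the bio before any data is copied. */
	if (bio->preflush && flush_rc)
		return flush_rc;

	/* Dataless bios skip the data loop entirely. */
	if (!bio->has_data)
		return 0;

	bio->data_copied = true;
	return copy_rc;
}
```

The point of returning early is that a later, successful data copy can never
replace the preflush error with 0.
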
>
> Patches 6 and 7 are the older waiter fixes. Patch 6 wakes one -ENOSPC waiter
> for each reclaimed used buffer, and patch 7 makes the wait flags explicit
> READ_ONCE()/WRITE_ONCE() accesses. Pankaj asked for those changes to be split
> across patches, and patch 7 carries his Acked-by.
>
> Patch 8 is the original KASAN use-after-free fix for the request token
> lifetime. Patches 9 and 10 are follow-up hardening in the same completion
> path: order response publication before the submitter reads resp.ret, and keep
> the DMA_FROM_DEVICE response buffer away from CPU-owned request fields. Patch
> 11 addresses the broken virtqueue / notify failure path reported by LKP and
> reproduced locally with fault injection. It also serializes async parent-bio
> flush work against broken-state publication, so remove/freeze cannot drain the
> workqueue before a racing FUA bio queues new completion work. Patch 12 handles
> teardown: it drains requests across freeze/remove and also addresses the
> Sashiko-reported req_vq-after-free/NULL-deref class by clearing req_vq after
> del_vqs() and making the drain helper tolerate a NULL queue. It also stops the
> submit path from checking req_vq after the broken state is visible.
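
The two hardening ideas in patches 9 and 10 can be sketched with C11 atomics
and alignment specifiers. Everything here is an assumption made for
illustration: the struct layout, the sketch_* names, and the 64-byte cache
line are hypothetical, and kernel code would use its own primitives (for
example smp_store_release()/smp_load_acquire()) rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHELINE 64	/* assumed cache line size */

/* Hypothetical request layout; not the struct used by the series. */
struct sketch_req {
	atomic_bool done;	/* CPU-owned completion flag */
	/* The device-written response gets a cache line of its own. */
	_Alignas(SKETCH_CACHELINE) int32_t resp_ret;
	/* The next CPU-owned field starts on a fresh cache line. */
	_Alignas(SKETCH_CACHELINE) int cpu_tail;
};

/*
 * Completion side. The plain store stands in for the device writing the
 * response; the release store publishes it.
 */
static void sketch_complete(struct sketch_req *req, int32_t ret)
{
	req->resp_ret = ret;
	atomic_store_explicit(&req->done, true, memory_order_release);
}

/* Submitter side: read resp_ret only after observing done with acquire. */
static bool sketch_try_read(struct sketch_req *req, int32_t *ret)
{
	if (!atomic_load_explicit(&req->done, memory_order_acquire))
		return false;
	*ret = req->resp_ret;
	return true;
}
```

The release/acquire pair guarantees that a submitter which sees done set also
sees the response value, and the alignment keeps CPU writes to neighbouring
fields out of the cache line that holds the response.
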
>
> The original repros were on QEMU x86_64 with a virtio-pmem device exported
> as /dev/pmem0. For this v7 reroll, the series applies to v7.1-rc7.
>
> Thanks,
> Li Chen
>
> Changelog:
> v6->v7:
> - Address Pankaj's feedback on nvdimm_flush() error policy.
> - Preserve only -ENOMEM from provider flush callbacks and continue to map
> other provider/backend failures to -EIO.
> - Address Pankaj's feedback on the FUA flush behavior: replace the v6
> synchronous FUA path with provider-owned asynchronous parent bio completion.
> - Add NVDIMM_FLUSH_ASYNC and use ordered WQ_MEM_RECLAIM work to run
> virtio_pmem_flush() and complete the parent bio after the host flush.
> - Keep GFP_NOIO for the virtio-pmem request allocation, but no longer describe
> it as a child bio allocation fix.
> - Add Pankaj's Acked-by on the READ_ONCE()/WRITE_ONCE() patch.
> - Serialize async parent-bio flush work against broken-state publication in
> the broken-virtqueue patch, so remove/freeze cannot drain the workqueue
> before a racing FUA bio queues new completion work.
> - Fold the Sashiko-reported req_vq NULL-deref fix into the freeze/remove
> drain patch.
> - Update commit messages and this cover letter to describe patch origins.
> v5->v6:
> - Address Sashiko review feedback:
>   - Add a data-loop guard for dataless bios in pmem_submit_bio().
>   - Replace the child flush bio allocation with synchronous FUA flushing.
>   - Keep GFP_NOIO only for the virtio-pmem request allocation.
>   - Publish request completion with release/acquire ordering.
>   - Isolate the DMA_FROM_DEVICE response buffer from CPU-owned fields.
>   - Wake the in-flight host-completion waiter when marking the queue broken.
>   - Clear req_vq after del_vqs() and make drain tolerate a NULL queue.
> v4->v5:
> - Address review feedback about REQ_PREFLUSH ordering and active virtqueue
> detach.
> - Add 2/8 so a failed REQ_PREFLUSH fails the bio before any data copy, and
> make REQ_PREFLUSH use a synchronous provider flush instead of a deferred
> child bio.
> - Rework broken-queue handling so runtime failure marking only stops new
> submissions and wakes local -ENOSPC waiters; used/unused token draining is
> done after device reset in remove() and freeze().
> - Remove the broken-state shortcut from the host-completion wait so the
> submitter never reads an uninitialized response field.
> - Keep the raw broken-virtqueue dmesg in 7/8 while updating the teardown
> rationale.
> - Renumber the old virtio-pmem fixes after the new pmem PREFLUSH patch.
> v3->v4:
> - Rebase the series onto v7.1-rc7 so it applies cleanly.
> - Update the allocation site in 6/7 from kmalloc(sizeof(*req_data),
> GFP_KERNEL) to kmalloc_obj(*req_data) to match current nvdimm code.
> - Add 1/7 to preserve provider flush callback errors in nvdimm_flush().
> - Include the GFP_NOIO child flush bio allocation fix as 2/7.
> - Renumber the old request lifetime and broken virtqueue fixes after the two
> new flush error patches.
> v2->v3:
> - Split patch 1 as suggested by Pankaj Gupta: keep the waiter wakeup
> ordering change in 1/5 and move READ_ONCE()/WRITE_ONCE() updates to
> 2/5 (no functional change intended).
> - Add the log report to the commit message.
> - Fold the export fix into 4/5 to keep the series bisectable when
> CONFIG_VIRTIO_PMEM=m.
> v1->v2:
> - Add the export patch to fix a compile issue.
>
> Links:
> v6: https://lore.kernel.org/all/[email protected]/
> v5: https://lore.kernel.org/all/[email protected]/
> v4: https://lore.kernel.org/all/[email protected]/
> v3: https://lore.kernel.org/all/[email protected]/#t
> v2: https://lore.kernel.org/all/[email protected]/
> v1: https://www.spinics.net/lists/kernel/msg5974818.html
>
> Li Chen (12):
>   nvdimm: preserve flush callback -ENOMEM
>   nvdimm: pmem: keep PREFLUSH before data writes
>   nvdimm: pmem: guard data loop for dataless bios
>   nvdimm: virtio_pmem: stop allocating child flush bio
>   nvdimm: virtio_pmem: use GFP_NOIO for flush requests
>   nvdimm: virtio_pmem: always wake -ENOSPC waiters
>   nvdimm: virtio_pmem: use READ_ONCE()/WRITE_ONCE() for wait flags
>   nvdimm: virtio_pmem: refcount requests for token lifetime
>   nvdimm: virtio_pmem: publish done with release/acquire
>   nvdimm: virtio_pmem: isolate DMA request buffers
>   nvdimm: virtio_pmem: converge broken virtqueue to -EIO
>   nvdimm: virtio_pmem: drain requests in freeze
>
>  drivers/nvdimm/nd_virtio.c   | 265 +++++++++++++++++++++++++++++------
>  drivers/nvdimm/pmem.c        |  51 ++++---
>  drivers/nvdimm/region_devs.c |   5 +-
>  drivers/nvdimm/virtio_pmem.c |  65 ++++++++-
>  drivers/nvdimm/virtio_pmem.h |  22 ++-
>  include/linux/libnvdimm.h    |   9 ++
>  6 files changed, 343 insertions(+), 74 deletions(-)
>
> --
> 2.52.0