Hi Alex,
On 23/04/2026 22:30, Alex Williamson wrote:
On Thu, 23 Apr 2026 11:25:07 -0700
Matt Evans <[email protected]> wrote:
Previously BAR resource requests and the corresponding pci_iomap()
were performed on-demand and without synchronisation, which was racy.
Rather than add synchronisation, it's simplest to address this by
doing both activities from vfio_pci_core_enable().
The resource allocation and/or pci_iomap() can still fail; their
status is tracked and existing calls to vfio_pci_core_setup_barmap()
will fail in the same way as before. This keeps the point of failure
as observed by userspace the same, i.e. failures to request/map unused
BARs are benign.
Fixes: 7f5764e179c6 ("vfio: use vfio_pci_core_setup_barmap to map bar in mmap")
Fixes: 0d77ed3589ac0 ("vfio/pci: Pull BAR mapping setup from read-write path")
Signed-off-by: Matt Evans <[email protected]>
---
drivers/vfio/pci/vfio_pci_core.c | 61 +++++++++++++++++++++++++++-----
drivers/vfio/pci/vfio_pci_rdwr.c | 29 ++++++---------
include/linux/vfio_pci_core.h | 1 +
3 files changed, 64 insertions(+), 27 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3f8d093aacf8..c59c61861d81 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -482,6 +482,55 @@ static int vfio_pci_core_runtime_resume(struct device *dev)
}
#endif /* CONFIG_PM */
+static void __vfio_pci_core_unmap_bars(struct vfio_pci_core_device *vdev)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ int i;
+
+ for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+ int bar = i + PCI_STD_RESOURCES;
+
+ if (vdev->barmap[bar])
+ pci_iounmap(pdev, vdev->barmap[bar]);
+ if (vdev->have_bar_resource[bar])
+ pci_release_selected_regions(pdev, 1 << bar);
+ vdev->barmap[bar] = NULL;
+ vdev->have_bar_resource[bar] = false;
+ }
+}
+
+static void __vfio_pci_core_map_bars(struct vfio_pci_core_device *vdev)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ int i;
+
+ /*
+ * Eager-request BAR resources, and iomap; soft failures are
+ * allowed, and consumers must check before use.
+ */
I'd use this to describe that soft failures keep the error signatures
compatible with the previously used on-demand mapping.
Done.
+ for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+ int ret;
+ int bar = i + PCI_STD_RESOURCES;
+ void __iomem *io;
Reverse Christmas tree ordering.
Done.
+
+ if (pci_resource_len(pdev, i) == 0)
+ continue;
+
+ ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+ if (ret) {
+ pci_warn(vdev->pdev, "Failed to reserve region %d\n", bar);
+ continue;
+ }
+ vdev->have_bar_resource[bar] = true;
+
+ io = pci_iomap(pdev, bar, 0);
+ if (io)
+ vdev->barmap[bar] = io;
+ else
+ pci_warn(vdev->pdev, "Failed to iomap region %d\n", bar);
+ }
+}
I see you making the point in the cover letter about the resource
request vs the iomap resource, but we currently handle these together.
If either fails, setup barmap fails and the path returns error. I
don't see any justification for now allowing the request resource to
succeed but the iomap fails.
The primary effect was to let consumers see -EBUSY for a resource
reservation failure or -ENOMEM for an iomap failure (whether through
this patch's vfio_pci_core_setup_barmap() or the next patch's helpers),
and that keeps the error signatures identical.
A weak secondary effect was that a BAR whose resource is reserved but
which fails, for whatever reason, to iomap can still be used by most
clients (assuming the general usage is to mmap). The system's pretty sick
if this is the case, so, as I say, it's a weak argument.
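To illustrate the split tracking described above, here is a minimal user-space sketch, not the patch itself. The struct and both helper names (bar_state, bar_check_resource(), bar_check_mapping()) are hypothetical stand-ins; only the -EBUSY/-ENOMEM split is taken from the discussion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BARS 6 /* stands in for PCI_STD_NUM_BARS */

/* Hypothetical user-space stand-in for the per-device BAR state. */
struct bar_state {
	void *barmap[NUM_BARS];           /* non-NULL once the iomap succeeded */
	bool have_bar_resource[NUM_BARS]; /* true once the region was reserved */
};

/* Hypothetical helper: the caller only needs the region reserved (e.g. mmap). */
static int bar_check_resource(const struct bar_state *s, int bar)
{
	if (bar < 0 || bar >= NUM_BARS)
		return -EINVAL;
	return s->have_bar_resource[bar] ? 0 : -EBUSY;
}

/* Hypothetical helper: the caller needs the kernel mapping (e.g. read/write). */
static int bar_check_mapping(const struct bar_state *s, int bar)
{
	int ret = bar_check_resource(s, bar);

	if (ret)
		return ret; /* reservation failure is reported first */
	return s->barmap[bar] ? 0 : -ENOMEM;
}
```

With this shape, a reservation failure surfaces as -EBUSY and an iomap failure as -ENOMEM, matching what the on-demand path returned.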
OK, if you prefer the combined approach and don't feel the subsequent
single-semantic check helpers (need mapping, need resource) are clearer
to read then I'll recombine them, though:
- If vfio_pci_core_map_bars() just sets barmap[n] iff both resource
acquisition and iomap succeed, then a later check can only return one
error from either cause. I'll go with -ENOMEM unless you prefer -EBUSY.
Using something else can again make userspace see previously-unseen
error values.
- IMHO vfio_pci_core_setup_barmap() should still be renamed (in a 2nd
patch) since it doesn't do any setting up anymore. Cosmetic, but
cleaner to parse when the callers use vfio_pci_core_check_barmap_valid(), no?
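For comparison, the combined approach in the two points above could look roughly like the following user-space mock. It is a sketch under assumptions: mock_dev, mock_map_bars() and mock_check_barmap_valid() are hypothetical names, and the request/iomap steps are simulated with flags rather than real PCI calls.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BARS 6 /* stands in for PCI_STD_NUM_BARS */

/* Hypothetical mock device: flags say which step succeeds for each BAR. */
struct mock_dev {
	bool request_ok[NUM_BARS]; /* simulated region request result */
	bool iomap_ok[NUM_BARS];   /* simulated iomap result */
	bool reserved[NUM_BARS];   /* region currently held */
	char backing[NUM_BARS];    /* fake mapping targets */
	void *barmap[NUM_BARS];    /* set iff request AND iomap succeeded */
};

/* Eagerly set up every BAR; a failure in either step leaves barmap NULL. */
static void mock_map_bars(struct mock_dev *d)
{
	for (int bar = 0; bar < NUM_BARS; bar++) {
		if (!d->request_ok[bar])
			continue;
		d->reserved[bar] = true;
		if (!d->iomap_ok[bar]) {
			d->reserved[bar] = false; /* release on iomap failure */
			continue;
		}
		d->barmap[bar] = &d->backing[bar];
	}
}

/* Single check: both failure causes collapse into one error value. */
static int mock_check_barmap_valid(const struct mock_dev *d, int bar)
{
	if (bar < 0 || bar >= NUM_BARS)
		return -EINVAL;
	return d->barmap[bar] ? 0 : -ENOMEM;
}
```

The trade-off is visible in the last function: with only barmap[] to consult, the caller can no longer tell a reservation failure from an iomap failure.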
These functions also don't need the double-underscore prefix.
Done.
+
/*
* The pci-driver core runtime PM routines always save the device state
* before going into suspended state. If the device is going into low power
@@ -568,6 +617,7 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
vdev->has_vga = true;
+ __vfio_pci_core_map_bars(vdev);
return 0;
@@ -591,7 +641,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_dummy_resource *dummy_res, *tmp;
struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp;
- int i, bar;
+ int i;
/* For needs_reset */
lockdep_assert_held(&vdev->vdev.dev_set->lock);
@@ -646,14 +696,7 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
vfio_config_free(vdev);
- for (i = 0; i < PCI_STD_NUM_BARS; i++) {
- bar = i + PCI_STD_RESOURCES;
- if (!vdev->barmap[bar])
- continue;
- pci_iounmap(pdev, vdev->barmap[bar]);
- pci_release_selected_regions(pdev, 1 << bar);
- vdev->barmap[bar] = NULL;
- }
+ __vfio_pci_core_unmap_bars(vdev);
I expect this doesn't need to change if we drop the separation between
resources and iomap.
OK, restored.
list_for_each_entry_safe(dummy_res, tmp,
&vdev->dummy_resources_list, res_next) {
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 4251ee03e146..bf7152316db4 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -200,25 +200,18 @@ EXPORT_SYMBOL_GPL(vfio_pci_core_do_io_rw);
int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
{
- struct pci_dev *pdev = vdev->pdev;
- int ret;
- void __iomem *io;
-
- if (vdev->barmap[bar])
- return 0;
-
- ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
- if (ret)
- return ret;
-
- io = pci_iomap(pdev, bar, 0);
- if (!io) {
- pci_release_selected_regions(pdev, 1 << bar);
+ /*
+ * The barmap is now always set up in vfio_pci_core_enable().
"now" is going to read strangely very quickly.
Hm, yeah, fixed.
+ * Some legacy callers use this function to ensure the BAR
+ * resources are requested, and others to ensure the
+ * pci_iomap() was done, so check here:
+ */
+ if (bar < 0 || bar >= PCI_STD_NUM_BARS)
+ return -EINVAL;
+ if (vdev->barmap[bar] == 0)
return -ENOMEM;
- }
-
- vdev->barmap[bar] = io;
-
+ if (!vdev->bar_has_rsrc[bar])
Typo, this won't incrementally compile. Thanks,
Fixed.
Alex, thanks for all your comments so far. I realise this is a pretty
noddy fix, but it's good to get it clean.
Matt
Alex
+ return -EBUSY;
return 0;
}
EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 2ebba746c18f..1f508b067d82 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -101,6 +101,7 @@ struct vfio_pci_core_device {
const struct vfio_pci_device_ops *pci_ops;
void __iomem *barmap[PCI_STD_NUM_BARS];
bool bar_mmap_supported[PCI_STD_NUM_BARS];
+ bool have_bar_resource[PCI_STD_NUM_BARS];
u8 *pci_config_map;
u8 *vconfig;
struct perm_bits *msi_perm;