> -----Original Message-----
> From: Duan, Zhenzhong <[email protected]>
> Sent: 08 January 2026 09:41
> To: Shameer Kolothum <[email protected]>; qemu-
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Nicolin Chen <[email protected]>;
> Nathan Chen <[email protected]>; Matt Ochs <[email protected]>;
> Jason Gunthorpe <[email protected]>; Krishnakant Jaju <[email protected]>
> Subject: Re: [PATCH 3/3] hw/vfio/region: Create dmabuf for PCI BAR per
> region
>
> External email: Use caution opening links or attachments
>
>
> On 12/22/2025 9:53 PM, Shameer Kolothum wrote:
> > From: Nicolin Chen <[email protected]>
> >
> > Linux now provides a VFIO dmabuf exporter to expose PCI BAR memory for
> > P2P use cases. Create a dmabuf for each mapped BAR region after the
> > mmap is set up, and store the returned fd in the region’s RAMBlock.
> > This allows QEMU to pass the fd to dma_map_file(), enabling iommufd to
> > import the dmabuf and map the BAR correctly in the host IOMMU page
> table.
> >
> > If the kernel lacks support or dmabuf setup fails, QEMU skips the
> > setup and continues with normal mmap handling.
> >
> > Signed-off-by: Nicolin Chen <[email protected]>
> > Signed-off-by: Shameer Kolothum <[email protected]>
> > ---
> > hw/vfio/region.c | 57
> +++++++++++++++++++++++++++++++++++++++++++-
> > hw/vfio/trace-events | 1 +
> > 2 files changed, 57 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/vfio/region.c b/hw/vfio/region.c index
> > b165ab0b93..6949f6779c 100644
> > --- a/hw/vfio/region.c
> > +++ b/hw/vfio/region.c
> > @@ -29,6 +29,7 @@
> > #include "qemu/error-report.h"
> > #include "qemu/units.h"
> > #include "monitor/monitor.h"
> > +#include "system/ramblock.h"
> > #include "vfio-helpers.h"
> >
> > /*
> > @@ -238,13 +239,52 @@ static void vfio_subregion_unmap(VFIORegion
> *region, int index)
> > region->mmaps[index].mmap = NULL;
> > }
> >
> > +static int vfio_region_create_dma_buf(VFIORegion *region) {
> > + g_autofree struct vfio_device_feature *feature = NULL;
> > + VFIODevice *vbasedev = region->vbasedev;
> > + struct vfio_device_feature_dma_buf *dma_buf;
> > + size_t total_size;
> > + int i, ret;
> > +
> > + g_assert(region->nr_mmaps);
> > +
> > + total_size = sizeof(*feature) + sizeof(*dma_buf) +
> > + sizeof(struct vfio_region_dma_range) * region->nr_mmaps;
> > + feature = g_malloc0(total_size);
> > + *feature = (struct vfio_device_feature) {
> > + .argsz = total_size,
> > + .flags = VFIO_DEVICE_FEATURE_GET |
> VFIO_DEVICE_FEATURE_DMA_BUF,
> > + };
> > +
> > + dma_buf = (void *)feature->data;
> > + *dma_buf = (struct vfio_device_feature_dma_buf) {
> > + .region_index = region->nr,
> > + .open_flags = O_RDWR,
> > + .nr_ranges = region->nr_mmaps,
> > + };
> > +
> > + for (i = 0; i < region->nr_mmaps; i++) {
> > + dma_buf->dma_ranges[i].offset = region->mmaps[i].offset;
> > + dma_buf->dma_ranges[i].length = region->mmaps[i].size;
> > + }
> > +
> > + ret = vbasedev->io_ops->device_feature(vbasedev, feature);
>
> vbasedev->io_ops->device_feature may be NULL for other backend like vfio-
> user.
Ah..Ok. I will add a check.
>
> > + for (i = 0; i < region->nr_mmaps; i++) {
> > + trace_vfio_region_dmabuf(region->vbasedev->name, ret, region->nr,
> > + region->mem->name,
> > region->mmaps[i].offset,
> > + region->mmaps[i].size);
> > + }
> > + return ret;
> > +}
> > +
> > int vfio_region_mmap(VFIORegion *region)
> > {
> > int i, ret, prot = 0;
> > char *name;
> > int fd;
> >
> > - if (!region->mem) {
> > + if (!region->mem || !region->nr_mmaps) {
>
> Just curious, when will above check return true?
I think `!region->mem` covers cases where no MemoryRegion was created
(e.g. zero sized regions). And nr_mmaps checks regions with mmap
support exists (VFIO_REGION_INFO_FLAG_MMAP/ _CAP_SPARSE_MMAP).
>
> > return 0;
> > }
> >
> > @@ -305,6 +345,21 @@ int vfio_region_mmap(VFIORegion *region)
> > region->mmaps[i].size - 1);
> > }
> >
> > + ret = vfio_region_create_dma_buf(region);
> > + if (ret < 0) {
> > + if (ret == -ENOTTY) {
> > + warn_report_once("VFIO dmabuf not supported in kernel");
> > + } else {
> > + error_report("%s: failed to create dmabuf: %s",
> > + memory_region_name(region->mem), strerror(errno));
> > + }
> > + } else {
> > + MemoryRegion *mr = ®ion->mmaps[0].mem;
>
> Do we need to support region->mmaps[1]?
My understanding is all region->mmaps[] entries for a VFIO region share
the same RAMBlock. And the kernel returns a single dmabuf fd per region,
not per subrange.
Thanks,
Shameer
>
> Thanks
>
> Zhenzhong
>
> > + RAMBlock *ram_block = mr->ram_block;
> > +
> > + ram_block->fd = ret;
> > + }
> > +
> > return 0;
> >
> > no_mmap:
> > diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index
> > 1e895448cd..592a0349d4 100644
> > --- a/hw/vfio/trace-events
> > +++ b/hw/vfio/trace-events
> > @@ -117,6 +117,7 @@ vfio_device_put(int fd) "close vdev->fd=%d"
> > vfio_region_write(const char *name, int index, uint64_t addr, uint64_t
> data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
> > vfio_region_read(char *name, int index, uint64_t addr, unsigned size,
> uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
> > vfio_region_setup(const char *dev, int index, const char *name, unsigned
> long flags, unsigned long offset, unsigned long size) "Device %s, region %d
> \"%s\", flags: 0x%lx, offset: 0x%lx, size: 0x%lx"
> > +vfio_region_dmabuf(const char *dev, int fd, int index, const char *name,
> unsigned long offset, unsigned long size) "Device %s, dmabuf fd %d region %d
> \"%s\", offset: 0x%lx, size: 0x%lx"
> > vfio_region_mmap_fault(const char *name, int index, unsigned long offset,
> unsigned long size, int fault) "Region %s mmaps[%d], [0x%lx - 0x%lx], fault:
> %d"
> > vfio_region_mmap(const char *name, unsigned long offset, unsigned long
> end) "Region %s [0x%lx - 0x%lx]"
> > vfio_region_exit(const char *name, int index) "Device %s, region %d"