> -----Original Message-----
> From: Duan, Zhenzhong <[email protected]>
> Sent: 09 January 2026 10:13
> To: Shameer Kolothum <[email protected]>;
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Nicolin Chen <[email protected]>;
> Nathan Chen <[email protected]>; Matt Ochs <[email protected]>;
> Jason Gunthorpe <[email protected]>; Krishnakant Jaju <[email protected]>
> Subject: RE: [PATCH 3/3] hw/vfio/region: Create dmabuf for PCI BAR per region
> 
> External email: Use caution opening links or attachments
> 
> 
> >-----Original Message-----
> >From: Shameer Kolothum <[email protected]>
> >Subject: RE: [PATCH 3/3] hw/vfio/region: Create dmabuf for PCI BAR per region
> >
> >
> >
> >> -----Original Message-----
> >> From: Duan, Zhenzhong <[email protected]>
> >> Sent: 08 January 2026 09:41
> >> To: Shameer Kolothum <[email protected]>;
> >> [email protected]; [email protected]
> >> Cc: [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected]; Nicolin Chen <[email protected]>;
> >> Nathan Chen <[email protected]>; Matt Ochs <[email protected]>;
> >> Jason Gunthorpe <[email protected]>; Krishnakant Jaju <[email protected]>
> >> Subject: Re: [PATCH 3/3] hw/vfio/region: Create dmabuf for PCI BAR per region
> >>
> >>
> >> On 12/22/2025 9:53 PM, Shameer Kolothum wrote:
> >> > From: Nicolin Chen <[email protected]>
> >> >
> >> > Linux now provides a VFIO dmabuf exporter to expose PCI BAR memory
> >> > for P2P use cases. Create a dmabuf for each mapped BAR region after
> >> > the mmap is set up, and store the returned fd in the region's RAMBlock.
> >> > This allows QEMU to pass the fd to dma_map_file(), enabling iommufd
> >> > to import the dmabuf and map the BAR correctly in the host IOMMU
> >> > page
> >> table.
> >> >
> >> > If the kernel lacks support or dmabuf setup fails, QEMU skips the
> >> > setup and continues with normal mmap handling.
> >> >
> >> > Signed-off-by: Nicolin Chen <[email protected]>
> >> > Signed-off-by: Shameer Kolothum <[email protected]>
> >> > ---
> >> >   hw/vfio/region.c     | 57 +++++++++++++++++++++++++++++++++++++++++++++-
> >> >   hw/vfio/trace-events |  1 +
> >> >   2 files changed, 57 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/hw/vfio/region.c b/hw/vfio/region.c
> >> > index b165ab0b93..6949f6779c 100644
> >> > --- a/hw/vfio/region.c
> >> > +++ b/hw/vfio/region.c
> >> > @@ -29,6 +29,7 @@
> >> >   #include "qemu/error-report.h"
> >> >   #include "qemu/units.h"
> >> >   #include "monitor/monitor.h"
> >> > +#include "system/ramblock.h"
> >> >   #include "vfio-helpers.h"
> >> >
> >> >   /*
> >> > @@ -238,13 +239,52 @@ static void vfio_subregion_unmap(VFIORegion *region, int index)
> >> >       region->mmaps[index].mmap = NULL;
> >> >   }
> >> >
> >> > +static int vfio_region_create_dma_buf(VFIORegion *region)
> >> > +{
> >> > +    g_autofree struct vfio_device_feature *feature = NULL;
> >> > +    VFIODevice *vbasedev = region->vbasedev;
> >> > +    struct vfio_device_feature_dma_buf *dma_buf;
> >> > +    size_t total_size;
> >> > +    int i, ret;
> >> > +
> >> > +    g_assert(region->nr_mmaps);
> >> > +
> >> > +    total_size = sizeof(*feature) + sizeof(*dma_buf) +
> >> > +                 sizeof(struct vfio_region_dma_range) * region->nr_mmaps;
> >> > +    feature = g_malloc0(total_size);
> >> > +    *feature = (struct vfio_device_feature) {
> >> > +        .argsz = total_size,
> >> > +        .flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF,
> >> > +    };
> >> > +
> >> > +    dma_buf = (void *)feature->data;
> >> > +    *dma_buf = (struct vfio_device_feature_dma_buf) {
> >> > +        .region_index = region->nr,
> >> > +        .open_flags = O_RDWR,
> >> > +        .nr_ranges = region->nr_mmaps,
> >> > +    };
> >> > +
> >> > +    for (i = 0; i < region->nr_mmaps; i++) {
> >> > +        dma_buf->dma_ranges[i].offset = region->mmaps[i].offset;
> >> > +        dma_buf->dma_ranges[i].length = region->mmaps[i].size;
> >> > +    }
> >> > +
> >> > +    ret = vbasedev->io_ops->device_feature(vbasedev, feature);
> >>
> >> vbasedev->io_ops->device_feature may be NULL for other backends like
> >> vfio-user.
> >
> >Ah..Ok. I will add a check.
> >
> >>
> >> > +    for (i = 0; i < region->nr_mmaps; i++) {
> >> > +        trace_vfio_region_dmabuf(region->vbasedev->name, ret, region->nr,
> >> > +                                 region->mem->name, region->mmaps[i].offset,
> >> > +                                 region->mmaps[i].size);
> >> > +    }
> >> > +    return ret;
> >> > +}
> >> > +
> >> >   int vfio_region_mmap(VFIORegion *region)
> >> >   {
> >> >       int i, ret, prot = 0;
> >> >       char *name;
> >> >       int fd;
> >> >
> >> > -    if (!region->mem) {
> >> > +    if (!region->mem || !region->nr_mmaps) {
> >>
> >> Just curious, when will above check return true?
> >I think `!region->mem` covers cases where no MemoryRegion was created
> >(e.g. zero-sized regions). And the nr_mmaps check covers regions without
> >mmap support (VFIO_REGION_INFO_FLAG_MMAP/_CAP_SPARSE_MMAP).
> 
> Understood, thanks.
> 
> >
> >>
> >> >           return 0;
> >> >       }
> >> >
> >> > @@ -305,6 +345,21 @@ int vfio_region_mmap(VFIORegion *region)
> >> >                                  region->mmaps[i].size - 1);
> >> >       }
> >> >
> >> > +    ret = vfio_region_create_dma_buf(region);
> >> > +    if (ret < 0) {
> >> > +        if (ret == -ENOTTY) {
> >> > +            warn_report_once("VFIO dmabuf not supported in kernel");
> >> > +        } else {
> >> > +            error_report("%s: failed to create dmabuf: %s",
> >> > +                         memory_region_name(region->mem), strerror(errno));
> >> > +        }
> >> > +    } else {
> >> > +        MemoryRegion *mr = &region->mmaps[0].mem;
> >>
> >> Do we need to support region->mmaps[1]?
> >
> >My understanding is all region->mmaps[] entries for a VFIO region share
> >the same RAMBlock. And the kernel returns a single dmabuf fd per
> >region, not per subrange.
> 
> I don't get it. Can a RAMBlock have holes?

Yes, a RAMBlock can effectively have holes, but in this context
that is not what is happening.

IIUC, for a VFIO PCI BAR region, all region->mmaps[] entries
correspond to subranges of the same BAR and are backed by the
same MemoryRegion and therefore the same RAMBlock. The sparse
mmap layout (nr_mmaps > 1) exists to describe which parts of the
BAR are mappable, not to represent distinct backing memory objects.

So while sparse regions may look like "holes" at the mmap level, there
are no holes in the RAMBlock abstraction itself. All region->mmaps[]
entries share the same RAMBlock, which is why attaching the returned
dmabuf fd to region->mmaps[0].mem.ram_block is sufficient, I think.

However, it's possible I'm missing the case you are concerned about here.
Please let me know.

Thanks,
Shameer
