Hi Cédric,
> -----Original Message-----
> From: Cédric Le Goater <[email protected]>
> Sent: 09 January 2026 17:05
> To: Shameer Kolothum <[email protected]>; Duan, Zhenzhong
> <[email protected]>; [email protected]; qemu-
> [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; Nicolin Chen <[email protected]>; Nathan Chen
> <[email protected]>; Matt Ochs <[email protected]>; Jason Gunthorpe
> <[email protected]>; Krishnakant Jaju <[email protected]>
> Subject: Re: [PATCH 3/3] hw/vfio/region: Create dmabuf for PCI BAR per
> region
>
> External email: Use caution opening links or attachments
>
>
> On 1/8/26 12:04, Shameer Kolothum wrote:
> >
> >
> >> -----Original Message-----
> >> From: Duan, Zhenzhong <[email protected]>
> >> Sent: 08 January 2026 09:41
> >> To: Shameer Kolothum <[email protected]>; qemu-
> [email protected];
> >> [email protected]
> >> Cc: [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected]; Nicolin Chen
> >> <[email protected]>; Nathan Chen <[email protected]>; Matt Ochs
> >> <[email protected]>; Jason Gunthorpe <[email protected]>; Krishnakant
> >> Jaju <[email protected]>
> >> Subject: Re: [PATCH 3/3] hw/vfio/region: Create dmabuf for PCI BAR
> >> per region
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> On 12/22/2025 9:53 PM, Shameer Kolothum wrote:
> >>> From: Nicolin Chen <[email protected]>
> >>>
> >>> Linux now provides a VFIO dmabuf exporter to expose PCI BAR memory
> >>> for P2P use cases. Create a dmabuf for each mapped BAR region after
> >>> the mmap is set up, and store the returned fd in the region’s RAMBlock.
> >>> This allows QEMU to pass the fd to dma_map_file(), enabling iommufd
> >>> to import the dmabuf and map the BAR correctly in the host IOMMU
> >>> page
> >> table.
> >>>
> >>> If the kernel lacks support or dmabuf setup fails, QEMU skips the
> >>> setup and continues with normal mmap handling.
> >>>
> >>> Signed-off-by: Nicolin Chen <[email protected]>
> >>> Signed-off-by: Shameer Kolothum <[email protected]>
> >>> ---
> >>> hw/vfio/region.c | 57
> >> +++++++++++++++++++++++++++++++++++++++++++-
> >>> hw/vfio/trace-events | 1 +
> >>> 2 files changed, 57 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/vfio/region.c b/hw/vfio/region.c index
> >>> b165ab0b93..6949f6779c 100644
> >>> --- a/hw/vfio/region.c
> >>> +++ b/hw/vfio/region.c
> >>> @@ -29,6 +29,7 @@
> >>> #include "qemu/error-report.h"
> >>> #include "qemu/units.h"
> >>> #include "monitor/monitor.h"
> >>> +#include "system/ramblock.h"
> >>> #include "vfio-helpers.h"
> >>>
> >>> /*
> >>> @@ -238,13 +239,52 @@ static void vfio_subregion_unmap(VFIORegion
> >> *region, int index)
> >>> region->mmaps[index].mmap = NULL;
> >>> }
> >>>
> >>> +static int vfio_region_create_dma_buf(VFIORegion *region) {
> >>> + g_autofree struct vfio_device_feature *feature = NULL;
> >>> + VFIODevice *vbasedev = region->vbasedev;
> >>> + struct vfio_device_feature_dma_buf *dma_buf;
> >>> + size_t total_size;
> >>> + int i, ret;
> >>> +
> >>> + g_assert(region->nr_mmaps);
> >>> +
> >>> + total_size = sizeof(*feature) + sizeof(*dma_buf) +
> >>> + sizeof(struct vfio_region_dma_range) * region->nr_mmaps;
> >>> + feature = g_malloc0(total_size);
> >>> + *feature = (struct vfio_device_feature) {
> >>> + .argsz = total_size,
> >>> + .flags = VFIO_DEVICE_FEATURE_GET |
> >> VFIO_DEVICE_FEATURE_DMA_BUF,
> >>> + };
> >>> +
> >>> + dma_buf = (void *)feature->data;
> >>> + *dma_buf = (struct vfio_device_feature_dma_buf) {
> >>> + .region_index = region->nr,
> >>> + .open_flags = O_RDWR,
> >>> + .nr_ranges = region->nr_mmaps,
> >>> + };
> >>> +
> >>> + for (i = 0; i < region->nr_mmaps; i++) {
> >>> + dma_buf->dma_ranges[i].offset = region->mmaps[i].offset;
> >>> + dma_buf->dma_ranges[i].length = region->mmaps[i].size;
> >>> + }
> >>> +
> >>> + ret = vbasedev->io_ops->device_feature(vbasedev, feature);
> >>
> >> vbasedev->io_ops->device_feature may be NULL for other backend like
> >> vbasedev->vfio-
> >> user.
> >
> > Ah..Ok. I will add a check.
>
> Could you please add a global routine :
>
> int vfio_device_get_feature(VFIODevice *vbasedev, struct
> vfio_device_feature *feature)
Ok.
>
>
> >
> >>
> >>> + for (i = 0; i < region->nr_mmaps; i++) {
> >>> + trace_vfio_region_dmabuf(region->vbasedev->name, ret, region->nr,
> >>> + region->mem->name,
> >>> region->mmaps[i].offset,
> >>> + region->mmaps[i].size);
> >>> + }
> >>> + return ret;
> >>> +}
> >>> +
> >>> int vfio_region_mmap(VFIORegion *region)
> >>> {
> >>> int i, ret, prot = 0;
> >>> char *name;
> >>> int fd;
> >>>
> >>> - if (!region->mem) {
> >>> + if (!region->mem || !region->nr_mmaps) {
> >>
> >> Just curious, when will above check return true?
> > I think `!region->mem` covers cases where no MemoryRegion was created
> > (e.g. zero sized regions). And nr_mmaps checks regions with mmap
> > support exists (VFIO_REGION_INFO_FLAG_MMAP/ _CAP_SPARSE_MMAP).
> >
> >>
> >>> return 0;
> >>> }
> >>>
> >>> @@ -305,6 +345,21 @@ int vfio_region_mmap(VFIORegion *region)
> >>> region->mmaps[i].size - 1);
> >>> }
> >>>
> >>> + ret = vfio_region_create_dma_buf(region);
> >>> + if (ret < 0) {
> >>> + if (ret == -ENOTTY) {
> >>> + warn_report_once("VFIO dmabuf not supported in kernel");
> >>> + } else {
> >>> + error_report("%s: failed to create dmabuf: %s",
> >>> + memory_region_name(region->mem),
> >>> + strerror(errno));
>
> Shouldn't we return 'ret' in this case ?
That would result in:
Failed to mmap 0018:06:00.0 BAR 0. Performance may be slow
Not sure that error msg is correct in this context. If we don't return 'ret'
here
vfio_container_dma_map() will eventually report the warning:
qemu-system-aarch64: warning: vfio_container_dma_map(0xaaaaff67ad40,
0x58000000000, 0xb90000, 0xffff64000000) = -14 (Bad address)
0018:06:00.0: PCI peer-to-peer transactions on BARs are not supported.
qemu-system-aarch64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
I think the above is good enough for this. Please let me know.
Thanks,
Shameer