On 5/27/26 05:14, Chenyi Qiang wrote:
On 5/27/2026 3:44 AM, Cédric Le Goater wrote:
Hello Chenyi,
On 5/19/26 07:57, Chenyi Qiang wrote:
vfio_container_dma_map() uses dma_map_file() whenever a RAMBlock has an
fd and the VFIO IOMMU backend supports file-based DMA mapping. That is
not correct for private file-backed RAM.
dma_map_file() resolves PFNs from the backing file, but private
mappings can run on different PFNs than the file itself. As a result,
using dma_map_file() on a private RAMBlock can program DMA against pages
that do not back QEMU's actual guest memory.
This was observed with hugetlbfs-backed guest memory and iommufd/VFIO:
share=on works, while share=off can fault because the file-backed PFNs
can diverge from the PFNs backing QEMU's private mapping.
Fix this by using dma_map_file() only for shared RAMBlocks.
Fixes: fb32965b6dd8 ("vfio/iommufd: use IOMMU_IOAS_MAP_FILE")
Reported-by: Farrah Chen <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220776
Signed-off-by: Chenyi Qiang <[email protected]>
---
hw/vfio/container.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 4c2816b574..c5a3c60a27 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -82,7 +82,7 @@ int vfio_container_dma_map(VFIOContainer *bcontainer,
RAMBlock *rb = mr->ram_block;
int mfd = rb ? qemu_ram_get_fd(rb) : -1;
- if (mfd >= 0 && vioc->dma_map_file) {
+ if (mfd >= 0 && vioc->dma_map_file && qemu_ram_is_shared(rb)) {
I think we should introduce an helper to check that the conditions
for using the .dma_map_file() handler are met. Something like :
if (vfio_container_can_dma_map_file(bcontainer, ...)) {
RAMBlock *rb = mr->ram_block;
and
/*
* We can use IOMMU DMA mapping (IOMMU_IOAS_MAP_FILE) for :
*
* 1) Guest RAM blocks explicitly configured as shared (MAP_SHARED)
* 2) RAM device sub-regions (MMIO BARs)
*
* Private RAM mappings (MAP_PRIVATE) are strictly excluded. Because
* they are subject to copy-on-write (COW) anomalies, their
* underlying PFNs can permanently diverge from the backing file
*/
bool vfio_container_can_dma_map_file(VFIOContainer *bcontainer, ...)
{
return ...
}
How's that ?
Thanks for the review. I'm fine with introducing the helper.
But I'm not sure if it's cleaner to include duplicated calculations in both the
caller and helper.
+static bool vfio_container_can_dma_map_file(VFIOContainer *bcontainer, RAMBlock *rb)
+{
+ VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+
+ return rb && qemu_ram_get_fd(rb) >= 0 && vioc->dma_map_file &&
+ qemu_ram_is_shared(rb);
+}
+
int vfio_container_dma_map(VFIOContainer *bcontainer, ...)
{
...
RAMBlock *rb = mr->ram_block;
int mfd = rb ? qemu_ram_get_fd(rb) : -1;
- if (mfd >= 0 && vioc->dma_map_file && qemu_ram_is_shared(rb)) {
+ if (vfio_container_can_dma_map_file(bcontainer, rb)) {
unsigned long start = vaddr - qemu_ram_get_host_addr(rb);
unsigned long offset = qemu_ram_get_fd_offset(rb);
Perhaps it can be simplified slightly.
I was thinking moving the check on qemu_ram_get_fd(rb) within the
new routine. Something like :
static bool vfio_container_can_dma_map_file(VFIOContainer *bcontainer,
MemoryRegion *mr, int *fd)
{
VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
RAMBlock *rb = mr->ram_block;
if (!vioc->dma_map_file || !rb) {
return false;
}
*fd = qemu_ram_get_fd(rb);
if (*fd < 0) {
return false;
}
/*
* We can use IOMMU DMA mapping (IOMMU_IOAS_MAP_FILE) for :
*
* 1) Guest RAM blocks explicitly configured as shared (MAP_SHARED)
* 2) RAM device sub-regions (MMIO BARs)
*
* Private RAM mappings (MAP_PRIVATE) are strictly excluded. Because
* they are subject to copy-on-write (COW) anomalies, their underlying
* PFNs can permanently diverge from the backing file
*/
return qemu_ram_is_shared(rb) || memory_region_is_ram_device(mr);
}
or simpler :
static bool vfio_container_can_dma_map_file(MemoryRegion *mr, int *fd)
{
RAMBlock *rb = mr->ram_block;
if (!rb) {
return false;
}
*fd = qemu_ram_get_fd(rb);
if (*fd < 0) {
return false;
}
/* ... */
return qemu_ram_is_shared(rb) || memory_region_is_ram_device(mr);
}
I think option 2. is best. you choose.
Thanks,
C.