All ram device regions were made indirectly accessible by commit
4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This causes
a guest hang when an NVIDIA GH100 GPU is passed through from the host. The
memory in its PCI BAR#4 can be allocated as a DMA target buffer, so QEMU
has to use the DMA bounce buffer in address_space_map() to cover the DMA
request. However, the bounce buffer is only 4096 bytes, and it is easily
overrun when the guest generates significant disk activity while compiling
'cuda-samples'. The full log and problem description can be found in the
commit log of PATCH[1/2].

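The difference between the two mapping paths can be pictured with a toy
model. This is only a sketch under stated assumptions: toy_region,
toy_map() and TOY_BOUNCE_BUDGET are hypothetical names, not QEMU code, and
the only figure taken from this series is the 4096-byte bounce buffer size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BOUNCE_BUDGET 4096u     /* size quoted in this cover letter */

/* Hypothetical stand-in for a memory region; not QEMU's MemoryRegion. */
struct toy_region {
    uint8_t *host;      /* host mapping backing the region */
    size_t size;        /* region size in bytes */
    bool direct;        /* true: callers may use the host pointer as-is */
};

static uint8_t toy_bounce[TOY_BOUNCE_BUDGET];
static size_t toy_bounce_used;

/*
 * Map [off, off + *len) of a region as a DMA target. A directly
 * accessible region hands back its host pointer and consumes nothing.
 * Any other region takes space from the shared bounce budget: *len is
 * truncated to what is left, and NULL is returned once nothing is left.
 */
static void *toy_map(struct toy_region *r, size_t off, size_t *len)
{
    size_t left;
    void *p;

    if (off >= r->size) {
        return NULL;
    }
    if (*len > r->size - off) {
        *len = r->size - off;
    }
    if (r->direct) {
        return r->host + off;
    }

    left = TOY_BOUNCE_BUDGET - toy_bounce_used;
    if (left == 0) {
        *len = 0;
        return NULL;
    }
    if (*len > left) {
        *len = left;
    }
    p = toy_bounce + toy_bounce_used;
    toy_bounce_used += *len;
    return p;
}
```

In this model a single 8192-byte request against an indirect region is
truncated to 4096 bytes and the next request fails, while the same request
against a direct region succeeds in full.
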
Try to fix the issue handled by commit 4a2e242bbb in a different way, by
replacing mem{cpy, move} with __builtin_mem{cpy, move} in the accessors of
the ram device regions. With this, that commit can essentially be reverted
so that ram device regions are directly accessible again, bypassing the
bounce buffer in address_space_map() where the guest hang originates.

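As a rough illustration of the idea, a fixed-size copy helper built on the
compiler builtins might look like the sketch below. ram_device_copy() is a
hypothetical name and not the accessor added by PATCH[1]. The sketch
assumes GCC or Clang, and it assumes that a builtin called with a constant
size is typically expanded inline instead of calling the C library; the
compiler does not guarantee that.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only. Each case passes a
 * compile-time constant size so that the compiler has the chance to
 * expand the builtin inline.
 */
static inline void ram_device_copy(void *dst, const void *src, size_t len)
{
    switch (len) {
    case 1:
        __builtin_memcpy(dst, src, 1);
        break;
    case 2:
        __builtin_memcpy(dst, src, 2);
        break;
    case 4:
        __builtin_memcpy(dst, src, 4);
        break;
    case 8:
        __builtin_memcpy(dst, src, 8);
        break;
    default:
        /* Other sizes: memmove semantics, safe for overlapping ranges. */
        __builtin_memmove(dst, src, len);
        break;
    }
}
```

A caller would use it like memcpy(), for example
ram_device_copy(&val, ptr, sizeof(val)) with a uint64_t val.
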
PATCH[1] replaces mem{cpy, move} with __builtin_mem{cpy, move}
PATCH[2] makes ram device region directly accessible again

Changelog
=========
RFCv1 -> v1:
  * https://lists.nongnu.org/archive/html/qemu-arm/2026-06/msg00307.html
  * Reworked solution based on suggestions from Peter Xu, Peter Maydell
    and Michael S. Tsirkin

Gavin Shan (2):
  system/memory: Use __builtin_mem{cpy, move} in accessors of ram device
    region
  system/memory: Make ram device region directly accessible

 hw/remote/vfio-user-obj.c |  4 +--
 include/system/memory.h   | 53 +++++++++++++++++++++++++++++++--------
 system/memory.c           | 41 +-----------------------------
 system/physmem.c          |  8 +++---
 system/trace-events       |  2 --
 5 files changed, 50 insertions(+), 58 deletions(-)

--
2.54.0