All ram device regions was turned to be indirectly accessible by commit
4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This leads
to a hanged guest where a NVidia GH100 GPU is passed from host. The memory
in its PCI BAR#4 can be allocated as DMA target buffer. qemu has to take
DMA bounce buffer in address_space_map() to cover the DMA request. However,
the bounce buffer size is 4096 bytes and we're overrunning it easily when
the guest has significant disk activities on compiling 'cuda-samples'.
The full log and problem description can be found from PATCH[1/2]'s commit
log.

Try to fix the issue handled in commit 4a2e242bbb by replacing mem{cpy, move}
with __builtin_mem{cpy, move} in the accessors to the ram device regions.
With this, we can basically revert that commit to make ram device region
directly accessible again and bypass the bounce buffer in address_space_map()
where the guest hang is caused.

PATCH[1] replaces mem{cpy, move} with __builtin_mem{cpy, move}
PATCH[2] makes ram device region directly accessible again

Changelog
=========
RFCv1 -> v1:
  * https://lists.nongnu.org/archive/html/qemu-arm/2026-06/msg00307.html
  * Reworked solution based on suggestions from Peter Xu, Peter Maydell
    and Michael S. Tsirkin

Gavin Shan (2):
  system/memory: Use __builtin_mem{cpy, move} in accessors of ram device
    region
  system/memory: Make ram device region directly accessible

 hw/remote/vfio-user-obj.c |  4 +--
 include/system/memory.h   | 53 +++++++++++++++++++++++++++++++--------
 system/memory.c           | 41 +-----------------------------
 system/physmem.c          |  8 +++---
 system/trace-events       |  2 --
 5 files changed, 50 insertions(+), 58 deletions(-)

-- 
2.54.0


Reply via email to