[Cc Liu Gang and Ding hui]

On 6/12/26 9:03 PM, Gavin Shan wrote:
All ram device regions were made indirectly accessible by commit
4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This leads
to a hung guest when an NVIDIA GH100 GPU is passed through from the host.
The memory in its PCI BAR#4 can be allocated as a DMA target buffer, so QEMU
has to use the DMA bounce buffer in address_space_map() to cover the DMA
request. However, the bounce buffer size is 4096 bytes and we easily exceed
it when the guest has significant disk activity while compiling
'cuda-samples'. The full log and problem description can be found in
PATCH[1/2]'s commit log.

This series fixes the issue handled in commit 4a2e242bbb in a different way,
by replacing mem{cpy, move} with __builtin_mem{cpy, move} in the accessors
to the ram device regions. With this, we can essentially revert that commit
to make ram device regions directly accessible again and bypass the bounce
buffer in address_space_map(), which is where the guest hang is caused.
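
The cover letter does not show the accessor code, so the following is only a
minimal sketch of the general idea, not the code from the series. The helper
name ram_device_copy() and the set of sizes it handles are assumptions made
for illustration. It relies on the fact that __builtin_memcpy() with a
constant size is commonly expanded inline by GCC and Clang into a single load
and store, although that depends on the target and optimization level and is
not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patches): copy 'len' bytes
 * to or from a ram device mapping. Each common access width gets its
 * own call with a constant size so the compiler can expand it inline.
 */
static inline void ram_device_copy(void *dst, const void *src, size_t len)
{
    switch (len) {
    case 1:
        __builtin_memcpy(dst, src, 1);
        break;
    case 2:
        __builtin_memcpy(dst, src, 2);
        break;
    case 4:
        __builtin_memcpy(dst, src, 4);
        break;
    case 8:
        __builtin_memcpy(dst, src, 8);
        break;
    default:
        /* Any other length: the expansion is left to the compiler. */
        __builtin_memcpy(dst, src, len);
        break;
    }
}
```

A caller would use it exactly like memcpy(), for example
ram_device_copy(buf, bar_ptr + offset, 4) for a 32-bit read, where buf and
bar_ptr are likewise placeholder names.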

PATCH[1] replaces mem{cpy, move} with __builtin_mem{cpy, move}
PATCH[2] makes ram device region directly accessible again


Liu and Ding, could you give this series a try to see whether it fixes your
e1000 issue?

  
https://lore.kernel.org/qemu-devel/[email protected]/

Changelog
=========
RFCv1 -> v1:
   * https://lists.nongnu.org/archive/html/qemu-arm/2026-06/msg00307.html
   * Reworked solution based on suggestions from Peter Xu, Peter Maydell
     and Michael S. Tsirkin

Gavin Shan (2):
   system/memory: Use __builtin_mem{cpy, move} in accessors of ram device
     region
   system/memory: Make ram device region directly accessible

  hw/remote/vfio-user-obj.c |  4 +--
  include/system/memory.h   | 53 +++++++++++++++++++++++++++++++--------
  system/memory.c           | 41 +-----------------------------
  system/physmem.c          |  8 +++---
  system/trace-events       |  2 --
  5 files changed, 50 insertions(+), 58 deletions(-)


Thanks,
Gavin

