[Cc Liu Gang and Ding hui]

On 6/12/26 9:03 PM, Gavin Shan wrote:
All ram device regions were made indirectly accessible by commit 4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This leads to a hung guest when an NVIDIA GH100 GPU is passed through from the host. The memory in its PCI BAR#4 can be allocated as a DMA target buffer. QEMU has to use the DMA bounce buffer in address_space_map() to cover the DMA request. However, the bounce buffer size is 4096 bytes, and we overrun it easily when the guest has significant disk activity while compiling 'cuda-samples'. The full log and problem description can be found in the commit log of PATCH[1/2].

Try to fix the issue handled in commit 4a2e242bbb by replacing mem{cpy, move} with __builtin_mem{cpy, move} in the accessors to the ram device regions. With this, we can basically revert that commit to make ram device regions directly accessible again and bypass the bounce buffer in address_space_map(), which is where the guest hang is caused.

PATCH[1] replaces mem{cpy, move} with __builtin_mem{cpy, move}
PATCH[2] makes ram device region directly accessible again
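
For readers following the thread, here is a minimal sketch of what "replacing mem{cpy, move} with __builtin_mem{cpy, move}" could look like in an accessor. It is an illustration only, not code from the series: the helper name ram_device_copy() and the width-selection loop are assumptions, and the real accessors in include/system/memory.h may be structured differently. The idea shown is that a __builtin_memcpy() call with a constant size is typically inlined by GCC and Clang as a single load/store of that width, so the access width stays under the caller's control.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name and shape are assumptions, not from the patch):
 * copy 'len' bytes using the widest naturally aligned access available at
 * each step. Every __builtin_memcpy() below has a constant size, which
 * compilers typically expand inline instead of calling the libc memcpy().
 */
static inline void ram_device_copy(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    while (len > 0) {
        /* Combined alignment of both pointers decides the usable width. */
        uintptr_t align = (uintptr_t)d | (uintptr_t)s;
        size_t n;

        if (len >= 8 && (align & 7) == 0) {
            __builtin_memcpy(d, s, 8);
            n = 8;
        } else if (len >= 4 && (align & 3) == 0) {
            __builtin_memcpy(d, s, 4);
            n = 4;
        } else if (len >= 2 && (align & 1) == 0) {
            __builtin_memcpy(d, s, 2);
            n = 2;
        } else {
            __builtin_memcpy(d, s, 1);
            n = 1;
        }

        d += n;
        s += n;
        len -= n;
    }
}
```

A caller would use it like memcpy(), for example ram_device_copy(buf, ptr, size) on a read path; whether the series picks widths this way is not stated in the cover letter.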
Liu and Ding,

Could you give this series a try to see whether it fixes your e1000 issue?

https://lore.kernel.org/qemu-devel/[email protected]/
Changelog
=========
RFCv1 -> v1:
  * https://lists.nongnu.org/archive/html/qemu-arm/2026-06/msg00307.html
  * Reworked solution based on suggestions from Peter Xu, Peter Maydell
    and Michael S. Tsirkin

Gavin Shan (2):
  system/memory: Use __builtin_mem{cpy, move} in accessors of ram device region
  system/memory: Make ram device region directly accessible

 hw/remote/vfio-user-obj.c |  4 +--
 include/system/memory.h   | 53 +++++++++++++++++++++++++++++++--------
 system/memory.c           | 41 +-----------------------------
 system/physmem.c          |  8 +++---
 system/trace-events       |  2 --
 5 files changed, 50 insertions(+), 58 deletions(-)
Thanks, Gavin
