On Wed, Jun 17, 2026 at 05:00:45PM +1000, Gavin Shan wrote:
> Hi Michael,
>
> On 6/17/26 3:52 PM, Michael S. Tsirkin wrote:
> > On Wed, Jun 17, 2026 at 12:35:00PM +1000, Gavin Shan wrote:
> > > On 6/16/26 3:44 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 16, 2026 at 03:40:34PM +1000, Gavin Shan wrote:
> > > > > On 6/16/26 3:25 PM, Gavin Shan wrote:
> > > > > > > All ram device regions were made indirectly accessible by commit
> > > > > > > 4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). This
> > > > > > > leads to a hung guest when an NVidia GH100 GPU is passed through from
> > > > > > > the host. The memory in its PCI BAR#4 can be allocated as a DMA target
> > > > > > > buffer. qemu has to use the DMA bounce buffer in address_space_map()
> > > > > > > to cover the DMA request. However, the bounce buffer size is 4096
> > > > > > > bytes and we overrun it easily when the guest has significant disk
> > > > > > > activity while compiling 'cuda-samples'. The full log and problem
> > > > > > > description can be found in PATCH[1/2]'s commit log.
> > > > > >
> > > > > > > Try to fix the issue handled in commit 4a2e242bbb by replacing
> > > > > > > memcpy()/memmove() with the newly added helpers
> > > > > > > qemu_ram_{copy, move}(), which work on top of
> > > > > > > __builtin_{memcpy, memmove} or unaligned-access friendly memory
> > > > > > > movement, in the accessors to the ram device regions. With this, we
> > > > > > > can basically revert that commit to make the ram device region
> > > > > > > directly accessible again and bypass the bounce buffer in
> > > > > > > address_space_map(), where the guest hang is caused.
> > > > > >
> > > > > > PATCH[1] uses qemu_ram_{copy, move}() in ram device region accessors
> > > > > > PATCH[2] makes ram device region directly accessible again
> > > > > >
> > > > > > Michael asked to include the context below in the cover letter in v3,
> > > > > > but I didn't notice that before I sent the v3 series, so I'm appending
> > > > > > it here.
> > > > >
> > >
> > > > Looking at the list of issues (questions) raised by Michael, I don't
> > > > understand every one
> >
> > > Gavin, I doubt one should make memory.c changes without understanding
> > > the issues it is trying to address.
> >
> > What is unclear? Ask away.
> >
>
> Yeah, absolutely. I need some time to understand all the questions and
> suggestions by digging into the code a bit before I'm able to come back to
> you, so it won't be very soon :-)
>
> >
> > > > before I'm able to put more time into digging, but I feel this series
> > > > has too ambitious a goal: covering accesses to all the directly
> > > > accessible regions with the newly introduced qemu_ram_{copy, move}. It
> > > > causes too many behavior changes and concerns, making this series
> > > > impossible to land.
> > >
> > > > I would suggest breaking down the goal and stepping back to apply the
> > > > newly introduced qemu_ram_{copy, move} to the ram device regions only.
> > > > That is actually what Peter Xu proposed in the earlier replies. Taking
> > > > address_space_write() as an example, the indirectly accessible regions
> > > > are covered by memory_region_dispatch_write() in (1), the ram device
> > > > region is covered by qemu_ram_move() in (2), and all other directly
> > > > accessible regions are covered by memmove() in (3).
> > >
> > > >   address_space_write
> > > >     flatview_write
> > > >       flatview_write_continue
> > > >         flatview_write_continue_step
> > > >           memory_access_size               // (1) indirectly accessible region
> > > >           memory_region_dispatch_write
> > > >             access_with_adjusted_size
> > > >               memory_region_write_accessor
> > > >                 mr->ops->write
> > > >           qemu_ram_move                    // (2) ram device region
> > > >           memmove                          // (3) all other directly accessible regions
> > >
> > > > With this limitation, only the ram device regions in (2) are affected.
> > > > We're basically moving the accesses to the ram device region from (1)
> > > > to (2). No changes are introduced to other types of regions. The goal
> > > > is to make the ram device region directly accessible so that the bounce
> > > > buffer can be bypassed in the DMA path.
> >
> > Esthetics aside - ram device regions have all the same issues.
> >
> > > Maybe you can limit the scope of the changes,
> > > but I doubt you can get out of understanding the issues)
> >
>
> Yes. Limited to the scope of VFIO and PCI BARs, it depends on how the PCI
> BARs are mapped: pgprot_noncached or pgprot_writecombine. The listed problems
> and concerns exist for ram device regions exposed with pgprot_noncached.
> However, everything should be just fine if the region is exposed with
> pgprot_writecombine.
> Am I understanding this correctly?

Not everything)
We have issues around guest RAM access, too.
But yes, you can then mostly access device RAM as guest RAM on
arm and x86. I am not sure about Power: there, pgprot_writecombine
is cache-inhibited, and I am not sure that e.g. vector instructions
with misaligned addresses behave the same.
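
To make the alignment concern concrete, here is a rough sketch (not code
from the series; aligned_copy is a hypothetical name) of a copy loop that
only issues naturally aligned 8-byte accesses when source and destination
share the same alignment, and falls back to byte accesses otherwise. A plain
memcpy() gives no such guarantee, since libc is free to use vector or
unaligned loads and stores.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: copy n bytes using nothing
 * but single-byte and naturally aligned 64-bit accesses.
 * Assumption: the build does not rely on strict aliasing for the
 * uint64_t casts below.
 */
static void aligned_copy(volatile void *dst, const volatile void *src,
                         size_t n)
{
    volatile uint8_t *d = dst;
    const volatile uint8_t *s = src;

    /*
     * Byte accesses until both pointers are 8-byte aligned. If the two
     * pointers are misaligned relative to each other this never happens,
     * and the whole copy is done byte by byte.
     */
    while (n && (((uintptr_t)d | (uintptr_t)s) & 7)) {
        *d++ = *s++;
        n--;
    }

    /* Bulk of the copy: aligned 64-bit loads and stores only. */
    while (n >= 8) {
        *(volatile uint64_t *)d = *(const volatile uint64_t *)s;
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Remaining tail, byte by byte. */
    while (n--) {
        *d++ = *s++;
    }
}
```

Whether something this conservative is needed for a write-combining
mapping on Power is exactly the part I am not sure about.
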
> [...]
>
> Thanks,
> Gavin