On Wed, Jun 17, 2026 at 05:00:45PM +1000, Gavin Shan wrote:
> Hi Michael,
> 
> On 6/17/26 3:52 PM, Michael S. Tsirkin wrote:
> > On Wed, Jun 17, 2026 at 12:35:00PM +1000, Gavin Shan wrote:
> > > On 6/16/26 3:44 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 16, 2026 at 03:40:34PM +1000, Gavin Shan wrote:
> > > > > On 6/16/26 3:25 PM, Gavin Shan wrote:
> > > > > > All ram device regions was turned to be indirectly accessible by 
> > > > > > commit
> > > > > > 4a2e242bbb ("memory: Don't use memcpy for ram_device regions"). 
> > > > > > This leads
> > > > > > to a hanged guest where a NVidia GH100 GPU is passed from host. The 
> > > > > > memory
> > > > > > in its PCI BAR#4 can be allocated as DMA target buffer. qemu has to 
> > > > > > take
> > > > > > DMA bounce buffer in address_space_map() to cover the DMA request. 
> > > > > > However,
> > > > > > the bounce buffer size is 4096 bytes and we're overrunning it 
> > > > > > easily when
> > > > > > the guest has significant disk activities on compiling 
> > > > > > 'cuda-samples'.
> > > > > > The full log and problem description can be found from PATCH[1/2]'s 
> > > > > > commit
> > > > > > log.
> > > > > > 
> > > > > > Try to fix the issue handled in commit 4a2e242bbb by replacing 
> > > > > > memcopy()/
> > > > > > memmove() with newly added helpers qemu_ram_{copy, move}() that 
> > > > > > works on
> > > > > > top of __builtin_{memcpy, memmove} or unaligned access friendly 
> > > > > > memory
> > > > > > movement in the accessors to the ram device regions. With this, we 
> > > > > > can
> > > > > > basically revert that commit to make ram device region directly 
> > > > > > accessible
> > > > > > again and bypass the bounce buffer in address_space_map() where the 
> > > > > > guest
> > > > > > hang is caused.
> > > > > > 
> > > > > > PATCH[1] uses qemu_ram_{copy, move}() in ram device region accessors
> > > > > > PATCH[2] makes ram device region directly accessible again
> > > > > > 
> > > > > Michael asked to include below context in the cover letter in v3, but 
> > > > > I
> > > > > didn't noticed that before I sent v3 series, appended with them.
> > > > > 
> > > 
> > > Looking at the list of issues (questions) raised by Michael, I don't 
> > > understand
> > > every one
> > 
> > Gavin, I doubt one should make memory.c changes without understanding the 
> > issues
> > it is trying to address.
> > 
> > What is unclear? Ask away.
> > 
> 
> Yeah, absolutely. I need some time to understand all questions or suggestions
> by digging the code a bit, before I'm able to come back to you, but not very
> soon though :-)
> 
> > 
> > > before I'm able to put more time to dig, but I feel this series has
> > > too ambitious goal to cover accesses to all the directly accessible 
> > > regions
> > > with the newly introduced qemu_ram_{copy, move}. It causes too many 
> > > behavior
> > > changes and concerns, making this series impossible to land.
> > > 
> > > I would suggest to break down the goal and step back to apply the newly 
> > > introduced
> > > qemu_ram_{copy, move} to the ram device regions only? It's actually 
> > > something
> > > proposed by Peter Xu in the earlier replies. Taking address_space_write() 
> > > as an
> > > example, the indirectly accessible regions are covered by 
> > > memory_region_dispatch_write()
> > > in (1), the ram device region is covered by qemu_ram_move() in (2), and 
> > > all other
> > > directly accessible regions are covered by memmove() in (3).
> > > 
> > >    address_space_write
> > >      flatview_write
> > >        flatview_write_continue
> > >          flatview_write_continue_step
> > >            memory_access_size             // (1) indirectly accessible 
> > > region
> > >            memory_region_dispatch_write
> > >              access_with_adjusted_size
> > >                memory_region_write_accessor
> > >                  mr->ops->write
> > >            qemu_ram_move                  // (2) ram device region
> > >            memmove                        // (3) all other directly 
> > > accessible regions
> > > 
> > > With the limitation, only the ram device regions in (2) are affected. 
> > > We're
> > > basically moving the accesses to the ram device region from (1) to (2). No
> > > changes introduced to other types of regions. The goal is to make the ram 
> > > device
> > > region accessible so that the bounce buffer can be bypassed in DMA path.
> > 
> > Esthetics aside - ram device regions have all the same issues.
> > 
> > Maybe you can limit the scope of the changes,
> > but I doubt you can get out understanding)
> > 
> 
> Yes. Limited to the scope of VFIO and PCI BARs, it depends on how the PCI BARs
> are mapped, pgprot_noncached or pgprot_writecombine. The listed problems and
> concerns are existing on the ram device region exposed with pgprot_noncached.
> However, everything should be just fine if the region is exposed with 
> pgprot_writecombine.
> Am I understanding this correctly?

Not everything)

We have issues around guest RAM access, too.

But yes you can then mostly access device RAM as guest RAM on
arm and x86. I am not sure about power. for power pgprot_writecombine
is cache inhibited and I am not sure e.g. vector instructions
with misaligned addresses behave the same.



> [...]
> 
> Thanks,
> Gavin


Reply via email to