memory: Use __builtin_mem{cpy, move} in accessors of ram device region

Michael S. Tsirkin Sun, 14 Jun 2026 09:18:28 -0700

On Sun, Jun 14, 2026 at 05:01:34PM +0100, Peter Maydell wrote:
> On Sun, 14 Jun 2026 at 16:13, Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Fri, Jun 12, 2026 at 10:25:35AM -0700, Richard Henderson wrote:
> > > On 6/12/26 09:36, Peter Maydell wrote:
> > > > On Fri, 12 Jun 2026 at 16:29, Richard Henderson
> > > > <[email protected]> wrote:
> > > > >
> > > > > On 6/12/26 04:03, Gavin Shan wrote:
> > > > > > > This replaces mem{cpy, move} with __builtin_mem{cpy, move} in the
> > > > > > > memory accessors to ram device memory region, preparatory work to
> > > > > > > make ram device region directly accessible and bypass the bounce
> > > > > > > buffer in the DMA path in next patch.
> > > > >
> > > > > > memcpy/memmove *always* compile to __builtin_memcpy/memmove, and the
> > > > > > compiler later decides whether or not to expand inline.
> > > >
> > > > Yes, but if you pass it a fixed small integer, then it is likely
> > > > to expand it inline, whereas if you pass it a variable then it
> > > > is likely not to... The patch is attempting to persuade the
> > > > compiler to definitely do an inline access for 1, 2, 4, 8
> > > > byte access.
> > >
> > > Sure, for hosts with unaligned accesses.  We still have sparc64 and
> > > (some?) riscv64 that don't automatically have such and will compile to
> > > more than one host instruction.
> > >
> > > > > My real question is: what are you attempting to achieve?
> > > > >
> > > > > (1) is the problem unaligned access to a mapped physical device?
> > > > > (2) is the problem vector access to a mapped physical device?
> > > > > (3) something else?
> > > >
> > > > I think there are two problems we're trying to fix here:
> > > >
> > > > (1) If a device does e.g. a pci_dma_write() with size 1, we want
> > > > this to turn into exactly 1 byte write into guest memory, for the
> > > > normal case where the guest memory is real host RAM.
> > > > This deals with the e1000 bug where the pci_dma_write() turns into
> > > > a call to glibc memmove() with size 1 and glibc's implementation
> > > > turns that into 3 writes of the byte to the same address...
> > >
> > > Gotcha.  Easily handled by not using memcpy/memmove at all.
> > >
> > >       *(char *)ptr = val;
> > >
> > > is sufficient for all hosts.
> >
> > Yes, I think it does work because we use -fno-strict-aliasing.
> > For bigger sizes we'll need packed because the addresses
> > could be unaligned.
> 
> IIRC "packed" will cause architectures that can't do
> unaligned word accesses to emit code to do byte accesses,
> so you don't want that.
I checked arm32, and while "packed" does that there, so does memcpy.
We'd have to explicitly code up the aligned and unaligned use cases.
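
To make that concrete, here is a rough sketch of what explicitly coding the
aligned and unaligned cases might look like. store_sized() is a hypothetical
name, not an existing QEMU function; it assumes the size is 1, 2, 4 or 8 and
that we build with -fno-strict-aliasing.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): store the low `size` bytes of `val`
 * at `ptr` in host byte order. Assumes size is 1, 2, 4 or 8 and a build
 * with -fno-strict-aliasing.
 */
static inline void store_sized(void *ptr, uint64_t val, unsigned size)
{
    /* size is a power of two, so this tests natural alignment */
    int aligned = ((uintptr_t)ptr & (size - 1)) == 0;
    uint16_t v16 = (uint16_t)val;
    uint32_t v32 = (uint32_t)val;

    switch (size) {
    case 1:
        /* a byte store is always aligned */
        *(volatile uint8_t *)ptr = (uint8_t)val;
        break;
    case 2:
        if (aligned) {
            *(volatile uint16_t *)ptr = v16;
        } else {
            memcpy(ptr, &v16, sizeof(v16));
        }
        break;
    case 4:
        if (aligned) {
            *(volatile uint32_t *)ptr = v32;
        } else {
            memcpy(ptr, &v32, sizeof(v32));
        }
        break;
    case 8:
        /* may still be split into two stores on a 32-bit host */
        if (aligned) {
            *(volatile uint64_t *)ptr = val;
        } else {
            memcpy(ptr, &val, sizeof(val));
        }
        break;
    default:
        break; /* other sizes are out of scope for this sketch */
    }
}
```

The aligned branches are meant to become one plain store where the host
allows it. The memcpy() branches still degrade to byte accesses on hosts
without unaligned access, which is the case in question here.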

But do we really care about device assignment on those
hosts?

Maybe the thing to do is just to ignore the issue.

> You need to explicitly check alignment,
> I think.
> 
> > But again, qemu simply already relies on this in bswap.h
> >
> > I kind of dislike muddying the waters by making several
> > unrelated changes here. If we do, we should change bswap too.
> 
> The ldl_p etc functions in bswap.h provide different semantics:
> ldl_p() is "do a load of a 32-bit quantity, even if the
> address is not 4-aligned". We don't care if the compiler
> or the memcpy ends up doing that as 4 byte accesses, which
> on some hosts it must do.
> 
> thanks
> -- PMM
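
For reference, the semantics described above ("load a 32-bit quantity even if
the address is not 4-aligned") can be sketched as below. This is only an
illustration with a made-up name, load_u32_any_align(); it is not copied from
bswap.h.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration: read a 32-bit value in host byte order from
 * an address with no alignment requirement. The compiler is free to turn
 * the memcpy() into one load or into four byte loads, depending on host.
 */
static inline uint32_t load_u32_any_align(const void *ptr)
{
    uint32_t v;

    memcpy(&v, ptr, sizeof(v));
    return v;
}
```

Nothing here promises a single access, which is fine for this kind of helper
but not for the device access case.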

Re: [PATCH 1/2] system/memory: Use __builtin_mem{cpy, move} in accessors of ram device region