On Sun, Jun 14, 2026 at 05:01:34PM +0100, Peter Maydell wrote:
> On Sun, 14 Jun 2026 at 16:13, Michael S. Tsirkin wrote:
> >
> > On Fri, Jun 12, 2026 at 10:25:35AM -0700, Richard Henderson wrote:
> > > On 6/12/26 09:36, Peter Maydell wrote:
> > > > On Fri, 12 Jun 2026 at 16:29, Richard Henderson wrote:
> > > > >
> > > > > On 6/12/26 04:03, Gavin Shan wrote:
> > > > > > This replaces mem{cpy, move} with __builtin_mem{cpy, move} in the
> > > > > > memory accessors to ram device memory region, preparatory work to
> > > > > > make ram device region directly accessible and bypass the bounce
> > > > > > buffer in the DMA path in next patch.
> > > > >
> > > > > memcpy/memmove *always* compile to __builtin_memcpy/memmove, and the
> > > > > compiler later decides whether or not to expand inline.
> > > >
> > > > Yes, but if you pass it a fixed small integer, then it is likely
> > > > to expand it inline, whereas if you pass it a variable then it
> > > > is likely not to... The patch is attempting to persuade the
> > > > compiler to definitely do an inline access for 1, 2, 4, 8
> > > > byte access.
> > >
> > > Sure, for hosts with unaligned accesses. We still have sparc64 and
> > > (some?) riscv64 that don't automatically have such and will compile to
> > > more than one host instruction.
> > >
> > > > > My real question is: what are you attempting to achieve?
> > > > >
> > > > > (1) is the problem unaligned access to a mapped physical device?
> > > > > (2) is the problem vector access to a mapped physical device?
> > > > > (3) something else?
> > > >
> > > > I think there are two problems we're trying to fix here:
> > > >
> > > > (1) If a device does e.g. a pci_dma_write() with size 1, we want
> > > > this to turn into exactly 1 byte write into guest memory, for the
> > > > normal case where the guest memory is real host RAM.
> > > > This deals with the e1000 bug where the pci_dma_write() turns into
> > > > a call to glibc memmove() with size 1 and glibc's implementation
> > > > turns that into 3 writes of the byte to the same address...
> > >
> > > Gotcha. Easily handled by not using memcpy/memmove at all.
> > >
> > >   *(char *)ptr = val;
> > >
> > > is sufficient for all hosts.
> >
> > Yes, I think it does work because we use -fno-strict-aliasing.
> > For bigger sizes we'll need packed because the addresses
> > could be unaligned.
>
> IIRC "packed" will cause architectures that can't do
> unaligned word accesses to emit code to do byte accesses,
> so you don't want that.
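For illustration, here is a minimal sketch of the two store styles quoted above. It assumes GCC or Clang (__attribute__((packed)) is a compiler extension) and a build with -fno-strict-aliasing, which Michael says QEMU uses. The names store_u8, store_sized_packed and the unaligned_u16/u32/u64 wrapper types are hypothetical and are not QEMU APIs. The byte store is Richard's one-liner. The packed wrappers tell the compiler the pointer may be misaligned, which is exactly why, as Peter notes, strict-alignment hosts may get byte-by-byte code for sizes 2, 4 and 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Richard's suggestion: a plain single-byte store, which every host
 * emits as exactly one 8-bit write. */
static inline void store_u8(void *ptr, uint8_t val)
{
    *(unsigned char *)ptr = val;
}

/* Hypothetical wrapper types: "packed" drops the alignment requirement
 * to 1, so the compiler must assume the address may be unaligned.  On
 * hosts without unaligned word accesses it will split the access into
 * smaller (typically byte) accesses. */
struct __attribute__((packed)) unaligned_u16 { uint16_t v; };
struct __attribute__((packed)) unaligned_u32 { uint32_t v; };
struct __attribute__((packed)) unaligned_u64 { uint64_t v; };

/* Store the low `size` bytes of val at ptr in host byte order
 * (hypothetical helper; size must be 1, 2, 4 or 8). */
static inline void store_sized_packed(void *ptr, uint64_t val, size_t size)
{
    switch (size) {
    case 1: *(uint8_t *)ptr = (uint8_t)val; break;
    case 2: ((struct unaligned_u16 *)ptr)->v = (uint16_t)val; break;
    case 4: ((struct unaligned_u32 *)ptr)->v = (uint32_t)val; break;
    case 8: ((struct unaligned_u64 *)ptr)->v = val; break;
    default: assert(!"unsupported access size");
    }
}
```

On x86-64 or arm64 each case is typically a single store instruction; the sketch makes no such guarantee for strict-alignment hosts, which is the point of contention in the thread.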

I checked arm32, and while it does that, so does memcpy. We'd have to
explicitly code up the aligned and unaligned use cases. But do we really
care about device assignment on those hosts? Maybe the thing to do is just
to ignore the issue.

> You need to explicitly check alignment,
> I think.
>
> > But again, qemu simply already relies on this in bswap.h
> >
> > I kind of dislike muddying the waters by making several
> > unrelated changes here. If we do we should change bswap too.
>
> The ldl_p etc functions in bswap.h provide different semantics:
> ldl_p() is "do a load of a 32-bit quantity, even if the
> address is not 4-aligned". We don't care if the compiler
> or the memcpy ends up doing that as 4 byte accesses, which
> on some hosts it must do.
>
> thanks
> -- PMM
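As an illustration of the explicit alignment check Peter suggests, splitting aligned and unaligned cases, here is a minimal sketch. Several choices are assumptions, not anything proposed in the thread. The name store_sized_checked is hypothetical and not a QEMU API. The volatile qualifier forces the compiler to emit exactly one store of the requested width. The unaligned fallback writes one byte at a time. The code also assumes -fno-strict-aliasing, as with QEMU builds. An aligned 8-byte store is a single access only on 64-bit hosts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if ptr is naturally aligned for a power-of-two access size. */
static inline bool is_aligned(const void *ptr, size_t size)
{
    return ((uintptr_t)ptr & (size - 1)) == 0;
}

/*
 * Hypothetical helper (not a QEMU API): store the low `size` bytes of val
 * at ptr in host byte order.  size must be 1, 2, 4 or 8.
 */
static void store_sized_checked(void *ptr, uint64_t val, size_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);

    if (is_aligned(ptr, size)) {
        /* Aligned: one volatile store of exactly the requested width. */
        switch (size) {
        case 1: *(volatile uint8_t *)ptr = (uint8_t)val; return;
        case 2: *(volatile uint16_t *)ptr = (uint16_t)val; return;
        case 4: *(volatile uint32_t *)ptr = (uint32_t)val; return;
        case 8: *(volatile uint64_t *)ptr = val; return;
        }
    }

    /* Unaligned: narrow val to the access width in host byte order, then
     * write it out one byte at a time, so each byte is written once. */
    union {
        uint8_t b[8];
        uint16_t u16;
        uint32_t u32;
        uint64_t u64;
    } tmp;
    switch (size) {
    case 2: tmp.u16 = (uint16_t)val; break;
    case 4: tmp.u32 = (uint32_t)val; break;
    default: tmp.u64 = val; break;   /* size 8; size 1 is always aligned */
    }
    for (size_t i = 0; i < size; i++) {
        ((volatile uint8_t *)ptr)[i] = tmp.b[i];
    }
}
```

The byte-wise fallback is only one option. Falling back to memcpy would match the ldl_p()-style semantics Peter describes, where the access width is not guaranteed.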
