On Tue, Jun 16, 2026 at 08:50:27PM +0800, Ding Hui wrote:
> > What is the question, exactly? That's why I made the list of 11 issues.
> > Is that unclear, somehow? Another reason to include that, maybe.
> >
>
> We can take the e1000 issue on aarch64 for example:
> Link:
> https://lore.kernel.org/qemu-devel/[email protected]/
>
> The software allocate ring buffer for hardware to receive packet.
> In normal case, the hardware fill the RX ring descriptor, and with
> status DD bit=1 (Descriptor Done), then the ownership of the ring
> descriptor is converted to software, after the software consume the
> descriptor, it will write status DD bit=0 and give it back to
> hardware to refill data.
>
> pci_dma_write calls memcpy at the underlying level, and on aarch64
> with glibc 2.24+, the memcpy/memmove implementation uses a branchless
> sequence that copies the same byte three times when count == 1.
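
To make the triple store concrete, here is a rough C model of such a
branchless small copy. It is only a sketch under assumptions: the names
small_copy, store_byte and store_count are made up for illustration,
and this is not the actual glibc code (which is assembly). The idea, as
I understand it, is that for 1..3 bytes the routine loads the bytes at
offsets 0, n/2 and n-1 and then stores them back at the same offsets,
so for n == 1 all three stores hit the same address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter, only here so the stores can be observed. */
static int store_count;

/* Hypothetical helper: one byte store, counted. */
static void store_byte(volatile unsigned char *p, unsigned char v)
{
    *p = v;
    store_count++;
}

/*
 * Sketch of a branchless copy for 1 <= n <= 3 bytes.
 * Offsets 0, n/2 and n-1 cover every byte without branching on n:
 *   n == 1 -> offsets 0, 0, 0   (same byte stored three times)
 *   n == 2 -> offsets 0, 1, 1
 *   n == 3 -> offsets 0, 1, 2
 */
static void small_copy(volatile unsigned char *dst,
                       const unsigned char *src, size_t n)
{
    assert(n >= 1 && n <= 3);

    size_t mid = n >> 1;

    /* All loads happen before any store. */
    unsigned char a = src[0];
    unsigned char b = src[mid];
    unsigned char c = src[n - 1];

    store_byte(dst, a);
    store_byte(dst + mid, b);
    store_byte(dst + n - 1, c);
}
```

With n == 1 and a source byte that has DD set, the destination status
byte is written three times with DD=1. If the guest clears DD between
two of those stores, a later store sets it again, and the guest then
sees a descriptor marked done that it has already consumed.
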

Yes this is issue 10:

10. on non-x86 memcpy will do multiple overlapping stores even for
    single byte writes. E.g. it does it to avoid extra branches.
    This is causing issues in practice.

--
MST
