On Tue, Jun 16, 2026 at 08:50:27PM +0800, Ding Hui wrote:
> > What is the question, exactly? That's why I made the list of 11 issues.
> > Is that unclear, somehow?  Another reason to include that, maybe.
> > 
> 
> We can take the e1000 issue on aarch64 for example:
> 
> Link: 
> https://lore.kernel.org/qemu-devel/[email protected]/
> 
> The software allocates a ring buffer for the hardware to receive packets.
> In the normal case, the hardware fills an RX ring descriptor and sets the
> status DD bit to 1 (Descriptor Done), which transfers ownership of the
> descriptor to software. After the software consumes the descriptor, it
> writes DD=0 and gives the descriptor back to the hardware to refill.
> 
> pci_dma_write ultimately calls memcpy, and on aarch64 with glibc 2.24+
> the memcpy/memmove implementation uses a branchless sequence that copies
> the same byte three times when count == 1.

Yes this is issue 11:

10. On non-x86, memcpy can do multiple overlapping stores even
for single-byte writes, e.g. to avoid extra branches.
This is causing issues in practice.
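
To make the failure mode concrete, here is a minimal C sketch of a
branchless 1..3 byte copy, modeled loosely on the pattern described above.
It is not the glibc code. small_copy, store_hook, guest_clears_dd and
simulate_race are hypothetical names, and placing DD at bit 0 of the
status byte is an assumption for illustration. The hook lets a simulated
guest run between the device's stores.

```c
#include <stddef.h>
#include <stdint.h>

#define DD_BIT 0x01u  /* assumed: Descriptor Done is bit 0 of the status byte */

/* Hypothetical hook, called after each byte store, so that a simulated
 * guest can run in between the device's stores. */
typedef void (*store_hook)(volatile uint8_t *dst, int store_index);

/* Sketch of a branchless copy for 1..3 bytes: it always issues three
 * stores, to offsets 0, count/2 and count-1. For count == 1 all three
 * offsets are 0. Returns how many stores landed on dst[0]. */
static int small_copy(volatile uint8_t *dst, const uint8_t *src,
                      size_t count, store_hook hook)
{
    if (count == 0 || count > 3)
        return 0;                       /* outside the case sketched here */

    size_t mid = count >> 1;
    const size_t off[3] = { 0, mid, count - 1 };
    const uint8_t val[3] = { src[0], src[mid], src[count - 1] };
    int hits_on_byte0 = 0;

    for (int i = 0; i < 3; i++) {
        dst[off[i]] = val[i];           /* overlapping stores are possible */
        if (off[i] == 0)
            hits_on_byte0++;
        if (hook)
            hook(dst, i);
    }
    return hits_on_byte0;
}

/* Hypothetical guest: right after the first store it sees DD=1,
 * consumes the descriptor and clears DD. */
static void guest_clears_dd(volatile uint8_t *status, int store_index)
{
    if (store_index == 0 && (*status & DD_BIT))
        *status &= (uint8_t)~DD_BIT;
}

/* Device writes a one-byte status with DD set while the guest races. */
static uint8_t simulate_race(void)
{
    volatile uint8_t status = 0;
    const uint8_t done = DD_BIT;

    small_copy(&status, &done, 1, guest_clears_dd);
    return status;                      /* DD is set again by the later stores */
}
```

In this sketch, small_copy(..., 1, ...) reports three stores to byte 0,
and simulate_race() ends with DD set even though the simulated guest
cleared it after the first store. Whether this exact interleaving is what
happens in the e1000 case is an assumption.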


-- 
MST

