On 6/17/26 1:51 AM, Michael S. Tsirkin wrote:
On Tue, Jun 16, 2026 at 08:50:27PM +0800, Ding Hui wrote:
What is the question, exactly? That's why I made the list of 11 issues.
Is that unclear, somehow? Another reason to include that, maybe.
We can take the e1000 issue on aarch64 for example:
Link:
https://lore.kernel.org/qemu-devel/[email protected]/
The software allocates a ring buffer for the hardware to receive packets into.
In the normal case, the hardware fills an RX ring descriptor and sets the
status DD bit=1 (Descriptor Done), at which point ownership of the descriptor
passes to software. After the software consumes the descriptor, it writes
status DD bit=0 and gives the descriptor back to the hardware to refill.
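The ownership handoff described above can be sketched roughly as follows.
This is only an illustration: the struct layout and the helper names
(rx_desc, hw_fill, sw_owns, sw_consume) are hypothetical and much simpler
than the real e1000 descriptor, and the DD bit value of 0x01 is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of the Descriptor Done bit in the status byte. */
#define RXD_STAT_DD 0x01u

/* Hypothetical, heavily simplified RX descriptor. */
struct rx_desc {
    uint16_t length;  /* bytes written by the hardware */
    uint8_t status;   /* DD bit signals a completed descriptor */
};

/* Hardware side: fill the descriptor, then hand it to software. */
static void hw_fill(struct rx_desc *d, uint16_t len)
{
    d->length = len;
    d->status |= RXD_STAT_DD;
}

/* Software side: the descriptor is ours only while DD is set. */
static bool sw_owns(const struct rx_desc *d)
{
    return (d->status & RXD_STAT_DD) != 0;
}

/* Software side: consume the data, clear DD, return it to hardware. */
static uint16_t sw_consume(struct rx_desc *d)
{
    uint16_t len = d->length;
    d->status = 0;
    return len;
}
```

The point of the sketch is that the status byte is the only handoff signal,
so any extra store to it after the software has cleared DD would make the
descriptor look completed again.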
pci_dma_write calls memcpy underneath, and on aarch64 with glibc 2.24+, the
memcpy/memmove implementation uses a branchless sequence that copies the same
byte three times when count == 1.
Yes this is issue 11:
10. on non-x86 memcpy will do multiple overlapping stores even
for single byte writes. E.g. it does it to avoid extra branches.
This is causing issues in practice.
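To make the overlapping stores concrete, here is a minimal sketch of a
branchless copy for 1..3 bytes. It is not the glibc code, only an
illustration of the technique under the assumption that the first, middle
and last bytes are always copied; small_copy_1_to_3, store_byte and
store_count are hypothetical names added so the stores can be counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter, only here to make the stores observable. */
static unsigned store_count;

static void store_byte(volatile uint8_t *p, uint8_t v)
{
    *p = v;
    store_count++;
}

/*
 * Branchless copy for 1..3 bytes: always copy the first, middle and
 * last byte. For count == 1 all three indexes are 0, so the same
 * destination byte is stored three times.
 */
static void small_copy_1_to_3(uint8_t *dst, const uint8_t *src, size_t count)
{
    assert(count >= 1 && count <= 3);

    size_t mid = count >> 1;
    uint8_t first = src[0];
    uint8_t middle = src[mid];
    uint8_t last = src[count - 1];

    store_byte(&dst[0], first);
    store_byte(&dst[mid], middle);
    store_byte(&dst[count - 1], last);
}
```

With count == 1 the destination ends up with the right value, but it is
written three times. If the guest clears the byte between two of those
stores, the later store puts the old value back.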
This issue should be fixed by this series. With the series applied,
memcpy/memmove are replaced with
'qatomic_set((uint8_t *)dst, qatomic_read((uint8_t *)src))' in
qemu_ram_{copy, backwards_copy}_unaligned(). I hope Ding can give this series
a try to confirm.
After this series is applied:

  pci_dma_write
    pci_dma_rw
      dma_memory_rw
        dma_memory_rw_relaxed
          address_space_rw
            address_space_write
              flatview_write
                flatview_write_continue
                  flatview_write_continue_step
                    memory_access_is_direct   // return true
                    qemu_ram_move             // replaced original memmove()
                      qemu_ram_{copy, backwards_copy}_unaligned
                        qatomic_set((uint8_t *)dst, qatomic_read((uint8_t *)src));
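As a rough illustration of the byte-wise copy at the bottom of that chain,
here is a minimal sketch. It is not the code from the series: byte_read()
and byte_set() are hypothetical stand-ins for qatomic_read()/qatomic_set(),
assumed here to be relaxed atomic accesses built on the GCC/Clang __atomic
builtins, and the copy_bytes_* names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for qatomic_read(); assumed relaxed. */
static inline uint8_t byte_read(const uint8_t *p)
{
    return __atomic_load_n(p, __ATOMIC_RELAXED);
}

/* Hypothetical stand-in for qatomic_set(); assumed relaxed. */
static inline void byte_set(uint8_t *p, uint8_t v)
{
    __atomic_store_n(p, v, __ATOMIC_RELAXED);
}

/* Forward copy: each destination byte is stored exactly once. */
static void copy_bytes_forward(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        byte_set(&dst[i], byte_read(&src[i]));
    }
}

/* Backward copy, for overlapping buffers where dst is above src. */
static void copy_bytes_backward(uint8_t *dst, const uint8_t *src, size_t len)
{
    while (len--) {
        byte_set(&dst[len], byte_read(&src[len]));
    }
}
```

The relevant property for the e1000 case is that a one-byte write results in
exactly one store to the destination, so it cannot overwrite a later update
from the guest.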
For this specific case, what's done in this series is similar to what was
proposed in that thread [1]. In the proposal, pci_dma_write() is replaced with
address_space_stb(), which eventually boils down to "*(uint8_t *)ptr = v;".
However, address_space_stb() does not seem well suited to DMA writes at the
API level, because it only supports a limited access size.
[1]
https://lore.kernel.org/qemu-devel/[email protected]/
  address_space_stb
    address_space_stm_internal
      stm_p
        stb_p
          *(uint8_t *)ptr = v;
Thanks,
Gavin