On Wed, 27 May 2026 at 14:23, Liu Gang <[email protected]> wrote:
>
> From: LiuGang <[email protected]>
>
> In x86 environments, writing the Rx descriptor status byte with a single
> pci_dma_write() works correctly. However, on aarch64 with glibc 2.24+,
> the memcpy/memmove implementation uses a branchless sequence that copies
> the same byte three times when count == 1. This results in three
> consecutive STRB operations.
> Ref:
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/memcpy.S;hb=refs/heads/release/2.24/master#l124
>
> When the guest uses an e1000 NIC and DPDK, this behavior can cause
> packet reception to stop or packets to be dropped repeatedly. Under
> normal operation, DPDK clears the DD flag after processing a packet and
> then updates RDT. But due to the triple STRB from memcpy/memmove, the
> DD flag may be cleared and then immediately set again. This causes DPDK
> to consume the same packet twice, making RDT be incremented by one extra.
> As a result, RDH == RDT, and QEMU mistakenly considers the receive
> queue full, stopping packet reception.
>
> The issue can be reproduced on aarch64 using the adapted test program:
> https://github.com/cdkey/e1000_poc
>
> Replace pci_dma_write() with address_space_stb() to write the DD status
> byte directly, avoiding the problematic memcpy/memmove sequence and
> resolving the issue.
>
> Fixes: 034d00d48581("e1000: set RX descriptor status in a separate
> operation")
> Signed-off-by: Liu Gang <[email protected]>
> Signed-off-by: Ding Hui <[email protected]>
> ---
>  hw/net/e1000.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index 202ad40401..f7be035e0f 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -869,6 +869,14 @@ e1000_receiver_overrun(E1000State *s, size_t size)
>      set_ics(s, 0, E1000_ICS_RXO);
>  }
>
> +static inline void
> +e1000_dma_write_byte(PCIDevice *d, dma_addr_t addr, uint8_t val)
> +{
> +    AddressSpace *as = pci_get_address_space(d);
> +    dma_barrier(as, DMA_DIRECTION_FROM_DEVICE);
> +    address_space_stb(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);

I don't think there's any guarantee that we won't write
multiple times to the memory in this case either. This
eventually boils down to calling address_space_stm_internal,
which for dispatch to RAM calls stm_p(), which for a byte
write will call stb_p(), which is "*(uint8_t *)ptr = v;"
and there's no guarantee in the C standard about that
doing only a single byte store to memory.

I think this change only gives different behaviour by
accident, not by design.

-- PMM
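
To illustrate the distinction drawn above, here is a minimal standalone
sketch. It is not QEMU code and not a proposed fix: every helper name and
the DD_BIT value are made up for the example, and __atomic_store_n() is a
GCC/Clang builtin rather than standard C. A plain assignment only
constrains the final value of the byte, whereas a volatile or atomic store
is the usual way to ask the compiler for exactly one store.

```c
#include <assert.h>
#include <stdint.h>

#define DD_BIT 0x01 /* hypothetical "descriptor done" bit, for illustration */

/* Same shape as "*(uint8_t *)ptr = v;": the C abstract machine only
 * requires the final value to be right, so nothing here promises a
 * single store instruction. */
static inline void store_byte_plain(void *ptr, uint8_t v)
{
    *(uint8_t *)ptr = v;
}

/* Volatile access: the compiler is expected to emit one access per
 * evaluation, although what counts as an "access" is
 * implementation-defined. */
static inline void store_byte_volatile(void *ptr, uint8_t v)
{
    *(volatile uint8_t *)ptr = v;
}

/* Atomic store via the GCC/Clang builtin (an assumption about the
 * toolchain): indivisible, and ordered after earlier writes by the
 * release semantics. */
static inline void store_byte_atomic(void *ptr, uint8_t v)
{
    __atomic_store_n((uint8_t *)ptr, v, __ATOMIC_RELEASE);
}

/* Single-threaded sanity check: all three leave the same final value.
 * They differ only in what they promise about intermediate stores,
 * which a test like this cannot observe. */
static int check_stores_agree(void)
{
    uint8_t a = 0, b = 0, c = 0;

    store_byte_plain(&a, DD_BIT);
    store_byte_volatile(&b, DD_BIT);
    store_byte_atomic(&c, DD_BIT);

    assert(a == DD_BIT);
    assert(b == DD_BIT);
    assert(c == DD_BIT);
    return 1;
}
```

Whether either of the stronger forms is appropriate on the DMA write path
is a separate question; the sketch only shows where a by-design guarantee
would have to come from.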
