On Wed, 27 May 2026 at 14:23, Liu Gang <[email protected]> wrote:
>
> From: LiuGang <[email protected]>
>
> In x86 environments, writing the Rx descriptor status byte with a single
> pci_dma_write() works correctly. However, on aarch64 with glibc 2.24+,
> the memcpy/memmove implementation uses a branchless sequence that copies
> the same byte three times when count == 1. This results in three
> consecutive STRB operations.
> Ref: 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/memcpy.S;hb=refs/heads/release/2.24/master#l124
>
> When the guest uses an e1000 NIC and DPDK, this behavior can cause
> packet reception to stop or packets to be dropped repeatedly. Under
> normal operation, DPDK clears the DD flag after processing a packet and
> then updates RDT. But due to the triple STRB from memcpy/memmove, the
> DD flag may be cleared and then immediately set again. This causes DPDK
> to consume the same packet twice, so RDT is incremented one extra time.
> As a result, RDH == RDT, and QEMU mistakenly considers the receive
> queue full, stopping packet reception.
>
> The issue can be reproduced on aarch64 using the adapted test program:
> https://github.com/cdkey/e1000_poc
>
> Replace pci_dma_write() with address_space_stb() to write the DD status
> byte directly, avoiding the problematic memcpy/memmove sequence and
> resolving the issue.
>
> Fixes: 034d00d48581 ("e1000: set RX descriptor status in a separate operation")
> Signed-off-by: Liu Gang <[email protected]>
> Signed-off-by: Ding Hui <[email protected]>
> ---
>  hw/net/e1000.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index 202ad40401..f7be035e0f 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -869,6 +869,14 @@ e1000_receiver_overrun(E1000State *s, size_t size)
>      set_ics(s, 0, E1000_ICS_RXO);
>  }
>
> +static inline void
> +e1000_dma_write_byte(PCIDevice *d, dma_addr_t addr, uint8_t val)
> +{
> +    AddressSpace *as = pci_get_address_space(d);
> +    dma_barrier(as, DMA_DIRECTION_FROM_DEVICE);
> +    address_space_stb(as, addr, val, MEMTXATTRS_UNSPECIFIED, NULL);

I don't think there's any guarantee that we won't write
multiple times to the memory in this case either. This
eventually boils down to calling address_space_stm_internal(),
which for dispatch to RAM calls stm_p(). For a byte write
that calls stb_p(), which is "*(uint8_t *)ptr = v;", and
nothing in the C standard guarantees that this performs
exactly one byte store to memory.

I think this change only gives different behaviour by accident,
not by design.
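
To make the distinction concrete, here is a rough standalone sketch. It
is not QEMU code, and every name in it (store_count, store_byte,
branchless_small_copy, single_byte_store) is hypothetical. The first
helper models the kind of branchless 1..3 byte copy the commit message
describes, which issues three byte stores even when count == 1. The
second shows one way to ask the compiler for a single store explicitly,
using C11 atomics, as opposed to a plain assignment that merely happens
to compile to one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instrumentation: counts byte stores issued to the buffer. */
static unsigned store_count;

static void store_byte(volatile uint8_t *dst, uint8_t val)
{
    *dst = val;
    store_count++;
}

/*
 * Model of a branchless copy for 1..3 bytes: always three byte stores,
 * to offsets 0, count/2 and count-1. With count == 1 all three offsets
 * are 0, so the same byte is written three times.
 */
static void branchless_small_copy(volatile uint8_t *dst, const uint8_t *src,
                                  size_t count)
{
    assert(count >= 1 && count <= 3);

    size_t mid = count >> 1;
    uint8_t first = src[0];
    uint8_t middle = src[mid];
    uint8_t last = src[count - 1];

    store_byte(dst, first);
    store_byte(dst + mid, middle);
    store_byte(dst + count - 1, last);
}

/*
 * An explicitly requested single store. A relaxed atomic store is one
 * atomic write of the byte; no ordering against other accesses is
 * implied, so any barrier still has to be issued separately.
 */
static void single_byte_store(_Atomic uint8_t *dst, uint8_t val)
{
    atomic_store_explicit(dst, val, memory_order_relaxed);
}
```

Calling branchless_small_copy() with count == 1 leaves the destination
byte with the right value but bumps store_count to 3, which is the
window in which a guest could clear DD and see it set again. QEMU has
its own atomic helpers such as qatomic_set(); whether something along
those lines is the right fix for this device is an assumption on my
part, not something settled in this thread.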

-- PMM
