On 7/7/2024 6:50 AM, Mihai Brodschi wrote:
> Hi Ferruh,
>
> On 07/07/2024 05:12, Ferruh Yigit wrote:
>> On 6/28/2024 10:01 PM, Mihai Brodschi wrote:
>>> rte_pktmbuf_alloc_bulk is called by the zero-copy receiver to allocate
>>> new mbufs to be provided to the sender. The allocated mbuf pointers
>>> are stored in a ring, but the alloc function doesn't implement index
>>> wrap-around, so it writes past the end of the array. This results in
>>> memory corruption and duplicate mbufs being received.
>>>
>>
>> Hi Mihai,
>>
>> I am not sure writing past the ring actually occurs.
>>
>> As far as I can see, the intention is to keep the ring as full as
>> possible: when 'head' and 'tail' are initially 0, the refill fills the
>> whole ring. Later the tail moves and the emptied space is filled again,
>> so head (in modulo) is always just behind tail after a refill. In the
>> next run, the refill only covers the part the tail moved, which is
>> calculated as 'n_slots'. As this is only the size of the gap, starting
>> from 'head' (with modulo) shouldn't pass the ring length.
>>
>> Do you observe this issue in practice? If so, can you please provide
>> your backtrace and numbers showing how to reproduce the issue?
>
> The alloc function writes starting from the ring's head, but the ring's
> head can be located at the end of the ring's memory buffer (ring_size - 1).
> The correct behavior would be to wrap around to the start of the buffer (0),
> but the alloc function has no awareness of the fact that it's writing to a
> ring, so it writes to ring_size, ring_size + 1, etc.
>
> Let's look at the existing code:
> We assume the ring size is 256 and we just received 32 packets.
> The previous tail was at index 255, now it's at index 31.
> The head is initially at index 255.
>
> head = __atomic_load_n(&ring->head, __ATOMIC_RELAXED);  // head = 255
> n_slots = ring_size - head + mq->last_tail;             // n_slots = 32
>
> if (n_slots < 32)    // not taken
>     goto no_free_mbufs;
>
> ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
> // This will write 32 mbuf pointers starting at index (head & mask) = 255.
> // The ring size is 256, so apart from the first one, all pointers will be
> // written out of bounds (index 256 .. 286, when it should be 0 .. 30).
>
My expectation is the numbers should be like the following:

Initially:
 size = 256
 head = 0
 tail = 0

In the first refill:
 n_slots = 256
 head = 256
 tail = 0

In a subsequent run where 32 slots were used:
 head = 256
 tail = 32
 n_slots = 32

 rte_pktmbuf_alloc_bulk(mq, buf[head & mask], n_slots);
 head & mask = 0
 // So it fills the first 32 elements of the buffer, which is in bounds.

This continues as above; the combination of filling only the gap and masking head with 'mask' provides the wrapping required.

> I can reproduce a crash 100% of the time with my application, but the output
> is not very helpful, since it crashes elsewhere because of mempool corruption.
> Applying this patch fixes the crashes completely.
>

An always-reproducible crash would mean the existing memif zero-copy Rx is broken and nobody can use it, but I am suspicious that this is the case; perhaps something special in your use case is triggering this issue.

@Jakub, can you please confirm that memif Rx zero copy is tested?

>>> Allocate 2x the space for the mbuf ring, so that the alloc function
>>> has a contiguous array to write to, then copy the excess entries
>>> to the start of the array.
>>>
>>
>> Even if the issue is valid, I am not sure about the solution of doubling
>> the buffer memory, but let's confirm the issue first before discussing
>> the solution.
>
> Initially, I thought about splitting the call to rte_pktmbuf_alloc_bulk
> in two, but I thought that might be bad for performance if the mempool is
> being used concurrently from multiple threads.
>
> If we want to use only one call to rte_pktmbuf_alloc_bulk, we need an array
> to store the allocated mbuf pointers. This array must be of length ring_size,
> since that's the maximum number of mbufs which may be allocated in one go.
> We need to copy the pointers from this array to the ring.
>
> If we instead allocate twice the space for the ring, we can skip copying
> the pointers which were written to the ring, and only copy those that were
> written outside of its bounds.
>

First thing that came to my mind was also using two
'rte_pktmbuf_alloc_bulk()' calls. I can see why you prefer doubling the
buffer size, but it comes with copying overhead. So both options come with
some overhead, and I am not sure which one is better; although I am leaning
toward the first solution, we should do some measurements to decide.

BUT let's agree on the problem first, before doing more work; I am still
not fully convinced that the original code is wrong.

>>> Fixes: 43b815d88188 ("net/memif: support zero-copy slave")
>>> Cc: sta...@dpdk.org
>>> Signed-off-by: Mihai Brodschi <mihai.brods...@broadcom.com>
>>> ---
>>> v2:
>>>  - fix email formatting
>>>
>>> ---
>>>  drivers/net/memif/rte_eth_memif.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
>>> index 16da22b5c6..3491c53cf1 100644
>>> --- a/drivers/net/memif/rte_eth_memif.c
>>> +++ b/drivers/net/memif/rte_eth_memif.c
>>> @@ -600,6 +600,10 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>>  	ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
>>>  	if (unlikely(ret < 0))
>>>  		goto no_free_mbufs;
>>> +	if (unlikely(n_slots > ring_size - (head & mask))) {
>>> +		rte_memcpy(mq->buffers, &mq->buffers[ring_size],
>>> +			(n_slots + (head & mask) - ring_size) * sizeof(struct rte_mbuf *));
>>> +	}
>>>
>>>  	while (n_slots--) {
>>>  		s0 = head++ & mask;
>>> @@ -1245,8 +1249,12 @@ memif_init_queues(struct rte_eth_dev *dev)
>>>  	}
>>>  	mq->buffers = NULL;
>>>  	if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
>>> +		/*
>>> +		 * Allocate 2x ring_size to reserve a contiguous array for
>>> +		 * rte_pktmbuf_alloc_bulk (to store allocated mbufs).
>>> +		 */
>>>  		mq->buffers = rte_zmalloc("bufs", sizeof(struct rte_mbuf *) *
>>> -					  (1 << mq->log2_ring_size), 0);
>>> +					  (1 << (mq->log2_ring_size + 1)), 0);
>>>  		if (mq->buffers == NULL)
>>>  			return -ENOMEM;
>>>  	}
>>
>
> Apologies for sending this multiple times, I'm not familiar with mailing
> lists.
>