On 7/7/2024 6:50 AM, Mihai Brodschi wrote:
> Hi Ferruh,
> 
> On 07/07/2024 05:12, Ferruh Yigit wrote:
>> On 6/28/2024 10:01 PM, Mihai Brodschi wrote:
>>> rte_pktmbuf_alloc_bulk is called by the zero-copy receiver to allocate
>>> new mbufs to be provided to the sender. The allocated mbuf pointers
>>> are stored in a ring, but the alloc function doesn't implement index
>>> wrap-around, so it writes past the end of the array. This results in
>>> memory corruption and duplicate mbufs being received.
>>>
>>
>> Hi Mihai,
>>
>> I am not sure writing past the ring actually occurs.
>>
>> As far as I can see, the intent is to keep the ring as full as possible:
>> initially 'head' and 'tail' are 0, and the first refill fills the whole
>> ring. Later the tail moves and the emptied space is refilled, so head (in
>> modulo) always ends up just behind tail after a refill. The next refill
>> only covers the part the tail has moved, which is what 'n_slots'
>> calculates. As this is only the size of the gap, writing from 'head'
>> (with modulo) shouldn't pass the ring length.
>>
>> Do you observe this issue in practice? If so, can you please provide a
>> backtrace and numbers showing how to reproduce it?
> 
> The alloc function writes starting from the ring's head, but the ring's
> head can be located at the end of the ring's memory buffer (ring_size - 1).
> The correct behavior would be to wrap around to the start of the buffer (0),
> but the alloc function has no awareness of the fact that it's writing to a
> ring, so it writes to ring_size, ring_size + 1, etc.
> 
> Let's look at the existing code:
> We assume the ring size is 256 and we just received 32 packets.
> The previous tail was at index 255, now it's at index 31.
> The head is initially at index 255.
> 
> head = __atomic_load_n(&ring->head, __ATOMIC_RELAXED);        // head = 255
> n_slots = ring_size - head + mq->last_tail;           // n_slots = 32
> 
> if (n_slots < 32)                                     // not taken
>       goto no_free_mbufs;
> 
> ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
> // This will write 32 mbuf pointers starting at index (head & mask) = 255.
> // The ring size is 256, so apart from the first one all pointers will be
> // written out of bounds (index 256 .. 286, when it should be 0 .. 30).
> 

My expectation is that the numbers run as follows:

Initially:
 size = 256
 head = 0
 tail = 0

In first refill:
 n_slots = 256
 head = 256
 tail = 0

Subsequent run, after 32 slots have been consumed:
 head = 256
 tail = 32
 n_slots = 32
 rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
  head & mask = 0
  // So it fills the first 32 elements of the buffer, which is in bounds

This continues as above: the combination of refilling only the gap and
masking the head with 'mask' provides the required wrap-around.
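
To make the arithmetic concrete, below is a tiny standalone sketch of the
same sequence (a toy model with simplified names, not the driver code;
'tail' here stands in for 'mq->last_tail'):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint16_t ring_size = 256, mask = ring_size - 1;
        uint16_t head = 0, tail = 0;

        /* first refill: fills the whole ring, slots 0..255 */
        uint16_t n_slots = ring_size - head + tail;   /* 256 */
        head += n_slots;                              /* head = 256 */

        /* consumer releases 32 slots; next refill covers only the gap */
        tail += 32;
        n_slots = ring_size - head + tail;            /* 32 */
        printf("start=%u count=%u\n",                 /* start=0 count=32 */
               (unsigned)(head & mask), (unsigned)n_slots);
        return 0;
    }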


> I can reproduce a crash 100% of the time with my application, but the output
> is not very helpful, since it crashes elsewhere because of mempool corruption.
> Applying this patch fixes the crashes completely.
> 

If this crash is always reproducible, it means the existing memif zero-copy
Rx is broken and nobody can use it, but I doubt that is the case; perhaps
something specific to your use case is triggering the issue.

@Jakub, can you please confirm that memif zero-copy Rx is tested?

>>> Allocate 2x the space for the mbuf ring, so that the alloc function
>>> has a contiguous array to write to, then copy the excess entries
>>> to the start of the array.
>>>
>>
>> Even if the issue is valid, I am not sure about doubling the buffer
>> memory as the solution, but let's confirm the issue first before
>> discussing the solution.
> 
> Initially, I thought about splitting the call to rte_pktmbuf_alloc_bulk in 
> two,
> but I thought that might be bad for performance if the mempool is being used
> concurrently from multiple threads.
> 
> If we want to use only one call to rte_pktmbuf_alloc_bulk, we need an array
> to store the allocated mbuf pointers. This array must be of length ring_size,
> since that's the maximum number of mbufs that may be allocated in one go.
> We need to copy the pointers from this array to the ring.
> 
> If we instead allocate twice the space for the ring, we can skip copying
> the pointers which were written to the ring, and only copy those that were
> written outside of its bounds.
> 

The first thing that came to my mind was also using two
'rte_pktmbuf_alloc_bulk()' calls.
I can see why you prefer doubling the buffer size, but that comes with
copying overhead.
So both options come with some overhead, and I am not sure which one is
better; although I am leaning toward the first solution, we should do some
measurements to decide.
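
For reference, the two-call variant could look roughly like the sketch
below (untested, written against the same variables the existing code
uses; freeing the first batch when the second call fails is just one
possible cleanup policy):

    /* Untested sketch: split the bulk allocation at the wrap point so
     * each call writes into a contiguous region of the ring. */
    uint16_t start = head & mask;
    uint16_t first = RTE_MIN(n_slots, (uint16_t)(ring_size - start));

    ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[start], first);
    if (unlikely(ret < 0))
        goto no_free_mbufs;

    if (n_slots > first) {
        /* wrapped part: allocate the remainder at the start of the array */
        ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[0],
                n_slots - first);
        if (unlikely(ret < 0)) {
            rte_pktmbuf_free_bulk(&mq->buffers[start], first);
            goto no_free_mbufs;
        }
    }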

BUT let's agree on the problem first, before doing more work; I am still
not fully convinced that the original code is wrong.
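
If it helps to settle that, a temporary bounds check just before the
existing alloc call should fire if the overrun can really happen (a
hypothetical debug aid only, using the driver's MIF_LOG macro; not part
of any fix):

    /* Debug aid: report a refill that would write past the ring. */
    if ((uint32_t)(head & mask) + n_slots > ring_size)
        MIF_LOG(ERR, "refill overruns ring: start=%u n_slots=%u ring_size=%u",
                (unsigned)(head & mask), (unsigned)n_slots,
                (unsigned)ring_size);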

>>> Fixes: 43b815d88188 ("net/memif: support zero-copy slave")
>>> Cc: sta...@dpdk.org
>>> Signed-off-by: Mihai Brodschi <mihai.brods...@broadcom.com>
>>> ---
>>> v2:
>>>  - fix email formatting
>>>
>>> ---
>>>  drivers/net/memif/rte_eth_memif.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
>>> index 16da22b5c6..3491c53cf1 100644
>>> --- a/drivers/net/memif/rte_eth_memif.c
>>> +++ b/drivers/net/memif/rte_eth_memif.c
>>> @@ -600,6 +600,10 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>>  	ret = rte_pktmbuf_alloc_bulk(mq->mempool, &mq->buffers[head & mask], n_slots);
>>>  	if (unlikely(ret < 0))
>>>  		goto no_free_mbufs;
>>> +	if (unlikely(n_slots > ring_size - (head & mask))) {
>>> +		rte_memcpy(mq->buffers, &mq->buffers[ring_size],
>>> +			(n_slots + (head & mask) - ring_size) * sizeof(struct rte_mbuf *));
>>> +	}
>>>  
>>>  	while (n_slots--) {
>>>  		s0 = head++ & mask;
>>> @@ -1245,8 +1249,12 @@ memif_init_queues(struct rte_eth_dev *dev)
>>>  		}
>>>  		mq->buffers = NULL;
>>>  		if (pmd->flags & ETH_MEMIF_FLAG_ZERO_COPY) {
>>> +			/*
>>> +			 * Allocate 2x ring_size to reserve a contiguous array for
>>> +			 * rte_pktmbuf_alloc_bulk (to store allocated mbufs).
>>> +			 */
>>>  			mq->buffers = rte_zmalloc("bufs", sizeof(struct rte_mbuf *) *
>>> -						  (1 << mq->log2_ring_size), 0);
>>> +						  (1 << (mq->log2_ring_size + 1)), 0);
>>>  			if (mq->buffers == NULL)
>>>  				return -ENOMEM;
>>>  		}
>>
> 
> Apologies for sending this multiple times, I'm not familiar with mailing 
> lists.
> 
> 
