Hi Ferruh,
Apologies for the late response.
I've run some performance tests for the two proposed solutions.
In the tables below, the rte_memcpy results correspond to this patch.
The 2xpktmbuf_alloc results correspond to the other proposed solution.
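For context, the two variants differ only in how the refill handles the case
where it would run past the end of the ring's buffer array. Roughly, they look
like this (a simplified sketch; the function names, parameter layout and the
VLA are illustrative choices of mine, not the exact patch code):

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_memcpy.h>

/* "rte_memcpy" approach: one bulk allocation into a temporary array,
 * then copy the mbuf pointers into the ring's buffer array in two
 * chunks, so the allocation itself never has to wrap. */
static int
refill_memcpy(struct rte_mempool *mp, struct rte_mbuf **buffers,
	      uint16_t head, uint16_t mask, uint16_t size, uint16_t n_slots)
{
	uint16_t room = size - (head & mask);   /* slots before the wrap point */

	if (n_slots <= room)    /* no wrap needed: fill in place as before */
		return rte_pktmbuf_alloc_bulk(mp, &buffers[head & mask], n_slots);

	struct rte_mbuf *tmp[n_slots];          /* n_slots > room >= 1 here */

	if (rte_pktmbuf_alloc_bulk(mp, tmp, n_slots) < 0)
		return -1;
	rte_memcpy(&buffers[head & mask], tmp, room * sizeof(*tmp));
	rte_memcpy(&buffers[0], tmp + room, (n_slots - room) * sizeof(*tmp));
	return 0;
}

/* "2xpktmbuf_alloc" approach: two bulk allocations, one for each
 * contiguous region of the buffer array. */
static int
refill_two_allocs(struct rte_mempool *mp, struct rte_mbuf **buffers,
		  uint16_t head, uint16_t mask, uint16_t size, uint16_t n_slots)
{
	uint16_t room = size - (head & mask);

	if (n_slots <= room)
		return rte_pktmbuf_alloc_bulk(mp, &buffers[head & mask], n_slots);

	if (rte_pktmbuf_alloc_bulk(mp, &buffers[head & mask], room) < 0)
		return -1;
	return rte_pktmbuf_alloc_bulk(mp, &buffers[0], n_slots - room);
}

The rte_memcpy variant keeps a single mempool operation per refill at the cost
of copying the pointer array; the 2xpktmbuf_alloc variant avoids the copy but
issues two mempool operations whenever the region wraps.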
bash commands:
server# ./dpdk-testpmd --vdev=net_memif0,id=1,role=server,bsize=1024,rsize=8
--single-file-segments -l$SERVER_CORES --file-prefix=test1 --
--nb-cores=<n> --txq=<n> --rxq=<n> --burst=<burst> -i
client# ./dpdk-testpmd
--vdev=net_memif0,id=1,role=client,bsize=1024,rsize=8,zero-copy=yes
--single-file-segments -l$CLIENT_CORES --file-prefix=test2 --
--nb-cores=<n> --txq=<n> --rxq=<n> --burst=<burst> -i
testpmd commands:
client:
testpmd> start
server:
testpmd> start tx_first
CPU: AMD EPYC 7713P
RAM: DDR4-3200
OS: Debian 12
DPDK: 22.11.1
SERVER_CORES=72,8,9,10,11
CLIENT_CORES=76,12,13,14,15
Results:
=================================================================
|                          | 1 CORE    | 2 CORES    | 4 CORES   |
=================================================================
| unpatched       burst=32 | 9.95 Gbps | 19.24 Gbps | 36.4 Gbps |
-----------------------------------------------------------------
| 2xpktmbuf_alloc burst=32 | 9.86 Gbps | 18.88 Gbps | 36.6 Gbps |
-----------------------------------------------------------------
| 2xpktmbuf_alloc burst=31 | 9.17 Gbps | 18.69 Gbps | 35.1 Gbps |
-----------------------------------------------------------------
| rte_memcpy      burst=32 | 9.54 Gbps | 19.10 Gbps | 36.6 Gbps |
-----------------------------------------------------------------
| rte_memcpy      burst=31 | 9.39 Gbps | 18.53 Gbps | 35.5 Gbps |
=================================================================
CPU: Intel Core i7-14700HX
RAM: DDR5-5600
OS: Ubuntu 24.04.1
DPDK: 23.11.1
SERVER_CORES=0,1,3,5,7
CLIENT_CORES=8,9,11,13,15
Results:
==================================================================
|                          | 1 CORE     | 2 CORES    | 4 CORES   |
==================================================================
| unpatched       burst=32 | 15.52 Gbps | 27.35 Gbps | 46.8 Gbps |
------------------------------------------------------------------
| 2xpktmbuf_alloc burst=32 | 15.49 Gbps | 27.68 Gbps | 46.4 Gbps |
------------------------------------------------------------------
| 2xpktmbuf_alloc burst=31 | 14.98 Gbps | 26.75 Gbps | 45.2 Gbps |
------------------------------------------------------------------
| rte_memcpy      burst=32 | 15.99 Gbps | 28.44 Gbps | 49.3 Gbps |
------------------------------------------------------------------
| rte_memcpy      burst=31 | 14.85 Gbps | 27.32 Gbps | 46.3 Gbps |
==================================================================
On 19/07/2024 12:03, Ferruh Yigit wrote:
> On 7/8/2024 12:45 PM, Ferruh Yigit wrote:
>> On 7/8/2024 4:39 AM, Mihai Brodschi wrote:
>>>
>>>
>>> On 07/07/2024 21:46, Mihai Brodschi wrote:
>>>> On 07/07/2024 18:18, Mihai Brodschi wrote:
>>>>>
>>>>>
>>>>> On 07/07/2024 17:05, Ferruh Yigit wrote:
>>>>>>
>>>>>> My expectation is numbers should be like following:
>>>>>>
>>>>>> Initially:
>>>>>> size = 256
>>>>>> head = 0
>>>>>> tail = 0
>>>>>>
>>>>>> In first refill:
>>>>>> n_slots = 256
>>>>>> head = 256
>>>>>> tail = 0
>>>>>>
>>>>>> Subsequent run that 32 slots used:
>>>>>> head = 256
>>>>>> tail = 32
>>>>>> n_slots = 32
>>>>>> rte_pktmbuf_alloc_bulk(mq, buf[head & mask], n_slots);
>>>>>> head & mask = 0
>>>>>> // So it fills first 32 elements of buffer, which is inbound
>>>>>>
>>>>>> This will continue as above, combination of only gap filled and head
>>>>>> masked with 'mask' provides the wrapping required.
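(For readers following along: in code form, the refill described above is
essentially the following; a simplified paraphrase that reuses the names from
the snippet, with an illustrative function signature, not the literal driver
source.)

#include <stdint.h>
#include <rte_mbuf.h>

/* head and tail are free-running counters, mask = size - 1 */
static void
refill(struct rte_mempool *mp, struct rte_mbuf **buf,
       uint16_t *head, uint16_t tail, uint16_t size, uint16_t mask)
{
	/* free slots between what was last refilled and what the peer
	 * has consumed, e.g. 256 - 256 + 32 = 32 */
	uint16_t n_slots = size - *head + tail;

	/* one contiguous allocation starting at the masked head,
	 * e.g. &buf[0] for head = 256 */
	if (n_slots > 0 &&
	    rte_pktmbuf_alloc_bulk(mp, &buf[*head & mask], n_slots) == 0)
		*head += n_slots;
}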
>>>>>
>>>>> If I understand correctly, this works only if eth_memif_rx_zc always
>>>>> processes a number of packets which is a power of 2, so that the
>>>>> ring's head always wraps around at the end of a refill loop, never
>>>>> in the middle of it.
>>>>> Is there any reason this should be the case?
>>>>> Maybe the tests don't trigger the crash because this condition holds
>>>>> true for them?
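To make this concrete with the numbers from the example above (size = 256,
mask = 255) and a burst of 15, assuming exactly 15 packets are consumed and
then refilled on every iteration:

  after 17 bursts + refills:  tail = 255, head = 511
  18th burst consumed:        tail = 270
  next refill:                n_slots = size - head + tail = 256 - 511 + 270 = 15
                              start   = head & mask        = 511 & 255       = 255

rte_pktmbuf_alloc_bulk() then writes buffers[255] through buffers[269], i.e.
14 mbuf pointers past the end of the 256-entry array. With burst = 32 the
start index is always a multiple of 32, so start + n_slots never exceeds 256
and the masked head alone provides the wrap.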
>>>> Here's how to reproduce the crash on DPDK stable 23.11.1, using testpmd:
>>>> Server:
>>>> # ./dpdk-testpmd --vdev=net_memif0,id=1,role=server,bsize=1024,rsize=8
>>>> --single-file-segments -l2,3 --file-prefix test1 -- -i
>>>> Client:
>>>> # ./dpdk-testpmd
>>>> --vdev=net_memif0,id=1,role=client,bsize=1024,rsize=8,zero-copy=yes
>>>> --single-file-segments -l4,5 --file-prefix test2 -- -i
>>>> testpmd> start
>>>> Server:
>>>> testpmd> start tx_first
>>>> testpmd> set burst 15
>>>> At this point, the client crashes with a segmentation fault.
>>>> Before the burst is set to 15, its default value is 32.
>>>> If the receiver processes packets in bursts of size 2^N, the crash
>>>> does not occur. Setting the burst size to any power of 2 works;
>>>> anything else crashes.
>>>> After applying this patch, the crashes are completely gone.
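The same pattern can be checked with a small stand-alone program that models
only the index arithmetic described earlier, not the driver itself. With
burst = 32 every refill stays inside the 256-slot buffer array; with
burst = 15 the 18th refill starts at slot 255 with length 15 and runs past
the end:

#include <stdio.h>

int main(void)
{
	const unsigned size = 256, mask = 256 - 1; /* rsize=8 -> 256 slots */
	unsigned head = 0, tail = 0, burst = 15;   /* set burst = 32 to compare */

	head += size - head + tail;                /* initial refill: 256 slots */

	for (int i = 1; i <= 20; i++) {
		tail += burst;                         /* peer consumed one burst */
		unsigned n_slots = size - head + tail; /* slots to refill */
		unsigned start = head & mask;          /* where the bulk alloc lands */

		printf("refill %2d: start=%3u len=%2u %s\n", i, start, n_slots,
		       start + n_slots > size ? "OUT OF BOUNDS" : "ok");
		head += n_slots;
	}
	return 0;
}

Compiling and running it, then flipping burst between 32 and 15, reproduces
the two behaviors described above.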
>>>
>>> Sorry, this might not crash with a segmentation fault. To confirm
>>> the mempool is corrupted, please comp