On 2/1/2024 3:55 AM, Long Li wrote:
>>>> 'mbufs' is temporary storage for allocated mbuf pointers, why not
>>>> allocate it from the stack instead, which can be faster and easier to manage:
>>>> "struct rte_mbuf *mbufs[count]"
>>>
>>> That would introduce a variable length array.
>>> VLA's should be removed, they are not supported on Windows and many
>>> security tools flag them. The problem is that it makes the code
>>> brittle if count gets huge.
>>>
>>> But certainly regular calloc() or alloca() would work here.
>>>
>>
>> Most of the existing bulk alloc code already uses VLAs, but I can see the
>> problem that they are not supported on Windows.
>>
>> As this mbuf pointer array is short-lived within the function, and is in
>> the fast path, I think the repeated alloc and free can be avoided,
>>
>> one option can be to define a fixed size, big enough, array which requires
>> additional loop for the cases 'count' size is bigger than array size,
>>
>> or an array can be allocated at driver init in the device-specific data, as
>> we know it will be needed continuously in the datapath, and it can be freed
>> during device close()/uninit().
>>
>> I think a fixed-size array on the stack is easier and can be preferred.
> 
> I sent a v3 of the patch, still using alloc().
> 
> I found two problems with using a fixed array:
> 1. the array size needs to be determined in advance. I don't know what a good 
> number should be. If too big, some of the entries may be wasted (and may make a 
> bigger mess of the CPU cache). If too small, it ends up doing multiple 
> allocations, which is the problem this patch is trying to solve.
>

I think the default burst size of 32 can be used, as below:

struct rte_mbuf *mbufs[32];

loop: /* use do {} while (); if you prefer */
n = min(32, count);
/* 'mp' stands for the queue's mempool; alloc_bulk takes the pool first */
rte_pktmbuf_alloc_bulk(mp, mbufs, n);
for (i = 0; i < n; i++)
        mana_post_rx_wqe(rxq, mbufs[i]);
count -= n;
if (count > 0) goto loop;


This additional loop doesn't make the code much more complex (I think no
more than the additional alloc() & free()), and it doesn't waste memory.
I suggest measuring performance with the above change, as it may
improve performance.
Afterwards, if you still insist on going with the original code, we can do that.
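
To make the chunking concrete, here is a self-contained sketch of the same
loop that compiles without DPDK. Everything in it is a stand-in and an
assumption for illustration only: 'struct mbuf', 'alloc_bulk()' and
'post_rx_wqe()' are hypothetical substitutes for struct rte_mbuf,
rte_pktmbuf_alloc_bulk() and mana_post_rx_wqe(), and BURST is just the
value 32 suggested above. It is not the driver code.

```c
#include <stddef.h>

#define BURST 32 /* assumed chunk size, the default burst size above */

struct mbuf { int id; };       /* hypothetical stand-in for struct rte_mbuf */

static struct mbuf pool[1024]; /* hypothetical stand-in for a mempool */
static size_t pool_next;       /* next free entry in the pool */
static size_t posted;          /* number of buffers posted so far */

/* Stand-in for rte_pktmbuf_alloc_bulk(): all-or-nothing, 0 on success. */
static int alloc_bulk(struct mbuf **mbufs, unsigned int n)
{
	unsigned int i;

	if (pool_next + n > sizeof(pool) / sizeof(pool[0]))
		return -1;
	for (i = 0; i < n; i++)
		mbufs[i] = &pool[pool_next++];
	return 0;
}

/* Stand-in for mana_post_rx_wqe(): only counts the posted buffer. */
static int post_rx_wqe(struct mbuf *m)
{
	(void)m;
	posted++;
	return 0;
}

/* Allocate and post 'count' buffers in chunks of at most BURST,
 * using a fixed-size stack array. Returns how many were posted. */
static unsigned int alloc_and_post(unsigned int count)
{
	struct mbuf *mbufs[BURST];
	unsigned int done = 0;

	while (count > 0) {
		unsigned int n = count < BURST ? count : BURST;
		unsigned int i;

		if (alloc_bulk(mbufs, n) != 0)
			break; /* pool exhausted, stop early */
		for (i = 0; i < n; i++)
			post_rx_wqe(mbufs[i]);
		done += n;
		count -= n;
	}
	return done;
}
```

With count = 70 this makes three bulk allocations (32, 32 and 6), and the
stack array never grows beyond BURST pointers regardless of 'count'.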


> 2. it makes the code slightly more complex, but I think 1 is the main problem.
> 
> I think another approach is to use VLA by default, but for Windows use 
> alloc().
> 
> Long
