On 2/1/2024 3:55 AM, Long Li wrote:
>>>> 'mbufs' is temporarily storage for allocated mbuf pointers, why not
>>>> allocate if from stack instead, can be faster and easier to manage:
>>>> "struct rte_mbuf *mbufs[count]"
>>>
>>> That would introduce a variable length array.
>>> VLA's should be removed, they are not supported on Windows and many
>>> security tools flag them. The problem is that it makes the code
>>> brittle if count gets huge.
>>>
>>> But certainly regular calloc() or alloca() would work here.
>>>
>>
>> Most of the existing bulk alloc already uses VLA but I can see the
>> problem it is not being supported by Windows.
>>
>> As this mbuf pointer array is short lived within the function, and
>> being in the fast path, I think continuous alloc and free can be
>> prevented,
>>
>> one option can be to define a fixed size, big enough, array which
>> requires additional loop for the cases 'count' size is bigger than
>> array size,
>>
>> or an array can be allocated by driver init in device specific data,
>> as we know it will be required continuously in the datapath, and it
>> can be freed during device close()/uninit().
>>
>> I think an fixed size array from stack is easier and can be preferred.
>
> I sent a v3 of the patch, still using alloc().
>
> I found two problems with using a fixed array:
> 1. the array size needs to be determined in advance. I don't know what a good
> number should be. If too big, some of them may be wasted. (and maybe make a
> bigger mess of CPU cache) If too small, it ends up doing multiple
> allocations, which is the problem this patch trying to solve.
>

I think the default burst size of 32 can be used, like below:

    struct rte_mbuf *mbufs[32];

    loop: /* use do {} while (); if you prefer */
        n = min(32, count);
        rte_pktmbuf_alloc_bulk(mbufs, n);
        for (i = 0; i < n; i++)
            mana_post_rx_wqe(rxq, mbufs[i]);
        count -= n;
        if (count > 0)
            goto loop;

This additional loop doesn't make the code much more complex (I think no
more than the additional alloc() & free()), and it doesn't waste memory.

I suggest doing a performance measurement with the above change; it may
increase performance. Afterwards, if you insist on going with the original
code, we can do that.

> 2. if makes the code slightly more complex ,but I think 1 is the main problem.
>
> I think another approach is to use VLA by default, but for Windows use
> alloc().
>
> Long
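
For reference, here is a minimal, self-contained sketch of the chunked loop suggested above, written so it compiles without DPDK. Everything in it is an assumption for illustration: struct buf, alloc_bulk() and post_rx() are hypothetical stand-ins for struct rte_mbuf, rte_pktmbuf_alloc_bulk() and mana_post_rx_wqe(), and BURST is the assumed burst size of 32. It also stops on an allocation failure, which the snippet above does not show.

```c
#include <assert.h>
#include <stddef.h>

#define BURST 32        /* assumed burst size, as suggested in the thread */
#define POOL_SIZE 1024  /* size of the toy pool, for illustration only */

/* Hypothetical stand-in for struct rte_mbuf. */
struct buf { int id; };

static struct buf pool[POOL_SIZE];
static size_t pool_next;  /* index of the next free pool entry */
static size_t posted;     /* number of buffers handed to post_rx() */

/* Stand-in for a bulk allocator: all-or-nothing, 0 on success, -1 on failure. */
static int alloc_bulk(struct buf **bufs, size_t n)
{
    if (n > POOL_SIZE - pool_next)
        return -1;
    for (size_t i = 0; i < n; i++)
        bufs[i] = &pool[pool_next++];
    return 0;
}

/* Stand-in for posting one receive buffer to the hardware queue. */
static int post_rx(struct buf *b)
{
    (void)b;
    posted++;
    return 0;
}

/* Post 'count' buffers in chunks of at most BURST, using a fixed-size
 * stack array instead of a VLA or a heap allocation.
 * Returns the number of buffers actually posted. */
static size_t post_in_bursts(size_t count)
{
    struct buf *bufs[BURST];
    size_t done = 0;

    while (count > 0) {
        size_t n = count < BURST ? count : BURST;

        assert(n <= BURST);
        if (alloc_bulk(bufs, n) != 0)
            break;  /* pool exhausted: stop early */
        for (size_t i = 0; i < n; i++)
            post_rx(bufs[i]);
        done += n;
        count -= n;
    }
    return done;
}
```

With this shape the stack usage is fixed at BURST pointers regardless of 'count', and a request larger than BURST only costs extra loop iterations. Whether that is faster than a single allocation of 'count' pointers would still need to be measured on the real driver.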