On 2/22/2016 10:52 PM, Xie, Huawei wrote:
> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
>> Hi,
>>
>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
>>> the library ABI and should not be listed in the version map.
>>>
>>> I assume it's inline for performance reasons, but then you lose the
>>> benefits of dynamic linking, such as the ability to fix bugs and/or
>>> improve it by just updating the library. Since the point of having a
>>> bulk API is to improve performance by reducing the number of calls
>>> required, does it really have to be inline? As in, have you actually
>>> measured the difference between inline and non-inline and decided it's
>>> worth all the downsides?
>> Agree with Panu. It would be interesting to compare the performance
>> between inline and non-inline to decide whether to inline it or not.
> Will update after I have gathered more data. Inline could show an obvious
> performance difference in some cases.
Panu and Olivier: I wrote a simple benchmark. It runs 10M rounds; in each
round, 8 mbufs are allocated through the bulk API and then freed. These are
the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz, CPU
isolated, timer interrupt disabled, RCU offloaded). Btw, I have removed some
outliers, which occurred with a frequency of roughly 1/10; sometimes the
observed user CPU usage suddenly disappeared, and I have no clue what
happened.

With 8 mbufs allocated, there is about a 6% performance increase using
inline:

inline          non-inline
2780738888      2950309416
2834853696      2951378072
2823015320      2954500888
2825060032      2958939912
2824499804      2898938284
2810859720      2944892796
2852229420      3014273296
2787308500      2956809852
2793337260      2958674900
2822223476      2954346352
2785455184      2925719136
2821528624      2937380416
2822922136      2974978604
2776645920      2947666548
2815952572      2952316900
2801048740      2947366984
2851462672      2946469004

With 16 mbufs allocated, we can still observe an obvious performance
difference, though only 1%-2%:

inline          non-inline
5519987084      5669902680
5538416096      5737646840
5578934064      5590165532
5548131972      5767926840
5625585696      5831345628
5558282876      5662223764
5445587768      5641003924
5559096320      5775258444
5656437988      5743969272
5440939404      5664882412
5498875968      5785138532
5561652808      5737123940
5515211716      5627775604
5550567140      5630790628
5665964280      5589568164
5591295900      5702697308

With 32/64 mbufs allocated, the variance of the data itself hides the
performance difference, so we prefer to use inline for performance.

>> Also, it would be nice to have a simple test function in
>> app/test/test_mbuf.c. For instance, you could update
>> test_one_pktmbuf() to take a mbuf pointer as a parameter and remove
>> the mbuf allocation from the function. Then it could be called with
>> a mbuf allocated with rte_pktmbuf_alloc() (like before) and with
>> all the mbufs of rte_pktmbuf_alloc_bulk().

I don't quite get you. Do you mean we write two cases, one that allocates
mbufs through rte_pktmbuf_alloc_bulk() and one that uses
rte_pktmbuf_alloc()? That would be good to have; I could do this after this
patch. A rough sketch of what I have in mind is appended below.

>>
>> Regards,
>> Olivier
>>
>
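For reference, the measurement loop was essentially as follows. This is a
minimal sketch, not the exact benchmark code: it assumes a pktmbuf mempool
"mp" created elsewhere with rte_pktmbuf_pool_create(), and the function and
constant names are illustrative.

/*
 * Minimal sketch of the benchmark loop described above. Assumes a pktmbuf
 * mempool "mp" created elsewhere; ROUNDS/BULK_CNT and the function name
 * are illustrative, not the exact benchmark code.
 */
#include <rte_cycles.h>
#include <rte_mbuf.h>

#define ROUNDS   10000000       /* 10M rounds */
#define BULK_CNT 8              /* 8, 16, 32 or 64 mbufs per round */

static uint64_t
bench_pktmbuf_alloc_bulk(struct rte_mempool *mp)
{
	struct rte_mbuf *mbufs[BULK_CNT];
	uint64_t start;
	unsigned i, j;

	start = rte_rdtsc();
	for (i = 0; i < ROUNDS; i++) {
		/* allocate BULK_CNT mbufs in a single call ... */
		if (rte_pktmbuf_alloc_bulk(mp, mbufs, BULK_CNT) != 0)
			break;
		/* ... then free them all again */
		for (j = 0; j < BULK_CNT; j++)
			rte_pktmbuf_free(mbufs[j]);
	}
	/* total CPU cycles, as reported in the tables above */
	return rte_rdtsc() - start;
}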
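If I understand the test suggestion correctly, the bulk side would look
roughly like this. A sketch only: it assumes test_one_pktmbuf() in
app/test/test_mbuf.c has been changed to take the mbuf as a parameter, and
the wrapper name is hypothetical.

/*
 * Rough sketch of the suggested test refactoring. Assumes
 * test_one_pktmbuf() now takes the mbuf as a parameter instead of
 * allocating it internally; the wrapper name is illustrative.
 */
static int
test_pktmbuf_with_alloc_bulk(struct rte_mempool *pool)
{
	struct rte_mbuf *mbufs[8];
	unsigned i;
	int ret = 0;

	if (rte_pktmbuf_alloc_bulk(pool, mbufs, 8) != 0)
		return -1;

	/* run the existing per-mbuf checks on each bulk-allocated mbuf */
	for (i = 0; i < 8; i++) {
		if (test_one_pktmbuf(mbufs[i]) < 0)
			ret = -1;
		rte_pktmbuf_free(mbufs[i]);
	}
	return ret;
}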