On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: Tuesday, February 6, 2024 11:55 PM
>> To: Rahul Bhansali <rbhans...@marvell.com>; dev@dpdk.org; Radu Nicolau
>> <radu.nico...@intel.com>; Akhil Goyal <gak...@marvell.com>; Konstantin
>> Ananyev <konstantin.anan...@huawei.com>; Anoob Joseph
>> <ano...@marvell.com>
>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance drop
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
>>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
>>> performance. On cn10k, maximum of ~4% drop observed for IPsec event
>>> mode single SA outbound case.
>>>
>>> To fix this issue, single packet free will use rte_pktmbuf_free API.
>>>
>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
>>>
>>> Signed-off-by: Rahul Bhansali <rbhans...@marvell.com>
>>> ---
>>>  examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h
>>> b/examples/ipsec-secgw/ipsec-secgw.h
>>> index 8baab44ee7..ec33a982df 100644
>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
>>>  }
>>>
>>>  /* helper routine to free bulk of packets */
>>> -static inline void
>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
>>> +static __rte_always_inline void
>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
>>>  {
>>> -	rte_pktmbuf_free_bulk(mb, n);
>>> -
>>> +	n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
>>>  	core_stats_update_drop(n);
>>>  }
>>>
>>
>> Hi Rahul,
>>
>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be improved
>> by a similar change?
>
> Hi Ferruh,
> Currently 'rte_pktmbuf_free_bulk()' is not inline. If we make that, along
> with __rte_pktmbuf_free_seg_via_array(), both inline, then performance can
> be improved similarly.
>
Ah, so the performance improvement is coming from 'rte_pktmbuf_free()' being
inline, OK.

As you are doing performance testing in that area, can you please check
whether '__rte_pktmbuf_free_seg_via_array()' is inlined? As it is a static
function, I expect it to be. If not, can you please test with force inlining
it (__rte_always_inline)?

And I wonder if a bulk API receiving a single mbuf is a common theme. Does it
make sense to add a new inline wrapper to the library to cover this case, if
it is bringing ~4% improvement? Like:

```
static inline void
rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n)
{
	if (n == 1)
		return rte_pktmbuf_free(mb[0]);
	return rte_pktmbuf_free_bulk(mb, n);
}
```