On 2/9/2024 1:10 PM, Rahul Bhansali wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <[email protected]>
>> Sent: Wednesday, February 7, 2024 4:06 PM
>> To: Rahul Bhansali <[email protected]>; [email protected]; Radu Nicolau
>> <[email protected]>; Akhil Goyal <[email protected]>; Konstantin
>> Ananyev <[email protected]>; Anoob Joseph
>> <[email protected]>
>> Subject: Re: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec performance
>> drop
>>
>> On 2/7/2024 6:46 AM, Rahul Bhansali wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <[email protected]>
>>>> Sent: Tuesday, February 6, 2024 11:55 PM
>>>> To: Rahul Bhansali <[email protected]>; [email protected]; Radu
>>>> Nicolau <[email protected]>; Akhil Goyal <[email protected]>;
>>>> Konstantin Ananyev <[email protected]>; Anoob Joseph
>>>> <[email protected]>
>>>> Subject: [EXT] Re: [PATCH] examples/ipsec-secgw: fix IPsec
>>>> performance drop
>>>>
>>>> External Email
>>>>
>>>> ----------------------------------------------------------------------
>>>> On 2/6/2024 12:38 PM, Rahul Bhansali wrote:
>>>>> Single packet free using rte_pktmbuf_free_bulk() is dropping the
>>>>> performance. On cn10k, maximum of ~4% drop observed for IPsec event
>>>>> mode single SA outbound case.
>>>>>
>>>>> To fix this issue, single packet free will use rte_pktmbuf_free API.
>>>>>
>>>>> Fixes: bd7c063561b3 ("examples/ipsec-secgw: use bulk free")
>>>>>
>>>>> Signed-off-by: Rahul Bhansali <[email protected]>
>>>>> ---
>>>>> examples/ipsec-secgw/ipsec-secgw.h | 7 +++----
>>>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-secgw/ipsec-secgw.h
>>>>> index 8baab44ee7..ec33a982df 100644
>>>>> --- a/examples/ipsec-secgw/ipsec-secgw.h
>>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.h
>>>>> @@ -229,11 +229,10 @@ free_reassembly_fail_pkt(struct rte_mbuf *mb)
>>>>> }
>>>>>
>>>>>  /* helper routine to free bulk of packets */
>>>>> -static inline void
>>>>> -free_pkts(struct rte_mbuf *mb[], uint32_t n)
>>>>> +static __rte_always_inline void
>>>>> +free_pkts(struct rte_mbuf *mb[], const uint32_t n)
>>>>> {
>>>>> - rte_pktmbuf_free_bulk(mb, n);
>>>>> -
>>>>> + n == 1 ? rte_pktmbuf_free(mb[0]) : rte_pktmbuf_free_bulk(mb, n);
>>>>> core_stats_update_drop(n);
>>>>> }
>>>>>
>>>>
>>>> Hi Rahul,
>>>>
>>>> Do you think the 'rte_pktmbuf_free_bulk()' API performance can be
>>>> improved by similar change?
>>>
>>> Hi Ferruh,
>>> Currently 'rte_pktmbuf_free_bulk()' is not inline. If we make it inline,
>>> along with '__rte_pktmbuf_free_seg_via_array()', then performance can be
>>> improved similarly.
>>>
>>
>> Ah, so the performance improvement is coming from 'rte_pktmbuf_free()' being
>> inline, OK.
>>
>> As you are doing performance testing in that area, can you please check
>> whether '__rte_pktmbuf_free_seg_via_array()' is inlined? As it is a static
>> function I expect it to be inlined. If not, can you please test with force
>> inlining it (__rte_always_inline)?
> It was not inlined. I also checked with force inline and saw no impact from
> it, so I can make it force inline.
>
If there is no performance improvement, I think there is no need to force
inline '__rte_pktmbuf_free_seg_via_array()'.
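For reference, roughly how the two APIs are declared in rte_mbuf.h (paraphrased
from memory, not copied verbatim, so the exact prototypes may differ slightly):
the single-mbuf free is a static inline, while the bulk free is an out-of-line
call into rte_mbuf.c, which is where the per-call cost for n == 1 comes from:
```
/* Paraphrased sketch, not verbatim: the single-mbuf free is a static inline
 * in rte_mbuf.h, so it is expanded at the call site.
 */
static inline void rte_pktmbuf_free(struct rte_mbuf *m);

/* The bulk free is declared in rte_mbuf.h but defined in rte_mbuf.c, so every
 * call, including the n == 1 case, pays the out-of-line call overhead.
 */
void rte_pktmbuf_free_bulk(struct rte_mbuf **mbufs, unsigned int count);
```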
>>
>>
>> And I wonder, if the bulk() API getting a single mbuf is a common theme,
>> does it make sense to add a new inline wrapper to the library to cover this
>> case, given it brings ~4% improvement, like:
>> ```
>> static inline void
>> rte_pktmbuf_free_bulk_or_one(... **mb, unsigned int n)
>> {
>> 	if (n == 1) {
>> 		rte_pktmbuf_free(mb[0]);
>> 		return;
>> 	}
>> 	rte_pktmbuf_free_bulk(mb, n);
>> }
>> ```
> Agree, we can make this wrapper to cover the case where the bulk free API is
> called but might get a single mbuf, to get better perf. The "if (n == 1)"
> check can be further optimized with a compile-time constant check:
> ```
> static inline void
> rte_pktmbuf_free_bulk_or_one(struct rte_mbuf **mb, unsigned int n)
> {
> 	if (__builtin_constant_p(n) && (n == 1))
> 		rte_pktmbuf_free(mb[0]);
> 	else
> 		rte_pktmbuf_free_bulk(mb, n);
> }
> ```
> Let me know if this is fine, and I'll send a v2. Also, this will need to be
> "__rte_experimental", right?
>
The compile-time constant check can prevent the penalty of the additional
branch, which is good, and I can see this working for the examples/ipsec-secgw
usecase above, which has some hardcoded single-mbuf free calls.
But for most other usecases I think 'n' won't be known at compile time, so the
API will effectively be the same as free_bulk().
If you implement it with a runtime check, do you still observe any performance
improvement? If not, perhaps we can go with only the example code update,
without a new API.