On Fri, 28 Jun 2019 15:16:30 +0100 "Medvedkin, Vladimir" <vladimir.medved...@intel.com> wrote:
> Hi Honnappa, > > On 28/06/2019 14:57, Honnappa Nagarahalli wrote: > >> Hi all, > >> > >> On 28/06/2019 05:34, Stephen Hemminger wrote: > >>> On Fri, 28 Jun 2019 02:44:54 +0000 > >>> "Ruifeng Wang (Arm Technology China)"<ruifeng.w...@arm.com> wrote: > >>> > >>>>>> Tests showed that the function inlining caused performance drop on > >>>>>> some x86 platforms with the memory ordering patches applied. > >>>>>> By force no-inline functions, the performance was better than > >>>>>> before on x86 and no impact to arm64 platforms. > >>>>>> > >>>>>> Suggested-by: Medvedkin Vladimir<vladimir.medved...@intel.com> > >>>>>> Signed-off-by: Ruifeng Wang<ruifeng.w...@arm.com> > >>>>>> Reviewed-by: Gavin Hu<gavin...@arm.com> > >>>>> { > >>>>> > >>>>> Do you actually need to force noinline or is just taking of inline > >>>>> enough? > >>>>> In general, letting compiler decide is often best practice. > >>>> The force noinline is an optimization for x86 platforms to keep > >>>> rte_lpm_add() API performance with memory ordering applied. > >>> I don't think you answered my question. What does a recent version of > >>> GCC do if you drop the inline. > >>> > >>> Actually all the functions in rte_lpm should drop inline. > >> I'm agree with Stephen. If it is not a fastpath and size of function is not > >> minimal it is good to remove inline qualifier for other control plane > >> functions > >> such as rule_add/delete/find/etc and let the compiler decide to inline it > >> (unless it affects performance). > > IMO, the rule needs to be simple. If it is control plane function, we > > should leave it to the compiler to decide. I do not think we need to worry > > too much about performance for control plane functions. > Control plane is not as important as data plane speed but it is still > important. For lpm we are talking not about initialization, but runtime > routes add/del related functions. If it is very slow the library will be > totally unusable because after it receives a route update it will be > blocked for a long time and route update queue would overflow. Control plane performance is more impacted by algorithmic choice. The original LPM had terrible (n^2?) control path. Current code is better. I had a patch using RB tree, but it was rejected because it used the /usr/include/bsd/sys/tree.h which added a dependency.