Hi Andrew,

> Experiments with the netperf benchmark indicated that the size
> selecting VMX-based copies in __copy_tofrom_user_power7() was
> suboptimal on POWER8. Measurements showed that parity was in the
> neighbourhood of 3328 bytes, rather than greater than 4096. The
> change gives a 1.5-2.0% improvement in performance for 4096-byte
> buffers, reducing the relative time spent in
> __copy_tofrom_user_power7() from approximately 7% to approximately 5%
> in the TCP_RR benchmark.

Nice work! All the context switch optimisations we've made over the
last year have likely moved the break-even point for this.

Acked-by: Anton Blanchard <an...@samba.org>

Anton

> Signed-off-by: Andrew Jeffery <and...@aj.id.au>
> ---
>  arch/powerpc/lib/copyuser_power7.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/lib/copyuser_power7.S b/arch/powerpc/lib/copyuser_power7.S
> index a24b4039352c..706b7cc19846 100644
> --- a/arch/powerpc/lib/copyuser_power7.S
> +++ b/arch/powerpc/lib/copyuser_power7.S
> @@ -82,14 +82,14 @@
>  _GLOBAL(__copy_tofrom_user_power7)
>  #ifdef CONFIG_ALTIVEC
>  	cmpldi	r5,16
> -	cmpldi	cr1,r5,4096
> +	cmpldi	cr1,r5,3328
>  
>  	std	r3,-STACKFRAMESIZE+STK_REG(R31)(r1)
>  	std	r4,-STACKFRAMESIZE+STK_REG(R30)(r1)
>  	std	r5,-STACKFRAMESIZE+STK_REG(R29)(r1)
>  
>  	blt	.Lshort_copy
> -	bgt	cr1,.Lvmx_copy
> +	bge	cr1,.Lvmx_copy
>  #else
>  	cmpldi	r5,16
>  