On Wed, May 23, 2018 at 09:47:32AM +0200, Christophe Leroy wrote:
> At the time being, memcmp() compares two chunks of memory
> byte per byte.
>
> This patch optimises the comparison by comparing word by word.
>
> A small benchmark performed on an 8xx comparing two chunks
> of 512 bytes performed 100000 times gives:
>
> Before : 5852274 TB ticks
> After:   1488638 TB ticks
> diff --git a/arch/powerpc/lib/string_32.S b/arch/powerpc/lib/string_32.S
> index 40a576d56ac7..542e6cecbcaf 100644
> --- a/arch/powerpc/lib/string_32.S
> +++ b/arch/powerpc/lib/string_32.S
> @@ -16,17 +16,45 @@
>  	.text
>  
>  _GLOBAL(memcmp)
> -	cmpwi	cr0, r5, 0
> -	beq-	2f
> -	mtctr	r5
> -	addi	r6,r3,-1
> -	addi	r4,r4,-1
> -1:	lbzu	r3,1(r6)
> -	lbzu	r0,1(r4)
> -	subf.	r3,r0,r3
> -	bdnzt	2,1b
> +	srawi.	r7, r5, 2		/* Divide len by 4 */
> +	mr	r6, r3
> +	beq-	3f
> +	mtctr	r7
> +	li	r7, 0
> +1:
> +#ifdef __LITTLE_ENDIAN__
> +	lwbrx	r3, r6, r7
> +	lwbrx	r0, r4, r7
> +#else
> +	lwzx	r3, r6, r7
> +	lwzx	r0, r4, r7
> +#endif

You don't test whether the pointers are word-aligned.  Does that work?
Say, when a load is crossing a page boundary, or segment boundary.


Segher
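
For illustration only, here is a rough C sketch of the word-by-word scheme
under discussion. It is not the code from the patch, and the names
memcmp_wordwise, load_be32 and is_word_aligned are hypothetical. The sketch
assembles each word from single bytes, so it never performs an unaligned
access itself; the assembly in the patch issues one word load per pointer,
which is where the alignment question applies. is_word_aligned shows the
kind of check a caller could make before taking a word-load path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p sits on a 4-byte boundary. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3) == 0;
}

/*
 * Build a big-endian 32-bit word from 4 bytes. Comparing such words as
 * unsigned integers gives the same ordering as comparing the bytes one
 * by one, which is why the patch uses lwbrx on little-endian and lwzx
 * on big-endian.
 */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical name; a sketch, not the kernel's memcmp(). */
static int memcmp_wordwise(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t words = len >> 2;            /* number of full 4-byte words */

    while (words--) {
        uint32_t wa = load_be32(pa);
        uint32_t wb = load_be32(pb);

        if (wa != wb)
            return wa > wb ? 1 : -1;
        pa += 4;
        pb += 4;
    }

    /* Compare the remaining 0..3 bytes one at a time. */
    for (len &= 3; len; len--, pa++, pb++) {
        if (*pa != *pb)
            return *pa > *pb ? 1 : -1;
    }
    return 0;
}
```

Like memcmp(), the sketch only promises the sign of the result, not a
particular magnitude.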