From: Anton Blanchard
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
> 
> Optimise the loop in a few ways:
> 
> - Unroll the byte at a time loop
> 
> - For large (at least 32 byte) comparisons that are also 8 byte
>   aligned, use an unrolled modulo scheduled loop using 8 byte
>   loads. This is similar to our glibc memcmp.
> 
> A simple microbenchmark testing 10000000 iterations of an 8192 byte
> memcmp was used to measure the performance:
> 
> baseline:     29.93 s
> 
> modified:      1.70 s
> 
> Just over 17x faster.
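For reference, the first change in the quoted patch (unrolling the byte-at-a-time loop) can be sketched in C roughly as below. This is not the code from the patch; the function name memcmp_bytes_unrolled and the unroll factor of four are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte-at-a-time compare, unrolled by four.
 * Returns <0, 0 or >0 like memcmp().
 */
int memcmp_bytes_unrolled(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1;
	const unsigned char *b = s2;

	/* Main loop: four byte compares per iteration, one loop branch. */
	while (n >= 4) {
		if (a[0] != b[0])
			return a[0] - b[0];
		if (a[1] != b[1])
			return a[1] - b[1];
		if (a[2] != b[2])
			return a[2] - b[2];
		if (a[3] != b[3])
			return a[3] - b[3];
		a += 4;
		b += 4;
		n -= 4;
	}

	/* Tail: the remaining 0..3 bytes. */
	while (n--) {
		if (*a != *b)
			return *a - *b;
		a++;
		b++;
	}
	return 0;
}
```

Unrolling cuts the number of loop-control branches per byte compared, but each byte still needs its own load, compare and branch.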

The unrolled loop (deleted) looks excessive.
On a modern CPU with multiple execution units, the loop
overhead can usually execute in parallel with the actual
'work'.
So I suspect that a much simpler 'word at a time' loop will be
almost as fast, especially when the code isn't already in
the cache and the compare is relatively short.
Try something based on:
        a1 = *a++;
        b1 = *b++;
        do {
                /* load the next pair before testing this one */
                a2 = *a++;
                b2 = *b++;
                if (a1 != b1)
                        break;
                a1 = *a++;
                b1 = *b++;
        } while (a2 == b2);
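A runnable variant of the word-at-a-time idea might look like the following. It is a sketch under assumptions: 64-bit words, loads done with memcpy() so alignment does not matter, and a byte-wise fallback on the first differing word so the sign of the result is independent of endianness. The names memcmp_words, load_word and cmp_bytes are hypothetical. As in the loop above, the next word pair is loaded before the current pair is tested, but the pair is rotated through a1/b1 rather than unrolled by two.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load one 64-bit word; memcpy avoids alignment traps. */
static inline uint64_t load_word(const unsigned char *p)
{
	uint64_t w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Hypothetical helper: byte compare for the tail and for a mismatching word. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (a[i] != b[i])
			return a[i] - b[i];
	return 0;
}

/* Hypothetical word-at-a-time memcmp(): returns <0, 0 or >0. */
int memcmp_words(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;
	const size_t W = sizeof(uint64_t);

	if (n >= W) {
		/* Prime with the first word pair. */
		uint64_t a1 = load_word(a), b1 = load_word(b);

		while (n >= 2 * W) {
			/* Fetch the next pair before testing the current one,
			 * so the loads can overlap the compare and branch. */
			uint64_t a2 = load_word(a + W), b2 = load_word(b + W);

			if (a1 != b1)
				break;
			a += W;
			b += W;
			n -= W;
			a1 = a2;
			b1 = b2;
		}
		/* a/b now point at the word held in a1/b1. */
		if (a1 != b1)
			return cmp_bytes(a, b, W);
		a += W;
		b += W;
		n -= W;
	}
	/* Fewer than W bytes remain. */
	return cmp_bytes(a, b, n);
}
```

The per-word work is one load pair and one compare, and the loop branch is cheap enough that a wide CPU can usually hide it, which is the point being made above.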

        David

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to