From: Anton Blanchard
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:
>
> - Unroll the byte at a time loop
>
> - For large (at least 32 byte) comparisons that are also 8 byte
> aligned, use an unrolled modulo scheduled loop using 8 byte
> loads. This is similar to our glibc memcmp.
>
> A simple microbenchmark testing 10000000 iterations of an 8192 byte
> memcmp was used to measure the performance:
>
> baseline: 29.93 s
>
> modified: 1.70 s
>
> Just over 17x faster.

The unrolled loop (deleted) looks excessive.
On a modern CPU with multiple execution units you can usually get the
loop overhead to execute in parallel with the actual 'work'.
So I suspect that a much simpler 'word at a time' loop will be almost as
fast, especially in the case where the code isn't already in the cache
and the compare is relatively short.

Try something based on:

	a1 = *a++; b1 = *b++;
	do {
		a2 = *a++; b2 = *b++;
		if (a1 != b1)
			break;
		a1 = *a++; b1 = *b++;
	} while (a2 == b2);

	David
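
For illustration only, the 'word at a time' idea can be sketched in
portable user-space C. This is not the powerpc assembly under
discussion and has not been benchmarked. The names word_memcmp(),
load64() and bytes_cmp() are hypothetical. It assumes 8 byte words,
issues the second pair of loads before the first pair is compared, and
resolves the sign of the result byte by byte so that it does not depend
on endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 bytes without alignment or aliasing issues. */
static uint64_t load64(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper: byte-wise compare, used for the tail and to
 * locate the first differing byte once a word mismatch is seen. */
static int bytes_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (a[i] != b[i])
			return a[i] < b[i] ? -1 : 1;
	}
	return 0;
}

/* Hypothetical name. Returns <0, 0 or >0 with the same meaning as memcmp(). */
int word_memcmp(const void *pa, const void *pb, size_t n)
{
	const unsigned char *a = pa, *b = pb;
	size_t off = 0;

	/* Two words per iteration; both pairs are loaded before either
	 * comparison so the loads can overlap with the compares. */
	while (n - off >= 16) {
		uint64_t a1 = load64(a + off), b1 = load64(b + off);
		uint64_t a2 = load64(a + off + 8), b2 = load64(b + off + 8);

		if (a1 != b1)
			break;
		if (a2 != b2) {
			off += 8;	/* first word matched, skip it */
			break;
		}
		off += 16;
	}

	/* Either a mismatching word starts at 'off', or fewer than 16
	 * bytes remain. Both cases finish byte by byte. */
	return bytes_cmp(a + off, b + off, n - off);
}
```

Calling word_memcmp(x, y, len) is meant to agree in sign with
memcmp(x, y, len). Whether a loop this simple gets close to the
unrolled version would have to be measured on the target hardware.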