------- Additional Comments From greenrd at greenrd dot org  2005-08-28 23:25 -------
memcmp (which is compiled for i686 in Fedora because it is part of glibc) is actually less efficient than the current code on my Athlon! I was so surprised that I ran the memcmp benchmark again, and the results differed by no more than +/-2%.
Here are the wallclock times in ms, followed by the advantage of block compare over the current code. n is the length of the strings tested.

  n | Current | block compare | memcmp | Advantage of block compare
-------------------------------------------------------------------
 10 |   10717 |          9236 |  11957 | 16%
 30 |   16427 |         14618 |  19884 | 12%
 50 |   22181 |         17539 |  27550 | 26%
 70 |   28052 |         20978 |  35243 | 34%
 90 |   32966 |         24695 |  42815 | 33%
110 |   42975 |         28453 |  55036 | 51%

All these tests were done on x86 with the same -O, -g and -f flags as make bootstrap uses by default, using LD_PRELOAD to "hot-replace" the code, and without the assertion enabled in the benchmark.

The advantage of block compare rises to 54% for n=10 and 81% for n=110 if -march=athlon-xp is used (to compile both the original code and my block compare code).

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23495
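For anyone checking the numbers: the "Advantage" column appears to be consistent with Current / block compare - 1, rounded to the nearest percent. Here is a minimal C sketch of that arithmetic. The function name advantage_percent is hypothetical and is not part of the patch or the benchmark; the formula is an inference from the figures, not something stated in the comment.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: percentage by which the
 * current code's wallclock time exceeds the block compare time, rounded
 * to the nearest whole percent. Assumes current_ms >= block_ms > 0.
 * Integer arithmetic is used so the result does not depend on
 * floating-point rounding. */
static long advantage_percent(long current_ms, long block_ms)
{
    assert(block_ms > 0 && current_ms >= block_ms);
    return (100 * (current_ms - block_ms) + block_ms / 2) / block_ms;
}
```

For example, advantage_percent(42975, 28453) gives 51, which matches the n=110 row.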