------- Additional Comments From dank at kegel dot com  2005-06-18 22:45 -------
I asked the fellow who posted the original problem report to give
me the results of 'cat /proc/cpuinfo' on the affected machine.
Here it is:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 10
cpu MHz         : 896.153

This is the same as one of the two affected CPU types here.

The slow routine appears to be the buffer cleaning routine,
though I haven't verified this with oprofile yet.
Here's its loop:

    static char cleanse_ctr;
    ...
    while (len--) {
        *(ptr++) = cleanse_ctr;
        cleanse_ctr += (17 + (unsigned char) ((int) ptr & 0xF));
    }

and here is the output of -O3 -fPIC for both gcc-2.95.3 and gcc-4.0.0:

--- gcc-2.95.3 ---
.L5:    
        movl cleanse_ctr@GOT(%ebx),%edi
        movb (%edi),%al
        movb %al,(%edx)
        incl %edx
        movb (%edi),%cl
        addb $17,%cl
        movb %dl,%al
        andb $15,%al
        addb %al,%cl
        movb %cl,(%edi)
        subl $1,%esi
        jnc .L5
.L4:

--- gcc-4.0.0 ---
.L4:    
        movb    (%esi), %al
        movb    %al, (%edx)
        leal    (%ecx,%edi), %eax
        andl    $15, %eax
        incl    %ecx
        addb    (%esi), %al
        incl    %edx
        addl    $17, %eax
        cmpl    %ecx, 12(%ebp)
        movb    %al, (%esi)
        jne     .L4

It's not obvious to me why the gcc-4.0.0 generated code
should be slower when run on some CPUs, if in fact it is.
Is it the fact that the loop condition is checked with
a cmp against memory instead of a flag being set by subtracting
1 from a register?
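
One way to check this rather than guess would be to time the loop in
isolation, built once with each compiler and run on the affected CPU.
What follows is only a minimal sketch under some assumptions: the
function names cleanse() and time_cleanse() are made up for
illustration, the counter is declared unsigned char so the arithmetic
is well defined, and uintptr_t replaces the (int) cast so it also
builds on 64-bit hosts. It uses clock() from standard C, which is
coarse, so the repetition count needs to be large.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Counter from the quoted routine; unsigned char here is an assumption
   made so that the wrap-around arithmetic is well defined. */
static unsigned char cleanse_ctr;

/* The loop under discussion, copied into a standalone function.
   (uintptr_t) stands in for the original (int) cast. */
void cleanse(void *p, size_t len)
{
    unsigned char *ptr = p;

    while (len--) {
        *(ptr++) = cleanse_ctr;
        cleanse_ctr += (17 + (unsigned char) ((uintptr_t) ptr & 0xF));
    }
}

/* Hypothetical helper: CPU seconds spent running cleanse() over the
   same buffer 'reps' times. */
double time_cleanse(void *buf, size_t len, long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++)
        cleanse(buf, len);

    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_cleanse() on, say, a 4096-byte buffer with a few hundred
thousand repetitions from a small main(), and comparing the numbers
from the gcc-2.95.3 and gcc-4.0.0 builds, should show whether the loop
itself accounts for the slowdown. Comparing the disassembly of each
build against the listings above is worthwhile, since inlining into
the timing helper may change the generated code.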

(And where's the best place to learn about how to predict
how long assembly snippets like this will take to run
on various modern CPUs, anyway?)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19923
