Re: Another epic optimiser failure

Nicholas Vinson via Gcc Mon, 29 May 2023 16:44:57 -0700

On 5/29/23 15:01, Dave Blanchard wrote:

He's certainly got a few things wrong from time to time in his zeal, but his overall 
point seems to stand. Do you have any rebuttals of his argument to present yourself? Or 
do you prefer to just sit back and wait on "y'all" to do the heavy lifting?

He's gotten many details wrong, including the proper flags to set for gcc (and the "bad documentation" does not justify all the errors he's made) and his hand-generated assembly (I've personally pointed out logic errors in his assembly on more than one occasion), and he has failed to provide evidence that his solutions are better.

In almost all of his examples, he uses -O3, which is basically the "speed above all else" optimization level. I pointed this out before; I also pointed out that the smallest code (in bytes) with the fewest instructions is not always the fastest. He has not provided any data showing that his solutions result in faster executing code than what gcc produces. He has also raised questions that show a distinct lack of understanding when it comes to storage hierarchy; something I feel one would need to know to properly write fast assembly. Finally, I will admit some of the examples of gcc produced code are a bit suspicious, and probably should be reviewed.

In short, Stefan is not being taken seriously because he is not presenting himself, or his arguments, in a manner that would convince people to take him seriously. As long as Stefan continues to communicate in such a manner, we're going to see similar such responses from (some of) the gcc devs (unfortunately).

The best next steps for Stefan would be to review the constructive criticism, expand on his examples by providing explanation and proof as to why they're better, and then present these updated findings in the proper manner.


Using his first example as my own, take the C code:

        int ispowerof2(unsigned long long argument)
        {
                return (argument & argument - 1) == 0;
        }

when compiled produces:

% gcc -m32 -O3 -c ispowerof2.c && objdump -d -Mintel ispowerof2.o

ispowerof2.o:     file format elf32-i386

Disassembly of section .text:

        00000000 <ispowerof2>:
           0:   f3 0f 7e 4c 24 04       movq   xmm1,QWORD PTR [esp+0x4]
           6:   66 0f 76 c0             pcmpeqd xmm0,xmm0
           a:   66 0f d4 c1             paddq  xmm0,xmm1
           e:   66 0f db c1             pand   xmm0,xmm1
          12:   66 0f 7e c2             movd   edx,xmm0
          16:   66 0f 73 d0 20          psrlq  xmm0,0x20
          1b:   66 0f 7e c0             movd   eax,xmm0
          1f:   09 c2                   or     edx,eax
          21:   0f 94 c0                sete   al
          24:   0f b6 c0                movzx  eax,al
          27:   c3                      ret

Whereas he claims the following is better:

        movq    xmm1, [esp+4]
        pcmpeqd xmm0, xmm0
        paddq   xmm0, xmm1
        pand    xmm0, xmm1
        pxor    xmm1, xmm1
        pcmpeqb xmm0, xmm1
        pmovmskb eax, xmm0
        cmp     al, 255
        sete    al
        ret

because it has 10 instructions and is 36 bytes long vs the 11 instructions and 40 bytes of gcc's output. However, the rebuttals are 1. his code is wrong (can return values other than 0 or 1) and 2. -O3 doesn't optimize on instruction count or byte size (as an aside: clang's output uses 14 instructions but is only 32 bytes in size -- is it better or worse than gcc's?).
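To make the first rebuttal concrete, here is a rough, portable C model of what ends up in EAX after his sequence. It is only a sketch under my reading of the instructions (movq zero-extends the upper half of xmm1, pmovmskb writes a 16-bit mask into EAX, and sete al replaces only the low 8 bits); the function names are made up for illustration and this is not a substitute for running the real assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: what the C source must return (always 0 or 1). */
int ispowerof2_ref(unsigned long long x)
{
    return (x & (x - 1)) == 0;
}

/* Hypothetical model of the hand-written sequence, tracking all of EAX. */
uint32_t ispowerof2_asm_model(unsigned long long x)
{
    uint64_t lo = x & (x - 1); /* low qword of xmm0 after paddq/pand   */
    uint64_t hi = 0;           /* high qword: (all ones + 0) & 0 == 0  */
    uint32_t eax = 0;

    /* pxor/pcmpeqb/pmovmskb: one mask bit per byte that equals zero */
    for (int i = 0; i < 8; i++) {
        if (((lo >> (8 * i)) & 0xff) == 0)
            eax |= 1u << i;
        if (((hi >> (8 * i)) & 0xff) == 0)
            eax |= 1u << (i + 8);
    }

    /* cmp al,255 ; sete al: bits 8..31 of EAX are left untouched */
    uint32_t al = ((eax & 0xff) == 0xff);
    return (eax & ~0xffu) | al;
}

/* Small self-check comparing the model against the reference. */
void model_selfcheck(void)
{
    assert(ispowerof2_ref(8) == 1);
    assert(ispowerof2_asm_model(8) == 0xff01u); /* not 1 */
    assert(ispowerof2_ref(6) == 0);
    assert(ispowerof2_asm_model(6) == 0xff00u); /* not 0 */
}
```

If this model is right, the upper qword always compares equal to zero, so bits 8..15 of EAX are always set and the function returns 0xff00 or 0xff01 instead of 0 or 1. Zero-extending AL before the ret (as gcc does with movzx) would be one way to correct it.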

Therefore, while his version is 1 instruction and 4 bytes shorter (1 byte shorter if you add the needed correction), he presents no evidence that his solution is actually faster. What he would need to do instead is show proof that his solution is indeed faster than what gcc produces.

Afterwards, he would be in a position to present this data in a proper manner.
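As a rough idea of the kind of evidence I mean, a minimal timing harness might look like the sketch below. Everything here is an assumption on my part: the helper names are hypothetical, it relies on POSIX clock_gettime(), and the measured time includes the loop, the input generator, and an indirect call, so it only gives a relative comparison between two candidates built the same way.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Signature shared by every candidate implementation. */
typedef int (*pow2_fn)(unsigned long long);

/* Baseline candidate: the C source from the example. */
int ispowerof2_c(unsigned long long argument)
{
    return (argument & (argument - 1)) == 0;
}

/* Average nanoseconds per call over `iters` calls (hypothetical helper). */
double ns_per_call(pow2_fn fn, unsigned long iters)
{
    struct timespec t0, t1;
    volatile int sink = 0;  /* keeps the calls from being optimized away */
    unsigned long long x = 0x9e3779b97f4a7c15ULL;

    assert(fn != 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iters; i++) {
        /* cheap LCG step so the input changes on every call */
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        sink += fn(x);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return iters ? ns / (double)iters : 0.0;
}
```

One would call ns_per_call(ispowerof2_c, N) and the same for a wrapper around the hand-written assembly, repeat the runs several times, and report the spread rather than a single number. Cycle counters or perf would give more trustworthy figures than wall-clock time.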
