On Wed, 9 May 2012, Daniel Marschall wrote:
> I could successfully benchmark my code. I found out that the
> no-typecast version (imull+movslq) needed 47 secs for 12 working packages,
> while the typecast version (imulq) needed only 38 secs for 12 working
> packages. That is incredible!
> Maybe you should still consider preferring imulq instead of imull+movslq?
> I wonder if GCC has an optimization which optimizes the machine code itself,
> without knowledge of the underlying C code, e.g. it could eliminate
> unnecessary mov instructions if a register is not used, or use operations
> which have lower latency. I think such an "assembler-only" optimization
> could still gain additional performance, since the rules of the underlying
> programming language (e.g. the promotion to signed int) can be ignored if the
> end result is the same. But I fear that this is rather a hard task and maybe
> not possible.
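The effect described above can be reproduced with a small C sketch (function names here are illustrative, not from the original code; the exact instructions emitted depend on the GCC version and flags, but on x86-64 at -O2 the two variants typically differ as commented):

```c
#include <stdint.h>

/* No typecast: operands are promoted to int, the multiplication is
   done in 32 bits (imull), and the int result is then sign-extended
   to 64 bits (movslq) to match the return type. */
int64_t mul_narrow(uint8_t a, uint8_t b)
{
    return a * b;
}

/* With a typecast: one operand is widened to 64 bits first, so the
   multiplication itself is done in 64 bits (a single imulq). */
int64_t mul_wide(uint8_t a, uint8_t b)
{
    return (int64_t)a * b;
}
```

For uint8_t inputs both functions return the same value (255 * 255 = 65025 fits comfortably in a signed 32-bit int), which is why replacing one sequence with the other is a pure optimization.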
A lot of optimizations in GCC completely ignore the original source code. At
the RTL level, you could try matching:
(set (reg:SI 1) (zero_extend:SI (match_operand:QI 4)))
(set (reg:SI 2) (zero_extend:SI (match_operand:QI 3)))
(set (reg:SI 5) (mult:SI (match_dup 1) (match_dup 2)))
(set (reg:DI 6) (sign_extend:DI (match_dup 5)))
and replacing it with your version that zero-extends to DI and does the
multiplication there.
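Roughly, the replacement would look like this (the pseudo-register numbers are illustrative, and a real pattern would of course need predicates on the operands; this is a sketch, not a tested machine description):

(set (reg:DI 7) (zero_extend:DI (match_operand:QI 4)))
(set (reg:DI 8) (zero_extend:DI (match_operand:QI 3)))
(set (reg:DI 6) (mult:DI (match_dup 7) (match_dup 8)))

which would let the backend emit a single imulq instead of imull + movslq.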
--
Marc Glisse