[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

whaley at cs dot utsa dot edu Mon, 07 Aug 2006 10:19:41 -0700


------- Comment #41 from whaley at cs dot utsa dot edu  2006-08-07 17:19 -------
Paolo,


>Actually, the peephole phase may not change the register usage, but it
>could peruse a scratch register if available.  But it would be much more
>controversial (even if backed by your hard numbers on ATLAS) to state
>that splitting fmul[sl] to fld[sl]+fmul is always beneficial, unless

We'll have to see how this plays out in x87 code.  I have experience with it in
SSE, where doing it is fully a target issue.  For instance, the P4E likes you to
avoid the explicit load on the end, whereas the Hammer prefers the explicit load.
If I recall right, there is a *slight* advantage on the Intel to the from-mem
instruction, but I can't remember how much difference doing the separate
load/use made on the AMD.  We should get some idea by comparing gcc3 vs. your
patched compiler on the various platforms, though other gcc3/4 changes will
cloud the picture somewhat . . .
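
For concreteness, here is a minimal sketch of the kind of kernel where the
choice shows up.  The function name and the assembly in the comment are
illustrative assumptions (AT&T syntax, arbitrary address registers), not
output from any particular gcc build:

```c
/* Dot product: each multiply has one operand in memory, so an x87 code
 * generator can emit either of two forms for the loop body.  The running
 * sum is assumed to sit in %st(0) on entry to each sequence.
 *
 *   (a) from-memory multiply         (b) explicit load, then multiply
 *       fldl  (%eax)                     fldl  (%eax)
 *       fmull (%edx)                     fldl  (%edx)
 *       faddp %st, %st(1)                fmulp %st, %st(1)
 *                                        faddp %st, %st(1)
 *
 * Form (b) briefly needs one more free x87 stack slot, i.e. the scratch
 * register discussed above.
 */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];   /* the multiply in question */
    return sum;
}
```

Compiling something like this with -O2 -S (plus -m32 -mfpmath=387 where needed
to get x87 code) under each compiler should show which form a given build
picks.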

If this kind of machine difference in optimality holds true for x87 as well, I
assume a new peephole phase that looks for the scratch register could be called
if the appropriate -march were thrown?

Speaking of -march issues, when I get a compiler build that generates your new
code, I will pull the assembly trick to try it on the CoreDuo as well.  If the
new code is worse, you could presumably skip your present peephole if that
-march is thrown?

Thanks,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
