------- Comment #41 from whaley at cs dot utsa dot edu 2006-08-07 17:19 -------

Paolo,

>Actually, the peephole phase may not change the register usage, but it
>could peruse a scratch register if available. But it would be much more
>controversial (even if backed by your hard numbers on ATLAS) to state
>that splitting fmul[sl] to fld[sl]+fmul is always beneficial, unless

We'll have to see how this is in x87 code. I have experience with it in SSE, where doing it is fully a target issue. For instance, the P4E likes you to avoid the explicit load on the end, whereas the Hammer prefers the explicit load. If I recall right, there is a *slight* advantage on the Intel to the from-mem instruction, but I can't remember how much difference doing the separate load/use made on the AMD. We should get some idea by comparing gcc3 vs. your patched compiler on the various platforms, though other gcc3/4 changes will cloud the picture somewhat . . .

If this kind of machine difference in optimality holds true for x87 as well, I assume a new peephole phase that looks for the scratch register could be called if the appropriate -march were thrown?

Speaking of -march issues, when I get a compiler build that gens your new code, I will pull the assembly trick to try it on the CoreDuo as well. If the new code is worse, you can probably not call your present peephole if that -march is thrown?

Thanks,
Clint

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
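
For anyone following the thread, here is a minimal sketch of the two code shapes being compared. The function name dot_example is hypothetical, and the assembly in the comments is hand-written, illustrative AT&T-syntax x87 with arbitrarily chosen registers. It is not output from gcc3 or from the patched compiler.

```c
#include <stddef.h>

/* Hypothetical kernel of the kind discussed above: a multiply-accumulate
 * over two double arrays.  The inner multiply can be emitted in two ways.
 *
 * Form A, multiply straight from memory (the "fmul[sl]" form):
 *     fldl   (%eax)          # push x[i]
 *     fmull  (%edx)          # st(0) *= y[i], operand read from memory
 *     faddp  %st, %st(1)     # accumulate and pop
 *
 * Form B, explicit load then register multiply (the "fld[sl]+fmul" form):
 *     fldl   (%eax)          # push x[i]
 *     fldl   (%edx)          # push y[i], needs a free x87 stack slot
 *     fmulp  %st, %st(1)     # multiply and pop
 *     faddp  %st, %st(1)     # accumulate and pop
 *
 * Which form runs faster is the target-dependent question in this bug.
 */
double dot_example(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];   /* the multiply whose encoding is at issue */
    return sum;
}
```

Compiling this with -S under each compiler and diffing the inner loop is one way to see which form a given build picks; the extra stack slot in Form B is why a scratch register has to be available.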