Walter Bright wrote:
retard wrote:
 > There are no arch specific optimizations for PIII, Pentium 4, Pentium D,
 > Core, Core 2, Core i7, Core i7 2600K, and similar kinds of products from
 > AMD.

The optimal instruction sequences varied dramatically on those earlier processors, but not so much at all on the later ones. Reading the latest Intel/AMD instruction set references doesn't even provide that information anymore.

In particular, instruction scheduling no longer seems to matter, except for the Intel Atom, which benefits very much from Pentium style instruction scheduling. Ironically, dmc++ is the only available current compiler which supports that.

In hand-coded asm, instruction scheduling still gives more than half of the benefit it used to. But it's become ten times more difficult. You have to use Agner Fog's manuals, not Intel's or AMD's.

For example:
(1) a common bottleneck on all Intel processors is that you can only read from three registers per cycle, although you can additionally read from any register which has been modified in the last three cycles.
(2) it's important to break dependency chains.
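
To illustrate point (2), here is a rough C sketch rather than the actual BigInt asm. The function names sum_chained and sum_split are made up for this example. The first loop has one accumulator, so every add has to wait for the previous one. The second keeps four independent accumulators, so the adds can overlap. Note that with doubles the second version adds in a different order, so rounding can differ in general, which is one reason a compiler will not normally make this transformation for you.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each add depends on the result of the previous add,
   so the loop runs at the latency of a floating-point add. */
double sum_chained(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: four independent dependency chains, so the CPU
   can have several adds in flight at once. */
double sum_split(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

How much this helps, if at all, depends on the processor and on what the compiler does with the loop, so measure before relying on it.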

On the BigInt code, instruction scheduling gave a speedup of ~40%.

But still, cache effects are more important than instruction scheduling in 99% of cases.
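
A small C sketch of the kind of cache effect I mean, with hypothetical function names: both functions compute the same sum over a matrix stored row by row, but the first walks memory sequentially while the second jumps a whole row between consecutive accesses. On a matrix much larger than the cache, the second typically touches a new cache line on almost every access.

```c
#include <stddef.h>

/* Sequential walk: consecutive accesses fall in the same cache line. */
long long sum_row_order(const int *m, size_t rows, size_t cols)
{
    long long s = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            s += m[r * cols + c];
    return s;
}

/* Strided walk: consecutive accesses are a full row (cols ints) apart. */
long long sum_col_order(const int *m, size_t rows, size_t cols)
{
    long long s = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            s += m[r * cols + c];
    return s;
}
```

The instruction mix of the two loops is nearly identical, so no amount of scheduling will close whatever gap the memory access pattern opens up.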

 > No mention of auto-vectorization

dmc doesn't do auto-vectorization. I agree that's an issue.



 > or whole program

I looked into that; there's not a lot of oil in that well.


> and instruction level optimizations the very latest GCC and LLVM are now slowly adopting.

Huh? Every compiler in existence does, and always has done, instruction level optimizations.


Note: a lot of modern compilers expend tremendous effort optimizing access to global variables (often screwing up multithreaded code in the process). I've always viewed this as a crock, since modern programming style eschews globals as much as possible.
