On Wed, Aug 31, 2005 at 11:15:51PM -0700, Godfrey DiGiorgi wrote:

> The difference is that the 30-50% gains with the automated tools
> might mean more value for large bodies of code, where the number of
> code sections that can be optimized to the maximum extent by hand
> tend to be a lot smaller. As with all optimization strategies, one
> has to pick and choose very carefully not only what to optimize but
> how to do the optimization to obtain the maximum benefit and
> performance per development dollar.
Another point to consider is that those table-driven automated tools can easily be updated when a new model of the CPU is released with a slightly different set of instruction costs, while hand-optimized code can often mean going back to square one.

My understanding (gleaned from a buddy of mine who's a senior engineer in the Intel compiler group) is that nowadays a really good optimizing compiler can get within a factor of two or three of the best hand coding, and that in most cases efficient use of CPU cycles is irrelevant anyway: data access time dominates.

But for certain cases (such as applying filter kernels to a large image), really good cycle-optimized hand-written code can still realize that 3x over the best a current compiler can do. If that's where a program spends most of its time, the perceived benefit is enormous.

For more linear programs it's seldom worth the effort nowadays; it takes serious work for even a really good coder to match what a good globally-optimizing, auto-parallelizing compiler can do, and in any case the difference between taking 125 and 150 microseconds just isn't noticeable.