On Fri, May 2, 2014 at 6:20 AM, Alan Modra <amo...@gmail.com> wrote:
> In cases where the compiler has no alignment info, powerpc64le-linux
> gcc generates byte at a time copies for -mstrict-align (which is on
> for little-endian power7). That's awful code, a problem shared by
> other strict-align targets, see pr50417. However, we also have a case
> when -mno-strict-align generates less than ideal code, which I believe
> stems from using alignment as a proxy for testing an address offset.
> See http://gcc.gnu.org/ml/gcc-patches/1999-09n/msg01072.html.
>
> So my first attempt at fixing this problem looked at address offsets
> directly. That worked fine too, but on thinking some more, I believe
> we no longer have the movdi restriction. Nowadays we'll reload the
> address if we have an offset that doesn't satisfy the "Y" constraint
> (ie. a multiple of 4 offset). Which led to this simpler patch.
> Bootstrapped and regression tested powerpc64le-linux, powerpc64-linux
> and powerpc-linux. OK to apply?
Hi, Alan

Thanks for finding and addressing this.

As you mention, recent server-class processors, at least POWER8, do not
suffer the performance degradation for common, mis-aligned loads and
stores of wider modes. But the patch should not impose this default on
the large installed base of processors, where mis-aligned loads can
incur a severe performance penalty. This heuristic has become
processor-dependent and should not be hard-coded in the block_move and
block_clear algorithms.

PROCESSOR_DEFAULT is POWER8 for ELFv2 (and should be updated as the
default for PowerLinux in general). Please update the patch to test
rs6000_cpu, probably via another boolean flag set in
rs6000_option_override_internal(). Because of the processor defaults,
the preferred instruction sequence will then be the default without
encoding an assumption about the heuristics in the algorithm itself.

Thanks, David
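
For illustration only, the shape of the suggestion might look roughly
like the standalone sketch below. It is not the rs6000 back end: every
name prefixed with sketch_ is hypothetical, the enum stands in for the
real processor type, and limiting the flag to POWER8 is an assumption
taken from the "at least POWER8" remark above. The point is only that
the CPU test happens once, at option-override time, and the block-move
logic consults a flag rather than hard-coding the heuristic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the processor selection; not the real
   rs6000 enum.  */
enum sketch_processor { SKETCH_POWER6, SKETCH_POWER7, SKETCH_POWER8 };

/* Hypothetical per-CPU flag: true when misaligned wide loads and
   stores are assumed to be cheap on the selected processor.  */
static bool sketch_efficient_unaligned;

/* Analogue of setting the flag during option override: decide once,
   from the selected CPU.  Assumption: only POWER8 qualifies.  */
static void
sketch_option_override (enum sketch_processor cpu)
{
  sketch_efficient_unaligned = (cpu == SKETCH_POWER8);
}

/* Choose the chunk size in bytes for the next step of a block move.
   ALIGN is the known alignment in bytes (1 when nothing is known).
   A wider chunk is used when the alignment permits it or when the
   per-CPU flag says misaligned accesses are acceptable.  */
static size_t
sketch_move_chunk (size_t bytes_left, size_t align)
{
  if (bytes_left >= 8 && (align >= 8 || sketch_efficient_unaligned))
    return 8;
  if (bytes_left >= 4 && (align >= 4 || sketch_efficient_unaligned))
    return 4;
  if (bytes_left >= 2 && (align >= 2 || sketch_efficient_unaligned))
    return 2;
  return 1;
}
```

With this arrangement, a processor whose flag is clear still gets the
alignment-driven sequence, while a default of POWER8 picks the wider
accesses without the algorithm itself knowing why.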