On Fri, May 22, 2015 at 02:32:42PM -0500, Scott Wood wrote:
> > I'd also have thought that the 64bit C version above would be generally
> > 'good'.
>
> It doesn't generate the addc/addze sequence.  At least with GCC 4.8.2,
> it does something like:
>
> 	mr	tmp0, csum
> 	li	tmp1, 0
> 	li	tmp2, 0
> 	addc	tmp3, addend, tmp0
> 	adde	csum, tmp2, tmp1
> 	add	csum, csum, tmp3

Right.  Don't expect older compilers to do sane things here.

All this begs a question...  If it is worth spending so much time
micro-optimising this, why not pick the low-hanging fruit first?
Having a 32-bit accumulator for ones' complement sums, on a 64-bit
system, is not such a great idea.


Segher
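
To illustrate the point about accumulator width, here is a minimal sketch
in portable C.  It is not the kernel's checksum code, and the names
fold64, csum_sketch and csum_sketch_selfcheck are made up for this
example.  The idea is to add 32-bit words into a 64-bit accumulator, so
no carry needs handling inside the loop, and to fold the carries back in
once at the end.  It assumes native byte order and returns the
uncomplemented sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit ones' complement accumulator down to 16 bits. */
uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* now fits in 32 bits */
	sum = (sum & 0xffffu) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffffu) + (sum >> 16);		/* now fits in 16 bits */
	return (uint16_t)sum;
}

/*
 * Hypothetical helper: ones' complement sum of a buffer, in native byte
 * order, not complemented.  Each 32-bit word is added into a 64-bit
 * accumulator; that cannot overflow until more than 2^32 words have
 * been added, so the loop itself needs no carry handling.
 */
uint16_t csum_sketch(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0;

	while (len >= 4) {
		uint32_t w;

		memcpy(&w, p, sizeof(w));	/* safe for unaligned input */
		sum += w;
		p += 4;
		len -= 4;
	}
	if (len >= 2) {
		uint16_t h;

		memcpy(&h, p, sizeof(h));
		sum += h;
		p += 2;
		len -= 2;
	}
	if (len) {
		uint16_t t = 0;

		memcpy(&t, p, 1);		/* trailing byte, zero-padded */
		sum += t;
	}
	return fold64(sum);
}

/* Small self-check: eight 0xff bytes sum to 0xffff on either endianness. */
void csum_sketch_selfcheck(void)
{
	unsigned char ones[8];

	memset(ones, 0xff, sizeof(ones));
	assert(csum_sketch(ones, sizeof(ones)) == 0xffff);
}
```

With a 32-bit accumulator, every addition in the loop needs its carry
added back in (the addc/addze pair discussed above).  With the wider
accumulator the loop body is a plain add, and the end-around carries are
handled only in the final fold.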