On Wed, Sep 28, 2016 at 1:40 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>    Mr Hong Zhang has found that removing the manual unrolling from
> MatMult_SeqAIJ_Inode() (at least with inode size 2) results in a good bump
> in performance on KNL and pointed me to the Intel gospel
> https://software.intel.com/en-us/articles/avoid-manual-loop-unrolling
> which we've always ignored in the past. It would be good to try the unrolled
> and non-unrolled versions on Xeon as well.
>
>    We've never done a good job of managing our unrolling, where, how and
> when we do it and macros for unrolling such as PetscSparseDensePlusDot.
> Intel would say just throw it all away.
>

Talking to some colleagues at Intel, the only time they do manual unrolling is for cases with nice unit-stride accesses in which they are also using Intel intrinsic instructions. Otherwise, it is best to rely on the compiler to do this. If you know a really good reason that a particular unrolling factor should be used, you can suggest it to the compiler with "#pragma unroll (n)".

My guess is that, at least with the Intel compiler, we are better off letting it do the unrolling. I'm not sure about other compilers out there.

--Richard
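
To make the two options concrete, here is a minimal sketch in C of a sparse row dot product written both ways. It is not the PETSc code: row_dot_unrolled, row_dot_plain, and example_usage are hypothetical names, the unroll factor of 4 is arbitrary, and the "#pragma GCC unroll" branch is my assumption for GCC 8 or later rather than anything the Intel folks suggested. The pragma is only a hint, so a compiler may still pick a different factor.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-unrolled by 4: dot product of one CSR row with a dense vector x.
 * vals[k] is the k-th stored value, cols[k] its column index. */
static double row_dot_unrolled(const double *vals, const int *cols,
                               const double *x, size_t n)
{
    double sum = 0.0;
    size_t k = 0;
    for (; k + 4 <= n; k += 4) {
        sum += vals[k]     * x[cols[k]]
             + vals[k + 1] * x[cols[k + 1]]
             + vals[k + 2] * x[cols[k + 2]]
             + vals[k + 3] * x[cols[k + 3]];
    }
    for (; k < n; k++)          /* remainder loop for n not divisible by 4 */
        sum += vals[k] * x[cols[k]];
    return sum;
}

/* Plain loop; the unroll factor is only suggested to the compiler. */
static double row_dot_plain(const double *vals, const int *cols,
                            const double *x, size_t n)
{
    double sum = 0.0;
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER) || defined(__clang__)
#pragma unroll(4)
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 4            /* assumption: GCC spelling of the same hint */
#endif
    for (size_t k = 0; k < n; k++)
        sum += vals[k] * x[cols[k]];
    return sum;
}

/* Small-integer data keeps every product and partial sum exact, so the two
 * variants agree bit for bit despite the different summation order. */
static void example_usage(void)
{
    const double vals[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    const int    cols[] = {0, 2, 1, 3, 0, 1};
    const double x[]    = {1.0, 2.0, 3.0, 4.0};

    double a = row_dot_unrolled(vals, cols, x, 6);
    double b = row_dot_plain(vals, cols, x, 6);
    assert(a == b);
}
```

Note that the access to x here is indirect (through cols), so this is not the unit-stride case where manual unrolling was said to be worthwhile. With general floating-point data the two variants can also differ in the last bits, since the hand-unrolled version groups the additions differently.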