Jeff,

This may be more of a bug report with respect to PETSc than with respect to the Intel compilers. If we see this in a variety of routines I'll send you some details.
Barry

> On Sep 28, 2016, at 9:43 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
>
> If there is a minimal performance-oriented test of this function, I can ask
> the compiler team to study it w.r.t. unrolling heuristics.
>
> Jeff
>
> On Wednesday, September 28, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
> Mr Hong Zhang has found that removing the manual unrolling from
> MatMult_SeqAIJ_Inode() (at least with inode size 2) results in a good bump in
> performance on KNL and pointed me to the Intel gospel
> https://software.intel.com/en-us/articles/avoid-manual-loop-unrolling which
> we've always ignored in the past. It would be good to try the unrolled and
> non-unrolled versions on Xeon as well.
>
> We've never done a good job of managing our unrolling (where, how, and when
> we do it) or the macros for unrolling such as PetscSparseDensePlusDot. Intel
> would say just throw it all away.
>
> Barry
>
> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/
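
For anyone who wants a starting point for the minimal test Jeff asks about, here is a rough sketch of the two styles being compared. It is not the PETSc source. It assumes an "inode" of size 2 means two consecutive rows that share one list of column indices, and the function names (inode2_mult_plain, inode2_mult_unrolled2), the argument layout, and the unroll factor of 2 are all made up for illustration.

```c
#include <stddef.h>

/* Hypothetical kernel, not PETSc code: two rows (a1, a2) share the column
 * indices col[0..n-1]; compute y1 = a1 . x[col] and y2 = a2 . x[col].
 * Plain loop, leaving any unrolling or vectorization to the compiler. */
void inode2_mult_plain(const double *a1, const double *a2, const int *col,
                       const double *x, size_t n, double *y1, double *y2)
{
    double s1 = 0.0, s2 = 0.0;
    for (size_t k = 0; k < n; k++) {
        const double xv = x[col[k]];
        s1 += a1[k] * xv;
        s2 += a2[k] * xv;
    }
    *y1 = s1;
    *y2 = s2;
}

/* Same computation, manually unrolled by 2 over the nonzeros (the unroll
 * factor is an assumption), with a scalar loop for the leftover entry. */
void inode2_mult_unrolled2(const double *a1, const double *a2, const int *col,
                           const double *x, size_t n, double *y1, double *y2)
{
    double s1 = 0.0, s2 = 0.0;
    size_t k = 0;
    for (; k + 2 <= n; k += 2) {
        const double x0 = x[col[k]];
        const double x1 = x[col[k + 1]];
        s1 += a1[k] * x0 + a1[k + 1] * x1;
        s2 += a2[k] * x0 + a2[k + 1] * x1;
    }
    for (; k < n; k++) {            /* remainder when n is odd */
        const double xv = x[col[k]];
        s1 += a1[k] * xv;
        s2 += a2[k] * xv;
    }
    *y1 = s1;
    *y2 = s2;
}
```

Timing both functions over many rows, built with the same compiler flags on KNL and on Xeon, would give the compiler team something small to look at. Note that the two versions add terms in a different order, so on general floating-point data the results may differ in the last bits.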