Jeff,

    This may be more a bug report with respect to PETSc than with respect to 
the Intel compilers. If we see this in a variety of routines I'll send you 
some details.

   Barry

> On Sep 28, 2016, at 9:43 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
> 
> If there is a minimal performance oriented test of this function, I can ask 
> the compiler team to study it w.r.t. unrolling heuristics.
> 
> Jeff 
> 
> On Wednesday, September 28, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
> 
> 
>    Mr Hong Zhang has found that removing the manual unrolling from 
> MatMult_SeqAIJ_Inode() (at least with inode size 2) results in a good bump in 
> performance on KNL and pointed me to the Intel gospel 
> https://software.intel.com/en-us/articles/avoid-manual-loop-unrolling which 
> we've always ignored in the past. It would be good to try the unrolled and 
> non-unrolled versions on Xeon as well.
> 
>    We've never done a good job of managing our unrolling: where, how, and 
> when we do it, and the macros we use for it, such as 
> PetscSparseDensePlusDot. Intel would say just throw it all away.
> 
>    Barry
> 
> 
> 
> 
> 
> -- 
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/
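
For readers without the PETSc source at hand, here is a minimal sketch of the 
two styles being compared in the quoted message: a plain sparse-row dot 
product and the same loop unrolled by hand. This is not the code in 
MatMult_SeqAIJ_Inode() and not the definition of PetscSparseDensePlusDot. The 
function names, the argument layout, and the unroll factor of 4 are all 
assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not PETSc code): dot product of one sparse row with a
 * dense vector x. vals[k] is the k-th stored value in the row and cols[k] is
 * its column index. This is the "let the compiler decide" form. */
static double sparse_dot_plain(const double *vals, const int *cols,
                               int nnz, const double *x)
{
    double sum = 0.0;
    for (int k = 0; k < nnz; k++) {
        sum += vals[k] * x[cols[k]];
    }
    return sum;
}

/* Same computation with the loop unrolled by 4 by hand, followed by a
 * remainder loop for the last nnz % 4 entries. The unroll factor is an
 * arbitrary choice for this sketch. */
static double sparse_dot_unrolled4(const double *vals, const int *cols,
                                   int nnz, const double *x)
{
    double sum = 0.0;
    int k = 0;
    for (; k + 3 < nnz; k += 4) {
        sum += vals[k]     * x[cols[k]]
             + vals[k + 1] * x[cols[k + 1]]
             + vals[k + 2] * x[cols[k + 2]]
             + vals[k + 3] * x[cols[k + 3]];
    }
    for (; k < nnz; k++) {
        sum += vals[k] * x[cols[k]];
    }
    return sum;
}
```

Both functions compute the same sum up to floating-point rounding, since the 
unrolled form adds the terms in a different grouping. A timing comparison of 
the two on KNL and on Xeon, at a fixed compiler and optimization level, would 
be one way to build the minimal test Jeff asks for.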
