On Wed, Sep 28, 2016 at 1:40 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
>
>    Mr Hong Zhang has found that removing the manual unrolling from
> MatMult_SeqAIJ_Inode() (at least with inode size 2) results in a good bump
> in performance on KNL and pointed me to the Intel gospel
> https://software.intel.com/en-us/articles/avoid-manual-loop-unrolling
> which we've always ignored in the past. It would be good to try the
> unrolled and non-unrolled versions on Xeon as well.
>
>    We've never done a good job of managing our unrolling: where, how, and
> when we do it, and the macros we use for it, such as
> PetscSparseDensePlusDot. Intel would say just throw it all away.
>

From talking to some colleagues at Intel: the only time they unroll manually
is for loops with nice unit-stride accesses in which they are also using
Intel intrinsics.  Otherwise, it is best to rely on the compiler to do
this.  If you know a really good reason that a particular unrolling factor
should be used, you can suggest it to the compiler with
"#pragma unroll (n)".

My guess is that, with the Intel compiler, at least, we are better off
letting it do the unrolling.  I'm not sure about other compilers out there.
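
As a rough sketch of the two styles (not the actual PETSc kernel; the
function names and the factor 4 are made up for illustration), here is a
CSR-row dot product written once as a plain loop with an unroll hint and
once unrolled by hand.  The "#pragma unroll (4)" spelling is the Intel one
mentioned above; the "#pragma GCC unroll 4" branch is an assumption that a
reasonably recent GCC is in use.  Note that the gather through cols[] is
not unit-stride, so this is the kind of loop where the hint, or no hint at
all, is probably the better choice.

```c
#include <assert.h>

/* Plain loop; the pragma only suggests an unroll factor to the compiler. */
static double sparse_dot_rolled(const double *vals, const int *cols,
                                const double *x, int n)
{
  double sum = 0.0;
  int    i;
#if defined(__INTEL_COMPILER) || defined(__clang__)
#pragma unroll (4)
#elif defined(__GNUC__)
#pragma GCC unroll 4
#endif
  for (i = 0; i < n; i++) sum += vals[i] * x[cols[i]];
  return sum;
}

/* Hand-unrolled by 2, with a cleanup step for an odd trip count. */
static double sparse_dot_unrolled(const double *vals, const int *cols,
                                  const double *x, int n)
{
  double sum = 0.0;
  int    i;
  for (i = 0; i + 1 < n; i += 2) {
    sum += vals[i] * x[cols[i]] + vals[i + 1] * x[cols[i + 1]];
  }
  if (i < n) sum += vals[i] * x[cols[i]];
  return sum;
}

/* Small sanity check: both variants must agree on an odd-length row.
   Integer-valued data keeps the comparison exact despite the different
   summation order. */
void check_unroll_variants(void)
{
  const double vals[] = {1.0, 2.0, 3.0};
  const int    cols[] = {0, 2, 1};
  const double x[]    = {10.0, 20.0, 30.0};
  assert(sparse_dot_rolled(vals, cols, x, 3) ==
         sparse_dot_unrolled(vals, cols, x, 3));
}
```

Timing both variants with the same compiler and flags on each target (KNL,
Xeon) would be the way to see whether the hand-unrolled form still buys
anything.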

--Richard
