Mr Hong Zhang has found that removing the manual unrolling from 
MatMult_SeqAIJ_Inode() (at least with inode size 2) results in a good bump in 
performance on KNL and pointed me to the Intel gospel 
https://software.intel.com/en-us/articles/avoid-manual-loop-unrolling which 
we've always ignored in the past. It would be good to try the unrolled and 
non-unrolled versions on Xeon as well.

   We've never done a good job of managing our unrolling: where, how, and when 
we do it, and the macros we use for it, such as PetscSparseDensePlusDot. Intel 
would say just throw it all away.
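
   To make the comparison concrete, here is a minimal sketch of the two styles 
for a sparse-row times dense-vector dot product. It is not the PETSc code and 
not the actual PetscSparseDensePlusDot macro; the function names 
sparse_dot_plain and sparse_dot_unrolled4 and the unroll factor of 4 are 
assumptions made up for illustration. Which variant is faster on a given 
compiler and machine would have to be measured.

```c
#include <stddef.h>

/* Plain loop: leaves unrolling and vectorization to the compiler. */
static double sparse_dot_plain(const double *vals, const int *cols,
                               const double *x, size_t n)
{
  double sum = 0.0;
  for (size_t k = 0; k < n; k++) sum += vals[k] * x[cols[k]];
  return sum;
}

/* Hand-unrolled by 4 (hypothetical factor), with a remainder loop
   for the last n % 4 entries. */
static double sparse_dot_unrolled4(const double *vals, const int *cols,
                                   const double *x, size_t n)
{
  double sum = 0.0;
  size_t k   = 0;
  for (; k + 4 <= n; k += 4) {
    sum += vals[k]     * x[cols[k]]
         + vals[k + 1] * x[cols[k + 1]]
         + vals[k + 2] * x[cols[k + 2]]
         + vals[k + 3] * x[cols[k + 3]];
  }
  for (; k < n; k++) sum += vals[k] * x[cols[k]];
  return sum;
}
```

   Both functions compute the same sum, although the floating-point rounding 
can differ slightly because the additions are grouped differently. Timing the 
two on KNL and on Xeon with the same compiler flags is the kind of test 
suggested above.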

   Barry


