On Mar 1, 2019, at 11:00 AM, Sajid Ali <sajidsyed2...@u.northwestern.edu> wrote:

Hi Hong,

So, the speedup was coming from increased DRAM bandwidth and not the usage of MCDRAM.

Certainly the speedup was coming from the usage of MCDRAM (which has much higher bandwidth than DRAM). What I meant is that your code is still using MCDRAM, but in cache mode MCDRAM acts like an L3 cache.

Hong

There is moderate MPI imbalance, a large number of Back-End stalls, and good vectorization.

I'm attaching my submit script, PETSc log file and Intel APS summary (all as non-HTML text). I can give a more detailed analysis via Intel VTune if needed.

Thank You,
Sajid Ali
Applied Physics
Northwestern University

<submit_script><intel_aps_report><knl_petsc>