Hi Hong,

So the speedup came from increased DRAM bandwidth, not from the use of
MCDRAM.

There is moderate MPI imbalance, a large number of back-end stalls, and good
vectorization.

I'm attaching my submit script, the PETSc log file, and the Intel APS summary
(all as non-HTML text). I can provide a more detailed analysis with Intel VTune
if needed.


Thank you,
Sajid Ali
Applied Physics
Northwestern University

Attachment: submit_script
Description: Binary data

Attachment: intel_aps_report
Description: Binary data

Attachment: knl_petsc
Description: Binary data
