Hi Hong,

Thanks for the advice. I see that the example takes ~180 seconds to run, but I can't see the DRAM vs. MCDRAM info from Intel APS. I'll try to fix the profiling and get back with further questions.
Also, the intel-mpi manpages say that the use of tmi is now deprecated: https://software.intel.com/en-us/mpi-developer-guide-linux-fabrics-control

Thank you,
Sajid Ali
Applied Physics
Northwestern University