> On Oct 10, 2017, at 10:52 AM, Bakytzhan Kallemov <bkalle...@lbl.gov> wrote:
> 
> Hi,
> 
> My name is Baky Kallemov.
> 
> Currently, I am working on improving the scalability of the Chombo-PETSc 
> interface on the Cori machine at NERSC.
> 
> I successfully built the libraries from the master branch with --with-openmp and hypre.
> 
> However, I have not noticed any difference running my test problem on a single 
> KNL node using the new MATAIJMKL

  hypre uses its own matrix operations, so it will not get any faster when you run 
PETSc with MATAIJMKL or any other specific matrix type.
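
  To see MATAIJMKL make any difference you need a preconditioner whose kernels stay 
inside PETSc. A minimal sketch, not your Chombo code (the 1-D Laplacian, sizes, and 
options below are made up purely for illustration), that picks the matrix type up 
from the command line:

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    Vec            x, b;
    KSP            ksp;
    PetscInt       i, n = 100, Istart, Iend;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

    /* 1-D Laplacian distributed across the MPI ranks */
    ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
    ierr = MatSetFromOptions(A); CHKERRQ(ierr);   /* -mat_type aijmkl is honored here */
    ierr = MatSetUp(A); CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
    for (i = Istart; i < Iend; i++) {
      if (i > 0)   { ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES); CHKERRQ(ierr); }
      if (i < n-1) { ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES); CHKERRQ(ierr); }
      ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES); CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

    ierr = MatCreateVecs(A, &x, &b); CHKERRQ(ierr);
    ierr = VecSet(b, 1.0); CHKERRQ(ierr);

    /* -pc_type hypre copies A into hypre's own format, so the PETSc matrix
       type no longer matters inside the preconditioner; -pc_type jacobi,
       sor, gamg, ... keep the MatMult kernels inside PETSc. */
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x); CHKERRQ(ierr);

    ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
    ierr = VecDestroy(&x); CHKERRQ(ierr);
    ierr = VecDestroy(&b); CHKERRQ(ierr);
    ierr = MatDestroy(&A); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Run it with -mat_type aijmkl -ksp_type cg -pc_type jacobi to exercise the MKL kernels, 
and with -pc_type hypre to see that the matrix type stops mattering.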
> 
> 
> type for different hybrid MPI+OpenMP runs, compared to the regular released 
> version.

   What are you comparing? Are you using, say, 32 MPI processes and 2 threads or 
16 MPI processes and 4 threads? How are you controlling the number of OpenMP 
threads, the OpenMP environment variable (OMP_NUM_THREADS)? Which parts of the run 
time are you comparing? You should just run with -log_view and compare the times for 
PCApply() and PCSetUp() between, say, 64 MPI processes/1 thread and 32 MPI 
processes/2 threads, and send us the output for those two cases.
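
   For example, on Cori something along these lines (mytest and the srun placement 
flags are just placeholders; keep whatever binding your batch script already uses):

   export OMP_NUM_THREADS=1
   srun -n 64 -c 1 ./mytest -log_view :log.64x1

   export OMP_NUM_THREADS=2
   srun -n 32 -c 2 ./mytest -log_view :log.32x2

Then compare the PCSetUp and PCApply rows of the two logs.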

> 
> It seems that it made no difference, so perhaps I am doing something wrong or 
> my build is not configured right.
> 
> Do you have any example that makes use of threads when running hybrid and 
> shows an advantage?

   There is no reason to think that using threads on KNL is faster than just 
using MPI processes. Despite what the NERSC/LBL web pages may say, just because 
a website says something doesn't make it true.


> 
> I'd like to test it and make sure that my libraries are configured correctly 
> before starting to investigate it further.
> 
> 
> Thanks,
> 
> Baky
> 
> 
