> On Oct 10, 2017, at 10:52 AM, Bakytzhan Kallemov <bkalle...@lbl.gov> wrote:
>
> Hi,
>
> My name is Baky Kallemov.
>
> Currently, I am working on improving the scalability of the Chombo-PETSc
> interface on the Cori machine at NERSC.
>
> I successfully built the libs from the master branch with --with-openmp and
> hypre.
>
> However, I have not noticed any difference running my test problem on a
> single KNL node using the new MATAIJMKL type for different hybrid MPI+OpenMP
> runs compared to the regular released version.

hypre uses its own matrix operations, so it won't get faster when you run PETSc with MATAIJMKL or any other specific matrix type.

What are you comparing? Are you using, say, 32 MPI processes and 2 threads, or 16 MPI processes and 4 threads? How are you controlling the number of OpenMP threads? With the OpenMP environment variable? What parts of the code's time are you comparing? You should run with -log_view and compare the times for PCApply() and PCSetUp() between, say, 64 MPI processes/1 thread and 32 MPI processes/2 threads, and send us the output for those two cases.
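For concreteness, a minimal sketch of that comparison on one KNL node (assuming a SLURM launcher as on Cori; ./myapp is a placeholder for your executable, and the -c and binding values should be adapted to your node layout):

    # Case 1: 64 MPI processes, 1 OpenMP thread each
    export OMP_NUM_THREADS=1
    srun -n 64 -c 4 --cpu_bind=cores ./myapp -mat_type aijmkl -log_view

    # Case 2: 32 MPI processes, 2 OpenMP threads each
    export OMP_NUM_THREADS=2
    srun -n 32 -c 8 --cpu_bind=cores ./myapp -mat_type aijmkl -log_view

Compare the PCSetUp and PCApply rows of the two -log_view summaries. Note that if hypre is the preconditioner (-pc_type hypre), those events execute hypre's own kernels, so -mat_type aijmkl will not affect their timings.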
> It seems that it made no difference, so perhaps I am doing something wrong
> or my build is not configured right.
>
> Do you have any example that makes use of threads when running hybrid and
> shows an advantage?

There is no reason to think that using threads on KNL is faster than just using MPI processes. Despite what the NERSC/LBL web pages may say, just because a website says something doesn't make it true.

> I'd like to test it and make sure that my libs are configured correctly,
> before I start to investigate it further.
>
> Thanks,
>
> Baky