Thanks for your reply, Matt. I just realized the problem seems to be with the MKL threads.
Inside the MatShell I call:

  call omp_set_nested(.true.)
  call omp_set_dynamic(.false.)
  call mkl_set_dynamic(0)

Then, inside the OpenMP single region I use:

  nMkl0 = mkl_set_num_threads_local(nMkl)

where nMkl is set to 24. MKL_VERBOSE shows that the calls have access to 24
threads, but the timings are the same as with 1 thread:

MKL_VERBOSE ZGEMV(N,12544,12544,0x7ffde9edc800,0x14e4662d2010,12544,0x14985e610,1,0x7ffde9edc7f0,0x189faaa90,1) 117.09ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:24
MKL_VERBOSE ZGEMV(N,12544,12544,0x7ffe00355700,0x14c8ec1e4010,12544,0x16959c830,1,0x7ffe003556f0,0x17dd7da70,1) 117.37ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:1

The configuration of OpenMP that is launching these MKL processes is as follows:

OPENMP DISPLAY ENVIRONMENT BEGIN
  _OPENMP = '201511'
  OMP_DYNAMIC = 'FALSE'
  OMP_NESTED = 'TRUE'
  OMP_NUM_THREADS = '24'
  OMP_SCHEDULE = 'DYNAMIC'
  OMP_PROC_BIND = 'TRUE'
  OMP_PLACES = '{0:24}'
  OMP_STACKSIZE = '0'
  OMP_WAIT_POLICY = 'PASSIVE'
  OMP_THREAD_LIMIT = '4294967295'
  OMP_MAX_ACTIVE_LEVELS = '255'
  OMP_CANCELLATION = 'FALSE'
  OMP_DEFAULT_DEVICE = '0'
  OMP_MAX_TASK_PRIORITY = '0'
  OMP_DISPLAY_AFFINITY = 'FALSE'
  OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
  OMP_ALLOCATOR = 'omp_default_mem_alloc'
  OMP_TARGET_OFFLOAD = 'DEFAULT'
  GOMP_CPU_AFFINITY = ''
  GOMP_STACKSIZE = '0'
  GOMP_SPINCOUNT = '300000'
OPENMP DISPLAY ENVIRONMENT END

On Fri, Apr 7, 2023 at 1:25 PM Matthew Knepley <knep...@gmail.com> wrote:

> On Fri, Apr 7, 2023 at 2:26 PM Astor Piaz <appiazzo...@gmail.com> wrote:
>
>> Hi Matthew, Junchao,
>> Thank you for your advice. The code still does not work; I give more
>> details about it below and can specify further as needed.
>>
>> I am implementing a spectral method resulting in a block matrix where the
>> off-diagonal blocks are Poincaré-Steklov operators of
>> impedance-to-impedance type.
>> Those Poincaré-Steklov operators have been created by hierarchically
>> merging subdomain operators (the HPS method), and I have a well-tuned
>> (but rather complex) OpenMP+MKL code that can apply this operator very
>> fast.
>> I would like to use PETSc's MPI-parallel GMRES solver with a MatShell
>> that calls my OpenMP+MKL code, while each block can be in a different
>> MPI process.
>>
>> At the moment the code runs correctly, except that PETSc is not letting
>> my OpenMP+MKL code schedule the threads as I choose.
>
> PETSc does not say anything about OpenMP threads. However, maybe you need
> to launch the executable with the correct OMP env variables?
>
>   Thanks,
>
>      Matt
>
>> I am using
>> ./configure --with-scalar-type=complex --prefix=../install/fast/
>> --with-debugging=0 --with-openmp=1 --with-blaslapack-dir=${MKLROOT}
>> --with-mkl_cpardiso-dir=${MKLROOT} --with-threadsafety --with-log=0
>> COPTFLAGS="-g -Ofast" CXXOPTFLAGS="-g -Ofast" FOPTFLAGS="-g -Ofast"
>>
>> Attached is an image of htop showing that the MKL threads are indeed
>> being spawned, but they remain unused by the code. Earlier calculations
>> in the code show that it is capable of using OpenMP and MKL; only when
>> PETSc's KSPSolve is called does MKL seem to be turned off.
>>
>> On Fri, Apr 7, 2023 at 8:10 AM Matthew Knepley <knep...@gmail.com> wrote:
>>
>>> On Fri, Apr 7, 2023 at 10:06 AM Astor Piaz <appiazzo...@gmail.com>
>>> wrote:
>>>
>>>> Hello petsc-users,
>>>> I am trying to use a code that is parallelized with a combination of
>>>> OpenMP and MKL parallelism, where OpenMP threads are able to spawn
>>>> MPI processes.
>>>> I have carefully scheduled the processes so that the right number are
>>>> launched at the right time.
>>>> When trying to use my code inside a MatShell (for later use in an
>>>> FGMRES KSPSolve), the MKL processes are not being used.
>>>>
>>>> I am sorry if this has been asked before.
>>>> What configuration should I use in order to profit from
>>>> MPI+OpenMP+MKL parallelism?
>>>
>>> You should configure using --with-threadsafety
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>> Thank you!
>>>> --
>>>> Astor
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
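Following Matt's point about launching with the correct OMP environment variables, below is a minimal, hypothetical launcher sketch for a hybrid MPI+OpenMP+MKL run. The binary name `./my_solver`, the rank count, and the `--bind-to none` flag are assumptions (that binding flag is Open MPI's; other launchers spell it differently), not anything from this thread; the gist is that MKL_DYNAMIC must be off and the process CPU mask must be wide enough that nested MKL threads actually have idle cores to run on.

```shell
# Hypothetical launch settings for a hybrid MPI+OpenMP+MKL run;
# binary name and rank count are placeholders, not from the thread.
export OMP_NUM_THREADS=24        # OpenMP threads per MPI rank
export OMP_DYNAMIC=FALSE
export OMP_MAX_ACTIVE_LEVELS=2   # allow nesting (OMP_NESTED is deprecated)
export OMP_WAIT_POLICY=PASSIVE
export MKL_DYNAMIC=FALSE         # do not let MKL shrink its thread team
export MKL_NUM_THREADS=24
export MKL_VERBOSE=1             # print NThr per call, as in the log above
# If the launcher pins each rank to a single core, nested MKL threads
# have nowhere to run, so widen or remove the binding, e.g.:
#   mpirun -n 4 --bind-to none ./my_solver -ksp_type fgmres
echo "OMP=${OMP_NUM_THREADS} MKL=${MKL_NUM_THREADS} MKL_DYNAMIC=${MKL_DYNAMIC}"
```

Checking whether the settings actually reached the MatShell is then a matter of comparing this against the `OPENMP DISPLAY ENVIRONMENT` block and the `NThr:` field of the MKL_VERBOSE lines at run time.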