I just tried your test code with gfortran [without petsc] - and I don't understand the output. Does gfortran not support this OpenMP usage?
[tried gfortran 4.8.4 and 7.3.1]

balay@es^/sandbox/balay/omp $ gfortran -fopenmp -c hellocount.F90
balay@es^/sandbox/balay/omp $ gfortran -fopenmp hellocount_main.F90 hellocount.o
balay@es^/sandbox/balay/omp $ ./a.out
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 11 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32
Hello from 14 out of 32

The ifort-compiled test appears to behave correctly:

balay@es^/sandbox/balay/omp $ ifort -qopenmp -c hellocount.F90
balay@es^/sandbox/balay/omp $ ifort -qopenmp hellocount_main.F90 hellocount.o
balay@es^/sandbox/balay/omp $ ./a.out |sort -n
Hello from 0 out of 32
Hello from 10 out of 32
Hello from 11 out of 32
Hello from 12 out of 32
Hello from 13 out of 32
Hello from 14 out of 32
Hello from 15 out of 32
Hello from 16 out of 32
Hello from 17 out of 32
Hello from 18 out of 32
Hello from 19 out of 32
Hello from 1 out of 32
Hello from 20 out of 32
Hello from 21 out of 32
Hello from 22 out of 32
Hello from 23 out of 32
Hello from 24 out of 32
Hello from 25 out of 32
Hello from 26 out of 32
Hello from 27 out of 32
Hello from 28 out of 32
Hello from 29 out of 32
Hello from 2 out of 32
Hello from 30 out of 32
Hello from 31 out of 32
Hello from 3 out of 32
Hello from 4 out of 32
Hello from 5 out of 32
Hello from 6 out of 32
Hello from 7 out of 32
Hello from 8 out of 32
Hello from 9 out of 32
balay@es^/sandbox/balay/omp $
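On a second look, this may not be a gfortran bug at all: in hellocount.F90 the locals nthreads and mythread are declared outside the parallel construct, so by OpenMP's default data-sharing rules they are shared inside it - all 32 threads race on the same two variables, which would explain the repeated "11 out of 32" lines. The ifort run may simply be getting lucky with timing. A race-free variant of the subroutine [keeping the same module wrapper as in hellocount.F90, so omp_lib is already in scope] would declare them private:

subroutine hello_print ()
  integer :: nthreads, mythread

  ! private() gives each thread its own copy of the two locals;
  ! without it they are shared by default and the assignments race
  !$omp parallel private(nthreads, mythread)
  nthreads = omp_get_num_threads()
  mythread = omp_get_thread_num()
  write(*,'("Hello from",i3," out of",i3)') mythread, nthreads
  !$omp end parallel
end subroutine hello_print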
Now I build petsc with:

./configure --with-cc=icc --with-mpi=0 --with-openmp --with-fc=0 --with-cxx=0 PETSC_ARCH=arch-omp

i.e.

balay@es^/sandbox/balay/omp $ ldd /sandbox/balay/petsc/arch-omp/lib/libpetsc.so
        linux-vdso.so.1 => (0x00007fff8bfb2000)
        liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f513fbbf000)
        libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f513e3b6000)
        libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f513e081000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f513de63000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f513dc5f000)
        libimf.so => /soft/com/packages/intel/16/u3/lib/intel64/libimf.so (0x00007f513d761000)
        libsvml.so => /soft/com/packages/intel/16/u3/lib/intel64/libsvml.so (0x00007f513c855000)
        libirng.so => /soft/com/packages/intel/16/u3/lib/intel64/libirng.so (0x00007f513c4e3000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f513c1dd000)
        libiomp5.so => /soft/com/packages/intel/16/u3/lib/intel64/libiomp5.so (0x00007f513be99000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f513bc83000)
        libintlc.so.5 => /soft/com/packages/intel/16/u3/lib/intel64/libintlc.so.5 (0x00007f513ba17000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f513b64e000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f513b334000)
        libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f513b115000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5142b40000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f513aed9000)
        libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f513acd5000)
        libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f513aacf000)

And then link in petsc with your test - and that works fine for me.

balay@es^/sandbox/balay/omp $ rm -f *.o *.mod
balay@es^/sandbox/balay/omp $ ifort -qopenmp -c hellocount.F90
balay@es^/sandbox/balay/omp $ ifort -qopenmp hellocount_main.F90 hellocount.o -Wl,-rpath,/sandbox/balay/petsc/arch-omp/lib -L/sandbox/balay/petsc/arch-omp/lib -lpetsc -liomp5
balay@es^/sandbox/balay/omp $ ./a.out |sort -n
Hello from 0 out of 32
Hello from 10 out of 32
Hello from 11 out of 32
Hello from 12 out of 32
Hello from 13 out of 32
Hello from 14 out of 32
Hello from 15 out of 32
Hello from 16 out of 32
Hello from 17 out of 32
Hello from 18 out of 32
Hello from 19 out of 32
Hello from 1 out of 32
Hello from 20 out of 32
Hello from 21 out of 32
Hello from 22 out of 32
Hello from 23 out of 32
Hello from 24 out of 32
Hello from 25 out of 32
Hello from 26 out of 32
Hello from 27 out of 32
Hello from 28 out of 32
Hello from 29 out of 32
Hello from 2 out of 32
Hello from 30 out of 32
Hello from 31 out of 32
Hello from 3 out of 32
Hello from 4 out of 32
Hello from 5 out of 32
Hello from 6 out of 32
Hello from 7 out of 32
Hello from 8 out of 32
Hello from 9 out of 32
balay@es^/sandbox/balay/omp $
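One caveat with this check: it only links libpetsc - nothing in PETSc is actually called. If you want the test to also go through PETSc initialization, a variant of hellocount_main.F90 along these lines should do it [an untested sketch - it assumes a PETSc build with Fortran enabled, unlike my arch-omp build above which used --with-fc=0, and the file must keep the .F90 suffix so the #include line is preprocessed]:

program hello
#include <petsc/finclude/petscsys.h>
  use petscsys
  use hello_count
  implicit none
  PetscErrorCode ierr

  ! bring up PETSc before entering the OpenMP region
  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
  if (ierr /= 0) stop 'PetscInitialize failed'

  call hello_print()

  call PetscFinalize(ierr)
end program hello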
Satish

On Fri, 2 Mar 2018, Adrián Amor wrote:

> Thanks Satish, I tried the procedure you suggested and I get the same
> performance, so I guess that MKL is not a problem in this case (I agree
> with you that it has to be improved though... my makefile is a little
> chaotic with all the libraries that I use).
>
> And thanks Barry and Matthew! I'll try to ask the Intel compiler forum,
> since I also think this is a problem related to the compiler, and if I
> make any progress I'll let you know! In the end, I guess I'll drop
> acceleration through OpenMP threads...
>
> Thanks all!
>
> Adrian.
>
> 2018-03-02 17:11 GMT+01:00 Satish Balay <ba...@mcs.anl.gov>:
>
> > When using MKL - PETSc attempts to default to sequential MKL.
> >
> > Perhaps this pulls in a *conflicting* dependency against -liomp5 - and
> > one has to use threaded MKL for this case, i.e. not use
> > -lmkl_sequential
> >
> > You appear to have multiple MKL libraries linked in - it's not clear
> > what they are for - and if there are any conflicts there.
> >
> > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> > > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lpetsc -lmkl_intel_lp64
> > > -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lm
> > >
> > > -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> >
> > To test this out - suggest rebuilding PETSc with
> > --download-fblaslapack [and no MKL or related packages] - and then run
> > this test case you have [with OpenMP].
> >
> > And then add back one MKL package at a time..
> >
> > Satish
> >
> > On Fri, 2 Mar 2018, Adrián Amor wrote:
> >
> > > Hi all,
> > >
> > > I have been working in the last months with PETSC in a FEM program
> > > written in FORTRAN, so far sequential. Now I want to parallelize it
> > > with OpenMP, and I have found some problems. Finally, I have built a
> > > mockup program trying to localize the error.
> > >
> > > 1. I have compiled PETSC with these options:
> > >
> > > ./configure --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> > > --with-blas-lapack-dir=/opt/intel/mkl/lib/intel64/ --with-debugging=1
> > > --with-scalar-type=complex --with-threadcomm --with-pthreadclasses
> > > --with-openmp
> > > --with-openmp-include=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > --with-openmp-lib=/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin/libiomp5.a
> > > PETSC_ARCH=linux-intel-dbg PETSC-AVOID-MPIF-H=1
> > >
> > > (I have also tried removing --with-threadcomm --with-pthreadclasses,
> > > and with libiomp5.so.)
> > >
> > > 2. The program to be executed is composed of two files; one is
> > > hellocount.F90:
> > >
> > > MODULE hello_count
> > >   use omp_lib
> > >   IMPLICIT none
> > >
> > > CONTAINS
> > >
> > >   subroutine hello_print ()
> > >     integer :: nthreads,mythread
> > >
> > >     !pragma hello-who-omp-f
> > >     !$omp parallel
> > >     nthreads = omp_get_num_threads()
> > >     mythread = omp_get_thread_num()
> > >     write(*,'("Hello from",i3," out of",i3)') mythread,nthreads
> > >     !$omp end parallel
> > >     !pragma end
> > >   end subroutine hello_print
> > >
> > > END MODULE hello_count
> > >
> > > and the other one is hellocount_main.F90:
> > >
> > > Program Hello
> > >
> > >   USE hello_count
> > >
> > >   call hello_print
> > >
> > >   STOP
> > >
> > > end Program Hello
> > >
> > > 3. To compile these two files I use:
> > >
> > > rm -rf _obj
> > > mkdir _obj
> > >
> > > ifort -E -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include -c hellocount.F90 >_obj/hellocount.f90
> > > ifort -E -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include -c hellocount_main.F90 >_obj/hellocount_main.f90
> > >
> > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp -module _obj -I./_obj -I/home/aamor/MUMPS_5.1.2/include -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include/intel64/lp64/ -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include -o _obj/hellocount.o -c _obj/hellocount.f90
> > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp -module _obj -I./_obj -I/home/aamor/MUMPS_5.1.2/include -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include -I/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/include/intel64/lp64/ -I/home/aamor/petsc/include -I/home/aamor/petsc/linux-intel-dbg/include -o _obj/hellocount_main.o -c _obj/hellocount_main.f90
> > > mpiifort -CB -g -warn all -O0 -shared-intel -check:none -qopenmp -module _obj -I./_obj -o exec/HELLO _obj/hellocount.o _obj/hellocount_main.o
> > > /home/aamor/lib_tmp/libarpack_LinuxIntel15.a
> > > /home/aamor/MUMPS_5.1.2/lib/libzmumps.a
> > > /home/aamor/MUMPS_5.1.2/lib/libmumps_common.a
> > > /home/aamor/MUMPS_5.1.2/lib/libpord.a
> > > /home/aamor/parmetis-4.0.3/lib/libparmetis.a
> > > /home/aamor/parmetis-4.0.3/lib/libmetis.a
> > > -L/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl/lib/intel64
> > > -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lpetsc -lmkl_intel_lp64
> > > -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread -lm
> > > -L/home/aamor/lib_tmp -lgidpost -lz
> > > /home/aamor/lua-5.3.3/src/liblua.a
> > > /home/aamor/ESEAS-master/libeseas.a
> > > -Wl,-rpath,/home/aamor/petsc/linux-intel-dbg/lib -L/home/aamor/petsc/linux-intel-dbg/lib
> > > -Wl,-rpath,/opt/intel/mkl/lib/intel64 -L/opt/intel/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib -L/opt/intel/impi/5.1.2.150/intel64/lib
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64 -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib
> > > -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lX11 -lssl -lcrypto -lifport -lifcore_pic -lmpicxx -ldl
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib -L/opt/intel/impi/5.1.2.150/intel64/lib
> > > -lmpifort -lmpi -lmpigi -lrt -lpthread
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib -L/opt/intel/impi/5.1.2.150/intel64/lib
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64 -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64 -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib
> > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/mpi-rt/5.1/intel64/lib
> > > -limf -lsvml -lirng -lm -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt -L/opt/intel/impi/5.1.2.150/intel64/lib/debug_mt
> > > -Wl,-rpath,/opt/intel/impi/5.1.2.150/intel64/lib -L/opt/intel/impi/5.1.2.150/intel64/lib
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64 -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin -L/opt/intel/compilers_and_libraries_2016.1.150/linux/compiler/lib/intel64_lin
> > > -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
> > > -Wl,-rpath,/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64 -L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64 -ldl
> > >
> > > exec/HELLO
> > >
> > > 4. Then I have seen that:
> > > 4.1. If I set OMP_NUM_THREADS=2 and I remove -lpetsc and -lifcore_pic from the last step, I get:
> > >
> > > Hello from 0 out of 2
> > > Hello from 1 out of 2
> > >
> > > 4.2. But if I add -lpetsc and -lifcore_pic (because I want to use PETSC), I get this error:
> > >
> > > Hello from 0 out of 2
> > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > Image              PC                Routine            Line        Source
> > > HELLO              000000000041665C  Unknown            Unknown     Unknown
> > > HELLO              00000000004083C8  Unknown            Unknown     Unknown
> > > libiomp5.so        00007F9C603566A3  Unknown            Unknown     Unknown
> > > libiomp5.so        00007F9C60325007  Unknown            Unknown     Unknown
> > > libiomp5.so        00007F9C603246F5  Unknown            Unknown     Unknown
> > > libiomp5.so        00007F9C603569C3  Unknown            Unknown     Unknown
> > > libpthread.so.0    0000003CE76079D1  Unknown            Unknown     Unknown
> > > libc.so.6          0000003CE6AE88FD  Unknown            Unknown     Unknown
> > >
> > > If I set OMP_NUM_THREADS to 8, I get:
> > >
> > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > > forrtl: severe (40): recursive I/O operation, unit -1, file unknown
> > >
> > > I am sorry if this is a trivial problem, because I guess that lots of
> > > people use PETSC with OpenMP in FORTRAN, but I have really done my best
> > > to figure out where the error is. Can you help me?
> > >
> > > Thanks a lot!
> > >
> > > Adrian.
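[A side note on the traceback above: forrtl severe (40) "recursive I/O operation" is the Intel Fortran runtime detecting re-entry into its I/O layer - which can happen when several threads execute the same write statement through a non-thread-safe Fortran runtime. One guess, not a confirmed diagnosis: the explicit -lifcore_pic may be overriding the thread-safe runtime that ifort -qopenmp would normally select. To take concurrent I/O out of the picture while testing, the write can be serialized - a sketch, using a hypothetical hello_print_serialized variant of the subroutine:

subroutine hello_print_serialized ()
  use omp_lib
  integer :: nthreads, mythread

  !$omp parallel private(nthreads, mythread)
  nthreads = omp_get_num_threads()
  mythread = omp_get_thread_num()
  ! the critical section lets only one thread at a time enter the
  ! Fortran I/O library; it avoids concurrent I/O but does not fix
  ! whatever the -lpetsc/-lifcore_pic link ordering changes
  !$omp critical
  write(*,'("Hello from",i3," out of",i3)') mythread, nthreads
  !$omp end critical
  !$omp end parallel
end subroutine hello_print_serialized
]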