Hi Barry,
Thanks for the quick response. What I want to test is whether OpenMP
has any benefit when the total number of degrees of freedom per processor
drops below 5k. With pure MPI, my code shows good speedup when the number
of degrees of freedom per processor is above 10k, but below this value the
parallel efficiency decreases.
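For readers following the thread, parallel efficiency can be read here as speedup divided by the number of processes. The snippet below is a minimal sketch under that usual strong-scaling definition; the function name and the timings in the usage lines are hypothetical and are not measurements from this thread.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Strong-scaling efficiency: speedup divided by process count.

    t_serial   -- wall time on 1 process (seconds)
    t_parallel -- wall time on n_procs processes (seconds)
    """
    if t_parallel <= 0 or n_procs <= 0:
        raise ValueError("time and process count must be positive")
    speedup = t_serial / t_parallel
    return speedup / n_procs


# Hypothetical timings, for illustration only.
ideal = parallel_efficiency(100.0, 25.0, 4)     # perfect scaling on 4 processes
degraded = parallel_efficiency(100.0, 50.0, 4)  # only 2x faster on 4 processes
print(f"ideal: {ideal:.2f}, degraded: {degraded:.2f}")
```

An efficiency near 1 means the extra processes are fully used; a value that falls as the degrees of freedom per process shrink is the behaviour described above.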
The PETSc 3.6 changelog states:
* Removed all threadcomm support including --with-pthreadclasses and
--with-openmpclasses configure arguments
So I guess PETSc 3.5 is the last version I can test with, right?
Thanks,
Danyang
On 17-06-30 03:49 PM, Barry Smith wrote:
The current version of PETSc does not use OpenMP; you are of course free to use
OpenMP in your own portions of the code. If you want PETSc itself to use OpenMP,
you have to use the old, unsupported version of PETSc. We never found any benefit
to using OpenMP.
Barry
On Jun 30, 2017, at 5:40 PM, Danyang Su <danyang...@gmail.com> wrote:
Dear All,
I recall that OpenMP support was available in an old development version of PETSc. Googling
"petsc hybrid mpi openmp" returns some papers about this feature. My code was first
parallelized using OpenMP and then redeveloped using PETSc, with the OpenMP code kept but not
used together with MPI. Before retesting my code in hybrid MPI-OpenMP mode, I took one PETSc
example, ex10, and added "omp_set_num_threads(max_threads);" after the call to PetscInitialize.
PETSc is the current development version, configured as follows:
--with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 --CFLAGS=-fopenmp --CXXFLAGS=-fopenmp
--FFLAGS=-fopenmp COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native
-mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-large-file-io=1
--download-cmake=yes --download-mumps --download-scalapack --download-parmetis --download-metis
--download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist
--download-hdf5=yes --with-openmp --with-threadcomm --with-pthreadclasses --with-openmpclasses
The code compiles successfully. However, when I run it with OpenMP, it does
not work: the timings show no change in performance whether 1 or 2 threads
per process are used. Also, the CPU/thread usage indicates that no threads
are being used.
I just wonder whether OpenMP is still available in the latest version, even
though its use is not recommended.
mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs
mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor
-ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view
ascii::ascii_info -log_view -max_threads 1 -threadcomm_type openmp
-threadcomm_nthreads 1
KSPSolve           1 1.0 8.9934e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 7.8e+01 69 97 89  6 76  89 97 98 98 96  2290
PCSetUp            2 1.0 8.9590e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   9  3  0  0  0   648
PCSetUpOnBlocks    2 1.0 8.9465e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   9  3  0  0  0   649
PCApply           40 1.0 3.1993e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 24 25  0  0  0  32 25  0  0  0  1686
mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs
mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor
-ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view
ascii::ascii_info -log_view -max_threads 2 -threadcomm_type openmp
-threadcomm_nthreads 2
KSPSolve           1 1.0 8.9701e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 7.8e+01 69 97 89  6 76  89 97 98 98 96  2296
PCSetUp            2 1.0 8.7635e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   9  3  0  0  0   663
PCSetUpOnBlocks    2 1.0 8.7511e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  3  0  0  0   9  3  0  0  0   664
PCApply           40 1.0 3.1878e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 0.0e+00 24 25  0  0  0  32 25  0  0  0  1692
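To put a number on "no change", the maximum times of the four events can be compared between the two runs. This is only a rough sketch: the times are copied by hand from the two -log_view excerpts above, and the helper names (speedup, compare) are made up for illustration.

```python
# Max time in seconds per event, copied from the two -log_view excerpts.
times_1_thread = {
    "KSPSolve": 8.9934e-01,
    "PCSetUp": 8.9590e-02,
    "PCSetUpOnBlocks": 8.9465e-02,
    "PCApply": 3.1993e-01,
}
times_2_threads = {
    "KSPSolve": 8.9701e-01,
    "PCSetUp": 8.7635e-02,
    "PCSetUpOnBlocks": 8.7511e-02,
    "PCApply": 3.1878e-01,
}


def speedup(t_base, t_new):
    """Ratio of baseline time to new time; 1.0 means no change."""
    return t_base / t_new


def compare(base, new):
    """Per-event speedup of the 'new' run relative to the 'base' run."""
    return {event: speedup(base[event], new[event]) for event in base}


ratios = compare(times_1_thread, times_2_threads)
for event, ratio in ratios.items():
    print(f"{event:16s} speedup with 2 threads: {ratio:.3f}")
```

Every ratio comes out within about 3% of 1, which is consistent with the second thread doing no work in these events.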
Thanks and regards,
Danyang
<ex10.c><makefile.txt>