Hi Barry,

Thanks for the quick response. What I want to test is whether OpenMP has any benefit when the total number of degrees of freedom per processor drops below 5k. With pure MPI my code shows good speedup as long as the total degrees of freedom per processor stays above 10k, but below that value the parallel efficiency decreases.

The petsc 3.6 change log indicates

 * Removed all threadcomm support including --with-pthreadclasses and
   --with-openmpclasses configure arguments

I guess PETSc 3.5 is the last version I can test, right?

Thanks,

Danyang


On 17-06-30 03:49 PM, Barry Smith wrote:
   The current version of PETSc does not use OpenMP; you are of course free to use OpenMP 
in your own portions of the code. If you want PETSc itself to use OpenMP you have 
to use the old, unsupported version of PETSc. We never found any benefit to 
using OpenMP.

    Barry

On Jun 30, 2017, at 5:40 PM, Danyang Su <danyang...@gmail.com> wrote:

Dear All,

I recall that OpenMP support was available in an old development version of PETSc. Googling 
"petsc hybrid mpi openmp" returns some papers about this feature. My code was 
first parallelized with OpenMP and later redeveloped using PETSc; the OpenMP parts were kept but 
are not used together with MPI. Before retesting my code in hybrid MPI-OpenMP mode, I picked the PETSc example 
ex10 and added "omp_set_num_threads(max_threads);" right after PetscInitialize.
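
For reference, here is a minimal sketch of that modification (the -max_threads option name and the stripped-down error handling are my own choices for illustration, not part of the stock ex10; the PetscOptionsGetInt signature assumes a 3.7+/development PETSc):

/* Minimal sketch: set the OpenMP thread count right after PetscInitialize().
 * -max_threads is a hypothetical option name used only for this illustration. */
#include <petscsys.h>
#include <omp.h>

int main(int argc, char **argv)
{
  PetscInt max_threads = 1;

  PetscInitialize(&argc, &argv, (char *)0, NULL);
  /* read e.g. -max_threads 2 from the command line */
  PetscOptionsGetInt(NULL, NULL, "-max_threads", &max_threads, NULL);
  omp_set_num_threads((int)max_threads);

  /* ... rest of ex10: load the matrix and RHS, KSPSolve, -log_view, ... */

  PetscFinalize();
  return 0;
}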

PETSc is the current development version, configured as follows:

--with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 --CFLAGS=-fopenmp --CXXFLAGS=-fopenmp 
--FFLAGS=-fopenmp COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native 
-mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-large-file-io=1 
--download-cmake=yes --download-mumps --download-scalapack --download-parmetis --download-metis 
--download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist 
--download-hdf5=yes --with-openmp --with-threadcomm --with-pthreadclasses --with-openmpclasses

The code compiles successfully. However, when I run it with OpenMP it does not 
work: the timings show no change in performance whether 1 or 2 threads per 
processor are used, and the CPU/thread usage indicates that no threads are 
being used.
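
As a sanity check outside of PETSc, a minimal sketch like the following, compiled with the same -fopenmp flag, shows whether OpenMP threads are spawned at all on this machine:

/* Standalone OpenMP check: each thread prints its ID, so the output shows
 * whether more than one thread is actually created. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
  omp_set_num_threads(2);   /* request two threads for the test */
  #pragma omp parallel
  printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
  return 0;
}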

I just wonder whether OpenMP is still available in the latest version, even though 
it is not recommended.

mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs 
mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor 
-ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view 
ascii::ascii_info -log_view -max_threads 1 -threadcomm_type openmp 
-threadcomm_nthreads 1

KSPSolve               1 1.0 8.9934e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 
7.8e+01 69 97 89  6 76  89 97 98 98 96  2290
PCSetUp                2 1.0 8.9590e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  7  3  0  0  0   9  3  0  0  0   648
PCSetUpOnBlocks        2 1.0 8.9465e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  7  3  0  0  0   9  3  0  0  0   649
PCApply               40 1.0 3.1993e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 
0.0e+00 24 25  0  0  0  32 25  0  0  0  1686

mpiexec -n 2 ./ex10 -f0 mat_rhs_pc_nonzero/a_react_in_2.bin -rhs 
mat_rhs_pc_nonzero/b_react_in_2.bin -ksp_rtol 1.0e-20 -ksp_monitor 
-ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view 
ascii::ascii_info -log_view -max_threads 2 -threadcomm_type openmp 
-threadcomm_nthreads 2

KSPSolve               1 1.0 8.9701e-01 1.0 1.03e+09 1.0 7.8e+01 3.6e+04 
7.8e+01 69 97 89  6 76  89 97 98 98 96  2296
PCSetUp                2 1.0 8.7635e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  7  3  0  0  0   9  3  0  0  0   663
PCSetUpOnBlocks        2 1.0 8.7511e-02 1.0 2.91e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  7  3  0  0  0   9  3  0  0  0   664
PCApply               40 1.0 3.1878e-01 1.0 2.70e+08 1.0 0.0e+00 0.0e+00 
0.0e+00 24 25  0  0  0  32 25  0  0  0  1692

Thanks and regards,

Danyang


<ex10.c><makefile.txt>
