Hi again,

On 01 March 2013 20:06, Jed Brown wrote:
> Matrix and vector operations are probably running in parallel, but probably
> not the operations that are taking time. Always send -log_summary if you
> have a performance question.
I don't think they are running in parallel. When I analyze my code in Intel VTune Amplifier, the only routines running in parallel are my own OpenMP ones. Indeed, if I comment out my OpenMP pragmas and recompile my code, it never uses more than one thread. (A minimal sketch of the kind of OpenMP loop I mean is appended as a P.S. after the log.)

The output of -log_summary is shown below; this run uses -pc_type lu -ksp_type bcgs. The fastest PC for my cases is usually BoomerAMG from HYPRE, but I used LU here in order to limit the test to PETSc only. The summary agrees with VTune that MatLUFactorNumeric is the most time-consuming routine; in general, the preconditioner always seems to be the most time-consuming part. Any advice on how to get OpenMP working?

Regards,
Åsmund

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./run on a arch-linux2-c-opt named vsl161 with 1 processor, by asmunder Wed Mar 6 10:14:55 2013
Using Petsc Development HG revision: 58cc6199509f1642f637843f1ca468283bf5ced9  HG Date: Wed Jan 30 00:39:35 2013 -0600

                         Max       Max/Min        Avg      Total
Time (sec):           4.446e+02      1.00000   4.446e+02
Objects:              2.017e+03      1.00000   2.017e+03
Flops:                3.919e+11      1.00000   3.919e+11  3.919e+11
Flops/sec:            8.815e+08      1.00000   8.815e+08  8.815e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       2.818e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 4.4460e+02 100.0%  3.9191e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.817e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase          %f - percent flops in this phase
      %M - percent messages in this phase      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot               802 1.0 9.2811e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2117
VecDotNorm2          401 1.0 7.1333e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 4.0e+02  0  0  0  0 14   0  0  0  0 14  2755
VecNorm             1203 1.0 7.8265e-02 1.0 2.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3766
VecCopy              802 1.0 1.1754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1211 1.0 9.9961e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              401 1.0 4.5847e-02 1.0 9.82e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2143
VecAXPBYCZ           802 1.0 1.3489e-01 1.0 3.93e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2913
VecWAXPY             802 1.0 1.2292e-01 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1599
VecAssemblyBegin     802 1.0 2.4509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       802 1.0 6.7234e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult             1203 1.0 1.1513e+00 1.0 1.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1149
MatSolve            1604 1.0 1.4714e+01 1.0 2.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  3  5  0  0  0   3  5  0  0  0  1405
MatLUFactorSym       401 1.0 4.0197e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+03  9  0  0  0 43   9  0  0  0 43     0
MatLUFactorNum       401 1.0 2.3728e+02 1.0 3.69e+11 1.0 0.0e+00 0.0e+00 0.0e+00 53 94  0  0  0  53 94  0  0  0  1553
MatAssemblyBegin     401 1.0 1.7977e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       401 1.0 3.1975e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ          401 1.0 9.1545e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering       401 1.0 2.0361e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+02  5  0  0  0 28   5  0  0  0 28     0
KSPSetUp             401 1.0 4.1821e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             401 1.0 3.1511e+02 1.0 3.92e+11 1.0 0.0e+00 0.0e+00 2.8e+03 71100  0  0100  71100  0  0100  1244
PCSetUp              401 1.0 2.9844e+02 1.0 3.69e+11 1.0 0.0e+00 0.0e+00 2.0e+03 67 94  0  0 71  67 94  0  0 71  1235
PCApply             1604 1.0 1.4717e+01 1.0 2.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  3  5  0  0  0   3  5  0  0  0  1405
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

              Vector   409            409    401422048       0
              Matrix   402            402  31321054412       0
       Krylov Solver     1              1         1128       0
      Preconditioner     1              1         1152       0
           Index Set  1203           1203    393903904       0
              Viewer     1              0            0       0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-ksp_type bcgs
-log_summary
-pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Fri Mar 1 12:53:06 2013
Configure options: --with-pthreadclasses --with-openmp --with-debugging=0 --with-shared-libraries=1 --download-mpich --download-hypre --with-boost-dir=/usr COPTFLAGS=-O3 FOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Fri Mar 1 12:53:06 2013 on vsl161
Machine characteristics: Linux-3.7.9-1-ARCH-x86_64-with-glibc2.2.5
Using PETSc directory: /opt/petsc/petsc-dev-install
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 -fopenmp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpif90 -fPIC -Wall -Wno-unused-variable -Wno-unused-dummy-argument -O3 -fopenmp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/opt/petsc/petsc-dev-install/arch-linux2-c-opt/include -I/opt/petsc/petsc-dev-install/include -I/opt/petsc/petsc-dev-install/include -I/opt/petsc/petsc-dev-install/arch-linux2-c-opt/include -I/usr/include
-----------------------------------------
Using C linker: /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpicc
Using Fortran linker: /opt/petsc/petsc-dev-install/arch-linux2-c-opt/bin/mpif90
Using libraries: -Wl,-rpath,/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib -L/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib -L/opt/petsc/petsc-dev-install/arch-linux2-c-opt/lib -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.2 -L/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.2 -Wl,-rpath,/opt/intel/composer_xe_2013.1.117/compiler/lib/intel64 -L/opt/intel/composer_xe_2013.1.117/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2013.1.117/ipp/lib/intel64 -L/opt/intel/composer_xe_2013.1.117/ipp/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64 -L/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2013.1.117/tbb/lib/intel64 -L/opt/intel/composer_xe_2013.1.117/tbb/lib/intel64 -lmpichcxx -lstdc++ -llapack -lblas -lX11 -lpthread -lm -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrt -lgcc_s -ldl
-----------------------------------------
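
P.S. For reference, the OpenMP regions I mentioned in my own code are plain loop-level pragmas. A minimal, self-contained sketch along those lines (hypothetical names and sizes, not my actual routines) would be:

/* Hypothetical sketch of a loop-level OpenMP region of the kind referred
 * to above; not taken from the actual application code. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

static double u[N], rhs[N];   /* zero-initialized static arrays */

int main(void)
{
  int    i;
  double dt = 1.0e-3;

  /* Loops like this one do show up as multi-threaded in VTune when the
   * code is built with -fopenmp. */
#pragma omp parallel for
  for (i = 0; i < N; i++) u[i] += dt*rhs[i];

  printf("OpenMP reports up to %d threads\n", omp_get_max_threads());
  return 0;
}

Built with the same mpicc and -fopenmp flags shown in the configure output above, loops of this kind use all available cores, whereas the PETSc routines in the log stay on a single thread.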