On Wed, Dec 22, 2010 at 6:53 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> On Wed, 22 Dec 2010, Yongjun Chen wrote:
>
> > On Wed, Dec 22, 2010 at 6:32 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> >
> > Thanks a lot, Satish. It is much clearer now. But as for the choice
> > between the two, the program dmidecode does not show this information.
> > Do you know any way to get it?
>
> Why do you expect dmidecode to show that?
>
> You'll have to look for the CPU/chipset hardware documentation - and
> look at the details - and sometimes they mention these details..
>
> Satish

Thanks, Satish. Yes, I need to check it. Just now I re-configured PETSc with
the option --with-device=ch3:nemsis. The results are almost the same as with
--with-device=ch3:sock, as can be seen in the attached logs for 8, 12, and 16
processes. I hope that a matrix partitioning/reordering algorithm will have
some positive effect; a sketch of that idea follows below.
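A minimal sketch of that partitioning idea, using PETSc's MatPartitioning
interface with ParMETIS (already part of this build via --download-parmetis=1).
This is an illustration only, not code from AMG_Solver_MPI: the routine name
RepartitionSketch is made up, and it assumes the matrix can be converted to an
adjacency graph (e.g. an MPIAIJ matrix; the MPISBAIJ matrix used in the runs
below may need conversion first). PETSc 3.1-era calling conventions.

  #include "petscksp.h"

  PetscErrorCode RepartitionSketch(Mat A)
  {
    MatPartitioning part;
    IS              newproc;   /* target process for each locally owned row */
    PetscErrorCode  ierr;

    ierr = MatPartitioningCreate(PETSC_COMM_WORLD,&part);CHKERRQ(ierr);
    ierr = MatPartitioningSetAdjacency(part,A);CHKERRQ(ierr);
    ierr = MatPartitioningSetType(part,MAT_PARTITIONING_PARMETIS);CHKERRQ(ierr);
    ierr = MatPartitioningApply(part,&newproc);CHKERRQ(ierr);
    ierr = ISView(newproc,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
    /* newproc would then drive a redistribution of the matrix and vectors
       before KSPSolve; that step is omitted here. */
    ierr = ISDestroy(newproc);CHKERRQ(ierr);
    ierr = MatPartitioningDestroy(part);CHKERRQ(ierr);
    return 0;
  }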
-------------- next part --------------
Process 0 of total 8 on wmss04
Process 4 of total 8 on wmss04
Process 1 of total 8 on wmss04
Process 5 of total 8 on wmss04
Process 6 of total 8 on wmss04
Process 2 of total 8 on wmss04
Process 3 of total 8 on wmss04
Process 7 of total 8 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 17:41:47 2010
KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1
norm(b-Ax)=1.32502e-06
Norm of error 1.32502e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 333.681 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:47:21 2010

************************************************************************************************************************
***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny Wed Dec 22 18:47:21 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total
Time (sec):           3.558e+02      1.00000   3.558e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                7.792e+10      1.09702   7.614e+10  6.091e+11
Flops/sec:            2.190e+08      1.09702   2.140e+08  1.712e+09
MPI Messages:         5.906e+03      2.00017   5.169e+03  4.135e+04
MPI Message Lengths:  1.866e+09      4.61816   2.430e+05  1.005e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.5581e+02 100.0%  6.0914e+11 100.0%  4.135e+04 100.0%  2.430e+05      100.0%  4.461e+03  99.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                             --- Global ---   --- Stage ---   Total
                   Max Ratio  Max       Ratio  Max      Ratio  Mess   Avg len Reduct  %T %F %M %L %R   %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.5404e+02  1.6 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 35 47 50 50  0  35 47 50 50  0  1876
MatMultTranspose    1473 1.0 1.4721e+02  1.4 3.70e+10 1.1 2.1e+04 2.4e+05 0.0e+00 37 47 50 50  0  37 47 50 50  0  1962
MatAssemblyBegin       1 1.0 6.0289e-03 16.6 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.2618e-02  1.0 0.00e+00 0.0 7.0e+01 8.5e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.0790e-04  2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.0855e+01 12.8 0.00e+00 0.0 1.4e+01 5.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 9.9344e+01 20.5 8.67e+08 1.0 0.0e+00 0.0e+00 2.9e+03 12  1  0  0 66  12  1  0  0 66    70
VecNorm             1475 1.0 5.6723e+00  2.9 4.34e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   613
VecCopy                4 1.0 5.5063e-03  1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.1978e+00  1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 8.6108e+00  1.3 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1209
VecAYPX             2944 1.0 6.0635e+00  1.4 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1144
VecAssemblyBegin       6 1.0 4.8455e-02 17.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 3.5286e-05  1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 8.7080e+00  1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   399
VecScatterBegin     2947 1.0 1.8601e+00  2.6 0.00e+00 0.0 4.1e+04 2.4e+05 0.0e+00  0  0 100 100  0   0  0 100 100  0     0
VecScatterEnd       2947 1.0 9.0296e+01 16.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  12  0  0  0  0     0
KSPSetup               1 1.0 9.8538e-03  1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 3.2263e+02  1.0 7.79e+10 1.1 4.1e+04 2.4e+05 4.4e+03 91 100 100 100 99  91 100 100 100 99  1887
PCSetUp                1 1.0 3.0994e-06  1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 8.7381e+00  1.3 4.34e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   397
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

              Matrix     3              3     84944064     0
                 Vec    18             18     15741712     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       409008     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 4.98295e-06
Average time for MPI_Barrier(): 9.76086e-05
Average time for zero size MPI_Send(): 2.81334e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
-------------- next part --------------
Process 0 of total 12 on wmss04
Process 4 of total 12 on wmss04
Process 6 of total 12 on wmss04
Process 5 of total 12 on wmss04
Process 11 of total 12 on wmss04
Process 2 of total 12 on wmss04
Process 7 of total 12 on wmss04
Process 3 of total 12 on wmss04
Process 8 of total 12 on wmss04
Process 1 of total 12 on wmss04
Process 9 of total 12 on wmss04
Process 10 of total 12 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 17:55:12 2010
KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1
norm(b-Ax)=1.28414e-06
Norm of error 1.28414e-06, Iterations 1473
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 241.392 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:59:13 2010
************************************************************************************************************************
***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny Wed Dec 22 18:59:13 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total
Time (sec):           2.594e+02      1.00000   2.594e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                5.197e+10      1.11689   5.074e+10  6.089e+11
Flops/sec:            2.004e+08      1.11689   1.956e+08  2.348e+09
MPI Messages:         5.906e+03      2.00017   5.415e+03  6.498e+04
MPI Message Lengths:  1.887e+09      6.23794   2.345e+05  1.524e+10
MPI Reductions:       4.477e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.5935e+02 100.0%  6.0890e+11 100.0%  6.498e+04 100.0%  2.345e+05      100.0%  4.461e+03  99.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                             --- Global ---   --- Stage ---   Total
                   Max Ratio  Max       Ratio  Max      Ratio  Mess   Avg len Reduct  %T %F %M %L %R   %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1474 1.0 1.1203e+02  1.5 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 39 47 50 50  0  39 47 50 50  0  2579
MatMultTranspose    1473 1.0 9.9342e+01  1.3 2.47e+10 1.1 3.2e+04 2.3e+05 0.0e+00 36 47 50 50  0  36 47 50 50  0  2906
MatAssemblyBegin       1 1.0 3.7930e-03  8.9 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.1536e-02  1.0 0.00e+00 0.0 1.1e+02 8.2e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 2.2507e-04  2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.2744e+01 66.4 0.00e+00 0.0 2.2e+01 3.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2946 1.0 5.4256e+01 15.3 5.78e+08 1.0 0.0e+00 0.0e+00 2.9e+03  6  1  0  0 66   6  1  0  0 66   128
VecNorm             1475 1.0 7.3386e+00  5.2 2.90e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   473
VecCopy                4 1.0 6.2873e-03  1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8843 1.0 2.5036e+00  1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4420 1.0 7.4288e+00  1.8 8.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1401
VecAYPX             2944 1.0 5.0487e+00  2.5 5.78e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0  1374
VecAssemblyBegin       6 1.0 3.4969e-02 11.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 5.5075e-05  1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2948 1.0 7.2035e+00  1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   482
VecScatterBegin     2947 1.0 2.5759e+00  2.7 0.00e+00 0.0 6.5e+04 2.3e+05 0.0e+00  1  0 100 100  0   1  0 100 100  0     0
VecScatterEnd       2947 1.0 5.1555e+01 11.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
KSPSetup               1 1.0 8.2631e-03  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.2851e+02  1.0 5.20e+10 1.1 6.5e+04 2.3e+05 4.4e+03 88 100 100 100 99  88 100 100 100 99  2664
PCSetUp                1 1.0 7.1526e-06  2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2948 1.0 7.2339e+00  1.7 2.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   480
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

              Matrix     3              3     56593044     0
                 Vec    18             18     10534536     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       305424     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 7.82013e-06
Average time for MPI_Barrier(): 9.52244e-05
Average time for zero size MPI_Send(): 2.15769e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
-------------- next part --------------
Process 0 of total 16 on wmss04
Process 8 of total 16 on wmss04
Process 4 of total 16 on wmss04
Process 6 of total 16 on wmss04
Process 14 of total 16 on wmss04
Process 12 of total 16 on wmss04
Process 2 of total 16 on wmss04
Process 10 of total 16 on wmss04
Process 3 of total 16 on wmss04
Process 15 of total 16 on wmss04
Process 7 of total 16 on wmss04
Process 1 of total 16 on wmss04
Process 9 of total 16 on wmss04
Process 5 of total 16 on wmss04
Process 13 of total 16 on wmss04
Process 11 of total 16 on wmss04
The dimension of Matrix A is n = 1177754
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
Begin Assembly:
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
End Assembly.
=========================================================
Begin the solving:
=========================================================
The current time is: Wed Dec 22 17:50:47 2010
KSP Object:
  type: bicg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-07, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object:
  type: jacobi
  linear system matrix = precond matrix:
  Matrix Object:
    type=mpisbaij, rows=1177754, cols=1177754
    total: nonzeros=49908476, allocated nonzeros=49908476
        block size is 1
norm(b-Ax)=1.23596e-06
Norm of error 1.23596e-06, Iterations 1481
=========================================================
The solver has finished successfully!
=========================================================
The solving time is 227.888 seconds.
The time accuracy is 1e-06 second.
The current time is Wed Dec 22 17:54:35 2010
************************************************************************************************************************
***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny Wed Dec 22 18:54:35 2010
Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010

                         Max       Max/Min        Avg      Total
Time (sec):           2.442e+02      1.00001   2.442e+02
Objects:              3.000e+01      1.00000   3.000e+01
Flops:                3.922e+10      1.13060   3.822e+10  6.116e+11
Flops/sec:            1.606e+08      1.13060   1.565e+08  2.504e+09
MPI Messages:         1.187e+04      3.99916   7.051e+03  1.128e+05
MPI Message Lengths:  1.929e+09      7.80850   1.819e+05  2.052e+10
MPI Reductions:       4.501e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.4422e+02 100.0%  6.1159e+11 100.0%  1.128e+05 100.0%  1.819e+05      100.0%  4.485e+03  99.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                             --- Global ---   --- Stage ---   Total
                   Max Ratio  Max       Ratio  Max      Ratio  Mess   Avg len Reduct  %T %F %M %L %R   %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1482 1.0 1.1549e+02  2.0 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 36 47 50 50  0  36 47 50 50  0  2513
MatMultTranspose    1481 1.0 9.3652e+01  1.4 1.86e+10 1.1 5.6e+04 1.8e+05 0.0e+00 32 47 50 50  0  32 47 50 50  0  3097
MatAssemblyBegin       1 1.0 4.6110e-03  7.4 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 5.1871e-02  1.0 0.00e+00 0.0 1.8e+02 6.7e+04 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatView                1 1.0 5.1212e-04  4.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecView                1 1.0 1.2031e+01 123.8 0.00e+00 0.0 3.0e+01 2.9e+05 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecDot              2962 1.0 7.2313e+01 22.5 4.36e+08 1.0 0.0e+00 0.0e+00 3.0e+03 13  1  0  0 66  13  1  0  0 66    96
VecNorm             1483 1.0 5.2508e+00  4.6 2.18e+08 1.0 0.0e+00 0.0e+00 1.5e+03  1  1  0  0 33   1  1  0  0 33   665
VecCopy                4 1.0 3.2623e-03  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              8891 1.0 2.5386e+00  2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             4444 1.0 6.6341e+00  1.6 6.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1578
VecAYPX             2960 1.0 4.2830e+00  1.7 4.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1628
VecAssemblyBegin       6 1.0 4.0186e-02 13.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         6 1.0 6.0081e-05  1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult    2964 1.0 6.2569e+00  1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   558
VecScatterBegin     2963 1.0 2.9219e+00  4.0 0.00e+00 0.0 1.1e+05 1.8e+05 0.0e+00  1  0 100 100  0   1  0 100 100  0     0
VecScatterEnd       2963 1.0 5.0568e+01  7.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
KSPSetup               1 1.0 5.8019e-03  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.1573e+02  1.0 3.92e+10 1.1 1.1e+05 1.8e+05 4.4e+03 88 100 100 100 99  88 100 100 100 99  2834
PCSetUp                1 1.0 5.9605e-06  2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             2964 1.0 6.2830e+00  1.6 2.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   556
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

              Matrix     3              3     42424600     0
                 Vec    18             18      7924896     0
         Vec Scatter     2              2         1736     0
           Index Set     4              4       247632     0
       Krylov Solver     1              1          832     0
      Preconditioner     1              1          872     0
              Viewer     1              1          544     0
========================================================================================================================
Average time to get PetscTime(): 1.38998e-05
Average time for MPI_Barrier(): 0.00011363
Average time for zero size MPI_Send(): 2.03103e-05
#PETSc Option Table entries:
-ksp_type bicg
-log_summary
-pc_type jacobi
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Wed Dec 22 18:24:43 2010
Configure options: --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 --download-superlu_dist=1 --download-hypre=1 --download-ml=1 --download-parmetis=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-mpich=1 --with-device=ch3:nemsis --with-debugging=0 --with-batch --known-mpi-shared=1
-----------------------------------------
Libraries compiled on Wed Dec 22 18:26:55 CET 2010 on wmss04
Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized
Using PETSc arch: linux-gnu-c-opt-ch3nemsis
-----------------------------------------
Using C compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran compiler: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
-----------------------------------------
Using include paths: -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include -I/sun42/cheny/petsc-3.1-p5-optimized/include -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/include
------------------------------------------
Using C linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -O
Using Fortran linker: /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/bin/mpif90 -Wall -Wno-unused-variable -O
Using libraries: -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lpetsc -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -lHYPRE -lmpichcxx -lstdc++ -lsuperlu_dist_2.4 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -lml -lmpichcxx -lstdc++ -lflapack -lfblas -lnsl -laio -lrt -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt-ch3nemsis/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lpthread -lrt -lgcc_s -ldl
------------------------------------------
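All three runs above use the same solver settings (-ksp_type bicg, -pc_type
jacobi, relative tolerance 1e-07, at most 10000 iterations). For reference, a
minimal sketch of the equivalent programmatic setup, assuming an already
assembled matrix A and vectors b and x; this is an illustration using PETSc
3.1-era calling conventions, not code taken from AMG_Solver_MPI.

  #include "petscksp.h"

  PetscErrorCode SolveSketch(Mat A, Vec b, Vec x)
  {
    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetType(ksp,KSPBICG);CHKERRQ(ierr);          /* -ksp_type bicg */
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCJACOBI);CHKERRQ(ierr);           /* -pc_type jacobi */
    /* rtol=1e-7, abstol=1e-50, dtol=10000, maxits=10000, as shown by KSPView */
    ierr = KSPSetTolerances(ksp,1.0e-7,1.0e-50,1.0e4,10000);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* command line still wins */
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
    ierr = KSPDestroy(ksp);CHKERRQ(ierr);
    return 0;
  }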
