Matt, Barry, thanks a lot for your replies! I will try MPICH's Hydra process manager first and see what I can get.
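For reference, here is a rough sketch of the kind of launch line I have in mind. The binding flags are my assumption and their names differ between MPICH releases (older Hydra builds use -binding, newer ones use -bind-to / -map-by), so I will check "mpiexec -help" on the actual installation first:

    # spread 8 ranks round-robin over the 4 sockets, so that all memory controllers are used
    mpiexec.hydra -n 8 -binding rr ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg -log_summary

    # same idea with the newer Hydra option names
    mpiexec -n 8 -map-by socket -bind-to core ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg -log_summary

If I understand Matt's and Barry's point correctly, with nonzeros=49908476 each MatMult has to stream very roughly 50e6 * 12 bytes, about 600 MB of matrix data (8-byte value plus 4-byte column index per stored entry), so spreading the ranks over all sockets, and with that over all memory controllers, should matter more than adding cores on an already saturated socket.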
Yongjun On Mon, Dec 20, 2010 at 8:21 PM, Matthew Knepley <knepley at gmail.com> wrote: > On Mon, Dec 20, 2010 at 10:38 AM, Yongjun Chen <yjxd.chen at gmail.com>wrote: > >> Hi Matt, >> >> Thanks for your reply. Just now I have carried out a series of tests with >> k=2, 4, 8, 12 and 16 cores on the first server again with the -log_summary >> option. From 8 cores to 12 cores, a small speed up has been found this time, >> but from 12 cores to 16 cores, the computation time increase! >> Attached please find these 5 log files. Thank you very much! >> > > Its very clear from these, but Barry was right in his reply. These are > memory bandwidth limited > computations, so if you don't get any more bandwidth you will not speed up. > This is rarely mentioned > in sales pitches for multicore computers. LAMMPS is not limited by > bandwidth for most computations. > > Matt > > >> mpiexec -n *k* ./AMG_Solver_MPI -pc_type jacobi -ksp_type bicg >> -log_summary >> Here, I use ksp bicg instead of gmres, because the two ksp gives almost >> the same speed up performance, as I have tried many times. >> ---------------------- >> (1) k=2 >> ---------------------- >> Process 1 of total 2 on wmss04 >> Process 0 of total 2 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:42:23 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.25862e-06 >> Norm of error 1.25862e-06, Iterations 1475 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 762.874 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 17:55:06 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 2 processors, by cheny >> Mon Dec 20 18:55:06 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 8.160e+02 1.00000 8.160e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 3.120e+11 1.04720 3.050e+11 6.100e+11 >> Flops/sec: 3.824e+08 1.04720 3.737e+08 7.475e+08 >> MPI Messages: 2.958e+03 1.00068 2.958e+03 5.915e+03 >> MPI Message Lengths: 9.598e+08 1.00034 3.245e+05 1.919e+09 >> MPI Reductions: 4.483e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 8.1603e+02 100.0% 6.0997e+11 100.0% 5.915e+03 >> 100.0% 3.245e+05 100.0% 4.467e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1476 1.0 3.4220e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 >> 0.0e+00 41 47 50 50 0 41 47 50 50 0 846 >> MatMultTranspose 1475 1.0 3.4208e+02 1.0 1.48e+11 1.0 3.0e+03 3.2e+05 >> 0.0e+00 42 47 50 50 0 42 47 50 50 0 846 >> MatAssemblyBegin 1 1.0 1.5492e-0281.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 8.1615e-02 1.0 0.00e+00 0.0 1.0e+01 1.1e+05 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 1.5807e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.0809e+01 2.1 0.00e+00 0.0 2.0e+00 2.4e+06 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecDot 2950 1.0 2.0457e+01 1.9 3.47e+09 1.0 0.0e+00 0.0e+00 >> 3.0e+03 2 1 0 0 66 2 1 0 0 66 340 >> VecNorm 1477 1.0 1.2103e+01 1.7 1.74e+09 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 287 >> VecCopy 4 1.0 1.0110e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8855 1.0 6.0069e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4426 1.0 1.8430e+01 1.2 5.21e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 566 >> VecAYPX 2948 1.0 1.3610e+01 1.2 3.47e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 510 >> VecAssemblyBegin 6 1.0 9.1116e-0317.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 1.7405e-05 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2952 1.0 1.7966e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 194 >> VecScatterBegin 2951 1.0 8.6552e-01 1.1 0.00e+00 0.0 5.9e+03 3.2e+05 >> 0.0e+00 0 0100100 0 0 0100100 0 0 >> VecScatterEnd 2951 1.0 2.7126e+01 8.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> KSPSetup 1 1.0 3.9254e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 7.5170e+02 1.0 3.12e+11 1.0 5.9e+03 3.2e+05 >> 4.4e+03 92100100100 99 92100100100 99 811 >> PCSetUp 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2952 1.0 1.8043e+01 1.1 1.74e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 193 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 339744648 0 >> Vec 18 18 62239872 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 974736 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 1.21593e-06 >> Average time for MPI_Barrier(): 1.44005e-05 >> Average time for zero size MPI_Send(): 1.94311e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> Libraries compiled on Tue Nov 23 15:57:11 CET 2010 on wmss04 >> Machine characteristics: Linux wmss04 2.6.16.60-0.21-smp #1 SMP Tue May 6 >> 12:41:02 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux >> Using PETSc directory: /sun42/cheny/petsc-3.1-p5-optimized >> Using PETSc arch: linux-gnu-c-opt >> ----------------------------------------- >> Using C compiler: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall >> -Wwrite-strings -Wno-strict-aliasing -O >> Using Fortran compiler: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall >> -Wno-unused-variable -O >> ----------------------------------------- >> Using include paths: >> -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include >> -I/sun42/cheny/petsc-3.1-p5-optimized/include >> -I/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/include >> ------------------------------------------ >> Using C linker: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpicc -Wall >> -Wwrite-strings -Wno-strict-aliasing -O >> Using Fortran linker: >> /sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/bin/mpif90 -Wall >> -Wno-unused-variable -O >> Using libraries: >> -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lpetsc >> -Wl,-rpath,/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib -lHYPRE -lmpichcxx >> -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord >> -lparmetis -lmetis -lscalapack -lblacs -lflapack -lfblas -lnsl -laio -lrt >> -L/sun42/cheny/petsc-3.1-p5-optimized/linux-gnu-c-opt/lib >> -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2 >> -L/opt/intel/Compiler/11.0/083/ipp/em64t/lib >> -L/opt/intel/Compiler/11.0/083/mkl/lib/em64t >> 
-L/opt/intel/Compiler/11.0/083/tbb/em64t/cc4.1.0_libc2.4_kernel2.6.16.21/lib >> -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lpthread -lrt -lgcc_s -lmpichf90 >> -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich >> -lpthread -lrt -lgcc_s -ldl >> ------------------------------------------ >> >> >> ---------------------- >> (2) k=4 >> ---------------------- >> Process 0 of total 4 on wmss04 >> Process 2 of total 4 on wmss04 >> Process 3 of total 4 on wmss04 >> Process 1 of total 4 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:33:24 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.28342e-06 >> Norm of error 1.28342e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 450.583 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 17:40:55 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 4 processors, by cheny >> Mon Dec 20 18:40:55 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 4.807e+02 1.00000 4.807e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 1.558e+11 1.06872 1.523e+11 6.091e+11 >> Flops/sec: 3.241e+08 1.06872 3.168e+08 1.267e+09 >> MPI Messages: 5.906e+03 2.00017 4.430e+03 1.772e+04 >> MPI Message Lengths: 1.727e+09 2.74432 2.658e+05 4.710e+09 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 4.8066e+02 100.0% 6.0914e+11 100.0% 1.772e+04 >> 100.0% 2.658e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. 
>> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). >> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.9344e+02 1.1 7.40e+10 1.1 8.8e+03 2.7e+05 >> 0.0e+00 39 47 50 50 0 39 47 50 50 0 1494 >> MatMultTranspose 1473 1.0 1.9283e+02 1.0 7.40e+10 1.1 8.8e+03 2.7e+05 >> 0.0e+00 40 47 50 50 0 40 47 50 50 0 1498 >> MatAssemblyBegin 1 1.0 1.5624e-0263.8 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 6.3599e-02 1.0 0.00e+00 0.0 3.0e+01 9.3e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 1.8096e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1063e+01 4.7 0.00e+00 0.0 6.0e+00 1.2e+06 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecDot 2946 1.0 2.5350e+01 2.7 1.73e+09 1.0 0.0e+00 0.0e+00 >> 2.9e+03 3 1 0 0 66 3 1 0 0 66 274 >> VecNorm 1475 1.0 1.1197e+01 3.0 8.69e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 310 >> VecCopy 4 1.0 6.0010e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 3.6737e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 1.4221e+01 1.4 2.60e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 2 0 0 0 3 2 0 0 0 732 >> VecAYPX 2944 1.0 1.1377e+01 1.1 1.73e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 610 >> VecAssemblyBegin 6 1.0 2.8596e-0223.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 2.4796e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 1.7210e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 202 >> VecScatterBegin 2947 1.0 1.9806e+00 2.4 0.00e+00 0.0 1.8e+04 2.7e+05 >> 0.0e+00 0 0100100 0 0 0100100 0 0 >> VecScatterEnd 2947 1.0 4.3833e+01 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 6 0 0 0 0 6 0 0 0 0 0 >> KSPSetup 1 1.0 2.1496e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 4.3931e+02 1.0 1.56e+11 1.1 1.8e+04 2.7e+05 >> 4.4e+03 91100100100 99 91100100100 99 1386 >> PCSetUp 1 1.0 3.0994e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 1.7256e+01 1.2 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 201 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. 
>> Reports information only for process 0. >> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 169902696 0 >> Vec 18 18 31282096 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 638616 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 1.5974e-06 >> Average time for MPI_Barrier(): 3.48091e-05 >> Average time for zero size MPI_Send(): 1.8537e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> ---------------------- >> (3) k=8 >> ---------------------- >> Process 0 of total 8 on wmss04 >> Process 4 of total 8 on wmss04 >> Process 2 of total 8 on wmss04 >> Process 6 of total 8 on wmss04 >> Process 3 of total 8 on wmss04 >> Process 7 of total 8 on wmss04 >> Process 1 of total 8 on wmss04 >> Process 5 of total 8 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 18:14:59 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.32502e-06 >> Norm of error 1.32502e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 311.937 seconds. >> The time accuracy is 1e-06 second. 
>> The current time is Mon Dec 20 18:20:11 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 8 processors, by cheny >> Mon Dec 20 19:20:11 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.330e+02 1.00000 3.330e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 7.792e+10 1.09702 7.614e+10 6.091e+11 >> Flops/sec: 2.340e+08 1.09702 2.286e+08 1.829e+09 >> MPI Messages: 5.906e+03 2.00017 5.169e+03 4.135e+04 >> MPI Message Lengths: 1.866e+09 4.61816 2.430e+05 1.005e+10 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.3302e+02 100.0% 6.0914e+11 100.0% 4.135e+04 >> 100.0% 2.430e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.4230e+02 1.4 3.70e+10 1.1 2.1e+04 2.4e+05 >> 0.0e+00 38 47 50 50 0 38 47 50 50 0 2031 >> MatMultTranspose 1473 1.0 1.3627e+02 1.1 3.70e+10 1.1 2.1e+04 2.4e+05 >> 0.0e+00 38 47 50 50 0 38 47 50 50 0 2120 >> MatAssemblyBegin 1 1.0 8.0800e-0324.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 5.3647e-02 1.0 0.00e+00 0.0 7.0e+01 8.5e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.1791e-04 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.0902e+0112.1 0.00e+00 0.0 1.4e+01 5.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2946 1.0 3.5689e+01 7.6 8.67e+08 1.0 0.0e+00 0.0e+00 >> 2.9e+03 6 1 0 0 66 6 1 0 0 66 194 >> VecNorm 1475 1.0 8.1093e+00 4.0 4.34e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 1 1 0 0 33 1 1 0 0 33 428 >> VecCopy 4 1.0 5.2011e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 3.0491e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 9.2421e+00 1.6 1.30e+09 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 1127 >> VecAYPX 2944 1.0 6.8297e+00 1.5 8.67e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 1015 >> VecAssemblyBegin 6 1.0 2.6218e-0210.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 3.6240e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 9.6646e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 359 >> VecScatterBegin 2947 1.0 2.2599e+00 2.3 0.00e+00 0.0 4.1e+04 2.4e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2947 1.0 7.7004e+0120.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 >> KSPSetup 1 1.0 1.4287e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 3.0090e+02 1.0 7.79e+10 1.1 4.1e+04 2.4e+05 >> 4.4e+03 90100100100 99 90100100100 99 2024 >> PCSetUp 1 1.0 4.0531e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 9.7001e+00 1.4 4.34e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 3 1 0 0 0 3 1 0 0 0 358 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 84944064 0 >> Vec 18 18 15741712 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 409008 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 3.38554e-06 >> Average time for MPI_Barrier(): 7.40051e-05 >> Average time for zero size MPI_Send(): 1.88947e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> ---------------------- >> (4) k=12 >> ---------------------- >> Process 1 of total 12 on wmss04 >> Process 5 of total 12 on wmss04 >> Process 2 of total 12 on wmss04 >> Process 9 of total 12 on wmss04 >> Process 6 of total 12 on wmss04 >> Process 7 of total 12 on wmss04 >> Process 10 of total 12 on wmss04 >> Process 3 of total 12 on wmss04 >> Process 11 of total 12 on wmss04 >> Process 4 of total 12 on wmss04 >> Process 8 of total 12 on wmss04 >> Process 0 of total 12 on wmss04 >> The dimension of Matrix A is n = 1177754 >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. >> End Assembly. >> End Assembly. >> >> End Assembly. >> End Assembly. >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 17:56:36 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.28414e-06 >> Norm of error 1.28414e-06, Iterations 1473 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 291.503 seconds. 
>> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 18:01:28 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 12 processors, by cheny >> Mon Dec 20 19:01:28 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.089e+02 1.00012 3.089e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 5.197e+10 1.11689 5.074e+10 6.089e+11 >> Flops/sec: 1.683e+08 1.11689 1.643e+08 1.971e+09 >> MPI Messages: 5.906e+03 2.00017 5.415e+03 6.498e+04 >> MPI Message Lengths: 1.887e+09 6.23794 2.345e+05 1.524e+10 >> MPI Reductions: 4.477e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.0887e+02 100.0% 6.0890e+11 100.0% 6.498e+04 >> 100.0% 2.345e+05 100.0% 4.461e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1474 1.0 1.4069e+02 2.1 2.47e+10 1.1 3.2e+04 2.3e+05 >> 0.0e+00 35 47 50 50 0 35 47 50 50 0 2054 >> MatMultTranspose 1473 1.0 1.3272e+02 1.8 2.47e+10 1.1 3.2e+04 2.3e+05 >> 0.0e+00 34 47 50 50 0 34 47 50 50 0 2175 >> MatAssemblyBegin 1 1.0 6.4070e-0314.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 6.2698e-02 1.0 0.00e+00 0.0 1.1e+02 8.2e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.4605e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1164e+0182.6 0.00e+00 0.0 2.2e+01 3.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2946 1.0 1.1499e+0234.8 5.78e+08 1.0 0.0e+00 0.0e+00 >> 2.9e+03 13 1 0 0 66 13 1 0 0 66 60 >> VecNorm 1475 1.0 1.0804e+01 7.7 2.90e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 2 1 0 0 33 2 1 0 0 33 322 >> VecCopy 4 1.0 6.9451e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8843 1.0 2.9336e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4420 1.0 1.0803e+01 2.3 8.68e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 964 >> VecAYPX 2944 1.0 6.6637e+00 2.1 5.78e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 1041 >> VecAssemblyBegin 6 1.0 3.7719e-0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 5.3883e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2948 1.0 8.7972e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 395 >> VecScatterBegin 2947 1.0 3.3624e+00 4.3 0.00e+00 0.0 6.5e+04 2.3e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2947 1.0 8.0508e+0119.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 12 0 0 0 0 12 0 0 0 0 0 >> KSPSetup 1 1.0 1.1752e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 2.8016e+02 1.0 5.20e+10 1.1 6.5e+04 2.3e+05 >> 4.4e+03 91100100100 99 91100100100 99 2173 >> PCSetUp 1 1.0 5.9605e-06 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2948 1.0 8.8313e+00 2.3 2.89e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 393 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 56593044 0 >> Vec 18 18 10534536 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 305424 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 6.48499e-06 >> Average time for MPI_Barrier(): 0.000102377 >> Average time for zero size MPI_Send(): 2.15967e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> ---------------------- >> (5) k=16 >> ---------------------- >> Process 0 of total 16 on wmss04 >> Process 8 of total 16 on wmss04 >> Process 4 of total 16 on wmss04 >> Process 12 of total 16 on wmss04 >> Process 2 of total 16 on wmss04 >> Process 6 of total 16 on wmss04 >> Process 5 of total 16 on wmss04 >> Process 11 of total 16 on wmss04 >> Process 14 of total 16 on wmss04 >> Process 7 of total 16 on wmss04 >> Process Process 15 of total 16 on wmss04 >> 3Process 13 of total 16 on wmss04 >> Process 10 of total 16 on wmss04 >> Process 9 of total 16 on wmss04 >> Process 1 of total 16 on wmss04 >> The dimension of Matrix A is n = 1177754 >> of total 16 on wmss04 >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> Begin Assembly: >> >> Begin Assembly: >> Begin Assembly: >> End Assembly. >> End Assembly.End Assembly. >> End Assembly.End Assembly.End Assembly.End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. >> >> End Assembly. >> End Assembly. >> End Assembly. >> End Assembly.End Assembly. 
>> >> >> >> ========================================================= >> Begin the solving: >> ========================================================= >> The current time is: Mon Dec 20 18:02:28 2010 >> >> KSP Object: >> type: bicg >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-07, absolute=1e-50, divergence=10000 >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: >> type: jacobi >> linear system matrix = precond matrix: >> Matrix Object: >> type=mpisbaij, rows=1177754, cols=1177754 >> total: nonzeros=49908476, allocated nonzeros=49908476 >> block size is 1 >> >> norm(b-Ax)=1.15892e-06 >> Norm of error 1.15892e-06, Iterations 1497 >> ========================================================= >> The solver has finished successfully! >> ========================================================= >> The solving time is 337.91 seconds. >> The time accuracy is 1e-06 second. >> The current time is Mon Dec 20 18:08:06 2010 >> >> >> ************************************************************************************************************************ >> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r >> -fCourier9' to print this document *** >> >> ************************************************************************************************************************ >> >> ---------------------------------------------- PETSc Performance Summary: >> ---------------------------------------------- >> >> ./AMG_Solver_MPI on a linux-gnu named wmss04 with 16 processors, by cheny >> Mon Dec 20 19:08:06 2010 >> Using Petsc Release Version 3.1.0, Patch 5, Mon Sep 27 11:51:54 CDT 2010 >> >> Max Max/Min Avg Total >> Time (sec): 3.534e+02 1.00001 3.534e+02 >> Objects: 3.000e+01 1.00000 3.000e+01 >> Flops: 3.964e+10 1.13060 3.864e+10 6.182e+11 >> Flops/sec: 1.122e+08 1.13060 1.093e+08 1.749e+09 >> MPI Messages: 1.200e+04 3.99917 7.127e+03 1.140e+05 >> MPI Message Lengths: 1.950e+09 7.80999 1.819e+05 2.074e+10 >> MPI Reductions: 4.549e+03 1.00000 >> >> Flop counting convention: 1 flop = 1 real number operation of type >> (multiply/divide/add/subtract) >> e.g., VecAXPY() for real vectors of length N >> --> 2N flops >> and VecAXPY() for complex vectors of length N >> --> 8N flops >> >> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages >> --- -- Message Lengths -- -- Reductions -- >> Avg %Total Avg %Total counts >> %Total Avg %Total counts %Total >> 0: Main Stage: 3.5342e+02 100.0% 6.1820e+11 100.0% 1.140e+05 >> 100.0% 1.819e+05 100.0% 4.533e+03 99.6% >> >> >> ------------------------------------------------------------------------------------------------------------------------ >> See the 'Profiling' chapter of the users' manual for details on >> interpreting output. >> Phase summary info: >> Count: number of times phase was executed >> Time and Flops: Max - maximum over all processors >> Ratio - ratio of maximum to minimum over all processors >> Mess: number of messages sent >> Avg. len: average message length >> Reduct: number of global reductions >> Global: entire computation >> Stage: stages of a computation. Set stages with PetscLogStagePush() and >> PetscLogStagePop(). 
>> %T - percent time in this phase %F - percent flops in this >> phase >> %M - percent messages in this phase %L - percent message lengths >> in this phase >> %R - percent reductions in this phase >> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time >> over all processors) >> >> ------------------------------------------------------------------------------------------------------------------------ >> Event Count Time (sec) >> Flops --- Global --- --- Stage --- Total >> Max Ratio Max Ratio Max Ratio Mess Avg len >> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> --- Event Stage 0: Main Stage >> >> MatMult 1498 1.0 1.8860e+02 1.7 1.88e+10 1.1 5.7e+04 1.8e+05 >> 0.0e+00 40 47 50 50 0 40 47 50 50 0 1555 >> MatMultTranspose 1497 1.0 1.4165e+02 1.3 1.88e+10 1.1 5.7e+04 1.8e+05 >> 0.0e+00 35 47 50 50 0 35 47 50 50 0 2069 >> MatAssemblyBegin 1 1.0 1.0044e-0217.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 3.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> MatAssemblyEnd 1 1.0 7.3835e-02 1.0 0.00e+00 0.0 1.8e+02 6.7e+04 >> 1.2e+01 0 0 0 0 0 0 0 0 0 0 0 >> MatView 1 1.0 2.6107e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecView 1 1.0 1.1282e+01109.0 0.00e+00 0.0 3.0e+01 2.9e+05 >> 0.0e+00 2 0 0 0 0 2 0 0 0 0 0 >> VecDot 2994 1.0 6.7490e+0119.6 4.41e+08 1.0 0.0e+00 0.0e+00 >> 3.0e+03 10 1 0 0 66 10 1 0 0 66 104 >> VecNorm 1499 1.0 1.3431e+0110.8 2.21e+08 1.0 0.0e+00 0.0e+00 >> 1.5e+03 2 1 0 0 33 2 1 0 0 33 263 >> VecCopy 4 1.0 7.3178e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecSet 8987 1.0 3.1772e+00 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 >> VecAXPY 4492 1.0 1.1361e+01 3.1 6.61e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 2 0 0 0 2 2 0 0 0 931 >> VecAYPX 2992 1.0 7.3248e+00 2.5 4.40e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 1 1 0 0 0 1 1 0 0 0 962 >> VecAssemblyBegin 6 1.0 3.6338e-0212.1 0.00e+00 0.0 0.0e+00 0.0e+00 >> 1.8e+01 0 0 0 0 0 0 0 0 0 0 0 >> VecAssemblyEnd 6 1.0 7.2002e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> VecPointwiseMult 2996 1.0 9.7892e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 360 >> VecScatterBegin 2995 1.0 4.0570e+00 5.5 0.00e+00 0.0 1.1e+05 1.8e+05 >> 0.0e+00 1 0100100 0 1 0100100 0 0 >> VecScatterEnd 2995 1.0 1.7309e+0251.3 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 22 0 0 0 0 22 0 0 0 0 0 >> KSPSetup 1 1.0 1.3058e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> KSPSolve 1 1.0 3.2641e+02 1.0 3.96e+10 1.1 1.1e+05 1.8e+05 >> 4.5e+03 92100100100 99 92100100100 99 1893 >> PCSetUp 1 1.0 8.1062e-06 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 >> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> PCApply 2996 1.0 9.8336e+00 2.4 2.21e+08 1.0 0.0e+00 0.0e+00 >> 0.0e+00 2 1 0 0 0 2 1 0 0 0 359 >> >> ------------------------------------------------------------------------------------------------------------------------ >> >> Memory usage is given in bytes: >> >> Object Type Creations Destructions Memory Descendants' >> Mem. >> Reports information only for process 0. 
>> >> --- Event Stage 0: Main Stage >> >> Matrix 3 3 42424600 0 >> Vec 18 18 7924896 0 >> Vec Scatter 2 2 1736 0 >> Index Set 4 4 247632 0 >> Krylov Solver 1 1 832 0 >> Preconditioner 1 1 872 0 >> Viewer 1 1 544 0 >> >> ======================================================================================================================== >> Average time to get PetscTime(): 6.10352e-06 >> Average time for MPI_Barrier(): 0.000129986 >> Average time for zero size MPI_Send(): 2.08169e-05 >> #PETSc Option Table entries: >> -ksp_type bicg >> -log_summary >> -pc_type jacobi >> #End of PETSc Option Table entries >> Compiled without FORTRAN kernels >> Compiled with full precision matrices (default) >> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 >> sizeof(PetscScalar) 8 >> Configure run at: Tue Nov 23 15:54:45 2010 >> Configure options: --known-level1-dcache-size=65536 >> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=2 >> --known-memcmp-ok=1 --known-sizeof-char=1 --known-sizeof-void-p=8 >> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8 >> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-double=8 >> --known-sizeof-size_t=8 --known-bits-per-byte=8 --known-sizeof-MPI_Comm=4 >> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --with-cc=gcc >> --with-cxx=g++ --with-F77=ifort --with-FC=ifort --download-f-blas-lapack=1 >> --download-superlu-dist=1 --download-hypre=1 --download-trilinos=1 >> --download-parmetis=1 --download-mumps=1 --download-scalapack=1 >> --download-blacs=1 --download-mpich=1 --with-debugging=0 --with-batch >> --known-mpi-shared=1 >> ----------------------------------------- >> >> >> >> >> On Mon, Dec 20, 2010 at 6:06 PM, Matthew Knepley <knepley at gmail.com>wrote: >> >>> On Mon, Dec 20, 2010 at 8:46 AM, Yongjun Chen <yjxd.chen at gmail.com>wrote: >>> >>>> >>>> Hi everyone, >>>> >>>> >>>> I use PETSC (version 3.1-p5) to solve a linear problem Ax=b. The matrix >>>> A and right hand vector b are read from files. The dimension of A is >>>> 1.2Million*1.2Million. I am pretty sure the matrix A and vector b have been >>>> read correctly. >>>> >>>> I compiled the program with optimized version (--with-debugging=0), >>>> tested the speed up performance on two servers, and I have found that the >>>> performance is very poor. >>>> >>>> For the two servers, one is 4 cpus * 4 cores per cpu, i.e., with a total >>>> 16 cores. And the other one is 4 cpus * 12 cores per cpu, with a total 48 >>>> cores. >>>> >>>> On each of them, with the increasing of computing cores k from 1 to 8 >>>> (mpiexec ?n k ./Solver_MPI -pc_type jacobi -ksp-type gmres), the speed up >>>> will increase from 1 to 6, but when the computing cores k increase from 9 >>>> to >>>> 16(for the first server) or 48 (for the second server), the speed up >>>> decrease firstly and then remains a constant value 5.0 (for the first >>>> server) or 4.5(for the second server). >>>> >>> >>> We cannot say anything at all without -log_summary data for your runs. >>> >>> Matt >>> >>> >>>> Actually, the program LAMMPS speed up excellently on these two >>>> servers. >>>> >>>> Any comments are very appreciated! Thanks! >>>> >>>> >>>> >>>> >>>> -------------------------------------------------------------------------------------------------------------------------- >>>> >>>> PS: the related codes are as following, >>>> >>>> >>>> //firstly read A and b from files >>>> >>>> ... 
>>>> >>>> //then >>>> >>>> >>>> >>>> ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); >>>> CHKERRQ(ierr); >>>> >>>> ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); >>>> CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyBegin(b); CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyEnd(b); CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE); >>>> CHKERRQ(ierr); >>>> >>>> ierr = MatGetRowUpperTriangular(A); CHKERRQ(ierr); >>>> >>>> ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = >>>> KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); >>>> >>>> ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); >>>> >>>> ierr = >>>> KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr); >>>> >>>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = >>>> KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = KSPGetSolution(ksp, &x);CHKERRQ(ierr); >>>> >>>> >>>> >>>> ierr = VecAssemblyBegin(x);CHKERRQ(ierr); >>>> >>>> ierr = VecAssemblyEnd(x);CHKERRQ(ierr); >>>> >>>> ... >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> >> -- >> Dr.Yongjun Chen >> Room 2507, Building M >> Institute of Materials Science and Technology >> Technical University of Hamburg-Harburg >> Eißendorfer Straße 42, 21073 Hamburg, Germany. >> Tel: +49 (0)40-42878-4386 >> Fax: +49 (0)40-42878-4070 >> E-mail: yjxd.chen at gmail.com >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener >
