OK, I thought that space was a typo. By the way, this option does not show up in -h. I changed the number of ranks so as to use all cores on each node, to avoid misleading ratios in -log_view. Since one node has 36 cores, I ran with 6^3 = 216 ranks and 12^3 = 1728 ranks. I also found that the call counts of MatSOR etc. in the two tests were different, so they are not strict weak-scaling tests. I tried adding -ksp_max_it 6 -pc_mg_levels 6, but still could not make the two have the same MatSOR count. Anyway, I attached the load balance output.
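As a quick sanity check that the two runs are weak-scaled in problem size (a sketch using only the numbers quoted in this thread, 30^3 unknowns per rank as in the attached output):

```python
# Both runs keep 30^3 unknowns per rank; only the MatSOR call counts differ,
# which is why they are not *strict* weak-scaling tests.
unknowns_per_rank = 30 ** 3

for ranks, global_side in [(6 ** 3, 180), (12 ** 3, 360)]:
    total = ranks * unknowns_per_rank
    assert total == global_side ** 3  # matches "total system size" in the logs
    print(f"{ranks} ranks -> {total} unknowns = {global_side}^3")
```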
I find that PCApply_MG calls PCMGMCycle_Private, which is recursive and indirectly calls MatSOR_MPIAIJ. I believe the following code in MatSOR_MPIAIJ effectively synchronizes {MatSOR, MatMultAdd}_SeqAIJ across processes through the VecScatter at each MG level. If SOR and MatMultAdd are imbalanced, the cost accumulates across the MG levels and shows up as a large VecScatter cost.

  while (its--) {
    VecScatterBegin(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
    VecScatterEnd(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);

    /* update rhs: bb1 = bb - B*x */
    VecScale(mat->lvec,-1.0);
    (*mat->B->ops->multadd)(mat->B,mat->lvec,bb,bb1);

    /* local sweep */
    (*mat->A->ops->sor)(mat->A,bb1,omega,SOR_SYMMETRIC_SWEEP,fshift,lits,1,xx);
  }

--Junchao Zhang

On Thu, Jun 7, 2018 at 3:11 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
> > On Jun 7, 2018, at 12:27 PM, Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
> >
> > Searched but could not find this option, -mat_view::load_balance
>
>    There is a space between the view and the "::". load_balance is a particular viewer format that causes the printing of load balance information about the number of nonzeros in the matrix.
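Junchao's accumulation argument above can be illustrated with a toy model (not PETSc code; the times below are made up): if the VecScatter at each MG level acts as a synchronization point, the wait a rank accumulates per cycle is the sum over levels of the gap between the slowest rank and itself.

```python
# Toy model (not PETSc code): per_level_times[level][rank] is the time a rank
# spends in SOR/MatMultAdd at that MG level. If the VecScatter at each level
# synchronizes ranks, each rank waits (slowest - itself) at every level, and
# those waits accumulate over the levels of the cycle.
def accumulated_wait(per_level_times):
    nranks = len(per_level_times[0])
    wait = [0.0] * nranks
    for level_times in per_level_times:
        slowest = max(level_times)
        for r in range(nranks):
            wait[r] += slowest - level_times[r]
    return wait

# Two ranks, six levels; rank 0 is 10% slower at every level, so rank 1
# accumulates ~0.6 s of waiting that -log_view would attribute to VecScatter.
levels = [[1.1, 1.0]] * 6
print(accumulated_wait(levels))
```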
>
>    Barry
>
> > --Junchao Zhang
> >
> > On Thu, Jun 7, 2018 at 10:46 AM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >    So the only surprise in the results is the SOR. It is embarrassingly parallel, and normally one would not see a jump.
> >
> >    The load balance for SOR time at 1000 processes (1.5) is better, not worse, than at 125 processes (2.1), so this number doesn't easily explain it.
> >
> >    Could you run the 125 and 1000 cases with -mat_view ::load_balance and see what you get out?
> >
> >    Thanks
> >
> >      Barry
> >
> >    Notice that the MatSOR time jumps a lot, about 5 secs, when -log_sync is on. My only guess is that MatSOR is sharing memory bandwidth (or some other resource? cores?) with the VecScatter, and for some reason this is worse for 1000 cores, but I don't know why.
> >
> > > On Jun 6, 2018, at 9:13 PM, Junchao Zhang <jczh...@mcs.anl.gov> wrote:
> > >
> > > Hi, PETSc developers,
> > >   I tested Michael Becker's code. The code calls the same KSPSolve 1000 times in the second stage and needs a cubic number of processors to run. I ran with 125 ranks and 1000 ranks, with or without the -log_sync option. I attach the log_view output files and a scaling-loss Excel file.
> > >   I profiled the code with 125 processors. It looks like {MatSOR, MatMult, MatMultAdd, MatMultTranspose, MatMultTransposeAdd}_SeqAIJ in aij.c took ~50% of the time; the other half was spent waiting in MPI. MatSOR_SeqAIJ took 30%, mostly in PetscSparseDenseMinusDot().
> > >   I tested it on a 36 cores/node machine. I found 32 ranks/node gave better performance (about 10%) than 36 ranks/node in the 125-rank test. I guess it is because processes in the former case had more balanced memory bandwidth. I collected PAPI_DP_OPS (double precision operations) and PAPI_TOT_CYC (total cycles) for the 125-rank case (see the attached files). It looks like ranks at the two ends have fewer DP_OPS and TOT_CYC.
> > >   Does anyone familiar with the algorithm have quick explanations?
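For reference, the load-balance number being discussed (1.5 vs. 2.1) is the Ratio column that -log_view prints, i.e. the maximum over the minimum time across ranks. A small sketch with hypothetical per-rank MatSOR times:

```python
# The Ratio column in -log_view is (max time over ranks) / (min time over ranks).
# The per-rank times below are hypothetical, just to show how a single slow
# rank inflates the ratio while leaving Max itself unremarkable.
def logview_ratio(times):
    return max(times) / min(times)

matsor_times = [60.0, 66.0, 70.5, 126.0]  # hypothetical seconds per rank
print(round(logview_ratio(matsor_times), 1))  # 2.1
```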
> > >
> > > --Junchao Zhang
> > >
> > > On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <michael.bec...@physik.uni-giessen.de> wrote:
> > > Hello again,
> > >
> > > this took me longer than I anticipated, but here we go.
> > > I did reruns of the cases where only half the processes per node were used (without -log_sync):
> > >
> > >                     125 procs,1st      125 procs,2nd      1000 procs,1st     1000 procs,2nd
> > >                     Max      Ratio     Max      Ratio     Max      Ratio     Max      Ratio
> > > KSPSolve            1.203E+02  1.0     1.210E+02  1.0     1.399E+02  1.1     1.365E+02  1.0
> > > VecTDot             6.376E+00  3.7     6.551E+00  4.0     7.885E+00  2.9     7.175E+00  3.4
> > > VecNorm             4.579E+00  7.1     5.803E+00 10.2     8.534E+00  6.9     6.026E+00  4.9
> > > VecScale            1.070E-01  2.1     1.129E-01  2.2     1.301E-01  2.5     1.270E-01  2.4
> > > VecCopy             1.123E-01  1.3     1.149E-01  1.3     1.301E-01  1.6     1.359E-01  1.6
> > > VecSet              7.063E-01  1.7     6.968E-01  1.7     7.432E-01  1.8     7.425E-01  1.8
> > > VecAXPY             1.166E+00  1.4     1.167E+00  1.4     1.221E+00  1.5     1.279E+00  1.6
> > > VecAYPX             1.317E+00  1.6     1.290E+00  1.6     1.536E+00  1.9     1.499E+00  2.0
> > > VecScatterBegin     6.142E+00  3.2     5.974E+00  2.8     6.448E+00  3.0     6.472E+00  2.9
> > > VecScatterEnd       3.606E+01  4.2     3.551E+01  4.0     5.244E+01  2.7     4.995E+01  2.7
> > > MatMult             3.561E+01  1.6     3.403E+01  1.5     3.435E+01  1.4     3.332E+01  1.4
> > > MatMultAdd          1.124E+01  2.0     1.130E+01  2.1     2.093E+01  2.9     1.995E+01  2.7
> > > MatMultTranspose    1.372E+01  2.5     1.388E+01  2.6     1.477E+01  2.2     1.381E+01  2.1
> > > MatSolve            1.949E-02  0.0     1.653E-02  0.0     4.789E-02  0.0     4.466E-02  0.0
> > > MatSOR              6.610E+01  1.3     6.673E+01  1.3     7.111E+01  1.3     7.105E+01  1.3
> > > MatResidual         2.647E+01  1.7     2.667E+01  1.7     2.446E+01  1.4     2.467E+01  1.5
> > > PCSetUpOnBlocks     5.266E-03  1.4     5.295E-03  1.4     5.427E-03  1.5     5.289E-03  1.4
> > > PCApply             1.031E+02  1.0     1.035E+02  1.0     1.180E+02  1.0     1.164E+02  1.0
> > >
> > > I also slimmed down my code and basically wrote a simple weak-scaling test (source files attached) so you can profile it yourself. I appreciate the offer, Junchao, thank you.
> > > You can adjust the system size per processor at runtime via "-nodes_per_proc 30" and the number of repeated calls to the function containing KSPSolve() via "-iterations 1000". The physical problem is simply calculating the electric potential from a homogeneous charge distribution, done multiple times to accumulate time in KSPSolve().
> > > A job would be started using something like
> > >
> > > mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4 -iterations 1000 \
> > >   -ksp_rtol 1E-6 \
> > >   -log_view -log_sync \
> > >   -pc_type gamg -pc_gamg_type classical \
> > >   -ksp_type cg \
> > >   -ksp_norm_type unpreconditioned \
> > >   -mg_levels_ksp_type richardson \
> > >   -mg_levels_ksp_norm_type none \
> > >   -mg_levels_pc_type sor \
> > >   -mg_levels_ksp_max_it 1 \
> > >   -mg_levels_pc_sor_its 1 \
> > >   -mg_levels_esteig_ksp_type cg \
> > >   -mg_levels_esteig_ksp_max_it 10 \
> > >   -gamg_est_ksp_type cg
> > >
> > > ideally started on a cube number of processes for a cubical process grid.
> > > Using 125 processes and 10,000 iterations I get the output in "log_view_125_new.txt", which shows the same imbalance for me.
> > > Michael
> > >
> > > On 02.06.2018 at 13:40, Mark Adams wrote:
> > >>
> > >> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang <jczh...@mcs.anl.gov> wrote:
> > >> Hi, Michael,
> > >>   You can add -log_sync besides -log_view, which adds barriers to certain events but measures the barrier time separately from the events. I find this option makes it easier to interpret log_view output.
> > >>
> > >> That is great (good to know).
> > >>
> > >> This should give us a better idea if your large VecScatter costs are from slow communication or if it is catching some sort of load imbalance.
> > >>
> > >> --Junchao Zhang
> > >>
> > >> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <michael.bec...@physik.uni-giessen.de> wrote:
> > >> Barry: On its way. Could take a couple days again.
> > >>
> > >> Junchao: I unfortunately don't have access to a cluster with a faster network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking fat-tree network, which I realize causes parallel slowdown if the nodes are not connected to the same switch. Each node has 24 processors (2x12/socket) and four NUMA domains (two for each socket).
> > >> The ranks are usually not distributed perfectly evenly, i.e. for 125 processes, of the six required nodes, five would use 21 cores and one 20.
> > >> Would using another CPU type make a difference communication-wise? I could switch to faster ones (on the same network), but I always assumed this would only improve performance of the stuff that is unrelated to communication.
> > >>
> > >> Michael
> > >>
> > >>> The log files have something like "Average time for zero size MPI_Send(): 1.84231e-05". It looks like you ran on a cluster with a very slow network. A typical machine should give less than 1/10 of the latency you have. An easy way to check is to just run the code on a machine with a faster network and see what happens.
> > >>>
> > >>> Also, how many cores & NUMA domains does a compute node have? I could not figure out how you distributed the 125 MPI ranks evenly.
> > >>>
> > >>> --Junchao Zhang
> > >>>
> > >>> On Tue, May 29, 2018 at 6:18 AM, Michael Becker <michael.bec...@physik.uni-giessen.de> wrote:
> > >>> Hello again,
> > >>>
> > >>> here are the updated log_view files for 125 and 1000 processors. I ran both problems twice, the first time with all processors per node allocated ("-1.txt"), the second with only half on twice the number of nodes ("-2.txt").
> > >>>
> > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <michael.bec...@physik.uni-giessen.de> wrote:
> > >>>>>
> > >>>>> I noticed that for every individual KSP iteration, six vector objects are created and destroyed (with CG, more with e.g. GMRES).
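Michael's rank placement (125 ranks on six 24-core nodes, five nodes with 21 ranks and one with 20) can be reproduced with a quick even-distribution sketch; the scheduler's actual placement may of course differ:

```python
# Spread 125 ranks as evenly as possible over 6 nodes (24 cores each):
# base = 20 ranks everywhere, and the 5-rank remainder adds one rank each
# to 5 nodes -> five nodes with 21 ranks, one with 20.
ranks, nodes, cores_per_node = 125, 6, 24
base, extra = divmod(ranks, nodes)
per_node = [base + 1] * extra + [base] * (nodes - extra)
assert sum(per_node) == ranks and max(per_node) <= cores_per_node
print(per_node)  # [21, 21, 21, 21, 21, 20]
```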
> > >>>>>
> > >>>> Hmm, it is certainly not intended that vectors be created and destroyed within each KSPSolve(); could you please point us to the code that makes you think they are being created and destroyed? We create all the work vectors at KSPSetUp() and destroy them in KSPReset(), not during the solve. Not that this would be a measurable difference.
> > >>>>
> > >>> I mean this, right in the log_view output:
> > >>>
> > >>>> Memory usage is given in bytes:
> > >>>>
> > >>>> Object Type     Creations   Destructions   Memory       Descendants' Mem.
> > >>>> Reports information only for process 0.
> > >>>>
> > >>>> --- Event Stage 0: Main Stage
> > >>>> ...
> > >>>> --- Event Stage 1: First Solve
> > >>>> ...
> > >>>> --- Event Stage 2: Remaining Solves
> > >>>>
> > >>>> Vector          23904       23904          1295501184   0.
> > >>>
> > >>> I logged the exact number of KSP iterations over the 999 timesteps and it's exactly 23904/6 = 3984.
> > >>> Michael
> > >>>
> > >>> On 24.05.2018 at 19:50, Smith, Barry F. wrote:
> > >>>>
> > >>>> Please send the log file for 1000 with cg as the solver.
> > >>>>
> > >>>> You should make a bar chart of each event for the two cases to see which ones are taking more time and which are taking less (we cannot tell with the two logs you sent us since they are for different solvers.)
> > >>>>
> > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <michael.bec...@physik.uni-giessen.de> wrote:
> > >>>>>
> > >>>>> I noticed that for every individual KSP iteration, six vector objects are created and destroyed (with CG, more with e.g. GMRES).
> > >>>>>
> > >>>> Hmm, it is certainly not intended that vectors be created and destroyed within each KSPSolve(); could you please point us to the code that makes you think they are being created and destroyed? We create all the work vectors at KSPSetUp() and destroy them in KSPReset(), not during the solve.
Not that this would be a measurable difference.
> > >>>>
> > >>>>> This seems kind of wasteful; is this supposed to be like this? Is this even the reason for my problems? Apart from that, everything seems quite normal to me (but I'm not the expert here).
> > >>>>>
> > >>>>> Thanks in advance.
> > >>>>>
> > >>>>> Michael
> > >>>>>
> > >>>>> <log_view_125procs.txt><log_view_1000procs.txt>
> > >>>
> > >
> > > <o-wstest-125.txt><Scaling-loss.png><o-wstest-1000.txt><o-wstest-sync-125.txt><o-wstest-sync-1000.txt><MatSOR_SeqAIJ.png><PAPI_TOT_CYC.png><PAPI_DP_OPS.png>
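Michael's count above is easy to cross-check, using only the numbers from the quoted log excerpt:

```python
# Cross-check of the log_view excerpt: 23904 vector creations/destructions in
# the "Remaining Solves" stage, at 6 work vectors per KSP iteration with CG,
# implies 3984 KSP iterations over the 999 timesteps (~4 per solve).
creations = 23904
vecs_per_iteration = 6
assert creations % vecs_per_iteration == 0
iterations = creations // vecs_per_iteration
print(iterations, round(iterations / 999, 2))  # prints: 3984 3.99
```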
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001

Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 186300  avg 188100  max 189000
Mat Object: 216 MPI processes
  type: mpiaij
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 161520  avg 188100  max 188520
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 156360  avg 177577  max 189000
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 75656  avg 87908  max 94500
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 75656  avg 87908  max 94500
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 201530  avg 237200  max 256500
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 201530  avg 237200  max 256500
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 85956  avg 102829  max 111569
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 54571  avg 64151  max 69123
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 84688  avg 107835  max 117713
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 83920  avg 107459  max 117667
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 20241  avg 25363  max 27748
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 6042  avg 7152  max 7637
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 3423  avg 5291  max 5994
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 3047  avg 4938  max 5691
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 1105  avg 1767  max 2171
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 284  avg 475  max 584
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 137  avg 484  max 972
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 0  avg 484  max 7633
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 284  avg 475  max 584
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 0  avg 413  max 6197
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 0  avg 139  max 2244
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 0  avg 34  max 614
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 0  avg 24  max 752
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 0  avg 24  max 5282
Mat Object: 216 MPI processes
  type: mpiaij
  Load Balance - Nonzeros: Min 0  avg 34  max 614

initsolve: 7 iterations
solve 1: 6 iterations
solve 2: 6 iterations
solve 3: 6 iterations
solve 4: 6 iterations
solve 5: 6 iterations
solve 6: 6 iterations
solve 7: 6 iterations
solve 8: 6 iterations
solve 9: 6 iterations
solve 10: 6 iterations
solve 20: 6 iterations
solve 30: 6 iterations
solve 40: 6 iterations
solve 50: 6 iterations
solve 60: 6 iterations
solve 70: 6 iterations
solve 80: 6 iterations
solve 90: 6 iterations
solve 100: 6 iterations
solve 200: 6 iterations
solve 300: 6 iterations
solve 400: 6 iterations
solve 500: 6 iterations
solve 600: 6 iterations
solve 700: 6 iterations
solve 800: 6 iterations
solve 900: 6 iterations
solve 1000: 6 iterations

Time in solve(): 89.4284 s
Time in KSPSolve(): 89.1823 s (99.7248%)
Number of KSP iterations (total): 6000
Number of solve iterations (total): 1000 (ratio: 6.00)

************************************************************************************************************************
***                                   WIDEN YOUR WINDOW TO 120 CHARACTERS.
Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0140 with 216 processors, by jczhang Thu Jun 7 17:04:25 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total
Time (sec):           1.916e+02      1.00001   1.916e+02
Objects:              3.044e+04      1.00003   3.044e+04
Flop:                 3.177e+10      1.15810   3.035e+10  6.557e+12
Flop/sec:             1.658e+08      1.15810   1.584e+08  3.422e+10
MPI Messages:         1.594e+06      3.50605   1.083e+06  2.339e+08
MPI Message Lengths:  1.961e+09      2.19940   1.466e+03  3.428e+11
MPI Reductions:       3.258e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:    ----- Time ------   ----- Flop -----   --- Messages ---   -- Message Lengths --   -- Reductions --
                        Avg     %Total      Avg     %Total    counts   %Total      Avg        %Total     counts   %Total
 0:       Main Stage: 1.0241e-01   0.1%  0.0000e+00   0.0%  2.160e+03   0.0%  1.802e+03   0.0%  1.700e+01   0.1%
 1:      First Solve: 1.0204e+02  53.3%  9.8679e+09   0.2%  7.808e+05   0.3%  4.093e+03   0.9%  5.530e+02   1.7%
 2: Remaining Solves: 8.9446e+01  46.7%  6.5467e+12  99.8%  2.331e+08  99.7%  1.457e+03  99.1%  3.200e+04  98.2%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase      %F - percent flop in this phase
      %M - percent messages in this phase  %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet 2 1.0 6.4135e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0

--- Event Stage 1: First Solve

BuildTwoSided 10 1.0 3.3987e-03 1.7 0.00e+00 0.0 1.6e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 27 1.0 7.8870e+00 3.1 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00 2 0 0 0 0 4 0 2 4 0 0
KSPSetUp 8 1.0 2.9860e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.0204e+02 1.0 4.82e+07 1.2 7.8e+05 4.1e+03 5.5e+02 53 0 0 1 2 100100100100100 97
VecTDot 14 1.0 2.9919e-03 2.2 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 2 0 0 3 54578
VecNorm 9 1.0 1.2019e-03 1.8 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 1 0 0 2 87344
VecScale 35 1.0 3.3951e-04 2.7 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 48655
VecCopy 1 1.0 1.0705e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 154 1.0 1.9858e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 14 1.0 9.7609e-04 1.2 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 167297
VecAYPX 42 1.0 1.5566e-03 1.5 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 88739
VecAssemblyBegin 2 1.0 4.7922e-0522.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 2.9087e-0530.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 150 1.0 5.4379e-03 2.0 0.00e+00 0.0 2.7e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 0 35 12 0 0
VecScatterEnd 150 1.0 1.9689e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 43 1.0 2.1787e-02 1.2 1.05e+07 1.1 9.2e+04 2.1e+03 0.0e+00 0 0 0 0 0 0 22 12 6 0 99634
MatMultAdd 35 1.0 9.9871e-03 1.5 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00 0 0 0 0 0 0 5 6 1 0 48362
MatMultTranspose 35 1.0 1.1008e-02 1.4 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00 0 0 0 0 0 0 5 6 1 0 43876
MatSolve 7 0.0 2.2888e-04 0.0 8.72e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 381
MatSOR 70 1.0 5.0331e-02 1.1 1.90e+07 1.2 8.3e+04 1.6e+03 1.4e+01 0 0 0 0 0 0 40 11 4 3 77978
MatLUFactorSym 1 1.0 3.8791e-0428.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 3.1900e-0478.7 3.10e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 973
MatResidual 35 1.0 1.7441e-02 1.3 7.97e+06 1.2 8.3e+04 1.6e+03 0.0e+00 0 0 0 0 0 0 17 11 4 0 93440
MatAssemblyBegin 82 1.0 7.8904e+00 3.1 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00 2 0 0 0 0 4 0 2 4 0 0
MatAssemblyEnd 82 1.0 7.4100e-02 1.0 0.00e+00 0.0 1.1e+05 6.2e+02 2.1e+02 0 0 0 0 1 0 0 15 2 38 0
MatGetRow 3100265 1.2 4.7804e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 24 0 0 0 0 45 0 0 0 0 0
MatGetRowIJ 1 0.0 3.3140e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 5 1.0 1.8501e-01 2.3 0.00e+00 0.0 1.0e+05 1.8e+04 1.0e+01 0 0 0 1 0 0 0 13 55 2 0
MatCreateSubMat 5 1.0 2.7853e-01 1.0 0.00e+00 0.0 3.6e+04 1.6e+04 8.4e+01 0 0 0 0 0 0 0 5 18 15 0
MatGetOrdering 1 0.0 1.4496e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 5 1.0 3.0473e-02 1.2 0.00e+00 0.0 4.8e+04 1.0e+03 1.0e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 5 1.0 9.5112e-03 1.1 0.00e+00 0.0 9.2e+04 6.3e+02 3.0e+01 0 0 0 0 0 0 0 12 2 5 0
MatZeroEntries 5 1.0 1.7691e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 26 1.0 5.6732e-01 1.0 0.00e+00 0.0 3.3e+04 1.7e+04 5.1e+01 0 0 0 0 0 1 0 4 18 9 0
MatPtAP 5 1.0 1.3221e-01 1.0 1.13e+07 1.3 1.2e+05 2.7e+03 8.2e+01 0 0 0 0 0 0 23 15 10 15 16915
MatPtAPSymbolic 5 1.0 8.2783e-02 1.0 0.00e+00 0.0 6.1e+04 2.8e+03 3.5e+01 0 0 0 0 0 0 0 8 5 6 0
MatPtAPNumeric 5 1.0 4.9810e-02 1.0 1.13e+07 1.3 5.5e+04 2.6e+03 4.5e+01 0 0 0 0 0 0 23 7 4 8 44898
MatGetLocalMat 5 1.0 2.6979e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 5 1.0 4.0371e-03 1.5 0.00e+00 0.0 3.6e+04 3.7e+03 0.0e+00 0 0 0 0 0 0 0 5 4 0 0
SFSetGraph 10 1.0 9.2030e-05 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 10 1.0 5.9166e-03 1.1 0.00e+00 0.0 4.8e+04 6.4e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 40 1.0 1.4107e-03 1.8 0.00e+00 0.0 9.4e+04 7.4e+02 0.0e+00 0 0 0 0 0 0 0 12 2 0 0
SFBcastEnd 40 1.0 2.5785e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 5 1.0 1.0119e+02 1.0 0.00e+00 0.0 3.6e+05 5.4e+03 2.6e+02 53 0 0 1 1 99 0 45 60 46 0
GAMG: partLevel 5 1.0 1.4521e-01 1.0 1.13e+07 1.3 1.2e+05 2.6e+03 1.9e+02 0 0 0 0 1 0 23 15 10 34 15401
repartition 2 1.0 9.1791e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Invert-Sort 2 1.0 6.6185e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0
Move A 2 1.0 3.4759e-03 1.1 0.00e+00 0.0 1.5e+03 9.0e+02 3.6e+01 0 0 0 0 0 0 0 0 0 7 0
Move P 2 1.0 7.2892e-03 1.0 0.00e+00 0.0 1.7e+03 1.7e+01 3.6e+01 0 0 0 0 0 0 0 0 0 7 0
PCSetUp 2 1.0 1.0135e+02 1.0 1.13e+07 1.3 4.7e+05 4.7e+03 4.7e+02 53 0 0 1 1 99 23 61 70 85 22
PCSetUpOnBlocks 7 1.0 1.0257e-03 5.0 3.10e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 302
PCApply 7 1.0 8.5200e-02 1.0 3.18e+07 1.2 2.6e+05 1.3e+03 1.4e+01 0 0 0 0 0 0 66 34 10 3 76535

--- Event Stage 2: Remaining Solves

KSPSolve 1000 1.0 8.9193e+01 1.0 3.17e+10 1.2 2.3e+08 1.5e+03 3.2e+04 47100100 99 98 100100100100100 73399
VecTDot 12000 1.0 5.0107e+00 1.3 6.48e+08 1.0 0.0e+00 0.0e+00 1.2e+04 2 2 0 0 37 5 2 0 0 38 27933
VecNorm 8000 1.0 2.0433e+00 1.1 4.32e+08 1.0 0.0e+00 0.0e+00 8.0e+03 1 1 0 0 25 2 1 0 0 25 45667
VecScale 30000 1.0 1.7645e-01 1.7 8.12e+07 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 80243
VecCopy 1000 1.0 8.1942e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 108000 1.0 1.3471e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 12000 1.0 8.1873e-01 1.2 6.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 170957
VecAYPX 36000 1.0 1.1726e+00 1.3 5.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 100259
VecScatterBegin 127000 1.0 4.3927e+00 2.1 0.00e+00 0.0 2.3e+08 1.5e+03 0.0e+00 2 0100 99 0 4 0100100 0 0
VecScatterEnd 127000 1.0 2.1218e+01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 17 0 0 0 0 0
MatMult 37000 1.0 1.9416e+01 1.2 9.03e+09 1.1 7.9e+07 2.1e+03 0.0e+00 9 29 34 49 0 19 29 34 49 0 96389
MatMultAdd 30000 1.0 1.1328e+01 1.7 2.06e+09 1.3 4.1e+07 7.1e+02 0.0e+00 4 6 18 9 0 10 6 18 9 0 36548
MatMultTranspose 30000 1.0 1.0679e+01 1.6 2.06e+09 1.3 4.1e+07 7.1e+02 0.0e+00 4 6 18 9 0 9 6 18 9 0 38767
MatSolve 6000 0.0 1.0994e-01 0.0 7.48e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 680
MatSOR 60000 1.0 4.4873e+01 1.1 1.63e+10 1.2 7.1e+07 1.6e+03 1.2e+04 22 51 31 33 37 48 51 31 33 38 74798
MatResidual 30000 1.0 1.5853e+01 1.2 6.83e+09 1.2 7.1e+07 1.6e+03 0.0e+00 7 21 31 33 0 16 21 31 33 0 88112
PCSetUpOnBlocks 6000 1.0 9.1378e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 6000 1.0 7.7131e+01 1.0 2.72e+10 1.2 2.3e+08 1.3e+03 1.2e+04 40 85 96 83 37 86 85 97 84 38 72361
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type     Creations   Destructions   Memory   Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

     Krylov Solver     1              8         10120     0.
   DMKSP interface     1              1           656     0.
            Vector     4             45       2361256     0.
            Matrix     0             59      14313348     0.
  Distributed Mesh     1              1          5248     0.
         Index Set     2             14        247728     0.
 IS L to G Mapping     1              1        131728     0.
 Star Forest Graph     2              2          1728     0.
   Discrete System     1              1           932     0.
       Vec Scatter     1             12        231168     0.
    Preconditioner     1              8          8692     0.
            Viewer     1              2          1680     0.
 Application Order     0              1      46656664     0.

--- Event Stage 1: First Solve

     Krylov Solver     7              0             0     0.
            Vector   137             96       3375264     0.
            Matrix   124             65      27659940     0.
    Matrix Coarsen     5              5          3180     0.
         Index Set   102             90      24085864     0.
 Star Forest Graph    10             10          8640     0.
       Vec Scatter    28             17         21488     0.
    Preconditioner     7              0             0     0.
            Viewer     2              0             0     0.
 Application Order     1              0             0     0.

--- Event Stage 2: Remaining Solves

            Vector 30000          30000    1940160000     0.
========================================================================================================================
Average time to get PetscTime(): 6.19888e-07
Average time for MPI_Barrier(): 1.00136e-05
Average time for zero size MPI_Send(): 6.69007e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mat_view ::load_balance
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_mg_levels 6
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1
--download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib
-Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl -----------------------------------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 186300 avg 188550 max 189000
Mat Object: 1728 MPI processes
  type: mpiaij
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 161490 avg 188550 max 188850
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 156360 avg 183219 max 189000
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 75656 avg 91164 max 94500
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 75656 avg 91164 max 94500
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 201530 avg 246725 max 256500
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 201530 avg 246725 max 256500
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 85956 avg 107132 max 111569
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 54571 avg 66550 max 69123
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 84688 avg 112657 max 117713
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 83920 avg 112441 max 117667
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 20241 avg 26366 max 27748
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 6042 avg 7328 max 7637
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 3423 avg 5508 max 5994
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 3047 avg 5197 max 5691
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 1105 avg 1934 max 2180
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 284 avg 479 max 584
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 137 avg 542 max 972
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 0 avg 542 max 8392
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 284 avg 479 max 584
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 0 avg 493 max 7084
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 0 avg 145 max 2349
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 0 avg 31 max 670
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 0 avg 24 max 1100
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 0 avg 24 max 42986
Mat Object: 1728 MPI processes
  type: mpiaij
Load Balance - Nonzeros: Min 0 avg 31 max 670
initsolve: 8 iterations
solve 1: 6 iterations
solve 2: 6 iterations
solve 3: 6 iterations
solve 4: 6 iterations
solve 5: 6 iterations
solve 6: 6 iterations
solve 7: 6 iterations
solve 8: 6 iterations
solve 9: 6 iterations
solve 10: 6 iterations
solve 20: 6 iterations
solve 30: 6 iterations
solve 40: 6 iterations
solve 50: 6 iterations
solve 60: 6 iterations
solve 70: 6 iterations
solve 80: 6 iterations
solve 90: 6 iterations
solve 100: 6 iterations
solve 200: 6 iterations
solve 300: 6 iterations
solve 400: 6 iterations
solve 500: 6 iterations
solve 600: 6 iterations
solve 700: 6 iterations
solve 800: 6 iterations
solve 900: 6 iterations
solve 1000: 6 iterations

Time in solve():      120.025 s
Time in KSPSolve():   119.738 s (99.7606%)
Number of KSP iterations (total): 6000
Number of solve iterations (total): 1000 (ratio: 6.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0545 with 1728 processors, by jczhang Thu Jun 7 17:05:39 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min     Avg       Total
Time (sec):           2.315e+02   1.00001   2.315e+02
Objects:              3.544e+04   1.00003   3.544e+04
Flop:                 3.637e+10   1.16136   3.554e+10  6.141e+13
Flop/sec:             1.571e+08   1.16136   1.535e+08  2.653e+11
MPI Messages:         2.226e+06   4.17170   1.509e+06  2.608e+09
MPI Message Lengths:  2.235e+09   2.20450   1.340e+03  3.494e+12
MPI Reductions:       3.560e+04   1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:    ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                         Avg     %Total     Avg     %Total    counts  %Total     Avg       %Total    counts  %Total
 0:       Main Stage: 8.5928e-02   0.0%  0.0000e+00   0.0%  1.901e+04   0.0%  1.802e+03      0.0%  1.700e+01   0.0%
 1:      First Solve: 1.1133e+02  48.1%  8.9706e+10   0.1%  8.086e+06   0.3%  3.671e+03      0.8%  5.810e+02   1.6%
 2: Remaining Solves: 1.2004e+02  51.9%  6.1318e+13  99.9%  2.600e+09  99.7%  1.332e+03     99.1%  3.500e+04  98.3%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 1.2875e-04 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0

--- Event Stage 1: First Solve

BuildTwoSided         10 1.0 4.9443e-03 1.6 0.00e+00 0.0 1.6e+05 4.0e+00 0.0e+00  0 0 0 0 0   0 0 2 0 0       0
BuildTwoSidedF        27 1.0 1.1099e+01 4.1 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00  2 0 0 0 0   4 0 1 4 0       0
KSPSetUp               8 1.0 1.9672e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01  0 0 0 0 0   0 0 0 0 3       0
KSPSolve               1 1.0 1.1133e+02 1.0 6.83e+07 1.5 8.1e+06 3.7e+03 5.8e+02 48 0 0 1 2 100 100 100 100 100     806
VecTDot               16 1.0 9.3598e-03 1.7 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01  0 0 0 0 0   0 2 0 0 3  159508
VecNorm               10 1.0 3.8018e-03 2.8 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01  0 0 0 0 0   0 1 0 0 2  245440
VecScale              40 1.0 2.3422e-03 20.4 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0   72283
VecCopy                1 1.0 1.5903e-04 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
VecSet               172 1.0 3.5458e-03 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
VecAXPY               16 1.0 1.1342e-03 1.3 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 2 0 0 0 1316389
VecAYPX               48 1.0 1.8997e-03 1.7 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 1 0 0 0  671749
VecAssemblyBegin       2 1.0 5.9843e-05 62.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
VecAssemblyEnd         2 1.0 7.6056e-05 79.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
VecScatterBegin      171 1.0 6.8316e-03 2.3 0.00e+00 0.0 3.0e+06 1.4e+03 0.0e+00  0 0 0 0 0   0 0 37 14 0      0
VecScatterEnd        171 1.0 6.3600e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
MatMult               49 1.0 3.5715e-02 1.7 1.19e+07 1.1 1.0e+06 2.0e+03 0.0e+00  0 0 0 0 0   0 23 12 7 0  565174
MatMultAdd            40 1.0 4.9321e-02 4.7 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00  0 0 0 0 0   0 5 7 1 0   92805
MatMultTranspose      40 1.0 2.4180e-02 2.9 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00  0 0 0 0 0   0 5 7 1 0  189301
MatSolve               8 0.0 1.4651e-03 0.0 1.89e+06 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0    1293
MatSOR                80 1.0 7.9000e-02 1.3 2.18e+07 1.2 9.2e+05 1.5e+03 1.6e+01  0 0 0 0 0   0 41 11 5 3  464960
MatLUFactorSym         1 1.0 4.4470e-03 373.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
MatLUFactorNum         1 1.0 1.3872e-02 4848.7 2.12e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0    1532
MatResidual           40 1.0 3.0815e-02 2.0 9.11e+06 1.2 9.2e+05 1.5e+03 0.0e+00  0 0 0 0 0   0 17 11 5 0  497042
MatAssemblyBegin      82 1.0 1.1102e+01 4.1 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00  2 0 0 0 0   4 0 1 4 0       0
MatAssemblyEnd        82 1.0 1.2929e-01 1.1 0.00e+00 0.0 1.1e+06 5.2e+02 2.1e+02  0 0 0 0 1   0 0 14 2 36      0
MatGetRow        3100266 1.2 5.0643e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21 0 0 0 0  43 0 0 0 0       0
MatGetRowIJ            1 0.0 1.6308e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
MatCreateSubMats       5 1.0 1.9433e-01 2.2 0.00e+00 0.0 1.0e+06 1.6e+04 1.0e+01  0 0 0 0 0   0 0 13 56 2      0
MatCreateSubMat        5 1.0 1.8586e+00 1.0 0.00e+00 0.0 3.7e+05 1.3e+04 8.4e+01  1 0 0 0 0   2 0 5 16 14      0
MatGetOrdering         1 0.0 4.2415e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
MatIncreaseOvrlp       5 1.0 8.5395e-02 1.1 0.00e+00 0.0 4.6e+05 9.9e+02 1.0e+01  0 0 0 0 0   0 0 6 2 2       0
MatCoarsen             5 1.0 2.5278e-02 1.2 0.00e+00 0.0 9.7e+05 5.5e+02 5.2e+01  0 0 0 0 0   0 0 12 2 9      0
MatZeroEntries         5 1.0 1.6418e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
MatView               26 1.0 3.8725e+00 1.0 0.00e+00 0.0 3.3e+05 1.4e+04 5.1e+01  2 0 0 0 0   3 0 4 16 9      0
MatPtAP                5 1.0 2.0472e-01 1.0 1.11e+07 1.3 1.1e+06 2.5e+03 8.3e+01  0 0 0 0 0   0 21 14 9 14  89957
MatPtAPSymbolic        5 1.0 1.2353e-01 1.0 0.00e+00 0.0 5.8e+05 2.7e+03 3.5e+01  0 0 0 0 0   0 0 7 5 6       0
MatPtAPNumeric         5 1.0 8.0794e-02 1.0 1.11e+07 1.3 5.5e+05 2.3e+03 4.5e+01  0 0 0 0 0   0 21 7 4 8  227941
MatGetLocalMat         5 1.0 2.8760e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
MatGetBrAoCol          5 1.0 4.8778e-03 1.8 0.00e+00 0.0 3.4e+05 3.4e+03 0.0e+00  0 0 0 0 0   0 0 4 4 0       0
SFSetGraph            10 1.0 1.1182e-04 19.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
SFSetUp               10 1.0 8.0597e-03 1.2 0.00e+00 0.0 4.8e+05 5.8e+02 0.0e+00  0 0 0 0 0   0 0 6 1 0       0
SFBcastBegin          62 1.0 2.1942e-03 2.3 0.00e+00 0.0 1.0e+06 6.4e+02 0.0e+00  0 0 0 0 0   0 0 12 2 0      0
SFBcastEnd            62 1.0 6.9718e-03 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
GAMG: createProl       5 1.0 1.0694e+02 1.0 0.00e+00 0.0 3.6e+06 5.1e+03 2.8e+02 46 0 0 1 1  96 0 44 61 48     0
GAMG: partLevel        5 1.0 2.7904e-01 1.0 1.11e+07 1.3 1.2e+06 2.4e+03 1.9e+02  0 0 0 0 1   0 21 14 10 33  65998
  repartition          2 1.0 1.8520e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0 0 0 0 0   0 0 0 0 2       0
  Invert-Sort          2 1.0 4.2000e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0 0 0 0 0   0 0 0 0 1       0
  Move A               2 1.0 4.0763e-02 1.0 0.00e+00 0.0 1.6e+04 7.9e+02 3.6e+01  0 0 0 0 0   0 0 0 0 6       0
  Move P               2 1.0 2.8355e-02 1.1 0.00e+00 0.0 2.2e+04 1.3e+01 3.6e+01  0 0 0 0 0   0 0 0 0 6       0
PCSetUp                2 1.0 1.0727e+02 1.0 2.98e+07 3.5 4.8e+06 4.4e+03 4.9e+02 46 0 0 1 1  96 21 59 71 85    172
PCSetUpOnBlocks        8 1.0 1.8798e-02 202.7 2.12e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0    1130
PCApply                8 1.0 1.5085e-01 1.0 5.39e+07 1.8 2.9e+06 1.2e+03 1.6e+01  0 0 0 0 0   0 68 36 11 3  405880

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.1975e+02 1.0 3.63e+10 1.2 2.6e+09 1.3e+03 3.5e+04 52 100 100 99 98 100 100 100 100 100  512039
VecTDot            13000 1.0 9.7158e+00 1.3 7.02e+08 1.0 0.0e+00 0.0e+00 1.3e+04  4 2 0 0 37   7 2 0 0 37  124852
VecNorm             8000 1.0 2.9320e+00 1.1 4.32e+08 1.0 0.0e+00 0.0e+00 8.0e+03  1 1 0 0 22   2 1 0 0 23  254601
VecScale           35000 1.0 2.7666e-01 2.7 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0  535462
VecCopy             1000 1.0 8.3770e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
VecSet            126000 1.0 1.5955e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1 0 0 0 0   1 0 0 0 0       0
VecAXPY            12000 1.0 8.3211e-01 1.2 6.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 2 0 0 0   1 2 0 0 0 1345675
VecAYPX            41000 1.0 1.3790e+00 1.5 5.92e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 2 0 0 0   1 2 0 0 0  737825
VecScatterBegin   147000 1.0 5.5527e+00 2.3 0.00e+00 0.0 2.6e+09 1.3e+03 0.0e+00  2 0 100 99 0   4 0 100 100 0      0
VecScatterEnd     147000 1.0 3.3839e+01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0  20 0 0 0 0       0
MatMult            42000 1.0 2.4343e+01 1.4 1.01e+10 1.1 8.7e+08 1.9e+03 0.0e+00  9 28 33 48 0  16 28 33 48 0  703788
MatMultAdd         35000 1.0 2.0074e+01 2.3 2.40e+09 1.3 4.6e+08 6.6e+02 0.0e+00  7 7 18 9 0  14 7 18 9 0  199518
MatMultTranspose   35000 1.0 1.7168e+01 2.3 2.40e+09 1.3 4.6e+08 6.6e+02 0.0e+00  4 7 18 9 0   8 7 18 9 0  233286
MatSolve            7000 0.0 1.3333e+00 0.0 1.66e+09 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0    1244
MatSOR             70000 1.0 5.9088e+01 1.1 1.90e+10 1.2 8.0e+08 1.5e+03 1.4e+04 24 52 31 34 39  46 52 31 34 40  542874
MatResidual        35000 1.0 2.1124e+01 1.5 7.97e+09 1.2 8.0e+08 1.5e+03 0.0e+00  7 22 31 34 0  14 22 31 34 0  634453
PCSetUpOnBlocks     7000 1.0 1.1204e-01 19.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0 0   0 0 0 0 0       0
PCApply             7000 1.0 1.0287e+02 1.0 3.18e+10 1.2 2.5e+09 1.2e+03 1.4e+04 44 87 97 85 39  85 87 97 86 40  519965
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

     Krylov Solver         1              8        10120     0.
   DMKSP interface         1              1          656     0.
            Vector         4             45      2366712     0.
            Matrix         0             59     16548712     0.
  Distributed Mesh         1              1         5248     0.
         Index Set         2             14       305000     0.
 IS L to G Mapping         1              1       131728     0.
 Star Forest Graph         2              2         1728     0.
   Discrete System         1              1          932     0.
       Vec Scatter         1             12       231168     0.
    Preconditioner         1              8         8692     0.
            Viewer         1              2         1680     0.
 Application Order         0              1    373248664     0.

--- Event Stage 1: First Solve

     Krylov Solver         7              0            0     0.
            Vector       142            101      3702616     0.
            Matrix       124             65     27964988     0.
    Matrix Coarsen         5              5         3180     0.
         Index Set       102             90    187439200     0.
 Star Forest Graph        10             10         8640     0.
       Vec Scatter        28             17        21488     0.
    Preconditioner         7              0            0     0.
            Viewer         2              0            0     0.
 Application Order         1              0            0     0.

--- Event Stage 2: Remaining Solves

            Vector     35000          35000   2262792000     0.
========================================================================================================================
Average time to get PetscTime(): 6.19888e-07
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 6.85591e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mat_view ::load_balance
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_mg_levels 6
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1
--download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
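For anyone scanning the listings above: the max/avg ratio on each "Load Balance - Nonzeros" line is the per-level imbalance, and the coarse GAMG levels (Min 0, e.g. avg 24 max 42986) are wildly imbalanced while the finest level is nearly perfect. A quick parsing sketch for pulling those ratios out of the saved output (this is a hypothetical helper written against the exact text format shown, not a PETSc utility):

```python
import re

def imbalance_ratios(text):
    """Return the max/avg nonzero ratio for each 'Load Balance' line,
    in the order the levels appear in the -mat_view ::load_balance output."""
    pat = re.compile(r"Load Balance - Nonzeros: Min (\d+) avg (\d+) max (\d+)")
    # avg is never 0 in the output above (even when Min is 0), so plain
    # division is safe for this data.
    return [int(m.group(3)) / int(m.group(2)) for m in pat.finditer(text)]

sample = ("Load Balance - Nonzeros: Min 186300 avg 188550 max 189000\n"
          "Load Balance - Nonzeros: Min 0 avg 24 max 42986\n")
ratios = imbalance_ratios(sample)
print(ratios)  # finest level is ~1.0 (balanced); the coarse level is not
```

Sorting or plotting these ratios per level makes it easy to see where the imbalance that the head of this thread attributes to the accumulated VecScatter cost actually lives.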