Thanks a lot Hong. The switch definitely seemed to balance the load during the SuperLU_DIST MatMatSolve, although I'm not completely sure what I'm seeing. Changing the #dof also seemed to affect the load balance of the MUMPS MatMatSolve. I need to investigate a bit more.
Looking at the profile, the majority of the time is spent in the MatSolve calls made by MatMatSolve (135030 MatSolves over 30 MatMatSolves, i.e. 4501 triangular solves per MatMatSolve, one per right-hand-side column):

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                                --- Global ---   --- Stage ---  Total
                   Max  Ratio  Max      Ratio  Max      Ratio  Mess  Avg len Reduct  %T %F %M %L %R   %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
VecCopy          135030 1.0 6.3319e-01   1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0      0
VecWAXPY             30 1.0 1.6069e-04   1.9 4.32e+03 1.7 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0    840
VecScatterBegin      30 1.0 7.6072e-03   1.5 0.00e+00 0.0 4.7e+04 9.0e+02 0.0e+00   0  0 15  0  0    0  0 50  0  0      0
VecScatterEnd        30 1.0 9.1272e-02   6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0      0
MatMultAdd           30 1.0 3.3028e-01   1.4 3.89e+07 1.7 4.7e+04 9.0e+02 0.0e+00   0  0 15  0  0    0  0 50  0  0   3679
MatSolve         135030 1.0 3.0340e+03   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  78  0  0  0  0   81  0  0  0  0      0
MatLUFactorSym       30 1.0 2.2563e-02   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0      0
MatLUFactorNum       30 1.0 2.7990e+02   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   7  0  0  0  0    7  0  0  0  0      0
MatConvert          150 1.0 2.9276e+00   1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+02   0  0  0  0  4    0  0  0  0 30      0
MatScale             60 1.0 2.7492e-01   1.9 1.94e+07 1.7 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0   2210
MatAssemblyBegin    180 1.0 1.1748e+02 236.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+02   2  0  0  0  5    2  0  0  0 40      0
MatAssemblyEnd      180 1.0 1.9992e-02   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+02   0  0  0  0  5    0  0  0  0 40      0
MatGetRow          4320 1.7 2.2634e-01   1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0      0
MatMatMult           30 1.0 4.2578e+02   1.0 1.75e+11 1.7 4.7e+04 4.0e+06 2.4e+02  11 100 15 97  5  11 100 50 100 40 12841
MatMatSolve          30 1.0 3.0256e+03   1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01  77  0  0  0  1   81  0  0  0 10      0
------------------------------------------------------------------------------------------------------------------------

df


On Fri, 13 Mar 2009, Hong Zhang wrote:

> David,
>
> You may run with option '-log_summary <log_file>' and
> check which function dominates the time.
> I suspect the symbolic factorization, because it is
> implemented sequentially in mumps.
>
> If this is the case, you may switch to superlu_dist,
> which supports parallel symbolic factorization
> in the latest release.
>
> Let us know what you get,
>
> Hong
>
> On Fri, 13 Mar 2009, David Fuentes wrote:
>
>> The majority of time in my code is spent in the MatMatSolve. I'm running
>> MatMatSolve in parallel using MUMPS as the factored matrix.
>> Using top, I've noticed that during the MatMatSolve
>> the majority of the load seems to be on the root process.
>> Is this expected? Or do I most likely have a problem with the matrices
>> that I'm passing in?
>>
>> thank you,
>> David Fuentes
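For anyone following along: when the LU factorization is driven through a KSP/PC, the superlu_dist switch Hong suggests can be made entirely at runtime. A sketch of such a run line (the executable name and process count are placeholders; option names are as spelled in PETSc releases of this era, so check them against your build — code that calls MatGetFactor() directly instead passes "superlu_dist" as the solver-package argument):

```
mpiexec -n 8 ./myapp \
    -ksp_type preonly -pc_type lu \
    -pc_factor_mat_solver_package superlu_dist \
    -log_summary profile.log
```

Comparing the resulting profile.log against the MUMPS run (in particular the MatLUFactorSym and MatSolve lines, and their max/ratio columns) should show whether the symbolic factorization and the per-column solves balance better across processes.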