On Tue, Mar 23, 2021 at 9:08 PM Junchao Zhang <[email protected]> wrote:
> In the new log, I saw
>
> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
>  0:      Main Stage: 5.4095e+00   2.3%  4.3700e+03   0.0%  4.764e+05   3.0%  3.135e+02        1.0%  2.244e+04  12.6%
>  1: Solute_Assembly: 1.3977e+02  59.4%  7.3353e+09   4.6%  3.263e+06  20.7%  1.278e+03       26.9%  1.059e+04   6.0%
>
>
> But I didn't see any event in this stage had a cost close to 140s. What
> happened?
>

This is true, but all the PETSc operations are speeding up by a factor 2x.
It is hard to believe these were run on the same machine. For example,
VecScale speeds up!?! So it is not network, or optimizations. I cannot
explain this.

   Matt

> --- Event Stage 1: Solute_Assembly
>
> BuildTwoSided    3531 1.0 2.8025e+00 26.3 0.00e+00 0.0 3.6e+05 4.0e+00 3.5e+03   1  0  2  0  2   1  0 11  0 33     0
> BuildTwoSidedF   3531 1.0 2.8678e+00 13.2 0.00e+00 0.0 7.1e+05 3.6e+03 3.5e+03   1  0  5 17  2   1  0 22 62 33     0
> VecScatterBegin  7062 1.0 7.1911e-02  1.9 0.00e+00 0.0 7.1e+05 3.5e+02 0.0e+00   0  0  5  2  0   0  0 22  6  0     0
> VecScatterEnd    7062 1.0 2.1248e-01  3.0 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0    73
> SFBcastOpBegin   3531 1.0 2.6516e-02  2.4 0.00e+00 0.0 3.6e+05 3.5e+02 0.0e+00   0  0  2  1  0   0  0 11  3  0     0
> SFBcastOpEnd     3531 1.0 9.5041e-02  4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0     0
> SFReduceBegin    3531 1.0 3.8955e-02  2.1 0.00e+00 0.0 3.6e+05 3.5e+02 0.0e+00   0  0  2  1  0   0  0 11  3  0     0
> SFReduceEnd      3531 1.0 1.3791e-01  3.9 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0   112
> SFPack           7062 1.0 6.5591e-03  2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0     0
> SFUnpack         7062 1.0 7.4186e-03  2.1 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  2080
> MatAssemblyBegin 3531 1.0 4.7846e+00  1.1 0.00e+00 0.0 7.1e+05 3.6e+03 3.5e+03   2  0  5 17  2   3  0 22 62 33     0
> MatAssemblyEnd   3531 1.0 1.5468e+00  2.7 1.68e+07 2.7 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   1  2  0  0  0   104
> MatZeroEntries   3531 1.0 3.0998e-02  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0     0
>
>
> --Junchao Zhang
>
>
> On Tue, Mar 23, 2021 at 5:24 PM Mohammad Gohardoust <[email protected]>
> wrote:
>
>> Thanks Dave for your reply.
>>
>> For sure PETSc is awesome :D
>>
>> Yes, in both cases petsc was configured with --with-debugging=0 and
>> fortunately I do have the old and new -log_view outputs which I attached.
>>
>> Best,
>> Mohammad
>>
>> On Tue, Mar 23, 2021 at 1:37 AM Dave May <[email protected]> wrote:
>>
>>> Nice to hear!
>>> The answer is simple, PETSc is awesome :)
>>>
>>> Jokes aside, assuming both petsc builds were configured with
>>> --with-debugging=0, I don’t think there is a definitive answer to your
>>> question with the information you provided.
>>>
>>> It could be as simple as one specific implementation you use was
>>> improved between petsc releases. Not being an Ubuntu expert, the change
>>> might be associated with using a different compiler, and/or a more
>>> efficient BLAS implementation (non threaded vs threaded). However I doubt
>>> this is the origin of your 2x performance increase.
>>>
>>> If you really want to understand where the performance improvement
>>> originated from, you’d need to send to the email list the result of
>>> -log_view from both the old and new versions, running the exact same
>>> problem.
>>>
>>> From that info, we can see what implementations in PETSc are being used
>>> and where the time reduction is occurring. Knowing that, it should be
>>> clearer to provide an explanation for it.
>>>
>>>
>>> Thanks,
>>> Dave
>>>
>>>
>>> On Tue 23. Mar 2021 at 06:24, Mohammad Gohardoust <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am using a code which is based on petsc (and also parmetis).
>>>> Recently I made the following changes and now the code is running about
>>>> two times faster than before:
>>>>
>>>>    - Upgraded Ubuntu 18.04 to 20.04
>>>>    - Upgraded petsc 3.13.4 to 3.14.5
>>>>    - This time I installed parmetis and metis directly via petsc by
>>>>    --download-parmetis --download-metis flags instead of installing them
>>>>    separately and using --with-parmetis-include=... and
>>>>    --with-parmetis-lib=... (the version of installed parmetis was 4.0.3
>>>>    before)
>>>>
>>>> I was wondering what can possibly explain this speedup? Does anyone
>>>> have any suggestions?
>>>>
>>>> Thanks,
>>>> Mohammad
>>>>
>>>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
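
The mismatch Junchao points out (a 140 s stage with no event anywhere near that cost) can be checked mechanically. What follows is only a minimal sketch and not part of the original thread. It assumes event rows shaped like the ones quoted above, and the names EVENT_ROW, event_times and unlogged_time are hypothetical helpers, not PETSc API. Two caveats: the times are per-event maxima over ranks, and events can nest (BuildTwoSidedF appears to run inside MatAssemblyBegin, judging by the identical message counts), so the sum overcounts and the remainder is only a rough lower bound on time spent outside logged events.

```python
import re

# Start of a -log_view event row: name, call count, count ratio, max time.
# The time is matched with a two-digit exponent so that a fused field such
# as "2.8025e+0026.3" (time immediately followed by its ratio) still
# yields 2.8025e+00.
EVENT_ROW = re.compile(r"^\s*(\w+)\s+(\d+)\s+[\d.]+\s+(\d\.\d+e[+-]\d{2})")


def event_times(rows):
    """Return {event name: max time in seconds} for the given event rows."""
    times = {}
    for row in rows:
        match = EVENT_ROW.match(row)
        if match:
            times[match.group(1)] = float(match.group(3))
    return times


def unlogged_time(stage_time, times):
    """Stage time minus the summed event times (nested events are double counted)."""
    return stage_time - sum(times.values())


# Three rows copied from the Solute_Assembly stage quoted in this thread.
rows = [
    "BuildTwoSided 3531 1.0 2.8025e+0026.3 0.00e+00 0.0 3.6e+05 4.0e+00 3.5e+03 1 0 2 0 2 1 0 11 0 33 0",
    "MatAssemblyBegin 3531 1.0 4.7846e+00 1.1 0.00e+00 0.0 7.1e+05 3.6e+03 3.5e+03 2 0 5 17 2 3 0 22 62 33 0",
    "MatAssemblyEnd 3531 1.0 1.5468e+00 2.7 1.68e+07 2.7 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 2 0 0 0 104",
]

times = event_times(rows)
stage_time = 1.3977e+02  # Solute_Assembly total from "Summary of Stages"
remainder = unlogged_time(stage_time, times)
print(f"logged events: {sum(times.values()):.2f} s, remainder: {remainder:.2f} s")
```

A large remainder would suggest that most of the stage is spent in user code that is not wrapped in any logged event, which is one possible reading of the numbers above rather than a confirmed diagnosis.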
