Here is the plot of run time with the old and new petsc using 1, 2, 4, 8, and 16 CPUs (on a logarithmic scale):

[image: Screenshot from 2021-03-28 10-48-56.png]
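For context on the Solute_Assembly stage discussed below: it wraps a loop over the tetrahedral elements in which the element computations are done first and the result is then inserted with MatSetValue/MatSetValues. What follows is only a minimal sketch of that pattern, not the actual code (the element kernel, sizes, indices, and the round-robin partition are placeholders); the point is that time spent in the element kernel is counted by -log_view in the stage total but not in any PETSc event, while the MatSetValues/MatAssembly calls show up as Mat events.

#include <petscmat.h>

/* Minimal sketch of a per-element assembly loop inside a named log stage.
   compute_element_matrix() and the sizes below are placeholders, not the
   actual application code. */
static void compute_element_matrix(PetscInt e, PetscInt idx[4], PetscScalar Ke[16])
{
  PetscInt i;
  /* Placeholder for the expensive per-tetrahedron computation that
     precedes the MatSetValues() call in the real code. */
  for (i = 0; i < 4; i++) idx[i] = 4 * e + i;
  for (i = 0; i < 16; i++) Ke[i] = 1.0;
}

int main(int argc, char **argv)
{
  Mat            A;
  PetscLogStage  stage;
  PetscInt       e, nelem = 10, idx[4];
  PetscScalar    Ke[16];
  PetscMPIInt    rank, size;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, 4 * nelem, 4 * nelem, 4, NULL, 4, NULL, &A);CHKERRQ(ierr);

  ierr = PetscLogStageRegister("Solute_Assembly", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  /* Trivial round-robin element "partition"; stands in for the real mesh partition. */
  for (e = rank; e < nelem; e += size) {
    compute_element_matrix(e, idx, Ke);                          /* user code: counted only in the stage total */
    ierr = MatSetValues(A, 4, idx, 4, idx, Ke, ADD_VALUES);CHKERRQ(ierr); /* PETSc call: logged as a Mat event */
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Run with -log_view, this reports the "Solute_Assembly" stage with only the Mat events listed under it, which mirrors what Junchao points out below.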
On Thu, Mar 25, 2021 at 12:51 PM Mohammad Gohardoust <[email protected]> wrote:

> That's right, these loops take roughly half the time as well. If I am
> not mistaken, petsc (MatSetValue) is called after doing some calculations
> over each tetrahedral element.
> Thanks for your suggestion. I will try that and will post the results.
>
> Mohammad
>
> On Wed, Mar 24, 2021 at 3:23 PM Junchao Zhang <[email protected]> wrote:
>
>> On Wed, Mar 24, 2021 at 2:17 AM Mohammad Gohardoust <[email protected]> wrote:
>>
>>> So the code itself is a finite-element scheme, and in stages 1 and 3
>>> there are expensive loops over all the mesh elements which consume a
>>> lot of time.
>>
>> So these expensive loops must also take half the time with the newer
>> petsc? And these loops do not call petsc routines?
>> I think you can build two PETSc versions with the same configuration
>> options, then run your code with one MPI rank to see if there is a
>> difference.
>> If they give the same performance, then scale to 2, 4, ... ranks and see
>> what happens.
>>
>>> Mohammad
>>>
>>> On Tue, Mar 23, 2021 at 6:08 PM Junchao Zhang <[email protected]> wrote:
>>>
>>>> In the new log, I saw
>>>>
>>>> Summary of Stages:  ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>>                        Avg     %Total     Avg     %Total    Count   %Total     Avg      %Total    Count   %Total
>>>>  0:      Main Stage: 5.4095e+00   2.3%  4.3700e+03   0.0%  4.764e+05   3.0%  3.135e+02    1.0%  2.244e+04  12.6%
>>>>  1: Solute_Assembly: 1.3977e+02  59.4%  7.3353e+09   4.6%  3.263e+06  20.7%  1.278e+03   26.9%  1.059e+04   6.0%
>>>>
>>>> But I didn't see any event in this stage that had a cost close to 140s.
>>>> What happened?
>>>>
>>>> --- Event Stage 1: Solute_Assembly
>>>>
>>>> BuildTwoSided     3531 1.0 2.8025e+0026.3 0.00e+00 0.0 3.6e+05 4.0e+00 3.5e+03  1  0  2  0  2   1  0 11  0 33     0
>>>> BuildTwoSidedF    3531 1.0 2.8678e+0013.2 0.00e+00 0.0 7.1e+05 3.6e+03 3.5e+03  1  0  5 17  2   1  0 22 62 33     0
>>>> VecScatterBegin   7062 1.0 7.1911e-02 1.9 0.00e+00 0.0 7.1e+05 3.5e+02 0.0e+00  0  0  5  2  0   0  0 22  6  0     0
>>>> VecScatterEnd     7062 1.0 2.1248e-01 3.0 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    73
>>>> SFBcastOpBegin    3531 1.0 2.6516e-02 2.4 0.00e+00 0.0 3.6e+05 3.5e+02 0.0e+00  0  0  2  1  0   0  0 11  3  0     0
>>>> SFBcastOpEnd      3531 1.0 9.5041e-02 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> SFReduceBegin     3531 1.0 3.8955e-02 2.1 0.00e+00 0.0 3.6e+05 3.5e+02 0.0e+00  0  0  2  1  0   0  0 11  3  0     0
>>>> SFReduceEnd       3531 1.0 1.3791e-01 3.9 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   112
>>>> SFPack            7062 1.0 6.5591e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>> SFUnpack          7062 1.0 7.4186e-03 2.1 1.60e+06 2.7 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2080
>>>> MatAssemblyBegin  3531 1.0 4.7846e+00 1.1 0.00e+00 0.0 7.1e+05 3.6e+03 3.5e+03  2  0  5 17  2   3  0 22 62 33     0
>>>> MatAssemblyEnd    3531 1.0 1.5468e+00 2.7 1.68e+07 2.7 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  2  0  0  0   104
>>>> MatZeroEntries    3531 1.0 3.0998e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>>>
>>>> --Junchao Zhang
>>>>
>>>> On Tue, Mar 23, 2021 at 5:24 PM Mohammad Gohardoust <[email protected]> wrote:
>>>>
>>>>> Thanks Dave for your reply.
>>>>>
>>>>> For sure PETSc is awesome :D
>>>>>
>>>>> Yes, in both cases petsc was configured with --with-debugging=0, and
>>>>> fortunately I do have the old and new -log_view outputs, which I have
>>>>> attached.
>>>>>
>>>>> Best,
>>>>> Mohammad
>>>>>
>>>>> On Tue, Mar 23, 2021 at 1:37 AM Dave May <[email protected]> wrote:
>>>>>
>>>>>> Nice to hear!
>>>>>> The answer is simple: PETSc is awesome :)
>>>>>>
>>>>>> Jokes aside, assuming both petsc builds were configured with
>>>>>> --with-debugging=0, I don't think there is a definitive answer to your
>>>>>> question with the information you provided.
>>>>>>
>>>>>> It could be as simple as one specific implementation you use having
>>>>>> been improved between petsc releases. Not being an Ubuntu expert, the
>>>>>> change might be associated with using a different compiler and/or a
>>>>>> more efficient BLAS implementation (non-threaded vs. threaded).
>>>>>> However, I doubt this is the origin of your 2x performance increase.
>>>>>>
>>>>>> If you really want to understand where the performance improvement
>>>>>> originated from, you'd need to send to the email list the result of
>>>>>> -log_view from both the old and new versions, running the exact same
>>>>>> problem.
>>>>>>
>>>>>> From that info, we can see what implementations in PETSc are being
>>>>>> used and where the time reduction is occurring. Knowing that, it
>>>>>> should be clearer to provide an explanation for it.
>>>>>>
>>>>>> Thanks,
>>>>>> Dave
>>>>>>
>>>>>> On Tue 23. Mar 2021 at 06:24, Mohammad Gohardoust <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am using a code which is based on petsc (and also parmetis).
>>>>>>> Recently I made the following changes, and now the code is running
>>>>>>> about two times faster than before:
>>>>>>>
>>>>>>> - Upgraded Ubuntu 18.04 to 20.04
>>>>>>> - Upgraded petsc 3.13.4 to 3.14.5
>>>>>>> - This time I installed parmetis and metis directly via petsc with
>>>>>>>   the --download-parmetis --download-metis flags, instead of
>>>>>>>   installing them separately and using --with-parmetis-include=...
>>>>>>>   and --with-parmetis-lib=... (the version of the previously
>>>>>>>   installed parmetis was 4.0.3)
>>>>>>>
>>>>>>> I was wondering what can possibly explain this speedup? Does anyone
>>>>>>> have any suggestions?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mohammad
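A follow-up on the stage-vs-event question above: one way to make the time of the non-PETSc element loops show up as its own line in -log_view, instead of only in the stage total, is to register a user logging event around them. This is a minimal sketch under that assumption; the class name, event name, and work function are made up for illustration and are not part of the application.

#include <petscsys.h>

/* Placeholder for the application's expensive per-element computations. */
static void expensive_element_work(void)
{
}

int main(int argc, char **argv)
{
  PetscClassId   classid;
  PetscLogEvent  elem_event;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = PetscClassIdRegister("UserElem", &classid);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("ElemKernel", classid, &elem_event);CHKERRQ(ierr);

  ierr = PetscLogEventBegin(elem_event, 0, 0, 0, 0);CHKERRQ(ierr);
  expensive_element_work();   /* now timed and reported by -log_view */
  ierr = PetscLogEventEnd(elem_event, 0, 0, 0, 0);CHKERRQ(ierr);

  ierr = PetscFinalize();
  return ierr;
}

With something like this in place, a run with -log_view would report the element-kernel time as a named event inside whatever stage is active, which would make a ~140 s cost directly visible in the event table.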
