No, it is the same problem run with different numbers of processors. I have
data from 1 to 20 nodes, in increments of 20 processors / 1 node, and
additionally for 1 processor.
On Fri, Mar 22, 2019 at 2:48 PM Zhang, Junchao <jczh...@mcs.anl.gov> wrote:

> Did you change the problem size between runs?
>
> On Fri, Mar 22, 2019 at 4:09 PM Manuel Valera <mvaler...@sdsu.edu> wrote:
>
>> Hello,
>>
>> I repeated the timings with the -log_sync option and now I get, for 200
>> processors / 20 nodes:
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Event             Count      Time (sec)      Flop                              --- Global ---   --- Stage ---   Total
>>                  Max Ratio  Max      Ratio  Max      Ratio  Mess    Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>> ------------------------------------------------------------------------------------------------------------------------
>> VecScatterBarrie  3014 1.0 5.6771e+01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
>> VecScatterBegin   3014 1.0 3.1684e+01 2.0 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01  4  0 63 56  0   4  0 63 56  0     0
>> VecScatterEnd     2976 1.0 1.1383e+02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0
>>
>> With 100 processors / 10 nodes:
>>
>> VecScatterBarrie  3010 1.0 7.4430e+01 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
>> VecScatterBegin   3010 1.0 3.8504e+01 2.4 0.00e+00 0.0 1.6e+06 2.0e+06 2.8e+01  4  0 71 66  0   4  0 71 66  0     0
>> VecScatterEnd     2972 1.0 8.5158e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
>>
>> And with 20 processors / 1 node:
>>
>> VecScatterBarrie  2596 1.0 4.0614e+01 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>> VecScatterBegin   2596 1.0 1.4970e+01 1.3 0.00e+00 0.0 1.2e+05 4.0e+06 3.0e+01  1  0 81 61  0   1  0 81 61  0     0
>> VecScatterEnd     2558 1.0 1.4903e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>
>> Can you help me interpret this? What I see is the End portion taking more
>> relative time while Begin stays about the same beyond one node; also, the
>> Barrier and Begin counts are the same every time. But how do I estimate
>> communication times from here?
>>
>> Thanks,
>>
>> On Wed, Mar 20, 2019 at 3:24 PM Zhang, Junchao <jczh...@mcs.anl.gov> wrote:
>>
>>> I forgot to mention that a long VecScatter time might also be due to
>>> local memory copies. If the communication pattern has a large
>>> local-to-local (self-to-self) scatter, which often happens thanks to
>>> locality, then the memory copy time is counted in VecScatter. You can
>>> analyze your code's communication pattern to see if that is the case.
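>>>
>>> If you want that scatter time on its own line in -log_view, one option
>>> is to wrap your halo update in a user-registered log event. A rough
>>> sketch (the event name "UserHalo" and the wrapper routine are
>>> placeholders, not names from your model):
>>>
>>>   #include <petscdmda.h>
>>>
>>>   /* Time the DMDA halo exchange (MPI traffic plus local memory
>>>      copies) under its own event in the -log_view output. */
>>>   PetscErrorCode UpdateHalo(DM da, Vec global, Vec local)
>>>   {
>>>     static PetscLogEvent HALO = 0;
>>>     PetscErrorCode       ierr;
>>>
>>>     PetscFunctionBeginUser;
>>>     if (!HALO) {
>>>       ierr = PetscLogEventRegister("UserHalo", VEC_CLASSID, &HALO);CHKERRQ(ierr);
>>>     }
>>>     ierr = PetscLogEventBegin(HALO, 0, 0, 0, 0);CHKERRQ(ierr);
>>>     ierr = DMGlobalToLocalBegin(da, global, INSERT_VALUES, local);CHKERRQ(ierr);
>>>     ierr = DMGlobalToLocalEnd(da, global, INSERT_VALUES, local);CHKERRQ(ierr);
>>>     ierr = PetscLogEventEnd(HALO, 0, 0, 0, 0);CHKERRQ(ierr);
>>>     PetscFunctionReturn(0);
>>>   }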
>>>
>>> --Junchao Zhang
>>>
>>> On Wed, Mar 20, 2019 at 4:44 PM Zhang, Junchao via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>>
>>>> On Wed, Mar 20, 2019 at 4:18 PM Manuel Valera <mvaler...@sdsu.edu>
>>>> wrote:
>>>>
>>>>> Thanks for your answer. So, for example, I have a log for 200 cores
>>>>> across 10 nodes that reads:
>>>>>
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> Event             Count      Time (sec)      Flop                              --- Global ---   --- Stage ---   Total
>>>>>                  Max Ratio  Max      Ratio  Max      Ratio  Mess    Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>>>> ------------------------------------------------------------------------------------------------------------------------
>>>>> VecScatterBegin   3014 1.0 4.5550e+01 2.6 0.00e+00 0.0 4.2e+06 1.1e+06 2.8e+01  4  0 63 56  0   4  0 63 56  0     0
>>>>> VecScatterEnd     2976 1.0 1.2143e+02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0
>>>>>
>>>>> While for 20 nodes at one node I have:
>>>>
>>>> What does that mean?
>>>>
>>>>> VecScatterBegin   2596 1.0 2.9142e+01 2.1 0.00e+00 0.0 1.2e+05 4.0e+06 3.0e+01  2  0 81 61  0   2  0 81 61  0     0
>>>>> VecScatterEnd     2558 1.0 8.0344e+01 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>>>>
>>>>> Where do I see the max/min ratio in here? And why is the End step all
>>>>> 0.0e+00 in both, yet it still grows from 3% to 14% of the total time?
>>>>> It seems I would need to run again with the -log_sync option; is this
>>>>> correct?
>>>>
>>>> E.g., 2.1 and 7.9 are the max/min ratios. MPI send/recv are in
>>>> VecScatterBegin(); VecScatterEnd() only does MPI_Wait. That is why it
>>>> has zero messages. Yes, run with -log_sync and see what happens.
>>>>
>>>>> Different question: couldn't I estimate the total communication time
>>>>> if I had a typical communication time per MPI message, times the
>>>>> number of MPI messages reported in the log? Or does it not work like
>>>>> that?
>>>>
>>>> That probably won't work, because you have multiple processes doing
>>>> send/recv at the same time and they might saturate the bandwidth. PETSc
>>>> also overlaps computation with communication.
>>>>
>>>>> Thanks.
>>>>>
>>>>> On Wed, Mar 20, 2019 at 2:02 PM Zhang, Junchao <jczh...@mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>> See the "Mess AvgLen Reduct" numbers in each log stage. Mess is the
>>>>>> total number of messages sent in an event over all processes, AvgLen
>>>>>> is the average message length, and Reduct is the number of global
>>>>>> reductions. Each event like VecScatterBegin/End has a maximal
>>>>>> execution time over all processes and a max/min ratio. %T is
>>>>>> sum(execution time of the event on each process)/sum(execution time
>>>>>> of the stage on each process). %T indicates how expensive the event
>>>>>> is; it is a number you should pay attention to.
>>>>>> If your code is imbalanced (i.e., with a big max/min ratio), then the
>>>>>> performance numbers are skewed and become misleading, because some
>>>>>> processes are just waiting for others. In that case, besides
>>>>>> -log_view, you can add -log_sync, which adds an extra MPI_Barrier for
>>>>>> each event to let all processes start it at the same time. With that,
>>>>>> the numbers are easier to interpret.
>>>>>> src/vec/vscat/examples/ex4.c is a tiny example for VecScatter
>>>>>> logging.
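>>>>>> If it helps, the core of such an example looks roughly like this (a
>>>>>> sketch in the spirit of that file, not a copy of it):
>>>>>>
>>>>>>   #include <petscvec.h>
>>>>>>
>>>>>>   int main(int argc, char **argv)
>>>>>>   {
>>>>>>     Vec            x, y;  /* x: distributed; y: full copy per rank */
>>>>>>     VecScatter     ctx;
>>>>>>     PetscErrorCode ierr;
>>>>>>
>>>>>>     ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>>>>>     ierr = VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, 1000000, &x);CHKERRQ(ierr);
>>>>>>     ierr = VecSet(x, 1.0);CHKERRQ(ierr);
>>>>>>     /* This scatter is what -log_view reports under the
>>>>>>        VecScatterBegin/VecScatterEnd events. */
>>>>>>     ierr = VecScatterCreateToAll(x, &ctx, &y);CHKERRQ(ierr);
>>>>>>     ierr = VecScatterBegin(ctx, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>>>>>>     ierr = VecScatterEnd(ctx, x, y, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
>>>>>>     ierr = VecScatterDestroy(&ctx);CHKERRQ(ierr);
>>>>>>     ierr = VecDestroy(&x);CHKERRQ(ierr);
>>>>>>     ierr = VecDestroy(&y);CHKERRQ(ierr);
>>>>>>     ierr = PetscFinalize();
>>>>>>     return ierr;
>>>>>>   }
>>>>>>
>>>>>> Run it with, e.g., "mpiexec -n 20 ./ex -log_view -log_sync" and
>>>>>> compare the VecScatter lines across process counts.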
>>>>>>
>>>>>> --Junchao Zhang
>>>>>>
>>>>>> On Wed, Mar 20, 2019 at 2:58 PM Manuel Valera via petsc-users <
>>>>>> petsc-users@mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am working on timing my model, which we made MPI-scalable using
>>>>>>> PETSc DMDAs. I want to know more about the output log and how to
>>>>>>> calculate a total communication time for my runs. So far I see we
>>>>>>> have "MPI Messages" and "MPI Message Lengths" in the log, along with
>>>>>>> the VecScatterEnd and VecScatterBegin reports.
>>>>>>>
>>>>>>> My question is: how do I interpret these numbers to get a rough
>>>>>>> estimate of how much overhead we have just from MPI communication in
>>>>>>> my model runs?
>>>>>>>
>>>>>>> Thanks,
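>>>>>>>
>>>>>>> For reference, I am collecting these logs with a command along these
>>>>>>> lines ("./mymodel" is a placeholder for our executable's name):
>>>>>>>
>>>>>>>   mpiexec -n 200 ./mymodel -log_view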