You could access the VecScatter inside the matrix-multiply and call 
VecScatterView() with an ASCII viewer using the format PETSC_VIEWER_ASCII_INFO 
(make sure you use this format); it reports how much communication is being 
done and how many neighbors each process communicates with.
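
Roughly something like the following (a minimal sketch, assuming A is a
MATMPIAIJ matrix; Mvctx is a private member of Mat_MPIAIJ, so this relies on
the private header and is a diagnostic hack, not a stable public API):

  #include <petsc.h>
  #include <../src/mat/impls/aij/mpi/mpiaij.h>  /* private header: Mat_MPIAIJ */

  /* Print the communication pattern of the scatter MatMult uses for the
     off-diagonal part of an MPIAIJ matrix A */
  static PetscErrorCode DumpMatMultScatterInfo(Mat A)
  {
    Mat_MPIAIJ     *aij = (Mat_MPIAIJ*)A->data;
    PetscErrorCode ierr;

    ierr = PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_ASCII_INFO);CHKERRQ(ierr);
    ierr = VecScatterView(aij->Mvctx,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
    ierr = PetscViewerPopFormat(PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
    return 0;
  }

With the INFO format the viewer prints a summary of the scatter (how many
messages and how many neighbors) rather than every index.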

  Barry


> On Jun 21, 2019, at 10:56 AM, Jed Brown <j...@jedbrown.org> wrote:
> 
> What is the partition like?  Suppose you randomly assigned nodes to
> processes; then in the typical case, all neighbors would be on different
> processors.  Then the "diagonal block" would be nearly diagonal and the
> off-diagonal block would be huge, requiring communication with many
> other processes.
> 
> "Smith, Barry F. via petsc-users" <petsc-users@mcs.anl.gov> writes:
> 
>>   The load balance is definitely out of whack. 
>> 
>> 
>> 
>> BuildTwoSidedF         1 1.0 1.6722e-0241.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatMult              138 1.0 2.6604e+02 7.4 3.19e+10 2.1 8.2e+07 7.8e+06 0.0e+00  2  4 13 13  0  15 25100100  0 2935476
>> MatAssemblyBegin       1 1.0 1.6807e-0236.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyEnd         1 1.0 3.5680e-01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecNorm                2 1.0 4.4252e+0174.8 1.73e+07 1.0 0.0e+00 0.0e+00 2.0e+00  1  0  0  0  0   5  0  0  0  1 12780
>> VecCopy                6 1.0 6.5655e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY                2 1.0 1.3793e-02 2.7 1.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 41000838
>> VecScatterBegin      138 1.0 1.1653e+0285.8 0.00e+00 0.0 8.2e+07 7.8e+06 0.0e+00  1  0 13 13  0   4  0100100  0     0
>> VecScatterEnd        138 1.0 1.3653e+0222.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4  0  0  0  0     0
>> VecSetRandom           1 1.0 9.6668e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> 
>> Note that VecCopy/AXPY/SetRandom, which are all embarrassingly parallel, 
>> have a balance ratio above 2, meaning some processes have more than twice 
>> the work of others. Meanwhile, the ratio for anything involving 
>> communication is extremely imbalanced: some processes reach the 
>> synchronization point well before others.
>> 
>> The first thing I would do is worry about the load imbalance: what is its 
>> cause? Is it one process with much less work than the others (not great, 
>> but not terrible), one process with much more work than the others 
>> (terrible), or something in between? I think once you get a handle on the 
>> load balance the rest may fall into place; otherwise we still have some 
>> exploring to do. This is not expected behavior for a good machine with a 
>> good network and a well-balanced job. After you understand the load 
>> balancing you may need to use one of the parallel performance visualization 
>> tools to see why the synchronization is out of whack.
>> 
>>   Good luck
>> 
>>  Barry
>> 
>> 
>>> On Jun 21, 2019, at 9:27 AM, Ale Foggia <amfog...@gmail.com> wrote:
>>> 
>>> I'm sending one with a bit less time.
>>> I'm also timing the functions with std::chrono, and for the 180-second 
>>> case the program runs out of memory (and crashes) before the PETSc log 
>>> gets printed, so I know that time only from my own timer. Anyway, in every 
>>> case the std::chrono times and the PETSc log match.
>>> 
>>> (The large times are in the part "4b- Building offdiagonal part", i.e. 
>>> "Event Stage 5: Offdiag".)
>>> 
>>> On Fri, Jun 21, 2019 at 16:09, Zhang, Junchao (<jczh...@mcs.anl.gov>) 
>>> wrote:
>>> 
>>> 
>>> On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia <amfog...@gmail.com> wrote:
>>> Thanks both of you for your answers,
>>> 
>>> On Thu, Jun 20, 2019 at 22:20, Smith, Barry F. (<bsm...@mcs.anl.gov>) 
>>> wrote:
>>> 
>>>  Note that this is a one-time cost if the nonzero structure of the matrix 
>>> stays the same; it will not happen in future MatAssemblies.
>>> 
>>>> On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users 
>>>> <petsc-users@mcs.anl.gov> wrote:
>>>> 
>>>> Those messages were used to build the MatMult communication pattern for 
>>>> the matrix. They were not part of the matrix entry passing you imagined, 
>>>> but they do indeed happen in MatAssemblyEnd. If you want to make sure 
>>>> processes do not set remote entries, you can use 
>>>> MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), which will generate 
>>>> an error when an off-process entry is set.
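>>>> 
>>>> For example (a minimal sketch; A is assumed to be an already created and 
>>>> preallocated parallel matrix, and ierr a PetscErrorCode):
>>>> 
>>>>   ierr = MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE);CHKERRQ(ierr);
>>>>   /* from here on, any MatSetValues() that targets a row owned by another
>>>>      process generates an error instead of being stashed and communicated
>>>>      during MatAssemblyBegin/End */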
>>> 
>>> I started being concerned about this when I saw that the assembly was 
>>> taking a few hundred seconds in my code, around 180 seconds, which for me 
>>> is a considerable time. Do you think (or maybe you need more information 
>>> to answer this) that this time is "reasonable" for communicating the 
>>> pattern of the matrix? I already checked that I'm not setting any remote 
>>> entries. 
>>> It is not reasonable. Could you send the log view of that test with the 
>>> 180-second MatAssembly?
>>> 
>>> Also, I see (in my code) that even though no messages are passed during 
>>> MatAssemblyBegin, it still takes time and the "ratio" is very large.
>>> 
>>>> 
>>>> 
>>>> --Junchao Zhang
>>>> 
>>>> 
>>>> On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users 
>>>> <petsc-users@mcs.anl.gov> wrote:
>>>> Hello all!
>>>> 
>>>> During the conference I showed you a problem happening during 
>>>> MatAssemblyEnd in a particular code that I have. Now I tried the same 
>>>> thing with a simple code (a symmetric problem corresponding to the 1D 
>>>> Laplacian operator, from the SLEPc Hands-On exercises). As I understand 
>>>> it (and please correct me if I'm wrong), in this case the elements of the 
>>>> matrix are computed locally by each process, so there should not be any 
>>>> communication during the assembly. However, the log shows that messages 
>>>> are being passed. Also, the number of messages changes with the number of 
>>>> processes used and with the size of the matrix. Could you please help me 
>>>> understand this?
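>>>> 
>>>> Roughly, the assembly loop looks like this (a simplified sketch of the 
>>>> Hands-On code; A is the already created parallel matrix, n its global 
>>>> size, and only locally owned rows are touched):
>>>> 
>>>>   PetscErrorCode ierr;
>>>>   PetscInt       i, Istart, Iend;
>>>> 
>>>>   ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
>>>>   for (i=Istart; i<Iend; i++) {
>>>>     if (i>0)   { ierr = MatSetValue(A,i,i-1,-1.0,INSERT_VALUES);CHKERRQ(ierr); }
>>>>     if (i<n-1) { ierr = MatSetValue(A,i,i+1,-1.0,INSERT_VALUES);CHKERRQ(ierr); }
>>>>     ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);
>>>>   }
>>>>   ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>   ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>> 
>>>> Every row set here is locally owned, so I would not expect any matrix 
>>>> values to be communicated during assembly.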
>>>> 
>>>> I attach the code I used and the log I get for a small problem.
>>>> 
>>>> Cheers,
>>>> Ale
>>>> 
>>> 
>>> <log.txt>
