Re: [petsc-users] Enhancing MatScale computing time

Antoine Côté Thu, 22 Oct 2020 13:18:03 -0700

Hi Sir,

MatScale in "Main Stage" is indeed called 6 times for 0% run time. In stage 
"Stiff_Adj" though, we get :


MatScale            8192 1.0 7.1185e+01 1.0 3.43e+10 1.0 0.0e+00 0.0e+00 
0.0e+00 50 46  0  0  0  80 98  0  0  0   482

MatMult is indeed expensive (23% of run time) and should be improved, but 
MatScale in "Stiff_Adj" is still taking 50% of run time.
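
For readers of the archive, the numbers in that log line can be cross-checked 
with a little arithmetic. The sketch below is not part of PETSc: the struct and 
function names are hypothetical, and it assumes the columns are call count, 
time in seconds, and flops, with the last column being Mflop/s, as in a 
single-process -log_view table.

```c
#include <stdio.h>

/* One row of a -log_view event table, reduced to the fields used here.
   Hypothetical helper type, not a PETSc structure. */
typedef struct {
    const char *name;
    long        count;    /* number of calls            */
    double      seconds;  /* time spent in the event    */
    double      flops;    /* floating point operations  */
} LogEvent;

/* Average wall time of a single call, in seconds. */
double seconds_per_call(LogEvent e)
{
    return e.seconds / (double)e.count;
}

/* Flop rate in Mflop/s, the unit of the last -log_view column. */
double mflops_rate(LogEvent e)
{
    return e.flops / e.seconds / 1.0e6;
}

/* Print a one-line summary of an event. */
void print_event(LogEvent e)
{
    printf("%-10s %6ld calls  %.3e s/call  %.0f Mflop/s\n",
           e.name, e.count, seconds_per_call(e), mflops_rate(e));
}
```

Fed with the "Stiff_Adj" MatScale values above (8192 calls, 7.1185e+01 s, 
3.43e+10 flops), this gives a little under 9 ms per call and a rate close to 
the 482 reported. So no single call is slow; the cost comes from the number of 
calls.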

Thanks,

Antoine
________________________________
From: Barry Smith <[email protected]>
Sent: October 22, 2020 16:09
To: Antoine Côté <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Enhancing MatScale computing time


MatMult             9553 1.0 3.2824e+01 1.0 3.54e+10 1.0 0.0e+00 0.0e+00 
0.0e+00 23 48  0  0  0  61 91  0  0  0  1079
MatScale               6 1.0 5.3896e-02 1.0 2.52e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0   467

The flop rate of MatScale is not very high (467), but it is taking very little 
time: 0 percent of the run time, while MatMult takes 23 percent.

So the main cost related to the matrices is MatMult, because it is called so 
many times (9553). You might think about the algorithms you are using and 
whether there are improvements to be made.

It looks like you are using some kind of multigrid and solving 6 problems with 
1357 total iterations, which is over 200 iterations per solve. This is 
absolutely HUGE for multigrid; you need to tune the multigrid for your problem 
to bring that down to at most a couple dozen iterations per solve.

  Barry

On Oct 22, 2020, at 3:02 PM, Antoine Côté <[email protected]> wrote:

Hi,

See attached files for both outputs. Tell me if you need any clarification. It 
was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). With 3 
dof per node, the problem has a total of 28611 dof.
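
The sizes quoted above follow directly from the grid dimensions. As a minimal 
sketch (the struct and function names are hypothetical, and it assumes a 
structured, non-periodic grid with one element between each pair of 
neighbouring nodes):

```c
/* Sizes of a structured 3D grid. Hypothetical helper type. */
typedef struct {
    long nodes;     /* nx * ny * nz                   */
    long elements;  /* (nx - 1) * (ny - 1) * (nz - 1) */
    long dofs;      /* nodes * dof_per_node           */
} GridSize;

/* Count nodes, elements and unknowns for an nx x ny x nz node grid. */
GridSize grid_size(long nx, long ny, long nz, long dof_per_node)
{
    GridSize g;
    g.nodes    = nx * ny * nz;
    g.elements = (nx - 1) * (ny - 1) * (nz - 1);
    g.dofs     = g.nodes * dof_per_node;
    return g;
}
```

For 33x17x17 nodes and 3 dof per node this gives 9537 nodes, 8192 elements and 
28611 dof. The element count happens to equal the MatScale call count logged in 
stage "Stiff_Adj"; the thread does not say whether the two are related.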

Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. 
PetscLogStagePush/Pop was used.

Regards,

Antoine
________________________________
From: Matthew Knepley <[email protected]>
Sent: October 22, 2020 15:35
To: Antoine Côté <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Enhancing MatScale computing time

On Thu, Oct 22, 2020 at 3:23 PM Antoine Côté <[email protected]> wrote:
Hi,

I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse 
matrix Mat K. The Mat is modified repeatedly by the program, using the commands 
(in that order) :

MatZeroEntries(K)
In a for loop : MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES)
MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY)
MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY)
MatDiagonalScale(K, vec1, vec1)
MatDiagonalSet(K, vec2, ADD_VALUES)
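
To make the last two steps concrete: MatDiagonalScale(K, l, r) replaces K by 
diag(l) * K * diag(r), and MatDiagonalSet(K, d, ADD_VALUES) adds d to the 
diagonal of K. The plain C sketch below shows the same arithmetic on a small 
CSR matrix. It is an illustration only, not PETSc's implementation; the Csr 
type and both function names are hypothetical.

```c
#include <stddef.h>

/* Square sparse matrix in compressed sparse row (CSR) storage.
   Hypothetical type for illustration, unrelated to PETSc's Mat. */
typedef struct {
    size_t        n;        /* number of rows                       */
    const size_t *row_ptr;  /* n + 1 offsets into col_idx and val   */
    const size_t *col_idx;  /* column index of each stored entry    */
    double       *val;      /* value of each stored entry           */
} Csr;

/* K <- diag(l) * K * diag(r): entry (i, j) is multiplied by l[i] * r[j]. */
void csr_diagonal_scale(Csr *K, const double *l, const double *r)
{
    for (size_t i = 0; i < K->n; i++)
        for (size_t k = K->row_ptr[i]; k < K->row_ptr[i + 1]; k++)
            K->val[k] *= l[i] * r[K->col_idx[k]];
}

/* K_ii <- K_ii + d[i] for every stored diagonal entry.
   Returns the number of rows with no stored diagonal entry. */
size_t csr_diagonal_add(Csr *K, const double *d)
{
    size_t missing = 0;
    for (size_t i = 0; i < K->n; i++) {
        int found = 0;
        for (size_t k = K->row_ptr[i]; k < K->row_ptr[i + 1]; k++) {
            if (K->col_idx[k] == i) {
                K->val[k] += d[i];
                found = 1;
                break;
            }
        }
        if (!found) missing++;
    }
    return missing;
}
```

Both operations touch each stored entry a fixed number of times, so one pass 
over the assembled matrix is enough for each of them.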

Computing time seems high and I would like to improve it. Running tests with 
"-log_view" tells me that MatScale() is the bottleneck (50% of total computing 
time). From the manual pages, I've tried a few tweaks:

  *   DMSetMatType(da, MATMPIBAIJ) : "For problems with multiple degrees of 
freedom per node, ... BAIJ can significantly enhance performance", Chapter 
14.2.4
  *   Used MatMissingDiagonal() to confirm there are no missing diagonal 
entries: "If the matrix Y is missing some diagonal entries this routine can be 
very slow", MatDiagonalSet() manual
  *   Tried MatSetOption()
     *   MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE : to increase assembly 
efficiency
     *   MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE : "When true, assembly 
processes have one less global reduction"
     *   MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE : "When true, assembly 
processes have one less global reduction"
     *   MAT_USE_HASH_TABLE == PETSC_TRUE : "Improve the searches during matrix 
assembly"

According to "-log_view", assembly is fast (0% of total time), and the use of a 
DMDA makes me believe preallocation isn't the cause of the performance issue.

I would like to know how I could improve MatScale(). What are the best 
practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? 
Instead of MatDiagonalScale(), should I use another command to obtain the same 
result faster?

Something is definitely strange. Can you please send the output of

  -log_view -info :mat

  Thanks,

     Matt

Thank you very much!

Antoine Côté



--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
<LogView.out><mat.0>
