On 7/14/22 03:34, Paras Kumar wrote:

I am working on solving a nonlinear coupled problem involving a vector-valued displacement field and a scalar phase-field variable. The code is MPI-parallelized using parallel::distributed::Triangulation and the TrilinosWrappers linear algebra classes.

Usually I use CG+AMG to solve the linear systems for each of the variables within a staggered scheme. For certain scenarios, however, the iterative linear solver fails and we switch to the Amesos_Superludist direct solver. The code is run on 2 nodes (144 MPI processes in total), and as shown by the cluster's performance monitor, the flop count of one of the nodes drops to (almost) zero once the switch from the iterative to the direct solver occurs; only one node then appears to be doing any computation. Please see the attached flops and memory-bandwidth plots, where the blue and red lines represent the two nodes. We made similar observations for a larger problem run on 8 nodes.
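For context, the iterative-solve-with-direct-fallback logic described above might look roughly like the following sketch. This is not the poster's actual code: the names `system_matrix`, `solution`, and `system_rhs`, the tolerances, and the bare-bones AMG setup are all placeholders; only the deal.II class and solver-type names are real.

```cpp
// Sketch: CG+AMG first, fall back to Amesos_Superludist on failure.
// Assumes deal.II built with Trilinos support.
#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_solver.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>

using namespace dealii;

void solve_linear_system(const TrilinosWrappers::SparseMatrix &system_matrix,
                         TrilinosWrappers::MPI::Vector &      solution,
                         const TrilinosWrappers::MPI::Vector &system_rhs)
{
  SolverControl control(1000, 1e-10 * system_rhs.l2_norm());
  try
    {
      TrilinosWrappers::PreconditionAMG preconditioner;
      preconditioner.initialize(system_matrix);

      TrilinosWrappers::SolverCG cg(control);
      cg.solve(system_matrix, solution, system_rhs, preconditioner);
    }
  catch (const SolverControl::NoConvergence &)
    {
      // Iterative solver failed: switch to the SuperLU_DIST direct
      // solver via the Amesos interface.
      TrilinosWrappers::SolverDirect::AdditionalData data(
        /*output_solver_details=*/false, "Amesos_Superludist");
      TrilinosWrappers::SolverDirect direct(control, data);
      direct.solve(system_matrix, solution, system_rhs);
    }
}
```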

These plots seem to hint that the SuperLU_DIST solver does not scale across multiple nodes. One possible reason I can think of is that I missed some option while installing deal.II with Trilinos and SuperLU_DIST via Spack. I also attach the Spack spec that I installed on the cluster. The GCC compiler and the corresponding openmpi@4.1.2 are provided by the cluster.
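For reference, a Spack spec along the following lines (a sketch only; the exact versions, compiler, and extra variants will differ from the attached spec) ensures that Trilinos is built with its Amesos and SuperLU_DIST support enabled:

```shell
# Hypothetical Spack spec; adjust versions and compiler to your cluster.
# The important part is building trilinos with +amesos +superlu-dist,
# which pulls in superlu-dist as a dependency.
spack install dealii+trilinos+mpi \
  ^trilinos+amesos+superlu-dist \
  ^openmpi@4.1.2
```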


Paras:
I'm not sure any of us have experience with Amesos/SuperLU_DIST, so I'm not sure anyone will know right away what the problem may be.

But here are a couple of questions:
* What happens if you run the program with just two MPI processes on one machine? In that case, you can watch what the two processes are doing by running 'top' in a separate window.
* How do you distribute the matrix and right-hand side? Are they both fully distributed?
* Is the solution you get correct?
* If the answer to the last question is yes, then either Amesos or SuperLU is apparently copying the data of the linear system from all other processes to a single process that then solves the linear system. It might be useful to run the program with just two MPI processes under a debugger, step into the Amesos routines to see whether you reach a place where that copying happens, and then read the code there to find out what flags need to be set so that the solve really does happen in a distributed way.
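One common way to get a debugger on each rank of a small MPI run (a sketch; it assumes a working X display and that the executable is called ./myprog, which is a placeholder name) is to start one xterm-with-gdb per process:

```shell
# Opens one gdb session per MPI rank, each in its own xterm window.
# Inside each gdb, set a breakpoint in the Amesos solve routine,
# then type 'run'.
mpirun -np 2 xterm -e gdb ./myprog
```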

That's about all I can offer.
Best
 W.

--
------------------------------------------------------------------------
Wolfgang Bangerth          email:                 bange...@colostate.edu
                           www: http://www.math.colostate.edu/~bangerth/

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dealii/b207e535-5f6b-f06a-e902-87628bbcf5e5%40colostate.edu.