[petsc-users] large PetscCommDuplicate overhead

Matthew Overholt Wed, 05 Oct 2016 12:31:26 -0700

Hi Petsc-Users,


I am trying to understand an issue where PetscCommDuplicate() calls are
taking an increasing percentage of time as I run a fixed-sized problem on
more processes.

 

I am using the FEM to solve the steady-state heat transfer equation (K.x =
q) using a PC direct solver, like MUMPS.

 

I am running on the NERSC Cray X30, which has two Xeon's per node with 12
cores each, and profiling the code using CrayPat sampling.

 

On a typical problem (1E+6 finite elements), running on a single node:

-for 2 cores (1 on each Xeon), about 1% of time is PetscCommDuplicate (on
process 1, but on the root it is less), and (for reference) 9% of total time
is for MUMPS.

-for 8 cores (4 on each Xeon), over 6% of time is PetscCommDuplicate (on
every process except the root, where it is <1%), and 9-10% of total time is
for MUMPS.

 

What is the large PetscCommDuplicate time connected to, an increasing number
of messages (tags)?  Would using fewer MatSetValues() and VecSetValues()
calls (with longer message lengths) alleviate this?

 

For reference, the PETSc calling sequence in the code is as follows.

    // Create the solution and RHS vectors

    ierr = VecCreate(petscData->mpicomm,&mesh->hpx);

   ierr = PetscObjectSetName((PetscObject) mesh->hpx, "Solution");

    ierr = VecSetSizes(mesh->hpx,mesh->lxN,mesh->neqns); // size = # of
equations; distribution to match mesh

    ierr = VecSetFromOptions(mesh->hpx);    // allow run time options

    ierr = VecDuplicate(mesh->hpx,&q);      // create the RHS vector

    // Create the stiffnexx matrix

    ierr = MatCreate(petscData->mpicomm,&K);

    ierr = MatSetSizes(K,mesh->lxN,mesh->lxN,mesh->neqns,mesh->neqns);

    ierr = MatSetType(K,MATAIJ);  // default sparse type

    // Do preallocation

    ierr = MatMPIAIJSetPreallocation(K,d_nz,NULL,o_nz,NULL);

    ierr = MatSeqAIJSetPreallocation(K,d_nz,NULL);

    ierr = MatSetOption(K,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);

    ierr = MatSetUp(K);

    // Create and set up the KSP context as a PreConditioner Only (Direct)
Solution

    ierr = KSPCreate(petscData->mpicomm,&ksp);

    ierr = KSPSetOperators(ksp,K,K);

    ierr = KSPSetType(ksp,KSPPREONLY);

    // Set the temperature vector

    ierr = VecSet(mesh->hpx,mesh->Tmin);

    // Set the default PC method as MUMPS

    ierr = KSPGetPC(ksp,&pc);     // extract the preconditioner

    ierr = PCSetType(pc,PCLU);    // set pc options

    ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);

    ierr = KSPSetFromOptions(ksp);

    

    // Set the values for the K matrix and q vector

        // which involves a lot of these calls

        ierr = MatSetValues(K,mrows,idxm,ncols,idxn,pKe,ADD_VALUES);    // 1
call per matrix row (equation)

        ierr = VecSetValues(q,nqe,ixn,pqe,ADD_VALUES);      // 1 call per
element

    ierr = VecAssemblyBegin(q);

    ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);

    ierr = VecAssemblyEnd(q);

    ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);

 

    // Solve //////////////////////////////////////

    ierr = KSPSolve(ksp,q,mesh->hpx);

...

*Note that the code evenly divides the finite elements over the total number
of processors, and I am using ghosting of the FE vertices vector to handle
the vertices that are needed on more than 1 process.

 

Thanks in advance for your help,

Matt Overholt

CapeSym, Inc.



---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

[petsc-users] large PetscCommDuplicate overhead

Reply via email to