Stefano, I will report the results of these shortly. To your second question, yes, this is the first matrix assembly in the code.
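
For reference, a rough sketch of what that instrumentation could look like inside MatAssemblyEnd_MPIAIJ, placed just before the MatAssemblyBegin(aij->A,mode) call (mat, aij, and ierr are already in scope there; the synchronized-print pattern and the int casts are my own choices, not something prescribed by PETSc):

    /* sketch: report the per-rank flags Stefano asked about, then synchronize
       before assembling the diagonal block A */
    {
      PetscMPIInt rank;
      ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)mat),&rank);CHKERRQ(ierr);
      ierr = PetscSynchronizedPrintf(PetscObjectComm((PetscObject)mat),
               "[%d] B->nonew %D  was_assembled %d  donotstash %d  nooffprocentries %d\n",
               rank,((Mat_SeqAIJ*)aij->B->data)->nonew,
               (int)mat->was_assembled,(int)aij->donotstash,(int)mat->nooffprocentries);CHKERRQ(ierr);
      ierr = PetscSynchronizedFlush(PetscObjectComm((PetscObject)mat),PETSC_STDOUT);CHKERRQ(ierr);
      ierr = MPI_Barrier(PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
    }

Note that the synchronized flush and the barrier are collective, so they will also make it obvious if some ranks never reach this point.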
-Manav

> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
> 
> Manav
> 
> Can you add a MPI_Barrier before
> 
>   ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
> 
> Also, in order to assess where the issue is, we need to see the values (per rank) of
> 
>   ((Mat_SeqAIJ*)aij->B->data)->nonew
>   mat->was_assembled
>   aij->donotstash
>   mat->nooffprocentries
> 
> Another question: is this the first matrix assembly of the code?
> If you change to pc_none, do you get the same issue?
> 
>> On Aug 20, 2020, at 3:10 PM, Manav Bhatia <bhatiama...@gmail.com> wrote:
>> 
>>> On Aug 19, 2020, at 9:39 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>> 
>>> Jed is more knowledgeable about the communication, but I have a simple question about the FEM method. Normally, the way we divide unknowns is that the only unknowns which might have entries computed off-process are those on the partition boundary. However, it sounds like you have a huge number of communicated values. Is it possible that the division of rows in your matrix does not match the division of the cells you compute element matrices for?
>> 
>> I hope that is not the case. I am using libMesh to manage the mesh and the creation of the sparsity pattern, and it uses Parmetis to create the partitions. libMesh ensures that off-process entries occur only at the partition boundary (unless an extra set of DoFs is explicitly marked for coupling).
>> 
>> I also printed and looked at the n_nz and n_oz values on each rank, and they do not seem to raise any flags.
>> 
>> I will try to dig in a bit further to make sure everything checks out.
>> 
>> Looking at the screenshots I had shared yesterday, all processes are in this function:
>> 
>> PetscErrorCode MatAssemblyEnd_MPIAIJ(Mat mat,MatAssemblyType mode)
>> {
>>   Mat_MPIAIJ     *aij = (Mat_MPIAIJ*)mat->data;
>>   Mat_SeqAIJ     *a   = (Mat_SeqAIJ*)aij->A->data;
>>   PetscErrorCode ierr;
>>   PetscMPIInt    n;
>>   PetscInt       i,j,rstart,ncols,flg;
>>   PetscInt       *row,*col;
>>   PetscBool      other_disassembled;
>>   PetscScalar    *val;
>> 
>>   /* do not use 'b = (Mat_SeqAIJ*)aij->B->data' as B can be reset in disassembly */
>> 
>>   PetscFunctionBegin;
>>   if (!aij->donotstash && !mat->nooffprocentries) {
>>     while (1) {
>>       ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr);
>>       if (!flg) break;
>> 
>>       for (i=0; i<n; ) {
>>         /* Now identify the consecutive vals belonging to the same row */
>>         for (j=i,rstart=row[j]; j<n; j++) {
>>           if (row[j] != rstart) break;
>>         }
>>         if (j < n) ncols = j-i;
>>         else       ncols = n-i;
>>         /* Now assemble all these values with a single function call */
>>         ierr = MatSetValues_MPIAIJ(mat,1,row+i,ncols,col+i,val+i,mat->insertmode);CHKERRQ(ierr);
>> 
>>         i = j;
>>       }
>>     }
>>     ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr);
>>   }
>>   ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>>   ierr = MatAssemblyEnd(aij->A,mode);CHKERRQ(ierr);
>> 
>>   /* determine if any processor has disassembled, if so we must
>>      also disassemble ourselfs, in order that we may reassemble. */
>>   /*
>>      if nonzero structure of submatrix B cannot change then we know that
>>      no processor disassembled thus we can skip this stuff
>>   */
>>   if (!((Mat_SeqAIJ*)aij->B->data)->nonew) {
>>     ierr = MPIU_Allreduce(&mat->was_assembled,&other_disassembled,1,MPIU_BOOL,MPI_PROD,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>>     if (mat->was_assembled && !other_disassembled) {
>>       ierr = MatDisAssemble_MPIAIJ(mat);CHKERRQ(ierr);
>>     }
>>   }
>>   if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) {
>>     ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr);
>>   }
>>   ierr = MatSetOption(aij->B,MAT_USE_INODES,PETSC_FALSE);CHKERRQ(ierr);
>>   ierr = MatAssemblyBegin(aij->B,mode);CHKERRQ(ierr);
>>   ierr = MatAssemblyEnd(aij->B,mode);CHKERRQ(ierr);
>> 
>>   ierr = PetscFree2(aij->rowvalues,aij->rowindices);CHKERRQ(ierr);
>> 
>>   aij->rowvalues = 0;
>> 
>>   ierr = VecDestroy(&aij->diag);CHKERRQ(ierr);
>>   if (a->inode.size) mat->ops->multdiagonalblock = MatMultDiagonalBlock_MPIAIJ;
>> 
>>   /* if no new nonzero locations are allowed in matrix then only set the matrix state the first time through */
>>   if ((!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) || !((Mat_SeqAIJ*)(aij->A->data))->nonew) {
>>     PetscObjectState state = aij->A->nonzerostate + aij->B->nonzerostate;
>>     ierr = MPIU_Allreduce(&state,&mat->nonzerostate,1,MPIU_INT64,MPI_SUM,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
>>   }
>>   PetscFunctionReturn(0);
>> }
>> 
>> I noticed that, of the 8 MPI processes, two were stuck at
>>   ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr);
>> 
>> another two were stuck at
>>   ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr);
>> 
>> and the remaining four were inside
>>   ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr);
>> 
>> Is it expected for processes to be at different stages in this function?
>> 
>> -Manav
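
PS: On Matt's point about the row division matching the element partition, a minimal per-rank check on the application side, right before the assembly calls, might look like the sketch below. The matrix name J is a placeholder for whatever the application actually uses; MatStashGetInfo reports how many values a rank has stashed for other processes:

    /* sketch: per-rank ownership range and number of stashed (off-process) values,
       printed right before the assembly that hangs */
    PetscErrorCode ierr;
    PetscMPIInt    rank;
    PetscInt       rstart,rend,nstash,reallocs,bnstash,breallocs;
    MPI_Comm       comm = PetscObjectComm((PetscObject)J);

    ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(J,&rstart,&rend);CHKERRQ(ierr);
    ierr = MatStashGetInfo(J,&nstash,&reallocs,&bnstash,&breallocs);CHKERRQ(ierr);
    ierr = PetscSynchronizedPrintf(comm,"[%d] owns rows %D to %D, stashed off-process values %D\n",
                                   rank,rstart,rend,nstash);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

If one rank reports a much larger stash than the others, that would point toward the partition mismatch Matt describes.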