I will have a look and report back to you. Thanks.

--Junchao Zhang

On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia <bhatiama...@gmail.com> wrote:

> I have created a standalone test that demonstrates the problem at my end.
> I have stored the indices, etc. from my problem in a text file for each
> rank, which I use to initialize the matrix. Please note that the test is
> specifically for 8 ranks.
>
> The .tgz file is on my google drive:
> https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing
>
> This contains a README file with instructions on running. Please note that
> the work directory needs the index files.
>
> Please let me know if I can provide any further information.
>
> Thank you all for your help.
>
> Regards,
> Manav
>
> On Aug 20, 2020, at 12:54 PM, Jed Brown <j...@jedbrown.org> wrote:
>
> Matthew Knepley <knep...@gmail.com> writes:
>
> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia <bhatiama...@gmail.com> wrote:
>
> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>
> Can you add an MPI_Barrier before
>
>     ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>
> With an MPI_Barrier before this function call:
> - three of the processes have already hit this barrier,
> - the other 5 are inside MatStashScatterGetMesg_Private ->
>   MatStashScatterGetMesg_BTS -> MPI_Waitsome (2 processes) / MPI_Waitall (3 processes)
>
> This is not itself evidence of inconsistent state. You can use
>
>     -build_twosided allreduce
>
> to avoid the nonblocking sparse algorithm.
>
> Okay, you should run this with -matstash_legacy just to make sure it is not
> a bug in your MPI implementation. But it looks like there is inconsistency
> in the parallel state. This can happen because we have a bug, or it could
> be that you called a collective operation on a subset of the processes. Is
> there any way you could cut down the example (say, put all 1s in the
> matrix, etc.) so that you could give it to us to run?
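
[Editor's note] For reference, the kind of cut-down, all-1s reproducer Matt asks for might look roughly like the sketch below. This is only an illustration, not Manav's actual test: the local size, preallocation, and insertion pattern are made up here, and the only point is that every rank inserts one off-process value so that MatAssemblyBegin/MatAssemblyEnd must exchange stashed entries. The -matstash_legacy and -build_twosided allreduce options in the comments are the ones discussed above.

    /* Minimal sketch of a cut-down assembly reproducer (illustrative only).
     * Every entry is 1.0; each rank also inserts one value into the next
     * rank's first row so the matrix stash has to communicate.
     *
     * Run, e.g.:
     *   mpiexec -n 8 ./assembly_test -matstash_legacy
     *   mpiexec -n 8 ./assembly_test -build_twosided allreduce
     */
    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      PetscErrorCode ierr;
      PetscMPIInt    rank, size;
      PetscInt       nlocal = 4, rstart, rend, i;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
      ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
      ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);

      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
      ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);
      ierr = MatMPIAIJSetPreallocation(A, 2, NULL, 2, NULL);CHKERRQ(ierr);
      ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);

      /* Local diagonal entries, all 1s */
      for (i = rstart; i < rend; i++) {
        ierr = MatSetValue(A, i, i, 1.0, INSERT_VALUES);CHKERRQ(ierr);
      }
      /* One off-process entry per rank, also 1.0, to force stash communication */
      if (size > 1) {
        PetscInt neighbor_row = ((rank + 1) % size) * nlocal; /* first row owned by the next rank */
        ierr = MatSetValue(A, neighbor_row, rstart, 1.0, INSERT_VALUES);CHKERRQ(ierr);
      }

      /* Optional: a barrier here separates "slow to arrive" from "stuck inside assembly" */
      ierr = MPI_Barrier(PETSC_COMM_WORLD);CHKERRQ(ierr);

      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Running the same skeleton (with the real per-rank index pattern read in place of the made-up one) once with -matstash_legacy and once with -build_twosided allreduce, as suggested above, would help narrow down whether the hang is in the BTS stash exchange or elsewhere.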