This problem has 11088 equations, and in the two-processor run 2 equations are partitioned to process 0 and 11086 to process 1. So I think you are hitting a corner case in the code that reduces the number of active processors. I will need to debug this.
I might be able to reason this out with your data. I'm on travel through next week, so I'm not sure when I will be able to take a look at this. I might ask for a binary matrix file for this two-processor run, so if that is easy for you to do, then maybe you could just do it. In the meantime I will try to think about what is going wrong here. This code should work if you give it a more normal partitioning, but there is a bug here and I want to fix it.

Mark

On Jan 5, 2012, at 6:41 PM, Jed Brown wrote:

> On Thu, Jan 5, 2012 at 17:13, Ravi Kannan <rxk at cfdrc.com> wrote:
> Files are attached.
>
> Could you try attaching a debugger to get stack traces?
>
> It is reducing to a smaller communicator for the coarse level. The processes
> are likely both hung later in gamg.c:createLevel(). Mark, the appearance is
> that all procs that call MPI_Comm_create() are also doing things on the newly
> created communicator, even though it will be MPI_COMM_NULL on processes that
> are not part of the subgroup. Also, I'm skeptical that you can get correct
> results with MatPartitioningSetAdjacency(mpart, adj) when mpart and adj are on
> different communicators. Those other rows of adj are not moved by
> MatPartitioningApply_Parmetis().
>
> I must be confused about what is actually happening.
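
For reference, here is a minimal plain-MPI sketch of the guard Jed is describing (not the actual GAMG code); the subgroup construction, the choice of keeping only rank 0, and the names subcomm/newgroup are hypothetical, just to show the pattern. All ranks of the parent communicator must call MPI_Comm_create(), but only ranks inside the group get a usable communicator back; the others receive MPI_COMM_NULL and must not do any work on it.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
      MPI_Comm  subcomm;
      MPI_Group worldgroup, newgroup;
      int       ranks[1] = {0};   /* hypothetical: keep only rank 0 in the subgroup */

      MPI_Init(&argc, &argv);
      MPI_Comm_group(MPI_COMM_WORLD, &worldgroup);
      MPI_Group_incl(worldgroup, 1, ranks, &newgroup);

      /* Collective over MPI_COMM_WORLD: every rank must call it, but ranks
         outside newgroup get MPI_COMM_NULL back. */
      MPI_Comm_create(MPI_COMM_WORLD, newgroup, &subcomm);

      if (subcomm != MPI_COMM_NULL) {
        /* Only members of the subgroup may use subcomm. */
        int subrank;
        MPI_Comm_rank(subcomm, &subrank);
        /* ... coarse-level work would go here ... */
        MPI_Comm_free(&subcomm);
      }
      /* Ranks that got MPI_COMM_NULL must skip this branch entirely; entering
         collectives on the new communicator from a non-member is the sort of
         mismatch that shows up as a hang like the one described above. */

      MPI_Group_free(&newgroup);
      MPI_Group_free(&worldgroup);
      MPI_Finalize();
      return 0;
    }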