This problem has 11088 equations, and in the two-processor run the 
partitioning puts 2 equations on process 0 and 11086 on process 1.  So I 
think you are hitting a corner case in the code path that reduces the number 
of active processors.  I will need to debug this.

I might be able to reason this out with your data.  I'm traveling through 
next week, so I'm not sure when I will be able to take a look at this.  I 
might ask for a binary matrix file for this two-processor run, so if that is 
easy for you to produce, please go ahead and send it.  In the meantime I will 
think about what is going wrong here.
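
For reference, one way to produce such a file is to dump the assembled matrix 
with a PETSc binary viewer so it can be reloaded later with MatLoad().  A 
minimal sketch (DumpMatrixBinary and the filename are just placeholders):

    #include <petscmat.h>

    /* Write an assembled Mat to a PETSc binary file (readable with MatLoad). */
    PetscErrorCode DumpMatrixBinary(Mat A, const char *filename)
    {
      PetscViewer    viewer;
      MPI_Comm       comm;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = PetscObjectGetComm((PetscObject)A, &comm);CHKERRQ(ierr);
      /* Collective: every process in the matrix's communicator must call this */
      ierr = PetscViewerBinaryOpen(comm, filename, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
      ierr = MatView(A, viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }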

The code should work now if you give it a more balanced partitioning, but 
there is a bug here and I want to fix it.

Mark

On Jan 5, 2012, at 6:41 PM, Jed Brown wrote:

> On Thu, Jan 5, 2012 at 17:13, Ravi Kannan <rxk at cfdrc.com> wrote:
> Files are attached.
> 
> Could you try attaching a debugger to get stack traces?
> 
> It is reducing to a smaller communicator for the coarse level. The processes 
> are likely both hung later in gamg.c:createLevel(). Mark, it appears that 
> all procs that call MPI_Comm_create() go on to use the newly created 
> communicator, even though it will be MPI_COMM_NULL on processes that are 
> not part of the subgroup. Also, I'm skeptical that you can get correct 
> results with MatPartitioningSetAdjacency(mpart,adj) when mpart and adj are on 
> different communicators. Those other rows of adj are not moved by 
> MatPartitioningApply_Parmetis().
> 
> I must be confused about what is actually happening.
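
For reference, the MPI behavior Jed is describing: MPI_Comm_create() is 
collective over the parent communicator, and ranks that are not in the 
supplied group get back MPI_COMM_NULL, so they must not make collective calls 
on the new communicator.  A standalone sketch of the guarded pattern (not the 
actual gamg.c code; the choice of subgroup here is arbitrary):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
      MPI_Comm  subcomm;
      MPI_Group world_group, sub_group;
      int       rank, size, range[1][3];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* Example subgroup: the lower half of the ranks (assumes size >= 2) */
      MPI_Comm_group(MPI_COMM_WORLD, &world_group);
      range[0][0] = 0; range[0][1] = size / 2 - 1; range[0][2] = 1;
      MPI_Group_range_incl(world_group, 1, range, &sub_group);

      /* Collective over MPI_COMM_WORLD: all ranks must call it */
      MPI_Comm_create(MPI_COMM_WORLD, sub_group, &subcomm);

      /* Ranks outside the subgroup receive MPI_COMM_NULL; skipping them here
         is what avoids the kind of hang described above */
      if (subcomm != MPI_COMM_NULL) {
        MPI_Barrier(subcomm);        /* coarse-level work would go here */
        MPI_Comm_free(&subcomm);
      }

      MPI_Group_free(&sub_group);
      MPI_Group_free(&world_group);
      MPI_Finalize();
      return 0;
    }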
