Wolfgang and Timo -

Thank you very much for the advice. For some reason that has not been sorted out, I am having trouble logging into the compute nodes; otherwise I would have attached gdb to the processes to see where they were. I seem to remember doing that before and seeing that one process was lagging behind, but I can't quite remember.

Still, if that is indeed the cause, as I suspected, I thought that putting in a bunch of MPI_Barrier() calls leading up to the offending call or calls might solve the problem?

Thanks
Dan

Wolfgang Bangerth wrote:
I have not run in debug mode yet. That is something I need to try,
although I am not sure if this will result in the run ending before it
gets to the "good" part.

You should really try this first; it has saved people many an hour of debugging time in the past.

As for possible causes if debug mode doesn't find it: the usual cause is that one processor thinks that something needs to be synchronised, sends a message, and then waits for a reply. But another processor doesn't believe anything must be done and goes on with life until it wants to communicate about something unrelated, at which point it sends a message and waits for a reply. Now both continue to wait forever. step-18, in the documentation in the code, discusses a couple of reasons why something like this could happen.
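
One way to picture the mismatch described above is to write down, for each process, the ordered list of communication steps it performs and look for the first place where the lists disagree. The following is only a sketch in plain C++ (no MPI involved); the function name first_divergence and the step labels are hypothetical, and it assumes each process has somehow recorded its steps as strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: each inner vector is the ordered list of communication
// steps one process recorded. Returns the index of the first step at which
// the processes disagree, or -1 if all traces are identical.
long first_divergence(const std::vector<std::vector<std::string>> &traces)
{
  if (traces.empty())
    return -1;

  std::size_t longest = 0;
  for (const auto &t : traces)
    longest = std::max(longest, t.size());

  const std::vector<std::string> &reference = traces[0];
  for (std::size_t step = 0; step < longest; ++step)
    for (const auto &t : traces)
      {
        // A missing entry counts as a disagreement: one process stopped
        // communicating while another still expects a partner.
        if (step >= t.size() || step >= reference.size() ||
            t[step] != reference[step])
          return static_cast<long>(step);
      }
  return -1;
}
```

For example, if one process records {"assemble", "sync_ghosts", "solve"} and another records {"assemble", "solve"}, the first disagreement is at index 1: the first process is waiting in its synchronisation step while the second has already moved on to something unrelated.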

You need to figure out where in your code the two processors are hanging and see why they are where they are. One way to do that is to log into the various nodes on which your code is running and attach a debugger to the running process.
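
To attach a debugger you need to know which host and process ID each process has. A common idiom, sketched here under assumptions, is to have every process print that information at startup and optionally wait until a debugger has attached. The function name wait_for_debugger and the environment variable WAIT_FOR_DEBUGGER are hypothetical; only POSIX calls (gethostname, getpid, sleep) are used.

```cpp
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

// Hypothetical helper: if the (made-up) environment variable
// WAIT_FOR_DEBUGGER is set, print host name and PID, then spin until a
// debugger clears `keep_waiting`. Returns false immediately if it is not set.
bool wait_for_debugger()
{
  if (std::getenv("WAIT_FOR_DEBUGGER") == nullptr)
    return false;

  char host[256];
  if (gethostname(host, sizeof(host)) != 0)
    std::snprintf(host, sizeof(host), "unknown");
  host[sizeof(host) - 1] = '\0';

  std::fprintf(stderr, "PID %ld on host %s is waiting for a debugger\n",
               static_cast<long>(getpid()), host);
  std::fflush(stderr);

  // volatile so that the compiler re-reads the flag on every iteration;
  // the debugger changes it from outside the program.
  volatile int keep_waiting = 1;
  while (keep_waiting)
    sleep(1);

  return true;
}
```

Called near the top of main(), this lets you log into the printed host, run `gdb -p <PID>`, then `set var keep_waiting = 0` and `continue`. For a process that is already hung, attaching with `gdb -p <PID>` and asking for a backtrace with `bt` is enough to see where it is stuck.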

Best
 W.

-------------------------------------------------------------------------
Wolfgang Bangerth                email:            [email protected]
                                 www: http://www.math.tamu.edu/~bangerth/


_______________________________________________
dealii mailing list http://poisson.dealii.org/mailman/listinfo/dealii
