Andrew
Aurelien Bouteiller wrote:
You should try

mpirun -np 2 -bynode totalview ./NPmpi

Aurelien

On Aug 29, 2007, at 13:05, Andrew Friedley wrote:

OK, I've never used TotalView before. So after some FAQ reading I got an xterm on an Atlas node (Odin doesn't have TotalView AFAIK). Trying a simple NetPIPE run just to get familiar with things results in this:

$ mpirun -debug -np 2 -bynode -debug-daemons ./NPmpi
--------------------------------------------------------------------------
Internal error -- the orte_base_user_debugger MCA parameter was not able to
be found.  Please contact the Open RTE developers; this should not happen.
--------------------------------------------------------------------------

Grepping for that param in ompi_info shows:

MCA orte: parameter "orte_base_user_debugger" (current value: "totalview @mpirun@ -a @mpirun_args@ : ddt -n @np@ -start @executable@ @executable_argv@ @single_app@ : fxp @mpirun@ -a @mpirun_args@")

What's going on? I also tried running TotalView directly, using a line like this:

totalview mpirun -a -np 2 -bynode -debug-daemons ./NPmpi

TotalView comes up and seems to be debugging the mpirun process, with only one thread. It doesn't seem to be aware that this is an MPI job with other MPI processes. Any ideas?

Andrew

George Bosilca wrote:

The first step will be to figure out which version of the alltoall you're using. I suppose you are using the default parameters, in which case the decision function in the tuned component selects the linear alltoall. As the name states, this means that every node will post one receive from every other node and then start sending the respective fragment to every other node. This leads to a lot of outstanding sends and receives. I doubt that the receives can cause a problem, so I expect the problem is coming from the send side.

Do you have TotalView installed on Odin? If yes, there is a simple way to see how many sends are pending and where. That might pinpoint [at least] the process where you should look to see what's wrong.

george.

On Aug 29, 2007, at 12:37 AM, Andrew Friedley wrote:

I'm having a problem with the UD BTL and hoping someone might have some input to help solve it.

What I'm seeing is hangs when running alltoall benchmarks with nbcbench or an LLNL program called mpiBench -- both hang exactly the same way. With the code on the trunk, running nbcbench on IU's Odin using 32 nodes and a command line like this:

mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144

hangs consistently when testing 256-byte messages.

There are two things I can do to make the hang go away until running at larger scale. The first is to increase the btl_ofud_sd_num MCA param from its default value of 128. This allows you to run with more procs/nodes before hitting the hang, but AFAICT doesn't fix the actual problem. What this parameter does is control the maximum number of outstanding send WQEs posted at the IB level -- when the limit is reached, frags are queued on an opal_list_t and later sent by progress as IB sends complete.

The other way I've found is to play games with calling mca_btl_ud_component_progress() in mca_btl_ud_endpoint_post_send(). In fact I replaced the CHECK_FRAG_QUEUES() macro used around btl_ofud_endpoint.c:77 with a version that loops on progress until a send WQE slot is available (as opposed to queueing). Same result -- I can run at larger scale, but still hit the hang eventually.
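In rough form, that replacement looks something like the following. This is a sketch from memory rather than the actual trunk code -- the macro name CHECK_FRAG_QUEUES_SPIN and the sd_wqe_avail field are placeholders for illustration, not the real btl_ofud identifiers:

/* Sketch only: spin on component progress instead of queueing the frag
 * on the opal_list_t when all send WQEs are in use.  The field name
 * sd_wqe_avail is assumed for illustration. */
#define CHECK_FRAG_QUEUES_SPIN(ud_btl)                           \
    do {                                                         \
        while ((ud_btl)->sd_wqe_avail <= 0) {                    \
            /* reap IB send completions; each one frees a WQE */ \
            mca_btl_ud_component_progress();                     \
        }                                                        \
    } while (0)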
It appears that when the job hangs, progress is being polled very quickly, and after spinning for a while there are no outstanding send WQEs or queued sends in the BTL. I'm not sure where further up things are spinning/blocking, as I can't produce the hang at less than 32 nodes / 128 procs and don't have a good way of debugging that (suggestions appreciated).

Furthermore, both the ob1 and dr PMLs result in the same behavior, except that DR eventually trips a watchdog timeout, fails the BTL, and terminates the job.

Other collectives such as allreduce and allgather do not hang -- only alltoall. I can also reproduce the hang on LLNL's Atlas machine. Can anyone else reproduce this (Torsten might have to make a copy of nbcbench available)? Anyone have any ideas as to what's wrong?

Andrew
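P.S. For anyone who wants to try reproducing this without nbcbench or mpiBench, a standalone loop along these lines should exercise the same path. This is only a sketch -- it is not either benchmark, and the iteration count and power-of-two size stepping are arbitrary choices:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sweep per-peer alltoall message sizes from 1 byte up to 262144 bytes,
 * roughly matching the "-s 1-262144" range from the nbcbench command above. */
int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const size_t max_bytes = 262144;
    char *sbuf = malloc(max_bytes * nprocs);
    char *rbuf = malloc(max_bytes * nprocs);
    memset(sbuf, 0, max_bytes * nprocs);

    for (size_t bytes = 1; bytes <= max_bytes; bytes *= 2) {
        for (int iter = 0; iter < 100; iter++) {
            MPI_Alltoall(sbuf, (int)bytes, MPI_BYTE,
                         rbuf, (int)bytes, MPI_BYTE, MPI_COMM_WORLD);
        }
        if (rank == 0) {
            printf("completed %zu-byte alltoall\n", bytes);
            fflush(stdout);
        }
    }

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}

Run it the same way as above, e.g. mpirun -np 128 -mca btl ofud,self ./a.out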