Hello,

On 2012-05-12 17:58, Keith Robison wrote:
Hello!  I've run into a roadblock.

If I run the following command in the background, the assembler seems to stall, with the last output being the citation for the assembler

This is likely a problem with MPI rather than with Ray, because no messages have been sent at this point, I think.


mpirun -hostfile hostfile.actinode34 -np 48 -stdin /dev/null /home/krobison/packages/Ray-v2.0-ReleaseCandidate5/Ray -i part.8.fasta -o ray.part.8.actinode34.c 1> ray.part.8.actinode34.c.out 2> ray.part.8.actinode34.c.err

Where hostfile.actinode34 reads:

actinode03 slots=24
actinode04 slots=24


If instead I run with a hostfile with only one host (either one of them) and -np 24, but otherwise the same command line, the assembler seems to be off and running.

So there seems to be a problem when establishing connections.


Which machine are you on when you launch mpirun/mpiexec?


My .bashrc has

export PATH=$PATH:/act/openmpi/gnu/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/act/openmpi/gnu/lib

(the cluster vendor put the code in /act)
Any suggestions for what might be triggering this behavior?


In Open-MPI:

Communication between a core and itself is done with memory copying.

Communication between different cores within a machine is done with shared memory by default.

Communication between two cores on different machines can be done using various byte transfer layers.

If you have TCP/IP and nothing else, then Open-MPI will use tcp. In that case, one possible problem is that each host has more than one interface (excluding the loopback) and the wrong one is used.
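If the wrong interface turns out to be the cause, one thing to try is to tell Open-MPI which interface to use for tcp through the btl_tcp_if_include MCA parameter. The following is only a sketch: the interface name eth0 and the helper name run_ray_tcp are assumptions of mine, so replace eth0 with whichever interface actually connects actinode03 and actinode04.

```shell
# Sketch only: restrict Open-MPI's tcp byte transfer layer to one interface.
# IFACE is an assumption; set it to the interface that links the two nodes.
IFACE=${IFACE:-eth0}

# run_ray_tcp is a hypothetical helper name, not part of Open-MPI or Ray.
# Everything passed to it is appended to the mpirun command line.
run_ray_tcp() {
    mpirun --mca btl_tcp_if_include "$IFACE" \
           -hostfile hostfile.actinode34 -np 48 "$@"
}
```

For a first try, something like "run_ray_tcp date" is enough to see whether the processes start on both hosts.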


Can you ping actinode04 from actinode03?


Command:

ssh actinode03 ping actinode04
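It is worth testing both directions. A small sketch that does this for every pair of hosts in the hostfile is below. The function names hostfile_hosts and check_pings are mine (hypothetical), and the sketch assumes passwordless ssh between the nodes and a ping that accepts -c.

```shell
# Print the host names in an Open-MPI hostfile
# (first field of each non-blank, non-comment line).
hostfile_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# For every ordered pair of distinct hosts, log in to the first
# and ping the second three times.
check_pings() {
    hosts=$(hostfile_hosts "$1")
    for src in $hosts; do
        for dst in $hosts; do
            [ "$src" = "$dst" ] && continue
            if ssh -n "$src" ping -c 3 "$dst" > /dev/null 2>&1; then
                echo "$src -> $dst: ok"
            else
                echo "$src -> $dst: FAILED"
            fi
        done
    done
}
```

Usage would be "check_pings hostfile.actinode34". A FAILED line points at name resolution, routing or firewall trouble between those two hosts.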


If you have InfiniBand, then Open-MPI will use openib. In that case, one possible problem is that the daemon that computes InfiniBand routes between InfiniBand communicators has died or is acting strangely.


Does it also hang if you launch this command?

mpiexec -n 48 -hostfile hostfile.actinode34 \
date
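To avoid waiting on a hung job, the same kind of test can be wrapped in a time limit. This is a sketch that assumes the GNU coreutils timeout command is installed. The helper name launch_test and the 60 second default are my own choices, and it runs hostname instead of date so that the output also shows which hosts started processes.

```shell
# launch_test is a hypothetical helper: start a trivial program on all
# 48 slots and give up after a time limit (default 60 seconds).
launch_test() {
    limit=${1:-60}
    if timeout "$limit" mpiexec -n 48 -hostfile hostfile.actinode34 hostname; then
        echo "launch ok"
    else
        echo "launch failed or hung for more than ${limit}s"
    fi
}
```

If even this trivial program does not start across both hosts, the problem is in the MPI launch or the network and not in Ray.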


Can you provide the output of the following command?


ompi_info -a
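The full output is long. To see quickly which byte transfer layers (tcp, openib, and so on) your Open-MPI build includes, the component lines can be filtered. This sketch assumes ompi_info prints components as lines of the form "MCA btl: tcp (...)". The function name list_btls is hypothetical.

```shell
# list_btls is a hypothetical helper: print the name of each byte
# transfer layer component reported by ompi_info, one per line.
list_btls() {
    ompi_info | awk '$1 == "MCA" && $2 == "btl:" { print $3 }'
}
```

If openib is not in the list, then InfiniBand is not being used by this build and the tcp interface question above is the one to look at.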


This is a network problem, I think.



                 Sébastien
_______________________________________________
Denovoassembler-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
