> 
> - Improved support for Cray
> 
> Cray's compilers, networks or the programming environment in general?

I can compile on our Cray XC30, but I cannot run with the options I previously
used with trunk. Is there some secret sauce I am missing here?
I get an OOB error on the node daemons. The setup is ESS PMI, with ALPS for
both RAS and PLM.


/lustre/medusa/bouteill/openmpi-1.8.5rc1/bin/mpirun -np 1 \
    -mca btl ugni,sm,self -mca coll tuned,basic,self \
    -mca orte_tmpdir_base /var/tmp \
    -mca plm_base_strip_prefix_from_node_names 1 \
    -nolocal -novm --debug-daemons \
    -mca oob_base_verbose 1000 \
    -mca ras_alps_apstat_cmd $(which apstat) -mca ras alps \
    -mca oob_tcp_if_include ipogif0 -map-by node hostname
[aprun6-darter:16915] mca: base: components_register: registering oob components
[aprun6-darter:16915] mca: base: components_register: found loaded component tcp
[aprun6-darter:16915] mca: base: components_register: component tcp register 
function successful
[aprun6-darter:16915] mca: base: components_open: opening oob components
[aprun6-darter:16915] mca: base: components_open: found loaded component tcp
[aprun6-darter:16915] mca: base: components_open: component tcp open function 
successful
[aprun6-darter:16915] mca:oob:select: checking available component tcp
[aprun6-darter:16915] mca:oob:select: Querying component [tcp]
[aprun6-darter:16915] oob:tcp: component_available called
[aprun6-darter:16915] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[aprun6-darter:16915] [[54804,0],0] oob:tcp:init rejecting interface lo (not in 
include list)
[aprun6-darter:16915] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
[aprun6-darter:16915] [[54804,0],0] oob:tcp:init rejecting interface lo (not in 
include list)
[aprun6-darter:16915] WORKING INTERFACE 3 KERNEL INDEX 3 FAMILY: V4
[aprun6-darter:16915] [[54804,0],0] oob:tcp:init adding 10.128.2.134 to our 
list of V4 connections
[aprun6-darter:16915] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
[aprun6-darter:16915] [[54804,0],0] oob:tcp:init rejecting interface eth1 (not 
in include list)
[aprun6-darter:16915] [[54804,0],0] TCP STARTUP
[aprun6-darter:16915] [[54804,0],0] attempting to bind to IPv4 port 0
[aprun6-darter:16915] [[54804,0],0] assigned IPv4 port 57286
[aprun6-darter:16915] mca:oob:select: Adding component to end
[aprun6-darter:16915] mca:oob:select: Found 1 active transports
[nid00414:32573] mca: base: components_register: registering oob components
[nid00414:32573] mca: base: components_register: found loaded component tcp
[nid00414:32573] mca: base: components_register: component tcp register 
function successful
[nid00414:32573] mca: base: components_open: opening oob components
[nid00414:32573] mca: base: components_open: found loaded component tcp
[nid00414:32573] mca: base: components_open: component tcp open function 
successful
[nid00414:32573] mca:oob:select: checking available component tcp
[nid00414:32573] mca:oob:select: Querying component [tcp]
[nid00414:32573] oob:tcp: component_available called
[nid00414:32573] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[nid00414:32573] [[54804,0],1] oob:tcp:init rejecting interface lo (not in 
include list)
[nid00414:32573] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[nid00414:32573] [[54804,0],1] oob:tcp:init adding 10.128.1.161 to our list of 
V4 connections
[nid00414:32573] [[54804,0],1] TCP STARTUP
[nid00414:32573] [[54804,0],1] attempting to bind to IPv4 port 0
[nid00414:32573] [[54804,0],1] assigned IPv4 port 57372
[nid00414:32573] mca:oob:select: Adding component to end
[nid00414:32573] mca:oob:select: Found 1 active transports
Daemon [[54804,0],1] checking in as pid 32573 on host nid00414
[nid00414:32573] [[54804,0],1] orted: up and running - waiting for commands!
[nid00414:32573] [[54804,0],1] OOB_SEND: rml_oob_send.c:199
[nid00414:32573] [[54804,0],1] OOB_SEND: rml_oob_send.c:199
[nid00414:32573] [[54804,0],1]: set_addr to uri 
3591634944.0;tcp://10.128.2.134:57286
[nid00414:32573] [[54804,0],1]:set_addr checking if peer [[54804,0],0] is 
reachable via component tcp
[nid00414:32573] [[54804,0],1] oob:tcp: working peer [[54804,0],0] address 
tcp://10.128.2.134:57286
[nid00414:32573] [[54804,0],1] PASSING ADDR 10.128.2.134 TO MODULE
[nid00414:32573] [[54804,0],1]:tcp set addr for peer [[54804,0],0]
[nid00414:32573] [[54804,0],1]: peer [[54804,0],0] is reachable via component 
tcp
[nid00414:32573] [[54804,0],1] OOB_SEND: rml_oob_send.c:199
[nid00414:32573] [[54804,0],1] oob:base:send to target [[INVALID],INVALID]
[nid00414:32573] [[54804,0],1] oob:base:send unknown peer [[INVALID],INVALID]
[nid00414:32573] [[54804,0],1] is NOT reachable by TCP
Application 1329706 exit codes: 1
Application 1329706 resources: utime ~0s, stime ~0s, Rss ~5304, inblocks ~6404, 
outblocks ~28
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
[aprun6-darter:16915] [[54804,0],0] TCP SHUTDOWN
[aprun6-darter:16915] mca: base: close: component tcp closed
[aprun6-darter:16915] mca: base: close: unloading component tcp
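
In case it helps with triage, here is a minimal sketch (plain Python, not part of Open MPI) that pulls the two interesting facts out of a verbose OOB log like the one above: which address each daemon accepted, and which sends went to a peer the OOB did not know. The names `summarize` and `SAMPLE` are hypothetical, and the patterns assume the message wording shown in this log; other versions may word these lines differently.

```python
import re

# Patterns follow the oob_base_verbose lines quoted in this message.
# They are an assumption about the log format, not a stable interface.
ADDED = re.compile(r"^\[([^:\]]+):\d+\] .*oob:tcp:init adding (\S+) to our")
UNKNOWN = re.compile(r"^\[([^:\]]+):\d+\] .*oob:base:send unknown peer (\S+)")


def summarize(log_text):
    """Return (accepted addresses per host, unknown-peer targets per host)."""
    added, unknown = {}, {}
    for line in log_text.splitlines():
        m = ADDED.match(line)
        if m:
            added.setdefault(m.group(1), []).append(m.group(2))
            continue
        m = UNKNOWN.match(line)
        if m:
            unknown.setdefault(m.group(1), []).append(m.group(2))
    return added, unknown


# A few lines copied from the log above, including the mail wrapping.
SAMPLE = """\
[aprun6-darter:16915] [[54804,0],0] oob:tcp:init adding 10.128.2.134 to our
list of V4 connections
[nid00414:32573] [[54804,0],1] oob:tcp:init adding 10.128.1.161 to our list of
V4 connections
[nid00414:32573] [[54804,0],1] oob:base:send unknown peer [[INVALID],INVALID]
"""

if __name__ == "__main__":
    added, unknown = summarize(SAMPLE)
    for host, addrs in added.items():
        print(host, "accepted", ", ".join(addrs))
    for host, peers in unknown.items():
        print(host, "sent to unknown peer", ", ".join(peers))
```

On this log it shows that both daemons did pick up an address on the included interface, and that the failing send on nid00414 was addressed to an invalid peer name, which is the part I cannot explain.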



--
Aurélien Bouteiller ~ https://icl.cs.utk.edu/~bouteill/

