I'm AFK, but let me reply about the IB thing: dual ports / multi-rail is a good 
thing. It's not a good thing if they're on the same subnet.

Check the FAQ - http://www.open-mpi.org/faq/?category=openfabrics - I can't see 
it well enough on the small screen of my phone, but I think there's a question 
on there about how multi-rail destinations are chosen.

Spoiler: put your ports in different subnets so that OMPI makes deterministic 
choices.
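
If you want to double-check which subnet each port is actually on, here is a 
quick, untested sketch (the file name and build line are mine; it assumes 
libibverbs is installed, and ibv_devinfo -v shows the same GIDs) that prints 
the subnet prefix of GID 0 for every port; two ports printing the same prefix 
are on the same subnet:

/* list_subnets.c: print the subnet prefix of GID 0 for every IB port.
 * Build (assuming libibverbs headers are installed):
 *   gcc list_subnets.c -o list_subnets -libverbs
 */
#include <stdio.h>
#include <inttypes.h>
#include <endian.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devs = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devs);
    if (NULL == devs) return 1;

    for (int i = 0; i < num_devs; ++i) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (NULL == ctx) continue;

        struct ibv_device_attr dev_attr;
        if (0 == ibv_query_device(ctx, &dev_attr)) {
            for (int port = 1; port <= dev_attr.phys_port_cnt; ++port) {
                union ibv_gid gid;
                if (0 != ibv_query_gid(ctx, port, 0, &gid)) continue;
                /* GID 0 = 64-bit subnet prefix followed by the 64-bit port GUID */
                printf("%s port %d: subnet prefix 0x%016" PRIx64 "\n",
                       ibv_get_device_name(devs[i]), port,
                       (uint64_t) be64toh(gid.global.subnet_prefix));
            }
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}

If both ports report the same prefix (the IB default is 0xfe80000000000000), 
they're on one subnet; giving each rail its own subnet prefix in your subnet 
manager config is what makes OMPI's port selection deterministic.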

Sent from my phone. No type good.

On Jun 2, 2014, at 6:55 AM, "Gilles Gouaillardet" 
<gilles.gouaillar...@gmail.com> wrote:

Jeff,

On Mon, Jun 2, 2014 at 7:26 PM, Jeff Squyres (jsquyres) 
<jsquy...@cisco.com> wrote:
On Jun 2, 2014, at 5:03 AM, Gilles Gouaillardet 
<gilles.gouaillar...@gmail.com> wrote:

> I faced a slightly different problem, but it is 100% reproducible:
> - I launch mpirun (no batch manager) from a node with one IB port
> - I use -host node01,node02, where node01 and node02 both have two IB ports
>   on the same subnet

FWIW: 2 IB ports on the same subnet?  That's not a good idea.

Could you please elaborate a bit?
From what I saw, this basically doubles the bandwidth (IMB PingPong benchmark) 
between two nodes (!), which is not a bad thing.
I can only guess this might not scale (e.g., if 16 tasks are running on each 
host, the overhead associated with using two ports might void the extra 
bandwidth).

> By default, this will hang.

...but it still shouldn't hang.  I wonder if it's somehow related to 
https://svn.open-mpi.org/trac/ompi/ticket/4442...?

I doubt it...

Here is my command line (from node0):
`which mpirun` -np 2 -host node1,node2 --mca rtc_freq_priority 0 --mca btl openib,self --mca btl_openib_if_include mlx4_0 ./abort
On top of that, the usnic btl is not built (nor installed).
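
In case it helps to reproduce, the abort test is essentially equivalent to this 
minimal program (a sketch of what ./abort does, not the exact source): every 
task calls MPI_Abort() right after MPI_Init().

/* abort.c: minimal reproducer sketch; every task calls MPI_Abort(). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("task %d calling MPI_Abort\n", rank);
    fflush(stdout);
    /* Expected: the whole job is killed; observed: the orteds hang. */
    MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Finalize();   /* not reached */
    return 0;
}

It is built with mpicc abort.c -o abort and launched with the mpirun command 
above.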


> If this is a "feature" (e.g., Open MPI does not support this kind of 
> configuration), I am fine with it.
>
> When I run mpirun --mca btl_openib_if_exclude mlx4_1, then if the application 
> completes successfully, it works just fine.
>
> If the application calls MPI_Abort() /* and even if all tasks call 
> MPI_Abort() */, then it will hang 100% of the time.
> I do not see that as a feature but as a bug.

Yes, OMPI should never hang upon a call to MPI_Abort.

Can you get some stack traces to show where the hung process(es) are stuck?  
That would help Ralph pin down where things are going wrong in ORTE.

On node0:

  \_ -bash
      \_ /.../local/ompi-trunk/bin/mpirun -np 2 -host node1,node2 --mca 
rtc_freq_priority 0 --mc
          \_ /usr/bin/ssh -x node1     PATH=/.../local/ompi-trunk/bin:$PATH ; 
export PATH ; LD_LIBRAR
          \_ /usr/bin/ssh -x node2     PATH=/.../local/ompi-trunk/bin:$PATH ; 
export PATH ; LD_LIBRAR


pstack (mpirun) :
$ pstack 10913
Thread 2 (Thread 0x7f0ecad35700 (LWP 10914)):
#0  0x0000003ba66e15e3 in select () from /lib64/libc.so.6
#1  0x00007f0ecad4391e in listen_thread () from 
/.../local/ompi-trunk/lib/openmpi/mca_oob_tcp.so
#2  0x0000003ba72079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003ba66e8b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f0ecc601700 (LWP 10913)):
#0  0x0000003ba66df343 in poll () from /lib64/libc.so.6
#1  0x00007f0ecc6b1a05 in poll_dispatch () from 
/.../local/ompi-trunk/lib/libopen-pal.so.0
#2  0x00007f0ecc6a641c in opal_libevent2021_event_base_loop () from 
/.../local/ompi-trunk/lib/libopen-pal.so.0
#3  0x00000000004056a1 in orterun ()
#4  0x00000000004039f4 in main ()


On node1:

 sshd: gouaillardet@notty
  \_ bash -c     PATH=/.../local/ompi-trunk/bin:$PATH ; export PATH ; 
LD_LIBRARY_PATH=/...
      \_ /.../local/ompi-trunk/bin/orted -mca ess env -mca orte_ess_jobid 
3459448832 -mca orte_ess_vpid
          \_ [abort] <defunct>

$ pstack (orted)
#0  0x00007fe0ba6a0566 in vfprintf () from /lib64/libc.so.6
#1  0x00007fe0ba6c9a52 in vsnprintf () from /lib64/libc.so.6
#2  0x00007fe0ba6a9523 in snprintf () from /lib64/libc.so.6
#3  0x00007fe0bbc019b6 in orte_util_print_jobids () from 
/.../local/ompi-trunk/lib/libopen-rte.so.0
#4  0x00007fe0bbc01791 in orte_util_print_name_args () from 
/.../local/ompi-trunk/lib/libopen-rte.so.0
#5  0x00007fe0b8e16a8b in mca_oob_tcp_component_hop_unknown () from 
/.../local/ompi-trunk/lib/openmpi/mca_oob_tcp.so
#6  0x00007fe0bb94ab7a in event_process_active_single_queue () from 
/.../local/ompi-trunk/lib/libopen-pal.so.0
#7  0x00007fe0bb94adf2 in event_process_active () from 
/.../local/ompi-trunk/lib/libopen-pal.so.0
#8  0x00007fe0bb94b470 in opal_libevent2021_event_base_loop () from 
/.../local/ompi-trunk/lib/libopen-pal.so.0
#9  0x00007fe0bbc1fa7b in orte_daemon () from 
/.../local/ompi-trunk/lib/libopen-rte.so.0
#10 0x000000000040093a in main ()


On node2:

 sshd: gouaillardet@notty
  \_ bash -c     PATH=/.../local/ompi-trunk/bin:$PATH ; export PATH ; 
LD_LIBRARY_PATH=/...
      \_ /.../local/ompi-trunk/bin/orted -mca ess env -mca orte_ess_jobid 
3459448832 -mca orte_ess_vpid
          \_ [abort] <defunct>

$ pstack (orted)
#0  0x00007fe8fc435e39 in strchrnul () from /lib64/libc.so.6
#1  0x00007fe8fc3ef8f5 in vfprintf () from /lib64/libc.so.6
#2  0x00007fe8fc41aa52 in vsnprintf () from /lib64/libc.so.6
#3  0x00007fe8fc3fa523 in snprintf () from /lib64/libc.so.6
#4  0x00007fe8fd9529b6 in orte_util_print_jobids () from 
/.../local/ompi-trunk/lib/libopen-rte.so.0
#5  0x00007fe8fd952791 in orte_util_print_name_args () from 
/.../local/ompi-trunk/lib/libopen-rte.so.0
#6  0x00007fe8fab6c1b5 in resend () from 
/.../local/ompi-trunk/lib/openmpi/mca_oob_tcp.so
#7  0x00007fe8fab67ce3 in mca_oob_tcp_component_hop_unknown () from 
/.../local/ompi-trunk/lib/openmpi/mca_oob_tcp.so
#8  0x00007fe8fd69bb7a in event_process_active_single_queue () from 
/.../local/ompi-trunk/lib/libopen-pal.so.0
#9  0x00007fe8fd69bdf2 in event_process_active () from 
/.../local/ompi-trunk/lib/libopen-pal.so.0
#10 0x00007fe8fd69c470 in opal_libevent2021_event_base_loop () from 
/.../local/ompi-trunk/lib/libopen-pal.so.0
#11 0x00007fe8fd970a7b in orte_daemon () from 
/.../local/ompi-trunk/lib/libopen-rte.so.0
#12 0x000000000040093a in main ()


The orted processes loop forever in event_process_active_single_queue: 
mca_oob_tcp_component_hop_unknown gets called again and again:

mca_oob_tcp_component_hop_unknown (fd=-1, args=4, cbdata=0x99dc50) at 
../../../../../../src/ompi-trunk/orte/mca/oob/tcp/oob_tcp_component.c:1369

> In another thread, Jeff mentioned that the usnic btl is doing stuff even if 
> there is no usnic hardware (this will be fixed shortly).
> Do you still see intermittent hangs without listing usnic as a btl?

Yeah, there's a definite race in the usnic BTL ATM.  If you care, here's what's 
happening:

Thanks for the insights :-)

Cheers,

Gilles