Ralph,

Sorry to be the bearer of more bad news.
The "good" news is I've seen the new warning regarding the lack of a
loopback interface.
The BAD news is that it is occurring on a Linux cluster that I'ver verified
DOES have 'lo' configured on the front-end and compute nodes (UP and
RUNNING according to ifconfig).

Though run with "-np 2" the warning appears FIVE times.
ADDITIONALLY, there is a SEGV at exit!

Unfortunately, despite configuring with --enable-debug, I didn't get line
numbers from the core (and there was no backtrace printed).

All of this appears below (and no, "-mca mtl psm" is not a typo or a joke).

Let me know what tracing flags to apply to gather the info needed to debug
this.

-Paul


$ mpirun -mca btl sm,self -np 2 -host n15,n16 -mca mtl psm examples/ring_c
--------------------------------------------------------------------------
WARNING: No loopback interface was found. This can cause problems
when we spawn processes as they are likely to be unable to connect
back to their host daemon. Sadly, it may take awhile for the connect
attempt to fail, so you may experience a significant hang time.

You may wish to ctrl-c out of your job and activate loopback support
on at least one interface before trying again.

--------------------------------------------------------------------------
[... above message FOUR more times ...]
Process 1 exiting
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node n15 exited on signal
11 (Segmentation fault).
--------------------------------------------------------------------------

$ /sbin/ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:481228 errors:0 dropped:0 overruns:0 frame:0
          TX packets:481228 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:81039065 (77.2 MiB)  TX bytes:81039065 (77.2 MiB)

$ ssh n15 /sbin/ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:24885 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24885 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1509940 (1.4 MiB)  TX bytes:1509940 (1.4 MiB)

$ ssh n16 /sbin/ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:24938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24938 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1543408 (1.4 MiB)  TX bytes:1543408 (1.4 MiB)

$ gdb examples/ring_c core.29728
[...]
(gdb) where
#0  0x0000002a97a19980 in ?? ()
#1  <signal handler called>
#2  0x0000003a6d40607c in _Unwind_FindEnclosingFunction () from
/lib64/libgcc_s.so.1
#3  0x0000003a6d406b57 in _Unwind_RaiseException () from
/lib64/libgcc_s.so.1
#4  0x0000003a6d406c4c in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#5  0x0000003a6c30ac50 in __pthread_unwind () from
/lib64/tls/libpthread.so.0
#6  0x0000003a6c305202 in sigcancel_handler () from
/lib64/tls/libpthread.so.0
#7  <signal handler called>
#8  0x0000003a6b6bd9a2 in poll () from /lib64/tls/libc.so.6
#9  0x0000002a978f8f7d in ?? ()
#10 0x002000010000000e in ?? ()
#11 0x0000000000000000 in ?? ()

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

Reply via email to