I am unable to replicate the segfault. However, I was able to get the job to hang. I fixed that behavior with r18044.
Perhaps you can test this again and let me know what you see. A gdb stack trace would be more helpful.

Thanks
Ralph

On 3/31/08 5:13 AM, "Lenny Verkhovsky" <len...@voltaire.com> wrote:

> I accidently run job on the hostfile where one of hosts was not properly
> mounted. As a result I got an error and a segfault.
>
> /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun -np 29 -hostfile hostfile ./mpi_p01 -t lt
> bash: /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/orted: No such file or directory
> --------------------------------------------------------------------------
> A daemon (pid 9753) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered an error.
> More information may be available above.
> --------------------------------------------------------------------------
> [witch1:09745] *** Process received signal ***
> [witch1:09745] Signal: Segmentation fault (11)
> [witch1:09745] Signal code: Address not mapped (1)
> [witch1:09745] Failing at address: 0x3c
> [witch1:09745] [ 0] /lib64/libpthread.so.0 [0x2aff223ebc10]
> [witch1:09745] [ 1] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0 [0x2aff21cdfe21]
> [witch1:09745] [ 2] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_rml_oob.so [0x2aff22c398f1]
> [witch1:09745] [ 3] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d426ee]
> [witch1:09745] [ 4] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d433fb]
> [witch1:09745] [ 5] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d4485b]
> [witch1:09745] [ 6] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> [witch1:09745] [ 7] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x403203]
> [witch1:09745] [ 8] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> [witch1:09745] [ 9] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0(opal_progress+0x8b) [0x2aff21e060cb]
> [witch1:09745] [10] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_trigger_event+0x20) [0x2aff21cc6940]
> [witch1:09745] [11] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_wakeup+0x2d) [0x2aff21cc776d]
> [witch1:09745] [12] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_plm_rsh.so [0x2aff22b34756]
> [witch1:09745] [13] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0 [0x2aff21cc6ea7]
> [witch1:09745] [14] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> [witch1:09745] [15] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0(opal_progress+0x8b) [0x2aff21e060cb]
> [witch1:09745] [16] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0xad) [0x2aff21ce068d]
> [witch1:09745] [17] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_plm_rsh.so [0x2aff22b34e5e]
> [witch1:09745] [18] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x402e13]
> [witch1:09745] [19] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x402873]
> [witch1:09745] [20] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aff22512154]
> [witch1:09745] [21] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x4027c9]
> [witch1:09745] *** End of error message ***
> Segmentation fault (core dumped)
>
> Best Regards,
> Lenny.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel