Yes, it seems to be fixed. Thanks.

On Mon, Mar 31, 2008 at 9:17 PM, Ralph H Castain <r...@lanl.gov> wrote:
> I am unable to replicate the segfault. However, I was able to get the job to
> hang. I fixed that behavior with r18044.
>
> Perhaps you can test this again and let me know what you see. A gdb stack
> trace would be more helpful.
>
> Thanks
> Ralph
>
>
> On 3/31/08 5:13 AM, "Lenny Verkhovsky" <len...@voltaire.com> wrote:
>
> > I accidentally ran a job with a hostfile in which one of the hosts was not
> > properly mounted. As a result, I got an error and a segfault.
> >
> > /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun -np 29 -hostfile hostfile ./mpi_p01 -t lt
> > bash: /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/orted: No such file or directory
> > --------------------------------------------------------------------------
> > A daemon (pid 9753) died unexpectedly with status 127 while attempting
> > to launch so we are aborting.
> >
> > There may be more information reported by the environment (see above).
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpirun was unable to start the specified application as it encountered an error.
> > More information may be available above.
> > --------------------------------------------------------------------------
> > [witch1:09745] *** Process received signal ***
> > [witch1:09745] Signal: Segmentation fault (11)
> > [witch1:09745] Signal code: Address not mapped (1)
> > [witch1:09745] Failing at address: 0x3c
> > [witch1:09745] [ 0] /lib64/libpthread.so.0 [0x2aff223ebc10]
> > [witch1:09745] [ 1] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0 [0x2aff21cdfe21]
> > [witch1:09745] [ 2] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_rml_oob.so [0x2aff22c398f1]
> > [witch1:09745] [ 3] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d426ee]
> > [witch1:09745] [ 4] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d433fb]
> > [witch1:09745] [ 5] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_oob_tcp.so [0x2aff22d4485b]
> > [witch1:09745] [ 6] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> > [witch1:09745] [ 7] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x403203]
> > [witch1:09745] [ 8] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> > [witch1:09745] [ 9] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0(opal_progress+0x8b) [0x2aff21e060cb]
> > [witch1:09745] [10] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_trigger_event+0x20) [0x2aff21cc6940]
> > [witch1:09745] [11] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_wakeup+0x2d) [0x2aff21cc776d]
> > [witch1:09745] [12] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_plm_rsh.so [0x2aff22b34756]
> > [witch1:09745] [13] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0 [0x2aff21cc6ea7]
> > [witch1:09745] [14] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0 [0x2aff21e1242b]
> > [witch1:09745] [15] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-pal.so.0(opal_progress+0x8b) [0x2aff21e060cb]
> > [witch1:09745] [16] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0xad) [0x2aff21ce068d]
> > [witch1:09745] [17] /home/USERS/lenny/OMPI_ORTE_TRUNK//lib/openmpi/mca_plm_rsh.so [0x2aff22b34e5e]
> > [witch1:09745] [18] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x402e13]
> > [witch1:09745] [19] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x402873]
> > [witch1:09745] [20] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aff22512154]
> > [witch1:09745] [21] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun [0x4027c9]
> > [witch1:09745] *** End of error message ***
> > Segmentation fault (core dumped)
> >
> > Best Regards,
> > Lenny.
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel