Hello Greg,

I don't know whether it's segfaulting at that particular line, but could you please print the argv? I suspect that the local_exec_index into the argv might be wrong.
Thanks,
Rainer

On Saturday 17 December 2005 19:16, Greg Watson wrote:
> Here's the stacktrace:
>
> #0 0x00ae1fe8 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:714
> 714 if (mca_pls_rsh_component.debug) {
> (gdb) where
> #0 0x00ae1fe8 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:714
> #1 0x00a29642 in orte_rmgr_urm_spawn ()
>    from /usr/local/ompi/lib/openmpi/mca_rmgr_urm.so
> #2 0x0804a0d4 in orterun (argc=4, argv=0xbff88594) at orterun.c:373
> #3 0x08049b16 in main (argc=4, argv=0xbff88594) at main.c:13
>
> And the contents of mca_pls_rsh_component:
>
> (gdb) p mca_pls_rsh_component
> $2 = {super = {pls_version = {mca_major_version = 1,
>       mca_minor_version = 0,
>       mca_release_version = 0, mca_type_name = "pls", '\0' <repeats 28 times>,
>       mca_type_major_version = 1, mca_type_minor_version = 0,
>       mca_type_release_version = 0,
>       mca_component_name = "rsh", '\0' <repeats 60 times>,
>       mca_component_major_version = 1, mca_component_minor_version = 0,
>       mca_component_release_version = 1,
>       mca_open_component = 0xae0a80 <orte_pls_rsh_component_open>,
>       mca_close_component = 0xae09a0 <orte_pls_rsh_component_close>},
>     pls_data = {mca_is_checkpointable = true},
>     pls_init = 0xae093c <orte_pls_rsh_component_init>}, debug = false,
>   reap = true, assume_same_shell = true, delay = 1, priority = 10,
>   argv = 0x90e0418, argc = 2, orted = 0x90de438 "orted",
>   path = 0x90e0960 "/usr/bin/ssh", num_children = 0, num_concurrent = 128,
>   lock = {super = {obj_class = 0x804ec38, obj_reference_count = 1},
>     m_lock_pthread = {__data = {__lock = 0, __count = 0, __owner = 0,
>         __kind = 0, __nusers = 0, __spins = 0},
>       __size = '\0' <repeats 23 times>, __align = 0}, m_lock_atomic = {u = {
>         lock = 0, sparc_lock = 0 '\0', padding = "\000\000\000"}}},
>   cond = {
>     super = {obj_class = 0x804ec18, obj_reference_count = 1},
>     c_waiting = 0,
>     c_signaled = 0, c_cond = {__data = {__lock = 0, __futex = 0,
>         __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
>         __nwaiters = 0, __broadcast_seq = 0},
>       __size = '\0' <repeats 47 times>, __align = 0}}}
>
> I can't see why it is segfaulting at this particular line.
>
> Greg
>
> On Dec 16, 2005, at 5:55 PM, Jeff Squyres wrote:
> > On Dec 16, 2005, at 10:47 AM, Greg Watson wrote:
> >> I finally worked out why I couldn't reproduce the problem. You're not
> >> going to like it though.
> >
> > You're right -- this kind of buglet is among the most un-fun. :-(
> >
> >> Here's the stacktrace from the core file:
> >>
> >> #0 0x00e93fe8 in orte_pls_rsh_launch ()
> >>    from /usr/local/ompi/lib/openmpi/mca_pls_rsh.so
> >> #1 0x0023c642 in orte_rmgr_urm_spawn ()
> >>    from /usr/local/ompi/lib/openmpi/mca_rmgr_urm.so
> >> #2 0x0804a0d4 in orterun (argc=5, argv=0xbfab2e84) at orterun.c:373
> >> #3 0x08049b16 in main (argc=5, argv=0xbfab2e84) at main.c:13
> >
> > Can you recompile this one file with -g? Specifically, cd into the
> > orte/mca/pls/rsh dir and "make clean". Then "make". Then
> > cut-n-paste the compile line for that one file to a shell prompt,
> > and put in a -g.
> >
> > Then either re-install that component (it looks like you're doing a
> > dynamic build with separate components, so you can do "make install"
> > right from the rsh dir) or re-link liborte and re-install that and
> > re-run. The corefile might give something a little more meaningful in
> > this case...?
> >
> > --
> > {+} Jeff Squyres
> > {+} The Open MPI Project
> > {+} http://www.open-mpi.org/
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
---------------------------------------------------------------------
Dipl.-Inf. Rainer Keller            email: kel...@hlrs.de
 High Performance Computing         Tel: ++49 (0)711-685 5858
   Center Stuttgart (HLRS)          Fax: ++49 (0)711-678 7626
 POSTAL:Nobelstrasse 19             http://www.hlrs.de/people/keller
 ACTUAL:Allmandring 30, R. O.030    70550 Stuttgart