On Dec 16, 2005, at 10:47 AM, Greg Watson wrote:
> I finally worked out why I couldn't reproduce the problem. You're not
> going to like it though.
You're right -- this kind of buglet is among the most un-fun. :-(
> Here's the stack trace from the core file:
> #0 0x00e93fe8 in orte_pls_rsh_launch ()
>    from /usr/local/ompi/lib/openmpi/mca_pls_rsh.so
> #1 0x0023c642 in orte_rmgr_urm_spawn ()
>    from /usr/local/ompi/lib/openmpi/mca_rmgr_urm.so
> #2 0x0804a0d4 in orterun (argc=5, argv=0xbfab2e84) at orterun.c:373
> #3 0x08049b16 in main (argc=5, argv=0xbfab2e84) at main.c:13
Can you recompile this one file (the one that defines
orte_pls_rsh_launch) with -g? Specifically, cd into the
orte/mca/pls/rsh dir and run "make clean", then "make". Then
cut-n-paste the compile line for that one file to a shell prompt, and
put in a -g.
Then either re-install that component (it looks like you're doing a
dynamic build with separate components, so you can do "make install"
right from the rsh dir) or re-link liborte and re-install that, and
then re-run. The core file might give something a little more
meaningful in this case...?
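
A rough sketch of those steps as shell helpers, for the separate-component
case. The function names, the source-tree argument, and the orterun and core
file paths are placeholders assumed for illustration, not taken from the
build in question. Re-running the compile line with -g stays a manual step,
because the exact line depends on how the tree was configured.

```shell
#!/bin/sh
# Sketch only: function names and arguments are hypothetical placeholders.

# Rebuild the rsh pls component so that make prints its compile lines.
# usage: rebuild_rsh_component /path/to/ompi-source-tree
rebuild_rsh_component() (
    # Runs in a subshell, so the caller's working directory is unchanged.
    cd "$1/orte/mca/pls/rsh" && make clean && make
    # Manual step afterwards, from that same directory: copy the compile
    # line make printed for the rsh source file, paste it at the prompt
    # with -g added, then run "make install".
)

# Print a backtrace from a core file without an interactive gdb session.
# usage: show_backtrace /path/to/orterun /path/to/corefile
show_backtrace() {
    gdb -batch -ex bt "$1" "$2"
}
```

With the component rebuilt with -g and re-installed, frame #0 of the
backtrace should then carry a file name and line number rather than just
the name of the shared object.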
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/