On Dec 30, 2005, at 4:15 AM, Graziano Giuliani wrote:
#0 0xb7ca2599 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:716
716 if (mca_pls_rsh_component.debug) {
which means we have a memory corruption somewhere else...
Agreed.
Investigating from the outside what may cause the problem, I have
found that I can also make the job run by changing the hostname in
my hostfile.
-) No localhost in hostfile -> run
-) "wowbagger" or "localhost" in hostfile -> run
-) FQDN wowbagger.cluster in hostfile -> SIGSEGV
LOL -- I did a double take there because one of our machines is named
wowbagger; I had a horrid moment where I was wondering if that name
somehow accidentally got hard-coded in the OMPI code base. :-)
Ok, I think that I am able to reproduce this -- got to love these
Heisenbugs. :-(
Let me see what I can dig up...
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/