On Dec 30, 2005, at 4:15 AM, Graziano Giuliani wrote:

#0 0xb7ca2599 in orte_pls_rsh_launch (jobid=1) at pls_rsh_module.c:716
716                         if (mca_pls_rsh_component.debug) {

which means we have a memory corruption somewhere else...

Agreed.

Investigating from the outside what may cause the problem, I have found that I
can also make the job run by changing the hostname in my hostfile.

-) No localhost in hostfile -> run
-) "wowbagger" or "localhost" in hostfile -> run
-) FQDN wowbagger.cluster in hostfile -> SIGSEGV

LOL -- I did a double take there because one of our machines is named wowbagger; I had a horrid moment where I was wondering if that name somehow accidentally got hard-coded in the OMPI code base. :-)

Ok, I think that I am able to reproduce this -- got to love these Heisenbugs. :-(

Let me see what I can dig up...

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/

