On Jan 18, 2008, at 2:17 PM, Dean Dauger, Ph. D. wrote:

I'm developing an mca_pls module, intending to drop it into a
preexisting Open MPI build (in its lib/openmpi directory) and have
orterun pick it up, but orterun kept crashing on me even though it
correctly calls my module.  To help isolate the issue I separately
recompiled the mca_pls_rsh module from a given Open MPI source
checkout and dropped that in, but that didn't work either.  Any pointers?

Which source checkout did you use? Note that the pls structures have likely changed between the OMPI SVN trunk and the v1.2 branch. So if you didn't use a checkout from the v1.2 branch, I would expect Random Bad Things (RBTs) to occur.
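
As a rough illustration of why a mismatched checkout matters, here is a minimal C sketch of a host that refuses a module built against different headers instead of calling through a structure whose layout may have changed. Every name in it (demo_version_t, demo_module_t, demo_try_launch) is hypothetical; these are not the real Open MPI pls structures, and the sketch only assumes that a module records the version it was compiled against.

```c
#include <stdio.h>

/* Hypothetical types for illustration only; NOT the real Open MPI layout. */
typedef struct {
    int major;
    int minor;
} demo_version_t;

typedef struct {
    demo_version_t built_against;   /* version of the headers the module was compiled with */
    int (*launch)(int jobid);       /* entry point the host would call */
} demo_module_t;

/* Returns 1 when the module was built against the host's major.minor. */
static int demo_module_compatible(const demo_module_t *m, demo_version_t host)
{
    return m->built_against.major == host.major &&
           m->built_against.minor == host.minor;
}

/* Stand-in launch function: returns jobid + 1 so the call is observable. */
static int demo_launch(int jobid)
{
    return jobid + 1;
}

/* Calls launch only after the version check passes; returns -1 on mismatch. */
static int demo_try_launch(const demo_module_t *m, demo_version_t host, int jobid)
{
    if (!demo_module_compatible(m, host)) {
        fprintf(stderr, "module built against %d.%d, host is %d.%d; not calling it\n",
                m->built_against.major, m->built_against.minor,
                host.major, host.minor);
        return -1;
    }
    return m->launch(jobid);
}
```

Without a check of this kind, a host compiled from one branch would read function pointers at offsets defined by another branch's headers, which is one plausible way to end up with a bus error.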

pingpong was compiled with the existing Open MPI, and it runs with
the built-in rsh module, but not when I replace the pls_rsh module
with a recompiled one.  When I add printf's in the pls_rsh module in
its _open and _init, I can see that each of those subroutines returns
without problem, but _launch is not yet called.  I'm running Mac OS X
10.5.1, which ships with Open MPI at /usr, on a MacBook Pro with an
Intel Core Duo.  ("Rotarran X.5" is the name of the computer.)  I
first attempted the 1.3.0 source code via svn, then went back to the
1.2.3 source code from Open MPI, but both gave the above bus error.
Then I went to Apple's copy of Open MPI 1.2.3 at opensource.apple.com
guessing Apple changed things, but that still doesn't work.  I've
tried their take on ./configure options too to no avail.  Other than
debugging orterun, what else can I try?
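
One caveat with printf tracing around a crash: output sitting in a stdio buffer can be lost when the process dies, so the last line printed may not be the last line reached. A minimal sketch of a trace macro that writes to stderr and flushes on every call is below. DEMO_TRACE, demo_component_open, and demo_module_init are hypothetical stand-ins, not the actual pls_rsh functions.

```c
#include <stdio.h>

/* Counts trace calls so the sketch has something checkable. */
static int demo_trace_count = 0;

/* Print file, line, function, and a message to stderr, then flush,
 * so the text is written out before any later crash. */
#define DEMO_TRACE(msg)                                          \
    do {                                                         \
        fprintf(stderr, "[trace] %s:%d %s: %s\n",                \
                __FILE__, __LINE__, __func__, (msg));            \
        fflush(stderr);                                          \
        demo_trace_count++;                                      \
    } while (0)

/* Hypothetical stand-in for a component's _open function. */
static int demo_component_open(void)
{
    DEMO_TRACE("enter");
    /* ... real work would go here ... */
    DEMO_TRACE("leave");
    return 0;
}

/* Hypothetical stand-in for a component's _init function. */
static int demo_module_init(void)
{
    DEMO_TRACE("enter");
    /* ... real work would go here ... */
    DEMO_TRACE("leave");
    return 0;
}
```

Tracing both entry and exit of each function narrows down whether the crash happens inside the module or in the caller after the module returns.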

Hmm -- are you saying that you tried compiling the Apple copy of the rsh pls and/or the OMPI SVN v1.2.3 rsh pls and neither of them worked?

I don't rightly know why that wouldn't work -- is there a way to find out which compiler flags Apple used to build Open MPI? Can you step through mpirun with a debugger to see where it dies? I suspect it may not have any debugging symbols, so you might not be able to, but you might at least be able to see which pls rsh functions are invoked (and, more importantly, whether something in the pls rsh is invoked "wrong").

--
Jeff Squyres
Cisco Systems
