Appreciate the clarification. I'm not aware of anyone attempting that
procedure before, but I'm not terribly surprised to hear it would
encounter problems and/or fail. Given the myriad configuration options in
the code base, it would seem almost miraculous to either (a) hit the same
config options Apple used (whatever they were), or (b) find a combination
that matched closely enough to let you do this without problems.

Frankly, I'm surprised even this small a fix would let you work around the
problems... ;-)

Unless you have some overriding reason to use the shipped binaries for
everything other than this special component, you're probably going to have
a lot more success just rebuilding from source.

But that's just an opinion - either way, good luck with your efforts!
Ralph


On 1/24/08 10:54 AM, "Dean Dauger, Ph. D." <d...@daugerresearch.com> wrote:

>> I'm sorry, but now I am totally confused. Are you saying that you
>> are having
>> problems with the default rsh component in the distributed 1.2.3
>> code??
> 
> Yes ...
> 
>> Or are you having a problem with your customized version?
> 
> and yes.  Each exhibited the same problem, a bus error.
> 
>> What compiler are you using? If it's your customized version, did
>> you make sure to change the
>> names of the data structures and modules as I pointed out?
> 
> gcc 4.0.1, the default of Leopard.  Yes, in the customized version I
> did change the names of the data structures, subroutines, and support
> file names, and everywhere it says "rsh", just like you said.
> 
>> We regularly work on Macs, both PPC and Intel based (I develop and
>> test on
>> both every day), and I have -never- seen this problem in our code
>> base.
>> Hence my confusion.
> 
> I'm sorry for the confusion.  I'm starting with the shipping Mac OS X
> 10.5.1 "Leopard", which contains its own build of Open MPI (v1.2.3
> according to "orterun -version").  So I assumed the v1.2.3 branch from
> svn.open-mpi.org was the same code Apple used to build the Open MPI
> that ships in Leopard.
> 
> My motivation was to build a new pls module based on the pls_rsh
> module's source code, substituting my own name for "rsh" as you
> suggested, but I encountered a bus error.  To be sure I hadn't screwed
> up somewhere in my custom module, I rebuilt the unmodified pls_rsh
> module and discovered the same problem.
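> 
> To make the rename concrete, here is a sketch (the member names are
> illustrative, not the exact v1.2.3 layout; the real struct lives
> under orte/mca/pls/rsh/):
> 
>     /* Copied from the rsh component with "rsh" replaced by "dean"
>      * throughout.  The members below are placeholders for the real
>      * ones; only the leading base-component member is the fixed
>      * MCA convention. */
>     struct orte_pls_dean_component_t {
>         orte_pls_base_component_t super; /* base component comes first */
>         bool  debug;                     /* illustrative member */
>         int   delay;                     /* illustrative member */
>         char *agent;                     /* illustrative member */
>     };
>     typedef struct orte_pls_dean_component_t orte_pls_dean_component_t;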
> 
> Then, suspecting Apple's source was different, I downloaded the Open
> MPI source from opensource.apple.com and recompiled the pls_rsh module
> from that code.  I dropped just the resulting mca_pls_rsh.la and
> mca_pls_rsh.so into Leopard's existing /usr/lib/openmpi, overwriting
> Leopard's versions, and the bus error happened the same as before.
> 
> That's where I was with my first post to this list.
> 
> My last post concerns the discovery that rearranging the elements of
> orte_pls_rsh_component_t, without changing anything else about the
> pls_rsh code, affects whether the bus error occurs.  I then padded out
> orte_pls_rsh_component_t and my "orte_pls_dean_component_t" by hand so
> that they would be "data alignment agnostic", if you will.
> Consequently the bus error no longer occurs, and both pls modules now
> run as they should.
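> 
> Concretely, the hand-padding looked something like this (a minimal,
> self-contained sketch with made-up members, not the actual component
> struct):
> 
>     #include <stdbool.h>
>     #include <stdint.h>
> 
>     /* Every member is forced onto an 8-byte boundary by explicit
>      * padding, so the member offsets come out the same no matter
>      * how the compiler would otherwise pack the struct. */
>     struct alignment_agnostic_component {
>         bool     debug;
>         uint8_t  pad0[7];  /* pad the bool out to 8 bytes */
>         int32_t  delay;
>         uint8_t  pad1[4];  /* pad the int out to 8 bytes */
>         char    *agent;    /* pointer: naturally 8 bytes on 64-bit */
>     };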
> 
> My hypothesis: Apple's procedure for building Open MPI into Leopard
> had a side effect requiring the data structures in its shared objects
> to follow an alignment different from the one I get by simply
> recompiling Open MPI straight from its source.
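> 
> One way to check this hypothesis (an illustrative probe, not
> something I ran verbatim) is to compile the same struct under both
> build procedures and compare the offsets each produces:
> 
>     #include <stdbool.h>
>     #include <stddef.h>
>     #include <stdio.h>
> 
>     struct probe { bool flag; int value; char *name; };
> 
>     int main(void) {
>         /* Different numbers from the two builds would confirm an
>          * ABI layout mismatch. */
>         printf("sizeof=%zu value@%zu name@%zu\n",
>                sizeof(struct probe),
>                offsetof(struct probe, value),
>                offsetof(struct probe, name));
>         return 0;
>     }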
> 
> I'm not saying anyone is to blame, but I recognize that those builds
> have different timelines.  I predict that if I overwrote all of
> Leopard's Open MPI object code with my own build, it would all run
> too.
> 
> For my needs, I have a sufficient workaround: realign my data
> structures to be "agnostic".  I'm sharing this little discovery just
> in case it might help somebody else out there; for all I know it
> could happen on non-Macs too.
> 
> Thanks,
>    Dean
> 

