I'm sorry, but now I am totally confused. Are you saying that you are having
problems with the default rsh component in the distributed 1.2.3 code?? Or
are you having a problem with your customized version? What compiler are you
using? If it's your customized version, did you make sure to change the
names of the data structures and modules as I pointed out?

We regularly work on Macs, both PPC and Intel based (I develop and test on
both every day), and I have -never- seen this problem in our code base.
Hence my confusion.

Thanks
Ralph



On 1/23/08 8:08 PM, "Dean Dauger, Ph. D." <d...@daugerresearch.com> wrote:

> Hi All,
> 
> I think I have a possible explanation for this problem.  Previously
> orterun was jumping to 0x00000000:
> 
>> [Rotarran-X-5:04475] Failing at address: 0x0
>> [ 1] [0xbffff828, 0x00000000] (-P-)
> 
> On a hunch, I tried changing the number of bools in the
> orte_pls_rsh_component_t data structure in pls_rsh.h. Another bus
> error occurred, this time with orterun jumping to 0x80000000. So I went
> further and changed the layout of the orte_pls_rsh_component_t struct
> from something like this:
> 
>      bool reap;
>      bool assume_same_shell;
>      bool force_rsh;
>      char** agent_argv;
>      int agent_argc;
>      char* agent_path;
> 
> to this:
> 
>      char** agent_argv;
>      char* agent_path;
>      int agent_argc;
>      int unusedInt;
>      bool reap;
>      bool assume_same_shell;
>      bool force_rsh;
>      bool unusedB;
> 
> recompiled, dropped the new .la and .so pieces in, and then it all
> worked.
> 
> My hunch is that I'm having a data alignment problem. Perhaps the
> pointer to _launch in the pls module is stored after the
> orte_pls_rsh_component_t struct, but the alignment that the given build
> assumes differs from the alignment my newly compiled pls module uses.
> Apple usually compiles with every type on its "natural" alignment in
> memory (PowerPC always liked it that way, and the habit has stuck), and
> 3 bools followed by a char** tells me there could be padding.
> 
> The problem is not whether or not to have padding, but which layout we
> agree on. I don't know who put which memory-alignment compiler flag in
> which makefile or ./configure line, but if I rearrange the struct into
> the second layout above, there is no ambiguity, and orterun() calls
> _launch just fine in both the rsh module and my own.
> 
> Thanks for your help,
>     Dean
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

