I'm sorry, but now I am totally confused. Are you saying that you are having problems with the default rsh component in the distributed 1.2.3 code? Or are you having a problem with your customized version? What compiler are you using? If it's your customized version, did you make sure to change the names of the data structures and modules as I pointed out?
We regularly work on Macs, both PPC and Intel based (I develop and test on both every day), and I have -never- seen this problem in our code base. Hence my confusion.

Thanks
Ralph

On 1/23/08 8:08 PM, "Dean Dauger, Ph. D." <d...@daugerresearch.com> wrote:

> Hi All,
>
> I think I have a possible explanation for this problem. Previously
> orterun was jumping to 0x00000000:
>
>> [Rotarran-X-5:04475] Failing at address: 0x0
>> [ 1] [0xbffff828, 0x00000000] (-P-)
>
> On a hunch I tried changing the number of bool's in the
> orte_pls_rsh_component_t data structure of pls_rsh.h. Another bus
> error occurred with orterun jumping to 0x80000000 instead. So I went
> further and changed the layout of the orte_pls_rsh_component_t struct
> from something like this:
>
>     bool reap;
>     bool assume_same_shell;
>     bool force_rsh;
>     char** agent_argv;
>     int agent_argc;
>     char* agent_path;
>
> to this:
>
>     char** agent_argv;
>     char* agent_path;
>     int agent_argc;
>     int unusedInt;
>     bool reap;
>     bool assume_same_shell;
>     bool force_rsh;
>     bool unusedB;
>
> recompiled, dropped the new .la and .so pieces in, and then it all
> worked.
>
> My hunch is that I'm having a data alignment problem. Perhaps the
> pointer reference to _launch of the pls module is stored after the
> orte_pls_rsh_component_t struct, but then alignment that given build
> assumes is different from that of my newly compiled pls module.
> Apple usually compiles with every type on its "natural" alignment in
> memory (PowerPC always liked it that way and the habit has stuck) and
> looking at 3 bools followed by a char** tells me there could be padding.
>
> The problem, rather than whether or not to have padding, is what do
> we agree on. I don't know who put what memory align compiler flag in
> what makefile or ./configure line, but if I rearrange the struct into
> the latter example above then I have no ambiguity, so orterun() calls
> _launch just fine in the rsh module and my own.
>
> Thanks for your help,
> Dean
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
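
One way to check the padding hypothesis from the quoted message on a given toolchain is to print the member offsets of both field orders. The following is only a minimal sketch in C. The struct names original_layout and reordered_layout and the three helper functions are hypothetical; they copy just the fields listed in the quoted message and are not the real orte_pls_rsh_component_t from pls_rsh.h, which may contain other members. The numbers depend on the compiler, target, and any alignment or packing flags, so none are stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the first field order in the quoted message:
 * three bools followed by a pointer. */
struct original_layout {
    bool   reap;
    bool   assume_same_shell;
    bool   force_rsh;
    char **agent_argv;
    int    agent_argc;
    char  *agent_path;
};

/* Hypothetical stand-in for the rearranged order: pointers first,
 * then ints, then bools, with the two filler members from the message. */
struct reordered_layout {
    char **agent_argv;
    char  *agent_path;
    int    agent_argc;
    int    unusedInt;
    bool   reap;
    bool   assume_same_shell;
    bool   force_rsh;
    bool   unusedB;
};

/* Bytes of padding the compiler placed between force_rsh and agent_argv. */
size_t gap_after_bools_original(void)
{
    size_t after_bools = offsetof(struct original_layout, force_rsh)
                       + sizeof(bool);
    /* C lays members out in declaration order, so this always holds. */
    assert(offsetof(struct original_layout, agent_argv) >= after_bools);
    return offsetof(struct original_layout, agent_argv) - after_bools;
}

/* Padding between members of the reordered struct (tail padding excluded). */
size_t internal_padding_reordered(void)
{
    size_t end_of_last = offsetof(struct reordered_layout, unusedB)
                       + sizeof(bool);
    size_t sum_of_members = sizeof(char **) + sizeof(char *)
                          + 2 * sizeof(int) + 4 * sizeof(bool);
    return end_of_last - sum_of_members;
}

/* Print the figures that would have to match between two builds. */
void print_layout_report(FILE *out)
{
    fprintf(out, "original:  agent_argv at %zu, gap after bools %zu, sizeof %zu\n",
            offsetof(struct original_layout, agent_argv),
            gap_after_bools_original(),
            sizeof(struct original_layout));
    fprintf(out, "reordered: agent_argv at %zu, internal padding %zu, sizeof %zu\n",
            offsetof(struct reordered_layout, agent_argv),
            internal_padding_reordered(),
            sizeof(struct reordered_layout));
}
```

Calling print_layout_report(stdout) from main, once with the flags used for the distributed build and once with the flags used for the custom module, would show whether the two builds disagree about the first layout. A difference there would be consistent with the alignment theory, though it would not by itself show that this is what moved _launch.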