On Aug 19, 2013, at 8:02 PM, Ralph Castain <r...@open-mpi.org> wrote:
> That's how it works now. My concern is with the error message scenario. IIRC,
> Jeff's issue was that the error message only contains the hostname of the
> proc that generates it - it doesn't tell you the hostname of the remote proc.
> Hence, we included that info in the proc_t.

This is quite important for getting useful error messages.

> However, IIRC we also provided an option to *not* send that info due to
> scaling concerns way back when. I wonder if we can resolve this simply by
> having Nathan set that option in his platform .conf files, and then removing
> ompi_proc_get_hostname completely. Since the IP-based comm channels will call
> modex_recv anyway, we'll get the hostname at that time. Otherwise, the errors
> print "NULL" for proc->hostname.
>
> Yes, that means that users of direct-launched apps on Nathan's systems will
> get less informative error messages - but they can always override Nathan's
> default param if they want better info. After all, the vast majority of users
> aren't running such big jobs as to care about this optimization.

I'm good with it. It could also be (might already be) a run-time MCA param...?

We could also default the value to -1 (vs. 0 or 1), meaning: with np <= N procs, send the hostname around, otherwise, don't send it (we can argue over the value of N -- e.g., 1024 or 2048).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
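
A rough sketch of the -1 / 0 / 1 default proposed in this message, in case it helps the discussion. The enum names, the function name, and the cutoff of 1024 are all hypothetical; none of them are existing Open MPI symbols or MCA parameters, and the real value of N is still open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state setting; NOT an actual Open MPI MCA parameter. */
enum {
    HOSTNAME_SEND_AUTO   = -1,  /* decide from the job size */
    HOSTNAME_SEND_NEVER  =  0,  /* never send the hostname around */
    HOSTNAME_SEND_ALWAYS =  1   /* always send the hostname around */
};

/* Hypothetical cutoff N; the message suggests e.g. 1024 or 2048. */
#define HOSTNAME_SEND_CUTOFF 1024

/* Return true if the hostname should be sent for a job of np procs. */
static bool should_send_hostname(int setting, size_t np)
{
    switch (setting) {
    case HOSTNAME_SEND_NEVER:
        return false;
    case HOSTNAME_SEND_ALWAYS:
        return true;
    default:
        /* Auto (-1): send only when np <= N. */
        return np <= HOSTNAME_SEND_CUTOFF;
    }
}
```

With the auto setting, a 1024-proc job would still send hostnames and a 1025-proc job would not; an explicit 0 or 1 overrides the job-size check either way.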