Yeah, I have some concerns about it too...been trying to test it out some more. Would be good to see just how much difference that one change makes - maybe restoring just the hostname wouldn't have that big an impact.
I'm leery of trying to ensure we strip all the opal_output loops if we don't find the hostname.

On Aug 19, 2013, at 2:41 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> As a result of this patch the first decode of a peer host name might happen
> in the middle of a debug message (on the first call to
> ompi_proc_get_hostname). Such a behavior might generate deadlocks based on
> the level of output verbosity, and has significant potential to reintroduce
> the recursive behavior the new state machine was supposed to remove.
>
> George.
>
>
> On Aug 17, 2013, at 02:49 , svn-commit-mai...@open-mpi.org wrote:
>
>> Author: rhc (Ralph Castain)
>> Date: 2013-08-16 20:49:18 EDT (Fri, 16 Aug 2013)
>> New Revision: 29040
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29040
>>
>> Log:
>> When we direct launch an application, we rely on PMI for wireup support. In
>> doing so, we lose the de facto data compression we get from the ORTE modex
>> since we no longer get all the wireup info from every proc in a single blob.
>> Instead, we have to iterate over all the procs, calling PMI_KVS_get for
>> every value we require.
>>
>> This creates a really bad scaling behavior. Users have found a nearly 20%
>> launch time differential between mpirun and PMI, with PMI being the slower
>> method. Some of the problem is attributable to poor exchange algorithms in
>> RM's like Slurm and Alps, but we make things worse by calling "get" so many
>> times.
>>
>> Nathan (with a tad advice from me) has attempted to alleviate this problem
>> by reducing the number of "get" calls. This required the following changes:
>>
>> * upon first request for data, have the OPAL db pmi component fetch and
>> decode *all* the info from a given remote proc. It turned out we weren't
>> caching the info, so we would continually request it and only decode the
>> piece we needed for the immediate request. We now decode all the info and
>> push it into the db hash component for local storage - and then all
>> subsequent retrievals are fulfilled locally
>>
>> * reduced the amount of data by eliminating the exchange of the OMPI_ARCH
>> value if heterogeneity is not enabled. This was used solely as a check so we
>> would error out if the system wasn't actually homogeneous, which was fine
>> when we thought there was no cost in doing the check. Unfortunately, at
>> large scale and with direct launch, there is a non-zero cost of making this
>> test. We are open to finding a compromise (perhaps turning the test off if
>> requested?), if people feel strongly about performing the test
>>
>> * reduced the amount of RTE data being automatically fetched, and fetched
>> the rest only upon request. In particular, we no longer immediately fetch
>> the hostname (which is only used for error reporting), but instead get it
>> when needed. Likewise for the RML uri as that info is only required for some
>> (not all) environments. In addition, we no longer fetch the locality unless
>> required, relying instead on the PMI clique info to tell us who is on our
>> local node (if additional info is required, the fetch is performed when a
>> modex_recv is issued).
>>
>> Again, all this only impacts direct launch - all the info is provided when
>> launched via mpirun as there is no added cost to getting it
>>
>> Barring objections, we may move this (plus any required other pieces) to the
>> 1.7 branch once it soaks for an appropriate time.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
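
For anyone following the thread, the caching change described in the quoted commit log (fetch and decode all of a peer's info on the first request, then answer later lookups from the local hash store) can be modeled roughly as below. This is only a sketch in Python, not the actual OPAL db code: `RemoteStore`, `CachingDb`, `fetch_all`, and the sample hostname and URI values are all made up for illustration and do not correspond to any Open MPI or PMI API.

```python
# Hypothetical model of "fetch everything on first request, then serve locally".
# None of these names are Open MPI or PMI APIs; the data values are invented.


class RemoteStore:
    """Stands in for the remote key-value space and counts round trips."""

    def __init__(self, blobs):
        self._blobs = blobs  # proc id -> dict of all values published by that proc
        self.fetches = 0

    def fetch_all(self, proc):
        # One round trip returns every value the proc published.
        self.fetches += 1
        return dict(self._blobs[proc])


class CachingDb:
    """Local hash store that is filled on the first miss for a given proc."""

    def __init__(self, remote):
        self._remote = remote
        self._local = {}

    def get(self, proc, key):
        # First request for this proc: pull and keep everything it published.
        if proc not in self._local:
            self._local[proc] = self._remote.fetch_all(proc)
        # Every later request for the same proc is answered locally.
        return self._local[proc].get(key)


remote = RemoteStore({1: {"hostname": "node01", "rml_uri": "tcp://10.0.0.1:5000"}})
db = CachingDb(remote)
hostname = db.get(1, "hostname")  # triggers the single remote fetch
uri = db.get(1, "rml_uri")        # served from the local store
```

In this model the second lookup costs no extra round trip. It also shows where the concern raised above comes from: whichever caller happens to ask first, possibly a debug message asking for the hostname, is the one that pays for the remote fetch.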