It would require a db read from every rank, which is what we are trying to avoid. This scales quadratically at best on Cray systems.
-Nathan

On Mon, Aug 19, 2013 at 02:48:18PM -0700, Ralph Castain wrote:
> Yeah, I have some concerns about it too... been trying to test it out some more. Would be good to see just how much difference that one change makes - maybe restoring just the hostname wouldn't have that big an impact.
>
> I'm leery of trying to ensure we strip all the opal_output loops if we don't find the hostname.
>
> On Aug 19, 2013, at 2:41 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> > As a result of this patch, the first decode of a peer hostname might happen in the middle of a debug message (on the first call to ompi_proc_get_hostname). Such behavior might generate deadlocks depending on the level of output verbosity, and has significant potential to reintroduce the recursive behavior the new state machine was supposed to remove.
> >
> > George.
> >
> >
> > On Aug 17, 2013, at 02:49, svn-commit-mai...@open-mpi.org wrote:
> >
> >> Author: rhc (Ralph Castain)
> >> Date: 2013-08-16 20:49:18 EDT (Fri, 16 Aug 2013)
> >> New Revision: 29040
> >> URL: https://svn.open-mpi.org/trac/ompi/changeset/29040
> >>
> >> Log:
> >> When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex, since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
> >>
> >> This creates really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RMs like Slurm and ALPS, but we make things worse by calling "get" so many times.
> >>
> >> Nathan (with a tad of advice from me) has attempted to alleviate this problem by reducing the number of "get" calls.
> >> This required the following changes:
> >>
> >> * upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally.
> >>
> >> * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test.
> >>
> >> * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri, as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).
> >>
> >> Again, all this only impacts direct launch - all the info is provided when launched via mpirun, as there is no added cost to getting it.
> >>
> >> Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel