That solution is fine with me. -Nathan
On Tue, Aug 20, 2013 at 12:41:49AM +0200, George Bosilca wrote:
> If your offer is between quadratic and non-deterministic, I'll take the former.
>
> I would advocate for a middle-ground solution. Clearly document in the header file that ompi_proc_get_hostname is __not__ safe to use in all contexts, as it might exhibit recursive behavior due to communications. Then revert all its uses in the context of opal_output, opal_output_verbose, and all variants back to using "->proc_hostname". We might get a (null) instead of the peer name, but this removes the potential loops.
>
> George.
>
> On Aug 19, 2013, at 23:52 , Nathan Hjelm <hje...@lanl.gov> wrote:
>
> > It would require a db read from every rank, which is what we are trying to avoid. This scales quadratically at best on Cray systems.
> >
> > -Nathan
> >
> > On Mon, Aug 19, 2013 at 02:48:18PM -0700, Ralph Castain wrote:
> >> Yeah, I have some concerns about it too; I've been trying to test it out some more. It would be good to see just how much that one change makes. Maybe restoring just the hostname wouldn't have that big an impact.
> >>
> >> I'm leery of trying to ensure we strip all the opal_output loops if we don't find the hostname.
> >>
> >> On Aug 19, 2013, at 2:41 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> >>
> >>> As a result of this patch, the first decode of a peer host name might happen in the middle of a debug message (on the first call to ompi_proc_get_hostname). Such behavior might generate deadlocks depending on the level of output verbosity, and has significant potential to reintroduce the recursive behavior the new state machine was supposed to remove.
> >>>
> >>> George.
> >>>
> >>> On Aug 17, 2013, at 02:49 , svn-commit-mai...@open-mpi.org wrote:
> >>>
> >>>> Author: rhc (Ralph Castain)
> >>>> Date: 2013-08-16 20:49:18 EDT (Fri, 16 Aug 2013)
> >>>> New Revision: 29040
> >>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29040
> >>>>
> >>>> Log:
> >>>> When we direct-launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex, since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
> >>>>
> >>>> This creates really bad scaling behavior. Users have found a nearly 20% launch-time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RMs like Slurm and Alps, but we make things worse by calling "get" so many times.
> >>>>
> >>>> Nathan (with a tad of advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:
> >>>>
> >>>> * Upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and decode only the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage, so all subsequent retrievals are fulfilled locally.
> >>>>
> >>>> * Reduced the amount of data by eliminating the exchange of the OMPI_ARCH value when heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check.
> >>>> Unfortunately, at large scale and with direct launch, there is a non-zero cost to making this test. We are open to finding a compromise (perhaps turning the test off if requested?) if people feel strongly about performing it.
> >>>>
> >>>> * Reduced the amount of RTE data being automatically fetched, fetching the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML URI, as that info is only required in some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).
> >>>>
> >>>> Again, all this only impacts direct launch; all the info is provided when launching via mpirun, as there is no added cost to getting it.
> >>>>
> >>>> Barring objections, we may move this (plus any other required pieces) to the 1.7 branch once it soaks for an appropriate time.
> >>>
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel