Hmmm...but then proc->hostname will *never* be filled in, because it is only 
ever accessed in error messages - i.e., in opal_output and its variants - and 
under the proposal those paths would read the field directly without ever 
triggering a fetch.

If we are not going to retrieve it by default, then we need another solution 
*if* we want hostnames for error messages under direct launch. If we don't 
care, then we can ignore this issue and follow the proposal.
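
One cheap middle ground might look like the following - a sketch, not a 
worked-out patch; hostname_or_unknown() is a hypothetical helper, not 
something in the tree. Keep reading the field directly in output paths, but 
guard against it never being filled in:

    /* Sketch: avoid any PMI fetch (and possible recursion) inside an
     * error path by falling back to a placeholder when the hostname
     * was never retrieved.  hostname_or_unknown() is hypothetical. */
    static inline const char *hostname_or_unknown(const char *proc_hostname)
    {
        /* proc_hostname may legitimately be NULL under direct launch */
        return (NULL != proc_hostname) ? proc_hostname : "<unknown>";
    }

    /* e.g., in an error message: */
    opal_output(0, "connection to peer on host %s failed",
                hostname_or_unknown(proc->proc_hostname));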

I suppose one could ask why we are even bothering with the hostname, since the 
opal_output message includes the hostname in its prefix anyway.

Jeff: this was your baby - what do you think?


On Aug 19, 2013, at 3:43 PM, Nathan Hjelm <hje...@lanl.gov> wrote:

> That solution is fine with me.
> 
> -Nathan
> 
> On Tue, Aug 20, 2013 at 12:41:49AM +0200, George Bosilca wrote:
>> If your offer is between quadratic and non-deterministic, I'll take the 
>> former.
>> 
>> I would advocate for a middle-ground solution. Clearly document in the 
>> header file that ompi_proc_get_hostname is __not__ safe to use in 
>> all contexts, as it might exhibit recursive behavior due to communications. 
>> Then revert all its uses in the context of opal_output, opal_output_verbose, 
>> and all variants back to using "->proc_hostname". We might get a (null) 
>> instead of the peer name, but this removes the potential loops.
>> 
>>  George.
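
(For illustration, the header documentation George suggests might read 
something like the sketch below; the exact signature in the tree may differ:)

    /**
     * Return the hostname of the given peer proc.
     *
     * WARNING: __not__ safe to use in all contexts.  On first use this
     * may perform a PMI fetch (i.e., communication), so calling it from
     * within opal_output / opal_output_verbose or any error path can
     * recurse or deadlock.  In those contexts, read proc->proc_hostname
     * directly and accept that it may be NULL.
     */
    const char *ompi_proc_get_hostname(ompi_proc_t *proc);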
>> 
>> On Aug 19, 2013, at 23:52, Nathan Hjelm <hje...@lanl.gov> wrote:
>> 
>>> It would require a db read from every rank, which is what we are trying
>>> to avoid. This scales quadratically at best on Cray systems.
>>> 
>>> -Nathan
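
(To make the quadratic behavior concrete: each of the N ranks iterates over 
all N peers, issuing a PMI get per key, so the job as a whole performs on the 
order of N^2 gets. A rough sketch of the per-rank pattern - the key format is 
hypothetical:)

    #include <stdio.h>
    #include <pmi.h>

    /* Sketch of the loop being avoided: every rank pulls a key from
     * every peer, so with N ranks the job issues O(N^2) PMI gets. */
    static void fetch_all_hostnames(const char *kvsname, int nprocs)
    {
        char key[64], value[1024]; /* real code sizes value via
                                    * PMI_KVS_Get_value_length_max() */
        for (int peer = 0; peer < nprocs; ++peer) {
            snprintf(key, sizeof(key), "hostname-%d", peer);
            /* each call can be a round trip to the PMI key-value store */
            (void) PMI_KVS_Get(kvsname, key, value, sizeof(value));
        }
    }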
>>> 
>>> On Mon, Aug 19, 2013 at 02:48:18PM -0700, Ralph Castain wrote:
>>>> Yeah, I have some concerns about it too...been trying to test it out some 
>>>> more. It would be good to see just how much difference that one change 
>>>> makes - maybe restoring just the hostname wouldn't have that big an impact.
>>>> 
>>>> I'm leery of trying to ensure we strip all the opal_output loops if we 
>>>> don't find the hostname.
>>>> 
>>>> On Aug 19, 2013, at 2:41 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> 
>>>>> As a result of this patch, the first decode of a peer's host name might 
>>>>> happen in the middle of a debug message (on the first call to 
>>>>> ompi_proc_get_hostname). Such behavior might generate deadlocks depending 
>>>>> on the level of output verbosity, and has significant potential to 
>>>>> reintroduce the recursive behavior the new state machine was supposed to 
>>>>> remove.
>>>>> 
>>>>> George.
>>>>> 
>>>>> 
>>>>> On Aug 17, 2013, at 02:49, svn-commit-mai...@open-mpi.org wrote:
>>>>> 
>>>>>> Author: rhc (Ralph Castain)
>>>>>> Date: 2013-08-16 20:49:18 EDT (Fri, 16 Aug 2013)
>>>>>> New Revision: 29040
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29040
>>>>>> 
>>>>>> Log:
>>>>>> When we direct launch an application, we rely on PMI for wireup support. 
>>>>>> In doing so, we lose the de facto data compression we get from the ORTE 
>>>>>> modex since we no longer get all the wireup info from every proc in a 
>>>>>> single blob. Instead, we have to iterate over all the procs, calling 
>>>>>> PMI_KVS_Get for every value we require.
>>>>>> 
>>>>>> This creates a really bad scaling behavior. Users have found a nearly 
>>>>>> 20% launch time differential between mpirun and PMI, with PMI being the 
>>>>>> slower method. Some of the problem is attributable to poor exchange 
>>>>>> algorithms in RMs like Slurm and Alps, but we make things worse by 
>>>>>> calling "get" so many times.
>>>>>> 
>>>>>> Nathan (with a tad of advice from me) has attempted to alleviate this 
>>>>>> problem by reducing the number of "get" calls. This required the 
>>>>>> following changes:
>>>>>> 
>>>>>> * upon first request for data, have the OPAL db pmi component fetch and 
>>>>>> decode *all* the info from a given remote proc. It turned out we weren't 
>>>>>> caching the info, so we would continually request it and only decode the 
>>>>>> piece we needed for the immediate request. We now decode all the info 
>>>>>> and push it into the db hash component for local storage - and then all 
>>>>>> subsequent retrievals are fulfilled locally (see the sketch after this 
>>>>>> list)
>>>>>> 
>>>>>> * reduced the amount of data by eliminating the exchange of the 
>>>>>> OMPI_ARCH value if heterogeneity is not enabled. This was used solely as 
>>>>>> a check so we would error out if the system wasn't actually homogeneous, 
>>>>>> which was fine when we thought there was no cost in doing the check. 
>>>>>> Unfortunately, at large scale and with direct launch, there is a 
>>>>>> non-zero cost to making this test. We are open to finding a compromise 
>>>>>> (perhaps turning the test off if requested?) if people feel strongly 
>>>>>> about performing the test (a sketch of the check follows below)
>>>>>> 
>>>>>> * reduced the amount of RTE data being automatically fetched, and 
>>>>>> fetched the rest only upon request. In particular, we no longer 
>>>>>> immediately fetch the hostname (which is only used for error reporting), 
>>>>>> but instead get it when needed. Likewise for the RML uri as that info is 
>>>>>> only required for some (not all) environments. In addition, we no longer 
>>>>>> fetch the locality unless required, relying instead on the PMI clique 
>>>>>> info to tell us who is on our local node (if additional info is 
>>>>>> required, the fetch is performed when a modex_recv is issued).
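
(The first and third items might reduce to a fetch-once-and-cache pattern 
roughly like the sketch below; every name here is a hypothetical stand-in, 
not the actual OPAL db API:)

    /* Sketch: on the first lookup involving a peer, pull that peer's
     * entire blob in one PMI round trip, decode every key/value pair,
     * and push the results into the local hash.  Later lookups -
     * hostname, RML uri, locality, etc. - are then served locally.
     * All function names below are hypothetical. */
    static int lookup(int peer, const char *key, char *out, size_t outlen)
    {
        if (!peer_cached(peer)) {
            char blob[4096];
            pmi_fetch_blob(peer, blob, sizeof(blob)); /* one PMI get */
            decode_all_into_hash(peer, blob);         /* cache everything */
            mark_peer_cached(peer);
        }
        return hash_lookup(peer, key, out, outlen);   /* purely local */
    }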
>>>>>> 
>>>>>> Again, all this only impacts direct launch - all the info is provided 
>>>>>> when launched via mpirun, as there is no added cost to getting it.
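
(And the OMPI_ARCH check from the second item is roughly the following; 
OPAL_ENABLE_HETEROGENEOUS_SUPPORT and opal_local_arch exist in the tree, 
while the fetch helper is a hypothetical stand-in:)

    #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT
        /* only exchange and compare architectures when the build
         * actually supports heterogeneous runs */
        uint32_t remote_arch;
        fetch_remote_arch(peer, &remote_arch);   /* hypothetical helper */
        if (remote_arch != opal_local_arch) {
            return OMPI_ERR_NOT_SUPPORTED;       /* system not homogeneous */
        }
    #endif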
>>>>>> 
>>>>>> Barring objections, we may move this (plus any other required pieces) to 
>>>>>> the 1.7 branch once it soaks for an appropriate time.
>>>>> 