Hi Ralph,

This commit fixes several MTT failures involving collectives on 
intercommunicators (allgather_inter and friends) that occurred when running 
with --mca mpi_procs_cutoff 0.
I could reproduce the issue with 8 or more tasks on two nodes 
(I am not sure whether two nodes matter here...)

In this case proc_list[i] might be a sentinel rather than a pointer to a real 
proc structure, so it is not always possible to simply dereference 
proc_list[i]->super.proc_name
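
To illustrate the idea, here is a minimal standalone sketch of a tagged-pointer 
sentinel. It is not the Open MPI code: the demo_* types and functions and the 
bit layout are hypothetical and only assume a 64-bit uintptr_t. The point is 
that a sentinel entry carries the process name in the pointer value itself, so 
the name has to be decoded instead of being read through the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a process name and a proc structure.
 * These are NOT the Open MPI types, and the encoding below is an assumption. */
typedef struct { uint32_t jobid; uint32_t vpid; } demo_name_t;
typedef struct { demo_name_t name; } demo_proc_t;

_Static_assert(sizeof(uintptr_t) >= 8, "sketch assumes a 64-bit uintptr_t");

/* Real structures are at least 2-byte aligned, so a set low bit can never
 * belong to a valid address and is used here to mark a sentinel. */
static bool demo_is_sentinel(const demo_proc_t *p)
{
    return ((uintptr_t) p & 1u) != 0;
}

/* Pack the name into a fake pointer: jobid in the upper 32 bits,
 * vpid in bits 1..31, tag in bit 0. The result must never be dereferenced. */
static demo_proc_t *demo_name_to_sentinel(demo_name_t n)
{
    assert(n.vpid < 0x80000000u); /* vpid must fit in 31 bits */
    uintptr_t v = ((uintptr_t) n.jobid << 32) | ((uintptr_t) n.vpid << 1) | 1u;
    return (demo_proc_t *) v;
}

/* Recover the name from a sentinel value. */
static demo_name_t demo_sentinel_to_name(uintptr_t v)
{
    demo_name_t n;
    n.jobid = (uint32_t) (v >> 32);
    n.vpid  = (uint32_t) ((v >> 1) & 0x7fffffffu);
    return n;
}

/* Safe accessor: decode sentinels, dereference only real pointers. */
static demo_name_t demo_proc_name(const demo_proc_t *p)
{
    if (demo_is_sentinel(p)) {
        return demo_sentinel_to_name((uintptr_t) p);
    }
    return p->name;
}
```

A loop over a proc list would call demo_proc_name(list[i]) for every entry 
instead of reading list[i]->name directly, which is the same shape as the 
check added in the commit.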

Note that this commit was incomplete, and I pushed a second one once I figured 
that out.

Cheers,

Gilles

Ralph Castain <[email protected]> wrote:
>Hi Gilles
>
>Could you please explain this one - I honestly don’t understand the change, 
>and haven’t encountered a problem.
>
>Thanks
>Ralph
>
>
>> On Jan 5, 2016, at 11:22 PM, [email protected] wrote:
>> 
>> This is an automated email from the git hooks/post-receive script. It was
>> generated because a ref change was pushed to the repository containing
>> the project "open-mpi/ompi".
>> 
>> The branch, master has been updated
>>       via  213b2abde47cf02ba3152a301d3ec0ffeec54438 (commit)
>>      from  e4bdad09c1bf7f11dada5ae6ac32e052b553ce4b (commit)
>> 
>> Those revisions listed above that are new to this repository have
>> not appeared on any other notification email; so we list those
>> revisions in full, below.
>> 
>> - Log -----------------------------------------------------------------
>> https://github.com/open-mpi/ompi/commit/213b2abde47cf02ba3152a301d3ec0ffeec54438
>> 
>> commit 213b2abde47cf02ba3152a301d3ec0ffeec54438
>> Author: Gilles Gouaillardet <[email protected]>
>> Date:   Wed Jan 6 16:21:13 2016 +0900
>> 
>>    dpm: correctly handle procs_cutoff in ompi_dpm_connect_accept()
>> 
>> diff --git a/ompi/dpm/dpm.c b/ompi/dpm/dpm.c
>> index 9a236d0..b1c562e 100644
>> --- a/ompi/dpm/dpm.c
>> +++ b/ompi/dpm/dpm.c
>> @@ -16,7 +16,7 @@
>>  * Copyright (c) 2011-2015 Los Alamos National Security, LLC.  All rights
>>  *                         reserved.
>>  * Copyright (c) 2013-2015 Intel, Inc. All rights reserved
>> - * Copyright (c) 2014-2015 Research Organization for Information Science
>> + * Copyright (c) 2014-2016 Research Organization for Information Science
>>  *                         and Technology (RIST). All rights reserved.
>>  * $COPYRIGHT$
>>  *
>> @@ -167,7 +167,13 @@ int ompi_dpm_connect_accept(ompi_communicator_t *comm, int root,
>>             dense = false;
>>         }
>>         for (i=0; i < size; i++) {
>> -            rc = opal_convert_process_name_to_string(&nstring, &(proc_list[i]->super.proc_name));
>> +            opal_process_name_t proc_name;
>> +            if (ompi_proc_is_sentinel (proc_list[i])) {
>> +                proc_name = ompi_proc_sentinel_to_name ((intptr_t) proc_list[i]);
>> +            } else {
>> +                proc_name = proc_list[i]->super.proc_name;
>> +            }
>> +            rc = opal_convert_process_name_to_string(&nstring, &proc_name);
>>             if (OPAL_SUCCESS != rc) {
>>                 if (!dense) {
>>                     free(proc_list);
>> 
>> 
>> -----------------------------------------------------------------------
>> 
>> Summary of changes:
>> ompi/dpm/dpm.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>> 
>> 
>> hooks/post-receive
>> -- 
>> open-mpi/ompi
>> _______________________________________________
>> ompi-commits mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/ompi-commits
>
>_______________________________________________
>devel mailing list
>[email protected]
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2016/01/18473.php
