Re: [OMPI devel] MPI_Comm_spawn fails under certain conditions

2014-06-24 Thread Gilles Gouaillardet
Hi Ralph,

On 2014/06/25 2:51, Ralph Castain wrote:
> Had a chance to review this with folks here, and we think that having
> oversubscribe automatically set overload makes some sense. However, we do
> want to retain the ability to separately specify oversubscribe and overload
> as well since these two terms don't mean quite the same thing.
>
> Our proposal, therefore, is to have the --oversubscribe flag set both the
> --map-by :oversubscribe and --bind-to :overload-allowed properties. If
> someone specifies both the --oversubscribe flag and a conflicting directive
> for one or both of the individual properties, then we'll error out with a
> "bozo" message.
I fully agree.
> The use cases you describe are (minus the crash) correct, as the warning
> is only emitted when you are overloaded (i.e., trying to bind more
> processes than you have cpus). So you won't get any warning when running
> on three nodes, as you have enough cpus for all the procs, etc.
>
> I'll investigate the crash once I get home and have access to a cluster
> again. The problem likely has to do with not properly responding to the
> failure to spawn.
Hmm.

Because you already made the change described above (r32072), the crash
no longer occurs.

About the crash, I see things the other way around: the spawn should not
have failed.
/* or the spawn should also have failed when running on a single node, at
least for the sake of consistency */

But like I said, it works now, so it might be pedantic to point out a
bug that is still there but can no longer be triggered ...

Cheers,

Gilles


Re: [OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-24 Thread Gilles Gouaillardet
Ralph,

I pushed the change (r32079) and updated the wiki.

The RFC can now be closed. The consensus is that the semantics of
opal_hwloc_base_get_relative_locality
will not be changed, since the change is not needed: the hang is a coll/ml
bug, so it will be fixed within coll/ml.

Cheers,

Gilles

On 2014/06/25 1:12, Ralph Castain wrote:
> Yeah, we should make that change, if you wouldn't mind doing it.
>
>
>
> On Tue, Jun 24, 2014 at 9:43 AM, Gilles GOUAILLARDET <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Ralph,
>>
>> That makes perfect sense.
>>
>> What about FCA_IS_LOCAL_PROCESS?
>> Shall we keep it, or shall we use OPAL_PROC_ON_LOCAL_NODE directly?
>>
>> Cheers
>>
>> Gilles
>>
>> Ralph Castain  wrote:
>> Hi Gilles
>>
>> We discussed this at the devel conference this morning. The root cause of
>> the problem is a test in coll/ml that we feel is incorrect - it basically
>> checks to see if the proc itself is bound, and then assumes that all other
>> procs are similarly bound. This in fact is never guaranteed to be true as
>> someone could use the rank_file method to specify that some procs are to be
>> left unbound, while others are to be bound to specified cpus.
>>
>> Nathan has looked at that check before and believes it isn't necessary.
>> All coll/ml really needs to know is that the two procs share the same node,
>> and the current locality algorithm will provide that information. We have
>> asked him to "fix" the coll/ml selection logic to resolve that situation.
>>
>> We then discussed the various locality definitions, and it was our feeling
>> that the current definition is probably the better one unless you have a
>> reason for changing it other than coll/ml. If so, we'd be happy to revisit
>> the proposal.
>>
>> Make sense?
>> Ralph
>>
>>
>>
>> On Tue, Jun 24, 2014 at 3:24 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@iferc.org> wrote:
>>
>>> WHAT: semantic change of opal_hwloc_base_get_relative_locality
>>>
>>> WHY:  make it closer to what coll/ml expects.
>>>
>>>   Currently, opal_hwloc_base_get_relative_locality means "at what
>>> level do these procs share cpus";
>>>   however, coll/ml is using it as "at what level are these procs
>>> commonly bound".
>>>
>>>   It is important to note that if a task is bound to all the
>>> available cpus, locality should
>>>   be set to OPAL_PROC_ON_NODE only.
>>>   /* e.g. on a single socket Sandy Bridge system, use
>>> OPAL_PROC_ON_NODE instead of OPAL_PROC_ON_L3CACHE */
>>>
>>>   This was initially discussed on the devel mailing list:
>>>   http://www.open-mpi.org/community/lists/devel/2014/06/15030.php
>>>
>>>   As advised by Ralph, I browsed the source code looking for how the
>>> (ompi_proc_t *)->proc_flags is used.
>>>   So far, it is mainly used to figure out whether the proc is on the
>>> same node or not.
>>>
>>>   Notable exceptions are:
>>>a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c :
>>> OPAL_PROC_ON_LOCAL_SOCKET
>>>b) ompi/mca/coll/fca/coll_fca_module.c and
>>> oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS
>>>
>>>   About a), the new definition fixes a hang in coll/ml.
>>>   About b), FCA_IS_LOCAL_PROCESS looks like legacy code /* I could
>>> only find OMPI_PROC_FLAG_LOCAL in v1.3 */,
>>>   so this macro can simply be removed and replaced with
>>> OPAL_PROC_ON_LOCAL_NODE.
>>>
>>>   At this stage, I cannot find any objection to the described change.
>>>   Please report any objections and/or feel free to comment.
>>>
>>> WHERE: see the two attached patches
>>>
>>> TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago,
>>> June 24-26.
>>>  The RFC will become final only after the meeting.
>>>  /* Ralph already added this topic to the agenda */
>>>
>>> Thanks
>>>
>>> Gilles



Re: [OMPI devel] MPI_Comm_spawn fails under certain conditions

2014-06-24 Thread Ralph Castain
Hi Gilles

Had a chance to review this with folks here, and we think that having
oversubscribe automatically set overload makes some sense. However, we do
want to retain the ability to separately specify oversubscribe and overload
as well since these two terms don't mean quite the same thing.

Our proposal, therefore, is to have the --oversubscribe flag set both the
--map-by :oversubscribe and --bind-to :overload-allowed properties. If
someone specifies both the --oversubscribe flag and a conflicting directive
for one or both of the individual properties, then we'll error out with a
"bozo" message.

The use cases you describe are (minus the crash) correct, as the warning
is only emitted when you are overloaded (i.e., trying to bind more
processes than you have cpus). So you won't get any warning when running
on three nodes, as you have enough cpus for all the procs, etc.
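
For concreteness, here is a sketch of how the proposed flag would expand
on your reproducer (option spellings as proposed above; treat this as
illustrative, not final syntax):

mpirun --oversubscribe -np 16 --host slurm1,slurm2 ./intercomm_create

would behave as if you had typed

mpirun --map-by :oversubscribe --bind-to :overload-allowed -np 16
--host slurm1,slurm2 ./intercomm_create

and combining --oversubscribe with an explicit conflicting directive for
either property would abort with the "bozo" error instead.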

I'll investigate the crash once I get home and have access to a cluster
again. The problem likely has to do with not properly responding to the
failure to spawn.


On Tue, Jun 24, 2014 at 5:40 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> Folks,
>
> This issue is related to the failures reported by MTT on the trunk when
> the ibm test suite invokes MPI_Comm_spawn.
>
> My test bed is made of 3 (virtual) machines, each with 2 sockets and
> 8 cpus per socket.
>
> If I run on one host (without any batch manager):
>
> mpirun -np 16 --host slurm1 --oversubscribe --mca coll ^ml
> ./intercomm_create
>
> then the test succeeds with the following warning:
>
> --------------------------------------------------------------------------
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>Bind to: CORE
>Node:slurm2
>#processes:  2
>#cpus:   1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --------------------------------------------------------------------------
>
>
> Now, if I run on three hosts:
>
> mpirun -np 16 --host slurm1,slurm2,slurm3 --oversubscribe --mca coll ^ml
> ./intercomm_create
>
> then the test succeeds without any warning.
>
>
> But now, if I run on two hosts:
>
> mpirun -np 16 --host slurm1,slurm2 --oversubscribe --mca coll ^ml
> ./intercomm_create
>
> then the test fails.
>
> First, I get the same warning as before:
>
> --------------------------------------------------------------------------
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>Bind to: CORE
>Node:slurm2
>#processes:  2
>#cpus:   1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --------------------------------------------------------------------------
>
> followed by a crash:
>
> [slurm1:2482] *** An error occurred in MPI_Comm_spawn
> [slurm1:2482] *** reported by process [2068512769,0]
> [slurm1:2482] *** on communicator MPI_COMM_WORLD
> [slurm1:2482] *** MPI_ERR_SPAWN: could not spawn processes
> [slurm1:2482] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
> will now abort,
> [slurm1:2482] ***and potentially your MPI job)
>
>
> That being said, the following command works:
>
> mpirun -np 16 --host slurm1,slurm2 --mca coll ^ml --bind-to none
> ./intercomm_create
>
>
> 1) What does the first message mean?
> Is it a warning? /* if yes, why does mpirun on two hosts fail? */
> Is it a fatal error? /* if yes, why does mpirun on one host succeed? */
>
> 2) Generally speaking, and assuming the first message is a warning,
> should --oversubscribe automatically set overload-allowed?
> /* as far as I am concerned, that would be much more intuitive */
>
> Cheers,
>
> Gilles


Re: [OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-24 Thread Ralph Castain
Yeah, we should make that change, if you wouldn't mind doing it.



On Tue, Jun 24, 2014 at 9:43 AM, Gilles GOUAILLARDET <
gilles.gouaillar...@gmail.com> wrote:

> Ralph,
>
> That makes perfect sense.
>
> What about FCA_IS_LOCAL_PROCESS?
> Shall we keep it, or shall we use OPAL_PROC_ON_LOCAL_NODE directly?
>
> Cheers
>
> Gilles
>
> Ralph Castain  wrote:
> Hi Gilles
>
> We discussed this at the devel conference this morning. The root cause of
> the problem is a test in coll/ml that we feel is incorrect - it basically
> checks to see if the proc itself is bound, and then assumes that all other
> procs are similarly bound. This in fact is never guaranteed to be true as
> someone could use the rank_file method to specify that some procs are to be
> left unbound, while others are to be bound to specified cpus.
>
> Nathan has looked at that check before and believes it isn't necessary.
> All coll/ml really needs to know is that the two procs share the same node,
> and the current locality algorithm will provide that information. We have
> asked him to "fix" the coll/ml selection logic to resolve that situation.
>
> We then discussed the various locality definitions, and it was our feeling
> that the current definition is probably the better one unless you have a
> reason for changing it other than coll/ml. If so, we'd be happy to revisit
> the proposal.
>
> Make sense?
> Ralph
>
>
>
> On Tue, Jun 24, 2014 at 3:24 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>> WHAT: semantic change of opal_hwloc_base_get_relative_locality
>>
>> WHY:  make it closer to what coll/ml expects.
>>
>>   Currently, opal_hwloc_base_get_relative_locality means "at what
>> level do these procs share cpus";
>>   however, coll/ml is using it as "at what level are these procs
>> commonly bound".
>>
>>   It is important to note that if a task is bound to all the
>> available cpus, locality should
>>   be set to OPAL_PROC_ON_NODE only.
>>   /* e.g. on a single socket Sandy Bridge system, use
>> OPAL_PROC_ON_NODE instead of OPAL_PROC_ON_L3CACHE */
>>
>>   This was initially discussed on the devel mailing list:
>>   http://www.open-mpi.org/community/lists/devel/2014/06/15030.php
>>
>>   As advised by Ralph, I browsed the source code looking for how the
>> (ompi_proc_t *)->proc_flags is used.
>>   So far, it is mainly used to figure out whether the proc is on the
>> same node or not.
>>
>>   Notable exceptions are:
>>a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c :
>> OPAL_PROC_ON_LOCAL_SOCKET
>>b) ompi/mca/coll/fca/coll_fca_module.c and
>> oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS
>>
>>   About a), the new definition fixes a hang in coll/ml.
>>   About b), FCA_IS_LOCAL_PROCESS looks like legacy code /* I could
>> only find OMPI_PROC_FLAG_LOCAL in v1.3 */,
>>   so this macro can simply be removed and replaced with
>> OPAL_PROC_ON_LOCAL_NODE.
>>
>>   At this stage, I cannot find any objection to the described change.
>>   Please report any objections and/or feel free to comment.
>>
>> WHERE: see the two attached patches
>>
>> TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago,
>> June 24-26.
>>  The RFC will become final only after the meeting.
>>  /* Ralph already added this topic to the agenda */
>>
>> Thanks
>>
>> Gilles


Re: [OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-24 Thread Gilles GOUAILLARDET
Ralph,

That makes perfect sense.

What about FCA_IS_LOCAL_PROCESS?
Shall we keep it, or shall we use OPAL_PROC_ON_LOCAL_NODE directly?

Cheers

Gilles

Ralph Castain  wrote:
>Hi Gilles
>
>
>We discussed this at the devel conference this morning. The root cause of the 
>problem is a test in coll/ml that we feel is incorrect - it basically checks 
>to see if the proc itself is bound, and then assumes that all other procs are 
>similarly bound. This in fact is never guaranteed to be true as someone could 
>use the rank_file method to specify that some procs are to be left unbound, 
>while others are to be bound to specified cpus.
>
>
>Nathan has looked at that check before and believes it isn't necessary. All 
>coll/ml really needs to know is that the two procs share the same node, and 
>the current locality algorithm will provide that information. We have asked 
>him to "fix" the coll/ml selection logic to resolve that situation.
>
>
>We then discussed the various locality definitions, and it was our feeling
>that the current definition is probably the better one unless you have a 
>reason for changing it other than coll/ml. If so, we'd be happy to revisit the 
>proposal.
>
>
>Make sense?
>
>Ralph
>
>
>
>
>On Tue, Jun 24, 2014 at 3:24 AM, Gilles Gouaillardet 
> wrote:
>
>WHAT: semantic change of opal_hwloc_base_get_relative_locality
>
>WHY:  make it closer to what coll/ml expects.
>
>      Currently, opal_hwloc_base_get_relative_locality means "at what level do
>these procs share cpus";
>      however, coll/ml is using it as "at what level are these procs commonly
>bound".
>
>      It is important to note that if a task is bound to all the available
>cpus, locality should
>      be set to OPAL_PROC_ON_NODE only.
>      /* e.g. on a single socket Sandy Bridge system, use OPAL_PROC_ON_NODE
>instead of OPAL_PROC_ON_L3CACHE */
>
>      This was initially discussed on the devel mailing list:
>      http://www.open-mpi.org/community/lists/devel/2014/06/15030.php
>
>      As advised by Ralph, I browsed the source code looking for how the
>(ompi_proc_t *)->proc_flags is used.
>      So far, it is mainly used to figure out whether the proc is on the same
>node or not.
>
>      Notable exceptions are:
>       a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c :
>OPAL_PROC_ON_LOCAL_SOCKET
>       b) ompi/mca/coll/fca/coll_fca_module.c and
>oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS
>
>      About a), the new definition fixes a hang in coll/ml.
>      About b), FCA_IS_LOCAL_PROCESS looks like legacy code /* I could only
>find OMPI_PROC_FLAG_LOCAL in v1.3 */,
>      so this macro can simply be removed and replaced with
>OPAL_PROC_ON_LOCAL_NODE.
>
>      At this stage, I cannot find any objection to the described change.
>      Please report any objections and/or feel free to comment.
>
>WHERE: see the two attached patches
>
>TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago, June 
>24-26.
>         The RFC will become final only after the meeting.
>         /* Ralph already added this topic to the agenda */
>
>Thanks
>
>Gilles


Re: [OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-24 Thread Ralph Castain
Hi Gilles

We discussed this at the devel conference this morning. The root cause of
the problem is a test in coll/ml that we feel is incorrect - it basically
checks to see if the proc itself is bound, and then assumes that all other
procs are similarly bound. This in fact is never guaranteed to be true as
someone could use the rank_file method to specify that some procs are to be
left unbound, while others are to be bound to specified cpus.

Nathan has looked at that check before and believes it isn't necessary. All
coll/ml really needs to know is that the two procs share the same node, and
the current locality algorithm will provide that information. We have asked
him to "fix" the coll/ml selection logic to resolve that situation.
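
To illustrate, here is a minimal sketch (not the actual coll/ml source;
the helper name is mine, and I am assuming the usual proc_flags access
discussed in this thread):

#include "ompi/proc/proc.h"   /* ompi_proc_t; OPAL_PROC_ON_LOCAL_NODE is
                                 pulled in via opal's hwloc support */
#include <stdbool.h>

/* all the selection logic really needs: node-level locality, which holds
   regardless of how (or whether) each individual peer is bound */
static inline bool peer_shares_my_node(const ompi_proc_t *peer)
{
    return OPAL_PROC_ON_LOCAL_NODE(peer->proc_flags);
}

The problematic pattern, by contrast, checks only whether the local proc
is bound and then infers that every peer is bound the same way, which is
exactly what a rank_file mapping can violate.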

We then discussed the various locality definitions, and it was our feeling
that the current definition is probably the better one unless you have a
reason for changing it other than coll/ml. If so, we'd be happy to revisit
the proposal.

Make sense?
Ralph



On Tue, Jun 24, 2014 at 3:24 AM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

> WHAT: semantic change of opal_hwloc_base_get_relative_locality
>
> WHY:  make it closer to what coll/ml expects.
>
>   Currently, opal_hwloc_base_get_relative_locality means "at what
> level do these procs share cpus";
>   however, coll/ml is using it as "at what level are these procs
> commonly bound".
>
>   It is important to note that if a task is bound to all the available
> cpus, locality should
>   be set to OPAL_PROC_ON_NODE only.
>   /* e.g. on a single socket Sandy Bridge system, use
> OPAL_PROC_ON_NODE instead of OPAL_PROC_ON_L3CACHE */
>
>   This was initially discussed on the devel mailing list:
>   http://www.open-mpi.org/community/lists/devel/2014/06/15030.php
>
>   As advised by Ralph, I browsed the source code looking for how the
> (ompi_proc_t *)->proc_flags is used.
>   So far, it is mainly used to figure out whether the proc is on the
> same node or not.
>
>   Notable exceptions are:
>a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c :
> OPAL_PROC_ON_LOCAL_SOCKET
>b) ompi/mca/coll/fca/coll_fca_module.c and
> oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS
>
>   About a), the new definition fixes a hang in coll/ml.
>   About b), FCA_IS_LOCAL_PROCESS looks like legacy code /* I could
> only find OMPI_PROC_FLAG_LOCAL in v1.3 */,
>   so this macro can simply be removed and replaced with
> OPAL_PROC_ON_LOCAL_NODE.
>
>   At this stage, I cannot find any objection to the described change.
>   Please report any objections and/or feel free to comment.
>
> WHERE: see the two attached patches
>
> TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago, June
> 24-26.
>  The RFC will become final only after the meeting.
>  /* Ralph already added this topic to the agenda */
>
> Thanks
>
> Gilles


[OMPI devel] MPI_Comm_spawn fails under certain conditions

2014-06-24 Thread Gilles Gouaillardet
Folks,

This issue is related to the failures reported by MTT on the trunk when
the ibm test suite invokes MPI_Comm_spawn.

My test bed is made of 3 (virtual) machines, each with 2 sockets and
8 cpus per socket.

If I run on one host (without any batch manager):

mpirun -np 16 --host slurm1 --oversubscribe --mca coll ^ml
./intercomm_create

then the test succeeds with the following warning:

--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:slurm2
   #processes:  2
   #cpus:   1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------


Now, if I run on three hosts:

mpirun -np 16 --host slurm1,slurm2,slurm3 --oversubscribe --mca coll ^ml
./intercomm_create

then the test succeeds without any warning.


But now, if I run on two hosts:

mpirun -np 16 --host slurm1,slurm2 --oversubscribe --mca coll ^ml
./intercomm_create

then the test fails.

First, I get the same warning as before:

--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:slurm2
   #processes:  2
   #cpus:   1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------

followed by a crash:

[slurm1:2482] *** An error occurred in MPI_Comm_spawn
[slurm1:2482] *** reported by process [2068512769,0]
[slurm1:2482] *** on communicator MPI_COMM_WORLD
[slurm1:2482] *** MPI_ERR_SPAWN: could not spawn processes
[slurm1:2482] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
will now abort,
[slurm1:2482] ***and potentially your MPI job)


That being said, the following command works:

mpirun -np 16 --host slurm1,slurm2 --mca coll ^ml --bind-to none
./intercomm_create


1) What does the first message mean?
Is it a warning? /* if yes, why does mpirun on two hosts fail? */
Is it a fatal error? /* if yes, why does mpirun on one host succeed? */

2) Generally speaking, and assuming the first message is a warning,
should --oversubscribe automatically set overload-allowed?
/* as far as I am concerned, that would be much more intuitive */
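
As a side note, here is a minimal, self-contained sketch of catching the
spawn failure instead of aborting under MPI_ERRORS_ARE_FATAL (this is not
the ibm test itself, and the spawned binary name is made up):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm intercomm;
    int rc, errcodes[2];

    MPI_Init(&argc, &argv);
    /* return errors to the caller instead of aborting the whole job */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    rc = MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                        0, MPI_COMM_WORLD, &intercomm, errcodes);
    if (MPI_SUCCESS != rc) {
        fprintf(stderr, "MPI_Comm_spawn failed (rc=%d)\n", rc);
    }
    MPI_Finalize();
    return 0;
}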

Cheers,

Gilles



[OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-24 Thread Gilles Gouaillardet
WHAT: semantic change of opal_hwloc_base_get_relative_locality

WHY:  make it closer to what coll/ml expects.

  Currently, opal_hwloc_base_get_relative_locality means "at what level do
these procs share cpus";
  however, coll/ml is using it as "at what level are these procs commonly
bound".

  It is important to note that if a task is bound to all the available
cpus, locality should
  be set to OPAL_PROC_ON_NODE only.
  /* e.g. on a single socket Sandy Bridge system, use OPAL_PROC_ON_NODE
instead of OPAL_PROC_ON_L3CACHE */
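
  To make the distinction concrete, here is a minimal sketch in terms of
hwloc bitmaps (names follow the attached patch; the cpusets are assumed to
be already populated):

#include <hwloc.h>

/* current semantics: both procs merely have cpus at this level */
static int shared_current(hwloc_const_cpuset_t avail,
                          hwloc_const_cpuset_t loc1,
                          hwloc_const_cpuset_t loc2)
{
    return hwloc_bitmap_intersects(avail, loc1)
        && hwloc_bitmap_intersects(avail, loc2);
}

/* proposed semantics: both procs are bound within this level */
static int shared_proposed(hwloc_const_cpuset_t avail,
                           hwloc_const_cpuset_t loc1,
                           hwloc_const_cpuset_t loc2)
{
    hwloc_bitmap_t loc = hwloc_bitmap_alloc();
    int included;
    hwloc_bitmap_or(loc, loc1, loc2);
    included = hwloc_bitmap_isincluded(loc, avail);
    hwloc_bitmap_free(loc);
    return included;
}

  Note the attached patch also adds an early exit: if the combined cpuset
covers all of the node's available cpus, the procs are reported as
OPAL_PROC_ON_NODE only, without digging any further.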

  This was initially discussed on the devel mailing list:
  http://www.open-mpi.org/community/lists/devel/2014/06/15030.php

  As advised by Ralph, I browsed the source code looking for how the
(ompi_proc_t *)->proc_flags is used.
  So far, it is mainly used to figure out whether the proc is on the same
node or not.

  Notable exceptions are:
   a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c : 
OPAL_PROC_ON_LOCAL_SOCKET
   b) ompi/mca/coll/fca/coll_fca_module.c and 
oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS

  About a), the new definition fixes a hang in coll/ml.
  About b), FCA_IS_LOCAL_PROCESS looks like legacy code /* I could only find
OMPI_PROC_FLAG_LOCAL in v1.3 */,
  so this macro can simply be removed and replaced with
OPAL_PROC_ON_LOCAL_NODE.
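
  A hedged sketch of the b) cleanup at a call site (the helper name and
usage form are mine, not the fca code; see the attached patches for the
real edit):

#include "ompi/proc/proc.h"   /* ompi_proc_t */

static inline int fca_proc_is_local(const ompi_proc_t *proc)
{
    /* before: return FCA_IS_LOCAL_PROCESS(proc->proc_flags); */
    /* after:  call the generic opal macro directly           */
    return OPAL_PROC_ON_LOCAL_NODE(proc->proc_flags);
}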

  At this stage, I cannot find any objection to the described change.
  Please report any objections and/or feel free to comment.

WHERE: see the two attached patches

TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago, June 
24-26.
 The RFC will become final only after the meeting.
 /* Ralph already added this topic to the agenda */

Thanks

Gilles

Index: opal/mca/hwloc/base/hwloc_base_util.c
===================================================================
--- opal/mca/hwloc/base/hwloc_base_util.c   (revision 32067)
+++ opal/mca/hwloc/base/hwloc_base_util.c   (working copy)
@@ -13,6 +13,8 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC.
  * All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -1315,8 +1317,7 @@
 hwloc_cpuset_t avail;
 bool shared;
 hwloc_obj_type_t type;
-int sect1, sect2;
-hwloc_cpuset_t loc1, loc2;
+hwloc_cpuset_t loc1, loc2, loc;

 /* start with what we know - they share a node on a cluster
  * NOTE: we may alter that latter part as hwloc's ability to
@@ -1337,6 +1338,19 @@
 hwloc_bitmap_list_sscanf(loc1, cpuset1);
 loc2 = hwloc_bitmap_alloc();
 hwloc_bitmap_list_sscanf(loc2, cpuset2);
+loc = hwloc_bitmap_alloc();
+hwloc_bitmap_or(loc, loc1, loc2);
+
+width = hwloc_get_nbobjs_by_depth(topo, 0);
+for (w = 0; w < width; w++) {
+obj = hwloc_get_obj_by_depth(topo, 0, w);
+avail = opal_hwloc_base_get_available_cpus(topo, obj);
+if ( hwloc_bitmap_isequal(avail, loc) ) {
+/* the procs are bound to all the node cpus,
+   return without digging further */
+goto out;
+}
+}

 /* start at the first depth below the top machine level */
 for (d=1; d < depth; d++) {
@@ -1362,11 +1376,8 @@
 obj = hwloc_get_obj_by_depth(topo, d, w);
 /* get the available cpuset for this obj */
 avail = opal_hwloc_base_get_available_cpus(topo, obj);
-/* see if our locations intersect with it */
-sect1 = hwloc_bitmap_intersects(avail, loc1);
-sect2 = hwloc_bitmap_intersects(avail, loc2);
-/* if both intersect, then we share this level */
-if (sect1 && sect2) {
+/* see if our combined location is included */
+if ( hwloc_bitmap_isincluded(loc, avail) ) {
 shared = true;
 switch(obj->type) {
 case HWLOC_OBJ_NODE:
@@ -1410,9 +1421,11 @@
 }
 }

+out:
 opal_output_verbose(5, opal_hwloc_base_framework.framework_output,
 "locality: %s",
 opal_hwloc_base_print_locality(locality));
+hwloc_bitmap_free(loc);
 hwloc_bitmap_free(loc1);
 hwloc_bitmap_free(loc2);

Index: oshmem/mca/scoll/fca/scoll_fca.h
===================================================================
--- oshmem/mca/scoll/fca/scoll_fca.h(revision 32067)
+++ oshmem/mca/scoll/fca/scoll_fca.h(working copy)
@@ -1,12 +1,14 @@
 /**
- *   Copyright (c) 2013  Mellanox Technologies, Inc.
- *   All rights reserved.
- * $COPYRIGHT$
+ * Copyright (c) 2013  Mellanox Technologies, Inc.
+ * All rights reserved.
+ * Copyright (c) 2014  Research Organiza