Yeah, we should make that change, if you wouldn't mind doing it.


On Tue, Jun 24, 2014 at 9:43 AM, Gilles GOUAILLARDET <
gilles.gouaillar...@gmail.com> wrote:

> Ralph,
>
> That makes perfect sense.
>
> What about FCA_IS_LOCAL_PROCESS?
> Shall we keep it, or shall we use OPAL_PROC_ON_LOCAL_NODE directly?
>
> Cheers
>
> Gilles
>
> Ralph Castain <r...@open-mpi.org> wrote:
> Hi Gilles
>
> We discussed this at the devel conference this morning. The root cause of
> the problem is a test in coll/ml that we feel is incorrect - it basically
> checks to see if the proc itself is bound, and then assumes that all other
> procs are similarly bound. This in fact is never guaranteed to be true as
> someone could use the rank_file method to specify that some procs are to be
> left unbound, while others are to be bound to specified cpus.
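>
> To illustrate, the flawed assumption boils down to something like this
> (names are illustrative, not the actual coll/ml code):
>
>     #include <stdbool.h>
>
>     /* WRONG: infers every peer's binding from our own binding state;
>      * a rank_file mapping can leave some peers unbound even though we
>      * ourselves are bound */
>     static bool peers_are_bound(bool i_am_bound)
>     {
>         return i_am_bound;  /* unjustified generalization */
>     }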
>
> Nathan has looked at that check before and believes it isn't necessary.
> All coll/ml really needs to know is that the two procs share the same node,
> and the current locality algorithm will provide that information. We have
> asked him to "fix" the coll/ml selection logic to resolve that situation.
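>
> In other words, the only check coll/ml needs is along these lines (a
> sketch, assuming the usual proc_flags convention):
>
>     /* all coll/ml has to know: does the peer share our node? */
>     if (OPAL_PROC_ON_LOCAL_NODE(peer->proc_flags)) {
>         /* peer is on the same node; the module can engage */
>     }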
>
> After discussing the various locality definitions, it was our feeling
> that the current definition is probably the better one unless you have a
> reason for changing it other than coll/ml. If so, we'd be happy to
> revisit the proposal.
>
> Make sense?
> Ralph
>
>
>
> On Tue, Jun 24, 2014 at 3:24 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>> WHAT: semantic change of opal_hwloc_base_get_relative_locality
>>
>> WHY:  make it closer to what coll/ml expects.
>>
>>       Currently, opal_hwloc_base_get_relative_locality means "at what
>> level do these procs share cpus"; however, coll/ml uses it as "at what
>> level are these procs commonly bound".
>>
>>       It is important to note that if a task is bound to all the
>> available cpus, its locality should be set to OPAL_PROC_ON_NODE only.
>>       /* e.g. on a single socket Sandy Bridge system, use
>> OPAL_PROC_ON_NODE instead of OPAL_PROC_ON_L3CACHE */
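>>
>>       A sketch of the kind of guard this implies (uses public hwloc
>> calls; illustrative only, not the exact patch):
>>
>>           #include <hwloc.h>
>>
>>           /* If a proc's cpuset covers every allowed cpu, the proc is
>>            * effectively unbound, so only node-level locality applies. */
>>           static int effectively_unbound(hwloc_topology_t topo,
>>                                          hwloc_const_cpuset_t proc_cpuset)
>>           {
>>               hwloc_const_cpuset_t all =
>>                   hwloc_topology_get_allowed_cpuset(topo);
>>               /* true when the allowed cpuset is a subset of ours */
>>               return hwloc_bitmap_isincluded(all, proc_cpuset);
>>           }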
>>
>>       This was initially discussed on the devel mailing list:
>>       http://www.open-mpi.org/community/lists/devel/2014/06/15030.php
>>
>>       As advised by Ralph, I browsed the source code looking at how
>> (ompi_proc_t *)->proc_flags is used.
>>       So far, it is mainly used to figure out whether the proc is on the
>> same node or not.
>>
>>       Notable exceptions are:
>>        a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c :
>> OPAL_PROC_ON_LOCAL_SOCKET
>>        b) ompi/mca/coll/fca/coll_fca_module.c and
>> oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS
>>
>>       About a): the new definition fixes a hang in coll/ml.
>>       About b): FCA_IS_LOCAL_PROCESS looks like legacy code /* I could
>> only find OMPI_PROC_FLAG_LOCAL in v1.3 */, so this macro can simply be
>> removed and replaced with OPAL_PROC_ON_LOCAL_NODE.
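>>
>>       Concretely, the replacement would look something like this
>> (sketch; the surrounding fca code is paraphrased, not quoted verbatim):
>>
>>           /* before: legacy macro dating back to OMPI_PROC_FLAG_LOCAL */
>>           int local = FCA_IS_LOCAL_PROCESS(proc->proc_flags);
>>
>>           /* after: use the common locality flag directly */
>>           int local = OPAL_PROC_ON_LOCAL_NODE(proc->proc_flags);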
>>
>>       At this stage, I cannot find any objection to making the
>> described change.
>>       Please report one if you have any, and/or feel free to comment.
>>
>> WHERE: see the two attached patches
>>
>> TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago,
>> June 24-26.
>>          The RFC will become final only after the meeting.
>>          /* Ralph already added this topic to the agenda */
>>
>> Thanks
>>
>> Gilles