Yeah, I think that's the right solution. We'll have to check the impact on the 
rest of the code, but I -think- it will be okay - else we'll have to make some 
tweaks here and there. Either way, it's still the right answer, I think.
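
For anyone skimming the thread: hwloc only honors topology flags that are 
set between hwloc_topology_init() and hwloc_topology_load(), so the whole 
change really is the three added lines in the diff quoted below. A minimal 
standalone sketch of the same sequence (not the actual OMPI code; error 
handling trimmed):

    #include <hwloc.h>

    static int load_whole_topology(hwloc_topology_t *topo)
    {
        if (0 != hwloc_topology_init(topo)) {
            return -1;
        }
        /* flags must be set after init() and before load() */
        if (0 != hwloc_topology_set_flags(*topo,
                                          HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM |
                                          HWLOC_TOPOLOGY_FLAG_WHOLE_IO) ||
            0 != hwloc_topology_load(*topo)) {
            hwloc_topology_destroy(*topo);
            return -1;
        }
        return 0;
    }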

On Feb 9, 2012, at 6:14 AM, Jeff Squyres wrote:

> Should we just do this, then:
> 
> Index: mca/hwloc/base/hwloc_base_util.c
> ===================================================================
> --- mca/hwloc/base/hwloc_base_util.c  (revision 25885)
> +++ mca/hwloc/base/hwloc_base_util.c  (working copy)
> @@ -173,6 +173,9 @@
>                          "hwloc:base:get_topology"));
> 
>     if (0 != hwloc_topology_init(&opal_hwloc_topology) ||
> +        0 != hwloc_topology_set_flags(opal_hwloc_topology, 
> +                                      (HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM |
> +                                       HWLOC_TOPOLOGY_FLAG_WHOLE_IO)) ||
>         0 != hwloc_topology_load(opal_hwloc_topology)) {
>         return OPAL_ERR_NOT_SUPPORTED;
>     }
> 
> 
> 
> On Feb 9, 2012, at 8:04 AM, Ralph Castain wrote:
> 
>> Yes, I missed that point before - too early in the morning :-/
>> 
>> As I said in my last note, it would be nice to either have a flag indicating 
>> we are bound, or see all the cpu info so we can compute that we are bound. 
>> Either way, we still need to have a complete picture of all I/O devices so 
>> you can compute the distance.
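>> 
>> (To illustrate the "compute that we are bound" part -- a rough hwloc-only 
>> sketch, not what our code does today; it assumes the topology was loaded 
>> with HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM so the full machine is visible: 
>> 
>>     #include <hwloc.h>
>> 
>>     /* "bound" == our current binding is narrower than the whole machine */
>>     static int process_is_bound(hwloc_topology_t topo)
>>     {
>>         int bound = 0;
>>         hwloc_bitmap_t set = hwloc_bitmap_alloc();
>>         if (0 == hwloc_get_cpubind(topo, set, HWLOC_CPUBIND_PROCESS)) {
>>             bound = !hwloc_bitmap_isequal(set,
>>                         hwloc_topology_get_topology_cpuset(topo));
>>         }
>>         hwloc_bitmap_free(set);
>>         return bound;
>>     }
>> )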
>> 
>> 
>> On Feb 9, 2012, at 6:01 AM, nadia.der...@bull.net wrote:
>> 
>>> 
>>> 
>>> devel-boun...@open-mpi.org wrote on 02/09/2012 01:32:31 PM:
>>> 
>>>> From: Ralph Castain <r...@open-mpi.org> 
>>>> To: Open MPI Developers <de...@open-mpi.org> 
>>>> Date: 02/09/2012 01:32 PM 
>>>> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
>>>> processes as bound if the job has been launched by srun 
>>>> Sent by: devel-boun...@open-mpi.org 
>>>> 
>>>> Hi Nadia 
>>>> 
>>>> I'm wondering what value there is in showing the full topology, or 
>>>> using it in any of our components, if the process is restricted to a
>>>> specific set of cpus? Does it really help to know that there are 
>>>> other cpus out there that are unreachable? 
>>> 
>>> Ralph, 
>>> 
>>> The intention here is not to show cpus that are unreachable, but to fix an 
>>> issue we have at least in get_ib_dev_distance() in the openib btl. 
>>> 
>>> The problem is that if a process is restricted to a single CPU, the 
>>> algorithm used in get_ib_dev_distance doesn't work at all: 
>>> I have 2 ib interfaces on my victim (say mlx4_0 and mlx4_1), and I want the 
>>> openib btl to select the one that is the closest to my rank. 
>>> 
>>> As I said in my first e-mail, here is what is done today: 
>>>  . opal_paffinity_base_get_processor_info() is called to get the number of 
>>> logical processors (we get 1 due to the singleton cpuset -- see the snippet 
>>> right after this list)
>>>  . we loop over that # of processors to check whether our process is bound 
>>> to one of them. In our case the loop will be executed only once and we will 
>>> never get the correct binding information.
>>>  . if the process is bound, we actually compute the distance to the device. 
>>> In our case the distance won't be computed, and mlx4_0 will be seen 
>>> as "equivalent" to mlx4_1 in terms of distance. This is what I definitely 
>>> want to avoid. 
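>>> 
>>> (A quick way to see the restricted view mentioned in the first bullet -- a 
>>> sketch only, nothing OMPI-specific: 
>>> 
>>>     #include <hwloc.h>
>>> 
>>>     /* inside the singleton cpuset this returns 1; once the topology is
>>>        loaded with HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM it returns the real
>>>        number of logical processors on the node */
>>>     static int visible_pus(hwloc_topology_t topo)
>>>     {
>>>         return hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
>>>     }
>>> )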
>>> 
>>> Regards, 
>>> Nadia 
>>> 
>>>> 
>>>> On Feb 9, 2012, at 5:15 AM, nadia.der...@bull.net wrote: 
>>>> 
>>>> 
>>>> 
>>>> devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:
>>>> 
>>>>> From: Brice Goglin <brice.gog...@inria.fr> 
>>>>> To: Open MPI Developers <de...@open-mpi.org> 
>>>>> Date: 02/09/2012 12:20 PM 
>>>>> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
>>>>> processes as bound if the job has been launched by srun 
>>>>> Sent by: devel-boun...@open-mpi.org 
>>>>> 
>>>>> By default, hwloc only shows what's inside the current cpuset. There's
>>>>> an option to show everything instead (topology flag). 
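>>>>> 
>>>>> (To name the option explicitly -- the flag below is from the hwloc 1.x 
>>>>> API, and lstopo has a matching --whole-system switch for inspecting the 
>>>>> full machine by hand: 
>>>>> 
>>>>>     /* must be set between hwloc_topology_init() and _load() */
>>>>>     hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
>>>>> )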
>>>> 
>>>> So maybe using that flag inside 
>>>> opal_paffinity_base_get_processor_info() would be a better fix than 
>>>> the one I'm proposing in my patch. 
>>>> 
>>>> I found a bunch of other places where things are handled the same way as 
>>>> in get_ib_dev_distance(). 
>>>> 
>>>> Just doing a grep in the sources, I could find: 
>>>>  . init_maffinity() in btl/sm/btl_sm.c 
>>>>  . vader_init_maffinity() in btl/vader/btl_vader.c 
>>>>  . get_ib_dev_distance() in btl/wv/btl_wv_component.c 
>>>> 
>>>> So I think the flag Brice is talking about should definitely be the fix. 
>>>> 
>>>> Regards, 
>>>> Nadia 
>>>> 
>>>>> 
>>>>> Brice
>>>>> 
>>>>> 
>>>>> 
>>>>> On 09/02/2012 12:18, Jeff Squyres wrote:
>>>>>> Just so that I understand this better -- if a process is bound in 
>>>>>> a cpuset, will tools like hwloc's lstopo only show the Linux 
>>>>>> processors *in that cpuset*?  I.e., does it not have any visibility 
>>>>>> of the processors outside of its cpuset?
>>>>>> 
>>>>>> 
>>>>>> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
>>>>>>> is configured with:
>>>>>>>  TaskPlugin=task/affinity
>>>>>>>  TaskPluginParam=Cpusets
>>>>>>> 
>>>>>>> each rank of that job is in a cpuset that contains a single CPU.
>>>>>>> 
>>>>>>> Now, if we use carto on top of this, the following happens in
>>>>>>> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
>>>>>>>  . opal_paffinity_base_get_processor_info() is called to get the
>>>>>>>    number of logical processors (we get 1 due to the singleton cpuset)
>>>>>>>  . we loop over that # of processors to check whether our process is
>>>>>>>    bound to one of them. In our case the loop will be executed only
>>>>>>>    once and we will never get the correct binding information.
>>>>>>>  . if the process is bound, we actually compute the distance to the device.
>>>>>>>    In our case we won't execute that part of the code.
>>>>>>> 
>>>>>>> The attached patch is a proposal to fix the issue.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Nadia
>>>>>>> 
>>>>>>> <get_ib_dev_distance.patch>
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 