Yeah, I think that's the right solution. We'll have to check the impact on the rest of the code, but I -think- it will be okay; if not, we'll make some tweaks here and there. Either way, it's still the right answer.
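For anyone following along, Jeff's hunk boils down to the standalone sequence below (my rough, untested sketch against the hwloc 1.x API; HWLOC_TOPOLOGY_FLAG_WHOLE_IO needs hwloc >= 1.3 built with I/O support):

  #include <hwloc.h>
  #include <stdio.h>

  int main(void)
  {
      hwloc_topology_t topo;
      hwloc_obj_t dev = NULL;
      int nio = 0;

      /* Same pattern as the proposed opal change: bail out if any of
       * the three hwloc calls fails. */
      if (0 != hwloc_topology_init(&topo) ||
          0 != hwloc_topology_set_flags(topo,
                   (HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM |
                    HWLOC_TOPOLOGY_FLAG_WHOLE_IO)) ||
          0 != hwloc_topology_load(topo)) {
          fprintf(stderr, "hwloc topology setup failed\n");
          return 1;
      }

      /* With both flags we should see every PU on the machine, even from
       * inside a singleton cpuset, plus the OS-level I/O devices. */
      while (NULL != (dev = hwloc_get_next_osdev(topo, dev)))
          nio++;
      printf("PUs: %d, OS devices: %d\n",
             hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU), nio);

      hwloc_topology_destroy(topo);
      return 0;
  }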
On Feb 9, 2012, at 6:14 AM, Jeff Squyres wrote:

> Should we just do this, then:
> 
> Index: mca/hwloc/base/hwloc_base_util.c
> ===================================================================
> --- mca/hwloc/base/hwloc_base_util.c    (revision 25885)
> +++ mca/hwloc/base/hwloc_base_util.c    (working copy)
> @@ -173,6 +173,9 @@
>                           "hwloc:base:get_topology"));
>  
>      if (0 != hwloc_topology_init(&opal_hwloc_topology) ||
> +        0 != hwloc_topology_set_flags(opal_hwloc_topology,
> +                                      (HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM |
> +                                       HWLOC_TOPOLOGY_FLAG_WHOLE_IO)) ||
>          0 != hwloc_topology_load(opal_hwloc_topology)) {
>          return OPAL_ERR_NOT_SUPPORTED;
>      }
> 
> 
> On Feb 9, 2012, at 8:04 AM, Ralph Castain wrote:
> 
>> Yes, I missed that point before - too early in the morning :-/
>> 
>> As I said in my last note, it would be nice to either have a flag indicating we are bound, or see all the cpu info so we can compute that we are bound. Either way, we still need to have a complete picture of all I/O devices so you can compute the distance.
>> 
>> 
>> On Feb 9, 2012, at 6:01 AM, nadia.der...@bull.net wrote:
>> 
>>> devel-boun...@open-mpi.org wrote on 02/09/2012 01:32:31 PM:
>>> 
>>>> From: Ralph Castain <r...@open-mpi.org>
>>>> To: Open MPI Developers <de...@open-mpi.org>
>>>> Date: 02/09/2012 01:32 PM
>>>> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun
>>>> Sent by: devel-boun...@open-mpi.org
>>>> 
>>>> Hi Nadia
>>>> 
>>>> I'm wondering what value there is in showing the full topology, or using it in any of our components, if the process is restricted to a specific set of cpus? Does it really help to know that there are other cpus out there that are unreachable?
>>> 
>>> Ralph,
>>> 
>>> The intention here is not to show cpus that are unreachable, but to fix an issue we have at least in get_ib_dev_distance() in the openib btl.
>>> 
>>> The problem is that if a process is restricted to a single CPU, the algorithm used in get_ib_dev_distance() doesn't work at all: I have two ib interfaces on my victim (say mlx4_0 and mlx4_1), and I want the openib btl to select the one that is closest to my rank.
>>> 
>>> As I said in my first e-mail, here is what is done today:
>>> . opal_paffinity_base_get_processor_info() is called to get the number of logical processors (we get 1 due to the singleton cpuset)
>>> . we loop over that # of processors to check whether our process is bound to one of them. In our case the loop will be executed only once, and we will never get the correct binding information.
>>> . if the process is bound, actually get the distance to the device. In our case the distance won't be computed, and mlx4_0 will be seen as "equivalent" to mlx4_1 in terms of distance. This is what I definitely want to avoid.
>>> 
>>> Regards,
>>> Nadia
>>> 
>>>> On Feb 9, 2012, at 5:15 AM, nadia.der...@bull.net wrote:
>>>> 
>>>> devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:
>>>> 
>>>>> From: Brice Goglin <brice.gog...@inria.fr>
>>>>> To: Open MPI Developers <de...@open-mpi.org>
>>>>> Date: 02/09/2012 12:20 PM
>>>>> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun
>>>>> Sent by: devel-boun...@open-mpi.org
>>>>> 
>>>>> By default, hwloc only shows what's inside the current cpuset. There's an option to show everything instead (topology flag).
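>>>>> Something like this shows the difference (untested sketch, hwloc 1.x; error handling omitted):
>>>>> 
>>>>>   #include <hwloc.h>
>>>>>   #include <stdio.h>
>>>>> 
>>>>>   int main(void)
>>>>>   {
>>>>>       hwloc_topology_t topo;
>>>>>       hwloc_topology_init(&topo);
>>>>>       /* Without this flag, hwloc_topology_load() restricts the tree
>>>>>        * to the caller's cpuset, so a singleton cpuset shows 1 PU. */
>>>>>       hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
>>>>>       hwloc_topology_load(topo);
>>>>>       printf("PUs visible: %d\n",
>>>>>              hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));
>>>>>       hwloc_topology_destroy(topo);
>>>>>       return 0;
>>>>>   }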
>>>> So maybe using that flag inside opal_paffinity_base_get_processor_info() would be a better fix than the one I'm proposing in my patch.
>>>> 
>>>> I found a bunch of other places where things are managed as in get_ib_dev_distance().
>>>> 
>>>> Just doing a grep in the sources, I could find:
>>>> . init_maffinity() in btl/sm/btl_sm.c
>>>> . vader_init_maffinity() in btl/vader/btl_vader.c
>>>> . get_ib_dev_distance() in btl/wv/btl_wv_component.c
>>>> 
>>>> So I think the flag Brice is talking about should definitely be the fix.
>>>> 
>>>> Regards,
>>>> Nadia
>>>> 
>>>>> Brice
>>>>> 
>>>>> 
>>>>> On 09/02/2012 12:18, Jeff Squyres wrote:
>>>>>> Just so that I understand this better -- if a process is bound in a cpuset, will tools like hwloc's lstopo only show the Linux processors *in that cpuset*? I.e., does it not have any visibility of the processors outside of its cpuset?
>>>>>> 
>>>>>> 
>>>>>> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm is configured with:
>>>>>>> TaskPlugin=task/affinity
>>>>>>> TaskPluginParam=Cpusets
>>>>>>> 
>>>>>>> each rank of that job is in a cpuset that contains a single CPU.
>>>>>>> 
>>>>>>> Now, if we use carto on top of this, the following happens in get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
>>>>>>> . opal_paffinity_base_get_processor_info() is called to get the number of logical processors (we get 1 due to the singleton cpuset)
>>>>>>> . we loop over that # of processors to check whether our process is bound to one of them. In our case the loop will be executed only once, and we will never get the correct binding information.
>>>>>>> . if the process is bound, actually get the distance to the device. In our case we won't execute that part of the code.
>>>>>>> 
>>>>>>> The attached patch is a proposal to fix the issue.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Nadia
>>>>>>> 
>>>>>>> <get_ib_dev_distance.patch>
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
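One more thought while it's fresh: once the topology is loaded with both flags, the binding computation I mentioned earlier and the device locality that get_ib_dev_distance() needs can be read straight out of hwloc, with no paffinity loop at all. Rough sketch only (hypothetical code, not the actual btl; untested, assumes hwloc >= 1.3 built with I/O support):

  #include <hwloc.h>
  #include <stdio.h>

  int main(void)
  {
      hwloc_topology_t topo;
      hwloc_bitmap_t bound = hwloc_bitmap_alloc();
      hwloc_obj_t dev = NULL;

      hwloc_topology_init(&topo);
      hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM |
                                     HWLOC_TOPOLOGY_FLAG_WHOLE_IO);
      hwloc_topology_load(topo);

      /* Where are we bound? Meaningful even from inside a singleton
       * cpuset, because WHOLE_SYSTEM keeps the whole machine in the tree. */
      if (0 != hwloc_get_cpubind(topo, bound, HWLOC_CPUBIND_PROCESS)) {
          fprintf(stderr, "binding not available\n");
          return 1;
      }

      /* Walk the OS devices; the OpenFabrics ones are the mlx4_0/mlx4_1
       * HCAs from Nadia's example. */
      while (NULL != (dev = hwloc_get_next_osdev(topo, dev))) {
          hwloc_obj_t anc;
          if (HWLOC_OBJ_OSDEV_OPENFABRICS != dev->attr->osdev.type)
              continue;
          /* Climb to the first ancestor that has a cpuset: that gives
           * the locality of the HCA. */
          for (anc = dev->parent; NULL != anc && NULL == anc->cpuset;
               anc = anc->parent) {}
          printf("%s is %s our binding\n", dev->name,
                 (NULL != anc && hwloc_bitmap_isincluded(bound, anc->cpuset))
                     ? "close to" : "far from");
      }

      hwloc_bitmap_free(bound);
      hwloc_topology_destroy(topo);
      return 0;
  }

If that holds up, the sm/vader/wv spots Nadia found could use the same pattern once the base util code sets the flags.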