devel-boun...@open-mpi.org wrote on 02/09/2012 01:32:31 PM:

> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI Developers <de...@open-mpi.org>
> Date: 02/09/2012 01:32 PM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
> 
> Hi Nadia
> 
> I'm wondering what value there is in showing the full topology, or 
> using it in any of our components, if the process is restricted to a
> specific set of cpus? Does it really help to know that there are 
> other cpus out there that are unreachable?

Ralph,

The intention here is not to show cpus that are unreachable, but to fix an 
issue we have at least in get_ib_dev_distance() in the openib btl.

The problem is that if a process is restricted to a single CPU, the 
algorithm used in get_ib_dev_distance() doesn't work at all.
I have 2 IB interfaces on my test node (say mlx4_0 and mlx4_1), and I want 
the openib btl to select the one that is closest to my rank.
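
To make the selection problem concrete, here is a minimal sketch in C. It is not the openib btl code: struct ib_dev, pick_closest() and DIST_UNKNOWN are hypothetical names used only for illustration, and the distance values in the note below are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker meaning "the distance was never computed". */
#define DIST_UNKNOWN (-1)

/* Hypothetical description of one IB device as seen from this rank. */
struct ib_dev {
    const char *name;
    int distance;   /* smaller is closer; DIST_UNKNOWN if not computed */
};

/* Return the index of the closest device, or -1 when no distance is
 * known and the devices therefore cannot be ranked. */
int pick_closest(const struct ib_dev *devs, size_t n)
{
    int best = -1;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].distance == DIST_UNKNOWN)
            continue;   /* nothing to compare for this device */
        if (best < 0 || devs[i].distance < devs[best].distance)
            best = (int)i;
    }
    return best;
}
```

With a distance of 20 for mlx4_0 and 10 for mlx4_1, pick_closest() returns index 1 (mlx4_1). When both distances are DIST_UNKNOWN it returns -1: nothing distinguishes the two devices, which is the situation described in this thread.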

As I said in my first e-mail, here is what is done today:
   . opal_paffinity_base_get_processor_info() is called to get the number
     of logical processors (we get 1 due to the singleton cpuset).
   . We loop over that number of processors to check whether our process
     is bound to one of them. In our case the loop is executed only once,
     and we never get the correct binding information.
   . If the process is bound, we actually get the distance to the device.
     In our case the distance won't be computed, and mlx4_0 will be seen
     as "equivalent" to mlx4_1 in terms of distance. This is what I
     definitely want to avoid.
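
The steps above can be modelled with a small sketch in C. This is a toy model, not the Open MPI code: the cpuset is a plain bitmask, NODE_CPUS and all function names are hypothetical, and I am assuming that the loop tests processor ids 0..count-1 against the binding.

```c
#include <assert.h>

/* Hypothetical node size: processors have ids 0..NODE_CPUS-1. */
#define NODE_CPUS 8

/* Number of processors visible inside the cpuset; stands in for the
 * count reported when only the cpuset is visible. */
int visible_cpus(unsigned cpuset)
{
    int n = 0;

    for (int id = 0; id < NODE_CPUS; id++)
        if (cpuset & (1u << id))
            n++;
    return n;
}

/* Behaviour described above: loop over ids 0..count-1 only.
 * Returns the id we are bound to, or -1 if none was found. */
int find_binding_restricted(unsigned cpuset)
{
    int count = visible_cpus(cpuset);

    assert(count <= NODE_CPUS);
    for (int id = 0; id < count; id++)
        if (cpuset & (1u << id))
            return id;
    return -1;
}

/* Alternative: loop over every processor id on the node. */
int find_binding_full(unsigned cpuset)
{
    for (int id = 0; id < NODE_CPUS; id++)
        if (cpuset & (1u << id))
            return id;
    return -1;
}
```

For a rank confined to processor 5 (cpuset = 1u << 5), find_binding_restricted() only tests id 0 and returns -1, while find_binding_full() returns 5. In this model the restricted loop finds the binding only when the cpuset happens to start at id 0.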

Regards,
Nadia

> 
> On Feb 9, 2012, at 5:15 AM, nadia.der...@bull.net wrote:
> 
> 
> 
> devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:
> 
> > From: Brice Goglin <brice.gog...@inria.fr>
> > To: Open MPI Developers <de...@open-mpi.org>
> > Date: 02/09/2012 12:20 PM
> > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> > processes as bound if the job has been launched by srun
> > Sent by: devel-boun...@open-mpi.org
> > 
> > By default, hwloc only shows what's inside the current cpuset. There's
> > an option to show everything instead (topology flag). 
> 
> So maybe using that flag inside 
> opal_paffinity_base_get_processor_info() would be a better fix than 
> the one I'm proposing in my patch. 
> 
> I found a bunch of other places where things are managed as in 
> get_ib_dev_distance(). 
> 
> Just doing a grep in the sources, I could find: 
>   . init_maffinity() in btl/sm/btl_sm.c 
>   . vader_init_maffinity() in btl/vader/btl_vader.c 
>   . get_ib_dev_distance() in btl/wv/btl_wv_component.c 
> 
> So I think the flag Brice is talking about should definitely be the fix. 

> 
> Regards, 
> Nadia 
> 
> > 
> > Brice
> > 
> > 
> > 
> > On 09/02/2012 12:18, Jeff Squyres wrote:
> > > Just so that I understand this better -- if a process is bound in
> > > a cpuset, will tools like hwloc's lstopo only show the Linux
> > > processors *in that cpuset*?  I.e., does it not have any visibility
> > > of the processors outside of its cpuset?
> > >
> > >
> > > On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
> > >
> > >> Hi,
> > >>
> > >> If a job is launched using "srun --resv-ports --cpu_bind:..." and
> > >> slurm is configured with:
> > >>   TaskPlugin=task/affinity
> > >>   TaskPluginParam=Cpusets
> > >>
> > >> each rank of that job is in a cpuset that contains a single CPU.
> > >>
> > >> Now, if we use carto on top of this, the following happens in
> > >> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
> > >>   . opal_paffinity_base_get_processor_info() is called to get the
> > >>     number of logical processors (we get 1 due to the singleton
> > >>     cpuset)
> > >>   . we loop over that # of processors to check whether our process
> > >>     is bound to one of them. In our case the loop will be executed
> > >>     only once and we will never get the correct binding information.
> > >>   . if the process is bound actually get the distance to the
> > >>     device.
> > >>     in our case we won't execute that part of the code.
> > >>
> > >> The attached patch is a proposal to fix the issue.
> > >>
> > >> Regards,
> > >> Nadia
> > >> 
> > >> <get_ib_dev_distance.patch>
> > >> _______________________________________________
> > >> devel mailing list
> > >> de...@open-mpi.org
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >
> > 
