Resending, as I didn't get any answer.

Regards,
Nadia

--
Nadia Derbey
devel-boun...@open-mpi.org wrote on 01/27/2012 05:38:34 PM:

> From: "nadia.derbey" <nadia.der...@bull.net>
> To: Open MPI Developers <de...@open-mpi.org>
> Date: 01/27/2012 05:35 PM
> Subject: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
>
> Hi,
>
> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
> is configured with:
>    TaskPlugin=task/affinity
>    TaskPluginParam=Cpusets
>
> each rank of that job is in a cpuset that contains a single CPU.
>
> Now, if we use carto on top of this, the following happens in
> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
>    . opal_paffinity_base_get_processor_info() is called to get the
>      number of logical processors (we get 1 due to the singleton cpuset)
>    . we loop over that # of processors to check whether our process is
>      bound to one of them. In our case the loop will be executed only
>      once and we will never get the correct binding information.
>    . if the process is bound, we actually get the distance to the device.
>      In our case we won't execute that part of the code.
>
> The attached patch is a proposal to fix the issue.
>
> Regards,
> Nadia
> [attachment "get_ib_dev_distance.patch" deleted by Nadia Derbey/FR/BULL]
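To make the failure mode described above concrete, here is a minimal, self-contained sketch in C. It is not the code from btl_openib_component.c and it is not the attached patch. The cpuset is modelled as a plain bitmask, and every name in it (cpu_mask_t, count_cpus, find_bound_cpu_naive, find_bound_cpu_full, MAX_CPUS) is hypothetical. It only illustrates why using the number of CPUs in a singleton cpuset as the loop bound can miss the CPU the process is really bound to.

```c
#include <assert.h>

/* Hypothetical upper bound on CPU ids for this sketch. */
#define MAX_CPUS 64

/* Hypothetical stand-in for a cpuset: bit i set means CPU i is allowed. */
typedef unsigned long long cpu_mask_t;

/* Number of CPUs the process may run on (1 for a singleton cpuset). */
static int count_cpus(cpu_mask_t mask)
{
    int n = 0;
    for (int id = 0; id < MAX_CPUS; id++) {
        if ((mask >> id) & 1ULL) {
            n++;
        }
    }
    return n;
}

/* Flawed pattern: the count of visible CPUs is used as the loop bound,
 * so only ids 0..count-1 are tested. Returns the bound CPU id or -1. */
static int find_bound_cpu_naive(cpu_mask_t mask)
{
    int visible = count_cpus(mask);
    for (int id = 0; id < visible; id++) {
        if ((mask >> id) & 1ULL) {
            return id;
        }
    }
    return -1; /* process looks unbound */
}

/* One possible alternative: test every possible CPU id.
 * Returns the first bound CPU id or -1. */
static int find_bound_cpu_full(cpu_mask_t mask)
{
    for (int id = 0; id < MAX_CPUS; id++) {
        if ((mask >> id) & 1ULL) {
            return id;
        }
    }
    return -1;
}

/* Usage: a rank confined to CPU 5 only, as with a singleton cpuset. */
static void demo_singleton_cpuset(void)
{
    cpu_mask_t mask = 1ULL << 5;
    assert(find_bound_cpu_naive(mask) == -1); /* binding is missed */
    assert(find_bound_cpu_full(mask) == 5);   /* binding is found */
}
```

With a rank confined to CPU 5, the count is 1, so the naive scan only tests id 0 and reports the process as unbound, which corresponds to the distance computation being skipped. Scanning the full id range is just one way to avoid that; the real fix may well take a different approach.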
get_ib_dev_distance.patch
Description: Binary data