[OMPI devel] [PATCH]Segmentation Fault occurs when the function called from MPI_Comm_spawn_multiple fails

2012-02-09 Thread Y.MATSUMOTO
Dear All,

Our next feedback concerns "MPI_Comm_spawn_multiple".

When a function called from MPI_Comm_spawn_multiple fails, a
segmentation fault occurs. In that case, "newcomp" is set to NULL,
but a member of "newcomp" is still dereferenced in the following code
(ompi/mpi/c/comm_spawn_multiple.c):
176     /* set array of errorcodes */
177     if (MPI_ERRCODES_IGNORE != array_of_errcodes) {
178         for ( i=0; i < newcomp->c_remote_group->grp_proc_count; i++ ) {
179             array_of_errcodes[i]=rc;
180         }
181     }
Attached patch fixes it. (Patch is for V1.4.x).

Best regards,
Yuki MATSUMOTO
MPI development team,
Fujitsu

Index: ompi/mpi/c/comm_spawn_multiple.c
===
--- ompi/mpi/c/comm_spawn_multiple.c    (revision 25723)
+++ ompi/mpi/c/comm_spawn_multiple.c    (working copy)
@@ -42,7 +42,7 @@
                             int root, MPI_Comm comm, MPI_Comm *intercomm,
                             int *array_of_errcodes)
 {
-    int i=0, rc=0, rank=0, flag;
+    int i=0, rc=0, rank=0, size=0, flag;
     ompi_communicator_t *newcomp=NULL;
     bool send_first=false; /* they are contacting us first */
     char port_name[MPI_MAX_PORT_NAME];
@@ -175,8 +175,18 @@
 
     /* set array of errorcodes */
     if (MPI_ERRCODES_IGNORE != array_of_errcodes) {
-        for ( i=0; i < newcomp->c_remote_group->grp_proc_count; i++ ) {
-            array_of_errcodes[i]=rc;
+        if (NULL != newcomp) {
+            for ( i=0; i < newcomp->c_remote_group->grp_proc_count; i++ ) {
+                array_of_errcodes[i]=rc;
+            }
+        } else {
+            for ( i=0; i < count; i++) {
+                size = size + array_of_maxprocs[i];
+            }
+
+            for ( i=0; i < size; i++) {
+                array_of_errcodes[i]=rc;
+            }
         }
     }
 


Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Jeff Squyres
Just so that I understand this better -- if a process is bound in a cpuset, 
will tools like hwloc's lstopo only show the Linux processors *in that cpuset*? 
 I.e., does it not have any visibility of the processors outside of its cpuset?


On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:

> Hi,
> 
> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
> is configured with:
>   TaskPlugin=task/affinity
>   TaskPluginParam=Cpusets
> 
> each rank of that job is in a cpuset that contains a single CPU.
> 
> Now, if we use carto on top of this, the following happens in
> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
>   . opal_paffinity_base_get_processor_info() is called to get the
> number of logical processors (we get 1 due to the singleton cpuset)
>   . we loop over that # of processors to check whether our process is
> bound to one of them. In our case the loop will be executed only
> once and we will never get the correct binding information.
>   . if the process is bound actually get the distance to the device.
> in our case we won't execute that part of the code.
> 
> The attached patch is a proposal to fix the issue.
> 
> Regards,
> Nadia
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Brice Goglin
By default, hwloc only shows what's inside the current cpuset. There's
an option to show everything instead (topology flag).
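
A minimal sketch of loading a whole-system topology, assuming hwloc 1.x
(the flag in question is HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM; error
handling omitted):

#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;

    hwloc_topology_init(&topo);
    /* without this flag, CPUs outside our cgroup/cpuset are filtered
       out of the reported topology */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
    hwloc_topology_load(topo);

    /* ... inspect the whole machine here ... */

    hwloc_topology_destroy(topo);
    return 0;
}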

Brice



On 09/02/2012 12:18, Jeff Squyres wrote:
> Just so that I understand this better -- if a process is bound in a cpuset, 
> will tools like hwloc's lstopo only show the Linux processors *in that 
> cpuset*?  I.e., does it not have any visibility of the processors outside of 
> its cpuset?
>
>
> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>
>> Hi,
>>
>> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
>> is configured with:
>>   TaskPlugin=task/affinity
>>   TaskPluginParam=Cpusets
>>
>> each rank of that job is in a cpuset that contains a single CPU.
>>
>> Now, if we use carto on top of this, the following happens in
>> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
>>   . opal_paffinity_base_get_processor_info() is called to get the
>> number of logical processors (we get 1 due to the singleton cpuset)
>>   . we loop over that # of processors to check whether our process is
>> bound to one of them. In our case the loop will be executed only
>> once and we will never get the correct binding information.
>>   . if the process is bound actually get the distance to the device.
>> in our case we won't execute that part of the code.
>>
>> The attached patch is a proposal to fix the issue.
>>
>> Regards,
>> Nadia
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread nadia . derbey
 devel-boun...@open-mpi.org wrote on 02/09/2012 12:18:20 PM:

> From: Jeff Squyres 
> To: Open MPI Developers 
> Date: 02/09/2012 12:18 PM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
> 
> Just so that I understand this better -- if a process is bound in a 
> cpuset, will tools like hwloc's lstopo only show the Linux 
> processors *in that cpuset*?  I.e., does it not have any visibility 
> of the processors outside of its cpuset?

Yes, it looks that way. At least, that is what is returned by 
opal_paffinity_base_get_processor_info().

Regards,
Nadia

> 
> 
> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
> 
> > Hi,
> > 
> > If a job is launched using "srun --resv-ports --cpu_bind:..." and 
slurm
> > is configured with:
> >   TaskPlugin=task/affinity
> >   TaskPluginParam=Cpusets
> > 
> > each rank of that job is in a cpuset that contains a single CPU.
> > 
> > Now, if we use carto on top of this, the following happens in
> > get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
> >   . opal_paffinity_base_get_processor_info() is called to get the
> > number of logical processors (we get 1 due to the singleton 
cpuset)
> >   . we loop over that # of processors to check whether our process is
> > bound to one of them. In our case the loop will be executed only
> > once and we will never get the correct binding information.
> >   . if the process is bound actually get the distance to the device.
> > in our case we won't execute that part of the code.
> > 
> > The attached patch is a proposal to fix the issue.
> > 
> > Regards,
> > Nadia
> > 
___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread nadia . derbey
 devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:

> From: Brice Goglin 
> To: Open MPI Developers 
> Date: 02/09/2012 12:20 PM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
> 
> By default, hwloc only shows what's inside the current cpuset. There's
> an option to show everything instead (topology flag).

So maybe using that flag inside opal_paffinity_base_get_processor_info() 
would be a better fix than the one I'm proposing in my patch.

I found a bunch of other places where things are managed as in 
get_ib_dev_distance().

Just doing a grep in the sources, I could find:
  . init_maffinity() in btl/sm/btl_sm.c
  . vader_init_maffinity() in btl/vader/btl_vader.c
  . get_ib_dev_distance() in btl/wv/btl_wv_component.c

So I think the flag Brice is talking about should definitely be the fix.

Regards,
Nadia

> 
> Brice
> 
> 
> 
> On 09/02/2012 12:18, Jeff Squyres wrote:
> > Just so that I understand this better -- if a process is bound in 
> a cpuset, will tools like hwloc's lstopo only show the Linux 
> processors *in that cpuset*?  I.e., does it not have any visibility 
> of the processors outside of its cpuset?
> >
> >
> > On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
> >
> >> Hi,
> >>
> >> If a job is launched using "srun --resv-ports --cpu_bind:..." and 
slurm
> >> is configured with:
> >>   TaskPlugin=task/affinity
> >>   TaskPluginParam=Cpusets
> >>
> >> each rank of that job is in a cpuset that contains a single CPU.
> >>
> >> Now, if we use carto on top of this, the following happens in
> >> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
> >>   . opal_paffinity_base_get_processor_info() is called to get the
> >> number of logical processors (we get 1 due to the singleton 
cpuset)
> >>   . we loop over that # of processors to check whether our process is
> >> bound to one of them. In our case the loop will be executed only
> >> once and we will never get the correct binding information.
> >>   . if the process is bound actually get the distance to the device.
> >> in our case we won't execute that part of the code.
> >>
> >> The attached patch is a proposal to fix the issue.
> >>
> >> Regards,
> >> Nadia
> >> 
___
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Ralph Castain
Hi Nadia

I'm wondering what value there is in showing the full topology, or using it in 
any of our components, if the process is restricted to a specific set of cpus? 
Does it really help to know that there are other cpus out there that are 
unreachable?

On Feb 9, 2012, at 5:15 AM, nadia.der...@bull.net wrote:

>   
> 
> devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:
> 
> > From: Brice Goglin 
> > To: Open MPI Developers 
> > Date: 02/09/2012 12:20 PM 
> > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> > processes as bound if the job has been launched by srun 
> > Sent by: devel-boun...@open-mpi.org 
> > 
> > By default, hwloc only shows what's inside the current cpuset. There's
> > an option to show everything instead (topology flag). 
> 
> So maybe using that flag inside opal_paffinity_base_get_processor_info() 
> would be a better fix than the one I'm proposing in my patch. 
> 
> I found a bunch of other places where things are managed as in 
> get_ib_dev_distance(). 
> 
> Just doing a grep in the sources, I could find: 
>   . init_maffinity() in btl/sm/btl_sm.c 
>   . vader_init_maffinity() in btl/vader/btl_vader.c 
>   . get_ib_dev_distance() in btl/wv/btl_wv_component.c 
> 
> So I think the flag Brice is talking about should definitely be the fix. 
> 
> Regards, 
> Nadia 
> 
> > 
> > Brice
> > 
> > 
> > 
> > On 09/02/2012 12:18, Jeff Squyres wrote:
> > > Just so that I understand this better -- if a process is bound in 
> > a cpuset, will tools like hwloc's lstopo only show the Linux 
> > processors *in that cpuset*?  I.e., does it not have any visibility 
> > of the processors outside of its cpuset?
> > >
> > >
> > > On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
> > >
> > >> Hi,
> > >>
> > >> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
> > >> is configured with:
> > >>   TaskPlugin=task/affinity
> > >>   TaskPluginParam=Cpusets
> > >>
> > >> each rank of that job is in a cpuset that contains a single CPU.
> > >>
> > >> Now, if we use carto on top of this, the following happens in
> > >> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
> > >>   . opal_paffinity_base_get_processor_info() is called to get the
> > >> number of logical processors (we get 1 due to the singleton cpuset)
> > >>   . we loop over that # of processors to check whether our process is
> > >> bound to one of them. In our case the loop will be executed only
> > >> once and we will never get the correct binding information.
> > >>   . if the process is bound actually get the distance to the device.
> > >> in our case we won't execute that part of the code.
> > >>
> > >> The attached patch is a proposal to fix the issue.
> > >>
> > >> Regards,
> > >> Nadia
> > >> ___
> > >> devel mailing list
> > >> de...@open-mpi.org
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >
> > 
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Jeff Squyres
On Feb 9, 2012, at 7:15 AM, nadia.der...@bull.net wrote:

> > By default, hwloc only shows what's inside the current cpuset. There's
> > an option to show everything instead (topology flag). 
> 
> So maybe using that flag inside opal_paffinity_base_get_processor_info() 
> would be a better fix than the one I'm proposing in my patch. 

Is this trunk, or v1.5/1.6?  (or both?)

Perhaps the "good enough" fix for v1.5/1.6 is what you suggested.

But a better fix for the trunk is to use hwloc directly -- after all, 
paffinity/maffinity is going to go away in the not-distant future (in favor of 
100% using hwloc's API).  

That being said, it looks like opal_hwloc_topology is *not* loaded with 
HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM.  I think the assumption was that we wanted to 
look at our little foxhole to see exactly where we were bound.

I honestly forget -- if we don't set WHOLE_SYSTEM, does the reported tree only 
include PUs/etc. in the current cpuset?  I.e., some objects may be not in the 
tree altogether?  The hwloc docs talk about what happens to the cpuset fields 
in a given object when WHOLE_SYSTEM is set/not set, but it isn't entirely clear 
on this point.

FWIW, it looks like we're not setting any topology IO flags, either (most 
likely due to the fact that we brought in hwloc when it was 1.2.x; i.e., before 
it supported PCI devices).  I'm guessing we should probably set 
HWLOC_TOPOLOGY_FLAG_WHOLE_IO in all cases.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Chris Samuel
On Thursday 09 February 2012 22:18:20 Jeff Squyres wrote:

> Just so that I understand this better -- if a process is bound in a
> cpuset, will tools like hwloc's lstopo only show the Linux
> processors *in that cpuset*?  I.e., does it not have any
> visibility of the processors outside of its cpuset?

I believe that was the intention - there's no real benefit to knowing 
about resources that you can't access or use (and will likely get an 
error if you do) to my mind.

cheers!
Chris
-- 
   Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/


Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Jeff Squyres
On Feb 9, 2012, at 7:50 AM, Chris Samuel wrote:

>> Just so that I understand this better -- if a process is bound in a
>> cpuset, will tools like hwloc's lstopo only show the Linux
>> processors *in that cpuset*?  I.e., does it not have any
>> visibility of the processors outside of its cpuset?
> 
> I believe that was the intention - there's no real benefit to knowing 
> about resources that you can't access or use (and will likely get an 
> error if you do) to my mind.

The real question, however, is how IO devices are represented if you don't set 
WHOLE_SYSTEM.  I.e., what about IO devices that are not local to the socket 
of your cpuset, for example?

Take this sample image, for example:

http://www.open-mpi.org/projects/hwloc/devel09-pci.png

What if my cpuset is only on Socket P#0?  What exactly will be reported via 
(WHOLE_SYSTEM | HWLOC_TOPOLOGY_FLAG_WHOLE_IO)?

IO devices are something that we do have an interest in reporting so that we can 
tell the "distance" to them, for example.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread nadia . derbey
 devel-boun...@open-mpi.org wrote on 02/09/2012 01:32:31 PM:

> From: Ralph Castain 
> To: Open MPI Developers 
> Date: 02/09/2012 01:32 PM
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun
> Sent by: devel-boun...@open-mpi.org
> 
> Hi Nadia
> 
> I'm wondering what value there is in showing the full topology, or 
> using it in any of our components, if the process is restricted to a
> specific set of cpus? Does it really help to know that there are 
> other cpus out there that are unreachable?

Ralph,

The intention here is not to show cpus that are unreachable, but to fix an 
issue we have at least in get_ib_dev_distance() in the openib btl.

The problem is that if a process is restricted to a single CPU, the 
algorithm used in get_ib_dev_distance doesn't work at all:
I have 2 ib interfaces on my victim (say mlx4_0 and mlx4_1), and I want 
the openib btl to select the one that is the closest to my rank.

As I said in my first e-mail, here is what is done today:
   . opal_paffinity_base_get_processor_info() is called to get the number 
of logical processors (we get 1 due to the singleton cpuset)
   . we loop over that # of processors to check whether our process is 
bound to one of them. In our case the loop will be executed only once and 
we will never get the correct binding information.
   . if the process is bound actually get the distance to the device.
in our case, the distance won't be computed and mlx4_0 will be 
seen as "equivalent" to mlx4_1 in terms of distances. This is what I 
definitely want to avoid.

Regards,
Nadia

> 
> On Feb 9, 2012, at 5:15 AM, nadia.der...@bull.net wrote:
> 
> 
> 
> devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:
> 
> > From: Brice Goglin 
> > To: Open MPI Developers 
> > Date: 02/09/2012 12:20 PM 
> > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> > processes as bound if the job has been launched by srun 
> > Sent by: devel-boun...@open-mpi.org 
> > 
> > By default, hwloc only shows what's inside the current cpuset. There's
> > an option to show everything instead (topology flag). 
> 
> So maybe using that flag inside 
> opal_paffinity_base_get_processor_info() would be a better fix than 
> the one I'm proposing in my patch. 
> 
> I found a bunch of other places where things are managed as in 
> get_ib_dev_distance(). 
> 
> Just doing a grep in the sources, I could find: 
>   . init_maffinity() in btl/sm/btl_sm.c 
>   . vader_init_maffinity() in btl/vader/btl_vader.c 
>   . get_ib_dev_distance() in btl/wv/btl_wv_component.c 
> 
> So I think the flag Brice is talking about should definitely be the fix. 

> 
> Regards, 
> Nadia 
> 
> > 
> > Brice
> > 
> > 
> > 
> > > On 09/02/2012 12:18, Jeff Squyres wrote:
> > > Just so that I understand this better -- if a process is bound in 
> > a cpuset, will tools like hwloc's lstopo only show the Linux 
> > processors *in that cpuset*?  I.e., does it not have any visibility 
> > of the processors outside of its cpuset?
> > >
> > >
> > > On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
> > >
> > >> Hi,
> > >>
> > >> If a job is launched using "srun --resv-ports --cpu_bind:..." and 
slurm
> > >> is configured with:
> > >>   TaskPlugin=task/affinity
> > >>   TaskPluginParam=Cpusets
> > >>
> > >> each rank of that job is in a cpuset that contains a single CPU.
> > >>
> > >> Now, if we use carto on top of this, the following happens in
> > >> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
> > >>   . opal_paffinity_base_get_processor_info() is called to get the
> > >> number of logical processors (we get 1 due to the singleton 
cpuset)
> > >>   . we loop over that # of processors to check whether our process 
is
> > >> bound to one of them. In our case the loop will be executed 
only
> > >> once and we will never get the correct binding information.
> > >>   . if the process is bound actually get the distance to the 
device.
> > >> in our case we won't execute that part of the code.
> > >>
> > >> The attached patch is a proposal to fix the issue.
> > >>
> > >> Regards,
> > >> Nadia
> > >> 
> 
___
> > >> devel mailing list
> > >> de...@open-mpi.org
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >
> > 
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Ralph Castain
There is another aspect, though - I had missed it in the thread, but the 
question Nadia was addressing is: how do I tell whether I am bound? The way we 
currently do it is to compare our cpuset against the local cpuset - if we are 
on a subset, then we know we are bound.

So if all hwloc returns to us is our cpuset, then we cannot make that 
determination. Yet I do also see utility in showing only our own cpus.

Would it make sense to add a field to the hwloc_obj_t that contains the 
"accessible" cpus? Or a flag indicating "you are bound to a subset of all 
available cpus"?

Really, all we need is the flag - but we could compute it ourselves if we had 
the larger scope info.

On Feb 9, 2012, at 5:53 AM, Jeff Squyres wrote:

> On Feb 9, 2012, at 7:50 AM, Chris Samuel wrote:
> 
>>> Just so that I understand this better -- if a process is bound in a
>>> cpuset, will tools like hwloc's lstopo only show the Linux
>>> processors *in that cpuset*?  I.e., does it not have any
>>> visibility of the processors outside of its cpuset?
>> 
>> I believe that was the intention - there's no real benefit to knowing 
>> about resources that you can't access or use (and will likely get an 
>> error if you do) to my mind.
> 
> The real question, however, is how IO devices are represented if you don't set 
> WHOLE_SYSTEM.  I.e., what about IO devices that are not local to the 
> socket of your cpuset, for example?
> 
> Take this sample image, for example:
> 
>http://www.open-mpi.org/projects/hwloc/devel09-pci.png
> 
> What if my cpuset is only on Socket P#0?  What exactly will be reported via 
> (WHOLE_SYSTEM | HWLOC_TOPOLOGY_FLAG_WHOLE_IO)?
> 
> IO devices are something that we do have an interest in reporting so that we 
> can tell the "distance" to them, for example.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Ralph Castain
Yes, I missed that point before - too early in the morning :-/

As I said in my last note, it would be nice to either have a flag indicating we 
are bound, or see all the cpu info so we can compute that we are bound. Either 
way, we still need to have a complete picture of all I/O devices so you can 
compute the distance.


On Feb 9, 2012, at 6:01 AM, nadia.der...@bull.net wrote:

>   
> 
> devel-boun...@open-mpi.org wrote on 02/09/2012 01:32:31 PM:
> 
> > From: Ralph Castain 
> > To: Open MPI Developers 
> > Date: 02/09/2012 01:32 PM 
> > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> > processes as bound if the job has been launched by srun 
> > Sent by: devel-boun...@open-mpi.org 
> > 
> > Hi Nadia 
> > 
> > I'm wondering what value there is in showing the full topology, or 
> > using it in any of our components, if the process is restricted to a
> > specific set of cpus? Does it really help to know that there are 
> > other cpus out there that are unreachable? 
> 
> Ralph, 
> 
> The intention here is not to show cpus that are unreachable, but to fix an 
> issue we have at least in get_ib_dev_distance() in the openib btl. 
> 
> The problem is that if a process is restricted to a single CPU, the algorithm 
> used in get_ib_dev_distance doesn't work at all: 
> I have 2 ib interfaces on my victim (say mlx4_0 and mlx4_1), and I want the 
> openib btl to select the one that is the closest to my rank. 
> 
> As I said in my first e-mail, here is what is done today: 
>. opal_paffinity_base_get_processor_info() is called to get the number of 
> logical processors (we get 1 due to the singleton cpuset)
>   . we loop over that # of processors to check whether our process is bound 
> to one of them. In our case the loop will be executed only once and we will 
> never get the correct binding information.
>   . if the process is bound actually get the distance to the device.
>in our case, the distance won't be computed and mlx4_0 will be seen as 
> "equivalent" to mlx4_1 in terms of distances. This is what I definitely want 
> to avoid. 
> 
> Regards, 
> Nadia 
> 
> > 
> > On Feb 9, 2012, at 5:15 AM, nadia.der...@bull.net wrote: 
> > 
> >   
> > 
> > devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:
> > 
> > > From: Brice Goglin 
> > > To: Open MPI Developers 
> > > Date: 02/09/2012 12:20 PM 
> > > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> > > processes as bound if the job has been launched by srun 
> > > Sent by: devel-boun...@open-mpi.org 
> > > 
> > > By default, hwloc only shows what's inside the current cpuset. There's
> > > an option to show everything instead (topology flag). 
> > 
> > So maybe using that flag inside 
> > opal_paffinity_base_get_processor_info() would be a better fix than 
> > the one I'm proposing in my patch. 
> > 
> > I found a bunch of other places where things are managed as in 
> > get_ib_dev_distance(). 
> > 
> > Just doing a grep in the sources, I could find: 
> >   . init_maffinity() in btl/sm/btl_sm.c 
> >   . vader_init_maffinity() in btl/vader/btl_vader.c 
> >   . get_ib_dev_distance() in btl/wv/btl_wv_component.c 
> > 
> > So I think the flag Brice is talking about should definitely be the fix. 
> > 
> > Regards, 
> > Nadia 
> > 
> > > 
> > > Brice
> > > 
> > > 
> > > 
>> > > On 09/02/2012 12:18, Jeff Squyres wrote:
> > > > Just so that I understand this better -- if a process is bound in 
> > > a cpuset, will tools like hwloc's lstopo only show the Linux 
> > > processors *in that cpuset*?  I.e., does it not have any visibility 
> > > of the processors outside of its cpuset?
> > > >
> > > >
> > > > On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> If a job is launched using "srun --resv-ports --cpu_bind:..." and slurm
> > > >> is configured with:
> > > >>   TaskPlugin=task/affinity
> > > >>   TaskPluginParam=Cpusets
> > > >>
> > > >> each rank of that job is in a cpuset that contains a single CPU.
> > > >>
> > > >> Now, if we use carto on top of this, the following happens in
> > > >> get_ib_dev_distance() (in btl/openib/btl_openib_component.c):
> > > >>   . opal_paffinity_base_get_processor_info() is called to get the
> > > >> number of logical processors (we get 1 due to the singleton cpuset)
> > > >>   . we loop over that # of processors to check whether our process is
> > > >> bound to one of them. In our case the loop will be executed only
> > > >> once and we will never get the correct binding information.
> > > >>   . if the process is bound actually get the distance to the device.
> > > >> in our case we won't execute that part of the code.
> > > >>
> > > >> The attached patch is a proposal to fix the issue.
> > > >>
> > > >> Regards,
> > > >> Nadia
> > > >> 
> > ___
> > > >> devel mailing list
> > > >> de...@open-mpi.org
> > > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > >

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Brice Goglin



Jeff Squyres wrote:

>On Feb 9, 2012, at 7:50 AM, Chris Samuel wrote:
>
>>> Just so that I understand this better -- if a process is bound in a
>>> cpuset, will tools like hwloc's lstopo only show the Linux
>>> processors *in that cpuset*?  I.e., does it not have any
>>> visibility of the processors outside of its cpuset?
>> 
>> I believe that was the intention - there's no real benefit to knowing
>
>> about resources that you can't access or use (and will likely get an 
>> error if you do) to my mind.
>
>The real question, however, is how IO devices are represented if you
>don't set WHOLE_SYSTEM.  I.e., what about IO devices that are not
>local to the socket of your cpuset, for example?
>
>Take this sample image, for example:
>
>http://www.open-mpi.org/projects/hwloc/devel09-pci.png
>
>What if my cpuset is only on Socket P#0?  What exactly will be reported
>via (WHOLE_SYSTEM | HWLOC_TOPOLOGY_FLAG_WHOLE_IO)?

I actually fixed something related to this case in 1.3.2. The device will be 
attached to the root object in this case iirc.

Brice


Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Jeff Squyres
Should we just do this, then:

Index: mca/hwloc/base/hwloc_base_util.c
===
--- mca/hwloc/base/hwloc_base_util.c    (revision 25885)
+++ mca/hwloc/base/hwloc_base_util.c    (working copy)
@@ -173,6 +173,9 @@
                          "hwloc:base:get_topology"));
 
     if (0 != hwloc_topology_init(&opal_hwloc_topology) ||
+        0 != hwloc_topology_set_flags(opal_hwloc_topology,
+                                      (HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM |
+                                       HWLOC_TOPOLOGY_FLAG_WHOLE_IO)) ||
         0 != hwloc_topology_load(opal_hwloc_topology)) {
         return OPAL_ERR_NOT_SUPPORTED;
     }



On Feb 9, 2012, at 8:04 AM, Ralph Castain wrote:

> Yes, I missed that point before - too early in the morning :-/
> 
> As I said in my last note, it would be nice to either have a flag indicating 
> we are bound, or see all the cpu info so we can compute that we are bound. 
> Either way, we still need to have a complete picture of all I/O devices so 
> you can compute the distance.
> 
> 
> On Feb 9, 2012, at 6:01 AM, nadia.der...@bull.net wrote:
> 
>>   
>> 
>> devel-boun...@open-mpi.org wrote on 02/09/2012 01:32:31 PM:
>> 
>> > From: Ralph Castain 
>> > To: Open MPI Developers 
>> > Date: 02/09/2012 01:32 PM 
>> > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
>> > processes as bound if the job has been launched by srun 
>> > Sent by: devel-boun...@open-mpi.org 
>> > 
>> > Hi Nadia 
>> > 
>> > I'm wondering what value there is in showing the full topology, or 
>> > using it in any of our components, if the process is restricted to a
>> > specific set of cpus? Does it really help to know that there are 
>> > other cpus out there that are unreachable? 
>> 
>> Ralph, 
>> 
>> The intention here is not to show cpus that are unreachable, but to fix an 
>> issue we have at least in get_ib_dev_distance() in the openib btl. 
>> 
>> The problem is that if a process is restricted to a single CPU, the 
>> algorithm used in get_ib_dev_distance doesn't work at all: 
>> I have 2 ib interfaces on my victim (say mlx4_0 and mlx4_1), and I want the 
>> openib btl to select the one that is the closest to my rank. 
>> 
>> As I said in my first e-mail, here is what is done today: 
>>. opal_paffinity_base_get_processor_info() is called to get the number of 
>> logical processors (we get 1 due to the singleton cpuset)
>>   . we loop over that # of processors to check whether our process is bound 
>> to one of them. In our case the loop will be executed only once and we will 
>> never get the correct binding information.
>>   . if the process is bound actually get the distance to the device.
>>in our case, the distance won't be computed and mlx4_0 will be seen 
>> as "equivalent" to mlx4_1 in terms of distances. This is what I definitely 
>> want to avoid. 
>> 
>> Regards, 
>> Nadia 
>> 
>> > 
>> > On Feb 9, 2012, at 5:15 AM, nadia.der...@bull.net wrote: 
>> > 
>> >   
>> > 
>> > devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:
>> > 
>> > > From: Brice Goglin 
>> > > To: Open MPI Developers 
>> > > Date: 02/09/2012 12:20 PM 
>> > > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
>> > > processes as bound if the job has been launched by srun 
>> > > Sent by: devel-boun...@open-mpi.org 
>> > > 
>> > > By default, hwloc only shows what's inside the current cpuset. There's
>> > > an option to show everything instead (topology flag). 
>> > 
>> > So maybe using that flag inside 
>> > opal_paffinity_base_get_processor_info() would be a better fix than 
>> > the one I'm proposing in my patch. 
>> > 
>> > I found a bunch of other places where things are managed as in 
>> > get_ib_dev_distance(). 
>> > 
>> > Just doing a grep in the sources, I could find: 
>> >   . init_maffinity() in btl/sm/btl_sm.c 
>> >   . vader_init_maffinity() in btl/vader/btl_vader.c 
>> >   . get_ib_dev_distance() in btl/wv/btl_wv_component.c 
>> > 
>> > So I think the flag Brice is talking about should definitely be the fix. 
>> > 
>> > Regards, 
>> > Nadia 
>> > 
>> > > 
>> > > Brice
>> > > 
>> > > 
>> > > 
>> > > On 09/02/2012 12:18, Jeff Squyres wrote:
>> > > > Just so that I understand this better -- if a process is bound in 
>> > > a cpuset, will tools like hwloc's lstopo only show the Linux 
>> > > processors *in that cpuset*?  I.e., does it not have any visibility 
>> > > of the processors outside of its cpuset?
>> > > >
>> > > >
>> > > > On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>> > > >
>> > > >> Hi,
>> > > >>
>> > > >> If a job is launched using "srun --resv-ports --cpu_bind:..." and 
>> > > >> slurm
>> > > >> is configured with:
>> > > >>   TaskPlugin=task/affinity
>> > > >>   TaskPluginParam=Cpusets
>> > > >>
>> > > >> each rank of that job is in a cpuset that contains a single CPU.
>> > > >>
>> > > >> Now, if we use carto on top of this, the following happens in
>> > > >>

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Jeff Squyres
On Feb 9, 2012, at 8:06 AM, Brice Goglin wrote:

>> What if my cpuset is only on Socket P#0?  What exactly will be reported
>> via (WHOLE_SYSTEM | HWLOC_TOPOLOGY_FLAG_WHOLE_IO)?
> 
> I actually fixed something related to this case in 1.3.2. The device will be 
> attached to the root object in this case iirc.

Ah, gotcha.

That doesn't seem too attractive from an OMPI perspective, though.  We'd want 
to know where the PCI devices are actually rooted.

Another reason OMPI wants the whole system: be able to tell the memory 
characteristics of other processes on the same server as me (e.g., be able to 
tell that it's on a different numa node, socket, ...etc.).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Ralph Castain
Yeah, I think that's the right solution. We'll have to check the impact on the 
rest of the code, but I -think- it will be okay - else we'll have to make some 
tweaks here and there. Either way, it's still the right answer, I think.

On Feb 9, 2012, at 6:14 AM, Jeff Squyres wrote:

> Should we just do this, then:
> 
> Index: mca/hwloc/base/hwloc_base_util.c
> ===
> --- mca/hwloc/base/hwloc_base_util.c  (revision 25885)
> +++ mca/hwloc/base/hwloc_base_util.c  (working copy)
> @@ -173,6 +173,9 @@
>                           "hwloc:base:get_topology"));
> 
>      if (0 != hwloc_topology_init(&opal_hwloc_topology) ||
> +        0 != hwloc_topology_set_flags(opal_hwloc_topology,
> +                                      (HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM |
> +                                       HWLOC_TOPOLOGY_FLAG_WHOLE_IO)) ||
>          0 != hwloc_topology_load(opal_hwloc_topology)) {
>          return OPAL_ERR_NOT_SUPPORTED;
>      }
> 
> 
> 
> On Feb 9, 2012, at 8:04 AM, Ralph Castain wrote:
> 
>> Yes, I missed that point before - too early in the morning :-/
>> 
>> As I said in my last note, it would be nice to either have a flag indicating 
>> we are bound, or see all the cpu info so we can compute that we are bound. 
>> Either way, we still need to have a complete picture of all I/O devices so 
>> you can compute the distance.
>> 
>> 
>> On Feb 9, 2012, at 6:01 AM, nadia.der...@bull.net wrote:
>> 
>>> 
>>> 
>>> devel-boun...@open-mpi.org wrote on 02/09/2012 01:32:31 PM:
>>> 
 From: Ralph Castain 
 To: Open MPI Developers 
 Date: 02/09/2012 01:32 PM 
 Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
 processes as bound if the job has been launched by srun 
 Sent by: devel-boun...@open-mpi.org 
 
 Hi Nadia 
 
 I'm wondering what value there is in showing the full topology, or 
 using it in any of our components, if the process is restricted to a
 specific set of cpus? Does it really help to know that there are 
 other cpus out there that are unreachable? 
>>> 
>>> Ralph, 
>>> 
>>> The intention here is not to show cpus that are unreachable, but to fix an 
>>> issue we have at least in get_ib_dev_distance() in the openib btl. 
>>> 
>>> The problem is that if a process is restricted to a single CPU, the 
>>> algorithm used in get_ib_dev_distance doesn't work at all: 
>>> I have 2 ib interfaces on my victim (say mlx4_0 and mlx4_1), and I want the 
>>> openib btl to select the one that is the closest to my rank. 
>>> 
>>> As I said in my first e-mail, here is what is done today: 
>>>   . opal_paffinity_base_get_processor_info() is called to get the number of 
>>> logical processors (we get 1 due to the singleton cpuset)
>>>  . we loop over that # of processors to check whether our process is bound 
>>> to one of them. In our case the loop will be executed only once and we will 
>>> never get the correct binding information.
>>>  . if the process is bound actually get the distance to the device.
>>>   in our case, the distance won't be computed and mlx4_0 will be seen 
>>> as "equivalent" to mlx4_1 in terms of distances. This is what I definitely 
>>> want to avoid. 
>>> 
>>> Regards, 
>>> Nadia 
>>> 
 
 On Feb 9, 2012, at 5:15 AM, nadia.der...@bull.net wrote: 
 
 
 
 devel-boun...@open-mpi.org wrote on 02/09/2012 12:20:41 PM:
 
> From: Brice Goglin 
> To: Open MPI Developers 
> Date: 02/09/2012 12:20 PM 
> Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see
> processes as bound if the job has been launched by srun 
> Sent by: devel-boun...@open-mpi.org 
> 
> By default, hwloc only shows what's inside the current cpuset. There's
> an option to show everything instead (topology flag). 
 
 So maybe using that flag inside 
 opal_paffinity_base_get_processor_info() would be a better fix than 
 the one I'm proposing in my patch. 
 
 I found a bunch of other places where things are managed as in 
 get_ib_dev_distance(). 
 
 Just doing a grep in the sources, I could find: 
  . init_maffinity() in btl/sm/btl_sm.c 
  . vader_init_maffinity() in btl/vader/btl_vader.c 
  . get_ib_dev_distance() in btl/wv/btl_wv_component.c 
 
 So I think the flag Brice is talking about should definitely be the fix. 
 
 Regards, 
 Nadia 
 
> 
> Brice
> 
> 
> 
> On 09/02/2012 12:18, Jeff Squyres wrote:
>> Just so that I understand this better -- if a process is bound in 
> a cpuset, will tools like hwloc's lstopo only show the Linux 
> processors *in that cpuset*?  I.e., does it not have any visibility 
> of the processors outside of its cpuset?
>> 
>> 
>> On Jan 27, 2012, at 11:38 AM, nadia.derbey wrote:
>> 
>>> Hi,
>>> 
>>> If a job is launched using "srun --resv-ports --

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Ralph Castain
I'm not sure I understand this comment. A PCI device is attached to the node, 
not to any specific location within the node, isn't it? Can you really say that 
a PCI device is "attached" to a specific NUMA location, for example?


On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:

> That doesn't seem too attractive from an OMPI perspective, though.  We'd want 
> to know where the PCI devices are actually rooted.



Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Brice Goglin
The bios usually tells you which numa location is close to each host-to-pci 
bridge. So the answer is yes.
Brice


Ralph Castain wrote:

I'm not sure I understand this comment. A PCI device is attached to the node, 
not to any specific location within the node, isn't it? Can you really say that 
a PCI device is "attached" to a specific NUMA location, for example?



On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:


That doesn't seem too attractive from an OMPI perspective, though.  We'd want 
to know where the PCI devices are actually rooted.




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Ralph Castain
Ah, okay - in that case, having the I/O device attached to the "closest" object 
at each depth would be ideal from an OMPI perspective.

On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:

> The bios usually tells you which numa location is close to each host-to-pci 
> bridge. So the answer is yes.
> Brice
> 
> 
> Ralph Castain wrote:
> I'm not sure I understand this comment. A PCI device is attached to the node, 
> not to any specific location within the node, isn't it? Can you really say 
> that a PCI device is "attached" to a specific NUMA location, for example?
> 
> 
> On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:
> 
>> That doesn't seem too attractive from an OMPI perspective, though.  We'd 
>> want to know where the PCI devices are actually rooted.
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] [PATCH]Segmentation Fault occurs when the function called from MPI_Comm_spawn_multiple fails

2012-02-09 Thread Ralph Castain
Thanks! I added the patch to the trunk and submitted it for the 1.6 update.

On Feb 8, 2012, at 10:20 PM, Y.MATSUMOTO wrote:

> Dear All,
> 
> Our next feedback concerns "MPI_Comm_spawn_multiple".
> 
> When a function called from MPI_Comm_spawn_multiple fails, a
> segmentation fault occurs. In that case, "newcomp" is set to NULL,
> but a member of "newcomp" is still dereferenced in the following code
> (ompi/mpi/c/comm_spawn_multiple.c):
> 176     /* set array of errorcodes */
> 177     if (MPI_ERRCODES_IGNORE != array_of_errcodes) {
> 178         for ( i=0; i < newcomp->c_remote_group->grp_proc_count; i++ ) {
> 179             array_of_errcodes[i]=rc;
> 180         }
> 181     }
> Attached patch fixes it. (Patch is for V1.4.x).
> 
> Best regards,
> Yuki MATSUMOTO
> MPI development team,
> Fujitsu
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Jeff Squyres
Nadia --

I committed the fix in the trunk to use HWLOC_WHOLE_SYSTEM and IO_DEVICES.

Do you want to revise your patch to use hwloc APIs with opal_hwloc_topology 
(instead of paffinity)?  We could use that as a basis for the other places you 
identified that are doing similar things.


On Feb 9, 2012, at 8:34 AM, Ralph Castain wrote:

> Ah, okay - in that case, having the I/O device attached to the "closest" 
> object at each depth would be ideal from an OMPI perspective.
> 
> On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:
> 
>> The bios usually tells you which numa location is close to each host-to-pci 
>> bridge. So the answer is yes.
>> Brice
>> 
>> 
>> Ralph Castain wrote:
>> I'm not sure I understand this comment. A PCI device is attached to the 
>> node, not to any specific location within the node, isn't it? Can you really 
>> say that a PCI device is "attached" to a specific NUMA location, for example?
>> 
>> 
>> On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:
>> 
>>> That doesn't seem too attractive from an OMPI perspective, though.  We'd 
>>> want to know where the PCI devices are actually rooted.
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Brice Goglin
That doesn't really work with the hwloc model unfortunately. Also, when
you get to smaller objects (cores, threads, ...) there are multiple
"closest" objects at each depth.

We have one "closest" object at some depth (usually Machine or NUMA
node). If you need something higher, you just walk the parent links. If
you need something smaller, you look at children.

Also, each I/O device isn't directly attached to such a closest object.
It's usually attached under some bridge objects. There's a tree of hwloc
PCI bus objects exactly like you have a tree of hwloc
sockets/cores/threads/etc. At the top of the I/O tree, one (bridge)
object is attached to a regular object as explained earlier. So, when
you have a random hwloc PCI object, you get its locality by walking up
its parent link until you find a non-I/O object (one whose cpuset isn't
NULL). hwloc/helper.h gives you hwloc_get_non_io_ancestor_obj() to do that.
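
A tiny sketch of that walk (assuming an hwloc 1.3+ topology loaded with
I/O discovery enabled; "pcidev" is a hypothetical PCI object):

/* walk up from a PCI object to its first non-I/O ancestor */
hwloc_obj_t obj = pcidev;
while (NULL != obj && NULL == obj->cpuset)  /* I/O objects have a NULL cpuset */
    obj = obj->parent;
/* equivalently: obj = hwloc_get_non_io_ancestor_obj(topology, pcidev); */
/* obj->cpuset (if obj != NULL) now describes the locality of the device */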

Brice



On 09/02/2012 14:34, Ralph Castain wrote:
> Ah, okay - in that case, having the I/O device attached to the
> "closest" object at each depth would be ideal from an OMPI perspective.
>
> On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:
>
>> The bios usually tells you which numa location is close to each
>> host-to-pci bridge. So the answer is yes.
>> Brice
>>
>>
>> Ralph Castain <r...@open-mpi.org> wrote:
>>
>> I'm not sure I understand this comment. A PCI device is attached
>> to the node, not to any specific location within the node, isn't
>> it? Can you really say that a PCI device is "attached" to a
>> specific NUMA location, for example?
>>
>>
>> On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:
>>
>>> That doesn't seem too attractive from an OMPI perspective,
>>> though.  We'd want to know where the PCI devices are actually
>>> rooted.
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Ralph Castain
Hmmm….guess we'll have to play with it. Our need is to start with a core or 
some similar object, and quickly determine the closest IO device of a certain 
type. We wound up having to write "summarizer" code to parse the hwloc tree 
into a more OMPI-usable form, so we can always do that with the IO tree as well 
if necessary.


On Feb 9, 2012, at 2:09 PM, Brice Goglin wrote:

> That doesn't really work with the hwloc model unfortunately. Also, when you 
> get to smaller objects (cores, threads, ...) there are multiple "closest" 
> objects at each depth.
> 
> We have one "closest" object at some depth (usually Machine or NUMA node). If 
> you need something higher, you just walk the parent links. If you need 
> something smaller, you look at children.
> 
> Also, each I/O device isn't directly attached to such a closest object. It's 
> usually attached under some bridge objects. There's a tree of hwloc PCI bus 
> objects exactly like you have a tree of hwloc sockets/cores/threads/etc. At 
> the top of the I/O tree, one (bridge) object is attached to a regular object 
> as explained earlier. So, when you have a random hwloc PCI object, you get 
> its locality by walking up its parent link until you find a non-I/O object 
> (one whose cpuset isn't NULL). hwloc/helper.h gives you 
> hwloc_get_non_io_ancestor_obj() to do that.
> 
> Brice
> 
> 
> 
> On 09/02/2012 14:34, Ralph Castain wrote:
>> 
>> Ah, okay - in that case, having the I/O device attached to the "closest" 
>> object at each depth would be ideal from an OMPI perspective.
>> 
>> On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:
>> 
>>> The bios usually tells you which numa location is close to each host-to-pci 
>>> bridge. So the answer is yes.
>>> Brice
>>> 
>>> 
>>> Ralph Castain wrote:
>>> I'm not sure I understand this comment. A PCI device is attached to the 
>>> node, not to any specific location within the node, isn't it? Can you 
>>> really say that a PCI device is "attached" to a specific NUMA location, for 
>>> example?
>>> 
>>> 
>>> On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:
>>> 
 That doesn't seem too attractive from an OMPI perspective, though.  We'd 
 want to know where the PCI devices are actually rooted.
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Brice Goglin
On 09/02/2012 14:00, Ralph Castain wrote:
> There is another aspect, though - I had missed it in the thread, but the 
> question Nadia was addressing is: how do I tell whether I am bound? The way we 
> currently do it is to compare our cpuset against the local cpuset - if we are 
> on a subset, then we know we are bound.
>
> So if all hwloc returns to us is our cpuset, then we cannot make that 
> determination. Yet I do also see utility in showing only our own cpus.

Each hwloc object has several "cpuset" fields describing whether CPUs
are online or not, and accessible or not. Here are their meaning when
the WHOLE_SYSTEM flag is NOT set:
* "cpuset" only contains CPUs that are online and accessible
* "online_cpuset" is "cpuset" + CPUs that are online but not accessible
* "allowed_cpuset" is "cpuset" + CPUs that are accessible but not online
* "complete_cpuset" is everything

So you can find out that you are "bound" by a Linux cgroup (I am not
saying Linux "cpuset" to avoid confusion) by comparing root->cpuset and
root->online_cpuset.
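
A minimal sketch of that check (assuming hwloc 1.x, where objects still
carry an online_cpuset field):

hwloc_obj_t root = hwloc_get_root_obj(topology);
/* CPUs that are online but missing from root->cpuset are the ones the
   cgroup keeps us away from */
int restricted = !hwloc_bitmap_isequal(root->cpuset, root->online_cpuset);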

Brice


> Would it make sense to add a field to the hwloc_obj_t that contains the 
> "accessible" cpus? Or a flag indicating "you are bound to a subset of all 
> available cpus"?
>
> Really, all we need is the flag - but we could compute it ourselves if we had 
> the larger scope info.




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Brice Goglin
Here's what I would do:
During init, walk the list of hwloc PCI devices
(hwloc_get_next_pcidev()) and keep an array of pointers to the
interesting onces + their locality (the hwloc cpuset of the parent
non-IO object).
When you want the I/O device near a core, walk the array and find one
whose locality contains your core hwloc cpuset.
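
A rough sketch of that approach (hypothetical names; assumes hwloc >= 1.3
with PCI/I-O discovery enabled):

#include <hwloc.h>

#define MAX_PCI 64

static hwloc_obj_t    pcidevs[MAX_PCI];
static hwloc_cpuset_t locality[MAX_PCI];
static int npci = 0;

/* During init: cache each PCI device with the cpuset of its parent
   non-I/O object */
static void cache_pci_locality(hwloc_topology_t topo)
{
    hwloc_obj_t dev = NULL;
    while (NULL != (dev = hwloc_get_next_pcidev(topo, dev)) && npci < MAX_PCI) {
        hwloc_obj_t anc = hwloc_get_non_io_ancestor_obj(topo, dev);
        pcidevs[npci]  = dev;
        locality[npci] = anc->cpuset;
        npci++;
    }
}

/* Later: find a cached device whose locality contains the core's cpuset */
static hwloc_obj_t nearest_pcidev(hwloc_const_cpuset_t core_cpuset)
{
    int i;
    for (i = 0; i < npci; i++)
        if (hwloc_bitmap_isincluded(core_cpuset, locality[i]))
            return pcidevs[i];
    return NULL;
}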

If you need help, feel free to contact me offline.

Brice



On 09/02/2012 22:14, Ralph Castain wrote:
> Hmmm….guess we'll have to play with it. Our need is to start with a
> core or some similar object, and quickly determine the closest IO
> device of a certain type. We wound up having to write "summarizer"
> code to parse the hwloc tree into a more OMPI-usable form, so we can
> always do that with the IO tree as well if necessary.
>
>
> On Feb 9, 2012, at 2:09 PM, Brice Goglin wrote:
>
>> That doesn't really work with the hwloc model unfortunately. Also,
>> when you get to smaller objects (cores, threads, ...) there are
>> multiple "closest" objects at each depth.
>>
>> We have one "closest" object at some depth (usually Machine or NUMA
>> node). If you need something higher, you just walk the parent links.
>> If you need something smaller, you look at children.
>>
>> Also, each I/O device isn't directly attached to such a closest
>> object. It's usually attached under some bridge objects. There's a
>> tree of hwloc PCI bus objects exactly like you have a tree of hwloc
>> sockets/cores/threads/etc. At the top of the I/O tree, one (bridge)
>> object is attached to a regular object as explained earlier. So, when
>> you have a random hwloc PCI object, you get its locality by walking
>> up its parent link until you find a non-I/O object (one whose cpuset
>> isn't NULL). hwloc/helper.h gives you hwloc_get_non_io_ancestor_obj()
>> to do that.
>>
>> Brice
>>
>>
>>
>> On 09/02/2012 14:34, Ralph Castain wrote:
>>> Ah, okay - in that case, having the I/O device attached to the
>>> "closest" object at each depth would be ideal from an OMPI perspective.
>>>
>>> On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:
>>>
 The bios usually tells you which numa location is close to each
 host-to-pci bridge. So the answer is yes.
 Brice


 Ralph Castain <r...@open-mpi.org> wrote:

 I'm not sure I understand this comment. A PCI device is
 attached to the node, not to any specific location within the
 node, isn't it? Can you really say that a PCI device is
 "attached" to a specific NUMA location, for example?


 On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:

> That doesn't seem too attractive from an OMPI perspective,
> though.  We'd want to know where the PCI devices are actually
> rooted.

 ___
 devel mailing list
 de...@open-mpi.org 
 http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org 
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Paul H. Hargrove



On 2/9/2012 1:19 PM, Brice Goglin wrote:

So you can find out that you are "bound" by a Linux cgroup (I am not
saying Linux "cpuset" to avoid confusion) by comparing root->cpuset and
root->online_cpuset.


If I understood the problem as stated earlier in this thread, the current 
code was looping over a (singleton) cpuset and not finding the current 
process to be bound to any of the cpus in the set.  For that case, the 
fact that the cpuset is a singleton should already have been enough 
information to know that one is effectively bound.  Is there really more 
to this than a need for special-casing the singleton?


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Ralph Castain
That's pretty much what I had in mind too - will have to play with it a bit 
until we find the best solution, but it shouldn't be all that hard.

On Feb 9, 2012, at 2:23 PM, Brice Goglin wrote:

> Here's what I would do:
> During init, walk the list of hwloc PCI devices (hwloc_get_next_pcidev()) and 
> keep an array of pointers to the interesting ones + their locality (the 
> hwloc cpuset of the parent non-IO object).
> When you want the I/O device near a core, walk the array and find one whose 
> locality contains your core hwloc cpuset.
> 
> If you need help, feel free to contact me offline.
> 
> Brice
> 
> 
> 
> On 09/02/2012 22:14, Ralph Castain wrote:
>> 
>> Hmmm….guess we'll have to play with it. Our need is to start with a core or 
>> some similar object, and quickly determine the closest IO device of a 
>> certain type. We wound up having to write "summarizer" code to parse the 
>> hwloc tree into a more OMPI-usable form, so we can always do that with the 
>> IO tree as well if necessary.
>> 
>> 
>> On Feb 9, 2012, at 2:09 PM, Brice Goglin wrote:
>> 
>>> That doesn't really work with the hwloc model unfortunately. Also, when you 
>>> get to smaller objects (cores, threads, ...) there are multiple "closest" 
>>> objects at each depth.
>>> 
>>> We have one "closest" object at some depth (usually Machine or NUMA node). 
>>> If you need something higher, you just walk the parent links. If you need 
>>> something smaller, you look at children.
>>> 
>>> Also, each I/O device isn't directly attached to such a closest object. 
>>> It's usually attached under some bridge objects. There's a tree of hwloc 
>>> PCI bus objects exactly like you have a tree of hwloc 
>>> sockets/cores/threads/etc. At the top of the I/O tree, one (bridge) object 
>>> is attached to a regular object as explained earlier. So, when you have a 
>>> random hwloc PCI object, you get its locality by walking up its parent link 
>>> until you find a non-I/O object (one whose cpuset isn't NULL). 
>>> hwloc/helper.h gives you hwloc_get_non_io_ancestor_obj() to do that.
>>> 
>>> Brice
>>> 
>>> 
>>> 
>>> On 09/02/2012 14:34, Ralph Castain wrote:
 
 Ah, okay - in that case, having the I/O device attached to the "closest" 
 object at each depth would be ideal from an OMPI perspective.
 
 On Feb 9, 2012, at 6:30 AM, Brice Goglin wrote:
 
> The bios usually tells you which numa location is close to each 
> host-to-pci bridge. So the answer is yes.
> Brice
> 
> 
> Ralph Castain wrote:
> I'm not sure I understand this comment. A PCI device is attached to the 
> node, not to any specific location within the node, isn't it? Can you 
> really say that a PCI device is "attached" to a specific NUMA location, 
> for example?
> 
> 
> On Feb 9, 2012, at 6:15 AM, Jeff Squyres wrote:
> 
>> That doesn't seem too attractive from an OMPI perspective, though.  We'd 
>> want to know where the PCI devices are actually rooted.
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
 
 
 ___
 devel mailing list
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel