Re: [OMPI devel] PLPA ready?

2008-02-19 Thread Sharon Melamed
Jeff,

The new PLPA fails to compile. The paffinity APIs need to change:
1. max_processor_id with one parameter --> get_processor_info with 2 parameters.
2. max_socket with one parameter --> get_socket_info with 2 parameters.
3. max_core with 2 parameters --> get_core_info with 3 parameters.
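
For illustration only, the shape of the first change can be sketched in C as below. The sketch_ names and the table of processor IDs are hypothetical and are not PLPA or paffinity code; only the parameter lists follow the renames listed above. The second output parameter carries a count, which can differ from the highest ID when IDs are not contiguous (assumed here to be one reason for the split).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical processor IDs for this sketch; note the gap between 1 and 4. */
static const int sketch_processor_ids[] = { 0, 1, 4, 5 };
static const size_t sketch_num_ids =
    sizeof(sketch_processor_ids) / sizeof(sketch_processor_ids[0]);

/* New shape: two output parameters, the count and the highest ID. */
int sketch_get_processor_info(int *num_processors, int *max_processor_id)
{
    size_t i;

    assert(num_processors != NULL && max_processor_id != NULL);
    *num_processors = (int) sketch_num_ids;
    *max_processor_id = -1;
    for (i = 0; i < sketch_num_ids; ++i) {
        if (sketch_processor_ids[i] > *max_processor_id) {
            *max_processor_id = sketch_processor_ids[i];
        }
    }
    return 0;
}

/* Old shape: one output parameter; the count is computed and thrown away. */
int sketch_max_processor_id(int *max_processor_id)
{
    int num_processors;

    assert(max_processor_id != NULL);
    return sketch_get_processor_info(&num_processors, max_processor_id);
}
```

With this made-up table, the count is 4 while the highest ID is 5. The socket and core calls would follow the same pattern, with the core call taking the socket as an extra input.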

I changed these APIs internally in my copy of the trunk and tested
the new PLPA.
It works properly.

Do you have an idea of how to integrate the new PLPA with the new APIs?

Sharon.



On Feb 19, 2008 4:31 AM, Jeff Squyres wrote:
> Sharon/Lenny --
>
> Could you try out the newest PLPA RC for me?  I think it's ready.  I
> just posted rc4 to the web site (I posted that rc3 was available, and
> then found a small bug that necessitated rc4): 
> http://www.open-mpi.org/software/plpa/v1.1/
>
> You should be able to do this to test it within an OMPI SVN checkout:
>
> cd opal/mca/paffinity/linux
> mv plpa bogus
> tar zxf plpa-1.1rc4.tar.gz
> ln -s plpa-1.1rc4 plpa
> cd ../../../..
> ./autogen && ./configure .. && make -j 4 ..
>
> Let me know if it works for you properly (configure, build, and
> function).  If so, I think it's ready for release.  I'll then do the
> SVN magic to bring it to the OMPI trunk.
>
> Thanks.
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] PLPA ready?

2008-02-19 Thread Jeff Squyres

$%@#$%  Sorry.

I saw that and fixed it in my local OMPI SVN copy last night as well.  
Here's a patch to make it go (I obviously didn't want to commit this  
until the new PLPA goes in).  We *may* want to revise the paffinity  
API to match PLPA, not because Linux is the one-and-only-way, but  
because we actually took some effort in PLPA to make a fairly neutral  
API.






--
Jeff Squyres
Cisco Systems


linux-paffinity.patch
Description: Binary data




Re: [OMPI devel] PLPA ready?

2008-02-19 Thread Terry Dontje

Jeff Squyres wrote:
> $%@#$%  Sorry.
>
> I saw that and fixed it in my local OMPI SVN copy last night as well.
> Here's a patch to make it go (I obviously didn't want to commit this
> until the new PLPA goes in).  We *may* want to revise the paffinity
> API to match PLPA, not because Linux is the one-and-only-way, but
> because we actually took some effort in PLPA to make a fairly neutral
> API.

Jeff, can you work with Pak to make sure this doesn't completely mess up
Solaris' processor affinity methods in OMPI?


--td



  




Re: [OMPI devel] PLPA ready?

2008-02-19 Thread Jeff Squyres
Will do.  I stress that it *might* be worthwhile -- I think it at  
least partially depends on what Voltaire does and whether they think  
it should change (since they're the first ones using the paffinity API  
in a meaningful way).


If we want to change it, it would probably be good to do so before 1.3  
so that the interface can be [at least pseudo-]stable for the 1.3.x  
series.


Just my $0.02...



On Feb 19, 2008, at 11:47 AM, Terry Dontje wrote:

> Jeff Squyres wrote:
> > $%@#$%  Sorry.
> >
> > I saw that and fixed it in my local OMPI SVN copy last night as well.
> > Here's a patch to make it go (I obviously didn't want to commit this
> > until the new PLPA goes in).  We *may* want to revise the paffinity
> > API to match PLPA, not because Linux is the one-and-only-way, but
> > because we actually took some effort in PLPA to make a fairly neutral
> > API.
>
> Jeff can you work with Pak to make sure this doesn't completely mess up
> Solaris' processor affinity methods in OMPI.
>
> --td






--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] PLPA ready?

2008-02-19 Thread Pak Lui
I am guessing it will not mess us up, because these are functions
that Solaris doesn't really implement yet, right? Last time I checked, we
were still hunting for some stable interfaces in Solaris to implement them.


Terry Dontje wrote:
> Jeff Squyres wrote:
> > $%@#$%  Sorry.
> >
> > I saw that and fixed it in my local OMPI SVN copy last night as well.
> > Here's a patch to make it go (I obviously didn't want to commit this
> > until the new PLPA goes in).  We *may* want to revise the paffinity
> > API to match PLPA, not because Linux is the one-and-only-way, but
> > because we actually took some effort in PLPA to make a fairly neutral
> > API.
>
> Jeff can you work with Pak to make sure this doesn't completely mess up
> Solaris' processor affinity methods in OMPI.
>
> --td





--

- Pak Lui
pak@sun.com


Re: [OMPI devel] PLPA ready?

2008-02-19 Thread Sharon Melamed
Jeff,

In the patch you sent, the variables num_processors, num_sockets, and
num_cores are lost outside the paffinity framework.
I need those in the ODLS framework. What do you think about the attached patch?
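
A minimal sketch of the concern, under stated assumptions: the module table, the wrapper names, and the returned values below are all hypothetical and are only loosely modeled on the function-pointer table in the diff that follows. It shows one plausible way the counts could get "lost": a framework-level wrapper that keeps the old one-output signature has nowhere to put the count, while a wrapper with the new signature passes it through to callers outside the framework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module table with a single entry point. */
typedef struct {
    int (*get_processor_info)(int *num_processors, int *max_processor_id);
} sketch_module_t;

/* Hypothetical component implementation returning made-up values. */
int sketch_linux_get_processor_info(int *num_processors, int *max_processor_id)
{
    assert(num_processors != NULL && max_processor_id != NULL);
    *num_processors = 4;
    *max_processor_id = 5;
    return 0;
}

const sketch_module_t sketch_module = { sketch_linux_get_processor_info };

/* Wrapper A: old public signature. The count is fetched but dropped here,
 * so callers outside the framework never see it. */
int sketch_base_max_processor_id(int *max_processor_id)
{
    int num_processors_dropped;

    return sketch_module.get_processor_info(&num_processors_dropped,
                                            max_processor_id);
}

/* Wrapper B: new public signature. Both values reach the caller. */
int sketch_base_get_processor_info(int *num_processors, int *max_processor_id)
{
    return sketch_module.get_processor_info(num_processors, max_processor_id);
}
```

A caller such as a launcher that needs the number of processors could only use wrapper B; with wrapper A it would have to guess the count from the highest ID.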

Sharon.

2008/2/19 Jeff Squyres:
> $%@#$%  Sorry.
>
> I saw that and fixed it in my local OMPI SVN copy last night as well.
> Here's a patch to make it go (I obviously didn't want to commit this
> until the new PLPA goes in).  We *may* want to revise the paffinity
> API to match PLPA, not because Linux is the one-and-only-way, but
> because we actually took some effort in PLPA to make a fairly neutral
> API.
>
>
>
> On Feb 19, 2008, at 8:59 AM, Sharon Melamed wrote:
>
> > Jeff,
> >
> > The new PLPA fails in compilation. there is a need to change the
> > paffinity API's:
> > 1. max_processor_id with one parameter --> get_processor_info with 2
> > parameters.
> > 2. max_socket with one parameter --> get_socket_info with 2
> > parameters.
> > 3. max_core with 2 parameters --> get_core_info with 3 parameters.
> >
> > I changed these API's internally in my copy of the trunk and tested
> > the new PLPA.
> > it works properly.
> >
> > Do you have an idea how to integrate the new PLPA with the new API's ?
> >
> > Sharon.
> >
> >
> >
> > On Feb 19, 2008 4:31 AM, Jeff Squyres  wrote:
> >> Sharon/Lenny --
> >>
> >> Could you try out the newest PLPA RC for me?  I think it's ready.  I
> >> just posted rc4 to the web site (I posted that rc3 was available, and
> >> then found a small bug that necessitated rc4): 
> >> http://www.open-mpi.org/software/plpa/v1.1/
> >>
> >> You should be able to do this to test it within an OMPI SVN checkout:
> >>
> >> cd opal/mca/paffinity/linux
> >> mv plpa bogus
> >> tar zxf plpa-1.1rc4.tar.gz
> >> ln -s plpa-1.1rc4 plpa
> >> cd ../../../..
> >> ./autogen && ./configure .. && make -j 4 ..
> >>
> >> Let me know if it works for you properly (configure, build, and
> >> function).  If so, I think it's ready for release.  I'll then do the
> >> SVN magic to bring it to the OMPI trunk.
> >>
> >> Thanks.
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
>
Index: opal/mca/paffinity/linux/paffinity_linux_module.c
===
--- opal/mca/paffinity/linux/paffinity_linux_module.c	(revision 17442)
+++ opal/mca/paffinity/linux/paffinity_linux_module.c	(working copy)
@@ -45,9 +45,9 @@
 static int linux_module_get(opal_paffinity_base_cpu_set_t *cpumask);
 static int linux_module_map_to_processor_id(int socket, int core, int *processor_id);
 static int linux_module_map_to_socket_core(int processor_id, int *socket, int *core);
-static int linux_module_max_processor_id(int *max_processor_id);
-static int linux_module_max_socket(int *max_socket);
-static int linux_module_max_core(int socket, int *max_core);
+static int linux_module_get_processor_info(int *num_processors, int *max_processor_id);
+static int linux_module_get_socket_info(int *num_sockets, int *max_socket_num);
+static int linux_module_get_core_info(int socket, int *num_cores, int *max_core_num);
 
 /*
  * Linux paffinity module
@@ -64,9 +64,9 @@
 linux_module_get,
 linux_module_map_to_processor_id,
 linux_module_map_to_socket_core,
-linux_module_max_processor_id,
-linux_module_max_socket,
-linux_module_max_core,
+linux_module_get_processor_info,
+linux_module_get_socket_info,
+linux_module_get_core_info,
 NULL
 };
 
@@ -168,18 +168,18 @@
return opal_paffinity_linux_plpa_map_to_socket_core(processor_id, socket, core);
 }
 
-static int linux_module_max_processor_id(int *max_processor_id)
+static int linux_module_get_processor_info(int *num_processors, int *max_processor_id)
 {
-   return opal_paffinity_linux_plpa_max_processor_id(max_processor_id);
+   return opal_paffinity_linux_plpa_get_processor_info(num_processors, max_processor_id);
 }
 
-static int linux_module_max_socket(int *max_socket)
+static int linux_module_get_socket_info(int *num_sockets, int *max_socket_num)
 {
-   return opal_paffinity_linux_plpa_max_socket(max_socket);
+   return opal_paffinity_linux_plpa_get_socket_info(num_sockets, max_socket_num);
 }
 
-static int linux_module_max_core(int socket, int *max_core)
+static int linux_module_get_core_info(int socket, int *num_cores, int *max_core_num)
 {
-   return opal_paffinity_linux_plpa_max_core(socket, max_core);
+   return opal_paffinity_linux_plpa_get_core_info(socket, num_cores, max_core_num);
 }
 
Index: opal/mca/

[OMPI devel] RDMA pipeline

2008-02-19 Thread George Bosilca
A few days ago, during some testing, I realized that the RDMA pipeline was
disabled for MX and Elan (I didn't check the others). A quick look
into the source code pinpointed the problem in the pml_ob1_rdma.c
file, and it seems that the problem was introduced by commit 15247.
The problem comes from the use of the dummy registration, which is
set for all non-mpool-friendly BTLs. Later on, this is checked against
NULL (and of course it fails), which basically disables the RDMA
pipeline.
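
The failure mode described above can be sketched in C as follows. This is a reading of the description, not the code in pml_ob1_rdma.c: every type and function name is hypothetical, and the "fixed" check is only one possible shape of a fix, since the attached patch is not reproduced here and may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int unused; } sketch_registration_t;
typedef struct { int has_mpool; } sketch_btl_t;

/* Placeholder registration handed to BTLs that do not use an mpool. */
sketch_registration_t sketch_dummy_reg;

/* Assumption for the sketch: an mpool BTL with no cached registration
 * yields NULL, while a BTL without an mpool yields the dummy placeholder. */
sketch_registration_t *sketch_lookup_registration(const sketch_btl_t *btl)
{
    assert(btl != NULL);
    return btl->has_mpool ? NULL : &sketch_dummy_reg;
}

/* Check as described: only NULL selects the pipeline, so the non-NULL
 * dummy is mistaken for a real registration and the pipeline is skipped. */
int sketch_use_pipeline_buggy(const sketch_registration_t *reg)
{
    return reg == NULL;
}

/* One possible fix: treat the dummy placeholder like "no registration". */
int sketch_use_pipeline_fixed(const sketch_registration_t *reg)
{
    return reg == NULL || reg == &sketch_dummy_reg;
}
```

In this sketch a BTL without an mpool never reaches the pipelined path under the first check, which matches the symptom reported for MX and Elan.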


I'll re-enable the RDMA pipeline in 2 days if I don't hear anything
back. Attached is the patch that fixes this problem.


  Thanks,
george.



pipeline_rdma.patch
Description: Binary data








Re: [OMPI devel] RDMA pipeline

2008-02-19 Thread Gleb Natapov
On Tue, Feb 19, 2008 at 02:13:30PM -0500, George Bosilca wrote:
> Few days ago during some testing I realize that the RDMA pipeline was  
> disabled for MX and Elan (I didn't check for the others). A quick look  
> into the source code, pinpointed the problem into the pml_ob1_rdma.c  
> file, and it seems that the problem was introduced by commit 15247. The 
> problem comes from the usage of the dummy registration, which is set for 
> all non mpool friendly BTL. Later on this is checked against NULL (and of 
> course it fails), which basically disable the RDMA pipeline.
Do you mean that mca_pml_ob1_send_request_start_rdma() is used for
rendezvous sends? I would be very surprised if ompi 1.2 worked
differently. It assumes that if a BTL has no mpool, then the entire message
buffer is registered and no pipeline is needed. The trunk does the same
thing, but in a different way. OpenIB also chooses this route if the buffer
memory is allocated by MPI_Alloc_mem().

>
> I'll enable the RDMA pipeline back in 2 days if I don't hear anything  
> back. Attached is the patch that fix this problem.
>
I am not sure why you need a pipeline for BTLs that don't require
registration, but by applying this patch you'll change how ompi has
behaved since v1.0 (unless I'm missing something, in which case please
explain further).

--
Gleb.


Re: [OMPI devel] RDMA pipeline

2008-02-19 Thread George Bosilca
Actually, it restores the original behavior. The RDMA operations were
pipelined before the r15247 commit, regardless of whether the BTL had
an mpool or not. We were actively using this behavior in the message
logging framework to hide the cost of the local storage of the
payload, and we were quite surprised when we realized that it had
disappeared.


If a BTL doesn't want to use the pipeline for RDMA operations, it can set
the RDMA fragment size to the max value, and this will automatically
disable the pipeline. However, if the BTL supports pipelining, it is not
possible to activate it with the trunk version today. Moreover, in
the current version the parameters that define the BTL behavior are
blatantly ignored, as the PML makes high-level assumptions about what
they want to do.
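
To illustrate why a maximal fragment size switches the pipeline off, here is a small C sketch. The function and parameter names are hypothetical (the actual BTL parameter is not named in this thread); the sketch only counts how many RDMA fragments a message would be split into for a given fragment size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of fragments needed to move message_len bytes when each RDMA
 * operation carries at most rdma_frag_size bytes (hypothetical parameter). */
size_t sketch_fragment_count(size_t message_len, size_t rdma_frag_size)
{
    assert(rdma_frag_size > 0);
    if (message_len == 0) {
        return 0;
    }
    /* Ceiling division written so that it cannot overflow. */
    return (message_len - 1) / rdma_frag_size + 1;
}
```

With a 1 MiB message and a 64 KiB fragment size this gives 16 fragments, which is what allows work on one fragment to overlap with the transfer of another. With the fragment size set to SIZE_MAX, every message fits in a single fragment, so there is nothing left to pipeline.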


  Thanks,
george.


On Feb 19, 2008, at 3:03 PM, Gleb Natapov wrote:


> On Tue, Feb 19, 2008 at 02:13:30PM -0500, George Bosilca wrote:
> > Few days ago during some testing I realize that the RDMA pipeline was
> > disabled for MX and Elan (I didn't check for the others). A quick look
> > into the source code, pinpointed the problem into the pml_ob1_rdma.c
> > file, and it seems that the problem was introduced by commit 15247. The
> > problem comes from the usage of the dummy registration, which is set for
> > all non mpool friendly BTL. Later on this is checked against NULL (and of
> > course it fails), which basically disable the RDMA pipeline.
> Do you mean that mca_pml_ob1_send_request_start_rdma() is used for
> rendezvous sends? I will be very surprised if ompi 1.2 works
> differently. It assumes that if btl has no mpool then entire message buffer
> is registered and no pipeline is needed. Trunk does the same but
> differently. OpenIB also choose this route if buffer memory is allocated
> by MPI_alloc_mem().
>
> > I'll enable the RDMA pipeline back in 2 days if I don't hear anything
> > back. Attached is the patch that fix this problem.
>
> I am not sure why you need pipeline for BTLs that don't require
> registration, but by applying this patch you'll change how ompi behaves
> from v1.0. (unless I miss something, then please provide more
> explanations).
>
> --
> Gleb.



