Re: [OMPI devel] PLPA ready?
Jeff,

The new PLPA fails to compile; the paffinity APIs need to change:

1. max_processor_id with one parameter --> get_processor_info with 2 parameters.
2. max_socket with one parameter --> get_socket_info with 2 parameters.
3. max_core with 2 parameters --> get_core_info with 3 parameters.

I changed these APIs internally in my copy of the trunk and tested the new PLPA. It works properly.

Do you have an idea how to integrate the new PLPA with the new APIs?

Sharon.

On Feb 19, 2008 4:31 AM, Jeff Squyres wrote:
> Sharon/Lenny --
>
> Could you try out the newest PLPA RC for me?  I think it's ready.  I
> just posted rc4 to the web site (I posted that rc3 was available, and
> then found a small bug that necessitated rc4):
> http://www.open-mpi.org/software/plpa/v1.1/
>
> You should be able to do this to test it within an OMPI SVN checkout:
>
>     cd opal/mca/paffinity/linux
>     mv plpa bogus
>     tar zxf plpa-1.1rc4.tar.gz
>     ln -s plpa-1.1rc4 plpa
>     cd ../../../..
>     ./autogen && ./configure .. && make -j 4 ..
>
> Let me know if it works for you properly (configure, build, and
> function).  If so, I think it's ready for release.  I'll then do the
> SVN magic to bring it to the OMPI trunk.
>
> Thanks.
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] PLPA ready?
$%@#$% Sorry.

I saw that and fixed it in my local OMPI SVN copy last night as well. Here's a patch to make it go (I obviously didn't want to commit this until the new PLPA goes in). We *may* want to revise the paffinity API to match PLPA, not because Linux is the one-and-only way, but because we actually took some effort in PLPA to make a fairly neutral API.

On Feb 19, 2008, at 8:59 AM, Sharon Melamed wrote:

> Jeff,
>
> The new PLPA fails to compile; the paffinity APIs need to change:
> 1. max_processor_id with one parameter --> get_processor_info with 2
>    parameters.
> 2. max_socket with one parameter --> get_socket_info with 2 parameters.
> 3. max_core with 2 parameters --> get_core_info with 3 parameters.
>
> I changed these APIs internally in my copy of the trunk and tested the
> new PLPA. It works properly.
>
> Do you have an idea how to integrate the new PLPA with the new APIs?
>
> Sharon.

--
Jeff Squyres
Cisco Systems

linux-paffinity.patch
Description: Binary data
Re: [OMPI devel] PLPA ready?
Jeff Squyres wrote:

> $%@#$% Sorry.
>
> I saw that and fixed it in my local OMPI SVN copy last night as well.
> Here's a patch to make it go (I obviously didn't want to commit this
> until the new PLPA goes in). We *may* want to revise the paffinity API
> to match PLPA, not because Linux is the one-and-only way, but because
> we actually took some effort in PLPA to make a fairly neutral API.

Jeff, can you work with Pak to make sure this doesn't completely mess up Solaris' processor affinity methods in OMPI?

--td
Re: [OMPI devel] PLPA ready?
Will do.

I stress that it *might* be worthwhile -- I think it at least partially depends on what Voltaire does and whether they think it should change (since they're the first ones using the paffinity API in a meaningful way). If we want to change it, it would probably be good to do so before 1.3 so that the interface can be [at least pseudo-]stable for the 1.3.x series.

Just my $0.02...

On Feb 19, 2008, at 11:47 AM, Terry Dontje wrote:

> Jeff, can you work with Pak to make sure this doesn't completely mess
> up Solaris' processor affinity methods in OMPI?
>
> --td

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] PLPA ready?
I am guessing it will not mess us up, because these are the functions that Solaris doesn't really implement yet, right? Last time I checked, we were still hunting for some stable interfaces in Solaris to implement them.

Terry Dontje wrote:

> Jeff, can you work with Pak to make sure this doesn't completely mess
> up Solaris' processor affinity methods in OMPI?
>
> --td

--
- Pak Lui
pak@sun.com
Re: [OMPI devel] PLPA ready?
Jeff,

In the patch you sent, the variables num_processors, num_sockets, and num_cores are lost outside the paffinity framework. I need those in the ODLS framework. What do you think about the attached patch?

Sharon.

2008/2/19 Jeff Squyres :
> $%@#$% Sorry.
>
> I saw that and fixed it in my local OMPI SVN copy last night as well.
> Here's a patch to make it go (I obviously didn't want to commit this
> until the new PLPA goes in). We *may* want to revise the paffinity API
> to match PLPA, not because Linux is the one-and-only way, but because
> we actually took some effort in PLPA to make a fairly neutral API.
>
> --
> Jeff Squyres
> Cisco Systems

Index: opal/mca/paffinity/linux/paffinity_linux_module.c
===
--- opal/mca/paffinity/linux/paffinity_linux_module.c   (revision 17442)
+++ opal/mca/paffinity/linux/paffinity_linux_module.c   (working copy)
@@ -45,9 +45,9 @@
 static int linux_module_get(opal_paffinity_base_cpu_set_t *cpumask);
 static int linux_module_map_to_processor_id(int socket, int core, int *processor_id);
 static int linux_module_map_to_socket_core(int processor_id, int *socket, int *core);
-static int linux_module_max_processor_id(int *max_processor_id);
-static int linux_module_max_socket(int *max_socket);
-static int linux_module_max_core(int socket, int *max_core);
+static int linux_module_get_processor_info(int *num_processors, int *max_processor_id);
+static int linux_module_get_socket_info(int *num_sockets, int *max_socket_num);
+static int linux_module_get_core_info(int socket, int *num_cores, int *max_core_num);

 /*
  * Linux paffinity module
@@ -64,9 +64,9 @@
     linux_module_get,
     linux_module_map_to_processor_id,
     linux_module_map_to_socket_core,
-    linux_module_max_processor_id,
-    linux_module_max_socket,
-    linux_module_max_core,
+    linux_module_get_processor_info,
+    linux_module_get_socket_info,
+    linux_module_get_core_info,
     NULL
 };

@@ -168,18 +168,18 @@
     return opal_paffinity_linux_plpa_map_to_socket_core(processor_id, socket, core);
 }

-static int linux_module_max_processor_id(int *max_processor_id)
+static int linux_module_get_processor_info(int *num_processors, int *max_processor_id)
 {
-    return opal_paffinity_linux_plpa_max_processor_id(max_processor_id);
+    return opal_paffinity_linux_plpa_get_processor_info(num_processors, max_processor_id);
 }

-static int linux_module_max_socket(int *max_socket)
+static int linux_module_get_socket_info(int *num_sockets, int *max_socket_num)
 {
-    return opal_paffinity_linux_plpa_max_socket(max_socket);
+    return opal_paffinity_linux_plpa_get_socket_info(num_sockets, max_socket_num);
 }

-static int linux_module_max_core(int socket, int *max_core)
+static int linux_module_get_core_info(int socket, int *num_cores, int *max_core_num)
 {
-    return opal_paffinity_linux_plpa_max_core(socket, max_core);
+    return opal_paffinity_linux_plpa_get_core_info(socket, num_cores, max_core_num);
 }
Index: opal/mca/
[OMPI devel] RDMA pipeline
A few days ago during some testing I realized that the RDMA pipeline was disabled for MX and Elan (I didn't check the others). A quick look into the source code pinpointed the problem in the pml_ob1_rdma.c file, and it seems that the problem was introduced by commit 15247. The problem comes from the usage of the dummy registration, which is set for all non-mpool-friendly BTLs. Later on this is checked against NULL (and of course it fails), which basically disables the RDMA pipeline.

I'll re-enable the RDMA pipeline in 2 days if I don't hear anything back. Attached is the patch that fixes this problem.

Thanks,
george.

pipeline_rdma.patch
Description: Binary data
Re: [OMPI devel] RDMA pipeline
On Tue, Feb 19, 2008 at 02:13:30PM -0500, George Bosilca wrote:
> A few days ago during some testing I realized that the RDMA pipeline
> was disabled for MX and Elan (I didn't check the others). A quick look
> into the source code pinpointed the problem in the pml_ob1_rdma.c
> file, and it seems that the problem was introduced by commit 15247.
> The problem comes from the usage of the dummy registration, which is
> set for all non-mpool-friendly BTLs. Later on this is checked against
> NULL (and of course it fails), which basically disables the RDMA
> pipeline.

Do you mean that mca_pml_ob1_send_request_start_rdma() is used for rendezvous sends? I will be very surprised if ompi 1.2 works differently. It assumes that if a btl has no mpool, then the entire message buffer is registered and no pipeline is needed. The trunk does the same, but differently. OpenIB also chooses this route if the buffer memory is allocated by MPI_Alloc_mem().

> I'll re-enable the RDMA pipeline in 2 days if I don't hear anything
> back. Attached is the patch that fixes this problem.

I am not sure why you need the pipeline for BTLs that don't require registration, but by applying this patch you'll change how ompi has behaved since v1.0 (unless I am missing something; in that case, please provide more explanation).

--
Gleb.
Re: [OMPI devel] RDMA pipeline
Actually, it restores the original behavior. The RDMA operations were pipelined before the r15247 commit, independent of whether they had an mpool or not. We were actively using this behavior in the message logging framework to hide the cost of the local storage of the payload, and we were quite surprised when we realized that it had disappeared.

If a BTL doesn't want to use the pipeline for RDMA operations, it can set the RDMA fragment size to the max value, and this will automatically disable the pipeline. However, if a BTL supports the pipeline, with the trunk version today it is not possible to activate it. Moreover, in the current version the parameters that define the BTL behavior are blatantly ignored, as the PML makes high-level assumptions about what the BTLs want to do.

Thanks,
george.

On Feb 19, 2008, at 3:03 PM, Gleb Natapov wrote:

> Do you mean that mca_pml_ob1_send_request_start_rdma() is used for
> rendezvous sends? I will be very surprised if ompi 1.2 works
> differently. It assumes that if a btl has no mpool, then the entire
> message buffer is registered and no pipeline is needed. The trunk does
> the same, but differently. OpenIB also chooses this route if the
> buffer memory is allocated by MPI_Alloc_mem().
>
> I am not sure why you need the pipeline for BTLs that don't require
> registration, but by applying this patch you'll change how ompi has
> behaved since v1.0 (unless I am missing something; in that case,
> please provide more explanation).
>
> --
> Gleb.