Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-25 Thread Ralph Castain
Perhaps a telecon (myself, Jeff S, and you) would be best at this  
stage. I confess I'm now confused too - what you describe is precisely  
what we already do.


Let me know when you are available and we'll try to work out a time -  
might as well do that off list so we don't bang everyone's inbox.



On Jul 24, 2009, at 7:57 AM, Chris Samuel wrote:


Hi Ralph,

- "Ralph Castain"  wrote:


> Ummm... I'll let you guys work this out on PLPA. However, just to
> clarify, OMPI currently binds to cores, not logical cpus. It is the
> PLPA that is "dumb" and provides the plumbing to do what OMPI tells
> it.
>
> :-)


Ahh, if that's the case then this isn't something that PLPA
needs to be involved in!  I'm getting confused here;
haven't had a coffee since Monday, which must be the reason. :-)

What we really want is for the usual execution of:

mpiexec /path/to/my/code $arguments

to launch via the PBS TM API, work out which cores are
available to it on the nodes assigned to the job, and then
bind to them appropriately.  Easy, see... ;-)
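
For reference, here is a minimal standalone C sketch of what that
looks like at the syscall level. It is not PLPA or OMPI code, just
the raw calls they wrap: ask the kernel which logical CPUs the
process may use (a cpuset or cgroup shrinks exactly this mask), then
bind to one of them.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t avail, bind;
    int cpu, first = -1;

    /* Ask the kernel which logical CPUs this process may run on; a
     * cpuset or cgroup restricts exactly this mask. */
    CPU_ZERO(&avail);
    if (sched_getaffinity(0, sizeof(avail), &avail) != 0) {
        perror("sched_getaffinity");
        return 1;
    }

    for (cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &avail)) {
            printf("cpu %d is available\n", cpu);
            if (first < 0) first = cpu;
        }
    }
    if (first < 0) return 1;

    /* Bind to the first available CPU, standing in for whatever
     * placement policy the launcher actually wants. */
    CPU_ZERO(&bind);
    CPU_SET(first, &bind);
    if (sched_setaffinity(0, sizeof(bind), &bind) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}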

cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency




Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support

2009-07-25 Thread Chris Samuel

- "Ralph Castain"  wrote:

> Perhaps a telecon (myself, Jeff S, and you) would be best at this  
> stage.

Sounds good, will take that part to private email.

> I confess I'm now confused too - what you describe is precisely
> what we already do.

I added printf()'s to the PLPA init(),
PLPA_NAME(sched_getaffinity)() and
PLPA_NAME(sched_setaffinity)() functions to see where
they get called, to try and clarify what's up.

I do see init() and PLPA_NAME(sched_getaffinity)() getting
called, but never PLPA_NAME(sched_setaffinity)().

This is on my home box (quad core), not our production
clusters, but the basics should be the same.
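
(The tracing itself is nothing fancy; for anyone who wants to
reproduce it, here is a self-contained toy version of the idea, with
made-up function names standing in for the PLPA entry points.)

#include <stdio.h>

/* One-line entry trace dropped at the top of each function of
 * interest; the function names below are fakes, not the real PLPA
 * symbols. */
#define TRACE_ENTRY() fprintf(stderr, "TRACE: entered %s\n", __func__)

static int fake_getaffinity(void) { TRACE_ENTRY(); return 0; }
static int fake_setaffinity(void) { TRACE_ENTRY(); return 0; }

int main(void)
{
    /* Only the "get" path runs, so only one TRACE line appears,
     * which is the same kind of evidence described above. */
    fake_getaffinity();
    (void) fake_setaffinity;
    return 0;
}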

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


[OMPI devel] ptmalloc replacement

2009-07-25 Thread Jeff Squyres

SHORT VERSION
=============

OpenFabrics vendors (Sun, IBM, Mellanox, Voltaire): please try Roland
Dreier's "ummunot" kernel module with my OMPI Mercurial branch on your
systems (relevant URLs and instructions below).  This is the proposed
replacement for the not-bulletproof ptmalloc2 hooks that implement
mpi_leave_pinned behavior.  A big change like this really requires
testing by everyone.  Please let me know your testing results.


MORE DETAILS
============

Roland Dreier from Cisco sent his "ummunot" kernel module upstream to
the Linux kernel the other day; initial reviews have been favorable.
Here's the latest version of his module, incorporating feedback from a
few early reviews:


http://lkml.org/lkml/2009/7/24/308

It replaces the not-guaranteeable ptmalloc memory hooks with a system
that notifies userspace when MMU events occur down in the kernel
(basically: when memory is unmapped from a process).  See Roland's
post for more details on his implementation.
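
For context, here is a rough, self-contained sketch of the kind of
user-space interception that the ptmalloc-style hooks depend on.  It
is not OMPI's actual hook code, and the registration-cache call is
just a placeholder.  Anything that bypasses the wrapper (a direct
syscall, a statically linked allocator, etc.) is silently missed,
which is exactly why a kernel-side notifier is attractive.

/* Compile as a shared object (gcc -shared -fPIC -ldl) and LD_PRELOAD it. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/mman.h>

static void invalidate_registrations(void *addr, size_t len)
{
    /* Placeholder: an MPI library would evict [addr, addr+len) from
     * its pinned-memory (registration) cache here. */
    fprintf(stderr, "intercepted munmap of %zu bytes at %p\n", len, addr);
}

int munmap(void *addr, size_t len)
{
    static int (*real_munmap)(void *, size_t) = NULL;

    if (real_munmap == NULL) {
        /* Look up the libc implementation we are shadowing. */
        real_munmap = (int (*)(void *, size_t)) dlsym(RTLD_NEXT, "munmap");
    }
    invalidate_registrations(addr, len);
    return real_munmap(addr, len);
}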


It's passing all the MPI tests that I can throw at it, so I think it's
time for others to try this stuff with Open MPI.  I have a
proof-of-concept Mercurial branch here (I am keeping it in sync with
the SVN trunk):


http://bitbucket.org/jsquyres/ummunot/

I currently have the support implemented in a standalone OPAL memory  
"ummunot" component.  Further integration work is required before it  
comes to the trunk, but it's good enough for testing and ensuring that  
the concept actually works.  Specifically, you must disable building  
OMPI's ptmalloc2.  Here's how I configure to build it:


./configure --enable-mca-no-build=memory-ptmalloc2 \
    CPPFLAGS=-I/path/to/ummunot.h ...


You should be able to see the "ummunot" component in the output of  
ompi_info when done.


Then try running any MPI test that you can think of (ensure that  
mpi_leave_pinned==1 to guarantee testing this stuff).
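
If it helps, here is a trivial example of the kind of test I mean
(hypothetical, not from any of our test suites): it reuses a send
buffer many times, then frees and reallocates it, so both the
registration cache and its invalidation get exercised.  Run it with
at least two processes and with the mpi_leave_pinned MCA parameter
set to 1.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, i, iter;
    const int len = 1 << 20;            /* 1 MiB messages */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (iter = 0; iter < 100; ++iter) {
        char *buf = malloc(len);        /* fresh allocation each round */
        memset(buf, iter, len);
        for (i = 0; i < 10; ++i) {      /* reuse the buffer: cache hits */
            if (rank == 0) {
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            }
        }
        free(buf);      /* the registration must be invalidated here */
    }

    MPI_Finalize();
    return 0;
}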


Please let me know your testing results.  I'm assuming that Sun, IBM,  
Mellanox, and Voltaire will be testing.


Thanks!

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Shared library versioning

2009-07-25 Thread Iain Bason


On Jul 23, 2009, at 5:53 PM, Jeff Squyres wrote:

> We have talked many times about doing proper versioning for
> OMPI's .so libraries (e.g., libmpi.so -- *not* our component DSOs).


Forgive me if this has been hashed out, but won't you run into trouble  
by not versioning the components?  What happens when there are  
multiple versions of libmpi installed?  The user program will pick up  
the correct one because of versioning, but how will libmpi pick up the  
correct versions of the components?


Iain



Re: [OMPI devel] Shared library versioning

2009-07-25 Thread Jeff Squyres

On Jul 25, 2009, at 12:59 PM, Iain Bason wrote:


>> We have talked many times about doing proper versioning for
>> OMPI's .so libraries (e.g., libmpi.so -- *not* our component DSOs).

> Forgive me if this has been hashed out, but won't you run into trouble
> by not versioning the components?



This is a good question; it has been discussed a few times, but it's  
good to get it here on the web archives.  More below.



> What happens when there are
> multiple versions of libmpi installed?  The user program will pick up
> the correct one because of versioning, but how will libmpi pick up the
> correct versions of the components?




Even with this shared library versioning, you will still really only
install one OMPI per directory tree anyway.  The library versioning
won't allow you to install N different versions of OMPI in a single
tree, for several reasons:

- support files are not versioned (e.g., show_help text files)
- include files are not versioned (e.g., mpi.h)
- OMPI's DSOs actually are versioned, but more work would be needed in  
this area to make that versioning scheme work in real world scenarios

- ...and probably some other things that I'm not thinking of...

We probably could solve all of these problems if we wanted to (and  
therefore make it safe to install multiple OMPI's in a single  
directory tree), but there hasn't been much demand for it.  The  
rationale for library versioning is:


- We're weird (and lying) for always using 0:0:0 in different releases
- The rest of the world does shared library versioning
- I've gotten pushback from Red Hat, Suse, and Debian
- It'll prevent at least some cases of MPI apps accidentally using an  
incompatible libmpi


--
Jeff Squyres
jsquy...@cisco.com