Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support
Perhaps a telecon (myself, Jeff S, and you) would be best at this stage. I confess
I'm now confused too - what you describe is precisely what we already do. Let me
know when you are available and we'll try to work out a time - might as well do
that off list so we don't bang everyone's inbox.

On Jul 24, 2009, at 7:57 AM, Chris Samuel wrote:

> Hi Ralph,
>
> ----- "Ralph Castain" wrote:
>
> > Ummm... I'll let you guys work this out on PLPA. However, just to
> > clarify, OMPI currently binds to cores, not logical cpus. It is the
> > PLPA that is "dumb" and provides the plumbing to do what OMPI tells
> > it. :-)
>
> Ahh, if that's the case then this isn't something that PLPA needs to be
> involved in! I'm getting confused here - I haven't had a coffee since
> Monday, which must be the reason. :-)
>
> What we really want is for the usual execution of:
>
>   mpiexec /path/to/my/code $arguments
>
> to launch via the PBS TM API, work out what cores are available to it on
> the nodes assigned to it, and then bind appropriately to them.
>
> Easy, see.. ;-)
>
> cheers!
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
> The Victorian Partnership for Advanced Computing
> P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
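[Editorial note: the "bind to a core" operation being discussed ultimately reduces to a CPU-affinity call on Linux. The following is a minimal illustrative sketch using glibc's sched_setaffinity() directly; PLPA provides a portable wrapper around this interface. This is not OMPI's actual code path, and the hard-coded core number stands in for whatever the resource manager (e.g., the PBS/Torque cpuset for the job) would report.]

```c
/* bind_core.c -- illustrative sketch only, not OMPI's actual code path.
 * Binds the calling process to a single core, which is essentially what
 * OMPI asks PLPA to do for each MPI process it launches.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

static int bind_to_core(int core)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

int main(void)
{
    /* In the scenario above, the core number would come from the
     * resource manager rather than being hard-coded. */
    if (bind_to_core(0) == 0) {
        printf("bound to core 0\n");
    }
    return EXIT_SUCCESS;
}
```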
Re: [OMPI devel] OpenMPI, PLPA and Linux cpuset/cgroup support
- "Ralph Castain" wrote: > Perhaps a telecon (myself, Jeff S, and you) would be best at this > stage. Sounds good, will take that part to private email. > I confess I'm now confused too - what you describe is precisely > what we already do. I added printf()'s to the PLPA init(), PLPA_NAME(sched_getaffinity)() and PLPA_NAME(sched_setaffinity)() functions to see where they are getting called to try and clarify what's up. I do see init() and PLPA_NAME(sched_getaffinity)() getting called, but never PLPA_NAME(sched_getaffinity)(). This is on my home box (quad core) not our production clusters, but the basics should be the same.. cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency
[OMPI devel] ptmalloc replacement
SHORT VERSION
=============

OpenFabrics vendors (Sun, IBM, Mellanox, Voltaire): please try Roland Dreier's
"ummunot" kernel module with my OMPI Mercurial branch on your systems (relevant
URLs and instructions below). This is the improvement to replace the
not-bulletproof ptmalloc2 hooks for mpi_leave_pinned behavior. A big change like
this really requires testing by everyone. Please let me know your testing results.

MORE DETAILS
============

Roland Dreier from Cisco sent his "ummunot" kernel module upstream to the Linux
kernel the other day; initial reviews have been favorable. Here's the latest
version of his module, incorporating a few early reviews:

    http://lkml.org/lkml/2009/7/24/308

It replaces the not-guaranteeable ptmalloc memory hooks with a userspace
notification system for when MMU events occur down in the kernel (basically:
when memory is unmapped from a process). See Roland's post for more details on
his implementation.

It's passing all MPI tests that I can throw at it, so I think it's time for
others to try this stuff with Open MPI. I have a proof-of-concept Mercurial
branch here (I am keeping it up to date with the SVN trunk):

    http://bitbucket.org/jsquyres/ummunot/

I currently have the support implemented in a standalone OPAL memory "ummunot"
component. Further integration work is required before it comes to the trunk,
but it's good enough for testing and ensuring that the concept actually works.

Specifically, you must disable building OMPI's ptmalloc2. Here's how I configure
to build it:

    ./configure --enable-mca-no-build=memory-ptmalloc2 CPPFLAGS=-I/path/to/ummunot.h ...

You should be able to see the "ummunot" component in the output of ompi_info
when done. Then try running any MPI test that you can think of (ensure that
mpi_leave_pinned==1 to guarantee testing this stuff).

Please let me know your testing results. I'm assuming that Sun, IBM, Mellanox,
and Voltaire will be testing. Thanks!

--
Jeff Squyres
jsquy...@cisco.com
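[Editorial note: as one concrete example of the kind of test meant above, here is a minimal sketch (hypothetical file name and sizes; any benchmark that frees and re-allocates its communication buffers between iterations would exercise the same path). Large malloc()/free() cycles typically map and unmap pages, which is exactly what the memory hooks / ummunot notifications have to catch to invalidate cached registrations.]

```c
/* leave_pinned_test.c -- hypothetical example; run with e.g.
 *   mpirun --mca mpi_leave_pinned 1 -np 2 ./leave_pinned_test
 * Sends from a buffer that is freed and re-allocated each iteration, so
 * the registration cache must notice the unmaps.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

#define COUNT (1 << 20)   /* 1 MiB per message: large enough to be mmap'ed */
#define ITERS 100

int main(int argc, char **argv)
{
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (0 == rank) fprintf(stderr, "Need at least 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    for (i = 0; i < ITERS; ++i) {
        /* malloc/free every iteration to force unmaps of pinned pages */
        char *buf = malloc(COUNT);
        memset(buf, rank, COUNT);

        if (0 == rank) {
            MPI_Send(buf, COUNT, MPI_CHAR, 1, i, MPI_COMM_WORLD);
        } else if (1 == rank) {
            MPI_Recv(buf, COUNT, MPI_CHAR, 0, i, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        free(buf);
    }

    if (0 == rank) printf("leave_pinned test completed\n");
    MPI_Finalize();
    return 0;
}
```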
Re: [OMPI devel] Shared library versioning
On Jul 23, 2009, at 5:53 PM, Jeff Squyres wrote:

> We have talked many times about doing proper versioning for OMPI's .so
> libraries (e.g., libmpi.so -- *not* our component DSOs).

Forgive me if this has been hashed out, but won't you run into trouble by not
versioning the components? What happens when there are multiple versions of
libmpi installed? The user program will pick up the correct one because of
versioning, but how will libmpi pick up the correct versions of the components?

Iain
Re: [OMPI devel] Shared library versioning
On Jul 25, 2009, at 12:59 PM, Iain Bason wrote:

> > We have talked many times about doing proper versioning for
> > OMPI's .so libraries (e.g., libmpi.so -- *not* our component DSOs).
>
> Forgive me if this has been hashed out, but won't you run into trouble
> by not versioning the components?

This is a good question; it has been discussed a few times, but it's good to
get it here on the web archives. More below.

> What happens when there are multiple versions of libmpi installed? The
> user program will pick up the correct one because of versioning, but how
> will libmpi pick up the correct versions of the components?

Even with this shared library versioning, you will still really only be able to
install one OMPI per directory tree. The library versioning won't allow you to
install N different versions of OMPI in a single tree, for several reasons:

- support files are not versioned (e.g., show_help text files)
- include files are not versioned (e.g., mpi.h)
- OMPI's DSOs actually are versioned, but more work would be needed in this
  area to make that versioning scheme work in real-world scenarios
- ...and probably some other things that I'm not thinking of...

We probably could solve all of these problems if we wanted to (and therefore
make it safe to install multiple OMPI's in a single directory tree), but there
hasn't been much demand for it.

The rationale for library versioning is:

- We're weird (and lying) for always using 0:0:0 in different releases
- The rest of the world does shared library versioning
- I've gotten pushback from Red Hat, Suse, and Debian
- It'll prevent at least some cases of MPI apps accidentally using an
  incompatible libmpi

--
Jeff Squyres
jsquy...@cisco.com