SHORT VERSION
=============
OpenFabrics vendors (Sun, IBM, Mellanox, Voltaire): please try Roland
Dreier's "ummunot" kernel module with my OMPI Mercurial branch on your
systems (relevant URLs and instructions below). It is intended to
replace the not-bulletproof ptmalloc2 hooks used for
mpi_leave_pinned behavior. A big change like this really requires
testing by everyone. Please let me know your testing results.
MORE DETAILS
============
Roland Dreier from Cisco sent his "ummunot" kernel module upstream to
the Linux kernel the other day; initial reviews have been favorable.
Here's the latest version of his module, incorporating a few early
reviews:
http://lkml.org/lkml/2009/7/24/308
It replaces the not-guaranteeable ptmalloc memory hooks with a
userspace notification system for MMU events that occur down in the
kernel (basically: when memory is unmapped from a process). See
Roland's post for more details on his implementation.
It's passing all MPI tests that I can throw at it, so I think it's
time for others to try this stuff with Open MPI. I have a
proof-of-concept Mercurial branch here (I am keeping it in sync with
the SVN trunk):
http://bitbucket.org/jsquyres/ummunot/
I currently have the support implemented in a standalone OPAL memory
"ummunot" component. Further integration work is required before it
comes to the trunk, but it's good enough for testing and ensuring that
the concept actually works. Specifically, you must disable building
OMPI's ptmalloc2. Here's how I configure to build it:
./configure --enable-mca-no-build=memory-ptmalloc2 CPPFLAGS=-I/path/to/ummunot.h ...
You should be able to see the "ummunot" component in the output of
ompi_info when done.
Then try running any MPI test that you can think of (ensure that
mpi_leave_pinned is set to 1 so that this code path is actually
exercised).
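As a rough sketch of the check-and-run steps above, something like the
following should work. The test binary name (./my_mpi_test) and the
process count are placeholders of mine, not anything specific to the
branch; substitute your own test. The script skips itself if ompi_info
or the test binary is not available.

```shell
#!/bin/sh
# Sketch only: TEST_BIN and NP are hypothetical placeholders; point them
# at whatever MPI test you actually want to run.
TEST_BIN="${TEST_BIN:-./my_mpi_test}"
NP="${NP:-2}"

# MCA parameters can be passed on the mpirun command line; this forces
# the leave-pinned behavior on for the run.
LEAVE_PINNED_ARGS="--mca mpi_leave_pinned 1"

if command -v ompi_info >/dev/null 2>&1 && [ -x "$TEST_BIN" ]; then
    # The "ummunot" memory component should be listed if the build
    # picked it up.
    ompi_info | grep -i ummunot || echo "ummunot component not listed" >&2

    # LEAVE_PINNED_ARGS is intentionally unquoted so that it splits into
    # three separate arguments.
    mpirun $LEAVE_PINNED_ARGS -np "$NP" "$TEST_BIN"
else
    echo "skipping: ompi_info or $TEST_BIN not available" >&2
fi
```

The same parameter can also be set through the environment
(OMPI_MCA_mpi_leave_pinned=1) if that is more convenient for your
test harness.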
Please let me know your testing results. I'm assuming that Sun, IBM,
Mellanox, and Voltaire will be testing.
Thanks!
--
Jeff Squyres
jsquy...@cisco.com