SHORT VERSION
=============

OpenFabrics vendors (Sun, IBM, Mellanox, Voltaire): please try Roland Dreier's "ummunot" kernel module with my OMPI Mercurial branch on your systems (relevant URLs and instructions below). It is intended to replace the not-bulletproof ptmalloc2 hooks used for mpi_leave_pinned behavior. A change this big really requires testing by everyone. Please let me know your testing results.

MORE DETAILS
============

Roland Dreier from Cisco sent his "ummunot" kernel module upstream to the Linux kernel the other day; initial reviews have been favorable. Here's the latest version of his module, incorporating a few early reviews:

    http://lkml.org/lkml/2009/7/24/308

It replaces the not-guaranteeable ptmalloc memory hooks with a system that notifies userspace when MMU events occur down in the kernel (basically: when memory is unmapped from a process). See Roland's post for more details on his implementation.

It's passing all MPI tests that I can throw at it, so I think it's time for others to try this stuff with Open MPI. I have a proof-of-concept Mercurial branch here (I am keeping it up to date with the SVN trunk):

    http://bitbucket.org/jsquyres/ummunot/

I currently have the support implemented in a standalone OPAL memory "ummunot" component. Further integration work is required before it comes to the trunk, but it's good enough for testing and ensuring that the concept actually works. One consequence of the missing integration: you must disable building OMPI's ptmalloc2. Here's how I configure to build it:

    ./configure --enable-mca-no-build=memory-ptmalloc2 CPPFLAGS=-I/path/to/ummunot.h ...

You should be able to see the "ummunot" component in the output of ompi_info when done.
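
One way to check for the component is to filter the ompi_info output for its "MCA memory" line. The sketch below assumes the usual "MCA memory: <name> (...)" line format; the helper name has_memory_component is made up for illustration and is not part of Open MPI:

```shell
# Hypothetical helper (not part of Open MPI): reads ompi_info-style output
# on stdin and succeeds if the named memory component is listed.
has_memory_component() {
    grep -Eq "MCA memory: +$1( |\$)"
}

# Usage, assuming the ompi_info from the freshly built tree is first in PATH:
#   ompi_info | has_memory_component ummunot && echo "ummunot component found"
```

If ptmalloc2 was excluded as described above, the same check for "ptmalloc2" should fail.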

Then try running any MPI test that you can think of (ensure that mpi_leave_pinned==1 to guarantee testing this stuff).
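
For example, the parameter can be set through the environment or on the mpirun command line. This is only a sketch: the test program name and process count are placeholders, not anything from the branch:

```shell
# Set the MCA parameter via the environment so that every later mpirun
# in this shell runs with mpi_leave_pinned enabled.
export OMPI_MCA_mpi_leave_pinned=1

# Equivalent per-run form (./my_mpi_test and -np 2 are placeholders):
#   mpirun --mca mpi_leave_pinned 1 -np 2 ./my_mpi_test
```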

Please let me know your testing results. I'm assuming that Sun, IBM, Mellanox, and Voltaire will be testing.

Thanks!

--
Jeff Squyres
jsquy...@cisco.com
