Progress continues on this issue... (Nathan and I are actually sitting
together in a room this week, continuing to work on it)
We just put up a new FAQ item about this issue:
http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
On Jul 13, 2012, at 7:02 PM, Jeff Squyres wrote:
> On Jul 12, 2012, at 12:04 PM, Paul Kapinos wrote:
>
>> A long time ago, I reported an error in Open MPI:
>> http://www.open-mpi.org/community/lists/users/2012/02/18565.php
>>
>> Well, in 1.6 the behaviour has changed: the test case doesn't hang forever
>> and block an InfiniBand interface, but seems to run through, and now this
>> error message is printed:
>> --------------------------------------------------------------------------
>> The OpenFabrics (openib) BTL failed to register memory in the driver.
>> Please check /var/log/messages or dmesg for driver specific failure
>> reason.
>
> We updated our mechanism, but accidentally left this warning message in (it
> has since been removed).
>
> Here's what's happening: Mellanox changed the default amount of registered
> memory that is available -- they dramatically reduced it. We haven't gotten
> a good answer yet as to *why* this change was made.
>
> You can change some kernel-level parameters to increase it again, and then
> OMPI should work fine. Here's an IBM article about it:
>
> http://www.ibm.com/developerworks/wikis/display/hpccentral/Using+RDMA+with+pagepool+larger+than+8GB
>
> And here are some comments that Mellanox made on a ticket about this issue
> (including some corrections/clarifications to that IBM article):
>
> https://svn.open-mpi.org/trac/ompi/ticket/3134#comment:12
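> 
> For reference, here's a rough sketch of what raising the limit looks like on
> an mlx4-based Mellanox HCA (the exact parameter names and defaults vary by
> OFED/driver version, so treat this as an illustration, not a recipe):
> 
>   # /etc/modprobe.d/mlx4_core.conf
>   # Registerable memory works out to roughly
>   #   (2^log_num_mtt) * (2^log_mtts_per_seg) * page_size,
>   # so with 4 KB pages the values below allow about 2^20 * 2^3 * 4096 bytes = 32 GB.
>   options mlx4_core log_num_mtt=20 log_mtts_per_seg=3
> 
> The mlx4_core module has to be reloaded (or the node rebooted) for the new
> values to take effect.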
>
> -----
>
> Basically, what's happening is that OMPI is behaving badly when it runs out
> of registered memory. We have tried two things to make this better (i.e.,
> still perform *correctly*, albeit at a lower performance level), and we're
> not sure yet whether they work properly.
>
> 1. When OMPI tries to register more memory for an RDMA message transaction
> and fails, it falls back to send-receive (where we already have
> pre-registered memory available to use). However, this can still end up
> hanging because of OMPI's "lazy connection" scheme -- where OMPI doesn't open
> IB connections between MPI processes until the first time each pair of
> processes communicates. So if OMPI runs out of registered memory and then
> tries to open a new IB connection to a new peer -- kaboom.
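> 
> (A possible stopgap for the lazy-connection part of the hang is to force all
> connections to be established during MPI_INIT. I believe the MCA parameter
> for that is mpi_preconnect_mpi -- please double-check with ompi_info, since
> I'm writing this from memory. Something like:
> 
>   # force all IB connections to be set up during MPI_INIT;
>   # -np 64 and ./your_app are just placeholders
>   mpirun --mca mpi_preconnect_mpi 1 -np 64 ./your_app
> 
> This doesn't fix the underlying registration failure, of course; it only
> avoids opening new IB connections after registered memory has already run
> out.)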
>
> 2. When OMPI starts up, it guesstimates how much memory can be registered and
> divides it equally among all the OMPI processes *in that job* on the same
> node. We have had mixed reports of whether this works. I made a 1.6.x tarball
> with this fix in it; please give it a whirl (with the default low
> registered-memory kernel parameters, so that you can still trigger the "out
> of registered memory" issue):
>
> http://www.open-mpi.org/~jsquyres/unofficial/
> Use the openmpi-1.6.1ticket3131r26612M.tar.bz2 tarball
>
> #2 is the latest attempt at a fix, but it hasn't had much testing yet.
> Could you give it a whirl and let us know what happens?
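> 
> Something along these lines should work for building and testing it (adjust
> the prefix, process count, and test binary to your setup; the name of the
> extracted directory may differ slightly):
> 
>   wget http://www.open-mpi.org/~jsquyres/unofficial/openmpi-1.6.1ticket3131r26612M.tar.bz2
>   tar xjf openmpi-1.6.1ticket3131r26612M.tar.bz2
>   cd openmpi-1.6.1ticket3131r26612M
>   ./configure --prefix=$HOME/ompi-ticket3131
>   make -j4 install
>   $HOME/ompi-ticket3131/bin/mpirun -np 64 ./your_test_case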
>
--
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/