Progress continues on this issue... (Nathan and I are actually sitting 
together in a room this week and are continuing to work on this)

We just put up a new FAQ item about this issue:

     http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
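For reference, the amount of registerable memory under the mlx4 driver is governed by the mlx4_core module parameters log_num_mtt and log_mtts_per_seg (roughly: max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size).  Here's a minimal sketch of that arithmetic -- the parameter values below are illustrative examples, not your system's actual settings; on a live system you would read them from /sys/module/mlx4_core/parameters/:

```shell
# Sketch: estimate max registerable memory from mlx4_core parameters.
# NOTE: the values below are examples only -- on a real system, read them from
#   /sys/module/mlx4_core/parameters/log_num_mtt
#   /sys/module/mlx4_core/parameters/log_mtts_per_seg
log_num_mtt=20
log_mtts_per_seg=3
page_size=4096
max_reg_mb=$(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size / 1024 / 1024 ))
echo "max registerable memory: ${max_reg_mb} MB"
# prints: max registerable memory: 32768 MB
```

Raising log_num_mtt (and rebooting / reloading the module) increases the limit accordingly; see the FAQ item above for the details.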


On Jul 13, 2012, at 7:02 PM, Jeff Squyres wrote:

> On Jul 12, 2012, at 12:04 PM, Paul Kapinos wrote:
> 
>> a long time ago, I reported about an error in Open MPI:
>> http://www.open-mpi.org/community/lists/users/2012/02/18565.php
>> 
>> Well, in 1.6 the behaviour has changed: the test case doesn't hang forever 
>> and block an InfiniBand interface, but seems to run through, and now this 
>> error message is printed:
>> --------------------------------------------------------------------------
>> The OpenFabrics (openib) BTL failed to register memory in the driver.
>> Please check /var/log/messages or dmesg for driver specific failure
>> reason.
> 
> We updated our mechanism, but accidentally left this warning message in (it 
> has since been removed).
> 
> Here's what's happening: Mellanox changed the default amount of registered 
> memory that is available -- they dramatically reduced it.  We haven't gotten 
> a good answer yet as to *why* this change was made.
> 
> You can change some kernel-level parameters to increase it again, and then 
> OMPI should work fine.  Here's an IBM article about it:
> 
> http://www.ibm.com/developerworks/wikis/display/hpccentral/Using+RDMA+with+pagepool+larger+than+8GB
> 
> And here's some comments that Mellanox made on a ticket about this issue 
> (including some corrections/clarifications to that IBM article):
> 
>    https://svn.open-mpi.org/trac/ompi/ticket/3134#comment:12
> 
> -----
> 
> Basically, what's happening is that OMPI is behaving badly when it runs out 
> of registered memory.  We have tried two things to make this better (i.e., 
> still perform *correctly*, albeit at a lower performance level), and we're 
> not sure yet whether they work properly.
> 
> 1. When OMPI tries to register more memory for an RDMA message transaction 
> and fails, it falls back to send-receive (where we already have 
> pre-registered memory available to use).  However, this can still end up 
> hanging because of OMPI's "lazy connection" scheme -- where OMPI doesn't open 
> IB connections between MPI processes until the first time each pair of 
> processes communicate.  So if OMPI runs out of registered memory and then 
> tries to open a new IB connection to a new peer -- kaboom.
> 
> 2. When OMPI starts up, it guesstimates how much memory can be registered and 
> divides it equally between all the OMPI processes *in that job* on the same 
> node.  We have had mixed reports of this working or not.  I made a 1.6.x 
> tarball with this fix in it; please give it a whirl (with the default low 
> registered memory kernel parameters, to ensure that you can trigger the "out 
> of registered memory" issue):
> 
>    http://www.open-mpi.org/~jsquyres/unofficial/
>    Use the openmpi-1.6.1ticket3131r26612M.tar.bz2 tarball
> 
> #2 is the latest attempt to fix it, but we haven't had good testing of it.  
> Could you give it a whirl and let us know what happens?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


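FWIW, the per-node division described in #2 above amounts to something like the following (a sketch with made-up numbers, not OMPI's actual code):

```shell
# Sketch of the #2 scheme: divide the estimated registerable memory evenly
# among the job's processes on one node.  All values are illustrative.
est_reg_mem_mb=32768   # guesstimated registerable memory on this node
local_procs=16         # OMPI processes of this job on this node
per_proc_mb=$(( est_reg_mem_mb / local_procs ))
echo "each local process may register up to ${per_proc_mb} MB"
# prints: each local process may register up to 2048 MB
```
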
-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

