Re: [OMPI devel] SM init failures

Eugene Loh Mon, 30 Mar 2009 13:56:00 -0400

George Bosilca wrote:

Then it looks like the safest solution is to use either ftruncate or the lseek method and then touch the first byte of all memory pages. Unfortunately, I see two problems with this. First, there is a clear performance hit on the startup time. And second, we will have to find a pretty smart way to do this or we will completely break the memory affinity stuff.
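
For readers following along, here is a minimal sketch of the "ftruncate, then touch the first byte of every page" idea. It is not the Open MPI code: the helper name map_and_touch and the temporary path are hypothetical, and error handling is kept to a minimum. The point is only that writing one byte per page forces the pages to be allocated at start-up rather than on first use later.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI function): create a backing file of
 * `size` bytes, map it shared, and write one byte into each page so that
 * every page is allocated now. Returns the number of pages touched, or -1
 * on failure. */
static long map_and_touch(size_t size)
{
    char path[] = "/tmp/sm_sketch_XXXXXX";  /* illustrative location only */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* file goes away on close */

    /* Set the file length; on many file systems this reserves no blocks. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Touch the first byte of each page. The process that runs this loop
     * is the one that first touches every page. */
    volatile char *p = map;
    long touched = 0;
    for (size_t off = 0; off < size; off += (size_t)page) {
        p[off] = 0;
        touched++;
    }

    munmap(map, size);
    return touched;
}
```

Calling map_and_touch(8 * page_size) should report 8 pages touched. Because a single process does all the touching here, a first-touch page placement policy would place every page near that process, which is presumably the memory affinity problem mentioned above.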


We're basically touching all the pages on start-up anyhow.

Let me explain.

The sm BTL needs to set up a shared/mmap file to accommodate what's needed at MPI_Init time and how much space you'll want for growing during the course of the run. We used to size this file "arbitrarily" (mpool_sm_per_peer_size and mpool_sm_[min|max]_size), which allocated shared memory excessively for small jobs but insufficiently (won't start up) for big jobs. As part of moving to the single-queue model, I tried to size the shared memory more reasonably -- at a minimum, so that jobs would start up. The current formula is to estimate how much memory will be needed at MPI_Init time and set the file for that size. We can argue about whether or not headroom should be included, but currently (1.3.2) none is really provided.

So, the shared area is basically filled up during MPI_Init(). For large np, most of that space is eager fragments. An eager fragment in the shared area includes a pointer back to the free list that manages that fragment. Those pointers have to be initialized. Since eager fragments by default are 4K, it turns out that basically every page is touched during MPI_Init(). (Fine print: not true of the max fragments, but there aren't very many of those.)
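
To make the page-touching argument concrete, here is a rough sketch in C. The struct and function names (free_list, frag_hdr, init_frags) are made up for illustration and the real fragment header carries more than one field; the sketch only assumes that each fragment begins with a pointer back to its free list, as described above. It carves an area into fixed-size fragments, writes that pointer, and counts how many distinct pages were written.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real structures. */
struct free_list { int unused; };
struct frag_hdr  { struct free_list *owner; };  /* pointer back to the list */

/* Carve `area` into fragments of `frag_size` bytes, set each fragment's
 * back-pointer, and return the number of distinct pages written.
 * Assumes `area` is page-aligned and `frag_size` is a multiple of the
 * pointer alignment. */
static size_t init_frags(char *area, size_t area_size, size_t frag_size,
                         struct free_list *fl, size_t page_size)
{
    size_t pages_touched = 0;
    size_t last_page = (size_t)-1;              /* no page written yet */

    for (size_t off = 0; off + frag_size <= area_size; off += frag_size) {
        struct frag_hdr *hdr = (struct frag_hdr *)(area + off);
        hdr->owner = fl;                        /* this store touches a page */

        size_t page = off / page_size;
        if (page != last_page) {                /* offsets only increase */
            pages_touched++;
            last_page = page;
        }
    }
    return pages_touched;
}
```

With 4K fragments on 4K pages, a 16-page area yields 16 touched pages, so initialization alone touches every page. With a larger fragment size (32K is used here purely as an example), the same area yields only 2 touched pages, which is the fine-print case for the max fragments.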
