Ralph Castain wrote:
I too am interested - I think we need to do something about the sm
backing file situation as larger core machines are slated to become
more prevalent shortly.
I think there is at least one piece of low-hanging fruit: get rid of a
lot of the page alignments. Especially as one goes to large core
counts, the O(n^2) number of local "connections" becomes important, and
each connection starts with three page-aligned allocations, each
allocation very tiny (and hence using only a tiny portion of the page
that is allocated to it). So, most of the allocated memory is never used.
Personally, I question the rationale for the page alignment in the first
place, but don't mind listening to anyone who wants to explain it to
me. Presumably, in a NUMA machine, localizing FIFOs to separate
physical memory improves performance. I get that basic premise. I just
question the reasoning beyond that.
The page alignment appears in ompi_fifo_init and ompi_cb_fifo_init. It
comes additionally from mca_mpool_sm_alloc. Four minor changes could
change alignment from page to cacheline size.
What happens when there isn't enough memory to support all this? Are
we smart enough to detect this situation? Does the sm subsystem
quietly shut down? Warn and shut down? Segfault?
I'm not exactly sure. I think it's a combination of three things:
*) some attempt to signal problems correctly
*) some willingness just to live with less shared memory (possibly leading to
performance degradation)
*) poorly tested in any case
I have two examples so far:
1. using a ramdisk, /tmp was set to 10MB. OMPI was run on a single
node, 2ppn, with btl=openib,sm,self. The program started, but
segfaulted on the first MPI_Send. No warnings were printed.
2. again with a ramdisk, /tmp was reportedly set to 16MB (unverified
- some uncertainty, could have been much larger). OMPI was run on
multiple nodes, 16ppn, with btl=openib,sm,self. The program ran to
completion without errors or warnings. I don't know the communication
pattern - could be no local comm was performed, though that sounds
doubtful.