We are running into a problem on one of our larger SMPs using the
latest Open MPI v1.2 branch.  We are trying to run a job with np=128
within a single node, and we are seeing the following error:

"SM failed to send message due to shortage of shared memory."

We then increased the maximum allowable size of the shared memory
segment to 2 Gbytes - 1 byte (2147483647 bytes), which is the maximum
allowed for a 32-bit application.  We used the following MCA parameter
to increase it:

-mca mpool_sm_max_size 2147483647
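
For reference, a complete launch line would then look something like
the following (the application name is just a placeholder):

   mpirun -np 128 -mca mpool_sm_max_size 2147483647 ./a.out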

This allowed the program to run to completion.  Therefore, we would
like to increase the default maximum from 512 Mbytes to 2 Gbytes - 1 byte.
Does anyone have an objection to this change?  Soon we are going to
have larger CPU counts and would like to increase the odds that things
work "out of the box" on these large SMPs.

On a side note, I did a quick comparison of the shared memory needs of
the old Sun ClusterTools with those of Open MPI and came up with this
table.

                                   Open MPI
 np    Sun ClusterTools 6    current    suggested
--------------------------------------------------
  2          20M              128M        128M
  4          20M              128M        128M
  8          22M              256M        256M
 16          27M              512M        512M
 32          48M              512M          1G
 64         133M              512M        2G-1
128         476M              512M        2G-1
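
As an aside, for anyone who wants to sanity-check a raised limit at
high process counts, a minimal all-to-all smoke test along the
following lines exercises the on-node shared memory path.  This sketch
is purely illustrative and is not the application from this report; the
buffer size is an arbitrary assumption.

   /* Minimal on-node smoke test: repeatedly run MPI_Alltoall so every
    * rank exchanges data with every other rank over shared memory.
    * Illustrative only; the buffer size is an arbitrary assumption. */
   #include <mpi.h>
   #include <stdio.h>
   #include <stdlib.h>

   int main(int argc, char **argv)
   {
       const int count = 4096;   /* ints sent to each peer (arbitrary) */
       int rank, size, i, iter;
       int *sendbuf, *recvbuf;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);

       sendbuf = (int *)malloc((size_t)size * count * sizeof(int));
       recvbuf = (int *)malloc((size_t)size * count * sizeof(int));
       for (i = 0; i < size * count; i++)
           sendbuf[i] = rank;

       /* A few iterations so the shared memory fragments get reused. */
       for (iter = 0; iter < 10; iter++)
           MPI_Alltoall(sendbuf, count, MPI_INT,
                        recvbuf, count, MPI_INT, MPI_COMM_WORLD);

       if (rank == 0)
           printf("alltoall completed with np=%d\n", size);

       free(sendbuf);
       free(recvbuf);
       MPI_Finalize();
       return 0;
   }

Run with a launch line like the one above at np=128, it should either
complete or hit the shortage error quickly.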
