Maybe a clarification of the SM BTL implementation is needed. Doesn't the SM BTL set a limit based on np, using the max allowable as a ceiling? If not, and all jobs are allowed to use up to the max allowable, I can see the reason for not wanting to raise it. That said, it seems to me that the memory usage of the SM BTL is a lot larger than it should be. Wasn't there some work done around June looking at why the SM BTL was allocating so much memory? Did anything come out of that?
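
To make the question concrete, here is the kind of np-based sizing I have
in mind. This is only a rough sketch; the function name, the 32M-per-peer
figure, and the min/max values are illustrative guesses, not taken from
the actual SM mpool code.

  /* Hypothetical rule: scale the shared segment with np, then clamp it
   * between a floor and the configurable ceiling (mpool_sm_max_size).
   * All names and constants here are illustrative only. */
  unsigned long long sm_segment_size(int np,
                                     unsigned long long per_peer,
                                     unsigned long long min_size,
                                     unsigned long long max_size)
  {
      unsigned long long size = (unsigned long long) np * per_peer;
      if (size < min_size) size = min_size;
      if (size > max_size) size = max_size;
      return size;
  }

  /* e.g. sm_segment_size(128, 32ULL<<20, 128ULL<<20, 2147483647ULL)
   * returns 2147483647 (2G-1), while np=8 would get only 256M. */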
--td

Markus Daene wrote:

Rolf,

I think it is not a good idea to increase the default value to 2G. You
have to keep in mind that not many people have a machine with 128 or
more cores on a single node. Most people will have nodes with 2, 4, or
maybe 8 cores, so it is not necessary to set this parameter to such a
high value. It may end up allocating all of this memory per node, and if
you have only 4 or 8G per node it will be imbalanced. For my 8-core
nodes I have even decreased the sm_max_size to 32M and I had no problems
with that. As far as I know (if not otherwise specified at runtime) this
parameter is global, so even if you run on your machine with 2 procs it
might allocate the 2G for the sm module.
I would recommend, as Richard suggests, setting the parameter for your
machine in
etc/openmpi-mca-params.conf
and not changing the default value.
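
For example, a line like the following in that file raises the ceiling to
the 2G-1 value from Rolf's test (adjust it to whatever fits your nodes):

  # etc/openmpi-mca-params.conf
  mpool_sm_max_size = 2147483647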

Markus


Rolf vandeVaart wrote:
We are running into a problem on one of our larger SMPs using the
latest Open MPI v1.2 branch.  We are trying to run a job with np=128
within a single node.  We are seeing the following error:

"SM failed to send message due to shortage of shared memory."

We then increased the allowable maximum size of the shared segment to
2 gigabytes minus 1, which is the maximum allowed for a 32-bit
application.  We used the mca parameter to increase it, as shown here:

-mca mpool_sm_max_size 2147483647
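
(For reference, the flag is simply passed on the mpirun command line,
something like the following, with ./a.out standing in for the
application binary:

  mpirun -np 128 -mca mpool_sm_max_size 2147483647 ./a.out
)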

This allowed the program to run to completion.  Therefore, we would
like to increase the default maximum from 512 Mbytes to 2G-1 bytes.
Does anyone have an objection to this change?  Soon we are going to
have larger CPU counts and would like to increase the odds that things
work "out of the box" on these large SMPs.

On a side note, I did a quick comparison of the shared memory needs of
the old Sun ClusterTools to Open MPI and came up with this table.

                                    Open MPI
 np   Sun ClusterTools 6       current   suggested
---------------------------------------------------
  2          20M                 128M       128M
  4          20M                 128M       128M
  8          22M                 256M       256M
 16          27M                 512M       512M
 32          48M                 512M         1G
 64         133M                 512M       2G-1
128         476M                 512M       2G-1
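
Reading off the suggested column, the sizing works out to roughly 32
Mbytes per process, clamped to a 128-Mbyte floor and the 2G-1 ceiling:
32 procs x 32M = 1G, and 64 procs x 32M = 2G, which is capped at 2G-1.
(That per-process figure is inferred from the table, not a statement
about the current code.)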
