Rolf,

I don't think it is a good idea to increase the default value to 2G. Keep
in mind that not many people have a machine with 128 or more cores on a
single node. Most people will have nodes with 2, 4, or maybe 8 cores, so
there is no need to set this parameter to such a high value. It may end up
allocating all of this memory per node, and if you have only 4 or 8G per
node the result will be
imbalanced. For my 8-core nodes I have even decreased sm_max_size to 32M
and have had no problems with that. As far as I know, this parameter is
global unless it is overridden at runtime, so even if you run on your
machine with only 2 procs it might still allocate the 2G for the MPI sm
module.
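
If you do want the larger value for a particular job, a runtime override on
the mpirun command line should be enough (this is the same parameter name
and value Rolf used below; the np count and program name are only
placeholders):

  mpirun -np 128 -mca mpool_sm_max_size 2147483647 ./my_app
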
Like Richard, I would recommend setting the parameter for your machine in
etc/openmpi-mca-params.conf
rather than changing the default value.
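
For example, a single line in etc/openmpi-mca-params.conf along these lines
should do it (the value is just the 2G-1 figure from your test run, not a
recommendation):

  mpool_sm_max_size = 2147483647

That keeps the larger segment size local to your installation without
touching the compiled-in default.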

Markus


Rolf vandeVaart wrote:
> We are running into a problem when running on one of our larger SMPs
> using the latest Open MPI v1.2 branch.  We are trying to run a job
> with np=128 within a single node.  We are seeing the following error:
>
> "SM failed to send message due to shortage of shared memory."
>
> We then increased the allowable maximum size of the shared segment to
> 2 gigabytes minus 1, which is the maximum allowed for a 32-bit
> application.  We used the MCA parameter to increase it as shown here:
>
> -mca mpool_sm_max_size 2147483647
>
> This allowed the program to run to completion.  Therefore, we would
> like to increase the default maximum from 512 Mbytes to 2G-1 bytes.
> Does anyone have an objection to this change?  Soon we are going to
> have larger CPU counts and would like to increase the odds that things
> work "out of the box" on these large SMPs.
>
> On a side note, I did a quick comparison of the shared memory needs of
> the old Sun ClusterTools to Open MPI and came up with this table.
>  
>                                          Open MPI
> np      Sun ClusterTools 6        current   suggested
> -----------------------------------------------------------------
>   2         20M                      128M        128M
>   4         20M                      128M        128M
>   8         22M                      256M        256M
>  16         27M                      512M        512M
>  32         48M                      512M          1G
>  64        133M                      512M        2G-1
> 128        476M                      512M        2G-1
>
