Re: [OMPI users] OpenMPI 1.6.3 and Memory Issues
Hi again.

I am using /etc/modprobe.d/mofed.conf, otherwise I get:

WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/

But I am still getting the memory errors after making the changes and rebooting:

$ cat /etc/modprobe.d/mofed.conf
options mlx4_core log_num_mtt=24
options mlx4_core log_mtts_per_seg=1

$ mpirun hello
--
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

On 11/29/2012 04:39 PM, Yevgeny Kliteynik wrote:
> You can also set these parameters in /etc/modprobe.conf:
>
> options mlx4_core log_num_mtt=24 log_mtts_per_seg=1
>
> -- YK
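One possible reading of the result above: the Open MPI FAQ entry linked in this thread gives the registerable memory for mlx4 as (2^log_num_mtt) * (2^log_mtts_per_seg) * page size. The sketch below evaluates that for the posted settings and compares it with the 258470 MiB reported by the warning. It assumes a 4 KiB page size, which is typical for x86_64 but is not stated in the thread, and the function names are illustrative only. Whether this shortfall is what actually triggers the warning on this machine is an assumption, not something the thread confirms.

```python
MIB = 1024 * 1024


def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Registerable memory per the FAQ formula (page_size is an assumption)."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def smallest_log_num_mtt(target_bytes, log_mtts_per_seg, page_size=4096):
    """Smallest log_num_mtt whose registerable memory covers target_bytes."""
    n = 0
    while max_registerable_bytes(n, log_mtts_per_seg, page_size) < target_bytes:
        n += 1
    return n


total_mib = 258470  # "Total memory" from the warning in this thread

# Settings posted above: log_num_mtt=24, log_mtts_per_seg=1
reg_mib = max_registerable_bytes(24, 1) // MIB
print(f"registerable: {reg_mib} MiB, total: {total_mib} MiB")
# registerable: 131072 MiB, total: 258470 MiB

# Values that would cover all of RAM, and twice RAM, with log_mtts_per_seg=1
print(smallest_log_num_mtt(total_mib * MIB, 1))      # 25
print(smallest_log_num_mtt(2 * total_mib * MIB, 1))  # 26
```

Under these assumptions the posted settings allow about 128 GiB, which is still less than the roughly 252 GiB installed, so a larger log_num_mtt may be needed.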
Re: [OMPI users] OpenMPI 1.6.3 and Memory Issues
You can also set these parameters in /etc/modprobe.conf:

options mlx4_core log_num_mtt=24 log_mtts_per_seg=1

-- YK

On 11/30/2012 2:12 AM, Yevgeny Kliteynik wrote:
> On 11/30/2012 12:47 AM, Joseph Farran wrote:
>> I'll assume: /etc/modprobe.d/mlx4_en.conf
>
> Add these to /etc/modprobe.d/mofed.conf:
>
> options mlx4_core log_num_mtt=24
> options mlx4_core log_mtts_per_seg=1
>
> And then restart the driver.
> You need to do it on all the machines.
>
> -- YK
>
>> On 11/29/2012 02:34 PM, Joseph Farran wrote:
>>> Where do I change those Mellanox settings?
>>>
>>> On 11/29/2012 02:23 PM, Jeff Squyres wrote:
>>>> See http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem.
Re: [OMPI users] OpenMPI 1.6.3 and Memory Issues
On 11/30/2012 12:47 AM, Joseph Farran wrote:
> I'll assume: /etc/modprobe.d/mlx4_en.conf

Add these to /etc/modprobe.d/mofed.conf:

options mlx4_core log_num_mtt=24
options mlx4_core log_mtts_per_seg=1

And then restart the driver.
You need to do it on all the machines.

-- YK

> On 11/29/2012 02:34 PM, Joseph Farran wrote:
>> Where do I change those Mellanox settings?
>>
>> On 11/29/2012 02:23 PM, Jeff Squyres wrote:
>>> See http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem.
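After restarting the driver, it can help to confirm that the module actually picked up the new values on each machine. A minimal sketch, assuming mlx4_core exposes its parameters under /sys/module/mlx4_core/parameters (the usual sysfs location for module parameters); the helper name read_param is hypothetical:

```python
from pathlib import Path

# Assumed sysfs location of the loaded mlx4_core module's parameters.
PARAM_DIR = Path("/sys/module/mlx4_core/parameters")


def read_param(name, param_dir=PARAM_DIR):
    """Return the integer value of a module parameter, or None if unreadable."""
    try:
        return int((Path(param_dir) / name).read_text().strip())
    except (OSError, ValueError):
        # Module not loaded, parameter missing, or value not an integer.
        return None


for name in ("log_num_mtt", "log_mtts_per_seg"):
    print(name, "=", read_param(name))
```

If either value prints as None or differs from what is in the modprobe.d file, the driver was probably not reloaded with the new options on that node.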
Re: [OMPI users] OpenMPI 1.6.3 and Memory Issues
I'll assume: /etc/modprobe.d/mlx4_en.conf

On 11/29/2012 02:34 PM, Joseph Farran wrote:
> Where do I change those Mellanox settings?
>
> On 11/29/2012 02:23 PM, Jeff Squyres wrote:
>> See http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem.
Re: [OMPI users] OpenMPI 1.6.3 and Memory Issues
Where do I change those Mellanox settings?

On 11/29/2012 02:23 PM, Jeff Squyres wrote:
> See http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem.
Re: [OMPI users] OpenMPI 1.6.3 and Memory Issues
See http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem.

On Nov 29, 2012, at 5:21 PM, Joseph Farran wrote:

> Hi All.
>
> In compiling a simple Hello world with OpenMPI 1.6.3 and mpirun the hello
> program, I am getting:
>
> $ ulimit -l unlimited
> $ mpirun -np 2 hello
> --
> WARNING: It appears that your OpenFabrics subsystem is configured to only
> allow registering part of your physical memory. This can cause MPI jobs to
> run with erratic performance, hang, and/or crash.
>
> This may be caused by your OpenFabrics vendor limiting the amount of
> physical memory that can be registered. You should investigate the
> relevant Linux kernel module parameters that control how much physical
> memory can be registered, and increase them to allow registering all
> physical memory on your machine.
>
> See this Open MPI FAQ item for more information on these Linux kernel module
> parameters:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
> Local host: hpc
> Registerable memory: 4096 MiB
> Total memory: 258470 MiB
>
> Your MPI job will continue, but may be behave poorly and/or hang.
> --
> Hello World. I am the Master Node (hpc) with Rank 0.
> Hello World. I am compute Node (hpc) with Rank 1
> [hpc:08261] 1 more process has sent help message help-mpi-btl-openib.txt /
> reg mem limit low
> [hpc:08261] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help
> / error messages
>
> I have my limits setup with:
>
> cat /etc/security/limits.conf
> * soft memlock unlimited
> * hard memlock unlimited
>
> What am I missing?
>
> OS is CentOS 6.3.
>
> Joseph

--
Jeff Squyres
jsquy...@cisco.com
[OMPI users] OpenMPI 1.6.3 and Memory Issues
Hi All.

After compiling a simple Hello World with OpenMPI 1.6.3 and running the hello program with mpirun, I am getting:

$ ulimit -l unlimited
$ mpirun -np 2 hello
--
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host: hpc
Registerable memory: 4096 MiB
Total memory: 258470 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--
Hello World. I am the Master Node (hpc) with Rank 0.
Hello World. I am compute Node (hpc) with Rank 1
[hpc:08261] 1 more process has sent help message help-mpi-btl-openib.txt / reg mem limit low
[hpc:08261] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I have my limits set up with:

$ cat /etc/security/limits.conf
* soft memlock unlimited
* hard memlock unlimited

What am I missing?

OS is CentOS 6.3.

Joseph