Folks, FWIW, I observe a similar behaviour on my system.
IMHO, the root cause is that OFED has been upgraded from a (quite) older version to the latest 3.12 version.

Here is the relevant part of the code (calculate_max_reg() in btl_openib.c from master):

static uint64_t calculate_max_reg (void)
{
    if (0 == stat("/sys/module/mlx4_core/parameters/log_num_mtt", &statinfo)) {
        /* ... */
    } else if (0 == stat("/sys/module/ib_mthca/parameters/num_mtt", &statinfo)) {
        mtts_per_seg = 1 << read_module_param("/sys/module/ib_mthca/parameters/log_mtts_per_seg", 1);
        num_mtt = read_module_param("/sys/module/ib_mthca/parameters/num_mtt", 1 << 20);
        reserved_mtt = read_module_param("/sys/module/ib_mthca/parameters/fmr_reserved_mtts", 0);
        max_reg = (num_mtt - reserved_mtt) * opal_getpagesize () * mtts_per_seg;
    } else if ((0 == stat("/sys/module/mlx5_core", &statinfo)) ||
               (0 == stat("/sys/module/mlx4_core/parameters", &statinfo)) ||
               (0 == stat("/sys/module/ib_mthca/parameters", &statinfo))) {
        /* mlx5 means that we have ofed 2.0 and it can always register
           2xmem_total for any mlx hca */
        max_reg = 2 * mem_total;
    } else {
        /* ... */
    }

    /* Print a warning if we can't register more than 75% of physical
       memory. Abort if the abort_not_enough_reg_mem MCA param was set. */
    if (max_reg < mem_total * 3 / 4) {
        /* ... */
    }

    return (max_reg * 7) >> 3;
}

With OFED 3.12, the /sys/module/mlx4_core/parameters/log_num_mtt pseudo-file does *not* exist any more, but /sys/module/ib_mthca/parameters/num_mtt does, so the second path is taken and mtts_per_seg is read from /sys/module/ib_mthca/parameters/log_mtts_per_seg.

I noted that log_mtts_per_seg is also a parameter of mlx4_core (/sys/module/mlx4_core/parameters/log_mtts_per_seg). The value is 3 in ib_mthca (and leads to a warning) but 5 in mlx4_core (big enough, and it does not lead to a warning if this value is read).

I had no time to read the latest OFED doc, so I cannot answer:
- should log_mtts_per_seg be read from mlx4_core instead?
- is the warning a false positive?

My only point is that this warning *might* be a false positive, and the root cause *might* be that the calculate_max_reg() logic is wrong with the latest OFED stack.

Could the Mellanox folks comment on this?
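BTW, for anyone who wants to check which branch their system takes, here is a minimal standalone sketch of the same detection logic. To be clear, this is *not* the Open MPI source: read_module_param() is re-implemented here just for illustration, the elided mlx4 branch only reports that it fired, and mem_total is passed on the command line (in MiB) instead of being probed.

/* check_max_reg.c -- standalone sketch (NOT the Open MPI source) that
 * mimics the sysfs detection logic quoted above, so you can see which
 * branch a given system takes and what limit that branch would compute.
 * Build: gcc -o check_max_reg check_max_reg.c
 * Usage: ./check_max_reg <mem_total in MiB>
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Read an integer module parameter from sysfs, with a default fallback. */
static unsigned long long read_module_param(const char *path,
                                            unsigned long long dflt)
{
    unsigned long long value = dflt;
    FILE *fp = fopen(path, "r");
    if (fp != NULL) {
        if (fscanf(fp, "%llu", &value) != 1) {
            value = dflt;
        }
        fclose(fp);
    }
    return value;
}

int main(int argc, char **argv)
{
    struct stat statinfo;
    unsigned long long page_size = (unsigned long long) sysconf(_SC_PAGESIZE);
    unsigned long long mem_total =
        (argc > 1 ? strtoull(argv[1], NULL, 0) : 0ULL) << 20; /* MiB -> bytes */
    unsigned long long max_reg = 0;

    if (0 == stat("/sys/module/mlx4_core/parameters/log_num_mtt", &statinfo)) {
        /* branch body was elided in the quoted code; just report it fired */
        printf("branch 1: mlx4_core log_num_mtt exists (pre-3.12 OFED)\n");
    } else if (0 == stat("/sys/module/ib_mthca/parameters/num_mtt", &statinfo)) {
        unsigned long long mtts_per_seg = 1ULL <<
            read_module_param("/sys/module/ib_mthca/parameters/log_mtts_per_seg", 1);
        unsigned long long num_mtt =
            read_module_param("/sys/module/ib_mthca/parameters/num_mtt", 1ULL << 20);
        unsigned long long reserved_mtt =
            read_module_param("/sys/module/ib_mthca/parameters/fmr_reserved_mtts", 0);
        max_reg = (num_mtt - reserved_mtt) * page_size * mtts_per_seg;
        printf("branch 2: ib_mthca num_mtt exists -> max_reg = %llu MiB\n",
               max_reg >> 20);
    } else if (0 == stat("/sys/module/mlx5_core", &statinfo) ||
               0 == stat("/sys/module/mlx4_core/parameters", &statinfo) ||
               0 == stat("/sys/module/ib_mthca/parameters", &statinfo)) {
        max_reg = 2 * mem_total;
        printf("branch 3: OFED 2.0 style -> max_reg = 2 x mem_total\n");
    } else {
        printf("branch 4: no known module found\n");
    }

    if (mem_total > 0 && max_reg < mem_total * 3 / 4) {
        printf("-> max_reg is below 75%% of mem_total: the warning fires\n");
    }
    return 0;
}

If branch 2 fires on a system whose HCA is actually driven by mlx4_core, that would support the theory above that the parameters are being read from the wrong module.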
Cheers,

Gilles

On 2014/12/09 3:18, Götz Waschk wrote:
> Hi,
>
> here's another test, with Open MPI 1.8.3. With 1.8.1, 32GB was detected;
> now it is just 16:
>
> % mpirun -np 2 /usr/lib64/openmpi-intel/bin/mpitests-osu_get_bw
> --------------------------------------------------------------------------
> WARNING: It appears that your OpenFabrics subsystem is configured to only
> allow registering part of your physical memory. This can cause MPI jobs to
> run with erratic performance, hang, and/or crash.
>
> This may be caused by your OpenFabrics vendor limiting the amount of
> physical memory that can be registered. You should investigate the
> relevant Linux kernel module parameters that control how much physical
> memory can be registered, and increase them to allow registering all
> physical memory on your machine.
>
> See this Open MPI FAQ item for more information on these Linux kernel
> module parameters:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
>   Local host:            pax95
>   Registerable memory:   16384 MiB
>   Total memory:          49106 MiB
>
> Your MPI job will continue, but may behave poorly and/or hang.
> --------------------------------------------------------------------------
> # OSU MPI_Get Bandwidth Test v4.3
> # Window creation: MPI_Win_allocate
> # Synchronization: MPI_Win_flush
> # Size      Bandwidth (MB/s)
> 1           28.56
> 2           58.74
>
> So it wasn't fixed for RHEL 6.6.
>
> Regards, Götz
>
> On Mon, Dec 8, 2014 at 4:00 PM, Götz Waschk <goetz.was...@gmail.com> wrote:
>> Hi,
>>
>> I had tested 1.8.4rc1 and it wasn't fixed. I can try again though;
>> maybe I had made an error.
>>
>> Regards, Götz Waschk
>>
>> On Mon, Dec 8, 2014 at 3:17 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>> Hi,
>>>
>>> This should be fixed in OMPI 1.8.3. Is it possible for you to give
>>> 1.8.3 a shot?
>>>
>>> Best,
>>>
>>> Josh
>>>
>>> On Mon, Dec 8, 2014 at 8:43 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>>> Dear Open MPI experts,
>>>>
>>>> I have updated my little cluster from Scientific Linux 6.5 to 6.6;
>>>> this included extensive changes in the InfiniBand drivers and a newer
>>>> Open MPI version (1.8.1). Now I'm getting this message on all nodes
>>>> with more than 32 GB of RAM:
>>>>
>>>> WARNING: It appears that your OpenFabrics subsystem is configured to
>>>> only allow registering part of your physical memory. This can cause
>>>> MPI jobs to run with erratic performance, hang, and/or crash.
>>>>
>>>> This may be caused by your OpenFabrics vendor limiting the amount of
>>>> physical memory that can be registered. You should investigate the
>>>> relevant Linux kernel module parameters that control how much physical
>>>> memory can be registered, and increase them to allow registering all
>>>> physical memory on your machine.
>>>>
>>>> See this Open MPI FAQ item for more information on these Linux kernel
>>>> module parameters:
>>>>
>>>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>>>
>>>>   Local host:            pax98
>>>>   Registerable memory:   32768 MiB
>>>>   Total memory:          49106 MiB
>>>>
>>>> Your MPI job will continue, but may behave poorly and/or hang.
>>>>
>>>> The issue is similar to the one described in a previous thread about
>>>> Ubuntu nodes:
>>>> http://www.open-mpi.org/community/lists/users/2014/08/25090.php
>>>> But the InfiniBand driver is different: the values log_num_mtt and
>>>> log_mtts_per_seg both still exist, but they cannot be changed and have
>>>> the same values on all configurations:
>>>>
>>>> [pax52] /root # cat /sys/module/mlx4_core/parameters/log_num_mtt
>>>> 0
>>>> [pax52] /root # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
>>>> 3
>>>>
>>>> The kernel changelog says that Red Hat has included this commit:
>>>> "mlx4: Scale size of MTT table with system RAM" (Doug Ledford),
>>>> so it should all be fine and the buffers should scale automatically.
>>>> However, as far as I can see, the wrong value calculated by
>>>> calculate_max_reg() is still used in the code, so I think I cannot
>>>> simply ignore the warning. Also, a user has reported a problem with a
>>>> job, but I cannot confirm that this is the cause.
>>>>
>>>> My workaround was to simply load the mlx5_core kernel module, as this
>>>> is used by calculate_max_reg() to detect OFED 2.0.
>>>>
>>>> Regards, Götz Waschk
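PS: to make the 75% test concrete, here is a back-of-the-envelope version of the ib_mthca arithmetic. The inputs are assumptions, not values read from Götz's nodes: num_mtt = 1 << 20 and reserved_mtt = 0 are the defaults from the code quoted above, and the 4 KiB page size and the 49106 MiB total are taken from the warning output.

/* mtt_math.c -- back-of-the-envelope version of the max_reg arithmetic
 * from the ib_mthca branch quoted earlier. All inputs are assumptions
 * (defaults from the quoted code), not values read from a live system.
 * Build: gcc -o mtt_math mtt_math.c */
#include <stdio.h>

int main(void)
{
    unsigned long long num_mtt   = 1ULL << 20;     /* default from the code */
    unsigned long long page_size = 4096;           /* assumed 4 KiB pages */
    unsigned long long mem_total = 49106ULL << 20; /* 49106 MiB, from the warning */
    int log_mtts_per_seg;

    /* compare the ib_mthca value (3) with the mlx4_core value (5) */
    for (log_mtts_per_seg = 3; log_mtts_per_seg <= 5; log_mtts_per_seg += 2) {
        unsigned long long mtts_per_seg = 1ULL << log_mtts_per_seg;
        unsigned long long max_reg = num_mtt * page_size * mtts_per_seg;
        printf("log_mtts_per_seg=%d -> max_reg=%llu MiB, warning %s\n",
               log_mtts_per_seg, max_reg >> 20,
               max_reg < mem_total * 3 / 4 ? "fires" : "does not fire");
    }
    return 0;
}

With log_mtts_per_seg = 3 this prints 32768 MiB and the warning fires; with 5 it prints 131072 MiB and it does not, which matches what I described above. The first figure also happens to be the 32768 MiB from Götz's original report.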