We pushed a fix for this that hopefully resolves all of these issues. It was merged into the release branch this morning. You can give it a try there, or you can wait until 1.8.4rc5 drops.
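To spell out the intent (a rough sketch only, not the literal change that was merged): on an OFED 2.0/3.12 stack, where mlx4_core no longer exposes log_num_mtt, the registration limit should simply be treated as 2 x total RAM rather than being derived from the ib_mthca parameters. Something along these lines, using the same sysfs probes as the btl_openib.c excerpt quoted below, which also applies the final (max_reg * 7) >> 3 headroom:

/* Sketch of the intended behaviour only -- not the committed patch.
 * The branches marked "elided" keep their existing parameter-based
 * estimates in the real code; mem_total is the detected physical RAM. */
#include <stdint.h>
#include <sys/stat.h>

uint64_t max_reg_sketch(uint64_t mem_total)
{
    struct stat st;

    if (0 == stat("/sys/module/mlx4_core/parameters/log_num_mtt", &st)) {
        /* pre-2.0 mlx4 stack: existing parameter-based estimate (elided) */
        return mem_total;                       /* placeholder */
    }
    if (0 == stat("/sys/module/mlx5_core", &st) ||
        0 == stat("/sys/module/mlx4_core/parameters", &st)) {
        /* OFED 2.0 / 3.12: the HCA can register twice the physical RAM */
        return 2 * mem_total;
    }
    if (0 == stat("/sys/module/ib_mthca/parameters/num_mtt", &st)) {
        /* genuine mthca hardware: existing parameter-based estimate (elided) */
        return mem_total;                       /* placeholder */
    }
    return mem_total;                           /* unknown stack */
}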
Josh

On Wed, Dec 10, 2014 at 9:37 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:

> Thanks, Gilles
>
> We're back to looking at this (yet again). It's a false positive, yes;
> however, it's not completely benign. The max_reg that is calculated is much
> smaller than it should be. In OFED 3.12, max_reg should be 2*TOTAL_RAM. We
> should have a fix for 1.8.4.
>
> Josh
>
> On Mon, Dec 8, 2014 at 9:25 PM, Gilles Gouaillardet
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>>
>> FWIW, I observe a similar behaviour on my system.
>>
>> IMHO, the root cause is that OFED has been upgraded from a (quite) older
>> version to the latest 3.12 version.
>>
>> Here is the relevant part of the code (btl_openib.c from the master):
>>
>> static uint64_t calculate_max_reg (void)
>> {
>>     if (0 == stat("/sys/module/mlx4_core/parameters/log_num_mtt", &statinfo)) {
>>     } else if (0 == stat("/sys/module/ib_mthca/parameters/num_mtt", &statinfo)) {
>>         mtts_per_seg = 1 << read_module_param("/sys/module/ib_mthca/parameters/log_mtts_per_seg", 1);
>>         num_mtt = read_module_param("/sys/module/ib_mthca/parameters/num_mtt", 1 << 20);
>>         reserved_mtt = read_module_param("/sys/module/ib_mthca/parameters/fmr_reserved_mtts", 0);
>>
>>         max_reg = (num_mtt - reserved_mtt) * opal_getpagesize () * mtts_per_seg;
>>     } else if (
>>         (0 == stat("/sys/module/mlx5_core", &statinfo)) ||
>>         (0 == stat("/sys/module/mlx4_core/parameters", &statinfo)) ||
>>         (0 == stat("/sys/module/ib_mthca/parameters", &statinfo))
>>     ) {
>>         /* mlx5 means that we have ofed 2.0 and it can always register
>>            2xmem_total for any mlx hca */
>>         max_reg = 2 * mem_total;
>>     } else {
>>     }
>>
>>     /* Print a warning if we can't register more than 75% of physical
>>        memory. Abort if the abort_not_enough_reg_mem MCA param was set. */
>>     if (max_reg < mem_total * 3 / 4) {
>>     }
>>     return (max_reg * 7) >> 3;
>> }
>>
>> With OFED 3.12, the /sys/module/mlx4_core/parameters/log_num_mtt pseudo
>> file does *not* exist any more. /sys/module/ib_mthca/parameters/num_mtt
>> exists, so the second path is taken and mtts_per_seg is read from
>> /sys/module/ib_mthca/parameters/log_mtts_per_seg.
>>
>> I noted that log_mtts_per_seg is also a parameter of mlx4_core:
>> /sys/module/mlx4_core/parameters/log_mtts_per_seg
>>
>> The value is 3 in ib_mthca (and leads to a warning) but 5 in mlx4_core
>> (big enough, and it does not lead to a warning if this value is read).
>>
>> I had no time to read the latest OFED doc, so I cannot answer:
>> - should log_mtts_per_seg be read from mlx4_core instead?
>> - is the warning a false positive?
>>
>> My only point is that this warning *might* be a false positive, and the
>> root cause *might* be that the calculate_max_reg logic *could* be wrong
>> with the latest OFED stack.
>>
>> Could the Mellanox folks comment on this?
>>
>> Cheers,
>>
>> Gilles
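For anyone who wants to see what these parameters do on their own node, here is a small standalone sketch (not Open MPI code) that mimics the parameter reads Gilles quotes above: it prints log_mtts_per_seg as seen by ib_mthca and by mlx4_core, the estimate the ib_mthca path produces, and what the "2 x total RAM" rule would give instead. read_param() is just a stand-in for Open MPI's read_module_param(); the sysfs paths and the (num_mtt - reserved_mtt) * page_size * mtts_per_seg formula are taken from the excerpt above.

/* Standalone sketch, not Open MPI code: mimic the parameter reads from the
 * calculate_max_reg() excerpt quoted above and compare the ib_mthca-path
 * estimate against the "2 x total RAM" rule. */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static uint64_t read_param(const char *path, uint64_t dflt)
{
    unsigned long long v = 0;
    FILE *f = fopen(path, "r");
    if (NULL == f) return dflt;                 /* file missing: use default */
    if (1 != fscanf(f, "%llu", &v)) v = dflt;   /* unreadable: use default   */
    fclose(f);
    return (uint64_t) v;
}

int main(void)
{
    uint64_t pagesize  = (uint64_t) sysconf(_SC_PAGESIZE);
    uint64_t mem_total = (uint64_t) sysconf(_SC_PHYS_PAGES) * pagesize;

    /* log_mtts_per_seg as seen by the two modules (3 vs 5 in the report above) */
    uint64_t mthca_log = read_param("/sys/module/ib_mthca/parameters/log_mtts_per_seg", 1);
    uint64_t mlx4_log  = read_param("/sys/module/mlx4_core/parameters/log_mtts_per_seg", 1);

    /* the ib_mthca-path arithmetic, exactly as in the quoted code */
    uint64_t num_mtt      = read_param("/sys/module/ib_mthca/parameters/num_mtt", 1ULL << 20);
    uint64_t reserved_mtt = read_param("/sys/module/ib_mthca/parameters/fmr_reserved_mtts", 0);
    uint64_t mthca_est    = (num_mtt - reserved_mtt) * pagesize * (1ULL << mthca_log);

    printf("log_mtts_per_seg      : ib_mthca=%llu  mlx4_core=%llu\n",
           (unsigned long long) mthca_log, (unsigned long long) mlx4_log);
    printf("ib_mthca-path estimate: %llu MiB\n", (unsigned long long) (mthca_est >> 20));
    printf("2 x RAM rule          : %llu MiB\n", (unsigned long long) ((2 * mem_total) >> 20));
    return 0;
}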
>> On 2014/12/09 3:18, Götz Waschk wrote:
>>
>> Hi,
>>
>> here's another test with openmpi 1.8.3. With 1.8.1, 32 GB was detected; now
>> it is just 16:
>>
>> % mpirun -np 2 /usr/lib64/openmpi-intel/bin/mpitests-osu_get_bw
>> --------------------------------------------------------------------------
>> WARNING: It appears that your OpenFabrics subsystem is configured to only
>> allow registering part of your physical memory. This can cause MPI jobs to
>> run with erratic performance, hang, and/or crash.
>>
>> This may be caused by your OpenFabrics vendor limiting the amount of
>> physical memory that can be registered. You should investigate the
>> relevant Linux kernel module parameters that control how much physical
>> memory can be registered, and increase them to allow registering all
>> physical memory on your machine.
>>
>> See this Open MPI FAQ item for more information on these Linux kernel
>> module parameters:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>
>> Local host:           pax95
>> Registerable memory:  16384 MiB
>> Total memory:         49106 MiB
>>
>> Your MPI job will continue, but may behave poorly and/or hang.
>> --------------------------------------------------------------------------
>> # OSU MPI_Get Bandwidth Test v4.3
>> # Window creation: MPI_Win_allocate
>> # Synchronization: MPI_Win_flush
>> # Size        Bandwidth (MB/s)
>> 1             28.56
>> 2             58.74
>>
>> So it wasn't fixed for RHEL 6.6.
>>
>> Regards, Götz
>>
>> On Mon, Dec 8, 2014 at 4:00 PM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>
>> Hi,
>>
>> I had tested 1.8.4rc1 and it wasn't fixed. I can try again though;
>> maybe I had made an error.
>>
>> Regards, Götz Waschk
>>
>> On Mon, Dec 8, 2014 at 3:17 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>
>> Hi,
>>
>> This should be fixed in OMPI 1.8.3. Is it possible for you to give 1.8.3 a
>> shot?
>>
>> Best,
>>
>> Josh
>>
>> On Mon, Dec 8, 2014 at 8:43 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>
>> Dear Open MPI experts,
>>
>> I have updated my little cluster from Scientific Linux 6.5 to 6.6; this
>> included extensive changes in the InfiniBand drivers and a newer Open MPI
>> version (1.8.1). Now I'm getting this message on all nodes with more than
>> 32 GB of RAM:
>>
>> WARNING: It appears that your OpenFabrics subsystem is configured to only
>> allow registering part of your physical memory. This can cause MPI jobs to
>> run with erratic performance, hang, and/or crash.
>>
>> This may be caused by your OpenFabrics vendor limiting the amount of
>> physical memory that can be registered. You should investigate the
>> relevant Linux kernel module parameters that control how much physical
>> memory can be registered, and increase them to allow registering all
>> physical memory on your machine.
>>
>> See this Open MPI FAQ item for more information on these Linux kernel
>> module parameters:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>
>> Local host:           pax98
>> Registerable memory:  32768 MiB
>> Total memory:         49106 MiB
>>
>> Your MPI job will continue, but may behave poorly and/or hang.
>>
>> The issue is similar to the one described in a previous thread about
>> Ubuntu nodes:
>> http://www.open-mpi.org/community/lists/users/2014/08/25090.php
>> But the InfiniBand driver is different: the values log_num_mtt and
>> log_mtts_per_seg both still exist, but they cannot be changed and have
>> the same values on all configurations:
>>
>> [pax52] /root # cat /sys/module/mlx4_core/parameters/log_num_mtt
>> 0
>> [pax52] /root # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
>> 3
>>
>> The kernel changelog says that Red Hat has included this commit:
>>
>>     mlx4: Scale size of MTT table with system RAM (Doug Ledford)
>>
>> so it should all be fine and the buffers should scale automatically.
>> However, as far as I can see, the wrong value calculated by
>> calculate_max_reg() is used in the code, so I think I cannot simply ignore
>> the warning. Also, a user has reported a problem with a job; I cannot
>> confirm that this is the cause.
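A quick sanity check on why the banner shows up on these nodes: the calculate_max_reg() excerpt quoted earlier only warns when max_reg falls below three quarters of physical memory. Assuming the reported "Registerable memory" figure is essentially the max_reg used in that test, both the 16384 MiB seen with 1.8.3 and the 32768 MiB seen with 1.8.1 sit below the roughly 36830 MiB threshold on a 49106 MiB node, so the warning fires with either version:

/* Illustration only: the 75% test from the calculate_max_reg() excerpt,
 * fed with the numbers reported in this thread. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t mib       = 1024 * 1024;
    const uint64_t mem_total = 49106 * mib;   /* "Total memory"                 */
    const uint64_t reg_183   = 16384 * mib;   /* "Registerable memory" on 1.8.3 */
    const uint64_t reg_181   = 32768 * mib;   /* "Registerable memory" on 1.8.1 */
    const uint64_t threshold = mem_total * 3 / 4;

    printf("warning threshold: %llu MiB\n", (unsigned long long) (threshold / mib));
    printf("1.8.3 value: %s\n", reg_183 < threshold ? "warns" : "no warning");
    printf("1.8.1 value: %s\n", reg_181 < threshold ? "warns" : "no warning");
    return 0;
}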
>> My workaround was to simply load the mlx5_core kernel module, as this is
>> used by calculate_max_reg() to detect OFED 2.0.
>>
>> Regards, Götz Waschk
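For completeness, the quoted calculate_max_reg() picks its estimate purely by stat()-ing a handful of sysfs paths, and /sys/module/mlx5_core only exists once the mlx5_core module is loaded, which is why that workaround changes the detection. The tiny probe below (illustrative only, not Open MPI code) reports which of those paths exist on a node; running it before and after "modprobe mlx5_core" shows the difference.

/* Illustrative probe, not Open MPI code: report which of the sysfs paths
 * checked by the quoted calculate_max_reg() exist on this node. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    const char *paths[] = {
        "/sys/module/mlx4_core/parameters/log_num_mtt",
        "/sys/module/ib_mthca/parameters/num_mtt",
        "/sys/module/mlx5_core",
        "/sys/module/mlx4_core/parameters",
        "/sys/module/ib_mthca/parameters",
    };
    struct stat st;

    for (unsigned i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        printf("%-48s %s\n", paths[i],
               0 == stat(paths[i], &st) ? "present" : "absent");
    }
    return 0;
}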