We pushed a fix for this that we hope resolves all of these issues. It
was merged into the release branch this morning. You can give it a try
there, or wait until 1.8.4rc5 drops.

Josh

On Wed, Dec 10, 2014 at 9:37 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
> Thanks, Gilles
>
> We're back to looking at this (yet again). It's a false positive, yes;
> however, it's not completely benign. The max_reg that's calculated is much
> smaller than it should be. In OFED 3.12, max_reg should be 2*TOTAL_RAM.
> We should have a fix for 1.8.4.
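>
> (For scale: on the ~48 GiB nodes reported below, 2*TOTAL_RAM is roughly
> 96 GiB, comfortably above the 75%-of-RAM threshold that triggers the
> warning. The 16384 MiB reported with 1.8.3 is about a factor of six
> below that.)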
>
> Josh
>
> On Mon, Dec 8, 2014 at 9:25 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  Folks,
>>
>> FWIW, I observe a similar behaviour on my system.
>>
>> IMHO, the root cause is that OFED was upgraded from a (quite) old
>> version to the latest 3.12 version.
>>
>> Here is the relevant part of the code (btl_openib.c from master):
>>
>>
>> static uint64_t calculate_max_reg (void)
>> {
>>     if (0 == stat("/sys/module/mlx4_core/parameters/log_num_mtt",
>>                   &statinfo)) {
>>         /* [body elided in this mail; presumably derives max_reg from
>>            the mlx4_core log_num_mtt and log_mtts_per_seg parameters] */
>>     } else if (0 == stat("/sys/module/ib_mthca/parameters/num_mtt",
>>                          &statinfo)) {
>>         mtts_per_seg = 1 << read_module_param(
>>             "/sys/module/ib_mthca/parameters/log_mtts_per_seg", 1);
>>         num_mtt = read_module_param(
>>             "/sys/module/ib_mthca/parameters/num_mtt", 1 << 20);
>>         reserved_mtt = read_module_param(
>>             "/sys/module/ib_mthca/parameters/fmr_reserved_mtts", 0);
>>
>>         max_reg = (num_mtt - reserved_mtt) * opal_getpagesize () *
>>             mtts_per_seg;
>>     } else if ((0 == stat("/sys/module/mlx5_core", &statinfo)) ||
>>                (0 == stat("/sys/module/mlx4_core/parameters", &statinfo)) ||
>>                (0 == stat("/sys/module/ib_mthca/parameters", &statinfo))) {
>>         /* mlx5 means that we have ofed 2.0 and it can always register
>>            2xmem_total for any mlx hca */
>>         max_reg = 2 * mem_total;
>>     } else {
>>         /* [body elided in this mail: fallback when none of the known
>>            module parameter files are present] */
>>     }
>>
>>     /* Print a warning if we can't register more than 75% of physical
>>        memory.  Abort if the abort_not_enough_reg_mem MCA param was
>>        set. */
>>     if (max_reg < mem_total * 3 / 4) {
>>         /* [warning/abort code elided in this mail] */
>>     }
>>     return (max_reg * 7) >> 3;
>> }
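>>
>> (read_module_param is not shown in the excerpt. As a rough sketch of
>> what such a helper presumably does, assuming it simply parses one
>> integer out of the pseudo-file and falls back to a caller-supplied
>> default, something like:
>>
>> #include <stdio.h>
>> #include <inttypes.h>
>>
>> /* Illustrative sketch only, not the actual Open MPI implementation:
>>    read a single integer from a sysfs pseudo-file, returning the
>>    caller-supplied default if the file cannot be read or parsed. */
>> static uint64_t read_module_param (const char *file, uint64_t value)
>> {
>>     FILE *fp = fopen(file, "r");
>>     uint64_t ret;
>>
>>     if (NULL == fp) {
>>         return value;   /* parameter file missing: keep the default */
>>     }
>>     if (1 == fscanf(fp, "%" SCNu64, &ret)) {
>>         value = ret;    /* parsed successfully */
>>     }
>>     fclose(fp);
>>     return value;
>> }
>>
>> so a missing pseudo-file silently yields the default, e.g. 1 << 20 for
>> num_mtt above.)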
>>
>> With OFED 3.12, the /sys/module/mlx4_core/parameters/log_num_mtt pseudo
>> file does *not* exist any more.
>> /sys/module/ib_mthca/parameters/num_mtt does exist, so the second path is
>> taken and mtts_per_seg is read from
>> /sys/module/ib_mthca/parameters/log_mtts_per_seg.
>>
>> I noted that log_mtts_per_seg is also a parameter of mlx4_core:
>> /sys/module/mlx4_core/parameters/log_mtts_per_seg
>>
>> The value is 3 in ib_mthca (which leads to the warning) but 5 in mlx4_core
>> (big enough, so no warning would be issued if that value were read instead).
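>>
>> (A back-of-the-envelope check of how much that difference matters,
>> assuming the default num_mtt of 1 << 20, reserved_mtt = 0 and 4 KiB
>> pages; the actual values on an affected node may of course differ:
>>
>> #include <stdio.h>
>> #include <stdint.h>
>>
>> int main (void)
>> {
>>     /* assumed: num_mtt = 1 << 20, reserved_mtt = 0, 4 KiB pages */
>>     const uint64_t num_mtt = 1ULL << 20, reserved_mtt = 0, page_size = 4096;
>>     const int log_mtts_per_seg[] = { 3, 5 };   /* ib_mthca vs mlx4_core */
>>
>>     for (int i = 0; i < 2; i++) {
>>         /* same formula as the ib_mthca branch of calculate_max_reg() */
>>         uint64_t max_reg = (num_mtt - reserved_mtt) * page_size *
>>             (1ULL << log_mtts_per_seg[i]);
>>         printf("log_mtts_per_seg = %d -> max_reg = %llu GiB\n",
>>                log_mtts_per_seg[i], (unsigned long long)(max_reg >> 30));
>>     }
>>     return 0;
>> }
>>
>> This prints 32 GiB for log_mtts_per_seg = 3 and 128 GiB for 5. On a
>> ~48 GiB node the 75% threshold is ~36 GiB, so only the ib_mthca value
>> trips the warning.)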
>>
>>
>> I had no time to read the latest OFED documentation, so I cannot answer:
>> - should log_mtts_per_seg be read from mlx4_core instead?
>> - is the warning a false positive?
>>
>> My only point is that this warning *might* be a false positive, and the
>> root cause *might* be that the calculate_max_reg logic is wrong with the
>> latest OFED stack.
>>
>> Could the Mellanox folks comment on this?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/12/09 3:18, Götz Waschk wrote:
>>
>> Hi,
>>
>> Here's another test, with Open MPI 1.8.3. With 1.8.1, 32 GB was detected;
>> now it is just 16:
>> % mpirun -np 2 /usr/lib64/openmpi-intel/bin/mpitests-osu_get_bw
>> --------------------------------------------------------------------------
>> WARNING: It appears that your OpenFabrics subsystem is configured to only
>> allow registering part of your physical memory.  This can cause MPI jobs to
>> run with erratic performance, hang, and/or crash.
>>
>> This may be caused by your OpenFabrics vendor limiting the amount of
>> physical memory that can be registered.  You should investigate the
>> relevant Linux kernel module parameters that control how much physical
>> memory can be registered, and increase them to allow registering all
>> physical memory on your machine.
>>
>> See this Open MPI FAQ item for more information on these Linux kernel module
>> parameters:
>>
>>     http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>
>>   Local host:              pax95
>>   Registerable memory:     16384 MiB
>>   Total memory:            49106 MiB
>>
>> Your MPI job will continue, but may be behave poorly and/or hang.
>> --------------------------------------------------------------------------
>> # OSU MPI_Get Bandwidth Test v4.3
>> # Window creation: MPI_Win_allocate
>> # Synchronization: MPI_Win_flush
>> # Size      Bandwidth (MB/s)
>> 1                      28.56
>> 2                      58.74
>>
>>
>> So it wasn't fixed for RHEL 6.6.
>>
>> Regards, Götz
>>
>> On Mon, Dec 8, 2014 at 4:00 PM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>
>>
>>  Hi,
>>
>> I had tested 1.8.4rc1, and it wasn't fixed. I can try again though;
>> maybe I made an error.
>>
>> Regards, Götz Waschk
>>
>> On Mon, Dec 8, 2014 at 3:17 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>
>>  Hi,
>>
>> This should be fixed in OMPI 1.8.3. Is it possible for you to give 1.8.3 a
>> shot?
>>
>> Best,
>>
>> Josh
>>
>> On Mon, Dec 8, 2014 at 8:43 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
>>
>>  Dear Open-MPI experts,
>>
>> I have updated my little cluster from Scientific Linux 6.5 to 6.6; this
>> included extensive changes in the InfiniBand drivers and a newer
>> Open MPI version (1.8.1). Now I'm getting this message on all nodes
>> with more than 32 GB of RAM:
>>
>>
>> WARNING: It appears that your OpenFabrics subsystem is configured to only
>> allow registering part of your physical memory.  This can cause MPI jobs to
>> run with erratic performance, hang, and/or crash.
>>
>> This may be caused by your OpenFabrics vendor limiting the amount of
>> physical memory that can be registered.  You should investigate the
>> relevant Linux kernel module parameters that control how much physical
>> memory can be registered, and increase them to allow registering all
>> physical memory on your machine.
>>
>> See this Open MPI FAQ item for more information on these Linux kernel
>> module parameters:
>>
>>     http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>
>>   Local host:              pax98
>>   Registerable memory:     32768 MiB
>>   Total memory:            49106 MiB
>>
>> Your MPI job will continue, but may be behave poorly and/or hang.
>>
>>
>> The issue is similar to the one described in a previous thread about
>> Ubuntu nodes: http://www.open-mpi.org/community/lists/users/2014/08/25090.php
>> But the InfiniBand driver is different: the parameters log_num_mtt and
>> log_mtts_per_seg both still exist, but they cannot be changed and have
>> the same values on all configurations:
>> [pax52] /root # cat /sys/module/mlx4_core/parameters/log_num_mtt
>> 0
>> [pax52] /root # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
>> 3
>>
>> The kernel changelog says that Red Hat has included this commit:
>> mlx4: Scale size of MTT table with system RAM (Doug Ledford)
>> so the buffers scale automatically and everything should be fine. However,
>> as far as I can see, the wrong value calculated by calculate_max_reg() is
>> used in the code, so I think I cannot simply ignore the warning. Also, a
>> user has reported a problem with a job, though I cannot confirm that this
>> is the cause.
>>
>> My workaround was to simply load the mlx5_core kernel module, as its
>> presence is used by calculate_max_reg() to detect OFED 2.0.
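>>
>> (To check which of the probed pseudo-files exist on a node, before and
>> after loading mlx5_core, here is a quick standalone diagnostic built
>> from just the stat() calls in the excerpt above; note that the probe
>> order in a given release may differ from the master excerpt:
>>
>> #include <stdio.h>
>> #include <sys/stat.h>
>>
>> /* Report which of the sysfs paths probed by calculate_max_reg()
>>    exist on this node (paths taken from the master excerpt). */
>> int main (void)
>> {
>>     const char *paths[] = {
>>         "/sys/module/mlx4_core/parameters/log_num_mtt",
>>         "/sys/module/ib_mthca/parameters/num_mtt",
>>         "/sys/module/mlx5_core",
>>         "/sys/module/mlx4_core/parameters",
>>         "/sys/module/ib_mthca/parameters",
>>     };
>>     struct stat statinfo;
>>
>>     for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
>>         printf("%-48s %s\n", paths[i],
>>                0 == stat(paths[i], &statinfo) ? "exists" : "missing");
>>     }
>>     return 0;
>> }
>>
>> Loading mlx5_core makes /sys/module/mlx5_core appear, which is what the
>> "ofed 2.0" branch above keys on.)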
>>
>> Regards, Götz Waschk
>>
>>
>>
>>
>> --
>> AL I:40: Do what thou wilt shall be the whole of the Law.
>>
>>
>>
>>
>>
>
>
