Re: [OMPI users] Openmpi-3.1.0 + slurm (fixed)

2018-05-08 Thread Bill Broadley

Sorry all,

Chris S over on the slurm list spotted it right away.  I didn't have
MpiDefault set to pmix_v2 in slurm.conf.

I can confirm that Ubuntu 18.04, gcc-7.3, openmpi-3.1.0, pmix-2.1.1, and
slurm-17.11.5 seem to work well together.
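
For anyone searching the archives later, the fix boils down to one line in
slurm.conf (pmix_v2 being the plugin name this slurm reports via "srun --mpi=list";
an "scontrol reconfigure" or a restart of the slurm daemons is assumed afterwards):

   MpiDefault=pmix_v2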

Sorry for the bother.



[OMPI users] Openmpi-3.1.0 + slurm?

2018-05-08 Thread Bill Broadley

I have openmpi-3.0.1, pmix-1.2.4, and slurm-17.11.5 working well on a few
clusters.  For things like:

bill@headnode:~/src/relay$ srun -N 2 -n 2 -t 1 ./relay 1
c7-18 c7-19
size= 1,  16384 hops,  2 nodes in   0.03 sec (  2.00 us/hop)   1953 KB/sec
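
(For context: "relay" is just a small ring-latency tester I carry around; the
source isn't in this thread.  A minimal sketch of the same idea -- with the hop
count, argument handling, and output format assumed rather than copied from the
real relay.c -- looks roughly like this:)

/* relay-sketch.c: pass a token of 'size' floats around the MPI ranks and
 * report the average per-hop latency.  Hypothetical reconstruction, not
 * the actual relay.c used in this thread. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int np, me, next, prev, i, size, hops = 16384;
  float *buf;
  double t0, t1;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  MPI_Comm_rank(MPI_COMM_WORLD, &me);

  size = (argc > 1) ? atoi(argv[1]) : 1;   /* message size in floats */
  buf  = calloc(size, sizeof(float));
  next = (me + 1) % np;
  prev = (me + np - 1) % np;

  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (i = 0; i < hops / np; i++) {        /* each loop = one full lap = np hops */
    if (me == 0) {
      MPI_Send(buf, size, MPI_FLOAT, next, 0, MPI_COMM_WORLD);
      MPI_Recv(buf, size, MPI_FLOAT, prev, 0, MPI_COMM_WORLD, &status);
    } else {
      MPI_Recv(buf, size, MPI_FLOAT, prev, 0, MPI_COMM_WORLD, &status);
      MPI_Send(buf, size, MPI_FLOAT, next, 0, MPI_COMM_WORLD);
    }
  }
  t1 = MPI_Wtime();

  if (me == 0) {
    double secs  = t1 - t0;
    double bytes = (double)hops * size * sizeof(float);
    printf("size=%2d, %6d hops, %2d nodes in %6.2f sec (%6.2f us/hop) %6.0f KB/sec\n",
           size, hops, np, secs, secs * 1e6 / hops, bytes / secs / 1024.0);
  }

  free(buf);
  MPI_Finalize();
  return 0;
}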

I've been having a tougher time trying to get openmpi-3.1.0, (external)
pmix-2.1.1, and slurm-17.11.5 working.  Does anyone have a similar combination working?

I compiled them both with:

./configure --prefix=/share/apps/openmpi-3.1.0/gcc7
--with-pmix=/share/apps/pmix-2.1.1/gcc7 --with-libevent=external
--disable-io-romio --disable-io-ompio

./configure --prefix=/share/apps/slurm-17.11.5/gcc7
--with-pmix=/share/apps/pmix-2.1.1/gcc7

Both config.logs look promising: no PMIx-related errors, and the expected
variables are being set, including the discovered PMIx flags.  I did notice that
the working openmpi configs had:
#define OPAL_PMIX_V1 1

But the nonworking openmpi config had:
#define OPAL_PMIX_V1 0

Although that's not too surprising, since I'm trying to compile and link against
pmix-2.1.1.

The other relevant env variables set by the configure:
OPAL_CONFIGURE_CLI=' \'\''--prefix=/share/apps/openmpi-3.1.0/gcc7\'\''
\'\''--with-pmix=/share/apps/pmix-2.1.1/gcc7\'\''
\'\''--with-libevent=external\'\'' \'\''--disable-io-romio\'\''
\'\''--disable-io-ompio\'\'''
opal_pmix_ext1x_CPPFLAGS='-I/share/apps/pmix-2.1.1/gcc7/include'
opal_pmix_ext1x_LDFLAGS='-L/share/apps/pmix-2.1.1/gcc7/lib'
opal_pmix_ext1x_LIBS='-lpmix'
opal_pmix_ext2x_CPPFLAGS='-I/share/apps/pmix-2.1.1/gcc7/include'
opal_pmix_ext2x_LDFLAGS='-L/share/apps/pmix-2.1.1/gcc7/lib'

Any hints on how to debug this?

When I try to run:
bill@demon:~/relay$ mpicc -O3 relay.c -o relay
bill@demon:~/relay$ srun -N 2 -n 2 ./relay 1
[c2-50:01318] OPAL ERROR: Not initialized in file ext2x_client.c at line 109
--
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[c2-50:01318] Local abort before MPI_INIT completed completed successfully, but
am not able to aggregate error messages, and not able to guarantee that all
other processes were killed!
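
(One thing to try next, on the theory that slurm just isn't handing the job PMIx
by default: request the plugin explicitly, and check what this slurm was built
with.  Both srun options below are standard; whether the plugin shows up under
the name pmix_v2 here is an assumption.)

bill@demon:~/relay$ srun --mpi=list
bill@demon:~/relay$ srun --mpi=pmix_v2 -N 2 -n 2 ./relay 1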


Re: [OMPI users] New ib locked pages behavior?

2014-10-22 Thread Bill Broadley
On 10/22/2014 12:37 AM, r...@q-leap.de wrote:
>>>>>> "Bill" == Bill Broadley  writes:
> 
> It seems the half-life period of knowledge on the list has decayed to
> two weeks on the list :)
> 
> I've commented in detail on this (non-)issue on 2014-08-20:
> 
> http://www.open-mpi.org/community/lists/users/2014/08/25090.php

I read that.  It seems pretty clear what the problem is, but not so clear on
what a user experiencing this problem should do about it.

So, for people who are using Ubuntu 14.04 with openmpi-1.6.5 and 64 GB nodes,
should they:
* bump log_mtts_per_seg from 3 to 4 (64GB) or 5 (128GB)?
* ignore the error message because it doesn't apply?
* ditch Ubuntu's packaged openmpi 1.6.5 and all the packages that depend on
  it and install something newer than 1.8.2rc4?

I also found:
  http://www.open-mpi.org/community/lists/users/2013/02/21430.php

It was similarly vague as to whether it was a real problem and exactly what the
fix is.



Re: [OMPI users] New ib locked pages behavior?

2014-10-22 Thread Bill Broadley
On 10/21/2014 05:38 PM, Gus Correa wrote:
> Hi Bill
> 
> I have 2.6.X CentOS stock kernel.

Heh, wow, quite a blast from the past.

> I set both parameters.
> It works.

Yes, for kernels that old I had it working fine.

> Maybe the parameter names may changed in 3.X kernels?
> (Which is really bad ...)
> You could check if there is more information in:
> /sys/module/mlx4_core/parameters/

$ ls /sys/module/mlx4_core/parameters/
debug_level         log_mtts_per_seg        msi_x            use_prio
enable_64b_cqe_eqe  log_num_mac             num_vfs
enable_qos          log_num_mgm_entry_size  port_type_array
internal_err_reset  log_num_vlan            probe_vf
$

As expected there's a log_mtts_per_seg, but no log_num_mtt or num_mtt.

> There seems to be a thread on the list about this (but apparently
> no solution):
> http://www.open-mpi.org/community/lists/users/2013/02/21430.php
> 
> Maybe Mellanox has more information about this?

I'm all ears.  No idea what was behind the change that eliminated what
sound like fairly important parameters in mlx4_core.



Re: [OMPI users] New ib locked pages behavior?

2014-10-21 Thread Bill Broadley
On 10/21/2014 04:18 PM, Gus Correa wrote:
> Hi Bill
> 
> Maybe you're missing these settings in /etc/modprobe.d/mlx4_core.conf ?
> 
> http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem

Ah, that helped.  Although:
/lib/modules/3.13.0-36-generic/kernel/drivers/net/ethernet/mellanox/mlx4$
modinfo mlx4_core | grep "^parm"

Lists some promising looking parameters:
parm:           log_mtts_per_seg:Log2 number of MTT entries per segment (1-7) (int)

The FAQ recommends log_num_mtt or num_mtt and NOT log_mtts_per_seg, sadly:
$ modinfo mlx4_core | grep "^parm" | grep mtt
parm:           log_mtts_per_seg:Log2 number of MTT entries per segment (1-7) (int)
$

Looks like the best I can do is bump log_mtts_per_seg.

I tried:
$ cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core log_num_mtt=24
$

But:
[6.691959] mlx4_core: unknown parameter 'log_num_mtt' ignored

I ended up with:
options mlx4_core log_mtts_per_seg=2

I'm hoping that doubles the registerable memory, although I did see a
recommendation to raise it until registerable memory is double the system RAM
(in this case 64GB RAM / 128GB lockable).
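
If the sizing formula from the FAQ still applies (registerable memory =
2^log_num_mtt * 2^log_mtts_per_seg * page size), and log_num_mtt is now fixed by
the driver, then each +1 on log_mtts_per_seg should double the "Registerable
memory" number in the warning.  After a reboot (or reloading mlx4_core) the live
value can be checked with:

$ cat /sys/module/mlx4_core/parameters/log_mtts_per_seg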

Maybe an update to the FAQ is needed?



[OMPI users] New ib locked pages behavior?

2014-10-21 Thread Bill Broadley

I've set up several clusters over the years with OpenMPI.  I often get the
error below:

   WARNING: It appears that your OpenFabrics subsystem is configured to only
   allow registering part of your physical memory.  This can cause MPI jobs to
   run with erratic performance, hang, and/or crash.
   ...
   http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

 Local host:  c2-31
 Registerable memory: 32768 MiB
 Total memory:64398 MiB

I'm well aware of the normal fixes, and have implemented them in puppet to
ensure compute nodes get the changes.  To be paranoid I've implemented all the
changes, and they all worked under ubuntu 13.10.

However with ubuntu 14.04 it seems like it's not working, thus the above 
message.

As recommended by the FAQ I've implemented:
1) ulimit -l unlimited in /etc/profile.d/slurm.sh
2) PropagateResourceLimitsExcept=MEMLOCK in slurm.conf
3) UsePAM=1 in slurm.conf
4) in /etc/security/limits.conf
   * hard memlock unlimited
   * soft memlock unlimited
   * hard stack unlimited
   * soft stack unlimited

My changes seem to be working; if I submit this to slurm:
#!/bin/bash -l
ulimit -l
hostname
mpirun bash -c ulimit -l
mpirun ./relay 1 131072

I get:
   unlimited
   c2-31
   unlimited
   unlimited
   unlimited
   unlimited
   
   

Is there some new kernel parameter, ofed parameter, or similar that controls
locked pages now?  The kernel is 3.13.0-36 and the libopenmpi-dev package is 
1.6.5.

Since ulimit -l is getting through to both the slurm-launched script and the
mpirun-launched binaries, I'm pretty puzzled.

Any suggestions?


Re: [OMPI users] MPI processes hang when using OpenMPI 1.3.2 and Gcc-4.4.0

2009-11-18 Thread Bill Broadley
A rather stable production code that has worked with various versions of MPI
on various architectures started hanging with gcc-4.4.2 and openmpi-1.3.3.

Which led me to this thread.

I made some very small changes to Eugene's code, here's the diff:
$ diff testorig.c billtest.c
3,5c3,4
<
< #define N 4
< #define M 4
---
> #define N 8000
> #define M 8000
17c16
<
---
>   fprintf (stderr, "Initialized\n");
32,33c31,39
< MPI_Sendrecv (sbuf, N, MPI_FLOAT, top, 0,
< rbuf, N, MPI_FLOAT, bottom, 0, MPI_COMM_WORLD, &status);
---
> {
>   if ((me == 0) && (i % 100 == 0))
>   {
> fprintf (stderr, "%d\n", i);
>   }
>   MPI_Sendrecv (sbuf, N, MPI_FLOAT, top, 0, rbuf, N, MPI_FLOAT, bottom, 0,
>   MPI_COMM_WORLD, &status);
> }
>

Basically print some occasional progress, and shrink M and N.
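
Putting the diff together with Eugene's original (quoted at the bottom of this
message), the modified test comes out roughly like this -- reconstructed for
readability, with the two includes assumed to be stdio.h and mpi.h:

#include <stdio.h>
#include <mpi.h>

#define N 8000
#define M 8000

int main(int argc, char **argv) {
  int np, me, i, top, bottom;
  float sbuf[N], rbuf[N];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &np);
  MPI_Comm_rank(MPI_COMM_WORLD, &me);
  fprintf(stderr, "Initialized\n");

  top    = me + 1;   if (top    >= np) top    -= np;
  bottom = me - 1;   if (bottom <  0 ) bottom += np;

  for (i = 0; i < N; i++) sbuf[i] = 0;
  for (i = 0; i < N; i++) rbuf[i] = 0;

  MPI_Barrier(MPI_COMM_WORLD);
  for (i = 0; i < M - 1; i++)
  {
    if ((me == 0) && (i % 100 == 0))
      fprintf(stderr, "%d\n", i);
    MPI_Sendrecv(sbuf, N, MPI_FLOAT, top, 0,
                 rbuf, N, MPI_FLOAT, bottom, 0, MPI_COMM_WORLD, &status);
  }
  MPI_Barrier(MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}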

I'm running on a new intel dual socket nehalem system with centos-5.4.  I
compiled gcc-4.4.2 and openmpi myself with all the defaults, except I had to
point out mpfr-2.4.1 to gcc.

If I run:
$ mpirun -np 4 ./billtest

About 1 in 2 times I get something like:
[bill@farm bill]$ mpirun -np 4 ./billtest
Initialized
Initialized
Initialized
Initialized
0
100


Next time worked, next time:
[bill@farm bill]$ mpirun -np 4 ./billtest
Initialized
Initialized
Initialized
Initialized
0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
1600
1700
1800
1900
2000
2100
2200
2300
2400
2500
2600
2700
2800
2900
3000
3100
3200
3300
3400
3500


Next time hung at 7100.

Next time worked.

If I strace it when hung I get something like:
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN},
{fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}], 6, 0) =
0 (Timeout)

If I run gdb on a hung job (compiled with -O4 -g)
(gdb) bt
#0  0x2ab3b34cb385 in ompi_request_default_wait ()
   from /share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
#1  0x2ab3b34f0d48 in PMPI_Sendrecv () from
/share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
#2  0x00400b88 in main (argc=1, argv=0x7fff083fd298) at billtest.c:36
(gdb)

If I recompile with -O1 I get the same thing.

Even -g I get the same thing.

If I compile the application with gcc-4.3 and still use a gcc-4.4 compiled
openmpi I still get hangs.

If I compile openmpi-1.3.3 with gcc-4.3 and the application with gcc-4.3 and
run it 20 times I get zero hangs.  Seems like gcc-4.4 and openmpi-1.3.3
are incompatible.  In my production code I'd always get hung at MPI_Waitall,
but the above is obviously inside of Sendrecv.

To be paranoid I just reran it 40 times without a hang.

Original code below.

Eugene Loh wrote:
...

> #include <stdio.h>
> #include <mpi.h>
> 
> #define N 4
> #define M 4
> 
> int main(int argc, char **argv) {
>  int np, me, i, top, bottom;
>  float sbuf[N], rbuf[N];
>  MPI_Status status;
> 
>  MPI_Init(&argc,&argv);
>  MPI_Comm_size(MPI_COMM_WORLD,&np);
>  MPI_Comm_rank(MPI_COMM_WORLD,&me);
> 
>  top= me + 1;   if ( top  >= np ) top-= np;
>  bottom = me - 1;   if ( bottom < 0 ) bottom += np;
> 
>  for ( i = 0; i < N; i++ ) sbuf[i] = 0;
>  for ( i = 0; i < N; i++ ) rbuf[i] = 0;
> 
>  MPI_Barrier(MPI_COMM_WORLD);
>  for ( i = 0; i < M - 1; i++ )
>MPI_Sendrecv(sbuf, N, MPI_FLOAT, top   , 0,
> rbuf, N, MPI_FLOAT, bottom, 0, MPI_COMM_WORLD, &status);
>  MPI_Barrier(MPI_COMM_WORLD);
> 
>  MPI_Finalize();
>  return 0;
> }
> 
> Can you reproduce your problem with this test case?



Re: [OMPI users] Can't use tcp instead of openib/infinipath

2008-07-23 Thread Bill Broadley

Jeff Squyres wrote:

Sorry for the delay in replying.

What exactly is the relay program timing?  Can you run a standard 
benchmark like NetPIPE, perchance?  (http://www.scl.ameslab.gov/netpipe/)




It gives very similar numbers to osu_latency.  Turns out the mca btl setting
seems to be completely ignored, i.e.:

[bill@compute-0-0 relay]$ mpirun -np 2 -mca btl foo -machinefile m ./relay 1
compute-0-0.local compute-0-1.local
size=1, 131072 hops, 2 nodes in  0.266 sec ( 2.027 us/hop)   1928 KB/sec

Or:
mpirun -np 2 -mca btl foo -machinefile m \ 
/usr/mpi/gcc/openmpi-1.2.6/tests/osu_benchmarks-3.0/osu_bw

# OSU MPI Bandwidth Test v3.0
# Size          Bandwidth (MB/s)
1                         2.40
...

My understanding is that -mca btl foo should fail since there isn't a 
transport layer called foo.


[bill@compute-0-0 relay]$ which mpirun
/usr/mpi/gcc/openmpi-1.2.6/bin/mpirun

ldd ./relay
libm.so.6 => /lib64/libm.so.6 (0x2acc7000)
libmpi.so.0 => /usr/mpi/gcc/openmpi-1.2.6/lib64/libmpi.so.0 
(0x2af4a000)
	libopen-rte.so.0 => /usr/mpi/gcc/openmpi-1.2.6/lib64/libopen-rte.so.0 
(0x2b1d8000)
	libopen-pal.so.0 => /usr/mpi/gcc/openmpi-1.2.6/lib64/libopen-pal.so.0 
(0x2b433000)

libdl.so.2 => /lib64/libdl.so.2 (0x2b692000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x2b896000)
libutil.so.1 => /lib64/libutil.so.1 (0x2baaf000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2bcb2000)
libc.so.6 => /lib64/libc.so.6 (0x2becc000)
/lib64/ld-linux-x86-64.so.2 (0x2aaab000)


So with OFED-1.3.1's install.pl (or an openmpi build from source), TCP works
but InfiniPath does not (because of the missing psm library), and all the
"-mca btl" functionality works as expected.


OFED-1.3.1 (or an openmpi build from source) with "--with-psm" added works
with InfiniPath, but all -mca parameters are ignored.  Is there a way to get
openmpi working with InfiniPath without the psm library?  Or a suggestion on
how to get the -mca functionality working?
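
One guess I haven't verified: if the InfiniPath build is routing traffic through
the PSM MTL (via the cm PML) rather than a BTL, then "-mca btl" selections would
simply never be consulted.  In that case forcing the ob1 PML should make the BTL
list apply again, e.g.:

mpirun -np 2 -mca pml ob1 -mca btl self,tcp -machinefile m ./relay 1

(pml and btl are standard MCA frameworks; whether the 1.2.6 --with-psm build
actually behaves this way here is the assumption.)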




[OMPI users] Can't use tcp instead of openib/infinipath

2008-07-19 Thread Bill Broadley


I built openmpi-1.2.6 on centos-5.2 with gcc-4.3.1.

I did a tar xvzf, cd openmpi-1.2.6, mkdir obj, cd obj:
(I put gcc-4.3.1/bin first in my path)
../configure --prefix=/opt/pkg/openmpi-1.2.6 --enable-shared --enable-debug

If I look in config.log I see:
MCA_btl_ALL_COMPONENTS=' self sm gm mvapi mx openib portals tcp udapl'
MCA_btl_DSO_COMPONENTS=' self sm openib tcp'

So both openib and tcp are available and have many parameters under
ompi_info --param btl tcp
ompi_info --param btl openib

Yet, when I run an MPI program I can't get it to use TCP:
# which mpirun
/opt/pkg/openmpi-1.2.6/bin/mpirun
# mpirun -mca btl ^openib -np 2 -machinefile m ./relay 1
compute-0-1.local compute-0-0.local
size=1, 131072 hops, 2 nodes in  0.304 sec ( 2.320 us/hop)   1683 KB/sec

Or if I try the inverse:
# mpirun -mca btl self,tcp -np 2 -machinefile m ./relay 1
compute-0-1.local compute-0-0.local
size=1, 131072 hops, 2 nodes in  0.313 sec ( 2.386 us/hop)   1637 KB/sec

2.3us is definitely faster than GigE.  I don't have IP-over-IB set up; ifconfig
-a shows ib0, but it has no IP address.
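
One way to double-check which transport is actually carrying the traffic might
be to turn up the BTL verbosity (btl_base_verbose is a standard MCA parameter;
how chatty 1.2.6 is with it here is an assumption):

# mpirun -mca btl self,tcp -mca btl_base_verbose 50 -np 2 -machinefile m ./relay 1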


I removed all other openib implementations (infinipath came with one) before I
compiled, and the binary seems to be linked against the right libraries:
# ldd ./relay
libmpi.so.0 => /opt/pkg/openmpi-1.2.6/lib/libmpi.so.0 
(0x2acc7000)
	libopen-rte.so.0 => /opt/pkg/openmpi-1.2.6/lib/libopen-rte.so.0 
(0x2afb5000)
	libopen-pal.so.0 => /opt/pkg/openmpi-1.2.6/lib/libopen-pal.so.0 
(0x2b23d000)

libdl.so.2 => /lib64/libdl.so.2 (0x2b4b2000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x2b6b6000)
libutil.so.1 => /lib64/libutil.so.1 (0x2b8ce000)
libm.so.6 => /lib64/libm.so.6 (0x2bad2000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2bd55000)
libc.so.6 => /lib64/libc.so.6 (0x2bf6f000)
/lib64/ld-linux-x86-64.so.2 (0x2aaab000)

Can anyone suggest what to look into?