Re: [OMPI users] Openmpi-3.1.0 + slurm (fixed)
Sorry all, Chris S over on the slurm list spotted it right away. I didn't have the MpiDefault set to pmix_v2.

I can confirm that Ubuntu 18.04, gcc-7.3, openmpi-3.1.0, pmix-2.1.1, and slurm-17.11.5 seem to work well together.

Sorry for the bother.
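For reference, the change amounted to one line (paths and names here are from my setup, so adjust as needed; this assumes Slurm itself was built --with-pmix against the external pmix-2.1.1):

# slurm.conf, on the controller and the compute nodes
MpiDefault=pmix_v2

Alternatively, leaving the default alone and requesting it per job should also work:

$ srun --mpi=pmix_v2 -N 2 -n 2 -t 1 ./relay 1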
[OMPI users] Openmpi-3.1.0 + slurm?
I have openmpi-3.0.1, pmix-1.2.4, and slurm-17.11.5 working well on a few clusters. For things like:

bill@headnode:~/src/relay$ srun -N 2 -n 2 -t 1 ./relay 1
c7-18 c7-19
size= 1, 16384 hops, 2 nodes in 0.03 sec ( 2.00 us/hop) 1953 KB/sec

I've been having a tougher time trying to get openmpi-3.1.0, (external) pmix-2.1.1, and slurm-17.11.5 working. Anyone have similar working?

I compiled them both with:

./configure --prefix=/share/apps/openmpi-3.1.0/gcc7 --with-pmix=/share/apps/pmix-2.1.1/gcc7 --with-libevent=external --disable-io-romio --disable-io-ompio

./configure --prefix=/share/apps/slurm-17.11.5/gcc7 --with-pmix=/share/apps/pmix-2.1.1/gcc7

Both config.log's look promising. No pmix related errors, and variables being set including the PMIX discovered flags. I did notice that the working openmpi configs had:

#define OPAL_PMIX_V1 1

But the nonworking openmpi config had:

#define OPAL_PMIX_V1 0

Although it's not too surprising since I'm trying to compile and link against pmix-2.1.1. The other relevant env variables set by the configure:

OPAL_CONFIGURE_CLI=' \'\''--prefix=/share/apps/openmpi-3.1.0/gcc7\'\'' \'\''--with-pmix=/share/apps/pmix-2.1.1/gcc7\'\'' \'\''--with-libevent=external\'\'' \'\''--disable-io-romio\'\'' \'\''--disable-io-ompio\'\'''
opal_pmix_ext1x_CPPFLAGS='-I/share/apps/pmix-2.1.1/gcc7/include'
opal_pmix_ext1x_LDFLAGS='-L/share/apps/pmix-2.1.1/gcc7/lib'
opal_pmix_ext1x_LIBS='-lpmix'
opal_pmix_ext2x_CPPFLAGS='-I/share/apps/pmix-2.1.1/gcc7/include'
opal_pmix_ext2x_LDFLAGS='-L/share/apps/pmix-2.1.1/gcc7/lib'

Any hints on how to debug this? When I try to run:

bill@demon:~/relay$ mpicc -O3 relay.c -o relay
bill@demon:~/relay$ srun -N 2 -n 2 ./relay 1
[c2-50:01318] OPAL ERROR: Not initialized in file ext2x_client.c at line 109
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[c2-50:01318] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
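A couple of sanity checks that might narrow this down (just a sketch; the plugin names depend on how Slurm was configured and built):

# which MPI/PMI plugins does this Slurm build actually have?
$ srun --mpi=list

# and what is the cluster-wide default?
$ scontrol show config | grep -i mpidefault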
Re: [OMPI users] New ib locked pages behavior?
On 10/22/2014 12:37 AM, r...@q-leap.de wrote:
>>>>>> "Bill" == Bill Broadley writes:
>
> It seems the half-life period of knowledge on the list has decayed to
> two weeks on the list :)
>
> I've commented in detail on this (non-)issue on 2014-08-20:
>
> http://www.open-mpi.org/community/lists/users/2014/08/25090.php

I read that. It seems pretty clear what the problem is, but not so clear on what a user experiencing this problem should do about it.

So for people who are using Ubuntu 14.04, openmpi-1.6.5, and 64 GB nodes, should they:
* bump log_mtts_per_seg from 3 to 4 (64GB) or 5 (128GB)?
* ignore the error message because it doesn't apply?
* ditch Ubuntu's packaged openmpi 1.6.5 and all the packages that depend on it and install something newer than 1.8.2rc4?

I also found:
http://www.open-mpi.org/community/lists/users/2013/02/21430.php

It was similarly vague as to whether it was a real problem and exactly what the fix is.
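For what it's worth, my current mental model (an assumption pieced together from the mlx4 documentation, not something stated in the FAQ) is:

  max registerable memory ~= 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE

Since log_num_mtt is no longer tunable on these kernels, each +1 on log_mtts_per_seg should double the limit:

  log_mtts_per_seg=3  ->  32768 MiB  (what the warning reports today)
  log_mtts_per_seg=4  ->  65536 MiB  (should cover 64GB nodes)
  log_mtts_per_seg=5  -> 131072 MiB  (should cover 128GB nodes)

If someone from Mellanox can confirm or correct that, I'd appreciate it.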
Re: [OMPI users] New ib locked pages behavior?
On 10/21/2014 05:38 PM, Gus Correa wrote:
> Hi Bill
>
> I have 2.6.X CentOS stock kernel.

Heh, wow, quite a blast from the past.

> I set both parameters.
> It works.

Yes, for kernels that old I had it working fine.

> Maybe the parameter names may changed in 3.X kernels?
> (Which is really bad ...)
> You could check if there is more information in:
> /sys/module/mlx4_core/parameters/

$ ls /sys/module/mlx4_core/parameters/
debug_level         log_mtts_per_seg        msi_x            use_prio
enable_64b_cqe_eqe  log_num_mac             num_vfs
enable_qos          log_num_mgm_entry_size  port_type_array
internal_err_reset  log_num_vlan            probe_vf
$

As expected there's a log_mtts_per_seg, but no log_num_mtt or num_mtt.

> There seems to be a thread on the list about this (but apparently
> no solution):
> http://www.open-mpi.org/community/lists/users/2013/02/21430.php
>
> Maybe Mellanox has more information about this?

I'm all ears. No idea what was behind the change to eliminate what sound like fairly important parameters in mlx4_core.
Re: [OMPI users] New ib locked pages behavior?
On 10/21/2014 04:18 PM, Gus Correa wrote:
> Hi Bill
>
> Maybe you're missing these settings in /etc/modprobe.d/mlx4_core.conf ?
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem

Ah, that helped. Although:

/lib/modules/3.13.0-36-generic/kernel/drivers/net/ethernet/mellanox/mlx4$ modinfo mlx4_core | grep "^parm"

lists some promising looking parameters:

parm:           log_mtts_per_seg:Log2 number of MTT entries per segment (1-7) (int)

The FAQ recommends log_num_mtt or num_mtt and NOT log_mtts_per_seg. Sadly:

$ modinfo mlx4_core | grep "^parm" | grep mtt
parm:           log_mtts_per_seg:Log2 number of MTT entries per segment (1-7) (int)
$

Looks like the best I can do is bump log_mtts_per_seg. I tried:

$ cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core log_num_mtt=24
$

But:

[    6.691959] mlx4_core: unknown parameter 'log_num_mtt' ignored

I ended up with:

options mlx4_core log_mtts_per_seg=2

I'm hoping that doubles the registerable memory, although I did see a recommendation to raise it to double the system RAM (in this case 64GB RAM / 128GB lockable). Maybe an update to the FAQ is needed?
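To make sure the new value actually takes effect (a sketch; on Ubuntu mlx4_core may be loaded from the initramfs, in which case it needs regenerating before the reboot, otherwise a reboot or module reload should be enough):

$ sudo update-initramfs -u && sudo reboot
# after the reboot, confirm the running module picked up the option:
$ cat /sys/module/mlx4_core/parameters/log_mtts_per_seg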
[OMPI users] New ib locked pages behavior?
I've setup several clusters over the years with OpenMPI. I often get the below error:

WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.
...
   http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              c2-31
  Registerable memory:     32768 MiB
  Total memory:            64398 MiB

I'm well aware of the normal fixes, and have implemented them in puppet to ensure compute nodes get the changes. To be paranoid I've implemented all the changes, and they all worked under Ubuntu 13.10. However with Ubuntu 14.04 it seems like it's not working, thus the above message.

As recommended by the FAQs I've implemented:
1) ulimit -l unlimited in /etc/profile.d/slurm.sh
2) PropagateResourceLimitsExcept=MEMLOCK in slurm.conf
3) UsePAM=1 in slurm.conf
4) in /etc/security/limits.conf:
   * hard memlock unlimited
   * soft memlock unlimited
   * hard stack unlimited
   * soft stack unlimited

My changes seem to be working; if I submit this to slurm:

#!/bin/bash -l
ulimit -l
hostname
mpirun bash -c ulimit -l
mpirun ./relay 1 131072

I get:

unlimited
c2-31
unlimited
unlimited
unlimited
unlimited

Is there some new kernel parameter, OFED parameter, or similar that controls locked pages now? The kernel is 3.13.0-36 and the libopenmpi-dev package is 1.6.5. Since the ulimit -l is getting to both the slurm-launched script and also to the mpirun-launched binaries, I'm pretty puzzled. Any suggestions?
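One extra check that might be useful (a sketch, in case the limit is getting clamped somewhere I haven't looked): ask the processes that mpirun actually launches what their memlock limit is, straight from /proc:

$ mpirun -np 2 bash -c 'grep "Max locked memory" /proc/self/limits'

If those show unlimited too, the memlock limit itself is fine and whatever is capping registerable memory is coming from somewhere else.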
Re: [OMPI users] MPI processes hang when using OpenMPI 1.3.2 and Gcc-4.4.0
A rather stable production code that has worked with various versions of MPI on various architectures started hanging with gcc-4.4.2 and openmpi-1.3.3, which led me to this thread.

I made some very small changes to Eugene's code, here's the diff:

$ diff testorig.c billtest.c
3,5c3,4
<
< #define N 4
< #define M 4
---
> #define N 8000
> #define M 8000
17c16
<
---
>   fprintf (stderr, "Initialized\n");
32,33c31,39
<     MPI_Sendrecv (sbuf, N, MPI_FLOAT, top, 0,
<                   rbuf, N, MPI_FLOAT, bottom, 0, MPI_COMM_WORLD, &status);
---
>     {
>       if ((me == 0) && (i % 100 == 0))
>         {
>           fprintf (stderr, "%d\n", i);
>         }
>       MPI_Sendrecv (sbuf, N, MPI_FLOAT, top, 0, rbuf, N, MPI_FLOAT, bottom, 0,
>                     MPI_COMM_WORLD, &status);
>     }
>

Basically print some occasional progress, and increase M and N.

I'm running on a new intel dual socket nehalem system with centos-5.4. I compiled gcc-4.4.2 and openmpi myself with all the defaults, except I had to point out mpfr-2.4.1 to gcc.

If I run:

$ mpirun -np 4 ./billtest

about 1 in 2 times I get something like:

[bill@farm bill]$ mpirun -np 4 ./billtest
Initialized
Initialized
Initialized
Initialized
0
100

Next time worked, next time:

[bill@farm bill]$ mpirun -np 4 ./billtest
Initialized
Initialized
Initialized
Initialized
0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
1600
1700
1800
1900
2000
2100
2200
2300
2400
2500
2600
2700
2800
2900
3000
3100
3200
3300
3400
3500

Next time hung at 7100. Next time worked.

If I strace it when hung I get something like:

poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}], 6, 0) = 0 (Timeout)

If I run gdb on a hung job (compiled with -O4 -g):

(gdb) bt
#0  0x2ab3b34cb385 in ompi_request_default_wait () from /share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
#1  0x2ab3b34f0d48 in PMPI_Sendrecv () from /share/apps/openmpisb-1.3/gcc-4.4/lib/libmpi.so.0
#2  0x00400b88 in main (argc=1, argv=0x7fff083fd298) at billtest.c:36
(gdb)

If I recompile with -O1 I get the same thing. Even -g I get the same thing. If I compile the application with gcc-4.3 and still use a gcc-4.4 compiled openmpi I still get hangs. If I compile openmpi-1.3.3 with gcc-4.3 and the application with gcc-4.3 and run it 20 times I get zero hangs.

Seems like gcc-4.4 and openmpi-1.3.3 are incompatible. In my production code I'd always get hung at MPI_Waitall, but the above is obviously inside of Sendrecv. To be paranoid I just reran it 40 times without a hang.

Original code below.

Eugene Loh wrote:
...
> #include <stdio.h>
> #include <mpi.h>
>
> #define N 4
> #define M 4
>
> int main(int argc, char **argv) {
>     int np, me, i, top, bottom;
>     float sbuf[N], rbuf[N];
>     MPI_Status status;
>
>     MPI_Init(&argc,&argv);
>     MPI_Comm_size(MPI_COMM_WORLD,&np);
>     MPI_Comm_rank(MPI_COMM_WORLD,&me);
>
>     top    = me + 1;  if ( top >= np   ) top    -= np;
>     bottom = me - 1;  if ( bottom < 0  ) bottom += np;
>
>     for ( i = 0; i < N; i++ ) sbuf[i] = 0;
>     for ( i = 0; i < N; i++ ) rbuf[i] = 0;
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     for ( i = 0; i < M - 1; i++ )
>        MPI_Sendrecv(sbuf, N, MPI_FLOAT, top   , 0,
>                     rbuf, N, MPI_FLOAT, bottom, 0, MPI_COMM_WORLD, &status);
>     MPI_Barrier(MPI_COMM_WORLD);
>
>     MPI_Finalize();
>     return 0;
> }
>
> Can you reproduce your problem with this test case?
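For what it's worth, this is roughly how I decide whether a given build hangs (just a sketch; it assumes a timeout(1) binary is available, which older coreutils may not have, and the 300-second cutoff is arbitrary):

$ for i in $(seq 1 20); do
    # suppress the progress output; only report runs that never finished
    timeout 300 mpirun -np 4 ./billtest >/dev/null 2>&1 || echo "run $i hung or failed"
  done

That's just a mechanical version of the manual reruns described above.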
Re: [OMPI users] Can't use tcp instead of openib/infinipath
Jeff Squyres wrote:
> Sorry for the delay in replying.
>
> What exactly is the relay program timing? Can you run a standard benchmark
> like NetPIPE, perchance? (http://www.scl.ameslab.gov/netpipe/)

It gives very similar numbers to osu_latency.

Turns out the mca btl seems to be completely ignored, i.e.:

[bill@compute-0-0 relay]$ mpirun -np 2 -mca btl foo -machinefile m ./relay 1
compute-0-0.local compute-0-1.local
size= 1, 131072 hops, 2 nodes in 0.266 sec ( 2.027 us/hop) 1928 KB/sec

Or:

mpirun -np 2 -mca btl foo -machinefile m \
   /usr/mpi/gcc/openmpi-1.2.6/tests/osu_benchmarks-3.0/osu_bw
# OSU MPI Bandwidth Test v3.0
# Size        Bandwidth (MB/s)
1             2.40
...

My understanding is that -mca btl foo should fail since there isn't a transport layer called foo.

[bill@compute-0-0 relay]$ which mpirun
/usr/mpi/gcc/openmpi-1.2.6/bin/mpirun

ldd ./relay
        libm.so.6 => /lib64/libm.so.6 (0x2acc7000)
        libmpi.so.0 => /usr/mpi/gcc/openmpi-1.2.6/lib64/libmpi.so.0 (0x2af4a000)
        libopen-rte.so.0 => /usr/mpi/gcc/openmpi-1.2.6/lib64/libopen-rte.so.0 (0x2b1d8000)
        libopen-pal.so.0 => /usr/mpi/gcc/openmpi-1.2.6/lib64/libopen-pal.so.0 (0x2b433000)
        libdl.so.2 => /lib64/libdl.so.2 (0x2b692000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x2b896000)
        libutil.so.1 => /lib64/libutil.so.1 (0x2baaf000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x2bcb2000)
        libc.so.6 => /lib64/libc.so.6 (0x2becc000)
        /lib64/ld-linux-x86-64.so.2 (0x2aaab000)

So OFED-1.3.1's ./install.pl (or an openmpi build from source) works with TCP, but not infinipath (because of a missing psm library); all the "-mca btl" functionality works as expected. OFED-1.3.1 (or an openmpi build from source) with "--with-psm" added works with infinipath, but all -mca parameters are ignored.

Is there a way to get openmpi working with infinipath without the psm library? Or a suggestion on how to get the -mca functionality working?
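One more thing I'm going to try, offered as a guess rather than a verified fix: my understanding is that a psm-enabled build can select the "cm" PML with the psm MTL, and in that case the btl parameters are never consulted at all, which would explain why even "-mca btl foo" is silently accepted. Forcing the ob1 PML should make the btl selection matter again:

$ mpirun -np 2 -mca pml ob1 -mca btl self,tcp -machinefile m ./relay 1

If that suddenly shows GigE-like latency, it would confirm the earlier runs were going over psm regardless of the btl setting.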
[OMPI users] Can't use tcp instead of openib/infinipath
I built openmpi-1.2.6 on centos-5.2 with gcc-4.3.1. I did a tar xvzf, cd openmpi-1.2.6, mkdir obj, cd obj (I put gcc-4.3.1/bin first in my path):

../configure --prefix=/opt/pkg/openmpi-1.2.6 --enable-shared --enable-debug

If I look in config.log I see:

MCA_btl_ALL_COMPONENTS=' self sm gm mvapi mx openib portals tcp udapl'
MCA_btl_DSO_COMPONENTS=' self sm openib tcp'

So both openib and tcp are available and have many parameters under:

ompi_info --param btl tcp
ompi_info --param btl openib

Yet, when I run an MPI program I can't get it to use TCP:

# which mpirun
/opt/pkg/openmpi-1.2.6/bin/mpirun
# mpirun -mca btl ^openib -np 2 -machinefile m ./relay 1
compute-0-1.local compute-0-0.local
size=1, 131072 hops, 2 nodes in 0.304 sec ( 2.320 us/hop) 1683 KB/sec

Or if I try the inverse:

# mpirun -mca btl self,tcp -np 2 -machinefile m ./relay 1
compute-0-1.local compute-0-0.local
size=1, 131072 hops, 2 nodes in 0.313 sec ( 2.386 us/hop) 1637 KB/sec

2.3us is definitely faster than GigE. I don't have IP over IB set up; ifconfig -a shows ib0, but it has no IP address.

I removed all other openib implementations (infinipath came with one) before I compiled, and the binary seems to be linked against the right libraries:

# ldd ./relay
        libmpi.so.0 => /opt/pkg/openmpi-1.2.6/lib/libmpi.so.0 (0x2acc7000)
        libopen-rte.so.0 => /opt/pkg/openmpi-1.2.6/lib/libopen-rte.so.0 (0x2afb5000)
        libopen-pal.so.0 => /opt/pkg/openmpi-1.2.6/lib/libopen-pal.so.0 (0x2b23d000)
        libdl.so.2 => /lib64/libdl.so.2 (0x2b4b2000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x2b6b6000)
        libutil.so.1 => /lib64/libutil.so.1 (0x2b8ce000)
        libm.so.6 => /lib64/libm.so.6 (0x2bad2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x2bd55000)
        libc.so.6 => /lib64/libc.so.6 (0x2bf6f000)
        /lib64/ld-linux-x86-64.so.2 (0x2aaab000)

Can anyone suggest what to look into?
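In case it helps with debugging, here's how I plan to check which components actually get picked at run time (the verbosity level is arbitrary):

# show BTL component selection while the job starts up
$ mpirun -np 2 -machinefile m -mca btl self,tcp -mca btl_base_verbose 30 ./relay 1

# and double-check which components this install actually built
$ ompi_info | grep btl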