Re: [slurm-users] Segfault with 32 processes, OK with 30 ???

2020-10-12 Thread Chris Samuel
On Monday, 12 October 2020 2:43:36 AM PDT Diego Zuccato wrote:

> Seems so:
> "The application appears to have been direct launched using "srun",
> but OMPI was not built with SLURM's PMI support and therefore cannot
> execute."
> 
> So it seems I can't use srun to launch OpenMPI jobs.

OK, I suspect this rules Slurm out of the running as the cause. I'd suggest 
either rebuilding OpenMPI with Slurm support or, if it's a distro-provided 
package, filing a bug with the distro, or alternatively asking for help on the 
OpenMPI users list:

https://lists.open-mpi.org/mailman/listinfo/users
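
If you do end up rebuilding, it's usually just a matter of pointing configure 
at Slurm's PMI bits, along these lines (a sketch only; the prefix and PMI path 
are examples, adjust them to wherever your Slurm installs pmi.h/pmi2.h):

-8<--
./configure --prefix=/opt/openmpi \
            --with-slurm \
            --with-pmi=/usr
make -j && make install
-8<--

After that, "srun ./mpitest" should report distinct ranks instead of erroring out.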

Best of luck!
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA






Re: [slurm-users] sbatch overallocation

2020-10-12 Thread Diego Zuccato
Il 10/10/20 18:53, Renfro, Michael ha scritto:

>   * Do you want to ensure that one job requesting 9 tasks (and 1 CPU per
> task) can’t overstep its reservation and take resources away from
> other jobs on those nodes? Cgroups [1] should be able to confine the
> job to its 9 CPUs, and even if 8 processes get started at once in
> the job, they’ll only drive up the nodes’ load average, and not
> affect others’ performance.
IMHO cgroup confinement is a must-have: each job is guaranteed to receive what
it asks for, and no more. If it tries to use more, it only contends with itself
for resources, w/o impacting other jobs.
Configuring it greatly reduced headaches on our cluster :)
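
For reference, the relevant knobs are roughly the following (a sketch, not our
exact config; check the slurm.conf and cgroup.conf man pages for your version):

-8<--
# slurm.conf
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup

# cgroup.conf
ConstrainCores=yes
ConstrainRAMSpace=yes
-8<--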

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



Re: [slurm-users] Segfault with 32 processes, OK with 30 ???

2020-10-12 Thread Diego Zuccato
Il 08/10/20 08:48, Chris Samuel ha scritto:

Sorry for being so late. I've had to wait for the node to be free.

> Launch it with "srun" rather than "mpirun", that way it'll be managed by
> Slurm.  If your test program then says every rank is rank 0 that will tell
> you OpenMPI is not built with Slurm support.
Seems so:
"The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute."

So it seems I can't use srun to launch OpenMPI jobs.
But with just s/srun/mpirun/ (which, IIUC, should be supported) it seems to
work, and it even auto-detects the correct number of ranks to use.
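Just to be explicit, the kind of submission I mean is nothing fancy, roughly
(a sketch, not the exact script):

-8<--
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=56
# mpirun picks up the task count and host list from the Slurm allocation
mpirun ./mpitest
-8<--
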
I launched the test executable with mpirun on one of the newer nodes (56
threads) and got:
-8<--
[...]
Hello from task 52 on str957-mtx-11!
Hello from task 53 on str957-mtx-11!
Hello from task 54 on str957-mtx-11!
This is an MPI parallel code for Hello World with no communication
Hello from task 0 on str957-mtx-11!
MASTER: Number of MPI tasks is: 56
Hello from task 18 on str957-mtx-11!
[...]
-8<--
But if I run it on the older 32-thread node:
-8<--
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7480b700 (LWP 19633)]
[New Thread 0x73fe9700 (LWP 19634)]
[New Thread 0x73764700 (LWP 19635)]
[New Thread 0x72f63700 (LWP 19636)]
[Detaching after fork from child process 19637]
[Detaching after fork from child process 19638]
[Detaching after fork from child process 19639]
[Detaching after fork from child process 19641]
[Detaching after fork from child process 19643]
[Detaching after fork from child process 19645]
[Detaching after fork from child process 19647]
[Detaching after fork from child process 19649]
[Detaching after fork from child process 19651]
[Detaching after fork from child process 19653]
[Detaching after fork from child process 19655]
[Detaching after fork from child process 19657]
[Detaching after fork from child process 19659]
[Detaching after fork from child process 19661]
[Detaching after fork from child process 19663]
[Detaching after fork from child process 19665]
[Detaching after fork from child process 19667]
[Detaching after fork from child process 19669]
[Detaching after fork from child process 19671]
[Detaching after fork from child process 19673]
[Detaching after fork from child process 19675]
[Detaching after fork from child process 19677]
[Detaching after fork from child process 19679]
[Detaching after fork from child process 19681]
[Detaching after fork from child process 19683]
[Detaching after fork from child process 19685]
[Detaching after fork from child process 19687]
[Detaching after fork from child process 19689]
[Detaching after fork from child process 19691]
[Detaching after fork from child process 19693]
[Detaching after fork from child process 19695]
[Detaching after fork from child process 19697]
[str957-bl0-03:19637] *** Process received signal ***
[str957-bl0-03:19637] Signal: Segmentation fault (11)
[str957-bl0-03:19637] Signal code: Address not mapped (1)
[str957-bl0-03:19637] Failing at address: 0x77fac008
[str957-bl0-03:19637] [ 0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x77e92730]
[str957-bl0-03:19637] [ 1]
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x2936)[0x7646d936]
[str957-bl0-03:19637] [ 2]
/usr/lib/x86_64-linux-gnu/libmca_common_dstore.so.1(pmix_common_dstor_init+0x9d3)[0x76444733]
[str957-bl0-03:19637] [ 3]
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x25b4)[0x7646d5b4]
[str957-bl0-03:19637] [ 4]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_gds_base_select+0x12e)[0x7659346e]
[str957-bl0-03:19637] [ 5]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x8cd)[0x7654b88d]
[str957-bl0-03:19637] [ 6]
/usr/lib/x86_64-linux-gnu/libpmix.so.2(PMIx_Init+0xdc)[0x76507d7c]
[str957-bl0-03:19637] [ 7]
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext2x.so(ext2x_client_init+0xc4)[0x76603fe4]
[str957-bl0-03:19637] [ 8]
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so(+0x2656)[0x77fb1656]
[str957-bl0-03:19637] [ 9]
/usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x77c1c11a]
[str957-bl0-03:19637] [10]
/usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x77eece62]
[str957-bl0-03:19637] [11]
/usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0x6e)[0x77f1b17e]
[str957-bl0-03:19637] [12] ./mpitest-debug(+0x11c6)[0x51c6]
[str957-bl0-03:19637] [13]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x77ce309b]
[str957-bl0-03:19637] [14] ./mpitest-debug(+0x10da)[0x50da]
[str957-bl0-03:19637] *** End of error message ***
[... repeats the same error another 29 times ...]
--
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

Re: [slurm-users] unable to run on all the logical cores

2020-10-12 Thread Diego Zuccato
Il 12/10/20 05:55, William Brown ha scritto:

> We always disable hyper-threading on the compute nodes, at the
> recommendation of the company that installed it, who reported that it
> ran faster that way.  Multi-threading has a cost and is not ideal for
> compute workloads.  It is better for things like web servers where some
> tasks are waiting IO or user input.  That said, it likely depends on the
> CPU and what is true for Intel might not be true for AMD.
Well, we did some tests too, mostly w/ MPI on the newer nodes (2x14x2:
2 sockets x 14 cores x 2 threads). We didn't test disabling HT, just not
using it (scheduling a single process per core). Even for FPU-intensive
jobs (where contention for the FPU between the two threads should be at
its worst) using HT seems to be worthwhile.
Say a job can process 100 iterations when using a single thread on a
reserved core. If I let it use both threads, every thread is actually
about 10% slower, so each does roughly 90 iterations, but the two
together give roughly 180: about 1.8 times the throughput with HT.

That's just my very limited experience, but it seems it could be
worthwhile to test.
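
FWIW, for the "single process per core" runs we simply asked Slurm for that
layout, something like the following (just a sketch, flag names as per the
srun man page; ./mybench is a placeholder):

-8<--
# one task per physical core (leave the second hardware thread idle)
srun --hint=nomultithread ./mybench

# one task per hardware thread (use both threads of each core)
srun --hint=multithread ./mybench
-8<--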

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786