Re: [slurm-users] Simple free for all cluster

2020-10-10 Thread Chris Samuel
On Tuesday, 6 October 2020 7:53:02 AM PDT Jason Simms wrote:

> I currently don't have a MaxTime defined, because how do I know how long a
> job will take? Most jobs on my cluster require no more than 3-4 days, but
> in some cases at other campuses, I know that jobs can run for weeks. I
> suppose even setting a time limit such as 4 weeks would be overkill, but at
> least it's not infinite. I'm curious what others use as that value, and how
> you arrived at it

My journey over the last 16 years in HPC has been one of steadily decreasing
time limits. Back in 2003, with VPAC's first Linux cluster, we had no time
limits at all; we then introduced a 90 day limit so we could plan quarterly
maintenance (and yes, we had users whose jobs legitimately ran longer than
that, so they had to learn to checkpoint). At VLSCI we had 30 day limits
(life sciences, so many long-running, poorly scaling jobs), at Swinburne it
was a 7 day limit, and now here at NERSC we've got 2 day limits.
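
For concreteness, a limit like that is set per partition in slurm.conf; the
line below is only an illustrative sketch with a 4-day cap and a 1-day
default, not the configuration of any of the sites above:

    PartitionName=batch Nodes=node[01-99] Default=YES MaxTime=4-00:00:00 DefaultTime=1-00:00:00 State=UP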

It really comes down to what your use cases are and how much influence you
have over your users. It's often the HPC sysadmin's responsibility to find
the balance between good utilisation, effective use of the system, and
reaching the desired science/research/development outcomes.

Best of luck!
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA






Re: [slurm-users] sbatch overallocation

2020-10-10 Thread Renfro, Michael
I think the answer depends on why you’re trying to prevent the observed 
behavior:


  *   Do you want to ensure that one job requesting 9 tasks (and 1 CPU per
      task) can't overstep its allocation and take resources away from other
      jobs on those nodes? Cgroups [1] should be able to confine the job to
      its 9 CPUs, and even if 8 processes get started at once in the job,
      they'll only drive up the nodes' load average and not affect others'
      performance. (A configuration sketch follows the references below.)
  *   Are you trying to define a workflow where these 8 jobs can be run in
      parallel, and you want to wait until they've all completed before
      starting another job? Job dependencies using the --dependency flag to
      sbatch [2] should be able to handle that. (An example also follows the
      references below.)

[1] https://slurm.schedmd.com/cgroups.html
[2] https://slurm.schedmd.com/sbatch.html
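
To make both suggestions concrete, here are two minimal sketches. They are
illustrative only: the exact cgroup setup depends on your Slurm version and
OS, and run_case.sh / postprocess.sh below are hypothetical wrapper scripts,
not anything from this thread.

Confining jobs to their allocated cores (slurm.conf and cgroup.conf):

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf
    ConstrainCores=yes
    ConstrainRAMSpace=yes

Running the 8 cases as independent jobs and holding a follow-up job until
all of them have finished successfully:

    #!/bin/bash
    # Submit the 8 cases, collect their job IDs, then submit a job that
    # only starts once every one of them has completed successfully.
    jobids=""
    for i in $(seq 10 17); do
        jid=$(sbatch --parsable -N1 -n9 run_case.sh "channel395-$i")
        jobids="$jobids:$jid"
    done
    sbatch --dependency=afterok$jobids postprocess.sh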

From: slurm-users on behalf of Max Quast
Reply-To: Slurm User Community List
Date: Saturday, October 10, 2020 at 6:06 AM
Subject: [slurm-users] sbatch overallocation

Dear slurm-users,

I built a slurm system consisting of two nodes (Ubuntu 20.04.1, slurm 20.02.5):

# COMPUTE NODES
GresTypes=gpu
NodeName=lsm[216-217] Gres=gpu:tesla:1 CPUs=64 RealMemory=192073 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP

The slurmctld is running on a separate Ubuntu system where no slurmd is
installed.

If a user executes this script (sbatch srun2.bash)

#!/bin/bash
#SBATCH -N 2 -n9
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-10 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-11 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-12 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-13 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-14 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-15 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-16 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-17 -parallel > /dev/null &
wait

Eight jobs with 9 threads are launched and distributed across the two nodes.

If more such scripts get started at the same time, all the srun commands will 
be executed even though no free cores are available. So the nodes are 
overallocated.
How can this be prevented?

Thx :)

Greetings
max



Re: [slurm-users] sbatch overallocation

2020-10-10 Thread mercan

Hi;

You can submit each pimpleFoam run as a separate job. Or, if you really want
to submit everything as a single job, you can use a tool that runs only as
many of them at a time as you have CPUs, such as GNU parallel:


https://www.gnu.org/software/parallel/
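
For example, a minimal sketch of that approach (an assumption here is that
each case can run as a single-task job step, so the OpenFOAM -parallel flag
is dropped; if a case needs several MPI ranks, the per-step -n has to be
adjusted accordingly):

    #!/bin/bash
    #SBATCH -N 2 -n 9
    # GNU parallel keeps at most $SLURM_NTASKS steps running at a time, and
    # srun --exclusive keeps the steps from sharing the same allocated CPUs.
    parallel -j "$SLURM_NTASKS" \
        'srun -N1 -n1 --exclusive pimpleFoam -case {} > /dev/null' \
        ::: /mnt/NFS/users/quast/channel395-{10..17}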

regards;

Ahmet M.







Re: [slurm-users] How does SLURM calculate StartTime for pending jobs

2020-10-10 Thread Huda, Zia Ul
Hi Jianwen,

This is done by the select plugin. For example, see the linear select plugin
here:
https://github.com/SchedMD/slurm/blob/master/src/plugins/select/linear/select_linear.c

Look at the function extern int select_p_job_test() at line 3563; the
comments above it describe in detail what it does.
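
As a side note, the estimates that this code produces can also be inspected
from the command line, e.g. (the job ID here is made up):

    squeue --start -j 12345
    scontrol show job 12345 | grep StartTime

For a pending job the StartTime is only the scheduler's current best guess
and is typically refreshed on later scheduling and backfill passes.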

Best


Zia Ul Huda
Forschungszentrum Jülich GmbH
Institute for Advanced Simulation (IAS)
Jülich Supercomputing Centre (JSC)
Wilhelm-Johnen-Straße
52425 Jülich, Germany

Phone: +49 2461 61 96905
E-mail:  z.h...@fz-juelich.de

WWW: http://www.fz-juelich.de/ias/jsc/


JSC is the coordinator of the
John von Neumann Institute for Computing
and member of the
Gauss Centre for Supercomputing



On 10. Oct 2020, at 11:38, SJTU <weijian...@sjtu.edu.cn> wrote:

Hi,

`scontrol show jobid xxx` shows SLURM's estimate of the StartTime for a
pending job. I wonder where I can find the code that implements StartTime.

Thank you!

Jianwen










[slurm-users] sbatch overallocation

2020-10-10 Thread Max Quast
Dear slurm-users, 

 

I built a slurm system consisting of two nodes (Ubuntu 20.04.1, slurm
20.02.5):

 

# COMPUTE NODES

GresTypes=gpu

NodeName=lsm[216-217] Gres=gpu:tesla:1 CPUs=64 RealMemory=192073 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN

PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP

 

The slurmctld is running on a separate Ubuntu system where no slurmd is
installed.

 

If a user executes this script (sbatch srun2.bash)

 

#!/bin/bash

#SBATCH -N 2 -n9

srun pimpleFoam -case /mnt/NFS/users/quast/channel395-10 -parallel > /dev/null &

srun pimpleFoam -case /mnt/NFS/users/quast/channel395-11 -parallel > /dev/null &

srun pimpleFoam -case /mnt/NFS/users/quast/channel395-12 -parallel > /dev/null &

srun pimpleFoam -case /mnt/NFS/users/quast/channel395-13 -parallel > /dev/null &

srun pimpleFoam -case /mnt/NFS/users/quast/channel395-14 -parallel > /dev/null &

srun pimpleFoam -case /mnt/NFS/users/quast/channel395-15 -parallel > /dev/null &

srun pimpleFoam -case /mnt/NFS/users/quast/channel395-16 -parallel > /dev/null &

srun pimpleFoam -case /mnt/NFS/users/quast/channel395-17 -parallel > /dev/null &

wait

 

Eight jobs with 9 threads are launched and distributed across the two nodes.

 

If more such scripts get started at the same time, all the srun commands
will be executed even though no free cores are available. So the nodes are
overallocated.

How can this be prevented?

 

Thx :)

 

Greetings 

max

 





[slurm-users] How does SLURM calculate StartTime for pending jobs

2020-10-10 Thread SJTU
Hi,

`scontrol show jobid xxx` shows SLURM's estimate of the StartTime for a
pending job. I wonder where I can find the code that implements StartTime.

Thank you!

Jianwen