Re: [slurm-users] How to avoid a feature?

2021-07-06 Thread Relu Patrascu
We have had a similar problem: even with separate partitions for CPU 
and GPU nodes, people still submitted jobs to the GPU nodes, and we 
suspected they were running CPU-type jobs there. It doesn't help to 
look for a missing --gres=gpu:x, because a user can ask for GPUs and 
simply not use them. We thought about checking GPU usage, but that 
isn't ideal either, in part because collecting real GPU usage gets 
pretty messy (we did it for a while using NVIDIA's API), and in part 
because there are legitimate jobs which need a GPU but not intensively 
(e.g. some reinforcement learning experiments).


The main currency on our cluster is the fairshare score. We do not use 
shares as credit points, but rather as a resource that gets eroded in 
proportion to resource consumption. We assigned TRES billing weights 
on the GPU nodes such that allocating one GPU on a four-GPU node 
automatically charges you max(N/4, M/4, G/4), where N, M, and G are 
the node's cores, memory, and number of GPUs; in other words, you are 
billed for the largest fraction of any single resource you allocate. 
To make this work we also set PriorityFlags=MAX_TRES in slurm.conf.
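
As a rough illustration only (the node shape, node names, and weights 
here are made up, not our actual configuration), a 64-core, 512 GB, 
four-GPU node could be weighted so that a full node's worth of any one 
resource bills as 1.0:

   # slurm.conf (illustrative sketch)
   # Bill each job by the single TRES it consumes the largest share of.
   PriorityFlags=MAX_TRES
   # 64 cores, 512 GB RAM, 4 GPUs: weight each TRES so that the whole
   # node of any one resource bills as 1.0 (so one GPU bills as 0.25).
   PartitionName=gpu Nodes=gpu[01-08] TRESBillingWeights="CPU=0.015625,Mem=0.001953G,GRES/gpu=0.25"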


Now we don't have to worry about someone taking all the RAM and just 
one CPU and one GPU on a node: they "pay" for whichever resource they 
consume the most of. We did have a problem where someone would 
allocate just one GPU, a few CPU cores, and almost all the RAM, 
effectively rendering the node useless to others. Now they pay for 
almost the entire node if they do that, which is the fairest charge, 
because nobody else can use the node.
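
To make that concrete with the illustrative weights above (again, 
made-up numbers): a request for 1 GPU, 4 cores, and 480 GB of the 
512 GB bills max(4/64, 480/512, 1/4) = max(0.0625, 0.9375, 0.25) 
= 0.9375 of the node, so hoarding the memory costs nearly as much as 
taking the whole node.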


This also works for us because we use preemption across the cluster 
(with a one-hour exemption) and jobs get preempted based on job 
priority. The more resources anyone consumes, the lower their 
fairshare score, which results in lower job priorities.


Relu



On 2021-07-01 13:21, Tina Friedrich wrote:

Hi Brian,

sometimes it would be nice if SLURM had what Grid Engine calls a 
'forced complex' (i.e. a feature that you *have* to request to land on 
a node that has it), wouldn't it?


I do something like that for all of my 'special' nodes (GPU, KNL, 
...): I want to stop jobs that don't request that resource, or that 
shouldn't land on that architecture, from ending up on them. I 'tag' 
all nodes with a relevant feature (cpu, gpu, knl, ...) and have a Lua 
submit verifier that checks for a 'relevant' feature (or a --gres=gpu 
or something); if there isn't one, I add the 'cpu' feature to the 
request, as in the sketch below.
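
A minimal job_submit.lua sketch of that idea might look like the 
following. It assumes the feature names 'cpu'/'gpu' and that GPU 
requests appear in job_desc.tres_per_node (older Slurm releases expose 
this as job_desc.gres), so adjust for your version:

   -- job_submit.lua (sketch): force a default 'cpu' constraint onto jobs
   -- that request neither a GPU gres nor any explicit feature.
   function slurm_job_submit(job_desc, part_list, submit_uid)
      local features = job_desc.features or ""
      -- GPU requests typically show up here as e.g. "gres:gpu:2".
      local tres = job_desc.tres_per_node or ""
      if features ~= "" or tres:match("gpu") then
         -- The user asked for a GPU or named a constraint; leave it alone.
         return slurm.SUCCESS
      end
      job_desc.features = "cpu"
      slurm.log_info("job_submit: added default constraint 'cpu' for uid %u",
                     submit_uid)
      return slurm.SUCCESS
   end

   function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
      return slurm.SUCCESS
   end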


Works for us!

Tina

On 01/07/2021 15:08, Brian Andrus wrote:

All,

I have a partition where one of the nodes has a node-locked license.
That license is not used by everyone that uses the partition.
They are cloud nodes, so weights do not work (there is an open bug 
about that).


I need to have jobs 'avoid' that node by default. I am thinking I can 
use a feature constraint, but that seems to apply only to jobs that 
ask for the feature. Since we have so many other users, it isn't 
feasible to have them all modify their scripts, so having jobs avoid 
the node by default would work.


Any ideas on how to do that? Submit Lua, perhaps?

Brian Andrus








Re: [slurm-users] MinJobAge

2021-07-06 Thread Emre Brookes

Ward Poelmans wrote:

On 6/07/2021 14:59, Emre Brookes wrote:

I'm using slurm 20.02.7 & have the same issue (except I am running batch jobs).
Does MinJobAge work to keep completed jobs around for the specified duration in 
squeue output?

It does for me if I do 'squeue -t all'. This is slurm 20.11.7.

Ward


Ah! That was my issue: I didn't supply the '-t all' argument, so 
squeue was ignoring jobs in the CD (completed) state.


Thanks for your help!
Emre




Re: [slurm-users] MinJobAge

2021-07-06 Thread Ward Poelmans
On 6/07/2021 14:59, Emre Brookes wrote:
> I'm using slurm 20.02.7 & have the same issue (except I am running batch 
> jobs).
> Does MinJobAge work to keep completed jobs around for the specified duration 
> in squeue output?

It does for me if I do 'squeue -t all'. This is slurm 20.11.7.

Ward



Re: [slurm-users] MinJobAge

2021-07-06 Thread Paul Edmon

The documentation indicates that's what should happen with MinJobAge:

*MinJobAge*
   The minimum age of a completed job before its record is purged from
   Slurm's active database. Set the values of *MaxJobCount* and
   *MinJobAge* to ensure the slurmctld daemon does not exhaust its
   memory or other
   resources. The default value is 300 seconds. A value of zero
   prevents any job record purging. Jobs are not purged during a
   backfill cycle, so it can take longer than MinJobAge seconds to
   purge a job if using the backfill scheduling plugin. In order to
   eliminate some possible race conditions, the minimum non-zero value
   for *MinJobAge* recommended is 2. 

In my experience this does work. We've been running with 
MinJobAge=600 for years without any problems, to my knowledge.
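
For reference, the relevant slurm.conf lines would look something like 
this (MaxJobCount=10000 is just the default, shown here only as a 
placeholder):

   # slurm.conf (sketch): keep completed job records around for 10
   # minutes before slurmctld purges them from its active database.
   MinJobAge=600
   # Upper bound on the number of job records slurmctld keeps in its
   # active database (10000 is the default).
   MaxJobCount=10000

With that in place, 'squeue -t all' (or 'squeue --states=all') will 
show completed jobs until their records are purged.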


-Paul Edmon-

On 7/6/2021 8:59 AM, Emre Brookes wrote:


  Brian Andrus

Nov 23, 2020, 1:55:54 PM
to slurm...@lists.schedmd.com
All,

I always thought that MinJobAge affected how long a job will show up
when doing 'squeue'.

That does not seem to be the case for me.

I have MinJobAge=900, but if I do 'squeue --me' as soon as I finish an
interactive job, there is nothing in the queue.

I swear I used to see jobs in a completed state for a period of time,
but they are not showing up at all on our cluster.


How does one have jobs show up that are completed?
I'm using slurm 20.02.7 & have the same issue (except I am running 
batch jobs).
Does MinJobAge work to keep completed jobs around for the specified 
duration in squeue output?


Thanks,
Emre




[slurm-users] MinJobAge

2021-07-06 Thread Emre Brookes


  Brian Andrus

Nov 23, 2020, 1:55:54 PM
to slurm...@lists.schedmd.com
All,

I always thought that MinJobAge affected how long a job will show up
when doing 'squeue'.

That does not seem to be the case for me.

I have MinJobAge=900, but if I do 'squeue --me' as soon as I finish an
interactive job, there is nothing in the queue.

I swear I used to see jobs in a completed state for a period of time,
but they are not showing up at all on our cluster.


How does one have jobs show up that are completed?
I'm using slurm 20.02.7 & have the same issue (except I am running batch 
jobs).
Does MinJobAge work to keep completed jobs around for the specified 
duration in squeue output?


Thanks,
Emre