Re: [slurm-users] Question about determining pre-empted jobs

2020-02-29 Thread Chris Samuel

On 28/2/20 9:53 am, Jeffrey R. Lang wrote:

We have had a request to generate a report showing the number of 
pre-empted jobs by date. We used sacct to try to gather the data, 
but we only found a few jobs with the state “PREEMPTED”.


If preempted jobs are configured to be requeued, you will likely need 
the --duplicates option to sacct to see the earlier iterations of a 
job, i.e. the records from before it was preempted and requeued.


Best of luck!
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Setup for backup slurmctld

2020-02-29 Thread Chris Samuel
On Wednesday, 26 February 2020 12:48:26 PM PST Joshua Baker-LePain wrote:

> We're planning the migration of our moderately sized cluster (~400 nodes,
> 40K jobs/day) from SGE to slurm.  We'd very much like to have a backup
> slurmctld, and it'd be even better if our backup slurmctld could be in a
> separate data center from the primary (though they'd still be on the same
> private network).  So, how are folks sharing the StateSaveLocation in such
> a setup?  Any and all recommendations (including those with the 2
> slurmctld servers in the same rack) welcome.  Thanks!

We use GPFS for our shared state directory (Cori is 12K nodes and we put 
5K-30K jobs a day through it, with a very variable job mix). The important 
thing is the IOPS rate of the filesystem; if it can't keep up with Slurm 
then you're going to see performance issues.
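
For reference, a minimal sketch of the relevant slurm.conf settings for a 
primary/backup pair sharing state (hostnames and path are hypothetical):

    # slurm.conf sketch (hostnames and path are hypothetical)
    SlurmctldHost=ctld-primary           # first entry is the primary
    SlurmctldHost=ctld-backup            # takes over if the primary fails
    StateSaveLocation=/gpfs/slurm/state  # must be reachable from both hosts
    SlurmctldTimeout=120                 # secs before the backup takes control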

Tim from SchedMD had some notes on HA (and other things) from the 2017 Slurm 
User Group:  https://slurm.schedmd.com/SLUG17/FieldNotes.pdf

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Slurm 19.05 X11-forwarding

2020-02-29 Thread Christopher Samuel

On 2/28/20 8:56 PM, Pär Lundö wrote:

I thought that I could run the srun command with X11 forwarding from 
within an sbatch job-array script and have the X11 forwarding reach my display.


No, I believe X11 forwarding can only work when you run "srun --x11" 
directly on a login node, not from inside a batch script.


(You should not need to be logged into a compute node either.)

See:

https://slurm.schedmd.com/faq.html#x11
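
In practice the interactive pattern looks something like this (assuming X11 
support is enabled on the cluster and you reached the login node with SSH 
X forwarding):

    # From a login node, after e.g. "ssh -X login.example.org":
    srun --x11 xterm    # display is forwarded back from the allocated node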

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Hybrid compiling options

2020-02-29 Thread dean.w.schulze
There are GPU plugins that won't be built unless you build on a node that has 
the Nvidia drivers installed.
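
A rough sketch of that build step (version number illustrative; RPM-based, 
per the original post below):

    # Build on a node that has the NVIDIA driver and NVML development
    # files installed, so the NVML-based GPU plugin is detected and
    # packaged along with everything else.
    rpmbuild -ta slurm-19.05.5.tar.bz2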

-Original Message-
From: slurm-users On Behalf Of Brian Andrus
Sent: Friday, February 28, 2020 7:36 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Hybrid compiling options

All,

Wanted to reach out for input on how folks compile slurm when you have a hybrid 
cluster.

Scenario:

you have 4 node types:

A) CPU only
B) GPU Only
C) CPU+IB
D) GPU+IB

So, you can compile slurm with/without IB support and/or with/without GPU 
support.
Including either option creates a dependency when packaging (RPM based).
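
For instance, the extra requirements show up directly in the package 
metadata (file name illustrative):

    # Illustrative: list what a built slurm RPM declares as dependencies,
    # e.g. the NVIDIA libraries that GPU-less nodes would still need.
    rpm -qp --requires slurm-19.05.5-1.el7.x86_64.rpm | grep -i nvidia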

So, do you compile different versions for the different node types, or install 
the dependent packages on nodes that have no use for them (nvidia in particular 
here)?

Generally, I have always added the superfluous packages, but wondered what 
others' thoughts on that are.

Brian Andrus