Re: [slurm-users] Question about determining pre-empted jobs
On 28/2/20 9:53 am, Jeffrey R. Lang wrote:

> We have had a request to generate a report showing the number of jobs by date, including pre-empted jobs. We used sacct to try to gather the data, but we only found a few jobs with the state "PREEMPTED".

It might be that, if jobs are set to be requeued, you'll need to use the --duplicates option to sacct to see the earlier iterations of a job from when it was preempted.

Best of luck!

Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
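For anyone trying the same report, a hedged sketch of the sacct invocation Chris describes (the date range and format fields here are illustrative assumptions, not from the original post):

```shell
# Sketch: list preempted job records for February, including the
# requeued iterations that --duplicates exposes.
# -X limits output to allocations; -n suppresses the header.
sacct --allusers --duplicates \
      --state=PREEMPTED \
      --starttime=2020-02-01 --endtime=2020-03-01 \
      --format=JobID,State,Start,End -X -n
```

The per-day counts could then be produced by grouping the Start column, e.g. with a short awk or sort|uniq pipeline.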
Re: [slurm-users] Setup for backup slurmctld
On Wednesday, 26 February 2020 12:48:26 PM PST Joshua Baker-LePain wrote:

> We're planning the migration of our moderately sized cluster (~400 nodes,
> 40K jobs/day) from SGE to Slurm. We'd very much like to have a backup
> slurmctld, and it'd be even better if our backup slurmctld could be in a
> separate data center from the primary (though they'd still be on the same
> private network). So, how are folks sharing the StateSaveLocation in such
> a setup? Any and all recommendations (including those with the 2
> slurmctld servers in the same rack) welcome. Thanks!

We use GPFS for our shared state directory (Cori is 12K nodes, and we put 5K-30K jobs a day through it with a very variable job mix). The important thing is the IOPS rate of the filesystem; if it can't keep up with Slurm, then you're going to see performance issues.

Tim from SchedMD had some notes on HA (and other things) from the Slurm 2017 user group: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
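A minimal sketch of the slurm.conf lines such a primary/backup setup involves; hostnames, the state path, and the timeout value here are placeholders, not from Cori's configuration:

```
# Hypothetical slurm.conf fragment for a controller pair.
# Both hosts must mount the same StateSaveLocation.
SlurmctldHost=ctld-primary            # primary controller
SlurmctldHost=ctld-backup             # backup; takes over on primary failure
StateSaveLocation=/gpfs/slurm/state   # shared filesystem with good IOPS
SlurmctldTimeout=120                  # seconds before the backup assumes control
```

The second SlurmctldHost entry is what designates the backup controller; the shared StateSaveLocation is what lets it pick up the primary's job state.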
Re: [slurm-users] Slurm 19.05 X11-forwarding
On 2/28/20 8:56 PM, Pär Lundö wrote:

> I thought that I could run the srun command with X11 forwarding from an sbatch job-array script and get the X11 forwarding to my display.

No, I believe X11 forwarding only works when you run "srun --x11" directly on a login node, not from inside a batch script. (You should not need to be logged into a compute node either.)

See: https://slurm.schedmd.com/faq.html#x11

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
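For reference, a sketch of the working pattern Chris describes; xterm and the resource options are just illustrative:

```shell
# Run this from a login node with a working X11 session
# (e.g. after connecting with "ssh -X"); the X11 display is
# forwarded from the allocated compute node back to you.
srun --x11 --ntasks=1 --time=10:00 xterm
```

The same srun placed inside an sbatch script has no interactive X11 session to forward to, which is why the batch-script variant fails.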
Re: [slurm-users] Hybrid compiling options
There are GPU plugins that won't be built unless you build on a node that has the NVIDIA drivers installed.

-----Original Message-----
From: slurm-users On Behalf Of Brian Andrus
Sent: Friday, February 28, 2020 7:36 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Hybrid compiling options

All,

Wanted to reach out for input on how folks compile Slurm when you have a hybrid cluster.

Scenario: you have 4 node types:

A) CPU only
B) GPU only
C) CPU+IB
D) GPU+IB

So, you can compile Slurm with/without IB support and/or with/without GPU support. Including either option creates a dependency when packaging (RPM based).

So, do you compile different versions for the different node types, or install the dependent packages on nodes that have no use for them (NVIDIA in particular here)?

Generally, I have always added the superfluous packages, but wondered what the thoughts on that are.

Brian Andrus
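For context, the RPM route under discussion is typically a single build of the release tarball on a representative node; a sketch, with the tarball version as a placeholder:

```shell
# Build Slurm RPMs directly from the release tarball.
# Optional features (e.g. NVML-based GPU support) are picked up at
# configure time from what the build host has installed, which is why
# building on a driver-equipped node matters for the GPU plugins.
rpmbuild -ta slurm-19.05.5.tar.bz2
```

Building once on a GPU+IB node and installing the resulting packages everywhere corresponds to Brian's "superfluous packages" approach; the alternative is separate builds per node type.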