Re: [slurm-users] 19.05 and GPUs vs GRES

2019-09-05 Thread Christopher Samuel

On 9/5/19 3:49 PM, Bill Broadley wrote:


> I have a user with a particularly flexible code who would like to run a
> single MPI job across multiple nodes, some with 8 GPUs each, some with 2
> GPUs.


Perhaps they could just specify a number of tasks with CPUs per task,
memory per task and GPUs per task and let Slurm balance it out?
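Something along these lines (a sketch only, not tested: --gpus-per-task is
one of the new GPU options in 19.05, and since sbatch has no per-task memory
option, --mem-per-cpu stands in here; the counts are made up for
illustration):

    #!/bin/bash
    #SBATCH --ntasks=10           # total MPI ranks; let Slurm pick the nodes
    #SBATCH --cpus-per-task=4     # CPUs bound to each rank
    #SBATCH --mem-per-cpu=4G      # memory per allocated CPU
    #SBATCH --gpus-per-task=1     # one GPU bound to each rank (new in 19.05)

    srun ./mpi_app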


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] 19.05 and GPUs vs GRES

2019-09-05 Thread Christopher Samuel

On 8/13/19 10:44 PM, Barbara Krašovec wrote:


> We still have the gres configuration, users have their workload scripted
> and some still use sbatch with gres. Both options work.


I missed this before, Barbara, sorry - that's really good to know that
the options aren't mutually exclusive, thank you!
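So, for instance, both of these should request two GPUs on a node (a
sketch, assuming a gres.conf that defines the gpu resource and a
hypothetical job.sh; --gpus-per-node is one of the new GPU options in
19.05):

    sbatch --gres=gpu:2 job.sh        # traditional GRES syntax
    sbatch --gpus-per-node=2 job.sh   # new-style GPU option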


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] 19.05 and GPUs vs GRES

2019-09-05 Thread Bill Broadley
Anyone know if the new GPU support allows having a different number of
GPUs per node?

I found:
https://www.ch.cam.ac.uk/computing/slurm-usage

Which mentions "SLURM does not support having varying numbers of GPUs per
node in a job yet."

I have a user with a particularly flexible code who would like to run a
single MPI job across multiple nodes, some with 8 GPUs each, some with 2
GPUs.
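One avenue that might be worth testing (an untested sketch: heterogeneous
jobs have been in Slurm since 17.11, with "packjob" as the component
separator in 19.05; whether a single MPI_COMM_WORLD can span the components
depends on the MPI stack and how Slurm was built) is to request each node
type as a separate component of a heterogeneous job:

    #!/bin/bash
    #SBATCH --nodes=2 --gres=gpu:8    # component 0: two 8-GPU nodes
    #SBATCH packjob
    #SBATCH --nodes=4 --gres=gpu:2    # component 1: four 2-GPU nodes

    srun --pack-group=0,1 ./mpi_app   # one step spanning both components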





Re: [slurm-users] slurm node weights

2019-09-05 Thread Marcus Boden
Hello Doug,

To quote the slurm.conf man page:

It would be preferable to allocate smaller memory nodes rather than
larger memory nodes if either will satisfy a job's requirements.

So I guess the idea is that if a smaller node satisfies all the
requirements, why 'waste' a bigger one on the job? It makes sense for
memory, though I agree that it is counterintuitive for processors.
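As a sketch in slurm.conf (hypothetical node names and sizes):

    NodeName=small[01-10] RealMemory=64000  Weight=10
    NodeName=big[01-04]   RealMemory=512000 Weight=100

A job that fits in 64 GB then lands on a small node first, keeping the
large-memory nodes free for the jobs that actually need them.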

Best,
Marcus

On 19-09-05 15:48, Douglas Duckworth wrote:
> [...]
> 
> "All things being equal, jobs will be allocated the nodes with the lowest
> weight which satisfies their requirements."
> 
> "...larger weights should be assigned to nodes with more processors, memory,
> disk space, higher processor speed, etc."
> 
> [...]

-- 
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe eScience
Tel.:   +49 (0)551 201-2191
E-Mail: mbo...@gwdg.de
---
Gesellschaft fuer wissenschaftliche
Datenverarbeitung mbH Goettingen (GWDG)
Am Fassberg 11, 37077 Goettingen
URL:    http://www.gwdg.de
E-Mail: g...@gwdg.de
Tel.:   +49 (0)551 201-1510
Fax:    +49 (0)551 201-2150
Geschaeftsfuehrer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender:
Prof. Dr. Christian Griesinger
Sitz der Gesellschaft: Goettingen
Registergericht: Goettingen
Handelsregister-Nr. B 598
---




Re: [slurm-users] slurm node weights

2019-09-05 Thread Merlin Hartley
I believe this is so that small jobs will naturally go on older, slower
nodes first, leaving the bigger, better ones for jobs that actually need
them.


Merlin
--
Merlin Hartley
IT Support Engineer
MRC Mitochondrial Biology Unit
University of Cambridge
Cambridge, CB2 0XY
United Kingdom

> On 5 Sep 2019, at 16:48, Douglas Duckworth  wrote:
> [...]



Re: [slurm-users] slurm node weights

2019-09-05 Thread Brian Andrus
The intention there is to pack jobs onto the smallest node that can handle
the job.

This way a job that only needs 1 CPU won't take it from a 64-core node
unless it has to, leaving that node available for a 64-core job.
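For instance, weights could scale with core count (hypothetical node names
and sizes):

    NodeName=n[001-016] CPUs=16 Weight=16   # small nodes, tried first
    NodeName=n[017-020] CPUs=64 Weight=64   # big nodes, kept for big jobs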



It really boils down to what you want to happen, which will vary with 
each installation.



Brian Andrus


On 9/5/2019 8:48 AM, Douglas Duckworth wrote:

> [...]



[slurm-users] slurm node weights

2019-09-05 Thread Douglas Duckworth
Hello

We added some newer Epyc nodes, with NVMe scratch, to our cluster and so want 
jobs to run on these over others.  So we added "Weight=100" to the older nodes 
and left the new ones blank.  So indeed, ceteris paribus, srun reveals that the 
faster nodes will accept jobs over older ones.
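In slurm.conf terms that looks roughly like this (hypothetical node names;
a node with no Weight set gets the default of 1):

    NodeName=node[001-020] Weight=100   # older nodes, tried last
    NodeName=epyc[01-08]                # default Weight=1, preferred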

We have the desired outcome, though I am a bit confused by two statements
in the manpage that seem to be contradictory:

"All things being equal, jobs will be allocated the nodes with the lowest 
weight which satisfies their requirements."

"...larger weights should be assigned to nodes with more processors, memory, 
disk space, higher processor speed, etc."

100 is larger than 1, and we do see jobs preferring the new nodes, which
have the default weight of 1.  Yet we're also told to assign larger weights
to faster nodes?

Thanks!
Doug


--
Thanks,

Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
E: d...@med.cornell.edu
O: 212-746-6305
F: 212-746-8690