Re: [slurm-users] trying to add gres
Important notes:

If you request more than one core and do not use "-N 1", equal numbers of GPUs will be allocated on each node where the cores are allocated. For example, if a 2-core job requests 1 GPU and one core is allocated on each of two nodes, one GPU will be allocated on each node.

If you are running node-exclusive, all GPUs on the node will be allocated to the job, regardless of how many are used.

On Tue, Jan 5, 2021 at 7:30 PM Erik Bryer wrote:

> I made the gres.conf the same on both nodes and Slurm started without
> error. I'm now seeing another error.
>
> There are 4 GPUs defined per node. If I start 2 jobs with
> #SBATCH --gpus=foolsgold:4
> it runs one job in each of the 2 nodes. If I scancel those and run 4 jobs
> with the script reading
> #SBATCH --gpus=foolsgold:1
> I get 2 queued and 2 running jobs. It seems allocating 1 gpu allocates all
> 4, not just 1. But why would this be so?
>
> Thanks,
> Erik
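To make the "-N 1" note in this reply concrete, here is a minimal batch script sketch. It assumes the "foolsgold" GPU type from the thread; the job name and the report_allocation helper are hypothetical, and the script only prints what Slurm reports for the job rather than doing any real work.

```shell
#!/bin/bash
# Hypothetical job name, for illustration only.
#SBATCH --job-name=gres-test
# Keep all cores on a single node so the GPU request is not repeated per node.
#SBATCH -N 1
# One task.
#SBATCH -n 1
# One GPU of the "foolsgold" type defined in the thread.
#SBATCH --gpus=foolsgold:1

# Print the node list and GPU IDs Slurm reports for this job.
# Outside a Slurm job these variables are unset, so "unset" is printed.
report_allocation() {
  echo "nodes=${SLURM_JOB_NODELIST:-unset} gpus=${SLURM_JOB_GPUS:-unset}"
}

report_allocation
```

If this is saved as, say, gres-test.sh and submitted with sbatch, comparing its output with "scontrol show job <jobid>" should show whether one GPU or all four were actually allocated.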
Re: [slurm-users] trying to add gres
I made the gres.conf the same on both nodes and Slurm started without error. I'm now seeing another error.

There are 4 GPUs defined per node. If I start 2 jobs with
#SBATCH --gpus=foolsgold:4
it runs one job in each of the 2 nodes. If I scancel those and run 4 jobs with the script reading
#SBATCH --gpus=foolsgold:1
I get 2 queued and 2 running jobs. It seems allocating 1 gpu allocates all 4, not just 1. But why would this be so?

Thanks,
Erik

From: slurm-users on behalf of Chris Samuel
Sent: Thursday, December 24, 2020 5:44 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] trying to add gres

On 24/12/20 4:42 pm, Erik Bryer wrote:

> I made sure my slurm.conf is synchronized across machines. My intention
> is to add some arbitrary gres for testing purposes.

Did you update your gres.conf on all the nodes to match?

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] trying to add gres
On 24/12/20 4:42 pm, Erik Bryer wrote:

> I made sure my slurm.conf is synchronized across machines. My intention
> is to add some arbitrary gres for testing purposes.

Did you update your gres.conf on all the nodes to match?

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
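One way to check whether gres.conf really matches on every node is to compare checksums. The sketch below is only an illustration: the gres_conf_mismatch helper is hypothetical, and the commented usage assumes passwordless ssh to the nodes and that the file lives at /etc/slurm/gres.conf, neither of which is stated in the thread.

```shell
#!/bin/bash
# Hypothetical helper: reads lines of "<node> <checksum>" on stdin and
# prints every node whose checksum differs from the first line's checksum.
gres_conf_mismatch() {
  awk 'NR == 1 { ref = $2; next } $2 != ref { print $1 }'
}

# Example usage (assumes ssh access and the /etc/slurm/gres.conf path):
#   for n in saga-test02 saga-test03; do
#     printf '%s %s\n' "$n" "$(ssh "$n" sha256sum /etc/slurm/gres.conf | cut -d' ' -f1)"
#   done | gres_conf_mismatch
```

No output means every node reported the same checksum as the first one listed.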
[slurm-users] trying to add gres
Hello List,

I am trying to change:

NodeName=saga-test02 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN

to

NodeName=saga-test02 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN Gres=gpu:foolsgold:4

But I get this error once per second:

Dec 24 16:08:32 saga-test03 slurmctld[115409]: error: _slurm_rpc_node_registration node=saga-test02: Invalid argument

I made sure my slurm.conf is synchronized across machines. My intention is to add some arbitrary gres for testing purposes.

Thanks,
Erik