Re: [slurm-users] trying to add gres

2021-01-05 Thread Fulcomer, Samuel
Important notes...

If you request more than one core without "-N 1", the same number of GPUs
is allocated on each node where cores are allocated. For example, if a
2-core job requests 1 GPU and one core is allocated on each of two nodes,
one GPU is allocated on each node.

If the job runs node-exclusive, all GPUs on the node are allocated to the
job, regardless of how many it uses.
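
To make the first note concrete, here is a minimal batch script sketch that pins a 2-task job to a single node so the GPU request cannot be spread across nodes. The GPU type foolsgold comes from this thread; the task count and the echo lines are only illustrative, and CUDA_VISIBLE_DEVICES may well be unset for a placeholder GRES that has no real device files.

```shell
#!/bin/bash
#SBATCH -N 1                  # keep the whole job on one node
#SBATCH -n 2                  # two tasks (the 2-core case from the note above)
#SBATCH --gpus=foolsgold:1    # one GPU of type foolsgold for the job

# Report where the job landed and what was exposed to it.
# The fallbacks let the script run outside Slurm as well.
nodes="${SLURM_JOB_NODELIST:-not running under Slurm}"
gpus="${CUDA_VISIBLE_DEVICES:-unset}"
echo "nodes: ${nodes}"
echo "CUDA_VISIBLE_DEVICES: ${gpus}"
```

Submitting the same script without the "-N 1" line is a quick way to compare how the allocation differs on a given cluster.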








Re: [slurm-users] trying to add gres

2021-01-05 Thread Erik Bryer
I made the gres.conf the same on both nodes and Slurm started without error. 
I'm now seeing another error.

There are 4 GPUs defined per node. If I start 2 jobs with
#SBATCH --gpus=foolsgold:4
it runs one job on each of the 2 nodes. If I scancel those and run 4 jobs
with the script reading
#SBATCH --gpus=foolsgold:1
I get 2 queued and 2 running jobs. It seems that allocating 1 GPU allocates
all 4, not just 1. Why would this be so?
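
One way to narrow this down is to ask the scheduler why the queued jobs are pending and what the node reports as allocated. The sketch below uses only standard client commands; the node name is taken from this thread, and the guard is there so the script does nothing harmful on a machine without Slurm installed.

```shell
#!/bin/bash
# Node name from this thread; adjust for your cluster.
node="saga-test02"

if command -v squeue >/dev/null 2>&1; then
    # Job ID, state, and the scheduler's stated reason for each pending job.
    squeue --states=PENDING --format="%i %T %r"
    # Configured versus currently allocated trackable resources on the node.
    scontrol show node "$node" | grep -E 'CfgTRES|AllocTRES'
else
    echo "Slurm client commands not found; run this on a cluster node."
fi
```

The pending reason and the AllocTRES line together should show whether the jobs are waiting on GPUs, CPUs, memory, or something else.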

Thanks,
Erik




Re: [slurm-users] trying to add gres

2020-12-24 Thread Chris Samuel

On 24/12/20 4:42 pm, Erik Bryer wrote:

> I made sure my slurm.conf is synchronized across machines. My intention
> is to add some arbitrary gres for testing purposes.


Did you update your gres.conf on all the nodes to match?
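
A quick way to see what the controller currently believes about the node's GRES is sketched below. The node name is from the original post; the commands are standard Slurm client tools, and the check of GresTypes is included on the assumption that slurm.conf needs the gpu type listed there for a Gres=gpu:... node entry to be accepted.

```shell
#!/bin/bash
# Node name from the original post; adjust for your cluster.
node="saga-test02"

if command -v scontrol >/dev/null 2>&1; then
    # GRES types the controller has been told to manage.
    scontrol show config | grep -i 'GresTypes'
    # GRES the controller has recorded for this node.
    scontrol show node "$node" | grep -i 'Gres='
    # One line per node with its generic resources.
    sinfo --Node --format="%N %G"
else
    echo "Slurm client commands not found; run this on a cluster node."
fi
```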

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



[slurm-users] trying to add gres

2020-12-24 Thread Erik Bryer
Hello List,

I am trying to change:
NodeName=saga-test02 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN
to
NodeName=saga-test02 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN Gres=gpu:foolsgold:4
But I get this error once per second:
Dec 24 16:08:32 saga-test03 slurmctld[115409]: error: _slurm_rpc_node_registration node=saga-test02: Invalid argument

I made sure my slurm.conf is synchronized across machines. My intention is to 
add some arbitrary gres for testing purposes.

Thanks,
Erik