Re: [slurm-users] Slurm configuration, Weight Parameter

2019-12-05 Thread Sarlo, Jeffrey S
We have weights and priority/multifactor.

Jeff
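
For reference, the two are independent settings: PriorityType=priority/multifactor orders pending jobs in the queue, while per-node Weight orders which nodes an allocation prefers, so both can appear in the same slurm.conf. A minimal sketch (node names taken from the test config quoted below):

```
# slurm.conf -- job priority and node selection are configured independently
PriorityType=priority/multifactor

# Lower Weight = preferred node
NodeName=devcn001 RealMemory=2000 Weight=1   Sockets=2 CoresPerSocket=1
NodeName=devcn002 RealMemory=3007 Weight=100 Sockets=2 CoresPerSocket=1
```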

From: Sistemas NLHPC [mailto:siste...@nlhpc.cl]
Sent: Thursday, December 05, 2019 12:01 PM
To: Sarlo, Jeffrey S; Slurm User Community List
Subject: Re: [slurm-users] Slurm configuration, Weight Parameter

Thanks, Jeff!

We upgraded Slurm to 18.08.4 and Weight now works! But is it possible to use the parameter together with the priority/multifactor plugin?

Thanks in advance

Regards

On Tue, Dec 3, 2019 at 17:37, Sarlo, Jeffrey S (jsa...@central.uh.edu) wrote:
Which version of Slurm are you using?  I know in the early versions of 18.08, 
prior to 18.08.04, there was a bug with weights not working.  Once we got past 
18.08.04, weights worked for us.

Jeff
University of Houston - HPC

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Sistemas NLHPC
Sent: Tuesday, December 03, 2019 12:33 PM
To: Slurm User Community List
Subject: Re: [slurm-users] Slurm configuration, Weight Parameter

Hi Renfro

I am testing this configuration, kept as minimal and clean as possible:



NodeName=devcn050 RealMemory=3007 Features=3007MB Weight=200 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn002 RealMemory=3007 Features=3007MB Weight=1 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn001 RealMemory=2000 Features=2000MB Weight=500 State=idle Sockets=2 CoresPerSocket=1

PartitionName=slims Nodes=devcn001,devcn002,devcn050 Default=yes Shared=yes State=up

===

In your config, is an extra plugin or parameter necessary for the Weight option?

The configuration does not work as expected.

Regards,

On Sat, Nov 30, 2019 at 10:30, Renfro, Michael (ren...@tntech.edu) wrote:
We’ve been using that weighting scheme for a year or so, and it works as 
expected. Not sure how Slurm would react to multiple NodeName=DEFAULT lines 
like you have, but here’s our node settings and a subset of our partition 
settings.

In our environment, we’d often have lots of idle cores on GPU nodes, since 
those jobs tend to be GPU-bound rather than CPU-bound. So in one of our 
interactive partitions, we let non-GPU jobs take up to 12 cores of a GPU node. 
Additionally, we have three memory configurations in our main batch partition. 
We want to bias jobs to running on the smaller-memory nodes by default. And the 
same principle applies to our GPU partition, where the smaller-memory GPU nodes 
get jobs before the larger-memory GPU node.

=

NodeName=gpunode[001-003] CoresPerSocket=14 RealMemory=382000 Sockets=2 ThreadsPerCore=1 Weight=10011 Gres=gpu:2
NodeName=gpunode004 CoresPerSocket=14 RealMemory=894000 Sockets=2 ThreadsPerCore=1 Weight=10021 Gres=gpu:2
NodeName=node[001-022] CoresPerSocket=14 RealMemory=62000 Sockets=2 ThreadsPerCore=1 Weight=10201
NodeName=node[023-034] CoresPerSocket=14 RealMemory=126000 Sockets=2 ThreadsPerCore=1 Weight=10211
NodeName=node[035-040] CoresPerSocket=14 RealMemory=254000 Sockets=2 ThreadsPerCore=1 Weight=10221

PartitionName=any-interactive Default=NO MinNodes=1 MaxNodes=4 MaxTime=02:00:00 AllowGroups=ALL PriorityJobFactor=3 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=12 ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040],gpunode[001-004]

PartitionName=batch Default=YES MinNodes=1 MaxNodes=40 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040]

PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]

=

> On Nov 29, 2019, at 8:09 AM, Sistemas NLHPC (siste...@nlhpc.cl) wrote:
>
> Hi All,
>
> Thanks all for your posts
>
> Reading the Slurm documentation and other sites like Niflheim 
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-weight (Ole Holm 
> Nielsen): the "Weight" parameter assigns a value to each node, and with this 
> you can control which nodes are preferred. But I have not obtained positive results.
>
> Thanks in advance
>
> Regards
>


Re: [slurm-users] Slurm configuration, Weight Parameter

2019-11-23 Thread Chris Samuel

On 23/11/19 9:14 am, Chris Samuel wrote:

My gut instinct (and I've never tried this) is to make the 3GB nodes be 
in a separate partition that is guarded by AllowQos=3GB and have a QOS 
called "3GB" that uses MinTRESPerJob to require jobs to ask for more 
than 2GB of RAM to be allowed into the QOS.


Of course there's nothing to stop a user requesting more memory than 
they need to get access to these nodes, but that's a social issue not a 
technical one. :-)


--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Slurm configuration, Weight Parameter

2019-11-23 Thread Chris Samuel

On 21/11/19 7:25 am, Sistemas NLHPC wrote:

Currently we have two types of nodes, one with 3GB and another with 2GB of 
RAM; on the 3GB nodes we need to disallow tasks requesting less than 2GB, 
to avoid underutilization of resources.


My gut instinct (and I've never tried this) is to make the 3GB nodes be 
in a separate partition that is guarded by AllowQos=3GB and have a QOS 
called "3GB" that uses MinTRESPerJob to require jobs to ask for more 
than 2GB of RAM to be allowed into the QOS.
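
Sketched out (untested, as noted above; exact syntax assumed from the sacctmgr and slurm.conf man pages, node names borrowed from earlier in the thread), that might look like:

```
# Define a QOS that only admits jobs requesting more than 2 GB of memory:
#   sacctmgr add qos 3GB
#   sacctmgr modify qos 3GB set MinTRESPerJob=mem=2049
#
# slurm.conf: guard the 3 GB nodes behind that QOS
PartitionName=bigmem Nodes=devcn002,devcn050 AllowQos=3GB State=UP
```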


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Slurm configuration, Weight Parameter

2019-11-22 Thread Goetz, Patrick G
Can't you just set the usage priority to be higher for the 2GB machines? 
This way, if the requested memory is less than 2GB, those machines will 
be used first, and larger jobs skip to the higher-memory machines.



Re: [slurm-users] Slurm configuration, Weight Parameter

2019-11-21 Thread Jim Prewett



Hi Sistemas,

I could be mistaken, but I don't think there is a way to require jobs on 
the 3GB nodes to request more than 2GB!


https://slurm.schedmd.com/slurm.conf.html states this: "Note that if a job 
allocation request can not be satisfied using the nodes with the lowest 
weight, the set of nodes with the next lowest weight is added to the set 
of nodes under consideration for use (repeat as needed for higher weight 
values)."


I read that to mean "if there are only 3GB nodes available, jobs will be 
run there regardless of the memory needed."  We had a similar request but 
were unable to find a solution (and ultimately the particular user is 
happier not to have idle machines when there's work to be done!).


If I'm misunderstanding, I'd love to know!

HTH,
Jim


James E. Prewett    j...@prewett.org    downl...@hpc.unm.edu
Systems Team Leader   LoGS: http://www.hpc.unm.edu/~download/LoGS/
Designated Security Officer OpenPGP key: pub 1024D/31816D93
HPC Systems Engineer III   UNM HPC  505.277.8210



[slurm-users] Slurm configuration, Weight Parameter

2019-11-21 Thread Sistemas NLHPC
Hi all,

Currently we have two types of nodes, one with 3GB and another with 2GB of
RAM. On the 3GB nodes we need to disallow tasks requesting less than 2GB,
to avoid underutilization of resources.

This is because we have other nodes that can satisfy tasks requesting 2GB
or less.

I tried the "Weight" option in the node configuration. I submit multiple
jobs, but Slurm does not assign them by "Weight"; nodes are picked in an
arbitrary order. Some configuration and logs:

slurm.conf

NodeName=DEFAULT RealMemory=3007 Features=3007MB Weight=500 State=idle
Sockets=2 CoresPerSocket=1
NodeName=devcn050

NodeName=DEFAULT RealMemory=3007 Features=3007MB Weight=100 State=idle
Sockets=2 CoresPerSocket=1
NodeName=devcn002

NodeName=DEFAULT RealMemory=2000 Features=2000MB Weight=1 State=idle
Sockets=2 CoresPerSocket=1
NodeName=devcn001

Extra information: I can see that Slurm assigns the Weight to each node.

# sinfo -N -l

NODELIST  NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
devcn001      1    slims*  idle    2 2:1:1   2000        0      1   2000MB   none
devcn002      1    slims*  idle    2 2:1:1   3007        0    100   3007MB   none
devcn050      1    slims*  idle    2 2:1:1   3007        0    500   3007MB   none

I tested other settings, such as the TRESWeights parameter, with no results,
for example:

NodeName=devcn001 TRESWeights="CPU=2.0,Mem=2000MB"

I also activated and deactivated the PriorityType=priority/multifactor
plugin to test, but in all these cases it does not work.

Thanks in advance.

Regards.