Re: [slurm-users] Slurm configuration, Weight Parameter
We have weights and priority/multifactor.

Jeff

From: Sistemas NLHPC (siste...@nlhpc.cl)
Sent: Thursday, December 05, 2019 12:01 PM
To: Sarlo, Jeffrey S; Slurm User Community List
Subject: Re: [slurm-users] Slurm configuration, Weight Parameter

Thanks Jeff! We upgraded Slurm to 18.08.4 and Weight works now. But is it
possible to use this parameter together with the priority/multifactor
plugin?

Thanks in advance

Regards

On Tue, Dec 3, 2019 at 17:37, Sarlo, Jeffrey S (jsa...@central.uh.edu) wrote:

> Which version of Slurm are you using? I know the early versions of 18.08,
> prior to 18.08.04, had a bug with weights not working. Once we got past
> 18.08.04, weights worked for us.
>
> Jeff
>
> University of Houston - HPC
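For reference, a minimal slurm.conf sketch of the two together (values
illustrative, untested): node Weight ranks candidate nodes for an
allocation (lowest weight first), while priority/multifactor ranks pending
jobs, so the two settings operate independently and can coexist.

===

# Job ordering: multifactor priority (these weights rank jobs in the queue)
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=10000

# Node ordering: nodes with the lowest Weight are allocated first
NodeName=devcn001 RealMemory=2000 Weight=1 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn002 RealMemory=3007 Weight=100 State=idle Sockets=2 CoresPerSocket=1

===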
Re: [slurm-users] Slurm configuration, Weight Parameter
Thanks Jeff! We upgraded Slurm to 18.08.4 and Weight works now. But is it
possible to use this parameter together with the priority/multifactor
plugin?

Thanks in advance

Regards

On Tue, Dec 3, 2019 at 17:37, Sarlo, Jeffrey S () wrote:

> Which version of Slurm are you using? I know the early versions of 18.08,
> prior to 18.08.04, had a bug with weights not working. Once we got past
> 18.08.04, weights worked for us.
>
> Jeff
>
> University of Houston - HPC
Re: [slurm-users] Slurm configuration, Weight Parameter
Hi Renfro,

I am testing this configuration, kept as minimal and clean as possible:

NodeName=devcn050 RealMemory=3007 Features=3007MB Weight=200 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn002 RealMemory=3007 Features=3007MB Weight=1 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn001 RealMemory=2000 Features=2000MB Weight=500 State=idle Sockets=2 CoresPerSocket=1

PartitionName=slims Nodes=devcn001,devcn002,devcn050 Default=yes Shared=yes State=up

===

Is an extra plugin or parameter necessary in your config for the Weight
option? The configuration does not work as expected.

Regards,

On Sat, Nov 30, 2019 at 10:30, Renfro, Michael () wrote:

> We’ve been using that weighting scheme for a year or so, and it works as
> expected. Not sure how Slurm would react to multiple NodeName=DEFAULT
> lines like you have, but here’s our node settings and a subset of our
> partition settings.
Re: [slurm-users] Slurm configuration, Weight Parameter
We’ve been using that weighting scheme for a year or so, and it works as
expected. Not sure how Slurm would react to multiple NodeName=DEFAULT lines
like you have, but here’s our node settings and a subset of our partition
settings.

In our environment, we’d often have lots of idle cores on GPU nodes, since
those jobs tend to be GPU-bound rather than CPU-bound. So in one of our
interactive partitions, we let non-GPU jobs take up to 12 cores of a GPU
node. Additionally, we have three memory configurations in our main batch
partition. We want to bias jobs to running on the smaller-memory nodes by
default. And the same principle applies to our GPU partition, where the
smaller-memory GPU nodes get jobs before the larger-memory GPU node.

=

NodeName=gpunode[001-003] CoresPerSocket=14 RealMemory=382000 Sockets=2 ThreadsPerCore=1 Weight=10011 Gres=gpu:2
NodeName=gpunode004 CoresPerSocket=14 RealMemory=894000 Sockets=2 ThreadsPerCore=1 Weight=10021 Gres=gpu:2
NodeName=node[001-022] CoresPerSocket=14 RealMemory=62000 Sockets=2 ThreadsPerCore=1 Weight=10201
NodeName=node[023-034] CoresPerSocket=14 RealMemory=126000 Sockets=2 ThreadsPerCore=1 Weight=10211
NodeName=node[035-040] CoresPerSocket=14 RealMemory=254000 Sockets=2 ThreadsPerCore=1 Weight=10221

PartitionName=any-interactive Default=NO MinNodes=1 MaxNodes=4 MaxTime=02:00:00 AllowGroups=ALL PriorityJobFactor=3 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=12 ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040],gpunode[001-004]

PartitionName=batch Default=YES MinNodes=1 MaxNodes=40 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-040]

PartitionName=gpu Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=30-00:00:00 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=2000 AllowAccounts=ALL AllowQos=ALL LLN=NO MaxCPUsPerNode=16 QoS=gpu ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpunode[001-004]

=

> On Nov 29, 2019, at 8:09 AM, Sistemas NLHPC wrote:
>
> Reading the documentation of Slurm and other sites like Niflheim
> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-weight
> (Ole Holm Nielsen), the "Weight" parameter assigns a value to the nodes,
> giving them a scheduling priority. But I have not obtained positive
> results.
Re: [slurm-users] Slurm configuration, Weight Parameter
Hi All,

Thanks all for your posts.

Reading the documentation of Slurm and other sites like Niflheim
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#node-weight (Ole Holm
Nielsen), the "Weight" parameter assigns a value to the nodes, giving them a
scheduling priority. But I have not obtained positive results.

Thanks in advance

Regards

On Sat, Nov 23, 2019 at 14:18, Chris Samuel () wrote:

> On 23/11/19 9:14 am, Chris Samuel wrote:
>
> > My gut instinct (and I've never tried this) is to make the 3GB nodes be
> > in a separate partition that is guarded by AllowQos=3GB and have a QOS
> > called "3GB" that uses MinTRESPerJob to require jobs to ask for more
> > than 2GB of RAM to be allowed into the QOS.
>
> Of course there's nothing to stop a user requesting more memory than
> they need to get access to these nodes, but that's a social issue not a
> technical one. :-)
>
> --
> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Slurm configuration, Weight Parameter
On 23/11/19 9:14 am, Chris Samuel wrote:

> My gut instinct (and I've never tried this) is to make the 3GB nodes be
> in a separate partition that is guarded by AllowQos=3GB and have a QOS
> called "3GB" that uses MinTRESPerJob to require jobs to ask for more
> than 2GB of RAM to be allowed into the QOS.

Of course there's nothing to stop a user requesting more memory than
they need to get access to these nodes, but that's a social issue not a
technical one. :-)

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] Slurm configuration, Weight Parameter
On 21/11/19 7:25 am, Sistemas NLHPC wrote:

> Currently we have two types of nodes, one with 3GB and another with 2GB
> of RAM. It is required that nodes with 3GB not be allowed to execute
> tasks with less than 2GB, to avoid underutilization of resources.

My gut instinct (and I've never tried this) is to make the 3GB nodes be
in a separate partition that is guarded by AllowQos=3GB and have a QOS
called "3GB" that uses MinTRESPerJob to require jobs to ask for more
than 2GB of RAM to be allowed into the QOS.

All the best,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
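For anyone wanting to try it, a minimal sketch of that idea, untested and
with illustrative names (the "3GB" QOS and "bigmem" partition are
hypothetical; check the exact MinTRESPerJob syntax against your sacctmgr
version):

===

# Accounting side: create the QOS and set the per-job minimum.
# Memory TRES values are in MB, so 2049 means "more than 2GB".
sacctmgr add qos 3GB
sacctmgr modify qos 3GB set MinTRESPerJob=mem=2049

# slurm.conf side: put the 3GB nodes in a partition guarded by the QOS.
PartitionName=bigmem Nodes=devcn002,devcn050 AllowQos=3GB State=up

===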
Re: [slurm-users] Slurm configuration, Weight Parameter
Can't you just set the usage priority to be higher for the 2GB machines?
That way, if the requested memory is less than 2GB those machines will be
used first, and larger jobs skip to the higher-memory machines.

On 11/21/19 9:44 AM, Jim Prewett wrote:

> Hi Sistemas,
>
> I could be mistaken, but I don't think there is a way to require jobs on
> the 3GB nodes to request more than 2GB!
>
> https://slurm.schedmd.com/slurm.conf.html states this: "Note that if a
> job allocation request can not be satisfied using the nodes with the
> lowest weight, the set of nodes with the next lowest weight is added to
> the set of nodes under consideration for use (repeat as needed for
> higher weight values)."
>
> I read that to mean "if there are only 3GB nodes available, jobs will be
> run there regardless of the memory needed." We had a similar request
> but were unable to find a solution (and, ultimately, the particular user
> is happier to not have idle machines when there's work to be done!).
>
> If I'm misunderstanding, I'd love to know!
>
> HTH,
> Jim
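With the weights arranged that way (2GB node lowest), the routing can be
sanity-checked with a couple of throwaway jobs; a sketch assuming the
devcn* nodes from this thread and an otherwise idle cluster:

===

# devcn001 (RealMemory=2000) has the lowest Weight, so it is tried first
sbatch --mem=1500 --wrap="sleep 60"   # fits anywhere; should land on devcn001 while it is free
sbatch --mem=2500 --wrap="sleep 60"   # cannot fit on devcn001; must go to a 3GB node
squeue -o "%i %N %m"                  # job id, allocated node, requested memory

===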
Re: [slurm-users] Slurm configuration, Weight Parameter
Hi Sistemas,

I could be mistaken, but I don't think there is a way to require jobs on
the 3GB nodes to request more than 2GB!

https://slurm.schedmd.com/slurm.conf.html states this: "Note that if a
job allocation request can not be satisfied using the nodes with the
lowest weight, the set of nodes with the next lowest weight is added to
the set of nodes under consideration for use (repeat as needed for
higher weight values)."

I read that to mean "if there are only 3GB nodes available, jobs will be
run there regardless of the memory needed." We had a similar request but
were unable to find a solution (and, ultimately, the particular user is
happier to not have idle machines when there's work to be done!).

If I'm misunderstanding, I'd love to know!

HTH,
Jim

On Thu, 21 Nov 2019, Sistemas NLHPC wrote:

> Hi all,
>
> Currently we have two types of nodes, one with 3GB and another with 2GB
> of RAM. It is required that nodes with 3GB not be allowed to execute
> tasks with less than 2GB, to avoid underutilization of resources.

James E. Prewett                j...@prewett.org  downl...@hpc.unm.edu
Systems Team Leader             LoGS: http://www.hpc.unm.edu/~download/LoGS/
Designated Security Officer     OpenPGP key: pub 1024D/31816D93
HPC Systems Engineer III        UNM HPC           505.277.8210
[slurm-users] Slurm configuration, Weight Parameter
Hi all,

Currently we have two types of nodes, one with 3GB and another with 2GB of
RAM. It is required that nodes with 3GB not be allowed to execute tasks with
less than 2GB, to avoid underutilization of resources. This is because we
have other nodes that can fulfill the condition of executing tasks with 2GB
or less.

I tried the "Weight" option in the node configuration. I submit multiple
jobs, but Slurm does not assign them by Weight; the order in which the jobs
land on nodes is arbitrary. Some configuration and logs:

slurm.conf

NodeName=DEFAULT RealMemory=3007 Features=3007MB Weight=500 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn050

NodeName=DEFAULT RealMemory=3007 Features=3007MB Weight=100 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn002

NodeName=DEFAULT RealMemory=2000 Features=2000MB Weight=1 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn001

As extra information, I can see that Slurm assigns the Weight to each node:

# sinfo -N -l
NODELIST  NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
devcn001      1    slims*  idle    2 2:1:1   2000        0      1   2000MB   none
devcn002      1    slims*  idle    2 2:1:1   3007        0    100   3007MB   none
devcn050      1    slims*  idle    2 2:1:1   3007        0    500   3007MB   none

I tested other settings, such as the TRESWeights parameter, with no
results. For example:

NodeName=devcn001 TRESWeights="CPU=2.0,Mem=2000MB"

The PriorityType=priority/multifactor plugin was also activated and
deactivated to test, but in all these cases it does not work.

Thanks in advance.

Regards.
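As a cross-check against the NodeName=DEFAULT handling Renfro questioned
earlier in the thread, the same definitions can be written with one
explicit line per node, the form used elsewhere in this thread; an
untested sketch with the same values:

===

NodeName=devcn050 RealMemory=3007 Features=3007MB Weight=500 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn002 RealMemory=3007 Features=3007MB Weight=100 State=idle Sockets=2 CoresPerSocket=1
NodeName=devcn001 RealMemory=2000 Features=2000MB Weight=1 State=idle Sockets=2 CoresPerSocket=1

===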