[slurm-dev] Re: Dynamic partitions on Linux cluster
I would also recommend QOS if you absolutely can't use fairshare. Set up a QOS per institute with a GrpNodes limit that matches the correct ratio, allow only institute members to use their QOS, and make it their default too. Alternatively, you can also create one account per institute and set GrpNodes there, though that is less flexible than a QOS.

Ryan

--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
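A minimal sketch of Ryan's QOS variant, assuming slurmdbd-based accounting is already in place; the QOS names "insta"/"instb" and the user "alice" are placeholders:

# Create one QOS per institute, capped at the node count each paid for.
sacctmgr add qos insta GrpNodes=20
sacctmgr add qos instb GrpNodes=34

# Restrict each user to their institute's QOS and make it their default.
sacctmgr modify user alice set QOS=insta DefaultQOS=insta

# slurm.conf: limits are only honored with enforcement switched on.
AccountingStorageEnforce=limits,qos

With this in place, a job submitted under QOS insta can run anywhere in the pool, but insta jobs collectively never occupy more than 20 nodes at once.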
[slurm-dev] Re: Dynamic partitions on Linux cluster
We have a bit of a similar situation here. A possible solution that may work for you is QoS. A QoS behaves like a synthetic partition: you can have a single partition but multiple QoS's, which can flex around down nodes. From the experimentation I have done with them, this may be a good solution for you.

-Paul Edmon-
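To make that concrete, a sketch of the single-partition layout Paul describes (partition and QOS names are placeholders):

# slurm.conf: one partition spanning the whole pool; the per-institute
# node caps live in the QOS definitions, not in the partition.
PartitionName=main Nodes=n[01-54] Default=YES State=UP

# Users pick up their institute's cap at submission time:
sbatch --qos=insta --nodes=4 job.sh

Because no nodes are hard-wired to either institute, a broken node shrinks the shared pool instead of eating into one institute's fixed allocation.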
[slurm-dev] Re: Dynamic partitions on Linux cluster
I would totally agree with you, but the university administration has to justify the first institute's share (because it was paid for with federal money), while the other institute paid for their part themselves and can do with it what they want. This is the reason for the current inflexible mapping between partitions and nodes.

To get away from that for better availability, I'm looking for a way to have a dynamic mapping that enforces the ratio between the institutes while flexibly allocating nodes from the whole pool. I know it's a waste of resources, but I am bound to this decision...

Regards,

Uwe
[slurm-dev] Re: Dynamic partitions on Linux cluster
Yes, yes it does. I don't mean to be harsh, but doing it their way is a potentially huge waste of resources. Why not get each institute to agree to share the whole machine in proportion to what they paid? Each institute gets an allocation of time (through accounting) and a fairshare fraction in the ratio of their contribution, but is allowed to use the whole machine. If both institutes have periods of down time, then the machine will be less likely to sit idle and more work will get done.

I'll get off my soapbox now.

Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445
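For reference, a rough sketch of the proportional-share setup Bill describes, assuming the multifactor priority plugin; the account names and the 20:34 ratio follow the figures from the original post:

# slurm.conf: weight job priority by historical fairshare usage.
PriorityType=priority/multifactor
PriorityWeightFairshare=10000

# Give each institute's account a share matching its contribution.
sacctmgr add account insta Fairshare=20
sacctmgr add account instb Fairshare=34
sacctmgr add user alice Account=insta

Under this scheme neither institute is capped at any instant; instead, an institute that has recently used more than its share sees its pending jobs deprioritized until the balance is restored.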
[slurm-dev] Re: Dynamic partitions on Linux cluster
Hi Bill,

if I understand the concept of fairshare correctly, this could result in a situation where one institute uses all resources.

Because of this, fairshare is out of the question, as I have to enforce the ratio between the institutes - I cannot allow usage that would result in one institute using more than what they paid for. If an institute doesn't use their resources, the nodes have to run idle (or power down).

You could compare my situation to running two clusters that share the same base infrastructure. What I want to do is enable users of both institutes to use both clusters - but at any point in time use at most the number of nodes that belong to "their" cluster.

Regards,

Uwe
[slurm-dev] Re: Dynamic partitions on Linux cluster
Why not make one partition and use fairshare to balance the usage over time? That way both institutes can run large jobs that span the whole machine when others are not using it.

Bill.
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445

On 8/14/14, 4:11 AM, "Uwe Sauter" wrote:

> Hi all,
>
> I have a question about a configuration detail: "dynamic partitions".
>
> Situation:
> I operate a Linux cluster of currently 54 nodes, run jointly for two
> different institutes at the university. To reflect the ratio of the money
> those institutes invested, I configured SLURM with two partitions, one for
> each institute. Each partition has a fixed set of nodes assigned, e.g.
>
> PartitionName=InstA Nodes=n[01-20]
> PartitionName=InstB Nodes=n[21-54]
>
> To improve availability in case nodes break (and perhaps save some
> power), I'd like to configure SLURM so that jobs can be assigned nodes
> from the whole pool while respecting the number of nodes each institute
> bought.
>
> Research so far:
> There is a partition configuration option called "MaxNodes", but the man
> pages state that this restricts the maximum number of nodes PER JOB.
> It is probably possible to get something similar working using limit
> enforcement through accounting, but I haven't understood that part of
> SLURM 100% yet.
> BlueGene systems seem to have a similar capability, but that is for IBM
> systems only.
>
> Question:
> Is it possible to configure SLURM so that both partitions can utilize
> all nodes while respecting a maximum number of nodes that may be in use
> at the same time? Something like:
>
> PartitionName=InstA Nodes=n[01-54] MaxPartNodes=20
> PartitionName=InstB Nodes=n[01-54] MaxPartNodes=34
>
> So is there a way to achieve this using the config file? Do I have to use
> accounting to enforce the limits? Or is there another way that I don't see?
>
> Best regards,
>
> Uwe Sauter
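No MaxPartNodes partition option exists, so, as the replies above suggest, the cap has to come from accounting limits. A minimal sketch of the account-based variant Ryan mentions (account and user names are placeholders):

# slurm.conf: enforce association limits from the accounting database.
AccountingStorageEnforce=associations,limits

# One account per institute, each capped at its purchased node count.
sacctmgr add account insta GrpNodes=20
sacctmgr add account instb GrpNodes=34
sacctmgr add user alice Account=insta

Both partitions can then be widened to Nodes=n[01-54] (or merged into one), with GrpNodes holding each institute's concurrent usage at its paid-for ratio.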