[slurm-dev] Dynamic partitions on Linux cluster

2014-08-14 Thread Uwe Sauter

Hi all,

I got a question about a configuration detail: dynamic partitions

Situation:
I operate a Linux cluster of currently 54 nodes on behalf of two
different institutes at the university. To reflect the ratio of money
the institutes invested, I configured SLURM with two partitions, one for
each institute. Each partition is statically assigned a fixed set of
nodes, e.g.

PartitionName=InstA Nodes=n[01-20]
PartitionName=InstB Nodes=n[21-54]

To improve availability when nodes break (and perhaps to save some
power), I'd like to configure SLURM so that jobs can be assigned nodes
from the whole pool while still respecting the number of nodes each
institute bought.


Research so far:
There is a partition configuration option called MaxNodes, but the man
pages state that it restricts the maximum number of nodes PER JOB.
It is probably possible to get something similar working using limit
enforcement through accounting, but I haven't fully understood that
part of SLURM yet.
BlueGene systems seem to offer something along these lines, but that is
for IBM systems only.
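
For illustration, a line like the following would cap each individual
job at 10 nodes, while the partition as a whole could still have all 20
of its nodes busy at the same time:

PartitionName=InstA Nodes=n[01-20] MaxNodes=10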


Question:
Is it possible to configure SLURM so that both partitions can utilize
all nodes, but each respects a maximum number of nodes that may be in
use at the same time? Something like:

PartitionName=InstA Nodes=n[01-54] MaxPartNodes=20
PartitionName=InstB Nodes=n[01-54] MaxPartNodes=34


So is there a way to achieve this using the config file? Do I have to
use accounting to enforce the limits? Or is there another way that I
don't see?


Best regards,

Uwe Sauter


[slurm-dev] Re: Dynamic partitions on Linux cluster

2014-08-14 Thread Bill Barth

Why not make one partition and use fairshare to balance the usage over
time? That way both institutes can run large jobs that span the whole
machine when others are not using it.
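
A minimal sketch of such a setup (assuming the multifactor priority
plugin is active and that each institute already has its own account in
the accounting database; the names and weights below are illustrative):

In slurm.conf:

PriorityType=priority/multifactor
PriorityWeightFairshare=100000

Then grant shares in proportion to each institute's contribution:

sacctmgr modify account inst_a set Fairshare=20
sacctmgr modify account inst_b set Fairshare=34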

Bill.
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445


[slurm-dev] Re: Dynamic partitions on Linux cluster

2014-08-14 Thread Uwe Sauter

Hi Bill,

if I understand the concept of fairshare correctly, this could result in
a situation where one institute uses all resources.

Because of this, fairshare is out of the question: I have to enforce
the ratio between the institutes and cannot allow usage that would
result in one institute using more than what it paid for. If an
institute doesn't use its resources, they have to sit idle (or power down).

You could compare my situation to running two clusters that share the
same base infrastructure. What I want is to enable users of both
institutes to use both clusters - but at any point in time each
institute may use at most the number of nodes that belong to its cluster.


Regards,

Uwe



[slurm-dev] Re: Dynamic partitions on Linux cluster

2014-08-14 Thread Bill Barth

Yes, yes it does. I don't mean to be harsh, but doing it their way is a
potentially huge waste of resources. Why not get each institute to agree
to share the whole machine in proportion to what they paid? Each institute
gets an allocation of time (through accounting) and a fairshare fraction
in the ratio of their contribution, but is allowed to use the whole
machine. If both institutes have periods of down time, then the machine
will be less likely to sit idle and more work will get done.
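
A sketch of that combination (account names and minute budgets are
illustrative; enforcing the caps requires AccountingStorageEnforce=limits
in slurm.conf):

sacctmgr modify account inst_a set GrpCPUMins=10000000 Fairshare=20
sacctmgr modify account inst_b set GrpCPUMins=17000000 Fairshare=34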

I'll get off my soapbox now.

Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bba...@tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445


[slurm-dev] Re: Dynamic partitions on Linux cluster

2014-08-14 Thread Ryan Cox


I would also recommend a QOS if you absolutely can't use fairshare. Set
up a QOS per institute with a GrpNodes limit in the correct ratio, and
only allow each institute's members access to their own QOS (make it
their default too). A sketch follows below.


Alternatively, you can do one account per institute and set GrpNodes
there, though that is less flexible than a QOS.
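
A sketch of that setup with sacctmgr (assuming accounting is already
configured; the QOS and account names are illustrative, and slurm.conf
must include AccountingStorageEnforce=limits,qos for the limits to take
effect):

# One QOS per institute, capping concurrent node usage
sacctmgr add qos inst_a_qos
sacctmgr modify qos inst_a_qos set GrpNodes=20
sacctmgr add qos inst_b_qos
sacctmgr modify qos inst_b_qos set GrpNodes=34
# Restrict each account to its own QOS and make it the default
sacctmgr modify account inst_a set QOS=inst_a_qos DefaultQOS=inst_a_qos
sacctmgr modify account inst_b set QOS=inst_b_qos DefaultQOS=inst_b_qos

(On newer SLURM releases the node cap is expressed as GrpTRES=node=N
rather than GrpNodes.)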


Ryan

On 08/14/2014 07:48 AM, Paul Edmon wrote:


We have a bit of a similar situation here. A possible solution that
may work for you is QOS. The QOSes behave like synthetic partitions:
you can have a single partition but multiple QOSes, which can flex
around down nodes.


From the experimentation I have done with them, this may be a good
solution for you.


-Paul Edmon-

On 08/14/2014 09:25 AM, Uwe Sauter wrote:

I would totally agree with you, but the university administration has to
justify the share of the first institute (because it was paid for with
federal money), while the other institute paid for itself and can do
with its share what it wants.

This is the reason for the current inflexible mapping between partitions
and nodes. To get away from that for better availability, I'm looking
for a dynamic mapping that enforces the ratio between the institutes
while flexibly allocating nodes from the whole pool.

I know it's a waste of resources, but I am bound to this decision...

Regards,

Uwe




--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University

[slurm-dev] Re: Change requested cpus of running job

2014-08-14 Thread jette


Quoting Christopher B Coffey chris.cof...@nau.edu:

Hi,

Is it possible with scontrol to change the number of CPUs that were
granted to a job while it's running?


Only if the user's program/script is cooperating. See:
http://slurm.schedmd.com/faq.html#job_size
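
For the shrink case that FAQ entry describes, the command looks roughly
like this (the job ID is illustrative; the job's own script must then
adjust its environment before launching further steps, and growing a
job uses the separate dependency mechanism the FAQ explains):

# Release all but two of the nodes allocated to running job 1234
scontrol update JobId=1234 NumNodes=2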
--
Morris "Moe" Jette
CTO, SchedMD LLC

Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html