[slurm-dev] Re: Connection Refused with job cancel

2015-01-21 Thread jette


Quoting Christopher B Coffey :

Hi guys,

One theory I have regarding the connection refused messages is that once a
user's jobs are cancelled, the head Slurm node ignores communication
referring to the cancelled jobs.  I’d like to see whether this is true;
maybe one of the developers can speak to that.

I also see this bizarre msg from time to time:
slurmd[3383]: error: _step_connect: connect() failed dir
/var/spool/slurm/slurmd node cn4 job 939998 step -2 No such file or
directory


Likely a request to cancel a job is being processed while that job is
in the process of terminating normally: the slurmctld daemon is trying
to terminate the job through the slurmstepd (shepherd for the job step
or batch job), but the slurmstepd is already gone.
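
If you want to confirm that on the compute node, something along these
lines should show whether the step's socket is already gone while the job
is still listed as completing (a rough sketch only; it assumes the per-step
sockets live directly under the SlurmdSpoolDir named in the error above):

  # List the per-step sockets slurmd uses to reach each slurmstepd; if the
  # entry for a job still shown as completing is missing, that matches the
  # _step_connect error above.
  ls -l /var/spool/slurm/slurmd/
  # Show jobs still in the COMPLETING state (job id, batch host, state).
  squeue -h -t completing -o '%A %B %T'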

--
Morris "Moe" Jette
CTO, SchedMD LLC
Commercial Slurm Development and Support


[slurm-dev] RE: fairshare allocations

2015-01-21 Thread Bill Wichser


Okay, I get it now.  All the shares under a given ACCOUNT add up to 1.0.
The division, and the reason some values are higher than the actual ACCOUNT's
number, is merely the effect of this allocation amongst the users underneath.


So if the ACCOUNT gets 20% (0.20 or nearly so), then all the users
underneath, when summed, have a value of 1.0.  A user here gets a bigger
cut by being assigned a fairshare value >1, so the fact that such a user's
value exceeds the ACCOUNT's value is not to be confused with a fairshare
exceeding that of its parent.  It only means that the user gets this
percentage of the parent's fairshare as their cut.


Got it!

Thanks much,
Bill

On 1/21/2015 12:57 PM, Ryan Cox wrote:



On 01/21/2015 09:23 AM, Bill Wichser wrote:


A user underneath gets the expected 0.009091 normalized shares since
there are a lot of fairshare=1 users there.  User3 gets basically
25x this value, as the fairshare for user3 is 25.

Yet the normalized shares value is actually MORE than the normalized shares
for the account as a whole.  What should I make of this?



This is actually by design in Fair Tree and is different from other
algorithms.  The manpage for sshare covers this under "FAIR_TREE
MODIFICATIONS".The manpage states that Norm Shares is "The shares
assigned to the user or account normalized to the total number of
assigned shares within the level."  Basically, the Norm Shares is the
association's raw shares value divided by the sum of it and its sibling
associations' assigned raw shares values.  For example, if an account
has 10 users, each having 1 assigned raw share, the Norm Shares value
will be .1 for each of those users under Fair Tree.

Fair Tree only uses Norm Shares and Effective Usage (the other sshare
field that's modified) when comparing sibling associations. Our Slurm UG
presentation slides also mention this on pages 35 and 76
(http://slurm.schedmd.com/SUG14/fair_tree.pdf).

Ryan


[slurm-dev] RE: fairshare allocations

2015-01-21 Thread Ryan Cox



On 01/21/2015 09:23 AM, Bill Wichser wrote:


A user underneath gets the expected 0.009091 normalized shares since
there are a lot of fairshare=1 users there.  User3 gets basically
25x this value, as the fairshare for user3 is 25.


Yet the normalized shares value is actually MORE than the normalized shares
for the account as a whole.  What should I make of this?




This is actually by design in Fair Tree and is different from other 
algorithms.  The manpage for sshare covers this under "FAIR_TREE 
MODIFICATIONS".The manpage states that Norm Shares is "The shares 
assigned to the user or account normalized to the total number of 
assigned shares within the level."  Basically, the Norm Shares is the 
association's raw shares value divided by the sum of it and its sibling 
associations' assigned raw shares values.  For example, if an account 
has 10 users, each having 1 assigned raw share, the Norm Shares value 
will be .1 for each of those users under Fair Tree.
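
As a rough illustration of that arithmetic (not anything Fair Tree itself
runs), something like the snippet below recomputes each user's Norm Shares
from the raw shares of its siblings.  It assumes the pipe-delimited layout
of "sshare -a -l -p" where column 2 is User and column 3 is Raw Shares, and
uses the "ee" account from the output earlier in this thread as the example:

  # Sum the raw shares of all user associations under account ee, then
  # divide each user's raw shares by that sum -- Norm Shares within the level.
  sshare -a -l -A ee -p | awk -F'|' '
      NR > 1 && $2 != "" { raw[$2] = $3; total += $3 }
      END { for (u in raw) printf "%s %.6f\n", u, raw[u] / total }'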


Fair Tree only uses Norm Shares and Effective Usage (the other sshare 
field that's modified) when comparing sibling associations. Our Slurm UG 
presentation slides also mention this on pages 35 and 76 
(http://slurm.schedmd.com/SUG14/fair_tree.pdf).


Ryan


[slurm-dev] Re: Connection Refused with job cancel

2015-01-21 Thread Christopher B Coffey


Hi guys,

The compute nodes write their state files locally.

The head node(s) are in HA mode and write their state files to a NFSv3
share.  I don’t think the issue is here though.
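
For context, the relevant slurm.conf bits look roughly like the following
(the hostnames and NFS path are placeholders, not our real values):

  # Primary and backup controllers share one StateSaveLocation so the
  # backup can take over with the same job state; ours is an NFSv3 mount.
  ControlMachine=head1
  BackupController=head2
  StateSaveLocation=/nfs/slurm/state
  # Compute nodes keep their own spool locally.
  SlurmdSpoolDir=/var/spool/slurm/slurmd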

One theory I have regarding the connection refused messages is that once a
user's jobs are cancelled, the head Slurm node ignores communication
referring to the cancelled jobs.  I’d like to see whether this is true;
maybe one of the developers can speak to that.

We are now running Slurm 14.11.3, CentOS 6.6, 2.6.32-504.3.3.el6.x86_64,
Mellanox OFED: MLNX_OFED_LINUX-2.3-2.0.5-rhel6.6-x86_64.  Things seem
stable, yet cgroup initialization and cleanup definitely take some
noticeable time.

For example, this node is reserved and idle:

Submit 24 jobs to a 32-CPU reserved node.  Each job is identical, and the
job simply echoes “hi!”.  The 24 jobs take ~40 seconds to fully complete.
Submission only takes 0.27 seconds.

Submit 24 jobs to a 32-CPU reserved node.  Each job is identical, and the
job simply echoes “hi!”.  The 24 jobs take ~20 seconds to fully cancel.
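
Roughly how I ran that test, for reference (a sketch; "testresv" is a
placeholder reservation name, and the jobs are just --wrap one-liners):

  # Submit 24 identical one-line jobs against the reserved node and time
  # the submission loop itself.
  time for i in $(seq 1 24); do
      sbatch --reservation=testresv --wrap='echo hi!' > /dev/null
  done
  # Then poll until nothing of mine is left in the queue, to measure how
  # long the jobs take to fully complete.
  while [ -n "$(squeue -h -u $USER)" ]; do sleep 1; done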

Now, we aren’t doing any huge HTC so this isn’t a big problem right now,
but it did seem a bit strange.


The spinlock issue that I spoke about in the docs is here:
http://slurm.schedmd.com/cgroups.html under “General Usage Notes".  I’d
imagine the fixes required for the cgroup spinlock have been backported to
the el6 2.6.32 kernel series, but I haven’t confirmed this.

I also see this bizarre msg from time to time:

slurmd[3383]: error: _step_connect: connect() failed dir
/var/spool/slurm/slurmd node cn4 job 939998 step -2 No such file or
directory

It’s almost like things get out of sync sometimes and the state file gets
cleaned up too soon?


Do you guys see any messages like that?

All in all things are running pretty smooth so I can’t complain too much.

Happy hump day!

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167




On 1/16/15, 5:07 PM, "Trey Dockendorf"  wrote:

>We had an issue with cgroups causing some lockups of nodes at the end of
>jobs but found it was due to storing the release agent files used by the
>SLURM cgroups on an NFSv4 share.  Once we moved to NFSv3, and eventually
>to local disk, the locking stopped.
>
>
>We are on CentOS 6.5 and haven't since had issues with connections except
>when large batches (many 1000s) of jobs are submitted.  That has been
>handled by tuning on the slurmctld host.
>
>
>- Trey
>
>=
>
>
>Trey Dockendorf 
>Systems Analyst I 
>Texas A&M University
>Academy for Advanced Telecommunications and Learning Technologies
>Phone: (979)458-2396
>Email: treyd...@tamu.edu
>Jabber: treyd...@tamu.edu
>
>
>
>
>On Fri, Jan 16, 2015 at 5:23 PM, Christopher B Coffey
> wrote:
>
>Hi, thanks for the response.
>
>The nodes can communicate with that port (6817).  The nodes and cluster are
>functioning normally - at least I think :).  This connection refused error
>only happens when a job is cancelled.  Maybe the slurm agent hangs during
>the cgroup cleanup.  I’m wondering if my hardware suffers from that cgroup
>spinlock issue that was in the docs.  And that’s also why the jobs take
>forever to clean up.
>
>Best,
>Chris
>
>—
>Christopher Coffey
>High-Performance Computing
>Northern Arizona University
>928-523-1167
>
>
>
>
>On 1/16/15, 4:00 PM, "Trey Dockendorf"  wrote:
>
>>I ran into something similar in the 14.03 series when we set
>>SlurmctldPort from 6817 to 6816-6817.  We had neglected to fully restart
>>slurmctld and so some nodes would periodically try to talk to slurmctld
>>on the port that wasn't yet being listened on.  May not apply to your
>>situation but did see something similar when we moved to using multiple
>>ports.
>>
>>
>>- Trey
>>
>>
>>=
>>
>>
>>Trey Dockendorf
>>Systems Analyst I
>>Texas A&M University
>>Academy for Advanced Telecommunications and Learning Technologies
>>Phone: (979)458-2396
>>Email: treyd...@tamu.edu
>>Jabber: treyd...@tamu.edu
>>
>>
>>
>>
>>On Thu, Jan 8, 2015 at 10:38 AM, Christopher B Coffey
>> wrote:
>>
>>Hi,
>>
>>Happy new year!
>>
>>I ran into these messages while diagnosing a bug in cgroup with kernel
>>2.6.32-431.29.2.el6 where a bunch of jobs being cancelled caused the
>>system to crash.  Anyhoo, after updating the kernel the node is stable in
>>the event of mass job cancel.  But I noticed these messages that occur
>>during a job cancel:
>>
>>Jan  8 09:03:41 cn6 slurmstepd[45357]: done with job
>>Jan  8 09:03:41 cn6 slurmstepd[45049]: done with job
>>Jan  8 09:03:42 cn6 slurmstepd[45115]: sending
>>REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 15
>>Jan  8 09:03:42 cn6 slurmstepd[45115]: done with job
>>Jan  8 09:03:42 cn6 slurmstepd[45704]: error: Failed to send
>>MESSAGE_TASK_EXIT: Connection refused
>>Jan  8 09:03:42 cn6 slurmstepd[45704]: done with job
>>Jan  8 09:03:42 cn6 slurmstepd[45593]: error: Failed to send
>>MESSAGE_TASK_EXIT: Connection refused
>>Jan  8 09:03:42 cn6 slurmstepd[45593]: done with job
>>Jan  8 

[slurm-dev] RE: fairshare allocations

2015-01-21 Thread Lipari, Don
> -----Original Message-----
> From: Bill Wichser [mailto:b...@princeton.edu]
> Sent: Wednesday, January 21, 2015 8:23 AM
> To: slurm-dev
> Subject: [slurm-dev] RE: fairshare allocations
> 
> 
> 
> 
> On 01/21/2015 11:07 AM, Lipari, Don wrote:
> >
> >
> >> -----Original Message-----
> >> From: Bill Wichser [mailto:b...@princeton.edu]
> >> Sent: Wednesday, January 21, 2015 5:20 AM
> >> To: slurm-dev
> >> Subject: [slurm-dev] fairshare allocations
> >>
> >>
> >> The algorithm I use is fairtree under 14.11 but I believe that my
> >> question relates to any method.
> >>
> >> As a University, we have many investments into a given cluster.  At the
> >> most simplistic level, let's assume there are but two allocations.
> >> The method I have been using is to assign a value, as a percentage of
> >> ownership, to the various ACCOUNTs such that when summed across all
> >> accounts, they add to 100.
> >>
> >> So chemistry might have a fairshare value of 20 as they contributed 20%
> >> of the funding.  Physics has a value of 10.  And so forth, with many
> >> having a fairshare value of 1 since no money was contributed.
> >>
> >> In the past, I simply assigned either a fairshare value of parent to the
> >> users or assigned them a value of 1.
> >>
> >> So let's take a user, call him Bill, who has a fairshare value of 1 under
> >> the account=chem.  It appears to me that this 1 share is actually a 1
> >> share of the total and not a 1 share of what the account=chem owns.  Am
> >> I reading this correctly here?
> >
> > A share of 1 for Bill is a share of the total shares assigned to users
> > (or accounts) under the chem account.  Chem can have 1000 users, each with
> > 1 share, but chem users' combined usage of the system will be throttled
> > to 20% based on job priorities calculated by the fair-share factor.
> >
> > That works both ways:  if only one user from chem is submitting jobs, that
> > user can receive 20% of the resources of the cluster, even though they have
> > only one share of chem.
> >
> > The most common practice is to assign a share of 1 to every user in an
> > account.  You can assign greater share values to users who are entitled
> > to more than their peers.
> >
> > Don Lipari
> >
> >>
> >> Thanks,
> >> Bill
> 
> So that was my expectation.  But let's look at this account, truncated,
> with a user with a fairshare of 20 (using sshare -a -l -A ee -p)
> 
> Account|User|Raw Shares|Norm Shares|Raw Usage|Norm Usage|Effectv Usage|FairShare|Level FS|GrpCPUMins|CPURunMins|
> ee||261|0.218227|189272064|0.047197|0.047197||4.623757||50912|
> ee|user1|1|0.009091|24151307|0.006022|0.127601|0.771261|0.071245||24605|
> ee|user2|1|0.009091|652289|0.000163|0.003446|0.780059|2.637872||0|
> ee|user3|25|0.227273|15684228|0.003911|0.082866|0.781525|2.742652||0|
> ...
> 
> 
> 
> 
> 
> 
> So ee as an account gets fairshare=261 and gets a 0.218227 normalized
> share count.
> 
> A user underneath gets the expected 0.009091 normalized shares since
> there are a lot of fairshare=1 users there.  User3 gets basically
> 25x this value, as the fairshare for user3 is 25.
> 
> Yet the normalized shares value is actually MORE than the normalized shares
> for the account as a whole.  What should I make of this?

That looks like a bug.  I don't see that behavior on our systems running
Slurm 14.03.11.
Don

> Bill


[slurm-dev] RE: fairshare allocations

2015-01-21 Thread Bill Wichser




On 01/21/2015 11:07 AM, Lipari, Don wrote:




-----Original Message-----
From: Bill Wichser [mailto:b...@princeton.edu]
Sent: Wednesday, January 21, 2015 5:20 AM
To: slurm-dev
Subject: [slurm-dev] fairshare allocations


The algorithm I use is fairtree under 14.11 but I believe that my
question relates to any method.

As a University, we have many investments into a given cluster.  At the
most simplistic level, let's assume there are but two allocations.
The method I have been using is to assign a value, as a percentage of
ownership, to the various ACCOUNTs such that when summed across all
accounts, they add to 100.

So chemistry might have a fairshare value of 20 as they contributed 20%
of the funding.  Physics has a value of 10.  And so forth, with many
having a fairshare value of 1 since no money was contributed.

In the past, I simply assigned either a fairshare value of parent to the
users or assigned them a value of 1.

So let's take a user, call him Bill, who has a fairshare value of 1 under
the account=chem.  It appears to me that this 1 share is actually a 1
share of the total and not a 1 share of what the account=chem owns.  Am
I reading this correctly here?


A share of 1 for Bill is a share of the total shares assigned to users
(or accounts) under the chem account.  Chem can have 1000 users, each with
1 share, but chem users' combined usage of the system will be throttled
to 20% based on job priorities calculated by the fair-share factor.

That works both ways:  if only one user from chem is submitting jobs, that
user can receive 20% of the resources of the cluster, even though they have
only one share of chem.

The most common practice is to assign a share of 1 to every user in an
account.  You can assign greater share values to users who are entitled
to more than their peers.

Don Lipari



Thanks,
Bill


So that was my expectation.  But let's look at this account, truncated,
with a user with a fairshare of 20 (using sshare -a -l -A ee -p)


Account|User|Raw Shares|Norm Shares|Raw Usage|Norm Usage|Effectv Usage|FairShare|Level FS|GrpCPUMins|CPURunMins|
ee||261|0.218227|189272064|0.047197|0.047197||4.623757||50912|
ee|user1|1|0.009091|24151307|0.006022|0.127601|0.771261|0.071245||24605|
ee|user2|1|0.009091|652289|0.000163|0.003446|0.780059|2.637872||0|
ee|user3|25|0.227273|15684228|0.003911|0.082866|0.781525|2.742652||0|
...






So ee as an account gets fairshare=261 and gets a 0.218227 normalized 
share count.


A user underneath gets the expected 0.009091 normalized shares since
there are a lot of fairshare=1 users there.  User3 gets basically
25x this value, as the fairshare for user3 is 25.


Yet the normalized shares value is actually MORE than the normalized shares
for the account as a whole.  What should I make of this?


Bill


[slurm-dev] RE: fairshare allocations

2015-01-21 Thread Lipari, Don


> -----Original Message-----
> From: Bill Wichser [mailto:b...@princeton.edu]
> Sent: Wednesday, January 21, 2015 5:20 AM
> To: slurm-dev
> Subject: [slurm-dev] fairshare allocations
> 
> 
> The algorithm I use is fairtree under 14.11 but I believe that my
> question relates to any method.
> 
> As a University, we have many investments into a given cluster.  At the
> most simplistic level, let's assume there are but two allocations.
> The method I have been using is to assign a value, as a percentage of
> ownership, to the various ACCOUNTs such that when summed across all
> accounts, they add to 100.
> 
> So chemistry might have a fairshare value of 20 as they contributed 20%
> of the funding.  Physics has a value of 10.  And so forth, with many
> having a fairshare value of 1 since no money was contributed.
> 
> In the past, I simply assigned either a fairshare value of parent to the
> users or assigned them a value of 1.
> 
> So let's take a user, call him Bill, who has a fairshare value of 1 under
> the account=chem.  It appears to me that this 1 share is actually a 1
> share of the total and not a 1 share of what the account=chem owns.  Am
> I reading this correctly here?

A share of 1 for Bill is a share of the total shares assigned to users
(or accounts) under the chem account.  Chem can have 1000 users, each with
1 share, but chem users' combined usage of the system will be throttled
to 20% based on job priorities calculated by the fair-share factor.

That works both ways:  if only one user from chem is submitting jobs, that
user can receive 20% of the resources of the cluster, even though they have
only one share of chem.

The most common practice is to assign a share of 1 to every user in an
account.  You can assign greater share values to users who are entitled
to more than their peers.
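
With sacctmgr, that practice looks roughly like the lines below (the
account and user names are just illustrations from this thread, and the
share values are examples; add -i to skip the confirmation prompt):

  # Give the chem account 20 shares at its level of the tree ...
  sacctmgr modify account where name=chem set fairshare=20
  # ... and give a user under chem a single share of chem's allocation;
  # users entitled to more than their peers get a larger value.
  sacctmgr modify user where name=bill account=chem set fairshare=1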

Don Lipari

> 
> Thanks,
> Bill


[slurm-dev] fairshare allocations

2015-01-21 Thread Bill Wichser


The algorithm I use is fairtree under 14.11 but I believe that my 
question relates to any method.


As a University, we have many investments into a given cluster.  At the
most simplistic level, let's assume there are but two allocations.
The method I have been using is to assign a value, as a percentage of
ownership, to the various ACCOUNTs such that when summed across all
accounts, they add to 100.


So chemistry might have a fairshare value of 20 as they contributed 20% 
of the funding.  Physics has a value of 10.  And so forth, with many 
having a fairshare value of 1 since no money was contributed.


In the past, I simply assigned either a fairshare value of parent to the 
users or assigned them a value of 1.


So let's take a user, call him Bill, who has a fairshare value of 1 under
the account=chem.  It appears to me that this 1 share is actually a 1
share of the total and not a 1 share of what the account=chem owns.  Am
I reading this correctly here?


Thanks,
Bill