[slurm-users] How to find the average downtime of compute nodes in a cluster?

2020-04-02 Thread Sudeep Narayan Banerjee

By downtime I mean the time a node spends in the *down* state

(not drng, drain, IDLE, or ALLOC).

--
Thanks & Regards,
Sudeep Narayan Banerjee
System Analyst | Scientist B
Information System Technology Facility
Academic Block 5 | Room 110
Indian Institute of Technology Gandhinagar
Palaj, Gujarat 382355 INDIA
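
One way to make "average downtime" concrete is sketched below. This is a minimal sketch, not an established Slurm recipe: it assumes the periods during which each node was in the *down* state have already been extracted (for example from Slurm's accounting event records) into hypothetical (node, start, end) tuples. The function name and the sample values are illustrative only. It reports both the mean length of a down period and the mean total downtime per node, since the question could mean either.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downtime_stats(events, node_count):
    """Return (mean duration per down event, mean total downtime per node).

    events: iterable of (node_name, start, end) tuples with datetime values,
            one tuple per period a node spent in the down state.
    node_count: total number of compute nodes in the cluster, including
                nodes that were never down.
    """
    events = list(events)
    if node_count <= 0:
        raise ValueError("node_count must be positive")

    # Accumulate the total downtime of each node.
    per_node = defaultdict(timedelta)
    for node, start, end in events:
        per_node[node] += end - start

    total = sum(per_node.values(), timedelta())
    mean_per_event = total / len(events) if events else timedelta()
    mean_per_node = total / node_count
    return mean_per_event, mean_per_node


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [
        ("node01", datetime(2020, 4, 1, 8), datetime(2020, 4, 1, 10)),
        ("node01", datetime(2020, 4, 2, 0), datetime(2020, 4, 2, 4)),
        ("node02", datetime(2020, 4, 1, 12), datetime(2020, 4, 1, 18)),
    ]
    per_event, per_node = downtime_stats(sample, node_count=4)
    print("mean per down event:", per_event)
    print("mean per node:", per_node)
```

Dividing by the total node count rather than by the number of nodes that went down is a design choice: it treats nodes that never failed as contributing zero downtime.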



Re: [slurm-users] How to get the Average number of CPU cores used by jobs per day?

2020-04-02 Thread Sudeep Narayan Banerjee
Dear Steven: Yes, but I am unable to get the desired data. I am not sure which 
flags to use.


Thanks & Regards,
Sudeep Narayan Banerjee

On 03/04/20 10:42 am, Steven Dick wrote:

Have you looked at sreport?

On Fri, Apr 3, 2020 at 1:09 AM Sudeep Narayan Banerjee
 wrote:

How can I get the average number of CPU cores used per day by the jobs of a 
particular group?

By group I mean, say, faculty group1, group2, etc.; each of those groups has a 
certain number of students.



Re: [slurm-users] Executing slurm command from Lua job_submit script?

2020-04-02 Thread Marcus Wagner

Hi Chansup,

could you provide a code snippet?

Best
Marcus

On 02.04.2020 at 19:43, CB wrote:

Hi,

I'm running Slurm 19.05.

I'm trying to execute some Slurm commands from the Lua job_submit script 
under a certain condition.

However, I found that the command is not executed and returns nothing.
For example, I tried to execute a "sinfo" command from an external shell 
script, but it didn't work.


Does Slurm prohibit executing Slurm commands from the Lua job_submit 
script?


Thanks,
- Chansup





Re: [slurm-users] How to get the Average number of CPU cores used by jobs per day?

2020-04-02 Thread Steven Dick
Have you looked at sreport?

On Fri, Apr 3, 2020 at 1:09 AM Sudeep Narayan Banerjee
 wrote:
>
> How can I get the average number of CPU cores used per day by the jobs of a 
> particular group?
>
> By group I mean, say, faculty group1, group2, etc.; each of those groups has a 
> certain number of students.



[slurm-users] How to get the Average number of CPU cores used by jobs per day?

2020-04-02 Thread Sudeep Narayan Banerjee
How can I get the average number of CPU cores used per day by the jobs of a 
particular group?


By group I mean, say, faculty group1, group2, etc.; each of those groups 
has a certain number of students.


--
Thanks & Regards,
Sudeep Narayan Banerjee
System Analyst | Scientist B
Information System Technology Facility
Academic Block 5 | Room 110
Indian Institute of Technology Gandhinagar
Palaj, Gujarat 382355 INDIA
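
The arithmetic behind "average number of cores per day" can be sketched as follows. This is a minimal sketch under stated assumptions: each faculty group is assumed to map to one Slurm account, and the total CPU time consumed per account over the reporting period is assumed to be already available in CPU-minutes (for example from an accounting report). The function name, account names, and figures are hypothetical. The average number of cores in use is the CPU time consumed divided by the length of the period.

```python
MINUTES_PER_DAY = 24 * 60


def average_cores_in_use(cpu_minutes_by_account, days):
    """Convert CPU-minutes per account into the average number of cores in use.

    cpu_minutes_by_account: dict mapping account name -> CPU-minutes consumed
                            during the reporting period.
    days: length of the reporting period in days.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    period_minutes = days * MINUTES_PER_DAY
    return {
        account: cpu_minutes / period_minutes
        for account, cpu_minutes in cpu_minutes_by_account.items()
    }


if __name__ == "__main__":
    # Made-up usage figures for a 30-day period.
    usage = {"group1": 432000, "group2": 86400}
    for account, cores in average_cores_in_use(usage, days=30).items():
        print(f"{account}: {cores:.1f} cores on average")
```

With these made-up figures, group1 averages 10 cores and group2 averages 2 cores over the 30 days.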



[slurm-users] Executing slurm command from Lua job_submit script?

2020-04-02 Thread CB
Hi,

I'm running Slurm 19.05.

I'm trying to execute some Slurm commands from the Lua job_submit script
under a certain condition.
However, I found that the command is not executed and returns nothing.
For example, I tried to execute a "sinfo" command from an external shell
script, but it didn't work.

Does Slurm prohibit executing Slurm commands from the Lua job_submit
script?

Thanks,
- Chansup


[slurm-users] Interaction between suspension/preemption and core affinity

2020-04-02 Thread Kai Krueger
Hello everyone,

I have set up Slurm (19.05) to use suspension so that jobs in a
lower-priority queue can be preempted by jobs in a higher-priority
queue. However, jobs aren't resuming as I would expect. When a
higher-priority job completes and frees up resources for the
lower-priority jobs, the suspended jobs often don't resume; instead,
another pending job in the lower queue gets launched. Furthermore, even
when resources are actually available, the suspended jobs do not resume.

My best guess is that this is an interaction with task/core affinity.
I use the cgroup task plugin to ensure that a job cannot use more cores
than it requests. However, I suspect that the suspended job was bound to
a specific core when it was launched, e.g. core 1. In the scenarios
above, the job that preempted the suspended job was then presumably
bound to core 1 as well. When e.g. core 2 becomes available, the
suspended job cannot be placed on that core, as it is already bound to
core 1, and thus core 2 remains idle even though there is a job
waiting.

As for my applications, I do care about constraining the total number of
cores a job uses to what it requested (to make sure a job doesn't
accidentally consume more resources than requested and thus affect
other jobs on the node), but I don't care which cores it runs on. (These
are typically single-thread/single-core jobs, so they don't need any
particular core placement.) I was wondering:

a) Is my interpretation of what is happening in scheduling correct?

c) Can task/core affinity be reset on suspend/resume, to make sure that
a suspended job can resume on any of the available cores on the same
node? Can one use cgroups to constrain cores without core affinity?

Thank you,

Kai Krueger





Re: [slurm-users] How many users are running jobs per day on average in slurm ?

2020-04-02 Thread Paul Edmon
I would recommend setting up XDMoD as it will calculate this, plus a 
variety of other useful facts:


https://open.xdmod.org/8.5/index.html

Also, if you like Grafana, you can use this:

https://github.com/fasrc/slurm-diamond-collector

-Paul Edmon-


On 4/2/2020 8:31 AM, Sudeep Narayan Banerjee wrote:


Dear Peter: I am trying *sacct* with multiple flags, but I am not 
getting the desired output for the query.


Thanks & Regards,
Sudeep Narayan Banerjee
On 02/04/20 5:23 pm, Peter Kjellström wrote:

On Thu, 2 Apr 2020 16:57:46 +0530
Sudeep Narayan Banerjee  wrote:


Any help in getting the right flags?

You may need to clarify that question a bit...

How many users ran jobs on each day? (weekly, monthly average?)

How many jobs per day did each user run? (weekly, monthly average?)

And what counts as job activity for a day? Started a job that day?
completed a job? Had at least one job running?

/Peter


Re: [slurm-users] How many users are running jobs per day on average in slurm ?

2020-04-02 Thread Sudeep Narayan Banerjee
Dear Peter: I am trying *sacct* with multiple flags, but I am not 
getting the desired output for the query.


Thanks & Regards,
Sudeep Narayan Banerjee

On 02/04/20 5:23 pm, Peter Kjellström wrote:

On Thu, 2 Apr 2020 16:57:46 +0530
Sudeep Narayan Banerjee  wrote:


Any help in getting the right flags?

You may need to clarify that question a bit...

How many users ran jobs on each day? (weekly, monthly average?)

How many jobs per day did each user run? (weekly, monthly average?)

And what counts as job activity for a day? Started a job that day?
completed a job? Had at least one job running?

/Peter


Re: [slurm-users] How many users are running jobs per day on average in slurm ?

2020-04-02 Thread Ole Holm Nielsen

On 02-04-2020 14:16, Sudeep Narayan Banerjee wrote:
Well, what I am looking for is this: on average, how many users per day 
had at least one job running (daily average)?


Exactly!  The NEXT_JOB_ID increases by 1 for every job submitted.
You may simply read and store this number every day at 00:00.  Put the 
numbers into an Excel spreadsheet.


As I said, you should look into proper Slurm accounting so that you can 
get answers to relevant questions.


/Ole
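
The daily-counter idea above can be sketched in a few lines. This is a minimal sketch, assuming the text of `scontrol show config` has been captured once a day at the same time and that the counter did not wrap around or get reset between readings. The function names and all readings other than the first are made up for illustration. Note that the result is the number of job IDs handed out per day, not the number of users.

```python
import re


def parse_next_job_id(config_text):
    """Extract the NEXT_JOB_ID value from `scontrol show config` output."""
    match = re.search(r"^\s*NEXT_JOB_ID\s*=\s*(\d+)", config_text, re.MULTILINE)
    if match is None:
        raise ValueError("NEXT_JOB_ID not found in the given text")
    return int(match.group(1))


def jobs_submitted_per_day(daily_readings):
    """Differences between consecutive daily readings of NEXT_JOB_ID."""
    return [
        later - earlier
        for earlier, later in zip(daily_readings, daily_readings[1:])
    ]


def average_jobs_per_day(daily_readings):
    """Mean number of job IDs handed out per day over the readings."""
    deltas = jobs_submitted_per_day(daily_readings)
    if not deltas:
        raise ValueError("need at least two daily readings")
    return sum(deltas) / len(deltas)


if __name__ == "__main__":
    first = parse_next_job_id("NEXT_JOB_ID = 2377393")
    # The two later readings are invented sample values.
    readings = [first, 2377893, 2378593]
    print(jobs_submitted_per_day(readings))
    print(average_jobs_per_day(readings))
```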


On 02/04/20 5:34 pm, Ole Holm Nielsen wrote:

On 02-04-2020 13:27, Sudeep Narayan Banerjee wrote:

Any help in getting the right flags?


The question is not well-defined.  If you just want to know the JobID 
number in the cluster, you could run this command every day and watch 
the NEXT_JOB_ID increase:


# scontrol show config | grep NEXT_JOB_ID
NEXT_JOB_ID = 2377393

Job accounting is probably what you are looking for.  You may take a 
look at my Slurm Wiki page 
https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting





Re: [slurm-users] How many users are running jobs per day on average in slurm ?

2020-04-02 Thread Sudeep Narayan Banerjee
Well, what I am looking for is this: on average, how many users per day 
had at least one job running (daily average)?


Thanks & Regards,
Sudeep Narayan Banerjee
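
The definition above (users with at least one job running on a given day, averaged over days) can be sketched as follows. This is a minimal sketch, assuming job records have been exported from the accounting database as pipe-separated `user|start|end` lines with ISO timestamps, for instance via something like `sacct -a -X -n -P -S <start> -E <end> -o User,Start,End`. The function names and the sample records are hypothetical. Jobs with an unparsable start time are skipped as never started, and jobs with an unparsable end time are treated as still running at `now`.

```python
from datetime import date, datetime, timedelta


def parse_jobs(text, now):
    """Parse 'user|start|end' lines into (user, start, end) tuples."""
    jobs = []
    for line in text.strip().splitlines():
        user, start, end = line.strip().split("|")
        if not user:
            continue
        try:
            started = datetime.fromisoformat(start)
        except ValueError:
            continue  # the job never started, so it never ran
        try:
            ended = datetime.fromisoformat(end)
        except ValueError:
            ended = now  # no end time yet: treat the job as still running
        jobs.append((user, started, ended))
    return jobs


def users_per_day(jobs, first_day, last_day):
    """Map each day in [first_day, last_day] to the users with a running job."""
    active = {
        first_day + timedelta(days=offset): set()
        for offset in range((last_day - first_day).days + 1)
    }
    for user, started, ended in jobs:
        day = max(started.date(), first_day)
        while day <= min(ended.date(), last_day):
            active[day].add(user)
            day += timedelta(days=1)
    return active


def average_users_per_day(active):
    """Mean number of distinct active users per day."""
    return sum(len(users) for users in active.values()) / len(active)


if __name__ == "__main__":
    sample = (
        "alice|2020-04-01T10:00:00|2020-04-02T09:00:00\n"
        "bob|2020-04-02T08:00:00|2020-04-02T12:00:00\n"
        "alice|2020-04-02T13:00:00|Unknown\n"
        "carol|Unknown|Unknown\n"
    )
    jobs = parse_jobs(sample, now=datetime(2020, 4, 3, 12, 0))
    active = users_per_day(jobs, date(2020, 4, 1), date(2020, 4, 3))
    print(average_users_per_day(active))
```

A job that spans midnight counts its user on every day it touches, which matches the "had at least one job running" reading rather than "started a job that day".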

On 02/04/20 5:34 pm, Ole Holm Nielsen wrote:

On 02-04-2020 13:27, Sudeep Narayan Banerjee wrote:

Any help in getting the right flags?


The question is not well-defined.  If you just want to know the JobID 
number in the cluster, you could run this command every day and watch 
the NEXT_JOB_ID increase:


# scontrol show config | grep NEXT_JOB_ID
NEXT_JOB_ID = 2377393

Job accounting is probably what you are looking for.  You may take a 
look at my Slurm Wiki page 
https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting


/Ole



Re: [slurm-users] How many users are running jobs per day on average in slurm ?

2020-04-02 Thread Ole Holm Nielsen

On 02-04-2020 13:27, Sudeep Narayan Banerjee wrote:

Any help in getting the right flags?


The question is not well-defined.  If you just want to know the JobID 
number in the cluster, you could run this command every day and watch 
the NEXT_JOB_ID increase:


# scontrol show config | grep NEXT_JOB_ID
NEXT_JOB_ID = 2377393

Job accounting is probably what you are looking for.  You may take a 
look at my Slurm Wiki page 
https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting


/Ole



Re: [slurm-users] How many users are running jobs per day on average in slurm ?

2020-04-02 Thread Peter Kjellström
On Thu, 2 Apr 2020 16:57:46 +0530
Sudeep Narayan Banerjee  wrote:

> Any help in getting the right flags?

You may need to clarify that question a bit...

How many users ran jobs on each day? (weekly, monthly average?)

How many jobs per day did each user run? (weekly, monthly average?)

And what counts as job activity for a day? Started a job that day?
completed a job? Had at least one job running?

/Peter



[slurm-users] How many users are running jobs per day on average in slurm ?

2020-04-02 Thread Sudeep Narayan Banerjee

Any help in getting the right flags?

--
Thanks & Regards,
Sudeep Narayan Banerjee
System Analyst | Scientist B
Information System Technology Facility
Academic Block 5 | Room 110
Indian Institute of Technology Gandhinagar
Palaj, Gujarat 382355 INDIA