[slurm-users] Inconsistent cpu bindings with cpu-bind=none

2020-02-17 Thread Marcus Boden
Hi everyone,

I am facing a bit of a weird issue with CPU bindings and mpirun:
My jobscript:
#SBATCH -N 20
#SBATCH --tasks-per-node=40
#SBATCH -p medium40
#SBATCH -t 30 
#SBATCH -o out/%J.out
#SBATCH -e out/%J.err
#SBATCH --reservation=root_98

module load impi/2019.4 2>&1

export I_MPI_DEBUG=6
export SLURM_CPU_BIND=none

. /sw/comm/impi/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh release
BENCH=/sw/comm/impi/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/IMB-MPI1

mpirun -np 800 $BENCH -npmin 800 -iter 50 -time 120 -msglog 16:18 -include \
    Allreduce Bcast Barrier Exchange Gather PingPing PingPong Reduce Scatter \
    Allgather Alltoall Reduce_scatter

My output is as follows:
[...]
[0] MPI startup(): 37  154426   gcn1311{37,77}
[0] MPI startup(): 38  154427   gcn1311{38,78}
[0] MPI startup(): 39  154428   gcn1311{39,79}
[0] MPI startup(): 40  161061   gcn1312{0}
[0] MPI startup(): 41  161062   gcn1312{40}
[0] MPI startup(): 42  161063   gcn1312{0}
[0] MPI startup(): 43  161064   gcn1312{40}
[0] MPI startup(): 44  161065   gcn1312{0}
[...]

On 8 out of 20 nodes I got the wrong pinning. In the slurmd logs I found
that on the nodes where the pinning was correct, the manual binding was
communicated correctly:
  lllp_distribution jobid [2065227] manual binding: none
On the nodes where it did not work, it was not:
  lllp_distribution jobid [2065227] default auto binding: cores, dist 1

So, for some reason, Slurm applied its default CPU binding to the tasks on
some nodes, while on the others the CPU binding was (correctly) disabled.
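
One way to spot the affected nodes quickly is to summarize the I_MPI_DEBUG pinning lines per node. The following is only a sketch: the function name pinning_summary is made up for illustration, and it assumes the "MPI startup(): rank pid node{cpus}" line format shown in the output above. It counts the ranks per node and the number of distinct CPU sets they report, and marks a node as SHARED when several ranks report the same CPU set.

```shell
# Hypothetical helper: summarize Intel MPI pinning lines per node.
# Reads I_MPI_DEBUG output from the given files, or from stdin.
pinning_summary() {
  awk '
    index($0, "MPI startup():") && index($NF, "{") {
      brace = index($NF, "{")
      # Node name is either fused with the CPU set or in the previous field.
      node = (brace > 1) ? substr($NF, 1, brace - 1) : $(NF - 1)
      cpus = substr($NF, brace)
      ranks[node]++
      if (!((node, cpus) in seen)) { seen[node, cpus] = 1; distinct[node]++ }
    }
    END {
      for (node in ranks) {
        status = (distinct[node] < ranks[node]) ? "SHARED" : "ok"
        printf "%s ranks=%d distinct_cpu_sets=%d %s\n", node, ranks[node], distinct[node], status
      }
    }
  ' "$@" | sort
}

# Example with a few of the lines from the output above.
pinning_summary <<'EOF'
[0] MPI startup(): 37  154426   gcn1311{37,77}
[0] MPI startup(): 38  154427   gcn1311{38,78}
[0] MPI startup(): 39  154428   gcn1311{39,79}
[0] MPI startup(): 40  161061   gcn1312{0}
[0] MPI startup(): 41  161062   gcn1312{40}
[0] MPI startup(): 42  161063   gcn1312{0}
EOF
# gcn1311 ranks=3 distinct_cpu_sets=3 ok
# gcn1312 ranks=3 distinct_cpu_sets=2 SHARED
```

On a real run the function would be fed the job's output file, for example pinning_summary out/JOBID.out, where the file name depends on the job.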

Any ideas what could cause this?

Best,
Marcus


[slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi Team,

I have an issue with the Slurm job limit. I applied the MaxJobs limit to a
user with

 sacctmgr modify user navin1 set maxjobs=3

but the limit does not seem to be applied: I am still able to submit more
jobs.
The Slurm version is 17.11.x.

Let me know what setting is required to enforce this.

Regards
Navin.


Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread Ole Holm Nielsen

On 2/17/20 11:16 AM, navin srivastava wrote:
> I have an issue with the Slurm job limit. I applied the MaxJobs limit to a
> user with
>
>   sacctmgr modify user navin1 set maxjobs=3
>
> but the limit does not seem to be applied: I am still able to submit
> more jobs.
> The Slurm version is 17.11.x.
>
> Let me know what setting is required to enforce this.


The tool "showuserlimits" tells you all user limits in the Slurm database. 
 You can download it from 
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits 
and give it a try:


$ showuserlimits -u navin1

/Ole



Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi,

Thanks for your script.
With it I can see the limit that I set, but the limit is
not working.

MaxJobs =        3, current value = 0

Regards
Navin.



Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread Ole Holm Nielsen

Hi Navin,

Why do you think the limit is not working?  MaxJobs limits the number 
of running jobs to 3, but you can still submit as many jobs as you like!


See "man sacctmgr" for definitions of the limits MaxJobs as well as 
MaxSubmitJobs.
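
To illustrate the difference: MaxJobs caps the jobs that run at the same time, while MaxSubmitJobs caps the jobs that may be pending or running at any time. The sketch below sets both for one user. The helper name set_user_job_limits, the DRY_RUN switch and the value 10 are assumptions chosen for illustration; only the sacctmgr call itself matters.

```shell
# Hypothetical helper: print (default) or run the sacctmgr call that sets
# a running-job limit and a submitted-job limit for one user.
set_user_job_limits() {
  user=$1; max_running=$2; max_submitted=$3
  set -- sacctmgr modify user where name="$user" \
         set MaxJobs="$max_running" MaxSubmitJobs="$max_submitted"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"     # dry run: only show the command
  else
    "$@"          # really modify the association (needs Slurm admin rights)
  fi
}

# At most 3 running jobs and at most 10 jobs pending or running.
DRY_RUN=1 set_user_job_limits navin1 3 10
```

With DRY_RUN=0 the command is executed instead of printed, and sacctmgr asks for confirmation before committing the change.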


/Ole



Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi Ole,

I am submitting hundreds of jobs and I see all of them start at the same
time and go into the running state.
If the MaxJobs limit is set, it should allow only 3 jobs to run at any point
in time.

Regards
Navin.





Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread Ole Holm Nielsen

Hi Navin,

I wonder if you have configured the Slurm database and the slurmdbd 
daemon?  I think the limit enforcement requires the use of the database.


What is the output of:

$ scontrol show config | grep AccountingStorageEnforce

See also https://slurm.schedmd.com/accounting.html#limit-enforcement

Limit Enforcement

Various limits and limit enforcement are described in the Resource Limits 
web page.


To enable any limit enforcement you must at least have 
AccountingStorageEnforce=limits in your slurm.conf, otherwise, even if you 
have limits set, they will not be enforced. Other options for 
AccountingStorageEnforce and the explanation for each are found on the 
Resource Limits document.
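
A small sketch of how the check above could be scripted. The function name enforces_limits is made up for illustration, and it assumes that "scontrol show config" prints the setting as "AccountingStorageEnforce = value1,value2". It succeeds only if "limits" is among the listed values.

```shell
# Hypothetical helper: read "scontrol show config" output on stdin and
# succeed only if AccountingStorageEnforce contains "limits".
enforces_limits() {
  awk -F'=' '
    $1 ~ /^[ \t]*AccountingStorageEnforce[ \t]*$/ {
      n = split($2, opts, ",")
      for (i = 1; i <= n; i++) {
        gsub(/[ \t]/, "", opts[i])
        if (tolower(opts[i]) == "limits") found = 1
      }
    }
    END { exit(found ? 0 : 1) }
  '
}

# Only query the live configuration when scontrol is available.
if command -v scontrol >/dev/null 2>&1; then
  if scontrol show config | enforces_limits; then
    echo "limits are enforced"
  else
    echo "limits are NOT enforced: add AccountingStorageEnforce=limits to slurm.conf"
  fi
fi
```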


/Ole






[slurm-users] Cluster usage with Slurm

2020-02-17 Thread Parag Khuraswar
Hi Team,

Does Slurm provide cluster usage reports like the ones described below?

Detailed reports about cluster usage statistics.
Reports for every user and job, including
monthly usage, node usage, percentage of
utilization, history tracking, number of completed,
failed, queued and running jobs, estimated delay
and average job duration.

Regards,
Parag




Re: [slurm-users] Cluster usage with Slurm

2020-02-17 Thread Ole Holm Nielsen

On 2/17/20 1:19 PM, Parag Khuraswar wrote:
> Hi Team,
>
> Does Slurm provide cluster usage reports like the ones described below?
>
> Detailed reports about cluster usage statistics.
> Reports for every user and job, including
> monthly usage, node usage, percentage of
> utilization, history tracking, number of completed,
> failed, queued and running jobs, estimated delay
> and average job duration.


Yes, see the Slurm sreport command: https://slurm.schedmd.com/sreport.html

Information about setting up the accounting database can be seen here:

https://slurm.schedmd.com/accounting.html
https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting
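
As a rough sketch of what such reports look like on the command line, the two sreport calls below cover cluster utilization and the top users for a given period. The helper names run_or_show and usage_report, the DRY_RUN switch and the dates are assumptions for illustration, and the calls only return data once the accounting database is set up.

```shell
# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run_or_show() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: two usage reports for the period start..end.
usage_report() {
  start=$1; end=$2
  # Cluster-wide utilization, reported in hours
  run_or_show sreport -t hours cluster Utilization start="$start" end="$end"
  # The ten users with the highest usage in the same period
  run_or_show sreport -t hours user TopUsage start="$start" end="$end" TopCount=10
}

# Dry run for January 2020: prints the two sreport commands.
DRY_RUN=1 usage_report 2020-01-01 2020-02-01
```

Counts of completed, failed or pending jobs are not part of these two reports; they would have to come from job-level accounting data, for example via sacct.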

/Ole



Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread navin srivastava
Hi Ole,

Thanks. After setting AccountingStorageEnforce it worked.
I am new to Slurm, so thanks for helping me.


Regards
Navin



Re: [slurm-users] Job limit in slurm.

2020-02-17 Thread Ole Holm Nielsen

Hi Navin,

Since you are new to Slurm, you may perhaps find my Slurm Wiki pages useful:
https://wiki.fysik.dtu.dk/niflheim/SLURM

These pages assume CentOS 7 Linux, but much of the information should be 
valid for other Linux variants as well.


/Ole






Re: [slurm-users] Cluster usage with Slurm

2020-02-17 Thread Chris Samuel

On 17/2/20 4:19 am, Parag Khuraswar wrote:

> Does Slurm provide cluster usage reports like the ones described below?

For the detailed info you're being asked for I'd probably suggest 
looking at the Open XDMoD project.


https://open.xdmod.org/

Its "shredder" data importer can import data from a bunch of different 
batch systems, including Slurm.


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Inconsistent cpu bindings with cpu-bind=none

2020-02-17 Thread Chris Samuel

On 17/2/20 12:48 am, Marcus Boden wrote:

> I am facing a bit of a weird issue with CPU bindings and mpirun:


I think if you want Slurm to have any control over bindings you'll be 
wanting to use srun to launch your MPI program, not mpirun.
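
A minimal sketch of what that change could look like, reusing the options from the original jobscript. It is untested and rests on several assumptions: that srun can start Intel MPI ranks on the cluster (Intel MPI usually needs I_MPI_PMI_LIBRARY to point at Slurm's PMI library, whose path is site-specific), that the installed Slurm accepts the --cpu-bind spelling (older releases use --cpu_bind), and the file name imb_srun.sbatch is made up. The snippet only writes the script and syntax-checks it; it does not submit anything.

```shell
# Write a variant of the jobscript that launches with srun instead of mpirun.
cat > imb_srun.sbatch <<'EOF'
#!/bin/bash
#SBATCH -N 20
#SBATCH --tasks-per-node=40
#SBATCH -p medium40
#SBATCH -t 30

module load impi/2019.4

# Assumption: Intel MPI must be told where Slurm's PMI library lives.
# The path is site-specific, so it is left commented out here.
# export I_MPI_PMI_LIBRARY=/path/to/libpmi.so

BENCH=/sw/comm/impi/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/IMB-MPI1

# srun takes the task count from the allocation (20 nodes x 40 tasks = 800)
# and applies the requested binding on every node itself.
srun --cpu-bind=none "$BENCH" -npmin 800 -iter 50 -time 120 -msglog 16:18 \
    -include Allreduce Bcast Barrier Exchange Gather PingPing PingPong \
    Reduce Scatter Allgather Alltoall Reduce_scatter
EOF

# Syntax-check the generated script without submitting it.
sh -n imb_srun.sbatch && echo "script OK"
```

The script would then be submitted with sbatch imb_srun.sbatch.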


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Cluster usage with Slurm

2020-02-17 Thread Paul Edmon
Also, if you want to track fairshare and other stats in Graphite, you 
can use these collectors:


https://github.com/fasrc/slurm-diamond-collector

-Paul Edmon-
