[slurm-users] Re: [ext] Restricting local disk storage of jobs

2024-02-06 Thread Hagdorn, Magnus Karl Moritz via slurm-users
Hi Tim,
we are using the job_container/tmpfs plugin to map /tmp to a local NVMe
drive, which works great. I did consider setting up directory quotas; I
thought the InitScript [1] option should do the trick. Alas, I didn't
get it to work. If I remember correctly, slurm complained about the
option being present. In the end we recommend that our users make
exclusive use of a node if they are going to use a lot of local scratch
space. I don't think this happens very often, if at all.
Regards
magnus

[1] 
https://slurm.schedmd.com/job_container.conf.html#OPT_InitScript
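
A minimal sketch of that kind of setup, with example paths rather than an
actual configuration, might look like this:

# slurm.conf (relevant lines only)
JobContainerType=job_container/tmpfs
PrologFlags=Contain              # required for the job_container/tmpfs plugin

# job_container.conf
AutoBasePath=true
BasePath=/local/nvme/slurm       # per-job /tmp directories are created under here

With this in place each job gets a private /tmp bind-mounted from the local
drive, but nothing limits how much a job can write there, which is where the
directory quota idea came in.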


On Tue, 2024-02-06 at 14:39 +0100, Tim Schneider via slurm-users wrote:
> Hi,
> 
> In our SLURM cluster, we are using the job_container/tmpfs plugin to
> ensure that each user can use /tmp and it gets cleaned up after them.
> Currently, we are mapping /tmp into the node's RAM, which means that
> the cgroups make sure that users can only use a certain amount of
> storage inside /tmp.
> 
> Now we would like to use the node's local SSD instead of its RAM to
> hold the files in /tmp. I have seen people define local storage as a
> GRES, but I am wondering how to make sure that users do not exceed
> the storage space they requested in a job. Does anyone have an idea
> how to configure local storage as a properly tracked resource?
> 
> Thanks a lot in advance!
> 
> Best,
> 
> Tim
> 
> 
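
One way to let Slurm at least account for local scratch is a count-only GRES;
the sketch below uses a made-up name "localtmp" with the count in GB, so all
names and sizes are illustrative only:

# slurm.conf
GresTypes=localtmp
NodeName=node[01-16] Gres=localtmp:1800 ...

# gres.conf on each compute node
Name=localtmp Count=1800

# job submission: ask for 100 GB of local scratch
sbatch --gres=localtmp:100 job.sh

Slurm will then track and schedule the requested amounts, but nothing stops a
job from writing more than it asked for; actual enforcement needs a filesystem
quota or similar mechanism, as discussed further down the thread.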

-- 
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
 
Campus Charité Mitte
BALTIC - Invalidenstraße 120/121
10115 Berlin
 
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de





[slurm-users] Re: [ext] Restricting local disk storage of jobs

2024-02-06 Thread Tim Schneider via slurm-users

Hi Magnus,

thanks for your reply! If you can, would you mind sharing the InitScript 
of your attempt at getting it to work?


Best,

Tim




[slurm-users] Re: [ext] Restricting local disk storage of jobs

2024-02-06 Thread Hagdorn, Magnus Karl Moritz via slurm-users
Hi Tim,
in the end the InitScript didn't contain anything useful because 

slurmd: error: _parse_next_key: Parsing error at unrecognized key:
InitScript

At this stage I gave up. This was with SLURM 23.02. My plan was to
set up the local scratch directory with XFS and then get the script to
apply a project quota, i.e. a quota attached to the directory.

I would start by checking if slurm recognises the InitScript option. 
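
A hypothetical InitScript along the lines of that plan (assuming the scratch
filesystem is mounted with the prjquota option and that the job ID is
available to the script; paths, the size limit and the use of the job ID as
project ID are illustrative only) might be:

#!/bin/bash
# Attach the per-job /tmp directory to an XFS project and cap its size.
SCRATCH_FS=/local/scratch
JOB_DIR="$SCRATCH_FS/$SLURM_JOB_ID"   # directory the plugin binds to /tmp
PROJ_ID="$SLURM_JOB_ID"               # reuse the job ID as the XFS project ID

# register the directory under the project and set a hard block limit
xfs_quota -x -c "project -s -p $JOB_DIR $PROJ_ID" "$SCRATCH_FS"
xfs_quota -x -c "limit -p bhard=100g $PROJ_ID" "$SCRATCH_FS"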

Regards
magnus







[slurm-users] Re: [ext] Restricting local disk storage of jobs

2024-02-06 Thread Tim Schneider via slurm-users

Hi Magnus,

I understand. Thanks a lot for your suggestion.

Best,

Tim


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com