[slurm-users] Re: [ext] Restricting local disk storage of jobs
Hi Tim,

we are using the job_container/tmpfs plugin to map /tmp to a local NVMe
drive, which works great. I did consider setting up directory quotas; I
thought the InitScript [1] option should do the trick. Alas, I didn't get
it to work. If I remember correctly, slurm complained about the option
being present. In the end we recommend that our users request a node
exclusively if they are going to use a lot of local scratch space. I
don't think this happens very often, if at all.

Regards
magnus

[1] https://slurm.schedmd.com/job_container.conf.html#OPT_InitScript

On Tue, 2024-02-06 at 14:39 +0100, Tim Schneider via slurm-users wrote:
> Hi,
>
> In our SLURM cluster, we are using the job_container/tmpfs plugin to
> ensure that each user can use /tmp and it gets cleaned up after them.
> Currently, we are mapping /tmp into the node's RAM, which means that
> the cgroups make sure that users can only use a certain amount of
> storage inside /tmp.
>
> Now we would like to use the node's local SSD instead of its RAM to
> hold the files in /tmp. I have seen people define local storage as
> GRES, but I am wondering how to make sure that users do not exceed
> the storage space they requested in a job. Does anyone have an idea
> how to configure local storage as a proper tracked resource?
>
> Thanks a lot in advance!
>
> Best,
> Tim

--
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
Campus Charité Mitte
BALTIC - Invalidenstraße 120/121
10115 Berlin
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
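[Editor's note] For readers finding this thread later: a minimal configuration
along the lines Magnus describes might look like the sketch below. The paths
are illustrative assumptions, not taken from his site:

```conf
# slurm.conf
JobContainerType=job_container/tmpfs

# job_container.conf
AutoBasePath=true
# A private, per-job /tmp is created under BasePath; here BasePath sits on
# the node-local NVMe drive (site-specific path, adjust to taste).
BasePath=/mnt/nvme/slurm-tmp
```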
Hi Magnus,

thanks for your reply! If you can, would you mind sharing the InitScript
from your attempt at getting it to work?

Best,
Tim

On 06.02.24 15:19, Hagdorn, Magnus Karl Moritz wrote:
> Hi Tim,
> we are using the job_container/tmpfs plugin to map /tmp to a local
> NVMe drive, which works great. I did consider setting up directory
> quotas; I thought the InitScript [1] option should do the trick.
> Alas, I didn't get it to work. [...]
Hi Tim,

in the end the InitScript didn't contain anything useful, because slurmd
rejected the option itself:

    slurmd: error: _parse_next_key: Parsing error at unrecognized key: InitScript

At this stage I gave up. This was with SLURM 23.02. My plan was to set up
the local scratch directory with XFS and then get the script to apply a
project quota, i.e. a quota attached to the directory. I would start by
checking whether slurm recognises the InitScript option.

Regards
magnus

On Tue, 2024-02-06 at 15:24 +0100, Tim Schneider wrote:
> Hi Magnus,
>
> thanks for your reply! If you can, would you mind sharing the
> InitScript from your attempt at getting it to work?
>
> Best,
> Tim
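[Editor's note] Magnus's XFS project-quota plan from his follow-up can be
sketched roughly as below. This is an untested illustration: the mount
point, the scheme of reusing the job id as the project id, and the limit
are all assumptions. By default the script only prints the xfs_quota
commands it would run; set APPLY=1 to execute them for real (which requires
root and an XFS filesystem mounted with the prjquota option):

```shell
#!/bin/sh
# Sketch: attach an XFS project quota to a job's private scratch directory,
# so the directory (not the user) carries the space limit.

# Print commands by default; only execute when APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

apply_quota() {
    # $1 = per-job directory, $2 = numeric project id, $3 = block hard limit
    mp=${MOUNT_POINT:-/mnt/nvme}
    # Tag the directory tree with the project id...
    run xfs_quota -x -c "project -s -p $1 $2" "$mp"
    # ...then cap the space that project may consume.
    run xfs_quota -x -c "limit -p bhard=$3 $2" "$mp"
}

# Example: job 4242 gets a 10 GiB hard limit on its private /tmp.
apply_quota /mnt/nvme/4242 4242 10g
```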
Hi Magnus,

I understand. Thanks a lot for your suggestion.

Best,
Tim

On 06.02.24 15:34, Hagdorn, Magnus Karl Moritz wrote:
> Hi Tim,
> in the end the InitScript didn't contain anything useful, because
> slurmd rejected the option itself:
>
>     slurmd: error: _parse_next_key: Parsing error at unrecognized key: InitScript
>
> At this stage I gave up. This was with SLURM 23.02. My plan was to
> set up the local scratch directory with XFS and then get the script
> to apply a project quota, i.e. a quota attached to the directory.
> [...]
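[Editor's note] For completeness, the GRES approach mentioned in the
original question could be sketched as below. The name "localtmp" and the
sizes are made up for illustration, and note the caveat Tim raises: a plain
GRES lets the scheduler account for the scratch space jobs request, but by
itself it does not enforce what a job actually writes; enforcement still
needs something like the XFS project quota Magnus describes:

```conf
# slurm.conf: declare a consumable GRES for local scratch, counted in MB
GresTypes=localtmp
NodeName=node[01-16] Gres=localtmp:1700000 CPUs=64 RealMemory=512000

# gres.conf on each compute node
Name=localtmp Count=1700000

# A job then requests its share of local scratch, e.g. 100 GB:
#   sbatch --gres=localtmp:100000 job.sh
```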