[slurm-users] Re: [ext] API - Specify GPUs
On Fr, 2024-07-26 at 19:34 +, jpuerto--- via slurm-users wrote:
> It does not seem that the REST API allows for folks to configure
> their jobs to utilize GPUs using the traditional methods, i.e. there
> does not appear to be an equivalent of the --gpus (or --gres) flag
> on sbatch/srun in the REST API's job submission endpoint. Can
> anyone point me towards what should be used if the version we are
> on does not support tres specifications?

I think the API only supports TRES specifications. That is certainly how I got it to work.

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
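[For the archive: a minimal sketch of a TRES-based GPU request in a slurmrestd job submission payload. Field names are taken from the v0.0.39-era OpenAPI schema and the values here (partition, paths, GPU count) are placeholders, not from this thread - check the schema shipped with your Slurm version, as it changes between releases.]

```json
{
  "job": {
    "name": "gpu-test",
    "partition": "gpu",
    "tres_per_node": "gres/gpu:2",
    "current_working_directory": "/home/user",
    "environment": ["PATH=/bin:/usr/bin"]
  },
  "script": "#!/bin/bash\nnvidia-smi"
}
```

[POSTed to an endpoint like /slurm/v0.0.39/job/submit with the usual X-SLURM-USER-NAME/X-SLURM-USER-TOKEN headers, this should be roughly equivalent to `sbatch --gres=gpu:2`.]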
[slurm-users] Re: [ext] scrontab question
Hm, strange. I don't see a problem with the time specs, although I would use */5 * * * * to run something every 5 minutes. In my scrontab I also specify a partition etc., but I don't think that is necessary.

regards
magnus

On Di, 2024-05-07 at 12:06 -0500, Sandor via slurm-users wrote:
> I am working out the details of scrontab. My initial testing is
> giving me an unsolvable question.
> Within the scrontab editor I have the following example from the
> slurm documentation:
>
> 0,5,10,15,20,25,30,35,40,45,50,55 * * * *
> /directory/subdirectory/crontest.sh
>
> When I save it, scrontab marks the line with #BAD: and I do not
> understand why. The only difference I have is the directory
> structure.
>
> Is there an underlying assumption that traditional Linux crontab is
> available to the general user?
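[For reference, a minimal scrontab sketch along the lines Magnus describes. The partition name and time limit are placeholder values, not from this thread; #SCRON lines pass sbatch-style options to the job that the following cron line defines.]

```
#SCRON -p standard
#SCRON -t 00:10:00
*/5 * * * * /directory/subdirectory/crontest.sh
```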
[slurm-users] Re: [ext] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?
On Tue, 2024-02-27 at 08:21 -0800, Brian Andrus via slurm-users wrote:
> for us, we put a load balancer in front of the login nodes with
> session affinity enabled. This makes them land on the same backend
> node each time.

Hi Brian,
that sounds interesting - how did you implement session affinity?
cheers
magnus

--
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
Campus Charité Mitte
BALTIC - Invalidenstraße 120/121
10115 Berlin
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de
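[For later readers: the thread doesn't say which balancer Brian used, but one common way to get session affinity for SSH login nodes is an L4 balancer such as HAProxy with source-IP hashing, so the same client address keeps landing on the same login node. A minimal sketch - hostnames and ports are placeholders:]

```
frontend ssh_in
    mode tcp
    bind *:22
    default_backend login_nodes

backend login_nodes
    mode tcp
    balance source        # hash the client IP -> stable backend choice
    server login1 login1.example.org:22 check
    server login2 login2.example.org:22 check
```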
[slurm-users] Re: [ext] Restricting local disk storage of jobs
Hi Tim,
in the end the InitScript didn't contain anything useful because

slurmd: error: _parse_next_key: Parsing error at unrecognized key: InitScript

At this stage I gave up. This was with SLURM 23.02. My plan was to set up the local scratch directory with XFS and then get the script to apply a project quota, i.e. a quota attached to the directory. I would start by checking if slurm recognises the InitScript option.

Regards
magnus

On Tue, 2024-02-06 at 15:24 +0100, Tim Schneider wrote:
> Hi Magnus,
>
> thanks for your reply! If you can, would you mind sharing the
> InitScript of your attempt at getting it to work?
>
> Best,
> Tim
>
> On 06.02.24 15:19, Hagdorn, Magnus Karl Moritz wrote:
> > Hi Tim,
> > we are using the container/tmpfs plugin to map /tmp to a local
> > NVMe drive, which works great. I did consider setting up directory
> > quotas. I thought the InitScript [1] option should do the trick.
> > Alas, I didn't get it to work. If I remember correctly, slurm
> > complained about the option being present. In the end we recommend
> > our users make exclusive use of a node if they are going to use a
> > lot of local scratch space. I don't think this happens very often,
> > if at all.
> > Regards
> > magnus
> >
> > [1] https://slurm.schedmd.com/job_container.conf.html#OPT_InitScript
> >
> > On Tue, 2024-02-06 at 14:39 +0100, Tim Schneider via slurm-users
> > wrote:
> > > Hi,
> > >
> > > In our SLURM cluster, we are using the job_container/tmpfs
> > > plugin to ensure that each user can use /tmp and it gets cleaned
> > > up after them. Currently, we are mapping /tmp into the node's
> > > RAM, which means that the cgroups make sure that users can only
> > > use a certain amount of storage inside /tmp.
> > >
> > > Now we would like to use the node's local SSD instead of its
> > > RAM to hold the files in /tmp. I have seen people define local
> > > storage as GRES, but I am wondering how to make sure that users
> > > do not exceed the storage space they requested in a job. Does
> > > anyone have an idea how to configure local storage as a proper
> > > tracked resource?
> > >
> > > Thanks a lot in advance!
> > >
> > > Best,
> > > Tim

--
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
Campus Charité Mitte
BALTIC - Invalidenstraße 120/121
10115 Berlin
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de
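[For the archive: a dry-run sketch of what the XFS project-quota step Magnus describes could look like in a prolog or init script. This is not the script from the thread - the mount point, the use of the job ID as the project ID, and the limit are all made-up placeholders, and the real commands would need root on an XFS filesystem mounted with the prjquota option. The function only prints the commands it would run:]

```shell
#!/bin/sh
# Print the commands that would attach an XFS project quota to a
# per-job scratch directory (assumes /local is XFS, -o prjquota).
print_quota_cmds() {
    jobid="$1"   # Slurm job ID, reused as the XFS project ID
    limit="$2"   # requested scratch size, e.g. 50g
    dir="/local/job.${jobid}"
    printf 'mkdir -p %s\n' "$dir"
    # Tag the directory tree with the project ID...
    printf "xfs_quota -x -c 'project -s -p %s %s' /local\n" "$dir" "$jobid"
    # ...then put a hard block limit on that project.
    printf "xfs_quota -x -c 'limit -p bhard=%s %s' /local\n" "$limit" "$jobid"
}

print_quota_cmds 12345 50g
```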
[slurm-users] Re: [ext] Restricting local disk storage of jobs
Hi Tim,
we are using the container/tmpfs plugin to map /tmp to a local NVMe drive which works great. I did consider setting up directory quotas. I thought the InitScript [1] option should do the trick. Alas, I didn't get it to work. If I remember correctly, slurm complained about the option being present. In the end we recommend our users make exclusive use of a node if they are going to use a lot of local scratch space. I don't think this happens very often, if at all.
Regards
magnus

[1] https://slurm.schedmd.com/job_container.conf.html#OPT_InitScript

On Tue, 2024-02-06 at 14:39 +0100, Tim Schneider via slurm-users wrote:
> Hi,
>
> In our SLURM cluster, we are using the job_container/tmpfs plugin
> to ensure that each user can use /tmp and it gets cleaned up after
> them. Currently, we are mapping /tmp into the node's RAM, which
> means that the cgroups make sure that users can only use a certain
> amount of storage inside /tmp.
>
> Now we would like to use the node's local SSD instead of its RAM to
> hold the files in /tmp. I have seen people define local storage as
> GRES, but I am wondering how to make sure that users do not exceed
> the storage space they requested in a job. Does anyone have an idea
> how to configure local storage as a proper tracked resource?
>
> Thanks a lot in advance!
>
> Best,
> Tim

--
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
Campus Charité Mitte
BALTIC - Invalidenstraße 120/121
10115 Berlin
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de
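[For readers unfamiliar with the plugin setup Magnus mentions: a minimal job_container.conf for the tmpfs plugin looks roughly like this. The BasePath is a placeholder, and JobContainerType=job_container/tmpfs must also be set in slurm.conf - see the job_container.conf man page for your version.]

```
# job_container.conf (sketch)
AutoBasePath=true
BasePath=/local/slurm-tmpfs
```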