Re: [slurm-users] [ext] Re: Cleanup of job_container/tmpfs
I just upgraded Slurm to 23.02 on our test cluster to try out the new job_container/tmpfs functionality. I can confirm it works with autofs (hurrah!), but you need to set the Shared=true option in job_container.conf.

Cheers
magnus

On Tue, 2023-03-07 at 09:19 +0100, Ole Holm Nielsen wrote:
> Hi Brian,
>
> Presumably the users' home directory is NFS automounted using autofs, and
> therefore it doesn't exist when the job starts.
>
> The job_container/tmpfs plugin ought to work correctly with autofs, but
> maybe this is still broken in 23.02?
>
> /Ole
>
> On 3/6/23 21:06, Brian Andrus wrote:
> > That looks like the users' home directory doesn't exist on the node.
> >
> > If you are not using a shared home for the nodes, your onboarding
> > process should be looked at to ensure it can handle any issues that
> > may arise.
> >
> > If you are using a shared home, you should do the above and have the
> > node ensure the shared filesystems are mounted before allowing jobs.
> >
> > -Brian Andrus
> >
> > On 3/6/2023 1:15 AM, Niels Carl W. Hansen wrote:
> > > Hi all
> > >
> > > It seems there still are some issues with the autofs -
> > > job_container/tmpfs functionality in Slurm 23.02.
> > > If the required directories aren't mounted on the allocated node(s)
> > > before job start, we get:
> > >
> > > slurmstepd: error: couldn't chdir to `/users/lutest': No such file
> > > or directory: going to /tmp instead
> > > slurmstepd: error: couldn't chdir to `/users/lutest': No such file
> > > or directory: going to /tmp instead
> > >
> > > An easy workaround, however, is to include this line in the Slurm
> > > prolog on the slurmd nodes:
> > >
> > > /usr/bin/su - $SLURM_JOB_USER -c /usr/bin/true
> > >
> > > -but there might exist a better way to solve the problem?
-- 
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
Campus Charité Virchow Klinikum
Forum 4 | Ebene 02 | Raum 2.020
Augustenburger Platz 1
13353 Berlin
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de
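For reference, a minimal job_container.conf sketch with the Shared=true option mentioned above. Shared and the other parameter names are real job_container.conf settings; the BasePath value is only a placeholder, your site's path will differ:

```
# job_container.conf - minimal sketch
AutoBasePath=true
BasePath=/var/tmp/slurm-containers   # placeholder path, adjust for your site
Shared=true                          # needed for autofs-mounted home directories
```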
Re: [slurm-users] [ext] Re: Cleanup of job_container/tmpfs
That was exactly the bit I was missing. Thank you very much, Magnus!

Best
Niels Carl

On 3/7/23 3:13 PM, Hagdorn, Magnus Karl Moritz wrote:
> I just upgraded Slurm to 23.02 on our test cluster to try out the new
> job_container/tmpfs functionality. I can confirm it works with autofs
> (hurrah!), but you need to set the Shared=true option in
> job_container.conf.
>
> [...]
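The prolog workaround from earlier in the thread can be sketched as a small script. This is only a sketch: the script path and the guard around the su call are assumptions, while the su invocation itself is the exact line quoted above. Running a no-op command as the job user starts a login shell, whose chdir into $HOME triggers the autofs mount before the job's first chdir.

```shell
#!/bin/bash
# Hypothetical Slurm prolog sketch (assumed path /etc/slurm/prolog.sh,
# wired up in slurm.conf as Prolog=/etc/slurm/prolog.sh).
# The login shell started by "su -" chdirs into the user's home
# directory, which triggers the autofs automount before the job runs.
if [ -n "$SLURM_JOB_USER" ]; then
    su - "$SLURM_JOB_USER" -c /usr/bin/true
fi
```

Slurm runs the prolog as root on each allocated node before the job starts, which is why the su call can switch to the job user here.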