[slurm-users] Re: slurm-23.11.3-1 with X11 and zram causing permission errors: error: _forkexec_slurmstepd: slurmstepd failed to send return code got 0: Resource temporarily unavailable; Requeue of Jo
Now what would be causing this? The srun just hangs and these are the only logs from slurmctld:

[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node007
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node006
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node005
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node009
[2024-02-24T23:23:26.003] error: Orphan StepId=463.extern reported on node node008
[2024-02-24T23:43:21.183] _slurm_rpc_complete_job_allocation: JobId=563 error Job/step already completing or completed
[465.extern] error: common_file_write_content: unable to open '/sys/fs/cgroup/system.slice/slurmstepd.scope/job_463/step_extern/user/cgroup.freeze' for writing: Permission denied

On Sat, Feb 24, 2024 at 12:09 PM Robert Kudyba wrote:

> Ah yes thanks for pointing that out. Hope this helps someone down the
> line...perhaps the error detection could be more explicit in slurmctld?
>
> On Sat, Feb 24, 2024, 12:07 PM Chris Samuel via slurm-users <slurm-users@lists.schedmd.com> wrote:
>
>> On 24/2/24 06:14, Robert Kudyba via slurm-users wrote:
>>
>> > For now I just set it to chmod 777 on /tmp and that fixed the errors. Is
>> > there a better option?
>>
>> Traditionally /tmp and /var/tmp have been 1777 (that "1" being the
>> sticky bit, originally invented to indicate that the OS should attempt
>> to keep a frequently used binary in memory but then adopted to indicate
>> special handling of a world writeable directory so users can only unlink
>> objects they own and not others).
>>
>> Hope that helps!
>>
>> All the best,
>> Chris

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
[slurm-users] Re: slurm-23.11.3-1 with X11 and zram causing permission errors: error: _forkexec_slurmstepd: slurmstepd failed to send return code got 0: Resource temporarily unavailable; Requeue of Jo
On 24/2/24 06:14, Robert Kudyba via slurm-users wrote:

> For now I just set it to chmod 777 on /tmp and that fixed the errors. Is
> there a better option?

Traditionally /tmp and /var/tmp have been 1777 (that "1" being the sticky bit, originally invented to indicate that the OS should attempt to keep a frequently used binary in memory but then adopted to indicate special handling of a world writeable directory so users can only unlink objects they own and not others).

Hope that helps!

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
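To make the difference between 777 and 1777 concrete, here is a minimal sketch that sets both modes on a scratch directory and prints what `stat` reports. The `scratch` directory is a hypothetical stand-in for /tmp so that the example runs without root, and `stat -c` assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: compare mode 777 with mode 1777 on a scratch directory.
# "scratch" is a hypothetical stand-in for /tmp so this runs without root.
scratch=$(mktemp -d)

chmod 777 "$scratch"
mode_open=$(stat -c '%a' "$scratch")      # world-writable, no sticky bit

chmod 1777 "$scratch"
mode_sticky=$(stat -c '%a' "$scratch")    # world-writable plus sticky bit
perms_sticky=$(stat -c '%A' "$scratch")   # symbolic form; trailing "t" marks the sticky bit

echo "without sticky bit: $mode_open"
echo "with sticky bit:    $mode_sticky ($perms_sticky)"

rmdir "$scratch"

# On a real node (as root, once /tmp is mounted) the equivalent would be:
#   chmod 1777 /tmp
```

With the sticky bit set, every user can still create files in the directory, but only a file's owner (or the directory owner, or root) can remove or rename it.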
[slurm-users] Re: slurm-23.11.3-1 with X11 and zram causing permission errors: error: _forkexec_slurmstepd: slurmstepd failed to send return code got 0: Resource temporarily unavailable; Requeue of Jo
Hi Robert,

On 2/23/24 17:38, Robert Kudyba via slurm-users wrote:

> We switched over from using systemctl for tmp.mount and change to zram,
> e.g.,
> modprobe zram
> echo 20GB > /sys/block/zram0/disksize
> mkfs.xfs /dev/zram0
> mount -o discard /dev/zram0 /tmp
[...]
> [2024-02-23T20:26:15.881] [530.extern] error: setup_x11_forward: failed to create temporary XAUTHORITY file: Permission denied

Where do you set the permissions on /tmp ? What do you set them to?

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
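One way to answer the question above on a node is to inspect the mount point right after the `mount` step. The sketch below assumes GNU `stat`; `check_tmp_mode` is a hypothetical helper name and not part of Slurm. It is exercised on a scratch directory rather than the real /tmp. The 755 starting mode is an assumption about what a freshly created filesystem's root directory commonly looks like, not something the thread states.

```shell
#!/bin/sh
# Sketch: report whether a directory has the traditional /tmp mode (1777).
# check_tmp_mode is a hypothetical helper name, not part of Slurm.
check_tmp_mode() {
    dir=$1
    mode=$(stat -c '%a' "$dir") || return 2
    if [ "$mode" = "1777" ]; then
        echo "$dir: mode $mode (ok)"
        return 0
    fi
    echo "$dir: mode $mode (expected 1777)"
    return 1
}

# Scratch directory standing in for a freshly mounted /tmp.
scratch=$(mktemp -d)

chmod 755 "$scratch"      # assumed default mode of a new filesystem root
before=0
check_tmp_mode "$scratch" || before=$?

chmod 1777 "$scratch"     # the traditional /tmp mode
after=0
check_tmp_mode "$scratch" || after=$?

rmdir "$scratch"
```

On a compute node the same helper could be called as `check_tmp_mode /tmp` after the zram filesystem is mounted, followed by `chmod 1777 /tmp` (as root) if it reports a mismatch.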