[slurm-users] scontrol issue: Update of this parameter is not supported: StepId=
I am trying to change the JobName of a running task, like this:

    scontrol update StepId=24236.0 JobName='my new name'

and getting errors in the log file like this:

    Update of this parameter is not supported: StepId=24236.0

I get the same error when I try:

    scontrol update step StepId=24239.0 JobName='my new name'

The following works to change the whole job's JobName:

    scontrol update JobID=24239 JobName='my new name'

Should updating the JobName for steps not be expected to work? Perhaps I have the syntax incorrect?
Re: [slurm-users] help with canceling or deleteing a job
👍

Best,
Feng

On Wed, Sep 20, 2023 at 7:29 AM Wagner, Marcus wrote:
> Even after rebooting, sometimes nodes are stuck because of "completing
> jobs".
>
> What helps then is to set the node down and resume it afterwards:
>
> scontrol update nodename= state=drain reason=stuck; scontrol
> update nodename= state=resume
>
> Best
> Marcus
>
> On 20.09.2023 at 09:11, Ole Holm Nielsen wrote:
> > On 9/20/23 01:39, Feng Zhang wrote:
> >> Restarting the slurmd daemon of the compute node should work, if the
> >> node is still online and normal.
> >
> > Probably not. If the filesystem used by the job is hung, the node
> > must probably be rebooted, and the filesystem must be checked.
> >
> > /Ole
> >
> >> On Tue, Sep 19, 2023 at 8:03 AM Felix wrote:
> >>>
> >>> Hello
> >>>
> >>> I have a job on my system which has been running longer than its
> >>> time limit, more than 4 days.
> >>>
> >>> 1808851 debug gridjob atlas01 CG 4-00:00:19 1 awn-047
> >>>
> >>> I'm trying to cancel it
> >>>
> >>> [@arc7-node ~]# scancel 1808851
> >>>
> >>> I get no message, as if the job was canceled, but when getting
> >>> information about the job, the job is still there
> >>>
> >>> [@arc7-node ~]# squeue | grep awn-047
> >>> 1808851 debug gridjob atlas01 CG 4-00:00:19 1 awn-047
> >>>
> >>> Can I do any other things to kill the job?
Re: [slurm-users] help with canceling or deleteing a job
Even after rebooting, sometimes nodes are stuck because of "completing jobs".

What helps then is to set the node down and resume it afterwards:

    scontrol update nodename= state=drain reason=stuck; scontrol update nodename= state=resume

Best
Marcus

On 20.09.2023 at 09:11, Ole Holm Nielsen wrote:
> On 9/20/23 01:39, Feng Zhang wrote:
>> Restarting the slurmd daemon of the compute node should work, if the
>> node is still online and normal.
>
> Probably not. If the filesystem used by the job is hung, the node
> must probably be rebooted, and the filesystem must be checked.
>
> /Ole
>
>> On Tue, Sep 19, 2023 at 8:03 AM Felix wrote:
>>> Hello
>>>
>>> I have a job on my system which has been running longer than its
>>> time limit, more than 4 days.
>>>
>>> 1808851 debug gridjob atlas01 CG 4-00:00:19 1 awn-047
>>>
>>> I'm trying to cancel it
>>>
>>> [@arc7-node ~]# scancel 1808851
>>>
>>> I get no message, as if the job was canceled, but when getting
>>> information about the job, the job is still there
>>>
>>> [@arc7-node ~]# squeue | grep awn-047
>>> 1808851 debug gridjob atlas01 CG 4-00:00:19 1 awn-047
>>>
>>> Can I do any other things to kill the job?
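
The drain-then-resume workaround described above could be wrapped in a small shell helper. This is only a minimal sketch and not something posted in the thread: the function name bounce_node and the DRY_RUN variable are hypothetical, and the node name has to be supplied by the caller. By default it only prints the two scontrol commands from the message so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): drain a node, then resume it,
# using the two scontrol commands quoted in the message above.
# DRY_RUN defaults to 1, in which case the commands are printed, not run.
bounce_node() {
    node="$1"
    reason="${2:-stuck}"    # keep the reason a single word; it is not re-quoted
    if [ -z "$node" ]; then
        echo "usage: bounce_node NODENAME [REASON]" >&2
        return 1
    fi
    for cmd in \
        "scontrol update nodename=$node state=drain reason=$reason" \
        "scontrol update nodename=$node state=resume"
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Relies on word splitting, so arguments must not contain spaces.
            $cmd || return 1
        fi
    done
}
```

Calling `bounce_node awn-047` prints both commands; setting `DRY_RUN=0` would actually run them, which needs Slurm administrator rights. As noted elsewhere in the thread, this may not help if the underlying filesystem is hung.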
Re: [slurm-users] help with canceling or deleteing a job
On 9/20/23 01:39, Feng Zhang wrote:
> Restarting the slurmd daemon of the compute node should work, if the
> node is still online and normal.

Probably not. If the filesystem used by the job is hung, the node must probably be rebooted, and the filesystem must be checked.

/Ole

> On Tue, Sep 19, 2023 at 8:03 AM Felix wrote:
>> Hello
>>
>> I have a job on my system which has been running longer than its
>> time limit, more than 4 days.
>>
>> 1808851 debug gridjob atlas01 CG 4-00:00:19 1 awn-047
>>
>> I'm trying to cancel it
>>
>> [@arc7-node ~]# scancel 1808851
>>
>> I get no message, as if the job was canceled, but when getting
>> information about the job, the job is still there
>>
>> [@arc7-node ~]# squeue | grep awn-047
>> 1808851 debug gridjob atlas01 CG 4-00:00:19 1 awn-047
>>
>> Can I do any other things to kill the job?
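
To check which jobs are still stuck in the completing (CG) state on a node, the `squeue | grep` step from the original question can be made a little more precise. The following is a rough sketch, not a command from the thread: the function name completing_jobs_on is hypothetical, and it assumes the default squeue column layout shown in the question (job ID first, state fifth, node list last) and job names without spaces.

```shell
#!/bin/sh
# Hypothetical filter (not from the thread): read squeue-style lines on stdin
# and print the job IDs that are in state CG on the given node.
# Assumes the default squeue columns:
#   JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
completing_jobs_on() {
    awk -v node="$1" '$5 == "CG" && $NF == node { print $1 }'
}

# Example with the line from the question; prints 1808851:
#   echo '1808851 debug gridjob atlas01 CG 4-00:00:19 1 awn-047' \
#       | completing_jobs_on awn-047
#
# On a live cluster the input would come from squeue itself:
#   squeue --noheader | completing_jobs_on awn-047
```

squeue can also filter directly with its `--states` and `--nodelist` options; the awk version is shown only because it works on saved output as well.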