Re: [slurm-users] associations, limits, qos
On 25/01/21 14:46, Durai Arasan wrote:
> Jobs submitted with sbatch cannot run on multiple partitions. The job
> will be submitted to the partition where it can start first. (from
> sbatch reference)

Did I misunderstand, or can heterogeneous jobs work around this limitation?

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
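[Editorial note in the digest: the following sketch is not from the original message. It shows one way a heterogeneous job can place its components in different partitions; the partition names partA/partB and the programs ./task_a and ./task_b are placeholders. On Slurm releases older than 20.02 the separator directive is "#SBATCH packjob" rather than "hetjob".]

#!/bin/bash
#SBATCH --job-name=het_example
#SBATCH --partition=partA
#SBATCH --ntasks=4
#SBATCH hetjob
#SBATCH --partition=partB
#SBATCH --ntasks=1

# Launch one program per heterogeneous component; the batch script itself
# runs on the first node of component 0.
srun --het-group=0 ./task_a &
srun --het-group=1 ./task_b &
wait

This differs from "sbatch --partition=partA,partB", which only picks whichever single partition can start the job first.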
Re: [slurm-users] Fairshare tree after SLURM upgrade
On 1/29/21 8:03 AM, Gestió Servidors wrote:
> I'm going to upgrade my SLURM version from 17.11.5 to 19.05.1. I know this is not the latest version, but I manage another cluster that is also running this version. My question is: during the process I need to upgrade "slurmdbd". Will all the fairshare tree (with rawusage, effectvusage, fairshare, etc.) be kept in the new version after upgrading?

Beware: you can upgrade by at most 2 Slurm major versions! This is well known; see a summary in
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm

All data in your database will be migrated correctly to the new Slurm version. This assumes that your upgrade process worked without errors! Older MySQL versions may have problems! Therefore it is critical to first test the database upgrade on a test system. Please see the above page for advice on upgrade testing.

/Ole
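[Editorial note in the digest: the commands below are not from Ole's message, just a minimal sketch of a database backup before the slurmdbd upgrade. The database name slurm_acct_db is the Slurm default; adjust it and the file names to your site.]

# Stop slurmdbd so no new records arrive while dumping.
systemctl stop slurmdbd

# Dump the accounting database so the upgrade can be rolled back or
# rehearsed on a test host.
mysqldump --single-transaction slurm_acct_db > slurm_acct_db_$(date +%F).sql

# On the test host, load the dump and run the new slurmdbd in the
# foreground with verbose output to watch the schema conversion:
#   slurmdbd -D -vvv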
Re: [slurm-users] how are array jobs stored in the slurmdb database?
On 1/29/21 3:51 AM, taleinterve...@sjtu.edu.cn wrote:
> The reason we need to delete job records from the database is that our billing system calculates user cost from these historical records. But after a Slurm system fault there will be some specific jobs which should not be charged. It seems the most practical solution is to modify the database directly, since Slurm does not provide a command to delete job records.

I think the sreport command is normally used to generate accounting reports. I have described this in my Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting#accounting-reports

I would like to understand how you have chosen to calculate the user cost of a given job using the sacct command. The sacct command reports accounting for each individual job, so which sacct options do you use to get the total cost value for a user with many jobs?

/Ole

-----Original Message-----
From: Ole Holm Nielsen
Sent: January 29, 2021, 0:14
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] how are array jobs stored in the slurmdb database?

On 1/28/21 11:59 AM, taleinterve...@sjtu.edu.cn wrote:
> From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.
> But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

I think you need to study how job arrays are implemented in Slurm, please read https://slurm.schedmd.com/job_array.html

You will discover that when the individual jobs of a job array start running, they become independent jobs and obtain their own unique JobIDs. It must be those JobIDs that appear in the Slurm database.

This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID, JobArrayID, JobID):

$ squeue -j 3394902 -O ArrayJobID,JobArrayID,JobID
ARRAY_JOB_ID  JOBID            JOBID
3394902       3394902_[18-91]  3394902
3394902       3394902_17       3394919
3394902       3394902_16       3394918
3394902       3394902_15       3394917
3394902       3394902_14       3394916

The last 4 jobs are running, while the first job is still pending.

Perhaps you may find my "showjob" script useful:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
In this script you can see how I work with array jobs.

I did not answer your question about how to delete array jobs from the Slurm database. But in most cases manipulating the database directly is probably a bad idea. I wonder why you want to delete jobs from the database at all?

Best regards,
Ole
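[Editorial note in the digest: not part of the thread, just a sketch of one way to total a user's usage with sacct. The user name and date range are placeholders, and charging in core-hours is an assumption; CPUTimeRAW is the core-seconds (Elapsed * AllocCPUS) recorded by Slurm accounting.]

#!/bin/bash
# Sum core-seconds for one user over a billing period and print core-hours.
USER=someuser
sacct -u "$USER" -S 2021-01-01 -E 2021-01-31 -X -n -P \
      --format=JobID,CPUTimeRAW |
  awk -F'|' '{s += $2} END {printf "core-hours: %.1f\n", s/3600}'

An alternative for aggregate per-user totals is sreport, e.g. "sreport cluster AccountUtilizationByUser start=2021-01-01 end=2021-02-01", which sums usage from the same database.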
Re: [slurm-users] how are array jobs stored in the slurmdb database?
On 1/29/21 3:51 AM, taleinterve...@sjtu.edu.cn wrote:
> Thanks for the help. The doc page is useful and we can get the actual job id now.

I'm glad that you solved the problem.

> The reason we need to delete job records from the database is that our billing system calculates user cost from these historical records. But after a Slurm system fault there will be some specific jobs which should not be charged. It seems the most practical solution is to modify the database directly, since Slurm does not provide a command to delete job records.

I understand, but I think there ought to be a way to set the charging cost (money) of specific jobs in the database to zero. I will post a separate message about that.

/Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark, Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620

-----Original Message-----
From: Ole Holm Nielsen
Sent: January 29, 2021, 0:14
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] how are array jobs stored in the slurmdb database?

On 1/28/21 11:59 AM, taleinterve...@sjtu.edu.cn wrote:
> From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.
> But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

I think you need to study how job arrays are implemented in Slurm, please read https://slurm.schedmd.com/job_array.html

You will discover that when the individual jobs of a job array start running, they become independent jobs and obtain their own unique JobIDs. It must be those JobIDs that appear in the Slurm database.

This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID, JobArrayID, JobID):

$ squeue -j 3394902 -O ArrayJobID,JobArrayID,JobID
ARRAY_JOB_ID  JOBID            JOBID
3394902       3394902_[18-91]  3394902
3394902       3394902_17       3394919
3394902       3394902_16       3394918
3394902       3394902_15       3394917
3394902       3394902_14       3394916

The last 4 jobs are running, while the first job is still pending.

Perhaps you may find my "showjob" script useful:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
In this script you can see how I work with array jobs.

I did not answer your question about how to delete array jobs from the Slurm database. But in most cases manipulating the database directly is probably a bad idea. I wonder why you want to delete jobs from the database at all?

Best regards,
Ole
[slurm-users] Fairshare tree after SLURM upgrade
Hello,

I'm going to upgrade my SLURM version from 17.11.5 to 19.05.1. I know this is not the latest version, but I manage another cluster that is also running this version. My question is: during the process I need to upgrade "slurmdbd". Will all the fairshare tree (with rawusage, effectvusage, fairshare, etc.) be kept in the new version after upgrading?

Thanks.
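[Editorial note in the digest: one way to verify this yourself, not from the original message, is to snapshot the fairshare tree with sshare before and after the upgrade and compare. The output file names are arbitrary; note that RawUsage decays over time, so small differences are expected.]

# Dump the full fairshare tree (all accounts and users, long format,
# which includes RawUsage, EffectvUsage and FairShare) before the upgrade.
sshare -a -l > fairshare_before_upgrade.txt

# ... perform the slurmdbd / slurmctld upgrade ...

sshare -a -l > fairshare_after_upgrade.txt
diff fairshare_before_upgrade.txt fairshare_after_upgrade.txt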
Re: [slurm-users] [EXT]Re: only 1 job running
Thanks for the explanation, Brian. Turning on IOMMU seems to have helped, as did adding sharing to slurm.conf:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU

Now all the CPUs are being used on all the compute nodes, so things are working as expected. Thanks to everyone else on the list who helped also, Andy, Ole, Chris, appreciate it! Looking forward to helping out where I can as well.

Brian Andrus wrote on 1/28/21 15:50:
> Yep, looks like you are on the right track. If the CPU count does not make sense to Slurm, it will drain the node and jobs will not be able to start on it. There does seem to be more to it, though. Detailed info about a job and a node would help.
>
> The 'Priority' pending jobs you can ignore; those aren't starting because another job is supposed to go first. That is the one with 'Resources' as the reason. Resources means the scheduler has allocated the resources on the node such that there aren't any left to be used.
>
> My bet here is that you aren't specifying memory. If you don't specify it, Slurm assumes all memory on the node for the job. So even if you are only using 1 CPU, all the memory is allocated, leaving none for any other job to run on the unallocated CPUs.
>
> Brian Andrus
>
> On 1/28/2021 2:15 PM, Chandler wrote:
>> Brian Andrus wrote on 1/28/21 13:59:
>>> What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc?
>> Well, the jobs are only asking for 16 CPUs each. The 255 threads is weird though; it seems to be related to this: https://askubuntu.com/questions/1182818/dual-amd-epyc-7742-cpus-show-only-255-threads
>> The vendor recommended turning on IOMMU in the BIOS, so I will try that and see if it helps.
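[Editorial note in the digest: for reference, a sketch of the relevant slurm.conf lines; the CR_CPU_Memory alternative and the DefMemPerCPU value are only suggestions, not what the poster used.]

# slurm.conf (excerpt) - allocate individual CPUs to jobs instead of whole nodes
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
# If memory should also be a consumable resource (so jobs must request it),
# use CR_CPU_Memory instead and give unlabelled jobs a sensible default:
#SelectTypeParameters=CR_CPU_Memory
#DefMemPerCPU=4000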
Re: [slurm-users] how are array jobs stored in the slurmdb database?
Thanks for the help. The doc page is useful and we can get the actual job id now.

The reason we need to delete job records from the database is that our billing system calculates user cost from these historical records. But after a Slurm system fault there will be some specific jobs which should not be charged. It seems the most practical solution is to modify the database directly, since Slurm does not provide a command to delete job records.

-----Original Message-----
From: Ole Holm Nielsen
Sent: January 29, 2021, 0:14
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] how are array jobs stored in the slurmdb database?

On 1/28/21 11:59 AM, taleinterve...@sjtu.edu.cn wrote:
> From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.
> But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

I think you need to study how job arrays are implemented in Slurm, please read https://slurm.schedmd.com/job_array.html

You will discover that when the individual jobs of a job array start running, they become independent jobs and obtain their own unique JobIDs. It must be those JobIDs that appear in the Slurm database.

This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID, JobArrayID, JobID):

$ squeue -j 3394902 -O ArrayJobID,JobArrayID,JobID
ARRAY_JOB_ID  JOBID            JOBID
3394902       3394902_[18-91]  3394902
3394902       3394902_17       3394919
3394902       3394902_16       3394918
3394902       3394902_15       3394917
3394902       3394902_14       3394916

The last 4 jobs are running, while the first job is still pending.

Perhaps you may find my "showjob" script useful:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
In this script you can see how I work with array jobs.

I did not answer your question about how to delete array jobs from the Slurm database. But in most cases manipulating the database directly is probably a bad idea. I wonder why you want to delete jobs from the database at all?

Best regards,
Ole
Re: [slurm-users] [EXT]Re: only 1 job running
Yep, looks like you are on the right track. If the CPU count does not make sense to Slurm, it will drain the node and jobs will not be able to start on it. There does seem to be more to it, though. Detailed info about a job and a node would help.

The 'Priority' pending jobs you can ignore; those aren't starting because another job is supposed to go first. That is the one with 'Resources' as the reason. Resources means the scheduler has allocated the resources on the node such that there aren't any left to be used.

My bet here is that you aren't specifying memory. If you don't specify it, Slurm assumes all memory on the node for the job. So even if you are only using 1 CPU, all the memory is allocated, leaving none for any other job to run on the unallocated CPUs.

Brian Andrus

On 1/28/2021 2:15 PM, Chandler wrote:
> Brian Andrus wrote on 1/28/21 13:59:
>> What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc?
> Well, the jobs are only asking for 16 CPUs each. The 255 threads is weird though; it seems to be related to this: https://askubuntu.com/questions/1182818/dual-amd-epyc-7742-cpus-show-only-255-threads
> The vendor recommended turning on IOMMU in the BIOS, so I will try that and see if it helps.
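[Editorial note in the digest: to illustrate Brian's point about memory, a sketch not taken from the thread; the 4G figure and program name are placeholders. When each job requests only the memory it needs, several jobs can share a node's CPUs.]

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
# Request only the memory the job actually needs; without --mem (or a
# DefMemPerCPU default in slurm.conf) the job may be granted all of the
# node's memory, blocking other jobs from the remaining CPUs.
#SBATCH --mem=4G
./my_program

The same effect can be achieved per CPU with "--mem-per-cpu", or cluster-wide with DefMemPerCPU in slurm.conf so unlabelled jobs get a sane default.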
Re: [slurm-users] [EXT]Re: only 1 job running
Brian Andrus wrote on 1/28/21 13:59:
> What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc?

Well, the jobs are only asking for 16 CPUs each. The 255 threads is weird though; it seems to be related to this: https://askubuntu.com/questions/1182818/dual-amd-epyc-7742-cpus-show-only-255-threads

The vendor recommended turning on IOMMU in the BIOS, so I will try that and see if it helps.
Re: [slurm-users] [EXT]Re: only 1 job running
You are getting close :)

You can see why n010 is able to have multiple jobs: it shows more resources available.

What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc?

Brian Andrus

On 1/28/2021 12:52 PM, Chandler wrote:
> OK, I'm getting this same output on nodes n[011-013]:
>
> # slurmd -C
> NodeName=n011
> slurmd: error: FastSchedule will be removed in 20.02, as will the FastSchedule=0 functionality. Please consider removing this from your configuration now.
> slurmd: Considering each NUMA node as a socket
> slurmd: error: Thread count (255) not multiple of core count (128)
> CPUs=255 Boards=1 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=1031878
> UpTime=86-20:59:54
>
> but on n010 it looks like:
>
> # slurmd -C
> NodeName=n010 CPUs=256 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=1031887
> UpTime=20-00:01:31
Re: [slurm-users] [EXT]Re: only 1 job running
OK, I'm getting this same output on nodes n[011-013]:

# slurmd -C
NodeName=n011
slurmd: error: FastSchedule will be removed in 20.02, as will the FastSchedule=0 functionality. Please consider removing this from your configuration now.
slurmd: Considering each NUMA node as a socket
slurmd: error: Thread count (255) not multiple of core count (128)
CPUs=255 Boards=1 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=1031878
UpTime=86-20:59:54

but on n010 it looks like:

# slurmd -C
NodeName=n010 CPUs=256 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=1031887
UpTime=20-00:01:31
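[Editorial note in the digest: not from the thread. The slurm.conf node definition would normally be made to match what "slurmd -C" reports on a healthy node; a sketch based on the n010 output above, with RealMemory rounded down slightly as a safety margin (that margin is an assumption, adjust to your site).]

# slurm.conf (excerpt) - node definition matching the hardware slurmd detects
NodeName=n[010-013] Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 RealMemory=1031000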
Re: [slurm-users] only 1 job running
Christopher Samuel wrote on 1/28/21 12:50:
> Did you restart the slurm daemons when you added the new node? Some internal data structures (bitmaps) are built based on the number of nodes and they need to be rebuilt with a restart in this situation.
> https://slurm.schedmd.com/faq.html#add_nodes

OK, this seems to have helped a little. After restarting the services, and killing the task and restarting it as well, more jobs are running. Still:

1. n[011-013] were in drain state; I had to update those to "idle".
2. n[011-013] are also now running 8 jobs each, but they should be running 16.

n010 is running 16 jobs now as expected. The squeue output looks like:

JOBID  PARTITION  NAME      USER      ST  TIME  NODES  NODELIST(REASON)
  247  defq       cromwell  smrtanal  PD  0:00      1  (Resources)
  248  defq       cromwell  smrtanal  PD  0:00      1  (Priority)
  ...
  278  defq       cromwell  smrtanal  PD  0:00      1  (Priority)
  207  defq       cromwell  smrtanal   R  5:01      1  n010
  ...
  222  defq       cromwell  smrtanal   R  5:01      1  n010
  223  defq       cromwell  smrtanal   R  4:55      1  n011
  ...
  230  defq       cromwell  smrtanal   R  4:55      1  n011
  231  defq       cromwell  smrtanal   R  4:55      1  n012
  ...
  238  defq       cromwell  smrtanal   R  4:55      1  n012
  239  defq       cromwell  smrtanal   R  4:55      1  n013
  ...
  246  defq       cromwell  smrtanal   R  4:55      1  n013

I need to get the (Priority) and (Resources) jobs running on n[011-013]...
Re: [slurm-users] [EXT]Re: only 1 job running
Ahh. On one of the new nodes, do:

slurmd -C

The output of that will tell you what those settings should be. I suspect they are off, which forces them into drain mode.

Brian Andrus

On 1/28/2021 12:25 PM, Chandler wrote:
> Andy Riebs wrote on 1/28/21 07:53:
>> If the only changes to your system have been the slurm.conf configuration and the addition of a new node, the easiest way to track this down is probably to show us the diffs between the previous and current versions of slurm.conf, and a note about what's different about the new node that you want to address.
> Well, I don't know what's different; that was months ago. Pretty sure I just added n010 to PartitionName and updated
> NodeName=n[010-013] Procs=256 CoresPerSocket=64 Sockets=2 ThreadsPerCore=2
> since I turned on multi-threading.
Re: [slurm-users] [EXT]Re: only 1 job running
On 28-01-2021 21:21, Chandler wrote:
> Brian Andrus wrote on 1/28/21 12:07:
>> scontrol update state=resume nodename=n[011-013]
> I tried that but got:
> slurm_update error: Invalid node state specified

As Chris Samuel said, you must restart the Slurm daemons when adding (or removing) nodes! See a summary in
https://wiki.fysik.dtu.dk/niflheim/SLURM#add-and-remove-nodes

/Ole
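[Editorial note in the digest: not part of Ole's message. On a systemd-based install the restart typically amounts to the following; host names, the n011 example, and the log path are assumptions to adapt to your site.]

# On the controller, after distributing the edited slurm.conf to all hosts:
systemctl restart slurmctld

# On every compute node (for example via pdsh/clush or configuration management):
systemctl restart slurmd

# Then verify that the nodes register cleanly and check any drain reason:
sinfo -N -l
scontrol show node n011 | grep -i reason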
Re: [slurm-users] [EXT]Re: only 1 job running
Andy Riebs wrote on 1/28/21 07:53:
> If the only changes to your system have been the slurm.conf configuration and the addition of a new node, the easiest way to track this down is probably to show us the diffs between the previous and current versions of slurm.conf, and a note about what's different about the new node that you want to address.

Well, I don't know what's different; that was months ago. Pretty sure I just added n010 to PartitionName and updated

NodeName=n[010-013] Procs=256 CoresPerSocket=64 Sockets=2 ThreadsPerCore=2

since I turned on multi-threading.
Re: [slurm-users] [EXT]Re: only 1 job running
Brian Andrus wrote on 1/28/21 12:07:
> scontrol update state=resume nodename=n[011-013]

I tried that but got:

slurm_update error: Invalid node state specified
Re: [slurm-users] only 1 job running
On 1/27/21 9:28 pm, Chandler wrote:
> Hi list, we have a new cluster setup with Bright Cluster Manager. Looking into a support contract there, but trying to get community support in the meantime. I'm sure things were working when the cluster was delivered, but I provisioned an additional node and now the scheduler isn't quite working right.

Did you restart the slurm daemons when you added the new node? Some internal data structures (bitmaps) are built based on the number of nodes, and they need to be rebuilt with a restart in this situation.

https://slurm.schedmd.com/faq.html#add_nodes

All the best,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
Re: [slurm-users] only 1 job running
Heh. Your nodes are drained. Do:

scontrol update state=resume nodename=n[011-013]

If they go back into a drained state, you need to look into why. That will be in the slurmctld log. You can also see it with 'sinfo -R'.

Brian Andrus

On 1/27/2021 10:18 PM, Chandler wrote:
> Made a little bit of progress by running sinfo:
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> defq*        up   infinite      3  drain n[011-013]
> defq*        up   infinite      1  alloc n010
>
> Not sure why n[011-013] are in drain state; that needs to be fixed. After some searching, I ran:
>
> scontrol update nodename=n[011-013] state=idle
>
> and now 1 additional job has started on each of n[011-013], so now 4 jobs are running but the rest are still queued. They should all be running. After some more searching, I guess resource sharing needs to be turned on? Can you help with doing that? I also attached the slurm.conf. Thanks
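[Editorial note in the digest: generic commands for finding the drain reason, not from this thread; n011 and the log path are examples, adjust to your installation.]

# Show the reason Slurm recorded for drained/down nodes
sinfo -R

# Or for a specific node, look for the Reason= field:
scontrol show node n011

# The slurmctld log usually has more detail (path varies by site):
grep -i drain /var/log/slurm/slurmctld.log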
[slurm-users] sbatch output logs get truncated
This has started happening after upgrading slurm from 20.02 to the latest 20.11. It seems like something exits too early, before slurm, or whatever else is writing that file, has a chance to flush the final output buffer to disk.

For example, take this very simple batch script, which gets submitted via sbatch:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --exclusive
set -e
echo A
echo B
sleep 5
echo C

The resulting slurm-$jobid.out file contains only

A
B

The final echo never gets written to the output file. A lot of users print a final result status at the end, which then never hits the logs, so this is a major problem for them. The scripts run to completion just fine; it's only the end of the log that is missing. For example, touching some file after the "echo C" will touch that file as expected.

The behaviour is also not at all consistent. Sometimes the output log is written as expected, with no recognizable pattern, though this seems to be the exception; the majority of the time it's truncated. This was never an issue before the recent slurm update.
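[Editorial note in the digest: not from the original report. One way to narrow this down, under the assumption that the loss happens in Slurm's output handling rather than in the job itself, is to redirect the script's output to a file you manage yourself and compare it with slurm-$jobid.out; if the self-managed file always ends with C, the job is fine and only the Slurm-managed log is truncated. A sketch, with the output file name chosen arbitrarily:]

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks=1
#SBATCH --exclusive
# Redirect all further output to a file the shell flushes and closes itself
# on exit (SLURM_JOB_ID is set by Slurm in the job environment).
exec > "my-output-${SLURM_JOB_ID}.log" 2>&1
set -e
echo A
echo B
sleep 5
echo C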
Re: [slurm-users] how are array jobs stored in the slurmdb database?
On 1/28/21 11:59 AM, taleinterve...@sjtu.edu.cn wrote:
> From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.
> But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

I think you need to study how job arrays are implemented in Slurm, please read https://slurm.schedmd.com/job_array.html

You will discover that when the individual jobs of a job array start running, they become independent jobs and obtain their own unique JobIDs. It must be those JobIDs that appear in the Slurm database.

This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID, JobArrayID, JobID):

$ squeue -j 3394902 -O ArrayJobID,JobArrayID,JobID
ARRAY_JOB_ID  JOBID            JOBID
3394902       3394902_[18-91]  3394902
3394902       3394902_17       3394919
3394902       3394902_16       3394918
3394902       3394902_15       3394917
3394902       3394902_14       3394916

The last 4 jobs are running, while the first job is still pending.

Perhaps you may find my "showjob" script useful:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
In this script you can see how I work with array jobs.

I did not answer your question about how to delete array jobs from the Slurm database. But in most cases manipulating the database directly is probably a bad idea. I wonder why you want to delete jobs from the database at all?

Best regards,
Ole
Re: [slurm-users] only 1 job running
Hi Chandler,

If the only changes to your system have been the slurm.conf configuration and the addition of a new node, the easiest way to track this down is probably to show us the diffs between the previous and current versions of slurm.conf, and a note about what's different about the new node that you want to address.

Andy

On 1/28/2021 1:18 AM, Chandler wrote:
> Made a little bit of progress by running sinfo:
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> defq*        up   infinite      3  drain n[011-013]
> defq*        up   infinite      1  alloc n010
>
> Not sure why n[011-013] are in drain state; that needs to be fixed. After some searching, I ran:
>
> scontrol update nodename=n[011-013] state=idle
>
> and now 1 additional job has started on each of n[011-013], so now 4 jobs are running but the rest are still queued. They should all be running. After some more searching, I guess resource sharing needs to be turned on? Can you help with doing that? I also attached the slurm.conf. Thanks
[slurm-users] how are array jobs stored in the slurmdb database?
Hello,

The question background is:

From a query command such as 'sacct -j 123456' I can see a series of jobs named 123456_1, 123456_2, etc. And I need to delete these job records from the mysql database for some reason.

But in the job_table of slurmdb there is only one record with id_job=123456; no record has an id like 123456_2. After I delete the id_job=123456 record, the sacct result shows the 123456_1 job disappeared, but the other jobs in the array still exist. So how are these array jobs recorded in the database? And how can all the jobs in an array be completely deleted?

Thanks.
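[Editorial note in the digest: not from the original message. A generic way to see the underlying job IDs that array tasks are stored under, using the example array job ID 123456 from above:]

# Show both the array-style ID and the raw JobID for each array task
# (allocations only, no job steps).
sacct -j 123456 -X --format=JobID,JobIDRaw,State,Elapsed

Each started array task has its own JobIDRaw, which should correspond to the id_job value found in the accounting database's job_table.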