[slurm-dev] Requested node configuration is not available
Hi, We encounter an annoying issue here. Basically within the same partition we have two types of nodes, one with 24 cores and the other one with 28 cores, so we use node feature to distinguish them, savio2_c24 and savio2_c28. Slurm is reporting the proper configuration and feature from all angles, such as sinfo, scontrol, etc. But if I try to request the resource by node feature, only the savio2_c24 node feature is being honored, savio2_c28 will give this error, srun: error: Unable to allocate resources: Requested node configuration is not available The commands that I ran were, 1. Good $ srun -p savio2 -C savio2_c24 -N1 -t 2:0 --pty bash 2. Bad $ srun -p savio2 -C savio2_c28 -N1 -t 2:0 --pty bash srun: error: Unable to allocate resources: Requested node configuration is not available $ sinfo -p savio2 -o "%N|%b"|less NODELIST|ACTIVE_FEATURES n0027.savio2,n0028.savio2,n0029.savio2,n0030.savio2,n0031.savio2,n0032.savio2,n0033.savio2,n0034.savio2,n0035.savio2,n0036.savio2,n0037.savio2,n0038.savio2,n0039.savio2,n0040.savio2,n0041.savio2,n0042.savio2,n0043.savio2,n0044.savio2,n0045.savio2,n0046.savio2,n0047.savio2,n0048.savio2,n0049.savio2,n0050.savio2,n0051.savio2,n0052.savio2,n0053.savio2,n0054.savio2,n0055.savio2,n0056.savio2,n0057.savio2,n0058.savio2,n0059.savio2,n0060.savio2,n0061.savio2,n0062.savio2,n0063.savio2,n0064.savio2,n0065.savio2,n0066.savio2,n0067.savio2,n0068.savio2,n0069.savio2,n0070.savio2,n0071.savio2,n0072.savio2,n0073.savio2,n0074.savio2,n0075.savio2,n0076.savio2,n0077.savio2,n0078.savio2,n0079.savio2,n0080.savio2,n0081.savio2,n0082.savio2,n0083.savio2,n0084.savio2,n0085.savio2,n0086.savio2,n0087.savio2,n0088.savio2,n0089.savio2,n0090.savio2,n0091.savio2,n0092.savio2,n0093.savio2,n0094.savio2,n0095.savio2,n0096.savio2,n0097.savio2,n0098.savio2,n0099.savio2,n0100.savio2,n0101.savio2,n0102.savio2,n0103.savio2,n0104.savio2,n0105.savio2,n0106.savio2,n0107.savio2,n0108.savio2,n0109.savio2,n0110.savio2,n0111.savio2,n0112.savio2,n0113.savio2,n0114.savio2,
n0115.savio2,n0116.savio2,n0117.savio2,n0118.savio2,n0119.savio2,n0120.savio2,n0121.savio2,n0122.savio2,n0123.savio2,n0124.savio2,n0125.savio2,n0126.savio2,n0127.savio2,n0128.savio2,n0129.savio2,n0130.savio2,n0131.savio2,n0132.savio2,n0133.savio2,n0134.savio2,n0135.savio2,n0136.savio2,n0137.savio2,n0138.savio2,n0139.savio2,n0140.savio2,n0141.savio2,n0142.savio2,n0143.savio2,n0144.savio2,n0145.savio2,n0146.savio2,n0147.savio2,n0148.savio2,n0149.savio2,n0150.savio2,n0151.savio2,n0152.savio2,n0153.savio2,n0154.savio2,n0155.savio2,n0156.savio2,n0157.savio2,n0158.savio2,n0159.savio2,n0160.savio2,n0161.savio2,n0162.savio2,n0183.savio2,n0184.savio2,n0185.savio2,n0186.savio2|savio2,savio2_c24 n0187.savio2,n0188.savio2,n0189.savio2,n0190.savio2,n0191.savio2,n0192.savio2,n0193.savio2,n0194.savio2,n0195.savio2,n0196.savio2,n0197.savio2,n0198.savio2,n0199.savio2,n0200.savio2,n0201.savio2,n0202.savio2|savio2,savio2_c28 $ scontrol show node n0187.savio2 NodeName=n0187.savio2 Arch=x86_64 CoresPerSocket=14 CPUAlloc=0 CPUErr=0 CPUTot=28 CPULoad=0.02 AvailableFeatures=savio2,savio2_c28 ActiveFeatures=savio2,savio2_c28 Slurmctld debugging shows, [2017-05-25T11:34:36.646] _build_node_list: No nodes satisfy job 1343450 requirements in partition savio2 [2017-05-25T11:34:36.646] _slurm_rpc_allocate_resources: Requested node configuration is not available Any suggestions on how to debug or fix this issue? Thanks, Yong Qin
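One thing worth checking (an assumption on my part, not confirmed by the output above): this error is often produced when the NodeName definition in slurm.conf does not match the topology the request implies, or when features/definitions were changed without slurmctld rereading the config. A hypothetical slurm.conf sketch for one of the 28-core nodes:

```conf
# Hypothetical slurm.conf fragment; the node name is from the scontrol output
# above, the topology values are what a 2x14-core node would need to match.
NodeName=n0187.savio2 Sockets=2 CoresPerSocket=14 ThreadsPerCore=1 Feature=savio2,savio2_c28

# After editing slurm.conf, make slurmctld reread it:
#   scontrol reconfigure
```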
[slurm-dev] Best practice for allocation/accounting management
Hi, I'm seeking advice on the best approach to managing account/user allocations. For example, we want to give each account 1M CPU hours at the beginning of the year and reset the allocation at the start of the next year, with a fraction of that going to each user, such as 100K CPU hours for the same period. When an account or user runs out of this allocation, its jobs should be put on hold. I understand that many big centers have their own allocation systems to keep track of usage for this purpose, but I am curious whether an open-source solution is available for easy use. Thanks, Yong Qin
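In case it's useful, one built-in approximation is to put TRES-minute budgets on the associations in slurmdbd and reset recorded usage yearly. A sketch with made-up account/user names (and assuming accounting via slurmdbd is already in place; note that usage decay settings interact with these limits):

```conf
# slurm.conf: enforce association limits and reset recorded usage every year.
AccountingStorageEnforce=limits,safe
PriorityUsageResetPeriod=YEARLY
```

```
# sacctmgr: 1M CPU-hours = 60M CPU-minutes for the account,
# 100K CPU-hours = 6M CPU-minutes for one user in it.
sacctmgr modify account myaccount set GrpTRESMins=cpu=60000000
sacctmgr modify user where user=myuser account=myaccount set GrpTRESMins=cpu=6000000
```

Once the limit is reached, pending jobs in that association are held with a reason such as AssocGrpCPUMinutesLimit rather than being rejected outright.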
[slurm-dev] SPANK plugin to access job info at submission stage
Hi, I'm trying to write a plugin to filter jobs at submission time (accept, or deny with an error message). I have to admit that I have not started reading the job submission plugin architecture yet; I will do that if there's really no way to implement it as a SPANK plugin. My understanding up to this point is that the most likely callback to achieve this goal is slurm_spank_init() (local context). However, at this stage there is no way to access any job-related information until the job is allocated. Ideally I would like to access the job submission line in its original form (-n 4 -t 20:0:0 --mem 2g, etc.) so that I can be as thorough as possible when parsing it. Is there any way to access that information as I describe? Thanks for shedding light on this. Yong Qin
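For what it's worth, the job_submit plugin mechanism is designed for exactly this: unlike SPANK's local context, slurm_job_submit() runs inside slurmctld with the parsed job descriptor available before allocation. A minimal job_submit.lua sketch (the 72-hour cap is a made-up example policy):

```lua
-- Minimal job_submit.lua sketch; enable with JobSubmitPlugins=lua in slurm.conf.
-- job_desc.time_limit is expressed in minutes.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.time_limit ~= nil and job_desc.time_limit > 72 * 60 then
        slurm.log_user("Denied: time limit exceeds the 72-hour site maximum")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

One caveat: this exposes the parsed job descriptor rather than the literal command line, but in practice the parsed fields are more robust to inspect than re-parsing option strings.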
[slurm-dev] Re: SPANK prolog not run via sbatch (bug?)
Hi Doug, Thanks for the tip. After reading the documentation it makes a lot of sense now. On Fri, Jul 8, 2016 at 12:00 PM, Douglas Jacobsen <dmjacob...@lbl.gov> wrote: > Hello, > > Do you have "PrologFlags=alloc" in slurm.conf? If not, you'll need it, > otherwise the privileged prologs won't run until the first step is executed > on a node. > > -Doug > > > Doug Jacobsen, Ph.D. > NERSC Computer Systems Engineer > National Energy Research Scientific Computing Center > <http://www.nersc.gov> > dmjacob...@lbl.gov > > - __o > -- _ '\<,_ > --(_)/ (_)__ > > > On Fri, Jul 8, 2016 at 11:20 AM, Yong Qin <yong@gmail.com> wrote: > >> Hi, >> >> We implemented our own private /tmp solution via a spank plugin. The >> implementation was developed and tested on 15.08.6 and it went well. >> However when we move it to the production system which we just upgraded to >> 15.08.9, it appears that the slurm_spank_job_prolog() and >> slurm_spank_task_init_privileged() functions are not executed if the job is >> submitted via sbatch but slurm_spank_job_epilog() is. This is all fine if >> the job is submitted via srun though. >> >> I tried to search it in the bugzilla but couldn't find any report on it, >> or maybe I'm not searching with the right keywords? If it is an existing >> bug can anybody provide a pointer to it? If it's not a known bug, I'm >> wondering if other sites are seeing the same behavior as we do. >> >> Thanks, >> >> Yong Qin >> > >
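For anyone else hitting this, the setting Doug refers to is a one-line slurm.conf change (followed by restarting or reconfiguring the daemons):

```conf
# slurm.conf: run the privileged prolog (including slurm_spank_job_prolog)
# at job allocation time, rather than deferring it until the first srun
# step lands on a node -- which for sbatch-only jobs may never happen
# before the batch script runs.
PrologFlags=alloc
```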
[slurm-dev] SPANK prolog not run via sbatch (bug?)
Hi, We implemented our own private /tmp solution via a spank plugin. The implementation was developed and tested on 15.08.6 and it went well. However when we move it to the production system which we just upgraded to 15.08.9, it appears that the slurm_spank_job_prolog() and slurm_spank_task_init_privileged() functions are not executed if the job is submitted via sbatch but slurm_spank_job_epilog() is. This is all fine if the job is submitted via srun though. I tried to search it in the bugzilla but couldn't find any report on it, or maybe I'm not searching with the right keywords? If it is an existing bug can anybody provide a pointer to it? If it's not a known bug, I'm wondering if other sites are seeing the same behavior as we do. Thanks, Yong Qin
[slurm-dev] Slurmdbd crashed (15.08.6)
Hi, We just had a slurmdbd crash yesterday with the following log.

[2016-02-10T07:00:20.066] error: mysql_query failed: 1030 Got error 28 from storage engine select job.job_db_inx, job.id_assoc, job.id_wckey, job.array_task_pending, job.time_eligible, job.time_start, job.time_end, job.time_suspended, job.cpus_req, job.id_resv, job.tres_alloc, SUM(step.consumed_energy) from "perceus-00_job_table" as job left outer join "perceus-00_step_table" as step on job.job_db_inx=step.job_db_inx and (step.id_step>=0) where (job.time_eligible && job.time_eligible < 1455116400 && (job.time_end >= 1455112800 || job.time_end = 0)) group by job.job_db_inx order by job.id_assoc, job.time_eligible

This was on 15.08.6. We are also seeing a bunch of errors similar to the following.

[2016-02-10T06:00:22.249] error: We have more allocated time than is possible (108445785192 > 6307200) for cluster perceus-00(1752) from 2016-02-10T05:00:00 - 2016-02-10T06:00:00 tres 2
[2016-02-10T06:00:22.262] error: We have more time than is possible (6307200+745499+0)(7052699) > 6307200 for cluster perceus-00(1752) from 2016-02-10T05:00:00 - 2016-02-10T06:00:00 tres 2

I found a bug report on the latter errors that is marked as resolved in 15.08.3 (http://bugs.schedmd.com/show_bug.cgi?id=2068), but we are on 15.08.6 and still see them. How do we fix this? Thanks, Yong Qin
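A side note on the first error, in case it helps: MySQL's "Got error 28 from storage engine" generally passes through an operating-system errno, and errno 28 on Linux is ENOSPC, i.e. the disk holding the MySQL data directory or tmpdir filled up (plausibly while materializing the temporary table for that GROUP BY rollup query, though that last part is my assumption). Decoding such codes is a one-liner, equivalent to MySQL's perror utility:

```python
import errno
import os

# MySQL "Got error NN from storage engine" messages usually surface an OS
# errno; decode it the same way the mysql `perror` utility would.
code = 28
print(errno.errorcode[code])  # symbolic name; ENOSPC on Linux
print(os.strerror(code))      # human-readable message
```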
[slurm-dev] Layout and CPU distribution
Hi, I'm investigating some job issues and would like to figure out the exact CPU distribution of a job from the accounting info. Right now SLURM does not offer something like the exec_host field in Torque, which makes this task difficult. I can only make a guess from the NodeList, AllocCPUS, and Layout fields, but after some testing I find this approach extremely unreliable. For example, on a shared cluster, if I acquire resources with --ntasks=13, I would get 8 processes running on node 0 and 5 on node 1. However, by examining the Layout of the job I see that it is registered as Cyclic instead of Block, as I would have imagined. If nodes are partially used, it may even end up in a 3/5/5 fashion, so I have no idea how many processes were actually launched on each node. So my question is: how can one recreate the CPU distribution of a job from the accounting info? This would be extremely useful for debugging a job in a shared environment after something bad has happened. If there is no way under the current framework, would it be possible to add this as an extra field in the accounting info? Thanks, Yong Qin
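Short of a new accounting field, the closest approximation I know of (a sketch of my own approach, and the per-step numbers only exist if accounting recorded them) is to pull per-step placement with `sacct -j <jobid> --format=JobID,NodeList,NTasks` and expand the compressed node list yourself. A minimal Python expander for the simple `prefix[lo-hi]suffix` case:

```python
import re

def expand_hostlist(expr):
    """Expand a simple Slurm hostlist like 'n[0027-0029].savio2,n0042.savio2'.

    A minimal sketch: handles at most one bracketed numeric range per name.
    The real Slurm hostlist syntax (comma lists inside brackets, multiple
    bracket groups) is richer than this.
    """
    hosts = []
    # Split on commas that are not inside a bracket group.
    for part in re.split(r",(?![^\[]*\])", expr):
        m = re.match(r"^(.*?)\[(\d+)-(\d+)\](.*)$", part)
        if not m:
            hosts.append(part)
            continue
        prefix, lo, hi, suffix = m.groups()
        width = len(lo)  # preserve zero-padding, e.g. n0027
        hosts.extend(f"{prefix}{i:0{width}d}{suffix}"
                     for i in range(int(lo), int(hi) + 1))
    return hosts
```

With the node list expanded, the per-node task counts can be cross-referenced against each step's NTasks; it is still an approximation, not the true exec_host record.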
[slurm-dev] Re: How does sacct honor the -S and -E option?
Ah, that makes sense now. And the -s R option is useful as well. However, I previously interpreted it as Job *currently has* an allocation instead of Job *had* an allocation for the given time range. Now it is clear. Thanks for all your help.

On Fri, Aug 23, 2013 at 12:51 AM, Bjørn-Helge Mevik b.h.me...@usit.uio.no wrote:

Yong Qin yong@gmail.com writes: Thanks, but this still doesn't make sense to me. The same job is reported in both these two commands. sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,submit,eligible,start,end sacct -a -S 2013-05-12T00:00:00 -E 2013-05-13T00:00:00 -o jobid,submit,eligible,start,end 4173 2013-05-11T23:45:26 2013-05-11T23:45:26 2013-05-12T23:03:59 2013-05-13T11:53:42

The job was pending between 2013-05-11T23:45:26 and 2013-05-12T23:03:59, which means it was eligible some time between 2013-05-11T00:00:00 and 2013-05-12T00:00:00 (namely between 2013-05-11T23:45:26 and 2013-05-12T00:00:00). Thus it should be included in the first output. It should also be included in the second output, because it was running in part of the period from 2013-05-12T00:00:00 to 2013-05-13T00:00:00 (namely between 2013-05-12T23:03:59 and 2013-05-13T00:00:00). Running is also considered eligible. I totally agree with your comment that sacct lacks a way to filter jobs that are actually within the time interval. As Danny said: add --state=RUNNING. :) -- Regards, Bjørn-Helge Mevik, dr. scient, Department for Research Computing, University of Oslo
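To summarize the semantics in code (a sketch of my understanding, not Slurm's actual implementation): a job is reported when the interval from its eligible time to its end time overlaps the [-S, -E] window, regardless of when it actually started running:

```python
from datetime import datetime

def overlaps(eligible, end, window_start, window_end):
    """sacct-style selection: report a job whose eligible-to-end span
    overlaps the query window; an unfinished job (end is None) always
    extends past the window start."""
    return eligible < window_end and (end is None or end > window_start)

t = datetime.fromisoformat
job = (t("2013-05-11T23:45:26"), t("2013-05-13T11:53:42"))  # eligible, end

# Job 4173 overlaps both one-day windows, so it shows up in both outputs.
print(overlaps(*job, t("2013-05-11T00:00:00"), t("2013-05-12T00:00:00")))  # True
print(overlaps(*job, t("2013-05-12T00:00:00"), t("2013-05-13T00:00:00")))  # True
```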
[slurm-dev] Re: How does sacct honor the -S and -E option?
Thanks, but this still doesn't make sense to me. The same job is reported in both these two commands.

sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,submit,eligible,start,end
sacct -a -S 2013-05-12T00:00:00 -E 2013-05-13T00:00:00 -o jobid,submit,eligible,start,end
4173 2013-05-11T23:45:26 2013-05-11T23:45:26 2013-05-12T23:03:59 2013-05-13T11:53:42

I totally agree with your comment that sacct lacks a way to filter jobs that are actually within the time interval. If --starttime and --endtime refer to eligible time instead of the real start and end time, that is very counter-intuitive.

On Thu, Aug 22, 2013 at 1:21 AM, Bjørn-Helge Mevik b.h.me...@usit.uio.no wrote:

Yong Qin yong@gmail.com writes: This has been puzzling me for a while. So I'm hoping somebody can clarify it for me. In short, when I use sacct -S $T1 -E $T2 I often get lots of jobs that are completely out of the range of ($T1, $T2). For example, $ sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,start,end I got a job output: 4173 2013-05-12T23:03:59 2013-05-13T11:53:42

I might be wrong, but I believe -S and -E refer to the time period a job was _eligible_ to run, not when it started and ended. Being eligible (in this context) seems to mean that it has been submitted (using #SBATCH --begin might change this), and has not ended. So a job that was pending or running between -S and -E will show up in the output. Try using -o jobid,submit,eligible,start,end and see if that makes sense. It would have been nice to have the possibility to select jobs that were _running_ (or _started_) in an interval, but I don't think it's there. -- Regards, Bjørn-Helge Mevik, dr. scient, Department for Research Computing, University of Oslo
[slurm-dev] How does sacct honor the -S and -E option?
Hi, This has been puzzling me for a while, so I'm hoping somebody can clarify it for me. In short, when I use sacct -S $T1 -E $T2 I often get lots of jobs that are completely out of the range ($T1, $T2). For example,

$ sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,start,end

I got a job in the output:

4173 2013-05-12T23:03:59 2013-05-13T11:53:42

This doesn't make sense to me. If I use the -T option it is even worse, because it modifies the end time to be earlier than the start time. For example,

4173 2013-05-12T23:03:59 2013-05-12T00:00:00

Can anybody shed some light here? We are running 2.5.7. Thanks, Yong Qin