[slurm-dev] Requested node configuration is not available

2017-05-25 Thread Yong Qin
Hi,

We are running into an annoying issue here. Within the same partition we
have two types of nodes, one with 24 cores and the other with 28, so we use
node features to distinguish them: savio2_c24 and savio2_c28. Slurm reports
the proper configuration and features everywhere we look (sinfo, scontrol,
etc.), but when we request resources by node feature, only savio2_c24 is
honored; savio2_c28 gives this error:

srun: error: Unable to allocate resources: Requested node configuration is
not available


The commands I ran were:

1. Good
$ srun -p savio2 -C savio2_c24 -N1 -t 2:0 --pty bash

2. Bad
$ srun -p savio2 -C savio2_c28 -N1 -t 2:0 --pty bash
srun: error: Unable to allocate resources: Requested node configuration is
not available


$ sinfo -p savio2 -o "%N|%b"|less
NODELIST|ACTIVE_FEATURES
n0027.savio2,n0028.savio2,n0029.savio2,n0030.savio2,n0031.savio2,n0032.savio2,n0033.savio2,n0034.savio2,n0035.savio2,n0036.savio2,n0037.savio2,n0038.savio2,n0039.savio2,n0040.savio2,n0041.savio2,n0042.savio2,n0043.savio2,n0044.savio2,n0045.savio2,n0046.savio2,n0047.savio2,n0048.savio2,n0049.savio2,n0050.savio2,n0051.savio2,n0052.savio2,n0053.savio2,n0054.savio2,n0055.savio2,n0056.savio2,n0057.savio2,n0058.savio2,n0059.savio2,n0060.savio2,n0061.savio2,n0062.savio2,n0063.savio2,n0064.savio2,n0065.savio2,n0066.savio2,n0067.savio2,n0068.savio2,n0069.savio2,n0070.savio2,n0071.savio2,n0072.savio2,n0073.savio2,n0074.savio2,n0075.savio2,n0076.savio2,n0077.savio2,n0078.savio2,n0079.savio2,n0080.savio2,n0081.savio2,n0082.savio2,n0083.savio2,n0084.savio2,n0085.savio2,n0086.savio2,n0087.savio2,n0088.savio2,n0089.savio2,n0090.savio2,n0091.savio2,n0092.savio2,n0093.savio2,n0094.savio2,n0095.savio2,n0096.savio2,n0097.savio2,n0098.savio2,n0099.savio2,n0100.savio2,n0101.savio2,n0102.savio2,n0103.savio2,n0104.savio2,n0105.savio2,n0106.savio2,n0107.savio2,n0108.savio2,n0109.savio2,n0110.savio2,n0111.savio2,n0112.savio2,n0113.savio2,n0114.savio2,n0115.savio2,n0116.savio2,n0117.savio2,n0118.savio2,n0119.savio2,n0120.savio2,n0121.savio2,n0122.savio2,n0123.savio2,n0124.savio2,n0125.savio2,n0126.savio2,n0127.savio2,n0128.savio2,n0129.savio2,n0130.savio2,n0131.savio2,n0132.savio2,n0133.savio2,n0134.savio2,n0135.savio2,n0136.savio2,n0137.savio2,n0138.savio2,n0139.savio2,n0140.savio2,n0141.savio2,n0142.savio2,n0143.savio2,n0144.savio2,n0145.savio2,n0146.savio2,n0147.savio2,n0148.savio2,n0149.savio2,n0150.savio2,n0151.savio2,n0152.savio2,n0153.savio2,n0154.savio2,n0155.savio2,n0156.savio2,n0157.savio2,n0158.savio2,n0159.savio2,n0160.savio2,n0161.savio2,n0162.savio2,n0183.savio2,n0184.savio2,n0185.savio2,n0186.savio2|savio2,savio2_c24
n0187.savio2,n0188.savio2,n0189.savio2,n0190.savio2,n0191.savio2,n0192.savio2,n0193.savio2,n0194.savio2,n0195.savio2,n0196.savio2,n0197.savio2,n0198.savio2,n0199.savio2,n0200.savio2,n0201.savio2,n0202.savio2|savio2,savio2_c28


$ scontrol show node n0187.savio2
NodeName=n0187.savio2 Arch=x86_64 CoresPerSocket=14
   CPUAlloc=0 CPUErr=0 CPUTot=28 CPULoad=0.02
   AvailableFeatures=savio2,savio2_c28
   ActiveFeatures=savio2,savio2_c28


Slurmctld debugging shows:

[2017-05-25T11:34:36.646] _build_node_list: No nodes satisfy job 1343450
requirements in partition savio2
[2017-05-25T11:34:36.646] _slurm_rpc_allocate_resources: Requested node
configuration is not available
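
For context, the node and partition definitions are along these lines (a
trimmed sketch rather than a verbatim copy of our slurm.conf; the c24
socket/core split is from memory):

NodeName=n0[027-162,183-186].savio2 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 Feature=savio2,savio2_c24
NodeName=n0[187-202].savio2 Sockets=2 CoresPerSocket=14 ThreadsPerCore=1 Feature=savio2,savio2_c28
PartitionName=savio2 Nodes=n0[027-162,183-202].savio2 State=UP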


Any suggestions on how to debug or fix this issue?

Thanks,

Yong Qin


[slurm-dev] Best practice for allocation/accounting management

2016-12-05 Thread Yong Qin
Hi,

I'm seeking advice on the best approach to managing account/user
allocations. For example, we want to give each account 1M CPU hours at the
beginning of the year and not reset it until the following year, with a
fraction of that (say 100K CPU hours) going to each user for the same
period. When an account or user runs out of its allocation, its jobs should
be put on hold. I understand that many big centers have their own
allocation systems to track usage for this purpose, but I'm curious whether
an open-source solution is available that is easy to adopt.
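
To make this concrete, the closest built-in mechanism I have found so far is
association limits via sacctmgr, along these lines (account and user names
are placeholders; 1M CPU hours = 60,000,000 CPU minutes; this also assumes
AccountingStorageEnforce includes limits/safe):

$ sacctmgr modify account where name=pc_myproject set GrpTRESMins=cpu=60000000
$ sacctmgr modify user where name=alice account=pc_myproject set GrpTRESMins=cpu=6000000

But as far as I can tell this does not handle the yearly reset by itself,
which is part of why I'm asking.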

Thanks,

Yong Qin


[slurm-dev] SPANK plugin to access job info at submission stage

2016-07-18 Thread Yong Qin
Hi,

I'm trying to write a plugin that filters jobs at submission time (accept,
or deny with an error message). I admit I have not yet read up on the job
submission plugin architecture; I will do that if there is really no way to
implement this as a SPANK plugin.

My understanding so far is that the most likely callback for this purpose is
slurm_spank_init() (local context). However, at that stage there is no way
to access any job-related information until the job is allocated. Ideally I
would like to access the job submission line in its original form (-n 4 -t
20:0:0 --mem 2g, etc.) so that I can be as thorough as possible when
parsing it. Is there any way to access that information?
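
For reference, this is roughly the skeleton I'm starting from, trimmed to
the relevant callback (a minimal sketch; the plugin name and the accept/deny
policy are placeholders):

#include <slurm/spank.h>

SPANK_PLUGIN(submit_filter, 1);

/*
 * slurm_spank_init() runs in local (srun), allocator (sbatch/salloc), and
 * remote (slurmstepd) contexts.  In local/allocator context this is where
 * I would like to inspect the request, but items such as S_JOB_ID do not
 * appear to be available yet at this point.
 */
int slurm_spank_init(spank_t sp, int ac, char **av)
{
    uint32_t jobid = 0;

    if (spank_get_item(sp, S_JOB_ID, &jobid) != ESPANK_SUCCESS)
        slurm_info("submit_filter: no job info available in this context");

    /* Placeholder: the real accept/deny logic would go here. */
    return ESPANK_SUCCESS;
}
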
Thanks for shedding some light on this.

Yong Qin


[slurm-dev] Re: SPANK prolog not run via sbatch (bug?)

2016-07-08 Thread Yong Qin
Hi Doug,

Thanks for the tip. After reading the documentation, it makes a lot of
sense now.


On Fri, Jul 8, 2016 at 12:00 PM, Douglas Jacobsen <dmjacob...@lbl.gov>
wrote:

> Hello,
>
> Do you have "PrologFlags=alloc" in slurm.conf?  If not, you'll need it,
> otherwise the privileged prologs won't run until the first step is executed
> on a node.
>
> -Doug
>
> 
> Doug Jacobsen, Ph.D.
> NERSC Computer Systems Engineer
> National Energy Research Scientific Computing Center
> <http://www.nersc.gov>
> dmjacob...@lbl.gov
>
> - __o
> -- _ '\<,_
> --(_)/  (_)__
>
>
> On Fri, Jul 8, 2016 at 11:20 AM, Yong Qin <yong@gmail.com> wrote:
>
>> Hi,
>>
>> We implemented our own private /tmp solution via a spank plugin. The
>> implementation was developed and tested on 15.08.6 and it went well.
>> However when we move it to the production system which we just upgraded to
>> 15.08.9, it appears that the slurm_spank_job_prolog() and
>> slurm_spank_task_init_privileged() functions are not executed if the job is
>> submitted via sbatch but slurm_spank_job_epilog() is. This is all fine if
>> the job is submitted via srun though.
>>
>> I tried to search it in the bugzilla but couldn't find any report on it,
>> or maybe I'm not searching with the right keywords? If it is an existing
>> bug can anybody provide a pointer to it? If it's not a known bug, I'm
>> wondering if other sites are seeing the same behavior as we do.
>>
>> Thanks,
>>
>> Yong Qin
>>
>
>


[slurm-dev] SPANK prolog not run via sbatch (bug?)

2016-07-08 Thread Yong Qin
Hi,

We implemented our own private /tmp solution via a SPANK plugin. The
implementation was developed and tested on 15.08.6 and went well. However,
when we moved it to the production system, which we just upgraded to
15.08.9, it appears that slurm_spank_job_prolog() and
slurm_spank_task_init_privileged() are not executed when the job is
submitted via sbatch, although slurm_spank_job_epilog() is. Everything is
fine when the job is submitted via srun, though.

I searched Bugzilla but couldn't find any report on this; maybe I'm not
using the right keywords? If it is an existing bug, can anybody provide a
pointer to it? If it's not a known bug, I'm wondering whether other sites
are seeing the same behavior as we do.

Thanks,

Yong Qin


[slurm-dev] Slurmdbd crashed (15.08.6)

2016-02-11 Thread Yong Qin
Hi,

We just had a slurmdbd crash yesterday with the following log.

[2016-02-10T07:00:20.066] error: mysql_query failed: 1030 Got error 28 from
storage engine
select job.job_db_inx, job.id_assoc, job.id_wckey, job.array_task_pending,
job.time_eligible, job.time_start, job.time_end, job.time_suspended,
job.cpus_req, job.id_resv, job.tres_alloc, SUM(step.consumed_energy) from
"perceus-00_job_table" as job left outer join "perceus-00_step_table" as
step on job.job_db_inx=step.job_db_inx and (step.id_step>=0) where
(job.time_eligible && job.time_eligible < 1455116400 && (job.time_end >=
1455112800 || job.time_end = 0)) group by job.job_db_inx order by
job.id_assoc, job.time_eligible

This was on 15.08.6.
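
In case it is relevant: my understanding is that MySQL's "Got error 28 from
storage engine" maps to errno 28 (no space left on device), so one obvious
thing to check is free space on the filesystem holding the MySQL data, e.g.
(the path below is just an example; use whatever datadir points to on the
slurmdbd host):

$ df -h /var/lib/mysql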

We are also seeing a bunch of errors similar to the following.

[2016-02-10T06:00:22.249] error: We have more allocated time than is
possible (108445785192 > 6307200) for cluster perceus-00(1752) from
2016-02-10T05:00:00 - 2016-02-10T06:00:00 tres 2
[2016-02-10T06:00:22.262] error: We have more time than is possible
(6307200+745499+0)(7052699) > 6307200 for cluster perceus-00(1752) from
2016-02-10T05:00:00 - 2016-02-10T06:00:00 tres 2

I see a bug report marked as resolved in 15.08.3
(http://bugs.schedmd.com/show_bug.cgi?id=2068), yet we are still seeing
this on 15.08.6. How do we fix it?

Thanks,

Yong Qin


[slurm-dev] Layout and CPU distribution

2013-11-06 Thread Yong Qin
Hi,

I'm investigating some job issues and would like to figure out the exact
CPU distribution of a job from the accounting info. Right now SLURM does
not offer anything like the exec_host field in Torque, which makes this
difficult. I can only guess from the NodeList, AllocCPUS, and Layout
fields, but after some testing I find this approach extremely unreliable.
For example, on a shared cluster, if I acquire resources with --ntasks=13,
I get 8 processes running on node 0 and 5 on node 1. However, the Layout of
the job is registered as Cyclic rather than the Block I would have
imagined. If nodes are partially used it may even end up as a 3/5/5
distribution, so I have no idea how many processes were actually launched
on each node.
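
For the record, the kind of query I am working from looks like this (the job
id is just an example):

$ sacct -j 1234567 -o JobID,NodeList,AllocCPUS,Layout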

So my question is: how can one recreate the CPU distribution of a job from
the accounting info? This would be extremely useful for debugging a job in
a shared environment after something bad has happened. If there is no way
to do this under the current framework, would it be possible to add it as
an extra field in the accounting info?

Thanks,

Yong Qin


[slurm-dev] Re: How does sacct honor the -S and -E option?

2013-08-23 Thread Yong Qin
Ah, that makes sense now.

And the -s R option is useful as well. However, I had previously
interpreted it as "job *currently has* an allocation" rather than "job had
an allocation during the given time range". Now it is clear.
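
For the archives, the form of the query that now does what I want is along
these lines (using the dates from the earlier example):

$ sacct -a -s R -S 2013-05-12T00:00:00 -E 2013-05-13T00:00:00 -o jobid,submit,eligible,start,end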

Thanks for all your help.



On Fri, Aug 23, 2013 at 12:51 AM, Bjørn-Helge Mevik
b.h.me...@usit.uio.no wrote:


 Yong Qin yong@gmail.com writes:

  Thanks, but this still doesn't make sense to me. The same job is reported
  in both these two commands.
 
  sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o
  jobid,submit,eligible,start,end
  sacct -a -S 2013-05-12T00:00:00 -E 2013-05-13T00:00:00 -o
  jobid,submit,eligible,start,end
 
  4173 2013-05-11T23:45:26 2013-05-11T23:45:26 2013-05-12T23:03:59
  2013-05-13T11:53:42

 The job was pending between 2013-05-11T23:45:26 and 2013-05-12T23:03:59,
 which means it was eligible some time between 2013-05-11T00:00:00 and
 2013-05-12T00:00:00 (namely between 2013-05-11T23:45:26 and
 2013-05-12T00:00:00).  Thus it should be included in the first output.

 It should also be included in the second output, because it was running
 in part of the period from 2013-05-12T00:00:00 to 2013-05-13T00:00:00
 (namely between 2013-05-12T23:03:59 and 2013-05-13T00:00:00).  Running
 is also considered eligible.

  I totally agree your comment on that sacct lacks on the way to filter
  jobs that are actually within the time interval.

 As Danny said: add --state=RUNNING. :)

 --
 Regards,
 Bjørn-Helge Mevik, dr. scient,
 Department for Research Computing, University of Oslo



[slurm-dev] Re: How does sacct honor the -S and -E option?

2013-08-22 Thread Yong Qin
Thanks, but this still doesn't make sense to me. The same job is reported
by both of these commands.

sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o
jobid,submit,eligible,start,end
sacct -a -S 2013-05-12T00:00:00 -E 2013-05-13T00:00:00 -o
jobid,submit,eligible,start,end

4173 2013-05-11T23:45:26 2013-05-11T23:45:26 2013-05-12T23:03:59
2013-05-13T11:53:42

I totally agree with your comment that sacct lacks a way to filter jobs
that are actually within the time interval. If --starttime and --endtime
refer to eligible time rather than the real start and end times, that is
very counter-intuitive.



On Thu, Aug 22, 2013 at 1:21 AM, Bjørn-Helge Mevik b.h.me...@usit.uio.no wrote:


 Yong Qin yong@gmail.com writes:

  This has been puzzling me for a while. So I'm hoping somebody can clarify
  it for me. In short, when I use sacct -S $T1 -E $T2 I often get lots of
  jobs that are completely out of the range of ($T1, $T2). For example,
 
  $ sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o
 jobid,start,end
 
  I got a job output:
 
  4173 2013-05-12T23:03:59 2013-05-13T11:53:42

 I might be wrong, but I believe -S and -E refer to the time period a
 job was _eligible_ to run, not when it started and ended.  Being eligible
 (in this context) seems to mean that it has been submitted (using
 #SBATCH --begin might change this), and has not ended.  So a job that
 was pending or running between -S and -E will show up in the output.

 Try using -o jobid,submit,eligible,start,end

 and see if that makes sense.

 It would have been nice to have the possibility to select jobs that were
 _running_ (or _started_) in an interval, but I don't think it's there.

 --
 Regards,
 Bjørn-Helge Mevik, dr. scient,
 Department for Research Computing, University of Oslo


[slurm-dev] How does sacct honor the -S and -E option?

2013-08-21 Thread Yong Qin
Hi,

This has been puzzling me for a while, so I'm hoping somebody can clarify
it for me. In short, when I use sacct -S $T1 -E $T2 I often get lots of
jobs that are completely outside the range ($T1, $T2). For example,

$ sacct -a -S 2013-05-11T00:00:00 -E 2013-05-12T00:00:00 -o jobid,start,end

The output included this job:

4173 2013-05-12T23:03:59 2013-05-13T11:53:42

This doesn't make sense to me. If I use the -T option it is even worse,
because the end time gets modified to be earlier than the start time. For
example,

4173 2013-05-12T23:03:59 2013-05-12T00:00:00

Can anybody shed some light here? We are running 2.5.7.

Thanks,

Yong Qin