[slurm-users] Servers in pending state

2020-03-11 Thread Zohar Roe MLM
Hello,

I have a queue with 6 servers.
When 4 of the servers are under heavy load and I send new jobs to the other 2
servers, which are free and belong to a different partition with different
features, the jobs still sit in the pending state (it can take them 20 minutes
to start running).

If I change their priority with "scontrol update", they start to run immediately.

I am guessing it takes Slurm a long time to reschedule all jobs when there is
a heavy load, so the new jobs are not checked until I change their priority.

Is there a way to tell Slurm to check all pending jobs every 2 minutes, so that
pending jobs targeted at free servers start running?

More info:
SchedulerType = sched/backfill
SchedulerParameters = bf_continue,bf_max_job_test=300
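
For reference, a hedged sketch of the slurm.conf scheduler settings that seem
relevant here (sched_interval and bf_interval control how often the main and
backfill scheduling passes run, while default_queue_depth and bf_max_job_test
bound how many pending jobs each pass examines; the values below are purely
illustrative, not a recommendation):

# illustrative values only -- see the SchedulerParameters entry in the slurm.conf man page
SchedulerType = sched/backfill
SchedulerParameters = bf_continue,bf_max_job_test=300,bf_interval=30,sched_interval=60,default_queue_depth=300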


Thanks!
Roy.



Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-11 Thread mike tie
Yep, slurmd -C is obviously getting the data from somewhere, either a local
file or from the master node. Hence my email to the group; I was hoping that
someone would just say: "yeah, modify file ". But oh well. I'll start playing
with strace and gdb later this week; looking through the source might also be
helpful.
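
For reference, a minimal sketch of that strace approach (it assumes strace is
installed on the node; the trace file path is arbitrary):

strace -f -e trace=open,openat -o /tmp/slurmd-C.trace slurmd -C
grep -v ENOENT /tmp/slurmd-C.trace | less   # files slurmd actually opened while probing the hardware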

I'm not cloning existing virtual machines with Slurm. I have access to a
VMware system that from time to time isn't running at full capacity; usage is
stable for blocks of a month or two at a time, so my thought/plan was to spin
up a Slurm compute node on it and resize it appropriately every few months
(why not put it to work). I started with 10 cores, and it looks like I can up
it to 16 cores for a while, and that's when I ran into the problem.

-mike



Michael Tie
Technical Director
Mathematics, Statistics, and Computer Science

 One North College Street    phn: 507-222-4067
 Northfield, MN 55057        cel: 952-212-8933
 m...@carleton.edu           fax: 507-222-4312


On Wed, Mar 11, 2020 at 1:15 AM Kirill 'kkm' Katsnelson 
wrote:

> On Tue, Mar 10, 2020 at 1:41 PM mike tie  wrote:
>
>> Here is the output of lstopo
>>
>
>> *$* lstopo -p
>>
>> Machine (63GB)
>>
>>   Package P#0 + L3 (16MB)
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#0
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#1
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#2
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#3
>>
>>   Package P#1 + L3 (16MB)
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#4
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#5
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#6
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#7
>>
>>   Package P#2 + L3 (16MB)
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#8
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#9
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#10
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#11
>>
>>   Package P#3 + L3 (16MB)
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#12
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#13
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#14
>>
>> L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#15
>>
>
> There is no sane way to derive the number 10 from this topology, obviously:
> it has a prime factor of 5, but everything in the lstopo output is sized in
> powers of 2 (4 packages, a.k.a. sockets, with 4 single-threaded CPU cores
> each).
>
> I responded yesterday but somehow managed to plop my signature into the
> middle of it, so maybe you missed the inline replies?
>
> It's very, very likely that the number is stored *somewhere*. First to
> eliminate is the hypothesis that the number is acquired from the control
> daemon. That's the simplest step and the largest landgrab in the
> divide-and-conquer analysis plan. Then just look where it comes from on the
> VM. strace(1) will reveal all files slurmd reads.
>
> You are not rolling out the VMs from an image, are you? I'm wondering why
> you need to tweak an existing VM that is already in a weird state. Is simply
> setting its snapshot aside and creating a new one from an image
> hard/impossible? I have not touched VMware in more than 10 years, so I may
> be a bit naive; on the platform I'm working with now (GCE), the
> create-use-drop pattern of VM use is much more common and simpler than
> creating a VM and maintaining it either *ad infinitum* or *ad nauseam*,
> whichever is reached first. But I don't know anything about VMware; maybe
> it's not possible or feasible with it.
>
>  -kkm
>
>


[slurm-users] Can some partitions be excluded from a cluster when Slurm calculates user/total usage in Fairshare algorithm?

2020-03-11 Thread Wang, Manhui
Dear All,

We have an HPC cluster with the Slurm job scheduler (17.02.8). There are several
private partitions (sponsored by several groups) and a "common" partition.
Private partitions are used exclusively by their private users, and all users
(including private users) have equal access to the "common" partition. We
currently use the Multi-factor Job Priority plugin to determine job priority.

Everything is working OK, except that private users are less favoured in the
fair-share factor than non-private users when using the "common" partition.
Private users may run many jobs on their own private partitions and only
occasionally run jobs on the "common" partition. When submitting jobs to the
"common" partition, those private users normally get a much lower fair-share
factor in job priority compared with non-private users. It appears those
private users are penalized because they have already used a large amount of
resources on the cluster (even though those resources are privately owned).

In the Fairshare algorithm
(https://slurm.schedmd.com/classic_fair_share.html), it appears that when
calculating Uuser and Utotal, the consumed processor*seconds are based on the
whole cluster. Can we exclude some partitions when calculating such consumed
resources? Is such functionality available in Slurm?

I have tried setting TRESBillingWeights="CPU=0.0,Mem=0.0,GRES/gpu=0.0" on the
private partitions. It seems such settings only affect the TRES factor, NOT the
fair-share factor.
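
For reference, a sketch of the kind of partition definitions being described
(partition, node, and group names here are hypothetical; the
TRESBillingWeights string mirrors the one quoted above):

# hypothetical slurm.conf partition lines
PartitionName=private_a Nodes=node[01-08] AllowGroups=group_a TRESBillingWeights="CPU=0.0,Mem=0.0,GRES/gpu=0.0"
PartitionName=common    Nodes=node[09-32] Default=YES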

Any suggestions?

Thanks
Manhui


[slurm-users] Upgrade paths

2020-03-11 Thread Will Dennis
Hi all,

I have one cluster running v16.05.4 that I would like to upgrade, if possible,
to 19.05.5; it was installed via a .deb package I created back in 2016. I have
located a 17.11.7 Ubuntu PPA
(https://launchpad.net/~jonathonf/+archive/ubuntu/slurm) and have myself
recently put up one for 19.05.5
(https://launchpad.net/~wdennis/+archive/ubuntu/dhpc-backports). Theoretically,
I believe I should be able to upgrade from the 16.05 release to 17.11, and then
from 17.11 to 19.05, correct? (This is going under the assumption that one can
only go forward at most 2 Slurm releases; the release sequence was 16.05 ->
17.02 -> 17.11 -> 18.08 -> 19.05, if I am correct.)
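
For reference, a rough sketch of one hop of that two-hop upgrade (package names
follow the Debian/Ubuntu slurmdbd/slurmctld/slurmd split and may differ for a
locally built .deb; the usual documented order is slurmdbd first, then
slurmctld, then the slurmds):

# hop 1 of 2: 16.05.4 -> 17.11.7 (with only the 17.11 PPA enabled); repeat for 17.11.7 -> 19.05.5
systemctl stop slurmdbd       # back up the accounting database first
apt-get install slurmdbd      # upgrade the accounting daemon first
systemctl start slurmdbd
apt-get install slurmctld     # then the controller host
apt-get install slurmd        # then each compute node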

Thanks,
Will


Re: [slurm-users] Upgrade paths

2020-03-11 Thread Renfro, Michael
The release notes at https://slurm.schedmd.com/archive/slurm-19.05.5/news.html 
indicate you can upgrade from 17.11 or 18.08 to 19.05. I didn’t find equivalent 
release notes for 17.11.7, but upgrades over one major release should work.

> On Mar 11, 2020, at 2:01 PM, Will Dennis  wrote:
> 
> Hi all,
>  
> I have one cluster running v16.05.4 that I would like to upgrade if possible 
> to 19.05.5; it was installed via a .deb package I created back in 2016. I 
> have located a 17.11.7 Ubuntu PPA 
> (https://launchpad.net/~jonathonf/+archive/ubuntu/slurm) and have myself 
> recently put up one for 19.05.5 
> (https://launchpad.net/~wdennis/+archive/ubuntu/dhpc-backports). 
> Theoretically, I believe I should be able to upgrade from the 16.05 release 
> to 17.11, then from 17.11 to 19.05, correct? (going under the assumption that 
> can only go forward at most 2 Slurm releases, which went 16.05 -> 17.02 -> 
> 17.11 -> 18.08 -> 19.05, if I am correct.)
>  
> Thanks,
> Will



Re: [slurm-users] Upgrade paths

2020-03-11 Thread Ole Holm Nielsen

On 11-03-2020 20:01, Will Dennis wrote:
I have one cluster running v16.05.4 that I would like to upgrade if 
possible to 19.05.5; it was installed via a .deb package I created back 
in 2016. I have located a 17.11.7 Ubuntu PPA 
(https://launchpad.net/~jonathonf/+archive/ubuntu/slurm) and have myself 
recently put up one for 19.05.5 
(https://launchpad.net/~wdennis/+archive/ubuntu/dhpc-backports). 
Theoretically, I believe I should be able to upgrade from the 16.05 
release to 17.11, then from 17.11 to 19.05, correct? (going under the 
assumption that can only go forward at most 2 Slurm releases, which went 
16.05 -> 17.02 -> 17.11 -> 18.08 -> 19.05, if I am correct.)


You may find some useful information collected in my Slurm Wiki page 
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm


/Ole



Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-11 Thread Kirill 'kkm' Katsnelson
Yup, I think if you get stuck so badly, the first thing is to make sure the
node does not get the number 10 from the controller, and the second is just to
reimage the VM fresh. That may not be the quickest way, but it is at least
predictable in terms of time spent.
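
A minimal sketch of that first check (the node name is hypothetical): compare
what the controller reports for the node against what slurmd detects locally.

scontrol show node vmnode01 | grep -E 'CPUTot|Sockets|CoresPerSocket|ThreadsPerCore'   # from a login/controller host
slurmd -C                                                                              # on the VM itself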

Good luck!

 -kkm

On Wed, Mar 11, 2020 at 7:28 AM mike tie  wrote:

>
> Yep, slurmd -C is obviously getting the data from somewhere, either a
> local file or from the master node.  hence my email to the group;  I was
> hoping that someone would just say:  "yeah, modify file ".  But oh
> well. I'll start playing with strace and gdb later this week;  looking
> through the source might also be helpful.
>
> I'm not cloning existing virtual machines with slurm.  I have access to a
> vmware system that from time to time isn't running at full capacity;  usage
> is stable for blocks of a month or two at a time, so my thought/plan was to
> spin up a slurm compute node  on it, and resize it appropriately every few
> months (why not put it to work).  I started with 10 cores, and it looks
> like I can up it to 16 cores for a while, and that's when I ran into the
> problem.
>
> -mike
>
>
>
> Michael Tie
> Technical Director
> Mathematics, Statistics, and Computer Science
>
>  One North College Street    phn: 507-222-4067
>  Northfield, MN 55057        cel: 952-212-8933
>  m...@carleton.edu           fax: 507-222-4312
>
>
> On Wed, Mar 11, 2020 at 1:15 AM Kirill 'kkm' Katsnelson wrote:
>
>> [Kirill's earlier reply, including the full lstopo output, is quoted
>> verbatim above in this thread and is not repeated here.]


Re: [slurm-users] slurmd -C showing incorrect core count

2020-03-11 Thread Chris Samuel

On 10/3/20 1:40 pm, mike tie wrote:


Here is the output of lstopo


Hmm, well I believe Slurm should be using hwloc (which provides lstopo) 
to get its information (at least it calls the xcpuinfo_hwloc_topo_get() 
function for that), so if lstopo works then slurmd should too.


Ah, looking a bit deeper I see in src/slurmd/common/xcpuinfo.c:

    if (!hwloc_xml_whole)
        hwloc_xml_whole = xstrdup_printf("%s/hwloc_topo_whole.xml",
                                         conf->spooldir);

Do you happen to have a file called "hwloc_topo_whole.xml" in your spool
directory on that node?  I'm wondering if an old config has been cached there.


If so, move it out of the way somewhere safe (just in case) and try again.
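
A quick sketch of that check, assuming the default SlurmdSpoolDir of
/var/spool/slurmd (check slurm.conf if it has been changed):

ls -l /var/spool/slurmd/hwloc_topo_whole.xml
mv /var/spool/slurmd/hwloc_topo_whole.xml /root/hwloc_topo_whole.xml.bak   # set it aside, just in case
slurmd -C                                                                  # re-check the detected topology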

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA