Re: [slurm-users] select/cons_res - found bug when allocating job with --cpus-per-task (-c) option on slurm 17.11.9 (fix included).

2018-09-05 Thread Kilian Cavalotti
Hi Didier,

On Wed, Sep 5, 2018 at 7:39 AM Didier GAZEN wrote:
> What do you think?

I'd recommend opening a bug at https://bugs.schedmd.com to report your
findings, if you haven't done that already.
This is the best way to get the developers' attention and get this fixed.

Cheers,
-- 
Kilian



[slurm-users] select/cons_res - found bug when allocating job with --cpus-per-task (-c) option on slurm 17.11.9 (fix included).

2018-09-05 Thread Didier GAZEN

Hi,

On a cluster made up of quad-processor nodes, I encountered the 
following issue when allocating a job for an application that has 4 
tasks, each requiring 3 processors (this is exactly the example provided 
in the --cpus-per-task section of the salloc manpage).


Here are the facts:

First, the cluster configuration: 4 nodes, 1 socket/node, 4 
cores/socket, 1 thread/core.


> sinfo -V
slurm 17.11.9-2

> sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  CPUS(A/I/O/T)  STATE  NODELIST
any*       up     2:00:00    4      0/16/0/16      idle~  n[101-104]

1) With the CR_CORE consumable resource:

> scontrol show conf|grep -i select
SelectType              = select/cons_res
SelectTypeParameters    = CR_CORE

> salloc -n4 -c3
salloc: Granted job allocation 218
> squeue
JOBID  QOS     PRIORITY  PARTITION  NAME  USER  ST  TIME  NODES  CPUS  NODELIST(REASON)
218    normal  43        any        bash  sila  R   0:03  3      10    n[101-103]

> srun hostname
srun: error: Unable to create step for job 218: More processors requested than permitted


We can see that the number of granted processors and nodes is completely 
wrong: 10 CPUs instead of 12, and only 3 nodes instead of 4. The correct 
behaviour when requesting 4 tasks (-n 4) with 3 processors per task (-c 3) 
on a cluster of quad-core nodes should be for the controller to grant an 
allocation of 4 nodes, one for each of the 4 tasks.
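
For reference, here is a minimal standalone sketch of the placement 
arithmetic I expect (illustration only, not Slurm code; the node layout 
values are the ones shown by sinfo above):

    /* Expected -n4 -c3 placement on quad-core nodes (illustrative only). */
    #include <stdio.h>

    int main(void)
    {
        int ntasks         = 4;   /* -n 4 */
        int cpus_per_task  = 3;   /* -c 3 */
        int cores_per_node = 4;   /* 1 socket x 4 cores, 1 thread/core */

        /* With CR_CORE, each task needs cpus_per_task cores, so only
         * floor(cores_per_node / cpus_per_task) tasks fit per node. */
        int tasks_per_node = cores_per_node / cpus_per_task;          /* = 1 */
        int nodes = (ntasks + tasks_per_node - 1) / tasks_per_node;   /* = 4 */
        int cpus  = ntasks * cpus_per_task;                           /* = 12 */

        printf("expected: %d nodes, %d cpus\n", nodes, cpus);
        return 0;
    }

This prints "expected: 4 nodes, 12 cpus", which is exactly what the 
--tasks-per-node=1 workaround below produces.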


Note that when specifying --tasks-per-node=1, the behaviour is correct:

> salloc -n4 -c3 --tasks-per-node=1
salloc: Granted job allocation 221
> squeue
JOBID  QOS     PRIORITY  PARTITION  NAME  USER  ST  TIME  NODES  CPUS  NODELIST(REASON)
221    normal  43        any        bash  sila  R   0:03  4      12    n[101-104]

> srun hostname
n101
n103
n102
n104

2) With the CR_SOCKET consumable resource:

> scontrol show conf|grep -i select
SelectType              = select/cons_res
SelectTypeParameters    = CR_SOCKET

> salloc -n4 -c3
salloc: Granted job allocation 226
> squeue
JOBID  QOS     PRIORITY  PARTITION  NAME  USER  ST  TIME  NODES  CPUS  NODELIST(REASON)
226    normal  43        any        bash  sila  R   0:02  3      12    n[101-103]


Here, slurm allocates the right number of processors (12) but the number 
of allocated nodes is wrong: 3 instead of 4. As a result, 2 tasks end up 
on the same node (n101):


> srun hostname
n102
n101
n101
n103

Again, when specifying --tasks-per-node=1, the behaviour is correct:

> salloc -n4 -c3 --tasks-per-node=1
salloc: Granted job allocation 230
> squeue
JOBID  QOS     PRIORITY  PARTITION  NAME  USER  ST  TIME  NODES  CPUS  NODELIST(REASON)
230    normal  43        any        bash  sila  R   0:03  4      16    n[101-104]


Note that 16 processors have been allocated instead of 12, but this is 
correct because slurm is configured with the CR_Socket consumable 
resource (each node has 1 socket and 4 cores/socket), so allocation is 
done in units of whole sockets; a small worked sketch of this accounting 
follows the srun output below. The srun output is as expected:


sila@master2-l422:~> srun hostname
n101
n102
n103
n104
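
To make the 16-CPU figure above explicit, here is the same kind of 
standalone sketch for socket-granular accounting (illustration only, not 
Slurm code):

    /* CR_SOCKET accounting sketch: the allocation unit is a whole socket,
     * so a 3-CPU task still consumes all 4 cores of its socket. */
    #include <stdio.h>

    int main(void)
    {
        int ntasks = 4, cpus_per_task = 3;
        int sockets_per_node = 1, cores_per_socket = 4;

        /* Only floor(4 / 3) = 1 task fits per socket, so each task needs
         * its own socket, i.e. its own node here. */
        int tasks_per_socket = cores_per_socket / cpus_per_task;                   /* = 1 */
        int sockets_needed   = (ntasks + tasks_per_socket - 1) / tasks_per_socket; /* = 4 */
        int nodes = (sockets_needed + sockets_per_node - 1) / sockets_per_node;    /* = 4 */
        int cpus  = sockets_needed * cores_per_socket;                             /* = 16 */

        printf("expected with CR_SOCKET: %d nodes, %d cpus\n", nodes, cpus);
        return 0;
    }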

3) Conclusion and fix:

I did not think that the --tasks-per-node option should be mandatory to 
obtain the right behaviour, so I did some investigating.


I think a bug was introduced when the allocation code for CR_Socket and 
CR_Core was unified (commit 6fa3d5ad), in Step 3 of the _allocate_sc() 
function (src/plugins/select/cons_res/job_test.c), where avail_cpus is 
computed:


src/plugins/select/cons_res/job_test.c, _allocate_sc(...):

    if (cpus_per_task < 2) {
        avail_cpus = num_tasks;
    } else if ((ntasks_per_core == 1) &&
               (cpus_per_task > threads_per_core)) {
        /* find out how many cores a task will use */
        int task_cores = (cpus_per_task + threads_per_core - 1) /
                         threads_per_core;
        int task_cpus  = task_cores * threads_per_core;
        /* find out how many tasks can fit on a node */
        int tasks = avail_cpus / task_cpus;
        /* how many cpus the job would use on the node */
        avail_cpus = tasks * task_cpus;
        /* subtract out the extra cpus. */
        avail_cpus -= (tasks * (task_cpus - cpus_per_task));
    } else {
        j = avail_cpus / cpus_per_task;
        num_tasks = MIN(num_tasks, j);
        if (job_ptr->details->ntasks_per_node)   /* <- problem */
            avail_cpus = num_tasks * cpus_per_task;
    }


The 'if (job_ptr->details->ntasks_per_node)' condition marked above as 
'problem' prevents avail_cpus from being computed correctly when 
ntasks_per_node is NOT specified on the command line (and 
cpus_per_task > 1). Before the _allocate_sockets and _allocate_cores 
functions were unified into _allocate_sc, this condition was only
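
To see concretely what that guard changes, here is a standalone replay 
of the Step 3 arithmetic for one idle quad-core node with -n4 -c3 (a 
minimal sketch with assumed input values, not Slurm code and not 
necessarily the exact patch):

    /* Replay of the Step 3 arithmetic above for one idle quad-core node
     * (illustration only; the input values are assumed from the cluster
     * described earlier). */
    #include <stdio.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    static int step3_avail_cpus(int avail_cpus, int num_tasks,
                                int cpus_per_task, int ntasks_per_node)
    {
        int j = avail_cpus / cpus_per_task;   /* tasks that fit: 4 / 3 = 1 */
        num_tasks = MIN(num_tasks, j);
        if (ntasks_per_node)                  /* the guard marked "problem" */
            avail_cpus = num_tasks * cpus_per_task;
        return avail_cpus;
    }

    int main(void)
    {
        /* Without --ntasks-per-node the guard is false, avail_cpus stays
         * at 4 and the node looks able to host more than one 3-CPU task;
         * with the option it is correctly trimmed to 3. */
        printf("option unset: avail_cpus = %d\n", step3_avail_cpus(4, 4, 3, 0));
        printf("option set  : avail_cpus = %d\n", step3_avail_cpus(4, 4, 3, 1));
        return 0;
    }

Making that assignment unconditional (i.e. dropping the guard) would make 
the unset case also return 3, which matches the behaviour described as 
correct above.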

Re: [slurm-users] Configuration issue on Ubuntu

2018-09-05 Thread Chris Samuel
On Wednesday, 5 September 2018 5:48:25 PM AEST Gennaro Oliva wrote:

> It is possible that Umut installed the slurm-wlm-emulator package
> together with the regular package and the emulated daemon was picked up
> by the alternatives system.

That sounds eminently possible; great catch, Gennaro!

Ah, just noticed you're the Debian package maintainer for Slurm. :-)

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Configuration issue on Ubuntu

2018-09-05 Thread John Hearns
Following on from what Chris Samuel says,
/root/sl/sl2 kinda suggests Scientific Linux to me (SL, a Red Hat-like
distribution used by Fermilab and CERN).
Or it could just be sl = slurm.

I would run ldd `which slurmctld` and let us know what libraries it is
linked to.




On Wed, 5 Sep 2018 at 08:51, Gennaro Oliva wrote:

> Hi Chris,
>
> On Wed, Aug 29, 2018 at 07:04:27AM +1000, Chris Samuel wrote:
> > On Tuesday, 28 August 2018 11:43:54 PM AEST Umut Arus wrote:
> >
> > > It seems the main problem is; slurmctld: fatal: No front end nodes defined
> >
> > Frontend nodes are for IBM BlueGene and Cray systems where you cannot run
> > slurmd on the compute nodes themselves so a proxy system must be used instead
> > (at $JOB-1 we used this on our BG/Q system).  I strongly suspect you are not
> > running on either of those!
>
> The option --enable-front-end to configure is also needed to emulate a
> really large cluster:
>
> https://slurm.schedmd.com/faq.html#multi_slurmd
>
> > If you built Slurm yourself you'll need to check you didn't use those
> > arguments by mistake or configure didn't enable them in error, and if this
> > is an Ubuntu package then it's probably a bug in how they packaged it!
>
> This option is enabled only in the slurmctld daemon contained in the
> slurm-wlm-emulator package, which is not intended to be used for batch
> jobs.
>
> vagrant@ubuntu-bionic:~$ grep 'No front end nodes defined' /usr/sbin/slurmctld-wlm-emulator
> Binary file /usr/sbin/slurmctld-wlm-emulator matches
> vagrant@ubuntu-bionic:~$ grep 'No front end nodes defined' /usr/sbin/slurmctld-wlm
> vagrant@ubuntu-bionic:~$
>
> It is possible that Umut installed the slurm-wlm-emulator package
> together with the regular package and the emulated daemon was picked up
> by the alternatives system.
>
> Best regards,
> --
> Gennaro Oliva
>
>