Re: [slurm-users] How to force jobs to run next in queue

2019-03-12 Thread Bjørn-Helge Mevik
Sean Brisbane  writes:

> Does anyone have a feeling for why setting a high Priority on a partition
> makes jobs run in that partition first, even though a job in a different
> partition may have a much higher overall priority?

Perhaps because that is what it was designed to do?  Did you try using
PriorityJobFactor instead, as I suggested?

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo




Re: [slurm-users] weight setting not working

2019-03-12 Thread Andy Leung Yin Sui
Thank you for your reply. I was running 18.08.1 and updated to
18.08.6. Everything was solved. Thank you.

On Tue, 12 Mar 2019 at 20:23, Eli V  wrote:
>
> On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui  wrote:
> >
> > Hi,
> >
> > I am new to slurm and want to use the weight option to schedule jobs.
> > I have some machines with the same hardware configuration and GPU cards.
> > I use a QOS to force users to request at least 1 GPU GRES when
> > submitting jobs.
> > The machines serve multiple partitions.
> > What I want is to consume the dedicated nodes first when scheduling
> > gpu_2h partition jobs, by adding weight settings (e.g. schedule to
> > GPU38/39 rather than 36/37). However, the scheduler does not follow the
> > weight settings and schedules to 36/37 (e.g. srun -p gpu_2h).
> > All the GPU nodes are idle and the billing is the same; did I miss
> > something? Is there some limitation when nodes serve multiple
> > partitions or consume GRES? Please advise. Thank you very much.
> >
> > Below are the settings, which may help.
> > slurm.conf
> > NodeName=gpu[36-37] Gres=gpu:titanxp:4  ThreadsPerCore=2 State=unknown
> > Sockets=2  CPUs=40 CoresPerSocket=10 Weight=20
> > NodeName=gpu[38-39] Gres=gpu:titanxp:4  ThreadsPerCore=2 State=unknown
> > Sockets=2  CPUs=40 CoresPerSocket=10 Weight=1
> >
> >
> > PartitionName=gpu_2h Nodes=gpu[36-39] Default=YES MaxTime=02:00:00
> > DefaultTime=02:00:00 MaxNodes=1 State=UP AllowQos=GPU
> > PartitionName=gpu_8h Nodes=gpu[31-37] MaxTime=08:00:00
> > DefaultTime=08:00:00  MaxNodes=1 State=UP AllowQos=GPU
> >
> >
> > # sinfo -N -O nodelist,partition,gres,weight
> >
> >
> > NODELIST    PARTITION   GRES            WEIGHT
> > gpu36       gpu_2h*     gpu:titanxp:4   20
> > gpu36       gpu_8h      gpu:titanxp:4   20
> > gpu37       gpu_2h*     gpu:titanxp:4   20
> > gpu37       gpu_8h      gpu:titanxp:4   20
> > gpu38       gpu_2h*     gpu:titanxp:4   1
> > gpu39       gpu_2h*     gpu:titanxp:4   1
> >
>
> You didn't mention the version of Slurm you are using. Weights are
> known to be broken in early versions of 18.08. I think it was fixed in
> 18.08.04, but you'd have to go back and read the release notes to
> confirm.
>



Re: [slurm-users] problems with slurm and openmpi

2019-03-12 Thread Gilles Gouaillardet

Rick,


The issue is that your SLURM installation only provides PMI2 support, 
while it seems your Open MPI build only supports PMIx.



One option is to rebuild SLURM with PMIx as explained by Daniel, and then

srun --mpi=pmix ...


If you do not want to (or cannot) rebuild SLURM, you can use the older PMI 
or PMI2 interface.


In that case, you have to rebuild Open MPI and pass --with-pmi to the 
configure command line



and then

srun --mpi=pmi2 ...

(or srun --mpi=pmi ...)


Finally, you can

scontrol show config | grep MpiDefault


and have your sysadmin update this so a simple

srun 

will run without any --mpi=... parameter
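
A rough sketch of the second and third options, with the Slurm PMI headers
and libraries assumed to live under the /usr prefix (adjust the paths to
wherever they actually are on your system):

    # Option 2: rebuild Open MPI against Slurm's PMI/PMI2 libraries
    ./configure --with-slurm --with-pmi=/usr
    make && make install

    # then launch with the PMI2 plugin
    srun --mpi=pmi2 ./hellompi

    # Option 3: check the cluster-wide default, and have the admin set it in
    # slurm.conf, e.g. MpiDefault=pmi2 (or MpiDefault=pmix after a PMIx rebuild)
    scontrol show config | grep MpiDefault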


Cheers,


Gilles

On 3/13/2019 5:53 AM, Riccardo Veraldi wrote:

Hello,
after trying hard for over 10 days I am forced to write to the list.
I am not able to get SLURM to work with Open MPI. Open MPI-compiled 
binaries won't run on slurm, while all non-Open MPI programs run just 
fine under "srun". I am using SLURM 18.08.5, building the rpm from the 
tarball: rpmbuild -ta slurm-18.08.5-2.tar.bz2
Prior to building SLURM I installed Open MPI 4.0.0, which has built-in 
PMIx support. The PMIx libraries are in /usr/lib64/pmix/, which is the 
default installation path.


The problem is that hellompi does not work if I launch it from srun; 
of course it runs outside slurm.


[psanagpu105:10995] OPAL ERROR: Not initialized in file 
pmix3x_client.c at line 113

--
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[psanagpu105:10995] Local abort before MPI_INIT completed completed 
successfully, but am not able to aggregate error messages, and not 
able to guarantee that all other processes were killed!

srun: error: psanagpu105: task 0: Exited with exit code 1

I really have no clue. I even reinstalled Open MPI on a specific 
different path, /opt/openmpi/4.0.0.
Anyway, it seems like slurm does not know how to find the MPI libraries 
even though they are there, and right now in the default path /usr/lib64.


even using --mpi=pmi2 or --mpi=openmpi does not fix the problem and 
the same error message is given to me.

srun --mpi=list
srun: MPI types are...
srun: none
srun: openmpi
srun: pmi2


Any hint on how I could fix this problem?
thanks a lot

Rick






Re: [slurm-users] problems with slurm and openmpi

2019-03-12 Thread Daniel Letai

  
  
Hi.

On 12/03/2019 22:53:36, Riccardo Veraldi wrote:

> Hello,
> after trying hard for over 10 days I am forced to write to the list.
> I am not able to get SLURM to work with Open MPI. Open MPI-compiled
> binaries won't run on slurm, while all non-Open MPI programs run just
> fine under "srun". I am using SLURM 18.08.5, building the rpm from the
> tarball: rpmbuild -ta slurm-18.08.5-2.tar.bz2
> Prior to building SLURM I installed Open MPI 4.0.0, which has built-in
> PMIx support. The PMIx libraries are in /usr/lib64/pmix/, which is the
> default installation path.
>
> The problem is that hellompi does not work if I launch it from srun;
> of course it runs outside slurm.
>
> [psanagpu105:10995] OPAL ERROR: Not initialized in file
> pmix3x_client.c at line 113
> --
> The application appears to have been direct launched using "srun",
> but OMPI was not built with SLURM's PMI support and therefore cannot
> execute. There are several options for building PMI support under

I would guess (but having the config.log files would verify it) that you
should rebuild Slurm --with-pmix and then rebuild OpenMPI --with-slurm.

Currently there might be a bug in Slurm's configure script when building
PMIx support without an explicit path, so you might either modify the spec
before building (add --with-pmix=/usr to the configure section) or, for
testing purposes, run ./configure --with-pmix=/usr; make; make install.

It seems your current configuration has a built-in mismatch: Slurm only
supports PMI2, while OpenMPI only supports PMIx. You should build with at
least one common PMI: either external PMIx when building Slurm, or Slurm's
PMI2 when building OpenMPI.

However, I would have expected the non-PMI option (srun --mpi=openmpi) to
work even in your environment, and Slurm should have built PMIx support
automatically since it is in the default search path.

> SLURM, depending upon the SLURM version you are using:
>
>   version 16.05 or later: you can use SLURM's PMIx support. This
>   requires that you configure and build SLURM --with-pmix.
>
>   Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>   PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>   install PMI-2. You must then build Open MPI using --with-pmi pointing
>   to the SLURM PMI library location.
>
> Please configure as appropriate and try again.
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [psanagpu105:10995] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able
> to guarantee that all other processes were killed!
> srun: error: psanagpu105: task 0: Exited with exit code 1
>
> I really have no clue. I even reinstalled Open MPI on a specific
> different path, /opt/openmpi/4.0.0.
> Anyway, it seems like slurm does not know how to find the MPI libraries
> even though they are there, and right now in the default path /usr/lib64.
>
> Even using --mpi=pmi2 or --mpi=openmpi does not fix the problem, and
> the same error message is given to me.
> srun --mpi=list
> srun: MPI types are.
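
For reference, a minimal sketch of the rebuild Daniel describes, assuming
PMIx is installed under the /usr prefix (libraries in /usr/lib64/pmix/) as
in Riccardo's setup; this is untested, so adjust paths and versions to your
site:

    # Slurm: build with external PMIx support, passing the prefix explicitly
    # to work around the autodetection problem mentioned above
    ./configure --with-pmix=/usr
    make && make install

    # Open MPI: build against the same external PMIx, with Slurm support
    ./configure --with-slurm --with-pmix=/usr
    make && make install

    # Launch through the PMIx plugin
    srun --mpi=pmix ./hellompi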

[slurm-users] Resolution! was Re: Mysterious job terminations on Slurm 17.11.10

2019-03-12 Thread Andy Riebs
It appears that we have gotten to the bottom of this problem! We 
discovered that we only seem to see this problem if our overnight test 
script is run with "nohup," as we have been doing for several years. 
Typically, we would see the mysterious cancellations about once every 
other day, or 3-4 times a week. In the week+ since we started using 
"tmux" instead, we haven't seen this problem at all.


On that basis, I'm declaring success!
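
For anyone wanting to reproduce the change, the difference in how the
nightly driver is launched is roughly the following (the script and session
names are hypothetical):

    # old: detach the test driver with nohup
    nohup ./overnight_tests.sh > overnight.log 2>&1 &

    # new: run the same driver inside a detached tmux session
    tmux new-session -d -s overnight './overnight_tests.sh > overnight.log 2>&1'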

Many thanks to Doug Meyer and Chris Samuel for jumping in to offer 
suggestions.


Andy


From: Andy Riebs 
Sent: Thursday, January 31, 2019 2:04PM
To: Slurm-users 
Cc:
Subject: Mysterious job terminations on Slurm 17.11.10
Hi All,

Just checking to see if this sounds familiar to anyone.

Environment:
- CentOS 7.5 x86_64
- Slurm 17.11.10 (but this also happened with 17.11.5)

We typically run about 100 tests/night, selected from a handful of 
favorites. For roughly 1 in 300 test runs, we see one of two mysterious 
failures:


1. The 5 minute cancellation

A job will be rolling along, generating its expected output, and then 
this message appears:


   srun: forcing job termination
   srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
   slurmstepd: error: *** STEP 3531.0 ON nodename CANCELLED AT
   2019-01-30T07:35:50 ***
   srun: error: nodename: task 250: Terminated
   srun: Terminating job step 3531.0

sacct reports

   JobID   Start                End                  ExitCode  State
   ------  -------------------  -------------------  --------  ----------
   3418    2019-01-29T05:54:07  2019-01-29T05:59:16  0:9       FAILED

These failures consistently happen at just about 5 minutes into the run 
when they happen.


2. The random cancellation

As above, a job will be generating the expected output, and then we see

   srun: forcing job termination
   srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
   slurmstepd: error: *** STEP 3531.0 ON nodename CANCELLED AT
   2019-01-30T07:35:50 ***
   srun: error: nodename: task 250: Terminated
   srun: Terminating job step 3531.0

But this time, sacct reports

   JobID   Start                End                  ExitCode  State
   ------  -------------------  -------------------  --------  ----------
   3531    2019-01-30T07:21:25  2019-01-30T07:35:50  0:0       COMPLETED
   3531.0  2019-01-30T07:21:27  2019-01-30T07:35:56  0:15      CANCELLED

I think we've seen these cancellations pop up as soon as a minute or two 
into the test run, up to perhaps 20 minutes into the run.


The only thing slightly unusual in our job submissions is that we use 
srun's "--immediate=120" so that the scripts can respond appropriately 
if a node goes down.


With SlurmctldDebug=debug2 and SlurmdDebug=debug5, there's not a clue in 
the slurmctld or slurmd logs.


Any thoughts on what might be happening, or what I might try next?

Andy



Re: [slurm-users] problems with slurm and openmpi

2019-03-12 Thread Cyrus Proctor
Both your Slurm and OpenMPI config.logs would be helpful in debugging 
here. Throw in your slurm.conf as well for good measure. Also, what type 
of system are you running, what type of high speed fabric are you trying 
to run on, and what does your driver stack look like?

I know the feeling and will try to lend any extra bubble gum and duct 
tape I can.

On 3/12/19 3:53 PM, Riccardo Veraldi wrote:
> Hello,
> after trying hard for over 10 days I am forced to write to the list.
> I am not able to get SLURM to work with Open MPI. Open MPI-compiled 
> binaries won't run on slurm, while all non-Open MPI programs run just 
> fine under "srun". I am using SLURM 18.08.5, building the rpm from the 
> tarball: rpmbuild -ta slurm-18.08.5-2.tar.bz2
> Prior to building SLURM I installed Open MPI 4.0.0, which has built-in 
> PMIx support. The PMIx libraries are in /usr/lib64/pmix/, which is the 
> default installation path.
>
> The problem is that hellompi does not work if I launch it from srun; 
> of course it runs outside slurm.
>
> [psanagpu105:10995] OPAL ERROR: Not initialized in file 
> pmix3x_client.c at line 113
> --
> The application appears to have been direct launched using "srun",
> but OMPI was not built with SLURM's PMI support and therefore cannot
> execute. There are several options for building PMI support under
> SLURM, depending upon the SLURM version you are using:
>
>   version 16.05 or later: you can use SLURM's PMIx support. This
>   requires that you configure and build SLURM --with-pmix.
>
>   Versions earlier than 16.05: you must use either SLURM's PMI-1 or
>   PMI-2 support. SLURM builds PMI-1 by default, or you can manually
>   install PMI-2. You must then build Open MPI using --with-pmi pointing
>   to the SLURM PMI library location.
>
> Please configure as appropriate and try again.
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [psanagpu105:10995] Local abort before MPI_INIT completed completed 
> successfully, but am not able to aggregate error messages, and not 
> able to guarantee that all other processes were killed!
> srun: error: psanagpu105: task 0: Exited with exit code 1
>
> I really have no clue. I even reinstalled Open MPI on a specific 
> different path, /opt/openmpi/4.0.0.
> Anyway, it seems like slurm does not know how to find the MPI libraries 
> even though they are there, and right now in the default path /usr/lib64.
>
> even using --mpi=pmi2 or --mpi=openmpi does not fix the problem and 
> the same error message is given to me.
> srun --mpi=list
> srun: MPI types are...
> srun: none
> srun: openmpi
> srun: pmi2
>
>
> Any hint on how I could fix this problem?
> thanks a lot
>
> Rick
>
>


[slurm-users] problems with slurm and openmpi

2019-03-12 Thread Riccardo Veraldi
Hello,
after trying hard for over 10 days I am forced to write to the list.
I am not able to get SLURM to work with Open MPI. Open MPI-compiled binaries
won't run on slurm, while all non-Open MPI programs run just fine under "srun".
I am using SLURM 18.08.5, building the rpm from the tarball: rpmbuild -ta
slurm-18.08.5-2.tar.bz2
Prior to building SLURM I installed Open MPI 4.0.0, which has built-in PMIx
support. The PMIx libraries are in /usr/lib64/pmix/, which is the default
installation path.

The problem is that hellompi does not work if I launch it from srun; of
course it runs outside slurm.

[psanagpu105:10995] OPAL ERROR: Not initialized in file pmix3x_client.c at
line 113
--
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[psanagpu105:10995] Local abort before MPI_INIT completed completed
successfully, but am not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
srun: error: psanagpu105: task 0: Exited with exit code 1

I really have no clue. I even reinstalled Open MPI on a specific different
path, /opt/openmpi/4.0.0.
Anyway, it seems like slurm does not know how to find the MPI libraries even
though they are there, and right now in the default path /usr/lib64.

even using --mpi=pmi2 or --mpi=openmpi does not fix the problem and the
same error message is given to me.
srun --mpi=list
srun: MPI types are...
srun: none
srun: openmpi
srun: pmi2


Any hint on how I could fix this problem?
thanks a lot

Rick


Re: [slurm-users] How to force jobs to run next in queue

2019-03-12 Thread Thomas M. Payerle
Are you using the priority/multifactor plugin? What are the values of the
various Priority* weight factors?

On Tue, Mar 12, 2019 at 12:42 PM Sean Brisbane 
wrote:

> Hi,
>
> Thanks for your help.
>
> Either setting qos or setting priority doesn't work for me.  However I
> have found the cause if not the reason.
>
> Using a Priority setting on the partition called "Priority" in slurm.conf
> seems to force all jobs waiting on this queue to run first regardless of
> any qos set on a job.  Priority is not a limit, but I think this is a bit
> inconsistent with the limit hierarchy we see elsewhere and possibly even a
> bug.
>
> 1. Partition QOS limit
> 2. Job QOS limit
> 3. User association
> 4. Account association(s), ascending the hierarchy
> 5. Root/Cluster association
> 6. Partition limit
> 7. None
>
> So for multiple partitions with differing priorities, I can get the same
> effect by moving the priority into a qos, applying a qos on the partition,
> and then taking care to set OverPartQOS flag on the "boost" qos.
>
> Does anyone have a feeling for why setting a high Priority on a partition
> makes jobs run in that partition first, even though a job in a different
> partition may have a much higher overall priority?
>
>
> Sean
>
>
>
> On Mon, 11 Mar 2019 at 17:00, Sean Brisbane 
> wrote:
>
>> Hi,
>>
>> I'm looking to have a way an administrator can boost any job to be next
>> to run when resources become available.  What is the best practice way to
>> do this? Happy to try something new :-D
>>
>> The way I thought to do this was to have a qos with a large priority and
>> manually assign this to the job.  Job 469 is the job in this example I am
>> trying to elevate to be next in queue.
>>
>> scontrol update jobid=469 qos=boost
>>
>> sprio shows that this job is the highest priority by quite some way;
>> however, job number 492 will be next to run.
>>
>> squeue (excluding running jobs)
>>   JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
>>     469 Backgroun sleeping   centos PD  0:00      1 (Resources)
>>     492  Priority sleepy.s superuse PD  0:00      1 (Resources)
>>     448 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>>     478 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>>     479 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>>     480 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>>     481 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>>     482 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>>     483 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>>     484 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>>     449 Backgroun sleepy.s superuse PD  0:00      1 (Resources)
>>     450 Backgroun sleepy.s superuse PD  0:00      1 (Resources)
>>     465 Backgroun sleeping   centos PD  0:00      1 (Resources)
>>     466 Backgroun sleeping   centos PD  0:00      1 (Resources)
>>     467 Backgroun sleeping   centos PD  0:00      1 (Resources)
>>
>>
>> [root@master yp]# sprio
>>   JOBID PARTITION   PRIORITY   AGE  FAIRSHARE  JOBSIZE  PARTITION   QOS
>>     448 Backgroun      13667    58        484     3125          1     0
>>     449 Backgroun      13205    58         23     3125          1     0
>>     450 Backgroun      13205    58         23     3125          1     0
>>     465 Backgroun      13157    32          0     3125          1     0
>>     466 Backgroun      13157    32          0     3125          1     0
>>     467 Backgroun      13157    32          0     3125          1     0
>>     469 Backgroun   10013157    32          0     3125          1  1000
>>     478 Backgroun      13640    32        484     3125          1     0
>>     479 Backgroun      13640    32        484     3125          1     0
>>     480 Backgroun      13640    32        484     3125          1     0
>>     481 Backgroun      13610    32        454     3125          1     0
>>     482 Backgroun      13610    32        454     3125          1     0
>>     483 Backgroun      13610    32        454     3125          1     0
>>     484 Backgroun      13610    32        454     3125          1     0
>>     492  Priority    1003158    11         23     3125        100     0
>>
>>
>> I'm trying to troubleshoot why the highest priority job is not next to
>> run; jobs in the partition called "Priority" seem to run first.

Re: [slurm-users] How do I impose a limit the memory requested by a job?

2019-03-12 Thread Paul Edmon
Slurm should automatically block or reject jobs that can't run on that 
partition in terms of memory usage for a single node, so you shouldn't 
need to do anything.  If you need a limit that is lower than the max memory 
per node, then you will need to enforce it yourself.  We do this via a 
job_submit lua script.  That would be my recommended method.
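
A rough, untested sketch of such a check in job_submit.lua is below. The
172000 MB cap is an assumption based on David's figures (40 CPUs x 4300 MB
per CPU); only explicit per-node requests are checked here, since unset
memory and --mem-per-cpu requests show up in pn_min_memory as very large
sentinel/flagged values and are better left to DefMemPerCPU/MaxMemPerCPU:

    -- job_submit.lua (sketch): reject jobs asking for more memory per node
    -- than the nodes can actually provide.
    local MAX_MEM_MB = 172000   -- usable memory per node (assumption)
    local HUGE_MB    = 2^50     -- values this large are sentinels or per-CPU flagged

    function slurm_job_submit(job_desc, part_list, submit_uid)
        local mem = job_desc.pn_min_memory
        if mem ~= nil and mem > MAX_MEM_MB and mem < HUGE_MB then
            slurm.log_user("Requested " .. tostring(mem) ..
                           " MB per node, but the nodes only have " ..
                           MAX_MEM_MB .. " MB usable")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end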



-Paul Edmon-


On 3/12/19 12:31 PM, David Baker wrote:


Hello,


I have set up a serial queue to run small jobs in the cluster. 
Actually, I route jobs to this queue using the job_submit.lua script. 
Any 1 node job using up to 20 cpus is routed to this queue, unless a 
user submits their job with an exclusive flag.



The partition is shared and so I defined memory to be a resource. I've 
set default memory/cpu to be 4300 Mbytes. There are 40 cpus installed 
in the nodes and the usable memory is circa 17200 Mbytes -- hence my 
default mem/cpu.



The compute nodes are defined with RealMemory=19, by the way.


I am curious to understand how I can impose a memory limit on the jobs 
that are submitted to this partition. It doesn't make any sense to 
request more than the total usable memory on the nodes. So could 
anyone please advise me how to ensure that users cannot request more 
than the usable memory on the nodes.



Best regards,

David


PartitionName=serial nodes=red[460-464] Shared=Yes MaxCPUsPerNode=40 
DefaultTime=02:00:00 MaxTime=60:00:00 QOS=serial 
SelectTypeParameters=CR_Core_Memory DefMemPerCPU=4300 State=UP 
AllowGroups=jfAccessToIridis5 PriorityJobFactor=10 PreemptMode=off






Re: [slurm-users] How to force jobs to run next in queue

2019-03-12 Thread Sean Brisbane
Hi,

Thanks for your help.

Either setting qos or setting priority doesn't work for me.  However I have
found the cause if not the reason.

Using a Priority setting on the partition called "Priority" in slurm.conf
seems to force all jobs waiting on this queue to run first regardless of
any qos set on a job.  Priority is not a limit, but I think this is a bit
inconsistent with the limit hierarchy we see elsewhere and possibly even a
bug.

1. Partition QOS limit
2. Job QOS limit
3. User association
4. Account association(s), ascending the hierarchy
5. Root/Cluster association
6. Partition limit
7. None

So for multiple partitions with differing priorities, I can get the same
effect by moving the priority into a qos, applying a qos on the partition,
and then taking care to set OverPartQOS flag on the "boost" qos.
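
A sketch of the workaround Sean describes (QOS names and priority values are
illustrative and untested):

    # a QOS carrying the old partition priority, plus a "boost" QOS that is
    # allowed to override it
    sacctmgr add qos Priority
    sacctmgr modify qos Priority set Priority=1000
    sacctmgr add qos boost
    sacctmgr modify qos boost set Priority=100000 Flags=OverPartQOS

    # slurm.conf: attach the QOS to the partition instead of setting Priority=
    PartitionName=Priority Nodes=compute[01-02] Default=NO MaxTime=INFINITE State=UP QOS=Priority
    PartitionName=Background Nodes=compute[01-02] Default=YES MaxTime=INFINITE State=UP

    # an administrator can then push a single pending job to the front with
    scontrol update jobid=<jobid> qos=boost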

Does anyone have a feeling for why setting a high Priority on a partition
makes jobs run in that partition first, even though a job in a different
partition may have a much higher overall priority?


Sean



On Mon, 11 Mar 2019 at 17:00, Sean Brisbane 
wrote:

> Hi,
>
> I'm looking to have a way an administrator can boost any job to be next to
> run when resources become available.  What is the best practice way to do
> this? Happy to try something new :-D
>
> The way I thought to do this was to have a qos with a large priority and
> manually assign this to the job.  Job 469 is the job in this example I am
> trying to elevate to be next in queue.
>
> scontrol update jobid=469 qos=boost
>
> sprio shows that this job is the highest priority by quite some way;
> however, job number 492 will be next to run.
>
> squeue (excluding running jobs)
>   JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
>     469 Backgroun sleeping   centos PD  0:00      1 (Resources)
>     492  Priority sleepy.s superuse PD  0:00      1 (Resources)
>     448 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>     478 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>     479 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>     480 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>     481 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>     482 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>     483 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>     484 Backgroun sleepy.s groupboo PD  0:00      1 (Resources)
>     449 Backgroun sleepy.s superuse PD  0:00      1 (Resources)
>     450 Backgroun sleepy.s superuse PD  0:00      1 (Resources)
>     465 Backgroun sleeping   centos PD  0:00      1 (Resources)
>     466 Backgroun sleeping   centos PD  0:00      1 (Resources)
>     467 Backgroun sleeping   centos PD  0:00      1 (Resources)
>
>
> [root@master yp]# sprio
>   JOBID PARTITION   PRIORITY   AGE  FAIRSHARE  JOBSIZE  PARTITION   QOS
>     448 Backgroun      13667    58        484     3125          1     0
>     449 Backgroun      13205    58         23     3125          1     0
>     450 Backgroun      13205    58         23     3125          1     0
>     465 Backgroun      13157    32          0     3125          1     0
>     466 Backgroun      13157    32          0     3125          1     0
>     467 Backgroun      13157    32          0     3125          1     0
>     469 Backgroun   10013157    32          0     3125          1  1000
>     478 Backgroun      13640    32        484     3125          1     0
>     479 Backgroun      13640    32        484     3125          1     0
>     480 Backgroun      13640    32        484     3125          1     0
>     481 Backgroun      13610    32        454     3125          1     0
>     482 Backgroun      13610    32        454     3125          1     0
>     483 Backgroun      13610    32        454     3125          1     0
>     484 Backgroun      13610    32        454     3125          1     0
>     492  Priority    1003158    11         23     3125        100     0
>
>
> I'm trying to troubleshoot why the highest priority job is not next to
> run; jobs in the partition called "Priority" seem to run first.
>
>  The job 469  has no qos, partition, user accounts or group limits on the
> number of cpus,jobs,nodes etc.  I've set this test cluster up from scratch
> to be sure!
>
> [root@master yp]# scontrol show job 469
> JobId=469 JobName=sleeping.sh
>UserId=centos(1000) GroupId=centos(1000) MCS_label=N/A
>Priority=10013161 Nice=0 Ac

[slurm-users] How do I impose a limit the memory requested by a job?

2019-03-12 Thread David Baker
Hello,


I have set up a serial queue to run small jobs in the cluster. Actually, I 
route jobs to this queue using the job_submit.lua script. Any 1 node job using 
up to 20 cpus is routed to this queue, unless a user submits their job with an 
exclusive flag.


The partition is shared and so I defined memory to be a resource. I've set 
default memory/cpu to be 4300 Mbytes. There are 40 cpus installed in the nodes 
and the usable memory is circa 17200 Mbytes -- hence my default mem/cpu.


The compute nodes are defined with RealMemory=19, by the way.


I am curious to understand how I can impose a memory limit on the jobs that are 
submitted to this partition. It doesn't make any sense to request more than the 
total usable memory on the nodes. So could anyone please advise me how to 
ensure that users cannot request more than the usable memory on the nodes.


Best regards,

David


PartitionName=serial nodes=red[460-464] Shared=Yes MaxCPUsPerNode=40 
DefaultTime=02:00:00 MaxTime=60:00:00 QOS=serial 
SelectTypeParameters=CR_Core_Memory DefMemPerCPU=4300 State=UP 
AllowGroups=jfAccessToIridis5 PriorityJobFactor=10 PreemptMode=off




Re: [slurm-users] How to deal with jobs that need to be restarted several time

2019-03-12 Thread Renfro, Michael
If the failures happen right after the job starts (or close enough), I’d use an 
interactive session with srun (or some other wrapper that calls srun, such as 
fisbatch).

Our hpcshell wrapper for srun is just a bash function:

=

hpcshell ()
{
srun --partition=interactive "$@" --pty bash -i
}

=

The interactive partition argument is optional, but we use it as a time- and 
resource-limited partition with a higher priority. I always recommend that our 
users develop and debug with interactive jobs, and only submit the full production 
job with sbatch after all the easy bugs have been identified.
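
For illustration, a partition along those lines might be defined roughly like 
this in slurm.conf (the node list, limits, and priority factor below are 
placeholders, not the actual configuration):

    PartitionName=interactive Nodes=node[001-004] MaxTime=02:00:00 DefaultTime=00:30:00 MaxNodes=1 PriorityJobFactor=100 State=UP

and a debugging session is then just, for example:

    hpcshell --ntasks=4 --mem=8G --time=01:00:00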

-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University

> On Mar 12, 2019, at 9:26 AM, Selch, Brigitte (FIDF)  
> wrote:
> 
> Hello,
>  
> Some jobs have to be restarted several times until they run.
> Users start the Job, it fails, they have to do some changes,
> they start the job again, it fails again … and so on.
>  
> So they want to keep the resources until the job is running properly.
>  
> Is there a possibility to 'inherit' allocated resources
> from one job to the next?
>  
> Or something else to do the job? 
>  
> All our jobs are submitted with sbatch
>  
> Thank you,
> Brigitte Selch
>  
>  
>  
> Kind regards,
> Brigitte Selch
>  
> MAN Truck & Bus AG
> IT Produktentwicklung Simulation (FIDF)
> Vogelweiher Str. 33
> 90441 Nürnberg
>  
> Telefon +49 911 420 6056
> brigitte.se...@man.eu
>  
> 
> 



[slurm-users] How to deal with jobs that need to be restarted several time

2019-03-12 Thread Selch, Brigitte (FIDF)
Hello,

Some jobs have to be restarted several times until they run.
Users start the Job, it fails, they have to do some changes,
they start the job again, it fails again ... and so on.

So they want to keep the resources until the job is running properly.

Is there a possibility to 'inherit' allocated resources
from one job to the next?

Or something else to do the job?

All our jobs are submitted with sbatch

Thank you,
Brigitte Selch



Kind regards,
Brigitte Selch

MAN Truck & Bus AG
IT Produktentwicklung Simulation (FIDF)
Vogelweiher Str. 33
90441 Nürnberg

Telefon +49 911 420 6056
brigitte.se...@man.eu







Re: [slurm-users] weight setting not working

2019-03-12 Thread Eli V
On Tue, Mar 12, 2019 at 1:14 AM Andy Leung Yin Sui  wrote:
>
> Hi,
>
> I am new to slurm and want to use the weight option to schedule jobs.
> I have some machines with the same hardware configuration and GPU cards.
> I use a QOS to force users to request at least 1 GPU GRES when
> submitting jobs.
> The machines serve multiple partitions.
> What I want is to consume the dedicated nodes first when scheduling
> gpu_2h partition jobs, by adding weight settings (e.g. schedule to
> GPU38/39 rather than 36/37). However, the scheduler does not follow the
> weight settings and schedules to 36/37 (e.g. srun -p gpu_2h).
> All the GPU nodes are idle and the billing is the same; did I miss
> something? Is there some limitation when nodes serve multiple
> partitions or consume GRES? Please advise. Thank you very much.
>
> Below are the settings, which may help.
> slurm.conf
> NodeName=gpu[36-37] Gres=gpu:titanxp:4  ThreadsPerCore=2 State=unknown
> Sockets=2  CPUs=40 CoresPerSocket=10 Weight=20
> NodeName=gpu[38-39] Gres=gpu:titanxp:4  ThreadsPerCore=2 State=unknown
> Sockets=2  CPUs=40 CoresPerSocket=10 Weight=1
>
>
> PartitionName=gpu_2h Nodes=gpu[36-39] Default=YES MaxTime=02:00:00
> DefaultTime=02:00:00 MaxNodes=1 State=UP AllowQos=GPU
> PartitionName=gpu_8h Nodes=gpu[31-37] MaxTime=08:00:00
> DefaultTime=08:00:00  MaxNodes=1 State=UP AllowQos=GPU
>
>
> # sinfo -N -O nodelist,partition,gres,weight
>
>
> NODELIST    PARTITION   GRES            WEIGHT
> gpu36       gpu_2h*     gpu:titanxp:4   20
> gpu36       gpu_8h      gpu:titanxp:4   20
> gpu37       gpu_2h*     gpu:titanxp:4   20
> gpu37       gpu_8h      gpu:titanxp:4   20
> gpu38       gpu_2h*     gpu:titanxp:4   1
> gpu39       gpu_2h*     gpu:titanxp:4   1
>

You didn't mention the version of Slurm you are using. Weights are
known to be broken in early versions of 18.08. I think it was fixed in
18.08.04, but you'd have to go back and read the release notes to
confirm.



Re: [slurm-users] How to force jobs to run next in queue

2019-03-12 Thread Bjørn-Helge Mevik
Sean Brisbane  writes:

> I'm trying to troubleshoot why the highest priority job is not next to run;
> jobs in the partition called "Priority" seem to run first.
>
[...]
> The partition called "Priority" has a priority boost assigned through qos.
>
> PartitionName=Priority Nodes=compute[01-02]  Default=NO MaxTime=INFINITE
> State=UP Priority=1000 QOS=Priority
> PartitionName=Background Nodes=compute[01-02]   Default=YES
> MaxTime=INFINITE State=UP Priority=10
>
> Any Ideas would be much appreciated.

I suggest you look at the discussion of the partition settings
PriorityJobFactor and PriorityTier in slurm.conf, which supersede the
Priority setting.  I saw an explanation of why they made the change, but
cannot remember where right now (sorry, it's early in the morning here
:), but *if* I remember right, it went something like this: "Priority
didn't do what one would expect: it changed the order in which the
partitions were scheduled, which overruled the jobs' priorities."  This
is what PriorityTier does, so you might want to try using
PriorityJobFactor instead of Priority.
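
For illustration, applied to Sean's two partitions that would look roughly
like this (an untested sketch; the factor only matters if
PriorityWeightPartition is non-zero in slurm.conf):

    PartitionName=Priority Nodes=compute[01-02] Default=NO MaxTime=INFINITE State=UP PriorityJobFactor=1000
    PartitionName=Background Nodes=compute[01-02] Default=YES MaxTime=INFINITE State=UP PriorityJobFactor=10

This way the partition only contributes to each job's multifactor priority,
instead of reordering which partition gets scheduled first (which is what
PriorityTier, and the old Priority setting, do).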

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo

