Jobs are executed according to priority, and priority can be
configured in many, many ways.  A job's priority depends
on relatively static things like the queue, account, user, etc.

There are also dynamic factors at play, such as fair-share
calculations (if enabled) and how long the job has been
waiting in the queue (i.e., the longer a job waits in the
queue, the higher its priority).

All else being equal, jobs will run on a first-in, first-out
basis.
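To make the queue-time tie-breaking concrete, here is a toy sketch in Python.  The weight and job data are hypothetical (this is not Maui's actual priority formula); the name just echoes the QUEUETIMEWEIGHT parameter that appears in the maui.cfg quoted later in this thread:

```python
# Toy priority sketch: with only a queue-time term, older jobs float
# to the top, which gives first-in, first-out ordering when all other
# factors are equal.
QUEUETIME_WEIGHT = 1  # hypothetical; cf. QUEUETIMEWEIGHT in maui.cfg

def priority(minutes_queued, static_priority=0):
    # Static component (queue/account/user) plus a term that grows
    # the longer the job sits in the queue.
    return static_priority + QUEUETIME_WEIGHT * minutes_queued

# (name, minutes the job has been queued)
jobs = [("A", 120), ("B", 30), ("C", 300)]
ordered = sorted(jobs, key=lambda j: priority(j[1]), reverse=True)
print([name for name, _ in ordered])  # oldest job first: ['C', 'A', 'B']
```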

On top of all this, there is backfill.

As Brian explained, suppose you have a 40-node cluster and
a job that needs all 40 nodes.  When that fat job reaches
the top of the list, the scheduler reserves all available
nodes for it.  Since some nodes are probably in
use, the fat job still cannot run; it waits for the
running jobs to finish and reserves those nodes too.  Once
all 40 nodes are available, the fat job runs.
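To put toy numbers on this (a hypothetical sketch, not how any real scheduler computes it), the fat job's earliest possible start is when enough running jobs have finished to free the nodes it needs:

```python
# Toy sketch with made-up numbers: a 40-node job on a 40-node cluster
# must wait until enough running jobs finish.  Its earliest start is
# when the last node it needs becomes free.
TOTAL_NODES = 40
FAT_NEEDS = 40
running = [(5, 12.0), (5, 12.0), (5, 12.0)]  # (nodes, hours remaining)

idle = TOTAL_NODES - sum(n for n, _ in running)   # 25 nodes sit idle
needed_from_running = FAT_NEEDS - idle

# Nodes free up as running jobs finish; count freed nodes in finish
# order until the fat job has enough.
freed, start = 0, 0.0
for nodes, hours in sorted(running, key=lambda j: j[1]):
    freed += nodes
    start = hours
    if freed >= needed_from_running:
        break
print(start)  # earliest start of the fat job, in hours: 12.0
```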

IIRC (my PBS/Maui/Moab knowledge is a bit rusty, but most
of what I said is scheduler agnostic), by default no
other jobs can start running once the fat job
reaches the top of the queued list.  As you noticed,
that is not very efficient.  Say the fat job is at
the top of the list, but there are three 5-node jobs currently
running with, say, 12 hours of walltime remaining; the
other 25 nodes will sit idle for about 12 hours (if
the three jobs finish early, the fat job can run before the
12 hours are up).

But, again by default, if there are 50 single-node jobs
with 4-hour walltime limits, they cannot start until
they reach the top of the queue, which will be about
12 hours from now plus however long the fat job runs.

There is a feature called backfill which can be enabled.
It allows jobs to use the idle reserved nodes, as
long as their walltime limits guarantee they will be
finished (either on their own or killed by the scheduler)
before the fat job is expected to be able to use the nodes.

So with backfill, all 50 of the short jobs could run
while the fat job is waiting for the other 3 jobs to
finish.
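A toy version of the backfill test (hypothetical numbers and logic, not Maui's implementation): a queued job fits only if it needs no more than the idle reserved nodes and will finish before the fat job's expected start:

```python
# Toy backfill feasibility check.  Assume the fat job is expected to
# start in 12 hours and 25 reserved nodes are currently idle.
FAT_JOB_START = 12.0  # hours from now (hypothetical)
IDLE_NODES = 25

def can_backfill(nodes_req, walltime_hours, now=0.0):
    # A job may be backfilled only if it fits on the idle nodes AND
    # its walltime limit ends before the fat job's expected start,
    # so it cannot delay the fat job.
    return nodes_req <= IDLE_NODES and now + walltime_hours <= FAT_JOB_START

print(can_backfill(1, 4.0))    # True: a 4-hour single-node job fits
print(can_backfill(1, 14.0))   # False: would push back the fat job
print(can_backfill(30, 1.0))   # False: needs more nodes than are idle
```

This is why accurate (rather than maximal) walltime requests matter: shorter requested walltimes make jobs eligible for backfill slots.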

On Thu, 25 Jun 2015, Andrus, Brian Contractor wrote:
> 
> Fernando,
>
> This may be merely by design.
> 
> When a job is queued, whatever resources are available that it may need but 
> are not yet used are reserved.
> 
> So if it needs 12 cores on a 16-core machine but there is an 8-core job 
> running there, it will reserve the 8 free cores while it waits for the other 4 
> cores it needs to be freed up.
>
> Now IF there is another 8 core job that is submitted, it has to wait. UNLESS 
> it can run on those 8 cores and be done before the other 8 core job 
> completes. Maui can "squeeze" it in
> without affecting the start time of that 12 core job.
>
> So it could be you are seeing the resources being reserved for the 12 core 
> job because the smaller jobs could not be run without bumping the soonest 
> start time of the 12 core job.
>
> Brian Andrus
> 
> ITACS/Research Computing
> 
> Naval Postgraduate School
> 
> Monterey, California
> 
> voice: 831-656-6238
>
> From: mauiusers-boun...@supercluster.org 
> [mailto:mauiusers-boun...@supercluster.org] On Behalf Of Fernando Caba
> Sent: Tuesday, June 23, 2015 4:04 PM
> To: Torque Users Mailing List; mauiusers
> Subject: [Mauiusers] Queueing jobs in inappropriate order
>
> Hi all, in my cluster the users run single-node jobs with different numbers 
> of processors (nodes=1:ppn=4, 8, or 12).
> 
> For some reason, jobs stay queued even though resources are available. For 
> example, a job requiring 12 cores stays queued while several nodes have 8 
> cores free (we have 8 nodes and
> each node has 12 cores).
> 
> If the users submit new jobs with 4 or 8 cores, those jobs don't run; they 
> stay queued in spite of the available resources.
> 
> Here is my maui.cfg:
> 
> 
> # maui.cfg 3.3.1
> 
> SERVERHOST            fe
> 
> # primary admin must be first in list
> ADMIN1                root
> 
> # Resource Manager Definition
> 
> RMCFG[FE] TYPE=PBS
> 
> # Allocation Manager Definition
> 
> AMCFG[bank]  TYPE=NONE
> 
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
> 
> RMPOLLINTERVAL        00:00:30
> 
> SERVERPORT            42559
> SERVERMODE            NORMAL
> 
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
> 
> 
> LOGFILE               maui.log
> LOGFILEMAXSIZE        10000000
> LOGLEVEL              3
> 
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
> 
> QUEUETIMEWEIGHT       1
> 
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
> 
> #FSPOLICY              PSDEDICATED
> #FSDEPTH               7
> #FSINTERVAL            86400
> #FSDECAY               0.80
> 
> # Throttling Policies: 
> http://supercluster.org/mauidocs/6.2throttlingpolicies.html
> 
> # NONE SPECIFIED
> 
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
> 
> BACKFILLPOLICY        FIRSTFIT
> RESERVATIONPOLICY     CURRENTHIGHEST
> 
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
> 
> NODEALLOCATIONPOLICY  MINRESOURCE
> #NODEALLOCATIONPOLICY   FIRSTAVAILABLE
> 
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
> 
> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
> 
> # Standing Reservations: 
> http://supercluster.org/mauidocs/7.1.3standingreservations.html
> 
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test]   17:00:00
> # SRDAYS[test]      MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test]   0:30:00
> 
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
> 
> # USERCFG[DEFAULT]      FSTARGET=25.0
> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch]       FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR
> CLASSCFG[batch] MAXPROCPERUSER=12
> 
> JOBNODEMATCHPOLICY EXACTPROC
> #JOBNODEMATCHPOLICY      EXACTNODE
> 
> 
> and here is my torque configuration:
> 
> 
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.nodes = 8
> set queue batch resources_default.walltime = 4800:00:00
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server acl_hosts = fe
> set server managers = root@fe
> set server operators = root@fe
> set server default_queue = batch
> set server log_events = 511
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server log_level = 7
> set server mom_job_sync = True
> set server keep_completed = 300
> set server auto_node_np = True
> set server next_job_number = 10422
> set server record_job_info = True
> set server record_job_script = True
> 
> So, I was thinking about creating different queues: one for 4-core 
> jobs, another for 8-core jobs, and another for 12-core jobs. Is this 
> a reasonable policy, forcing
> the exact number of cores per job in the corresponding queue (4, 8, or 
> 12 cores per job)?
> 
> 
> Thanks in advance!!
> 
> Fernando
>
> --
> 
> Universidad Nacional del Sur
> 
> Mg. Fernando Caba
> Director General de Telecomunicaciones
> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
> Tel/Fax: (54)-291-4595166
> Tel: (54)-291-4595101 int. 2050
> http://www.dgt.uns.edu.ar
>

Tom Payerle
IT-ETI-EUS                              paye...@umd.edu
University of Maryland                  (301) 405-6135
College Park, MD 20742-4111
_______________________________________________
mauiusers mailing list
mauiusers@supercluster.org
http://www.supercluster.org/mailman/listinfo/mauiusers
