Re: WLM : multiple periods not recommended for batch - why?

Martin Packer Sun, 06 May 2012 09:50:18 -0700

Agree on DDF - with the proviso that we need to check we're actually 
getting transaction endings out of DDF (CMTSTAT=INACTIVE etc).

Cheers, Martin

Martin Packer,
Mainframe Performance Consultant, zChampion
Worldwide Banking Center of Excellence, IBM

+44-7802-245-584

email: martin_pac...@uk.ibm.com

Twitter / Facebook IDs: MartinPacker
Blog: 
https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker

From:
Cheryl Walker <che...@watsonwalker.com>
To:
IBM-MAIN@bama.ua.edu, 
Date:
05/06/2012 03:13 PM
Subject:
Re: WLM : multiple periods not recommended for batch - why?
Sent by:
IBM Mainframe Discussion List <IBM-MAIN@bama.ua.edu>

I'm writing a series of articles for my Tuning Letter about service level 
agreements and mentioned in the last issue that I strongly believe in 
single period batch and two-period TSO service classes. One of my readers 
asked me to clarify, so I pulled up an old article on multi-period batch. 
It will soon be added to our website as part of the z/OS 101 Primer 
articles that are free to the public - 
http://www.watsonwalker.com/articles.html. I've included the entire 
article below, but would like to qualify that I consider work like DDF to 
be more like TSO, needing two periods, than batch. (I've kept this as 
plain text, so it isn't pretty. Sorry.)

Best regards,
Cheryl

======================
Cheryl Watson
Watson & Walker, Inc.
www.watsonwalker.com
======================

Multi-Period Batch

What are the advantages and disadvantages of running batch in a 
single-period service class versus a multi-period service class?

We must have heard this question at least six times at the latest SHARE. 
Although we did provide an answer in our September 1994 TUNING Letter, we 
think it's time for an update. We'll address the considerations for both 
test and production batch jobs, because they tend to have different 
requirements.

Test Batch

If your intention is to provide the best turnaround to the most people by 
allowing large resource consumers to suffer slightly, then you'll want to 
use the typical method of managing test batch jobs. That method simply 
consists of getting as many of the small jobs through the system, at a 
high dispatch priority, as you can. You would then let the larger jobs run 
at a lower priority, and possibly miss their service goals.

This technique is used in almost every data center today. The only 
difference is in how it's implemented. Let us describe the two typical 
methods and the pros and cons of each.

Priority by Job Classes

The most common technique is to define a set of test batch job classes 
that allow a certain set of resources. For example, you might define the 
following test batch job classes:

      A - Less than 5 seconds CPU time, no tapes - 10 minute turnaround
      B - Less than 15 seconds CPU time, 0 to 1 tape - 30 minute turnaround
      C - Unlimited CPU time, 0 or 1 tape - 2 hour turnaround
      D - Unlimited CPU time, unlimited tapes - overnight

Then you would define some JES initiators to process these jobs. There are 
dozens of ways to set up initiators, but a typical scenario might be:

      Init 1 - Classes:  A
      Init 2 - Classes:  A
      Init 3 - Classes:  B
      Init 4 - Classes:  BA
      Init 5 - Classes:  CA
      Init 6 - Classes:  DCBA
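
As a rough illustration of how class lists like these steer work, here is a 
minimal Python sketch. It assumes an initiator takes work from the first 
class in its list that has a job waiting, and takes jobs within a class 
first-in, first-out. The function name select_job and the job names are 
hypothetical, and this is a toy model, not JES code.

```python
from collections import deque

# Initiator class lists copied from the example above.
INITIATORS = {1: "A", 2: "A", 3: "B", 4: "BA", 5: "CA", 6: "DCBA"}


def select_job(init_classes, queues):
    """Return the next job for an initiator, or None if nothing is eligible.

    Assumption: classes are searched in the order listed, and jobs within
    a class are taken first-in, first-out.
    """
    for job_class in init_classes:
        queue = queues.get(job_class)
        if queue:
            return queue.popleft()
    return None


# Hypothetical waiting jobs, keyed by job class.
queues = {"A": deque(["A1", "A2"]), "C": deque(["C1"]), "D": deque(["D1"])}
print(select_job(INITIATORS[6], queues))  # D1
print(select_job(INITIATORS[5], queues))  # C1
print(select_job(INITIATORS[4], queues))  # A1 (no class B job is waiting)
print(select_job(INITIATORS[3], queues))  # None
```

Under that assumption, initiators 1 and 2 can never be taken by anything 
but class A work, which is what protects the short jobs.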

You would then set up a service class for each job class, single-period 
for the short classes. As one example:

      TSTBATA - 90% within 10 minutes
      TSTBATB - 90% within 30 minutes
      TSTBATC - period 1 = velocity of 20%; period 2 = discretionary
      TSTBATD - discretionary

We're assuming that too few class C jobs end to support a response time 
goal, so TSTBATC gets a velocity goal instead.
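
To make a goal such as "90% within 10 minutes" concrete, the sketch below 
checks a list of turnaround times against a percentile target. It is a 
simplification for illustration and not how WLM itself evaluates goals. 
The function names and the sample turnaround times are made up.

```python
def percent_within(turnarounds, target):
    """Percentage of ended jobs whose turnaround was within the target."""
    if not turnarounds:
        return None  # no ended jobs: the goal cannot be evaluated
    within = sum(1 for t in turnarounds if t <= target)
    return 100.0 * within / len(turnarounds)


def meets_goal(turnarounds, target, percent):
    """True if at least `percent` of ended jobs finished within `target`."""
    attained = percent_within(turnarounds, target)
    return attained is not None and attained >= percent


# Hypothetical turnaround times in minutes.
class_a = [2, 3, 3, 4, 5, 6, 7, 8, 9, 25]
class_c = [4, 25]

print(percent_within(class_a, 10))  # 90.0
print(meets_goal(class_a, 10, 90))  # True
print(percent_within(class_c, 10))  # 50.0
```

With only two ended jobs, a single slow job swings the result from 100% to 
50%, which is the kind of instability that makes a response time goal a 
poor fit for a class with few completions.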

The advantage of this technique is that the initiators will determine the 
highest priority jobs to allow into MVS. If the operators feel that the 
system is too busy at the moment, they can close down the initiators in 
order of 6, 5, 4, 3, 2 and 1. When jobs in classes A and B get onto an 
initiator, they'll go into a single-period service class and stay at the 
same dispatch priority while they're executing. For those job classes, the 
first jobs on an initiator are normally the first jobs completed.

Job classes C and D, on the other hand, have unlimited CPU time. They 
might need 20 seconds of CPU time or three hours of CPU time - you don't 
really know. Therefore, a multi-period service class allows you to push the 
smaller of these large jobs through the system by giving period one a 
higher dispatch priority.

Priority by Period

Prioritizing test batch jobs by their actual use rather than their 
anticipated use is another common technique. In this method, there would 
be just one test batch job class. The initiators would be used to manage 
the number of test jobs in the system, but wouldn't differentiate between 
the short jobs or the long jobs.

A service class for this method might have four periods and look like:

      Period 1 - 90% within 10 minutes, duration = 1000 Service Units (SUs)
      Period 2 - 90% within 30 minutes, duration = 3000 SUs
      Period 3 - velocity of 20%, duration = 10000 SUs
      Period 4 - discretionary

All test jobs would enter the system in a first-come, first-served order. 
As soon as MVS sees them, they will probably be run at a high dispatch 
priority until they've consumed 1000 service units. Those jobs taking less 
than 4,000 service units (1000 in period one and 3000 in period two) have 
the next highest priority and will be completed next. The longer jobs will 
compete at the same low priority, with the smaller jobs typically 
completing first.
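
The period boundaries are cumulative, which is where the 4,000 figure comes 
from. A small Python sketch of the arithmetic, using the durations from the 
example above. The function name current_period is hypothetical, and placing 
a job that has used exactly the boundary amount in the next period is an 
assumption.

```python
from itertools import accumulate

# Durations from the four-period example; the last period has no duration.
DURATIONS = [1000, 3000, 10000]
BOUNDARIES = list(accumulate(DURATIONS))  # [1000, 4000, 14000]


def current_period(service_units):
    """Return the 1-based period for a job that has consumed `service_units`."""
    for period, boundary in enumerate(BOUNDARIES, start=1):
        if service_units < boundary:
            return period
    return len(BOUNDARIES) + 1  # past every duration: the final period


for used in (500, 1000, 3999, 4000, 20000):
    print(used, current_period(used))
```

This prints periods 1, 2, 2, 3 and 4 for the five sample values, so a job 
drops into the discretionary period only after 14,000 service units.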

Comparison

The first method using job classes takes more effort on the part of the 
sysprogs and the programmers that submit jobs. The sysprogs will need to 
analyze the current data to determine appropriate job class groupings, the 
job class information will need to be distributed to the users, and the 
users will need to estimate their job's usage before they submit the job. 
If they guess too low, the jobs will ABEND with a time-out for the job 
class. If they guess too high, they'll get poorer service than they 
deserve. If job class designations change, there may be additional ABENDs 
because programmers have a tendency to use "old JCL," change a line or two 
(seldom the job class) and submit the job. If the job runs in the wrong 
job class, you may have an ABEND. This technique is not too productive for 
the programmers, but will result in the best turnaround times for the 
majority of users. (As you'll see next.)

The second method is very simple to use, but can lead to severe problems. 
The programmers don't need to be concerned with which job class to use, 
and the sysprogs don't need to do any analysis (other than determine the 
appropriate durations for the periods). MVS and SRM will get the shortest 
jobs out at the highest priority. This method has a large problem, however: 
a programmer can overload the initiators 
with a lot of long-running jobs. If the short jobs can't even get on an 
initiator, WLM can't get them completed in time. This technique is very 
useful when you have plenty of resources, and can greatly over-initiate 
the system. If the system is very constrained, and you have to limit the 
number of initiators, it may be difficult or impossible to get the small 
jobs in and out of the system. The biggest problem with this technique is 
that you can't guarantee turnaround times for your users. It becomes 
extremely difficult to manage to a set of service level objectives.
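
The initiator blocking problem can be shown with a toy simulation. This is 
a sketch under simplifying assumptions: all jobs are queued at time zero, 
jobs are selected first-in, first-out from a single job class, and each job 
holds its initiator for a fixed elapsed time regardless of dispatch 
priority. The job names and times are made up.

```python
import heapq


def start_times(jobs, initiators):
    """Simulate FIFO selection onto a fixed number of initiators.

    `jobs` is a list of (name, elapsed_minutes) in arrival order, all
    queued at time 0. Returns {name: start_minute}.
    """
    free_at = [0] * initiators  # minute at which each initiator is next free
    heapq.heapify(free_at)
    starts = {}
    for name, elapsed in jobs:
        start = heapq.heappop(free_at)  # earliest available initiator
        starts[name] = start
        heapq.heappush(free_at, start + elapsed)
    return starts


jobs = [("LONG1", 120), ("LONG2", 120), ("LONG3", 120), ("SHORT", 2)]
print(start_times(jobs, initiators=3)["SHORT"])  # 120
print(start_times(jobs, initiators=4)["SHORT"])  # 0
```

With three initiators the short job waits two hours just to start, so no 
first-period goal can save it. With a fourth initiator it starts at once, 
which is why this method suits systems that can afford to over-initiate.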

The easiest compromise seems to be to define a very, very small number of 
batch job classes, such as three. Make them different enough 
in terms of resource usage that it will be easy for the users to choose 
the correct class. For service levels, you might simply talk about short, 
medium, and long batch (e.g. class A, B, and C), with response goals for 
the first two. When setting up your initiators, make sure that classes A 
and B have at least one dedicated initiator each (otherwise a bunch of 
class C jobs could grab the initiators during slack times and class A jobs 
couldn't get started). Then create a multi-period service class for class 
C. This would give you the capability of managing your test jobs to 
provide consistent turnaround times for classes A and B. Class C users 
that use the least amount of resources would tend to get better response 
than those using more resources.

WLM managed initiators can have similar problems with multi-period batch 
service classes, because the work is run in the same type of service 
classes. An additional problem with these initiators is that if all of the 
current initiators are blocked with long running jobs and small jobs are 
missing their goals, then WLM will open up more initiators. It's quite 
possible that the system can become over-initiated. WLM will eventually 
stop the unnecessary initiators, but the problem may exist for a period of 
time.

Production Batch

Production batch jobs present a different problem. The most typical 
scenario is that all production batch jobs are placed in a single job 
class with TYPRUN=HOLD in the JCL. Then they are released one job at a 
time by the operations staff or an automated scheduler. There are no 
turnaround times associated with production batch jobs. Although the 
intention of most production batch jobs is to complete before the online 
systems come up in the morning, there is no way to indicate this to MVS 
and SRM. Most installations have solved the problem by identifying 
critical jobs in the batch cycle and assigning them to a higher priority 
service class (one with a higher dispatch priority).

If you put all production batch in a multi-period service class, you will 
generally have problems when resources become constrained. One of the 
major jobs in the critical path, for example, might fall into second or 
third period. If resources are constrained, other smaller jobs will come 
in and out of the system at a higher priority than the critical batch job. 
In a very constrained system, the critical job could take three to four 
times the normal elapsed time just because it's running at a low priority. 
Often the solution to this is to move that particular job to a higher 
priority, single-period, service class. But this solution is applied one 
job at a time as problems are diagnosed.

In general, multi-period production batch is very frustrating to the 
operators and schedulers. If they've released a job, it's because they 
want it to run (now!). They don't want it to lie around the bottom of the 
resource pool using CPU only when nobody else wants it. For production 
jobs, usually first-in, first-out is the desired mode of operation. With 
single-period production batch service classes, that's what you get. With 
multi-period service classes, the job using the most resources will take 
the longest, sometimes to the detriment of the critical batch window. This 
has been one of the primary causes for sites not meeting their critical 
batch window. If you're having trouble getting your batch jobs completed 
before your online systems come up each day, check to see if this is the 
cause.

** pull-quote - Multi-period production batch service classes are one of 
the primary causes for missing your batch window goals

One alternative for this is to identify the critical jobs in your batch 
cycle and place them in a unique job class assigned to a unique 
single-period service class. This service class would run at a higher 
velocity and importance than other batch, and would exhibit first-in, 
first-out characteristics.

On May 1, 2012, at 7:39 AM, Andrew Rowley wrote:

> Hi,
> 
> I have read a few articles that say that multiple periods are not 
> recommended for batch service classes. Multiple periods seems to be 
> considered a bit old fashioned.
> 
> I haven't been able to find anything clearly explaining why. I have 
> always felt that they worked well. My best guess is that it is something 
> to do with the behaviour of WLM managed initiators but I'm not sure.
> 
> Can anyone shed any light, or point me to some further reading?
> 
> Thanks
> 
> Andrew Rowley
> 
> -- 
> Andrew Rowley
> Black Hill Software Pty. Ltd.
> Phone: +61 413 302 386
> 
> EasySMF for z/OS: Interactive SMF Reports on Your PC
> http://www.smfreports.com
> 
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
