Agree on DDF - with the proviso that we need to check we're actually getting transaction endings out of DDF (CMTSTAT=INACTIVE etc).
Cheers, Martin

Martin Packer,
Mainframe Performance Consultant, zChampion
Worldwide Banking Center of Excellence, IBM

+44-7802-245-584
email: martin_pac...@uk.ibm.com
Twitter / Facebook IDs: MartinPacker
Blog: https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker

From: Cheryl Walker <che...@watsonwalker.com>
To: IBM-MAIN@bama.ua.edu,
Date: 05/06/2012 03:13 PM
Subject: Re: WLM : multiple periods not recommended for batch - why?
Sent by: IBM Mainframe Discussion List <IBM-MAIN@bama.ua.edu>

I'm writing a series of articles for my Tuning Letter about service level agreements and mentioned in the last issue that I strongly believe in single period batch and two-period TSO service classes. One of my readers asked me to clarify, so I pulled up an old article on multi-period batch. It will soon be added to our website as part of the z/OS 101 Primer articles that are free to the public - http://www.watsonwalker.com/articles.html.

I've included the entire article below, but would like to qualify that I consider work like DDF to be more like TSO, needing two periods, than batch. (I've kept this as plain text, so it isn't pretty. Sorry.)

Best regards,
Cheryl

======================
Cheryl Watson
Watson & Walker, Inc.
www.watsonwalker.com
======================

Multi-Period Batch

What are the advantages and disadvantages of running batch in a single-period service class versus a multi-period service class? We must have heard this question at least six times at the latest SHARE. Although we did provide an answer in our September 1994 TUNING Letter, we think it's time for an update. We'll address the considerations for both test and production batch jobs, because they tend to have different requirements.

Test Batch

If your intention is to provide the best turnaround to the most people by allowing large resource consumers to suffer slightly, then you'll want to use the typical method of managing test batch jobs.
That method simply consists of getting as many of the small jobs through the system, at a high dispatch priority, as you can. You would then let the larger jobs run at a lower priority, and possibly miss their service goals. This technique is used in almost every data center today. The only difference is in how it's implemented. Let us describe the two typical methods and the pros and cons of each.

Priority by Job Classes

The most common technique is to define a set of test batch job classes that allow a certain set of resources. For example, you might define the following test batch job classes:

A - Less than 5 seconds CPU time, no tapes - 10 minute turnaround
B - Less than 15 seconds CPU time, 0 to 1 tape - 30 minute turnaround
C - Unlimited CPU time, 0 or 1 tape - 2 hour turnaround
D - Unlimited CPU time, unlimited tapes - overnight

Then you would define some JES initiators to process these jobs. There are dozens of ways to set up initiators, but a typical scenario might be:

Init 1 - Classes: A
Init 2 - Classes: A
Init 3 - Classes: B
Init 4 - Classes: BA
Init 5 - Classes: CA
Init 6 - Classes: DCBA

You would then set up a single period service class for each job class. As one example:

TSTBATA - 90% within 10 minutes
TSTBATB - 90% within 30 minutes
TSTBATC - period 1 = velocity of 20%; period 2 = discretionary
TSTBATD - discretionary

We're making an assumption that there aren't enough ended class C jobs to allow a response time goal.

The advantage of this technique is that the initiators will determine the highest priority jobs to allow into MVS. If the operators feel that the system is too busy at the moment, they can close down the initiators in order of 6, 5, 4, 3, 2 and 1.

When jobs in classes A and B get onto an initiator, they'll go into a single-period service class and stay at the same dispatch priority while they're executing. For those job classes, the first jobs on an initiator are normally the first jobs completed.
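
As a rough illustration of the initiator table above, the following Python sketch shows which initiators could pick up each job class, and what is left as operators close initiators in the order 6, 5, 4, 3, 2, 1. This is only a model for reasoning about the example, not JES configuration; the function names (eligible_initiators, coverage, close_in_order) are made up for this sketch.

```python
# Initiator class lists copied from the example above.
# This is a toy model, not JES syntax.
INITIATORS = {1: "A", 2: "A", 3: "B", 4: "BA", 5: "CA", 6: "DCBA"}


def eligible_initiators(job_class, open_inits):
    """Return the open initiators whose class list includes job_class."""
    return [n for n in sorted(open_inits) if job_class in INITIATORS[n]]


def coverage(open_inits):
    """Map each job class to the open initiators that can select it."""
    return {c: eligible_initiators(c, open_inits) for c in "ABCD"}


def close_in_order(order=(6, 5, 4, 3, 2, 1)):
    """Close initiators one at a time and record the remaining coverage."""
    open_inits = set(INITIATORS)
    steps = []
    for n in order:
        open_inits.discard(n)
        steps.append((n, coverage(open_inits)))
    return steps


if __name__ == "__main__":
    print("all open:", coverage(set(INITIATORS)))
    for closed, cov in close_in_order():
        print(f"after closing init {closed}:", cov)
```

Running it shows that class D loses its only initiator as soon as init 6 is closed, class C loses its last one when init 5 is closed, and classes A and B keep dedicated initiators the longest.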
Job classes C and D, on the other hand, have unlimited CPU time. They might need 20 seconds of CPU time or three hours of CPU time - you don't really know. Therefore, the multi-period batch allows you to push the smaller of these large jobs through the system by setting the dispatch priority of period one to provide higher performance.

Priority by Period

Prioritizing test batch jobs by their actual use rather than their anticipated use is another common technique. In this method, there would be just one test batch job class. The initiators would be used to manage the number of test jobs in the system, but wouldn't differentiate between the short jobs and the long jobs. A service class for this method might have four periods and look like:

Period 1 - 90% within 10 minutes, duration = 1000 Service Units (SUs)
Period 2 - 90% within 30 minutes, duration = 3000 SUs
Period 3 - velocity of 20%, duration = 10000 SUs
Period 4 - discretionary

All test jobs would enter the system in a first-come, first-served order. As soon as MVS sees them, they will probably be run at a high dispatch priority until they've consumed 1000 service units. Those jobs taking less than 4,000 service units (1000 in period one and 3000 in period two) have the next highest priority and will be completed next. The longer jobs will compete at the same low priority, with the smaller jobs typically completing first.

Comparison

The first method using job classes takes more effort on the part of the sysprogs and the programmers that submit jobs. The sysprogs will need to analyze the current data to determine appropriate job class groupings, the job class information will need to be distributed to the users, and the users will need to estimate their job's usage before they submit the job. If they guess too low, the jobs will ABEND with a time-out for the job class. If they guess too high, they'll get poorer service than they deserve.
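
For reference, the durations in the four-period "Priority by Period" example are per period, so they add up to cumulative boundaries (1000 in period one plus 3000 in period two gives the 4,000 figure). A minimal Python sketch of that arithmetic follows. The names PERIODS, period_boundaries and period_for are invented for this illustration, and treating a job that has consumed exactly the boundary amount as already in the next period is an assumption of the sketch.

```python
from itertools import accumulate

# (goal, duration in service units) per period, from the example above.
# The last period has no duration: work stays there until it ends.
PERIODS = [
    ("90% within 10 minutes", 1000),
    ("90% within 30 minutes", 3000),
    ("velocity of 20%", 10000),
    ("discretionary", None),
]


def period_boundaries(periods=PERIODS):
    """Cumulative service units at which work leaves periods 1, 2, 3."""
    durations = [d for _, d in periods if d is not None]
    return list(accumulate(durations))


def period_for(service_units, periods=PERIODS):
    """Return the 1-based period a job is in after consuming service_units.

    Assumption: reaching a boundary exactly counts as moving on.
    """
    for number, limit in enumerate(period_boundaries(periods), start=1):
        if service_units < limit:
            return number
    return len(periods)


if __name__ == "__main__":
    print(period_boundaries())
    for used in (500, 3999, 4000, 20000):
        goal = PERIODS[period_for(used) - 1][0]
        print(used, "SUs -> period", period_for(used), "-", goal)
```

So a job that needs 3,999 service units finishes under a response time goal, while one that needs 20,000 spends most of its life in the discretionary period.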
If job class designations change, there may be additional ABENDs because programmers have a tendency to use "old JCL," change a line or two (seldom the job class) and submit the job. If the job runs in the wrong job class, you may have an ABEND. This technique is not too productive for the programmers, but will result in the best turnaround times for the majority of users. (As you'll see next.)

The second method is very simple to use, but can lead to severe problems. The programmers don't need to be concerned with which job class to use, and the sysprogs don't need to do any analysis (other than determine the appropriate durations for the periods). MVS and SRM will get the shortest jobs out at the highest priority. This method has a large problem, however, and that's the possibility that a programmer will overload the initiators with a lot of long-running jobs. If the short jobs can't even get on an initiator, WLM can't get them completed in time. This technique is very useful when you have plenty of resources, and can greatly over-initiate the system. If the system is very constrained, and you have to limit the number of initiators, it may be difficult or impossible to get the small jobs in and out of the system. The biggest problem with this technique is that you can't guarantee turnaround times for your users. It becomes extremely difficult to manage to a set of service level objectives.

The easiest compromise seems to be to define a very, very small number of batch job classes, such as three. Make them significantly different enough in terms of resource usage that it will be easy for the users to choose the correct class. For service levels, you might simply talk about short, medium, and long batch (e.g. class A, B, and C), with response goals for the first two.
When setting up your initiators, make sure that classes A and B have at least one dedicated initiator each (otherwise a bunch of class C jobs could grab the initiators during slack times and class A jobs couldn't get started). Then create a multi-period service class for class C. This would give you the capability of managing your test jobs to provide consistent turnaround times for classes A and B. Class C users that use the least amount of resources would tend to get better response than those using more resources.

WLM managed initiators can have similar problems with multi-period batch service classes, because the work is run in the same type of service classes. An additional problem with these initiators is that if all of the current initiators are blocked with long running jobs and small jobs are missing their goals, then WLM will open up more initiators. It's quite possible that the system can become over-initiated. WLM will eventually stop the unnecessary initiators, but the problem may exist for a period of time.

Production Batch

Production batch jobs present a different problem. The most typical scenario is that all production batch jobs are placed in a single job class with TYPRUN=HOLD in the JCL. Then they are released one job at a time by the operations staff or an automated scheduler. There are no turnaround times associated with production batch jobs. Although the intention of most production batch jobs is to complete before the online systems come up in the morning, there is no way to indicate this to MVS and SRM. Most installations have solved the problem by identifying critical jobs in the batch cycle and assigning them to a higher priority service class (one with a higher dispatch priority).

If you put all production batch in a multi-period service class, you will generally have problems when resources become constrained. One of the major jobs in the critical path, for example, might fall into second or third period.
If resources are constrained, other smaller jobs will come in and out of the system at a higher priority than the critical batch job. In a very constrained system, the critical job could take three to four times the normal elapsed time just because it's running at a low priority. Often the solution to this is to move that particular job to a higher priority, single-period, service class. But this solution is applied one job at a time as problems are diagnosed.

In general, multi-period production batch is very frustrating to the operators and schedulers. If they've released a job, it's because they want it to run (now!). They don't want it to lie around the bottom of the resource pool using CPU only when nobody else wants it. For production jobs, usually first-in, first-out is the desired mode of operation. With single-period production batch service classes, that's what you get. With multi-period service classes, the job using the most resources will take the longest, sometimes to the detriment of the critical batch window. This has been one of the primary causes for sites not meeting their critical batch window. If you're having trouble getting your batch jobs complete before your online systems come up each day, check to see if this is the cause.

** pull-quote - Multi-period production batch service classes are one of the primary causes for missing your batch window goals

One alternative for this is to identify the critical jobs in your batch cycle and place them in a unique job class assigned to a unique single-period service class. This service class would run at a higher velocity and importance than other batch, and would exhibit first-in, first-out characteristics.

On May 1, 2012, at 7:39 AM, Andrew Rowley wrote:

> Hi,
>
> I have read a few articles that say that multiple periods are not recommended for batch service classes. Multiple periods seems to be considered a bit old fashioned.
>
> I haven't been able to find anything clearly explaining why.
> I have always felt that they worked well. My best guess is that it is something to do with the behaviour of WLM managed initiators but I'm not sure.
>
> Can anyone shed any light, or point me to some further reading?
>
> Thanks
>
> Andrew Rowley
>
> --
> Andrew Rowley
> Black Hill Software Pty. Ltd.
> Phone: +61 413 302 386
>
> EasySMF for z/OS: Interactive SMF Reports on Your PC
> http://www.smfreports.com
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU