Re: WLM issue with a proposed solution
Dispatching priorities mean nothing if the work is getting done. You're using WLM; you should learn and use its terminology.

-teD

Original Message
From: Tracy Adams
Sent: Thursday, April 28, 2016 15:57
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: WLM issue with a proposed solution

The importance (priority) of DB2 is set to 2, as is that of the CICS service class. It serves both the CICS and batch jobs. I only speak of dispatching priorities because, ultimately, isn't that what is driven by the collective results of WLM? To Mark's question, I am not sure what is stalling those transactions; I will try to collect some delay information.

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Martin Packer
Sent: Thursday, April 28, 2016 3:49 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: WLM issue with a proposed solution

Hello Tracy. What importance have you set DB2 address spaces' service class(es) to? Likewise the things it serves, such as CICS regions and CICS transactions? If DB2 is getting locked out, it could be caused by it being Imp 2 or something, rather than Imp 1 with a goal of 70+.

I also note you're mainly talking dispatching priorities rather than WLM language.

Cheers, Martin

Martin Packer, zChampion, Principal Systems Investigator, Worldwide Cloud & Systems Performance, IBM
+44-7802-245-584
email: martin_pac...@uk.ibm.com
Twitter / Facebook IDs: MartinPacker
Blog: https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker
Podcast Series (With Marna Walle): https://developer.ibm.com/tv/category/mpt/

From: Tracy Adams <tad...@fbbrands.com>
To: IBM-MAIN@LISTSERV.UA.EDU
Date: 28/04/2016 19:22
Subject: WLM issue with a proposed solution
Sent by: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>

So here is my issue: We have a soft capped LPAR that runs our DB2 and CICS regions and, during the day, some "marketing batch". On Wednesdays, the marketing batch (online submit via CICS) increases and by afternoon we hit our 4 hour soft cap. Once or twice while we are capped, the busiest CICS regions slow down to the point where some old automation kicks in to kill transactions over 45 seconds old. Some of these transactions dump through DumpMaster, we then go to max sockets and more transactions dump, and in 10 - 30 seconds all is fine again.

What I see: The CICS regions have a DP around EC and are meeting their service goal of 99% under .5 seconds. But there are tens of thousands of transactions that have led to this. The batch jobs (3-5 of them), while running 10 - 15% CPU, have a DP of C0 and are in a discretionary level of the service class. I believe the problem lies with the DB2 service class. That has a velocity goal of 66 and it tends to run below that when there is more contention in the system. The DP of the DB2 region is F6.

My theory: when this brown out occurs the resources are maxed out, and the CICS regions, being the ones that have met their goal, will have to suffer many transactions missing the service goal before their DP goes up. They get hung up just long enough to cause the delays that trigger the "panic" automation to clear the stalled transactions. Chaos breaks out!

My proposal:
A. Limit the batch jobs to a max of three by controlling open initiators for their job class.
B. Change the DB2 velocity to 60.
C. Starve the CICS service goal by reducing it to 99% in .4, forcing its DP to be a little more desperate.

Thoughts?

TIA, Tracy

-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Re: WLM issue with a proposed solution
On the zIIP point, assume most of DB2 DBM1 in V10 and nearly all in V11 is zIIP-eligible. (And, yes, SMF 30 gives you the actual numbers.)

And thereby hangs another tale... :-)

Cheers, Martin

From: Neil Duffee <nduf...@uottawa.ca>
To: IBM-MAIN@LISTSERV.UA.EDU
Date: 02/05/2016 20:20
Subject: Re: WLM issue with a proposed solution
Sent by: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>

Caveat: as a daily digester, responses are implicitly delayed...

Tracy: among other good advice you got, I'll emphasize that the Importance for your Databases (DB2, etc) must be higher than your Applications (CICS, etc) to avoid [some of] these time-out/deadlock scenarios. I strongly suggest reading the WLM Redbook. [1] It has specific chapters on CICS, DB2, etc.

Secondly, I'd avoid strangling WLM but, rather, tend to suggest loosening the rules. If WLM has this leeway, it is more able to balance the workload and, after all, that's the whole point of a WorkLoad Manager. I use the concept where the rules are, "what can you tolerate when things go south?" vs. "how do I want things to perform normally?" [2] When there are sufficient resources, all your classes will over-perform. By bumping up the CICS minimum you're forcing WLM to deprecate others of the same Importance (or less; such as DB2 from your message). Rather, by loosening the restrictions, DB2 is allowed to breathe some. In fact, you'll see below [5] that our Online-Hi is 75% in 1 second but our typical CICS response is 0.3 seconds and 0.8 on bad days.

Third, you might consider moving your long-running CICS transactions to a different Transaction group because they can skew the accumulated WLM results. Below [5], you'll see I have a group LONGRUN that encompasses monitoring tasks which, essentially, never end; meaning *bad* response times. Instead, because we cycle our production CICS each workday, they're shunted to the ONLINELG service class with 75% in 1 second so they don't pollute the ONLINEHI stats.

Lastly, tho' I believe it is the default, make sure you have I/O Priority management [3] set to YES. It will encourage WLM to promote lower classed work such as Batch to a higher DP (temporarily) to clear the blockage. It will repeat the process if necessary and results can be seen in the RMF reporting [4] under LCK. (LCK or ENQ?) The Dynamic alias tuning management will let WLM manage your hyper-volSer UCB allocations as well. (Can't remember the real name at the moment.)

A zIIP was suggested but, unless you're doing Java in CICS, it won't *directly* help your CICS/DB2 problems. However, depending on your z/OS & DB2, more things are becoming zIIP-able, i.e. tcp/ip, system XML services, DRDA, etc. Plus, it's not included in your 4hr cap or licencing.

ps. the DB2 velocity goal can be a small red herring. It applies to activities that are not assigned to specific enclaves such as DASD I/O & lock management. Your Batch work will be in a Batch class enclave (SRB) within DB2 and be dispatched as such. This is one of the places where you will see promotion by WLM occur due to enqueues/locks.

[1] System Programmer’s Guide to: Workload Manager SG24-6472-03

[2] The latter is from the old Dispatching Priority mentality that needs to be dropped. Instead, DP is employed by WLM to achieve the minimum goals you have defined.

[3] WLM samples:

    Service Coefficient/Service Definition Options:
      I/O priority management . . . . . . . . YES
      Dynamic alias tuning management . . . . YES

[4] RMF reporting

    --PROMOTED--
    BLK      0.062
    ENQ     52.084
    CRM     21.455
    LCK    654.084
    SUP      0.000

[5] WLM samples:

    Transaction Name Group LONGRUN - Long running CICS transactions

      Qualifier  Starting
      name       position  Description
      ---------  --------  --------------------------
      B11R                 BETA93
      C*                   CICS supplied transactions
      OSEC                 Omegamon
      OSRV                 Omegamon

    -from the CICS monitor: CSSY, CSTP, CSNC, CSZI, CEX2, CSHQ, CSNE, OSRV, & OSEC all have elapsed/response times in days

    Subsystem Type CICS - CICS transactions

    Classification:
      Default service class is ONLINELO
      Default report class is CICS

      #  Qualifier type / Qualifier name / Starting position / Service Class
      1  SIGCICSPRD1 ONLINEHI
      2  . TNG . LONGRUN ONLINELG

    Service Class ONLINELG - Long running tr
Re: WLM issue with a proposed solution
75% complete within 00:00:01.000

> signature = 8 lines follows <
Neil Duffee, Joe Sysprog, uOttawa, Ottawa, Ont, Canada
telephone:1 613 562 5800 x4585 fax:1 613 562 5161
mailto:NDuffee of uOttawa.ca
http://aix1.uOttawa.ca/~nduffee
“How *do* you plan for something like that?” Guardian Bob, Reboot
“For every action, there is an equal and opposite criticism.”
“Systems Programming: Guilty, until proven innocent” John Norgauer 2004
"Schrodinger's backup: The condition of any backup is unknown until a restore is attempted." John McKown 2015

-----Original Message-----
From: Tracy Adams [mailto:tad...@fbb...com]
Sent: April 29, 2016 08:55
Subject: Re: WLM issue with a proposed solution

Thank you all for chiming in! Yeah, the bottom line... figure out why those sub-second transactions get stalled! Hard to tune your way out of a locking condition :-) I will check out the SYSSTC actual velocity... that is a good benchmark for what my max achievable would be around. Happy Friday Martin, sounds like you have written the book on this! Gotta go read about resource groups.

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Scott Chapman
Sent: Friday, April 29, 2016 6:40 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: WLM issue with a proposed solution

>If your batch jobs are running Discretionary at a DP lower than CICS, it
>is very unlikely that they are causing significant CICS delays.

True from a CPU perspective. But the batch jobs could be locking resources in DB2 that are delaying the CICS transactions. And if the batch jobs holding those locks are progressing very slowly due to running in discretionary when there's little CPU available, the locks may persist for an extended period of time, elongating CICS transaction response time.

Or I saw a similar situation once where some batch queries exhausted the RID pool, which caused sub-second CICS transactions to start taking over 60 seconds. That's fortunately harder to do on the later versions of DB2.

In short, while adjusting the goals very well may be in order, I'd be inclined to first look into the apparently unusually long running CICS transactions to identify why those particular transactions are taking a long time.
Re: WLM issue with a proposed solution
Thanks!

One note on SYSSTC: whether or not the velocity GOAL of "STCHI" matches the MEASURED velocity of SYSSTC, the latter is still protected relative to the former. I'm sensitised to this because a recent customer situation saw DBM1 in SYSSTC, competing with IRLM.

I normally - in my graphing at least - have SYSSTC as "Imp 0" and SYSTEM as "Imp -1", though in my REXX I probably make everyone take 1 step down. :-)

But I would expect the delivered velocity for "STCHI" to be (slightly) lower than SYSSTC.

Cheers, Martin
Re: WLM issue with a proposed solution
Thank you all for chiming in! Yeah, the bottom line... figure out why those sub-second transactions get stalled! Hard to tune your way out of a locking condition :-)

I will check out the SYSSTC actual velocity... that is a good benchmark for what my max achievable would be around.

Happy Friday Martin, sounds like you have written the book on this! Gotta go read about resource groups.
Re: WLM issue with a proposed solution
>If your batch jobs are running Discretionary at a DP lower than CICS, it is
>very unlikely that they are causing significant CICS delays.

True from a CPU perspective. But the batch jobs could be locking resources in DB2 that are delaying the CICS transactions. And if the batch jobs holding those locks are progressing very slowly due to running in discretionary when there's little CPU available, the locks may persist for an extended period of time, elongating CICS transaction response time.

Or I saw a similar situation once where some batch queries exhausted the RID pool, which caused sub-second CICS transactions to start taking over 60 seconds. That's fortunately harder to do on the later versions of DB2.

In short, while adjusting the goals very well may be in order, I'd be inclined to first look into the apparently unusually long running CICS transactions to identify why those particular transactions are taking a long time.

Scott
Re: WLM issue with a proposed solution
Agree: "Achievable" is what's important here. Please measure it - with load.

This is where I struggle without IBM-MAIN being a visual medium. :-) I plot velocity WITH LOAD and see how it droops. My code has done this for years and I present this graphing method regularly - to individual customers as well as at conferences. Drooping tells you most of what you need to know. :-)

Of course it's an economic decision whether you let DB2's velocity (really the service class' velocity) falter with load. Or whether a velocity in the 30-40% range is acceptable. So I'm not making an absolute "70%" statement; I merely observe MOST customers achieve 60+, many 70+, some 80+, a few 90+.

One other observation: in the service class that DB2 is in (notionally "STCHI") I typically see the main "Using" sample being "Using I/O". It's worthwhile establishing this. Of course, if DBM1 is not the "dominant" address space this picture could look quite different.

To repeat, slightly altered: it's worthwhile figuring out why "STCHI" has the velocity it has, when it has.

Hoping this helps, rather than confusing. In any case it's a great way to slide into a Friday. :-) And I've a feeling Marna and I could do a whole podcast episode on just this one topic. :-)

Cheers, Martin

From: Edward Finnell <000248cce9f3-dmarc-requ...@listserv.ua.edu>
To: IBM-MAIN@LISTSERV.UA.EDU
Date: 29/04/2016 01:04
Subject: Re: WLM issue with a proposed solution
Sent by: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>

Some of the new features in RMF are an improvement into what's happening. SHARE papers and Redbooks give insight into what to look for in the 'buckets'. The Boeblingen folks admit Velocity goals are really tough for RMF due to rapidity of changing landscape. Configuration is very important. I'd hang some zIIPs on that puppy for a start.

In a message dated 4/28/2016 6:49:39 P.M. Central Daylight Time, and...@blackhillsoftware.com writes:

>available (maybe only 1 or 2), in which case a velocity of 70 is probably
>not achievable. 30 or 40 might be what you realistically get. Perhaps
>looking at the velocity of SYSSTC might give an idea of the limit of
>achievable velocity?
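The "plot velocity with load and see how it droops" idea can be approximated without a graph by bucketing intervals by CPU busy and averaging the measured velocity in each bucket. This is only a sketch of that idea, not Martin's code: the function name, the 10% bucket width, and the interval figures are all made up for illustration, and real input would come from your own RMF/SMF reporting.

```python
from collections import defaultdict


def velocity_by_load(samples, bucket_width=10):
    """Average measured velocity per CPU-busy bucket.

    samples: iterable of (cpu_busy_percent, velocity) pairs, one per interval.
    Returns a dict mapping the lower edge of each bucket to the mean velocity.
    """
    buckets = defaultdict(list)
    for cpu_busy, velocity in samples:
        lower_edge = int(cpu_busy // bucket_width) * bucket_width
        buckets[lower_edge].append(velocity)
    return {edge: sum(v) / len(v) for edge, v in sorted(buckets.items())}


# Hypothetical interval data: (CPU busy %, measured velocity of the service class)
intervals = [(45, 80), (48, 70), (72, 62), (78, 58), (95, 40), (97, 35)]
for edge, mean_velocity in velocity_by_load(intervals).items():
    print(f"{edge:3d}-{edge + 10:3d}% busy: mean velocity {mean_velocity:.1f}")
```

If the mean velocity falls steadily as the busy bucket rises, that is the "droop"; the bucket where it drops below the goal is a hint at what is achievable under load.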
Re: WLM issue with a proposed solution
My experience is that CICS will suffer if the LPAR is being soft capped, no matter what you try to do about it. So I think the best and only solution is to keep the LPAR from becoming capped by keeping the batch consumption under control. Not with a limited number of initiators, because that will not control CPU consumption, but with a Resource Group, which will keep the batch CPU consumption within the limits that would otherwise have caused capping.

Kees.
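For readers less familiar with soft capping: the cap is driven by the LPAR's 4-hour rolling average MSU consumption compared with its defined capacity. The sketch below shows that bookkeeping in a simplified form. It assumes 5-minute samples (so 48 samples per 4-hour window), averages over whatever samples have been seen so far, and uses hypothetical function names and made-up MSU figures; it is not how WLM itself is implemented.

```python
from collections import deque


def rolling_average(msu_samples, window):
    """Trailing average of the last `window` samples at each point in time."""
    recent = deque(maxlen=window)
    averages = []
    for sample in msu_samples:
        recent.append(sample)
        averages.append(sum(recent) / len(recent))
    return averages


def intervals_over_cap(msu_samples, defined_capacity, window=48):
    """Indexes of the intervals whose rolling average exceeds the defined capacity."""
    averages = rolling_average(msu_samples, window)
    return [i for i, avg in enumerate(averages) if avg > defined_capacity]


# Hypothetical day: a quiet morning, then heavier batch in the afternoon.
morning = [80] * 48      # four hours at 80 MSU
afternoon = [140] * 48   # four hours at 140 MSU
capped = intervals_over_cap(morning + afternoon, defined_capacity=100)
if capped:
    print(f"rolling average first exceeds the cap at interval {capped[0]}")
```

The point of the Resource Group suggestion is visible here: holding the afternoon batch consumption down keeps the rolling average under the defined capacity, so the cap never engages in the first place.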
Re: WLM issue with a proposed solution
On Thu, 28 Apr 2016 19:57:32 +, Tracy Adams wrote:

>The importance (priority) of DB2 is set to 2

Importance is NOT priority.

--
Tom Marchant
Re: WLM issue with a proposed solution
On Thu, 28 Apr 2016 18:22:11 +, Tracy Adams wrote:

>We have a soft capped LPAR that runs our DB2 and CICS regions and during
>the day some "marketing batch". On Wednesdays, the marketing batch (online
>submit via CICS) increases and by afternoon we hit our 4 hour soft cap. Once
>or twice while we are capped, the busiest CICS slow down to the point where
>some old automation kicks in to kill transactions over 45 seconds old, some of
>these transactions dump through DumpMaster, we then go to max sockets and
>more transactions dump and in 10 - 30 seconds all is fine again.

>What I see: The CICS regions have a DP around EC and are meeting their
>service goal of 99% under .5 seconds. But there are tens of thousands
>transactions that have led to this. The batch jobs (3-5 of them), while running
>10 - 15 % cpu have a DP of C0 and are in a discretionary level of the service
>class. I believe the problem lies with the DB2 service class. That has a definition
>of velocity at 66 and it tends to run below that when there is more contention in
>the system. The DP of the DB2 region is F6.

Are your CICS regions still meeting their goals when these anomalies occur?

If your batch jobs are running Discretionary at a DP lower than CICS, it is very unlikely that they are causing significant CICS delays.

You say that DB2 sometimes fails to meet its goals when the system is loaded. That suggests to me that 66% isn't achievable, and it may be causing WLM to work extra hard to try to meet that goal. If the DB2 address spaces really are running at a higher DP than CICS when the problems occur, then they are probably ok.

Are your batch jobs using DB2 or other high priority address spaces?

If your DB2 address space goals are too aggressive, dropping the velocity from 66 to 60 won't make much difference. Have you read John Arwe's paper on velocity goals?

I'm not a fan of percentile goals as high as 99%. It doesn't take many outliers to cause you to fail to meet your goal. Assuming that the vast majority of your transactions are quick, it won't matter whether your percentile is 99% or e.g. 80%. I like to set my percentile response times for the fastest transactions in each CICS address space and let the rest go along for the ride. That's not likely your problem though.

How does your "old automation" determine that there is a problem? When one of these long running transactions is canceled, do you know what was going on in it? Are they just unusual transactions that take a long time, or are they in a loop or something? I wonder if the real problem is that this automation is canceling transactions that it shouldn't.

--
Tom Marchant
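The remark that a 99% goal "doesn't take many outliers" to miss can be checked with a little arithmetic. The following is a rough sketch, assuming a response-time goal is judged simply as the percentage of transactions that finish within the goal time; the function names and the transaction counts are invented for the example and do not come from this thread.

```python
def percent_within_goal(response_times, goal_seconds):
    """Percentage of transactions that completed within the goal time."""
    if not response_times:
        return 100.0  # nothing ran, so nothing missed the goal
    within = sum(1 for t in response_times if t <= goal_seconds)
    return 100.0 * within / len(response_times)


def goal_met(response_times, goal_seconds, percentile):
    """True if at least `percentile` percent of transactions met the goal time."""
    return percent_within_goal(response_times, goal_seconds) >= percentile


# Hypothetical interval: 985 quick transactions and 15 that stalled for 45 seconds.
interval = [0.1] * 985 + [45.0] * 15

print("percent within 0.5s:", percent_within_goal(interval, 0.5))
print("99% goal met:", goal_met(interval, 0.5, 99))
print("80% goal met:", goal_met(interval, 0.5, 80))
```

With only 1.5% of the transactions stalled, the 99% goal is missed while an 80% goal is still comfortably met, which is the sensitivity Tom is describing.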
Re: WLM issue with a proposed solution
Some of the new features in RMF are an improvement into what's happening. SHARE papers and Redbooks give insight into what to look for in the 'buckets'. The Boeblingen folks admit Velocity goals are really tough for RMF due to rapidity of changing landscape. Configuration is very important. I'd hang some zIIPs on that puppy for a start.

In a message dated 4/28/2016 6:49:39 P.M. Central Daylight Time, and...@blackhillsoftware.com writes:

>available (maybe only 1 or 2), in which case a velocity of 70 is probably
>not achievable. 30 or 40 might be what you realistically get. Perhaps
>looking at the velocity of SYSSTC might give an idea of the limit of
>achievable velocity?
Re: WLM issue with a proposed solution
On 29/04/2016 6:06, Martin Packer wrote:

>DB2 should have a higher importance than what it serves, so in this case
>it should be Importance 1. I'd set its goal velocity to what's achievable -
>probably 70, likely 80, maybe 90. I would not mess with eg 75, 85. By "DB2"
>I mean DBM1, DIST and MSTR. IRLM should be in SYSSTC.

Achievable velocity depends on the number of CPUs available. The original problem sounded like a system with a limited number of CPUs available (maybe only 1 or 2), in which case a velocity of 70 is probably not achievable. 30 or 40 might be what you realistically get.

Perhaps looking at the velocity of SYSSTC might give an idea of the limit of achievable velocity?

--
Andrew Rowley
Black Hill Software
+61 413 302 386
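To make the velocity arithmetic concrete: execution velocity is the percentage of samples in which work was using a resource, out of all samples in which it was either using or delayed. The sketch below shows that calculation only; the function name and the sample counts are made up for illustration and are not taken from any RMF report in this thread.

```python
def execution_velocity(using_samples, delay_samples):
    """Execution velocity (0-100): using / (using + delay) * 100."""
    total = using_samples + delay_samples
    if total == 0:
        return 0.0  # no samples at all; reported as 0 in this sketch
    return 100.0 * using_samples / total


# Hypothetical sample counts for the same service class in two situations.
plenty_of_cpu = execution_velocity(using_samples=70, delay_samples=30)
cpu_constrained = execution_velocity(using_samples=30, delay_samples=70)

print("velocity with little CPU delay:", plenty_of_cpu)
print("velocity when often waiting for a CPU:", cpu_constrained)
```

With only one or two CPUs, even high-importance work spends more of its samples waiting for a processor, so the delay count grows and the achievable velocity falls, which is why a goal of 66 or 70 may simply be out of reach on a small, capped LPAR.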
Re: WLM issue with a proposed solution
DB2 should have a higher importance than what it serves, so in this case it should be Importance 1. I'd set its goal velocity to what's achievable - probably 70, likely 80, maybe 90. I would not mess with e.g. 75, 85.

By "DB2" I mean DBM1, DIST and MSTR. IRLM should be in SYSSTC.

You'd be surprised how many customers get this wrong. :-(

Cheers, Martin

Martin Packer, zChampion, Principal Systems Investigator, Worldwide Cloud & Systems Performance, IBM
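Martin's rules of thumb (the DB2 address spaces at a higher importance than the work they serve, IRLM in SYSSTC) can be written down as a small check. This is only a sketch in Python and not a WLM interface: the service class names, the `check_layout` helper and the IRLM placement are made up for illustration. Only the Importance 2 setting for DB2 and CICS comes from the thread.

```python
from typing import Dict, List, NamedTuple


class ServiceClass(NamedTuple):
    name: str        # hypothetical service class name
    importance: int  # WLM importance: 1 is most important, 5 least


DB2_SPACES = ("DBM1", "DIST", "MSTR")


def check_layout(assignments: Dict[str, ServiceClass],
                 served: List[ServiceClass]) -> List[str]:
    """Return findings where the layout departs from the rules of thumb."""
    findings = []
    # The numerically lowest importance value is the most important served work.
    top_served = min(sc.importance for sc in served)
    for space in DB2_SPACES:
        sc = assignments[space]
        if sc.importance >= top_served:
            findings.append(
                f"{space}: importance {sc.importance} is not above "
                f"the work it serves (importance {top_served})"
            )
    if assignments["IRLM"].name != "SYSSTC":
        findings.append("IRLM: expected in SYSSTC")
    return findings


# Importance 2 for both DB2 and CICS is taken from the thread; the class
# names and the IRLM placement are assumptions for the example.
db2 = ServiceClass("DB2PROD", 2)
cics = ServiceClass("CICSPROD", 2)
sysstc = ServiceClass("SYSSTC", 0)  # modelling shortcut: SYSSTC carries no importance
current = {"DBM1": db2, "DIST": db2, "MSTR": db2, "IRLM": sysstc}

for finding in check_layout(current, served=[cics]):
    print(finding)
```

With DB2 and CICS both at Importance 2, each of the three DB2 address spaces is flagged. Moving the DB2 class to Importance 1 clears the findings.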
Re: WLM issue with a proposed solution
The importance (priority) of DB2 is set to 2, the same as the CICS service class. DB2 serves both the CICS regions and the batch jobs. I only speak of dispatching priorities because, ultimately, aren't they driven by the collective results of WLM?

To Mark's question: I am not sure what is stalling those transactions. I will try to collect some delay information.
Re: WLM issue with a proposed solution
Hello Tracy.

What importance have you set the DB2 address spaces' service class(es) to? Likewise the things it serves, such as CICS regions and CICS transactions?

If DB2 is getting locked out, it could be caused by it being Imp 2 or something, rather than Imp 1 with a goal of 70+.

I also note you're mainly talking dispatching priorities rather than WLM language.

Cheers, Martin

From: Tracy Adams
To: IBM-MAIN@LISTSERV.UA.EDU
Date: 28/04/2016 19:22
Subject: WLM issue with a proposed solution
Sent by: IBM Mainframe Discussion List

So here is my issue: We have a soft-capped LPAR that runs our DB2 and CICS regions and, during the day, some "marketing batch". On Wednesdays, the marketing batch (online submit via CICS) increases and by afternoon we hit our 4-hour soft cap. Once or twice while we are capped, the busiest CICS regions slow down to the point where some old automation kicks in to kill transactions over 45 seconds old. Some of these transactions dump through DumpMaster, we then go to max sockets and more transactions dump, and in 10 - 30 seconds all is fine again.

What I see: The CICS regions have a DP around EC and are meeting their service goal of 99% under .5 seconds. But there are tens of thousands of transactions that have led to this. The batch jobs (3-5 of them), while running at 10 - 15% CPU, have a DP of C0 and are in a discretionary level of the service class. I believe the problem lies with the DB2 service class. That has a velocity goal of 66 and it tends to run below that when there is more contention in the system. The DP of the DB2 region is F6.

My theory: when this brownout occurs the resources are maxed out, and the CICS regions, being the ones that have met their goal, will have to suffer many transactions missing the service goal before their DP goes up. They get hung up just long enough to cause the delays that trigger the "panic" automation to clear the stalled transactions. Chaos breaks out!

My proposal:
A. Limit the batch jobs to a max of three by controlling open initiators for their job class.
B. Change the DB2 velocity to 60.
C. Starve the CICS service goal by reducing it to 99% in .4, forcing its DP to be a little more desperate.

Thoughts?

TIA, Tracy
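To make the velocity numbers in the original post concrete, here is a minimal Python sketch of how an execution velocity and the performance index (PI) for a velocity goal are usually computed: velocity is using samples over using plus delay samples, and PI is goal divided by achieved velocity. The sample counts are invented for illustration and are not measurements from Tracy's system. The goals of 66 and 60 come from the post, and 80 is one of the values Martin mentions.

```python
def execution_velocity(using_samples: int, delay_samples: int) -> float:
    """Execution velocity as a percentage: using / (using + delay) * 100."""
    total = using_samples + delay_samples
    if total == 0:
        raise ValueError("no using or delay samples in the interval")
    return 100.0 * using_samples / total


def velocity_pi(goal: float, achieved: float) -> float:
    """Performance index for a velocity goal; above 1 means the goal is missed."""
    return goal / achieved


# Hypothetical sample counts for a contended interval (not measured data).
achieved = execution_velocity(using_samples=55, delay_samples=45)

for goal in (66, 60, 80):
    print(f"goal {goal}: achieved {achieved:.0f}, PI = {velocity_pi(goal, achieved):.2f}")
```

On this simplified view, lowering the DB2 goal from 66 to 60 moves its PI closer to 1 for the same achieved velocity, so the class looks nearer to its goal and has a weaker case for help. Raising the goal has the opposite effect.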
Re: WLM issue with a proposed solution
Do you know why some of the transactions are taking longer than 45 seconds? A CICS performance monitor should be able to break down the response time. In other words, are you sure that the delay is caused by CPU constraints?

On 28 April 2016 at 22:40, Staller, Allan wrote:

> Set the DB2 goal to be "more reasonable" FSVO reasonable and see what
> happens.

--
Mike Shorkend
m...@shorkend.com
www.shorkend.com
Tel: +972524208743
Fax: +97239772196
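One rough way to approach Mike's question is to see whether the slowdowns line up with the intervals in which the soft cap would actually be in force, that is, when the four-hour rolling average of MSU consumption exceeds the defined capacity. The Python sketch below assumes five-minute samples (48 per four hours). The MSU figures and the defined capacity are invented, and the warm-up handling is a simplification rather than what WLM itself does.

```python
from collections import deque
from typing import Iterable, List

WINDOW = 48  # assumption: 48 five-minute samples make up the 4-hour window


def capped_intervals(msu_samples: Iterable[float],
                     defined_capacity: float) -> List[bool]:
    """Flag each interval where the rolling average exceeds the defined capacity."""
    window = deque(maxlen=WINDOW)
    flags = []
    for msu in msu_samples:
        window.append(msu)
        # Simplification: during warm-up, average over the samples seen so far.
        rolling = sum(window) / len(window)
        flags.append(rolling > defined_capacity)
    return flags


# Hypothetical day: four quiet hours at 40 MSU, then two busy hours at 70 MSU.
samples = [40] * 48 + [70] * 24
flags = capped_intervals(samples, defined_capacity=50)

first_capped = flags.index(True)
print(f"cap first in force at interval {first_capped} "
      f"({(first_capped - 48 + 1) * 5} minutes into the busy period)")
```

If the 45-second transactions only show up inside flagged intervals, CPU constraint is a plausible suspect. If they also occur outside them, the delay breakdown from the CICS monitor matters more.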
Re: WLM issue with a proposed solution
Set the DB2 goal to be "more reasonable" FSVO reasonable and see what happens.