Hi John, DON'T RUN OUT OF BERTS
5K, or even 25K, is not that much. You need to take a look at your checkpoint space, including your BERTWARN threshold and your automation, then make sure you have a long runway and visibility into problems before they become a crisis.

Here is our current status:

$DCKPTSPACE
$HASP852 CKPTSPACE
$HASP852 CKPTSPACE  BERTNUM=200098,BERTFREE=180633,BERTWARN=80,
$HASP852            CKPT1=(CAPACITY=16275,UNUSED=6448),
$HASP852            CKPT2=(CAPACITY=16188,UNUSED=6374)

The only update I have to what is below is that I also have this monitored at an even lower level in CA-SYSVIEW, before the JES2 message automation would trip at 80%, using these variables:

JESBERTP  BERTs percent
JESBERTU  BERTs used

Best Regards, Sam Knutson

-----Original Message-----
From: Knutson, Sam
Sent: Thursday, August 28, 2008 3:25 PM
To: 'IBM Mainframe Discussion List'
Subject: DON'T RUN OUT OF BERTS (or you may COLD start JES2) & APAR OA25562

Hi,

One of the best sessions I went to at SHARE in San Jose all week was 2665 - How to Recover from JES2 Imminent Disasters. Tom Wasik from JES2 development shared some lessons other customers have painfully learned recently. Shortages of the JES2 resource BERTs, while rare, can occur, and mishandling them has resulted in COLD starts in production for a half dozen customers, including some very large, well-prepared z/OS customers, in the past few months. IBM has done some significant work in APAR OA25562 and spoke in the presentation about further work planned for a future release.

To avoid being one of the next ones, you might want to review the presentation and APAR, then review your site's current levels of resource allocation in JES2 and your readiness for a shortage. Automation is probably key to responding quickly and correctly. Preparation can reduce the likelihood of ever having the problem.
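As a rough illustration of reviewing current levels, the $HASP852 response quoted earlier in this thread can be reduced to a percent-used figure and compared with BERTWARN. This is only a sketch in Python: the function name `bert_usage` and the returned field names are made up for the example, and it assumes the display text has already been captured as a string by whatever automation product you use.

```python
import re

# Sample text copied from the $DCKPTSPACE response quoted in this thread.
SAMPLE = """\
$HASP852 CKPTSPACE  BERTNUM=200098,BERTFREE=180633,BERTWARN=80,
$HASP852            CKPT1=(CAPACITY=16275,UNUSED=6448),
$HASP852            CKPT2=(CAPACITY=16188,UNUSED=6374)
"""

def bert_usage(display_text):
    """Summarize BERT use from $DCKPTSPACE output (hypothetical helper).

    Assumes BERTNUM, BERTFREE and BERTWARN all appear as KEY=number pairs.
    """
    fields = {
        key: int(value)
        for key, value in re.findall(
            r"\b(BERTNUM|BERTFREE|BERTWARN)=(\d+)", display_text
        )
    }
    used = fields["BERTNUM"] - fields["BERTFREE"]
    percent_used = 100.0 * used / fields["BERTNUM"]
    return {
        "total": fields["BERTNUM"],
        "used": used,
        "percent_used": percent_used,
        "warn_percent": fields["BERTWARN"],
        # True once use has reached the BERTWARN percentage.
        "at_or_over_warn": percent_used >= fields["BERTWARN"],
    }

usage = bert_usage(SAMPLE)
print(f"{usage['used']} of {usage['total']} BERTs in use "
      f"({usage['percent_used']:.1f}%), warning at {usage['warn_percent']}%")
```

With the numbers above, 19465 of 200098 BERTs are in use, a little under 10%, which is the kind of runway below an 80% BERTWARN that the reply at the top is arguing for.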
2665 - How to Recover from JES2 Imminent Disasters
Direct link to PDF
http://ew.share.org/client_files/callpapers/attach/SHARE_in_San_Jose/S2665TW093140.pdf
or TinyURL version http://tinyurl.com/56faxa
Abstract page in SHARE proceedings with link on page to PDF
http://ew.share.org/proceedingmod/abstract.cfm?abstract_id=18548&conference_id=19
or TinyURL version http://tinyurl.com/6htgnj

SHARE http://www.share.org/ has a lot of great resources besides this presentation.

OA25562: IMPROVMENTS FOR $HASP050 BERT RESOURCE SHORTAGE TO IDENTIFY CRITICAL RESOURCE
http://www.ibm.com/support/docview.wss?uid=isg1OA25562

The most important message of the presentation:

DO NOT RUN OUT OF BERTS!
And if you do run out of BERTS, DO NOT IPL OR RESTART JES2.
And if you do IPL or restart JES2, YOU MAY BE FORCED TO COLD START.

So how do you recover from JES2 imminent disasters? Never let things get to that stage.

We are working on updating our existing automation, which is driven when HASP050 is issued, to parse out the resource type and the percentage and take more specific actions. Automation is being added for HASP050 and the new HASP052 to ensure technical staff are paged automatically. Automation is being added to issue $JDHISTORY once each day in each JESPLEX to keep a history of resource use in the logs. We tracked APAR OA25562 and will be installing the PTF. We ensured all the systems staff understand the significance of BERTs and know a few important commands to display JES2 resource use.

Hopefully this will be useful to you too. There were hundreds of other great sessions at SHARE in San Jose. There will be many more at SHARE in Austin, TX March 1-6, 2009.

Best Regards, Sam Knutson, GEICO
System z Performance and Availability Management
mailto:[EMAIL PROTECTED]
(office) 301.986.3574

"It is a good thing to follow the first law of holes; if you are in one stop digging." Denis Healey

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of John McKown
Sent: Monday, October 27, 2008 11:35 AM
To: IBM-MAIN@BAMA.UA.EDU
Subject: JES2 BERTs???

We had a really weird semi-outage over the weekend. People could not log on to TSO (but could to CICS), jobs would simply "stop" running, and other strange events occurred. It turns out that we had run out of JES2 BERTs.

We are converting a very huge, to us, number of reports from one archival product to a different one. The only way that we have found to do this is to "print" every archived report to DASD, then process it into the new product. Our process is to look at each archive tape in one job. This job restores the VSAM-based archive file. That job then submits "n" jobs ("n" == number of reports on that tape), each of which prints a separate report from that tape to a different DASD dataset. This resulted, at the time, in close to 10,000 jobs in the system. They were all "small" jobs (like 400 lines).

So, anybody want to tell me what a BERT is really used for? It seems to be a generic "overflow" type control block. How do I avoid this in the future? This shocked us because we had to increase our BERTNUM from 5,000 to 25,000!

--
John

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
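On the "how do I avoid this in the future" question, the HASP050-driven automation described in Sam's earlier message (parse out the resource type and the percentage, then take a more specific action) could be sketched roughly as follows. This is Python for illustration only, not what any site actually runs. It assumes the message text has the form "$HASP050 JES2 RESOURCE SHORTAGE OF resource - nn% UTILIZATION REACHED", which you should check against what your JES2 level issues. The names `parse_hasp050` and `choose_action`, the resource name JOES in the usage lines, and the paging policy are all hypothetical.

```python
import re

# Assumed layout of the $HASP050 text; verify it on your own system.
HASP050_PATTERN = re.compile(
    r"\$HASP050 JES2 RESOURCE SHORTAGE OF (\S+) - (\d+)% UTILIZATION REACHED"
)

def parse_hasp050(line):
    """Return (resource, percent) from a $HASP050 line, or None if no match."""
    match = HASP050_PATTERN.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

def choose_action(resource, percent):
    """Pick a response. Hypothetical policy, not a recommendation.

    Any BERT shortage pages someone right away; other resources
    page only once they reach 95%, otherwise they just notify.
    """
    if resource == "BERT" or percent >= 95:
        return "page"
    return "notify"

for text in (
    "$HASP050 JES2 RESOURCE SHORTAGE OF BERT - 80% UTILIZATION REACHED",
    "$HASP050 JES2 RESOURCE SHORTAGE OF JOES - 85% UTILIZATION REACHED",
):
    resource, percent = parse_hasp050(text)
    print(resource, percent, choose_action(resource, percent))
```

Treating BERT separately from the other resources reflects the point of the thread: a BERT shortage is the one where waiting can leave you facing a cold start.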