Hi John,

DON'T RUN OUT OF BERTS 

5K, even 25K, is not that much.  You need to take a look at your checkpoint space, 
including your BERTWARN threshold and your automation, then make sure you have a 
long runway and visibility into problems before they become a crisis.

Here is our current status 

$DCKPTSPACE                                                    
$HASP852 CKPTSPACE                                             
$HASP852 CKPTSPACE  BERTNUM=200098,BERTFREE=180633,BERTWARN=80,
$HASP852            CKPT1=(CAPACITY=16275,UNUSED=6448),        
$HASP852            CKPT2=(CAPACITY=16188,UNUSED=6374)     
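
As a rough illustration of reading that display, here is a minimal Python sketch that pulls the BERT counters out of the $HASP852 response and works out how much is in use. The function name bert_usage and the derived fields (used, pct_used, headroom_pct) are made up for this example and are not part of JES2 or any product; the input text is simply the response shown above.

```python
import re

# Display text copied from the $DCKPTSPACE response above.
RESPONSE = """\
$HASP852 CKPTSPACE  BERTNUM=200098,BERTFREE=180633,BERTWARN=80,
$HASP852            CKPT1=(CAPACITY=16275,UNUSED=6448),
$HASP852            CKPT2=(CAPACITY=16188,UNUSED=6374)
"""


def bert_usage(response: str) -> dict:
    """Extract the BERT* keywords and derive used count and percent used.

    Hypothetical helper for illustration only.
    """
    # Collect BERTNUM, BERTFREE and BERTWARN as integers.
    fields = {k: int(v) for k, v in re.findall(r"(BERT[A-Z]+)=(\d+)", response)}
    used = fields["BERTNUM"] - fields["BERTFREE"]
    pct_used = 100.0 * used / fields["BERTNUM"]
    return {
        **fields,
        "used": used,
        "pct_used": pct_used,
        # Percentage points left before the BERTWARN threshold is reached.
        "headroom_pct": fields["BERTWARN"] - pct_used,
    }


usage = bert_usage(RESPONSE)
print(f"BERTs used: {usage['used']} of {usage['BERTNUM']} "
      f"({usage['pct_used']:.1f}%), warning at {usage['BERTWARN']}%")
```

For the figures above this works out to 19,465 BERTs in use, about 9.7% of BERTNUM, which is well under the BERTWARN value of 80.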

The only update I have to what is below is that I also monitor this in 
CA-SYSVIEW, at a threshold lower than the 80% at which the JES2 message 
automation would trip, using these variables:

JESBERTP          BERTs percent 
JESBERTU          BERTs used     

Best Regards, 

Sam Knutson   


-----Original Message -----
From: Knutson, Sam 
Sent: Thursday, August 28, 2008 3:25 PM
To: 'IBM Mainframe Discussion List'
Subject: DON'T RUN OUT OF BERTS (or you may COLD start JES2) & APAR OA25562 

Hi,

One of the best sessions I went to at SHARE in San Jose all week was 2665 - How 
to Recover from JES2 Imminent Disasters.   Tom Wasik from JES2 development 
shared some lessons other customers have painfully learned recently.

Shortages of the JES2 resource BERTs, while rare, can occur, and mishandling 
them has resulted in production COLD starts at half a dozen sites in the past 
few months, including some very large, well-prepared z/OS customers.  IBM has 
done some significant work in APAR OA25562 and, in the presentation, spoke of 
further work planned for a future release.

To avoid being one of the next ones you might want to review the presentation 
and APAR. Then review your site's current levels of resource allocation in JES2 
and readiness for a shortage.  Automation is probably key to responding quickly 
and correctly.  Preparation can reduce the likelihood of ever having the problem.
 

2665 - How to Recover from JES2 Imminent Disasters

Direct link to PDF 

http://ew.share.org/client_files/callpapers/attach/SHARE_in_San_Jose/S2665TW093140.pdf
 

or TinyURL version 

http://tinyurl.com/56faxa  

Abstract page in SHARE proceedings with link on page to PDF 

http://ew.share.org/proceedingmod/abstract.cfm?abstract_id=18548&conference_id=19
 

or TinyURL version

http://tinyurl.com/6htgnj 


SHARE http://www.share.org/ has a lot of great resources besides this 
presentation. 



OA25562: IMPROVMENTS FOR $HASP050 BERT RESOURCE SHORTAGE TO IDENTIFY CRITICAL 
RESOURCE

http://www.ibm.com/support/docview.wss?uid=isg1OA25562   



The most important message of the presentation

DO NOT RUN OUT OF BERTS!
And if you do run out of BERTS
DO NOT IPL OR RESTART JES2
And if you do IPL or restart JES2
YOU MAY BE FORCED TO COLD START

So How do you Recover from JES2
Imminent Disasters?
Never let things get to that stage


We are working on updating our existing automation that is driven when HASP050 
is issued to parse out the resource type and the percentage and take more 
specific actions.   Automation is being added for HASP050 and the new HASP052 
to ensure technical staff are paged automatically. Automation is being added to 
issue $JDHISTORY once each day in each JESPLEX to keep a history of resource 
use in the logs. We tracked APAR OA25562 and will be installing the PTF. We 
ensured all the systems staff understand the significance of BERTS and know a 
few important commands to display JES2 resource use. 
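
To make the "parse out the resource type and the percentage" step concrete, here is a minimal Python sketch. It is not our actual automation. It assumes the HASP050 text has the form "$HASP050 JES2 RESOURCE SHORTAGE OF type - nnn% UTILIZATION REACHED", so check the layout against the messages manual for your JES2 level. The names parse_hasp050 and choose_action, the action strings and the 90% cut-off are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed message layout; verify against your JES2 messages manual.
HASP050 = re.compile(
    r"\$HASP050 JES2 RESOURCE SHORTAGE OF (\S+)\s*-\s*(\d+)% UTILIZATION REACHED"
)


def parse_hasp050(line: str) -> Optional[Tuple[str, int]]:
    """Return (resource type, percent utilization), or None if no match."""
    m = HASP050.search(line)
    return (m.group(1), int(m.group(2))) if m else None


def choose_action(resource: str, pct: int) -> str:
    """Hypothetical policy: BERT shortages always page; others page at 90%."""
    if resource == "BERT":
        return "page-now"
    return "page-now" if pct >= 90 else "notify"


sample = "$HASP050 JES2 RESOURCE SHORTAGE OF BERT - 80% UTILIZATION REACHED"
parsed = parse_hasp050(sample)
if parsed is not None:
    resource, pct = parsed
    print(resource, pct, choose_action(resource, pct))
```

Treating any BERT shortage as an immediate page is a design choice for the sketch, reflecting how badly a BERT shortage can end; a real rule set would live in whatever automation product the site already uses.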

Hopefully this will be useful to you too.   There were hundreds of other great 
sessions at SHARE in San Jose. There will be many more at SHARE in Austin, TX 
March 1-6, 2009.  

        Best Regards, 

                Sam Knutson, GEICO 
                System z Performance and Availability Management 
                mailto:[EMAIL PROTECTED] 
                (office)  301.986.3574
             
"It is a good thing to follow the first law of holes; if you are in one stop 
digging. "  Denis Healey


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of John 
McKown
Sent: Monday, October 27, 2008 11:35 AM
To: IBM-MAIN@BAMA.UA.EDU
Subject: JES2 BERTs???

We had a really weird semi-outage over the weekend. People could not logon
to TSO (but could to CICS), jobs would simply "stop" running, and other
strange events. It turns out that we had run out of JES2 BERTs. We are
converting a very huge, to us, number of reports from one archival product
to a different one. The only way that we have found to do this is to "print"
every archived report to DASD, then process it into the new product. Our
process is to look at each archive tape in one job. This job restores the
VSAM based archive file. That job then submits "n" jobs ("n" == number of
reports on that tape) which each print a separate report from that tape to a
different DASD dataset. This resulted, at the time, in our having close to
10,000 jobs in the system. They were all "small" jobs (like 400 lines).

So, anybody want to tell me what a BERT is really used for? It seems to be a
generic "overflow" type control block. How do I avoid this in the future?
This shocked us because we had to increase our BERTNUM from 5,000 to 25,000!

--
John

====================
This email/fax message is for the sole use of the intended
recipient(s) and may contain confidential and privileged information.
Any unauthorized review, use, disclosure or distribution of this
email/fax is prohibited. If you are not the intended recipient, please
destroy all paper and electronic copies of the original message.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
