I think the original post asked for procedures, so I will share our SOP.
My considerations:

1. SAD to DASD is very good and has been since 4.3.0. The simplification
of the procedure easily justifies keeping DASD in reserve to capture
the SAD.
2. Allocate two sets of SAD data sets large enough for the largest LPAR.
3. Take the SAD! If you have a failure, don't assume you already have
enough to diagnose it, and don't hope lightning won't strike twice.
First-failure data capture and root cause analysis are worth the effort
when anything as large and valued as a z/OS image has failed!
4. Review IBM z/OS Best Practices: Large Stand-Alone Dump Handling
Version 2

http://tinyurl.com/2kf5em 

http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD103286 

        Best Regards, 

                Sam Knutson, GEICO 
                Performance and Availability Management 
                mailto:[EMAIL PROTECTED] 
                (office)  301.986.3574 

"Think big, act bold, start simple, grow fast..."
 
   Standard Operating Procedure           November 7, 2006
                                    One Step Stand-Alone Dump 

This document outlines the Standard Operating Procedure for a One-Step
Stand-Alone Dump (SAD) of <SNIP>  LPAR after a system failure/hang
condition or if a z/OS system is not responding.  This SOP replaces all
previous z/OS Stand-Alone Dump procedures.  Perform the following steps
in order to create a Stand-Alone system dump. 

The purpose of a Stand-Alone Dump is to save the contents of the failed
system's storage (real storage and the address spaces) somewhere else so
it can be sent off for further analysis of why the system failed.

Procedure to perform a Stand-Alone Dump on an IBM 2094 processor LPAR
Procedure Overview
A)     Identify failed system
B)     Invoke Stand-Alone Dump Program
C)     Remove failed system from sysplex (WITHOUT an LPAR RESET)
D)     Wait for SAD program to complete
E)     Re-IPL failed system
F)     Interrupting the SAD (if needed)

A) IDENTIFY STOPPED OR HUNG LPAR   
Identify console messages that may indicate which system is hung or not
responding. Issue console commands or "ROUTE xSYS,cmd" to possibly clear
the hang condition. Notify the Lead Systems Programmer and SOD
Management of a system hang or failure condition.

B) PERFORM STAND-ALONE DUMP on an IBM 2094 PROCESSOR 

b-1) LOG ON TO AN HMC CONSOLE
        Click on Log on and launch the Hardware Management Console web
        application, then log on to the Hardware Management Console with
        the ADVANCED userid and password.
        (THE PASSWORD WILL BE GIVEN TO YOU BY YOUR SUPERVISOR)

b-2) SELECT HMC VIEW OF DEFINED LPARs
     In the following steps, icon selection is indicated by a gray
     background surrounding the icon on the HMC screen.
 
1.  Ensure that the "GROUP WORK AREA" can be seen, then open (double
    click) the 'STAND ALONE DUMP' icon.

2.  Double left click the lpar icon to bring up the Instance Information
    screen and verify that the SADMP00 activation profile has been
    selected for activation (ALL lpars now use the SADMP00 profile).  If
    not, click on the Change Options button and select the SADMP00
    profile in the next screen.  Click APPLY then CANCEL to return to the
    Instance Information screen.  Click CANCEL again to return to the
    GROUP WORK AREA.

PROFILES: All lpars now use SADMP00
 
NOTE:  If a second SAD is needed or SYS1.SADMP00 is unavailable, you may
use the SADMP01 profiles to dump to the SYS1.SADMP dataset.

If you initiate a SAD with the SADMP00 profile and receive message
AMD093I that SYS1.SADMP00 is not empty, reply to the AMD001A message
with device address 2400 to use SYS1.SADMP, and the SAD will proceed
normally without further intervention.  Or, reply with a tape device
address if no DASD datasets are available.

3.  Single left click the icon for the lpar that is to be dumped; this
    highlights it.
       
b-3) INITIATE THE STAND-ALONE-DUMP 
    Perform the following steps to invoke the STAND-ALONE-DUMP of the
LPAR on HMC console. 
        
1. Highlight the icon of the failed lpar to be dumped with a single left
click.

2. In the "DAILY" task list (right-hand pane), double Left Click the
ACTIVATE icon.

3. An "Activation TASK Confirmation" window will appear. Verify that the
requested lpar is the one, and only one, being activated and the correct
SAD activation profile (SADMP00) is being Activated.  
Click YES to proceed with SAD function.

4. An "ACTIVATE PROGRESS" window will appear showing the Stand-Alone
Dump IPL progress.

5. A "STATUS Success" window will appear.  Click OK to complete.

b-4) INVOKE STAND-ALONE-DUMP CONSOLE FUNCTIONS
The messages will be seen in the Operating SYSTEM Messages icon screen
of the HMC console.  The FAILED LPAR's icon must be highlighted (single
left click to highlight).  You may need to issue a V CN(*),ACTIVATE on
the HMC console to start things rolling.

Two DASD datasets have been set up for SAD and assigned to Load Profiles
SADMP00 and SADMP01.
They are laid out as follows:
DATASET        VOLUME   ADDR
SYS1.SADMP00   SADA20   2300
               SADA21   4300

SYS1.SADMP              2400
               SADB21   4400
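
The layout above, together with the earlier note about replying to
AMD001A after AMD093I, can be written down as a small lookup.  This is
only an illustrative Python sketch, not part of the SOP: the names
SAD_DATASETS, first_device and amd001a_reply are made up for the
example, and the tape address used in any call is a placeholder.

```python
from typing import Optional

# Dump data set layout as listed in the table above: (device address, volume).
# The SOP does not give a volume serial for device 2400, so it is left as None.
SAD_DATASETS = {
    "SYS1.SADMP00": [("2300", "SADA20"), ("4300", "SADA21")],
    "SYS1.SADMP": [("2400", None), ("4400", "SADB21")],
}


def first_device(dataset: str) -> str:
    """Return the device address of the first volume of a dump data set."""
    return SAD_DATASETS[dataset][0][0]


def amd001a_reply(alternate_dasd_usable: bool, tape_addr: Optional[str] = None) -> str:
    """Pick the device address to reply to AMD001A with after AMD093I
    reported that SYS1.SADMP00 is not empty (hypothetical helper)."""
    if alternate_dasd_usable:
        # Fall back to the second DASD data set, SYS1.SADMP.
        return first_device("SYS1.SADMP")
    if tape_addr is None:
        raise ValueError("no DASD dump data set usable and no tape address given")
    return tape_addr
```

With the table as given, amd001a_reply(True) yields 2400, which matches
the reply the note asks for.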
                                                
 
    The SAD program will normally display the functions it is performing
with the following messages.  
 
AMD083I AMDSADMP: STAND-ALONE DUMP INITIALIZED
AMD094I 2300 SADA20 SYS1.SADMP00                                  
SYS1.SADMP00                                  
        SENSE ID DATA: FF 3990 E9 3390 0A  BLOCKSIZE: 24,960
AMD101I OUTPUT DEVICE: 2300 SADA20 SYS1.SADMP00          
        SENSE ID DATA: FF 3990 E9 3390 0A  BLOCKSIZE: 24,960
AMD101I OUTPUT DEVICE: 4300 SADA21 SYS1.SADMP00
        SENSE ID DATA: FF 3990 E9 3390 0A  BLOCKSIZE: 24,960
AMD005I DUMPING OF REAL STORAGE NOW IN PROGRESS.
AMD005I DUMPING OF REAL STORAGE COMPLETED (MINIMAL).
AMD005I DUMPING OF REAL STORAGE COMPLETED (SUMMARY).
AMD005I DUMPING OF REAL STORAGE COMPLETED (IN-USE). 
AMD005I DUMPING OF REAL STORAGE COMPLETED.          
AMD108I DUMPING OF SUMMARY    ADDRESS SPACES COMPLETED.
AMD108I DUMPING OF SWAPPED IN ADDRESS SPACES COMPLETED.
AMD056I DUMPING OF VIRTUAL STORAGE COMPLETED.
AMD104I       DEVICE VOLUME USED   DATA SET NAME
        1      2300  SADA20   2%   SYS1.SADMP00
        2      4300  SADA21   1%   SYS1.SADMP00
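
If the console log is saved to a file, the AMD104I device list at the
end can be picked apart mechanically, for example to record which
volumes were used.  The following is a minimal Python sketch under the
assumption that the rows look exactly like the sample above; the
function name parse_amd104i and the row pattern are made up for the
example and are not an IBM-supplied format definition.

```python
import re

# One detail row: sequence number, device address, volume, percent used, dsname.
ROW = re.compile(r"^\s*(\d+)\s+([0-9A-F]{4})\s+(\S+)\s+(\d+)%\s+(\S+)\s*$")

SAMPLE = [
    "AMD104I       DEVICE VOLUME USED   DATA SET NAME",
    "        1      2300  SADA20   2%   SYS1.SADMP00",
    "        2      4300  SADA21   1%   SYS1.SADMP00",
]


def parse_amd104i(lines):
    """Return one dict per device row; the header and other lines are skipped."""
    rows = []
    for line in lines:
        match = ROW.match(line)
        if match:
            _seq, device, volume, pct, dsname = match.groups()
            rows.append(
                {"device": device, "volume": volume,
                 "used_pct": int(pct), "dsname": dsname}
            )
    return rows
```

Calling parse_amd104i(SAMPLE) gives two rows, one for 2300/SADA20 and
one for 4300/SADA21.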

To get out of the Operating SYSTEM Messages just click on CLOSE.

Highlight (single left click) the failed LPAR and double click on HMC
Hardware Messages.  Select the block next to the message, then click on
DETAILS to see the disabled wait state.  It should be a Wait State of
X'410000' if the dump has completed successfully.
 
To get out of the Message just click CANCEL.

To get out of the HMC Hardware Messages just click CANCEL .

C) REMOVE FAILED LPAR FROM SYSPLEX
Perform the following steps WHILE the SAD program is running to remove
the failed system from the sysplex.  This may be initiated immediately
after the ACTIVATE of the SAD program.

C-1)        On a master console of  another LPAR, enter VARY
XCF,sysname,OFFLINE to start removing the failed system from the
sysplex:

                V XCF,sysname,OFFLINE

Respond to message IXC371D to confirm the VARY XCF command:

xxx IXC371D  CONFIRM REQUEST TO VARY SYSTEM sysname OFFLINE, REPLY      
SYSNAME=sysname TO REMOVE  sysname OR C TO CANCEL

         with

    ###,SYSNAME=sysname

        You will see on the console:
           IXC101I SYSPLEX PARTITIONING IN PROGRESS FOR sysname
REQUESTED BY
           *MASTER*.  REASON: OPERATOR VARY REQUEST


MVS may respond with message IXC102A .

xxx IXC102A  XCF WAITING FOR  SYSTEM sysname DEACTIVATION, REPLY DOWN 
      WHEN MVS ON sysname HAS BEEN SYSTEM RESET.
 
C-2)        Reply to message IXC102A  ..... BUT DO NOT RESET the LPAR. 
                REPLY AS FOLLOWS:
                               ###,DOWN

NOTE: MVS may automatically reply to IXC102A when it detects that the
LPAR has been removed from the sysplex.  A "D XCF" on the console where
you entered the V XCF  (not the SAD console) will verify this status.

        IXC105I SYSPLEX PARTITIONING HAS COMPLETED FOR system 128

        - PRIMARY REASON: SYSTEM REMOVED BY SYSPLEX FAILURE MANAGEMENT
BECAUSE ITS STATUS UPDATE WAS MISSING

       - REASON FLAGS: 000104
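
For reference, the console input in C-1 and C-2 follows a fixed pattern
once the system name and the reply numbers are known.  The Python
sketch below only builds the strings shown in this SOP; the function
names are made up for the example, and the reply IDs are whatever
numbers the console displays in front of IXC371D and IXC102A.

```python
def vary_xcf_offline(sysname: str) -> str:
    """Command entered on a console of another LPAR (step C-1)."""
    return f"V XCF,{sysname},OFFLINE"


def reply_ixc371d(reply_id: str, sysname: str) -> str:
    """Confirmation reply to IXC371D, in the ###,SYSNAME=sysname form."""
    return f"{reply_id},SYSNAME={sysname}"


def reply_ixc102a(reply_id: str) -> str:
    """Reply to IXC102A (step C-2).  Do NOT reset the LPAR during the SAD."""
    return f"{reply_id},DOWN"


def removal_inputs(sysname: str, ixc371d_id: str, ixc102a_id: str) -> list:
    """All three console inputs in the order the SOP gives them."""
    return [
        vary_xcf_offline(sysname),
        reply_ixc371d(ixc371d_id, sysname),
        reply_ixc102a(ixc102a_id),
    ]
```

For example, removal_inputs("BSYS2", "012", "013") returns the VARY
command followed by the two replies for system BSYS2; the reply numbers
here are placeholders.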


     
C-3)        NOTE: DO NOT  perform a hardware SYSTEM RESET or
                 RESET NORMAL on the failed LPAR at any time during 
         the Stand-Alone Dump.


D) WAIT FOR STAND-ALONE DUMP TO COMPLETE
        Highlight (single left click) the failed LPAR and double click
on HMC Hardware Messages.  Select the block next to the message, then
click on DETAILS to see the disabled wait state.  It should be a Wait
State of X'410000' if the dump has completed successfully.
To get out of the Message just click CANCEL.
To get out of the HMC Hardware Messages just click CANCEL.

              
    Wait for the SAD program to complete (AMD056I and the device-used
list, or the X'410000' Wait State).  Note the dump dataset or tape
cartridge numbers used and give them to the systems programmer.
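
The completion test in this step can be stated as a simple rule.  The
Python sketch below is illustrative only and assumes the console
messages have been captured as text lines; the function name
sad_complete and the way the wait state is passed in are made up for
the example.

```python
def sad_complete(console_lines, wait_state=None):
    """Return True when the SOP's completion criteria are met:
    AMD056I plus the AMD104I device-used list, or wait state 410000."""
    stripped = [line.strip() for line in console_lines]
    saw_amd056i = any(line.startswith("AMD056I") for line in stripped)
    saw_amd104i = any(line.startswith("AMD104I") for line in stripped)
    return (saw_amd056i and saw_amd104i) or wait_state == "410000"
```

A log that has only reached the AMD005I real storage messages is
reported as not complete unless the wait state is supplied as 410000.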


 
E) PERFORM RE-IPL OF FAILED LPAR
 Get out of the "STAND ALONE DUMP" group by double clicking on "GROUPS"
 in "VIEWS".  Then, from the "GROUP WORK AREA", choose the correct group
 where the failed LPAR's IPL icon is located,
     e.g. A2094PROD for ASYS/CSYS or
          B2094 TEST/DEVP for BSYS2/BEND2/HSYS2/BTST2/PT012/PT022

NOTE: At this time a hardware SYSTEM RESET or RESET NORMAL CAN be
performed on the failed LPAR.

Don't forget to verify the Activation Profile for the  IPL, e.g.
ASYSIPL7*.
IPL in Prompt mode.
        

F) If it is necessary to STOP THE STAND-ALONE DUMP MIDSTREAM
     1)  Make sure you have LOGGED ON to the HMC with the ADVANCED userid.
         (THE PASSWORD WILL BE GIVEN TO YOU BY YOUR SUPERVISOR)
     2)  DOUBLE-CLICK on GROUPS
     3)  DOUBLE-CLICK on DEFINED CPCs
     4)  HIGHLIGHT (single left click) the CPC that the FAILED LPAR is on,
         either A2094 or B2094
     5)  SCROLL (arrows at the bottom right) around to RECOVERY
         (right-hand pane)
     6)  DOUBLE-CLICK on SINGLE OBJECT OPERATIONS
     7)  For the SINGLE OBJECT OPERATIONS TASK CONFIRMATION question, click
         YES if sure the correct CPC was highlighted.
     8)  DOUBLE-CLICK on GROUPS
     9)  DOUBLE-CLICK on IMAGES
    10)  RIGHT MOUSE-CLICK the FAILED LPAR
    11)  From the CHPIDS or CPs choice, select CPs to display the logical
         CPs for the FAILED LPAR
    12)  HIGHLIGHT (single left click) the 00
    13)  SCROLL (arrows on the bottom right) around to CP TOOLBOX
         (right-hand pane)
    14)  DOUBLE-CLICK on the ICON that says INTERRUPT

At this time the following messages will come up on the console:
AMD089I DUMP TERMINATED DUE TO EXTERNAL KEY
AMD066I AMDSADMP ERROR, CODE=3012,
        PSW=xxxxxxxxxxxxxxxx, COMPDATA(AMDSA001)


     To log off of the SINGLE OBJECT OPERATIONS:
          DOUBLE-CLICK on CONSOLE ACTIONS
          DOUBLE-CLICK on the LOG OFF or DISCONNECT icon
          Make sure the LOG OFF button is selected, then click OK.




       Updated    07/12/2005 (11/7/2006)





