Hi Tom,

You really should plan towards "monkey boy" as being the only resource capable of performing the recovery. I classify "disasters" in three classes:
1. Minor: This would be like a prolonged regional power failure, but all facilities, systems and equipment are still present.
2. Major: This would be something involving the total loss of the computer facility and perhaps even some staff, but at least some people with expertise and knowledge of the business and systems are available to participate in the recovery. For example, a fire at the computer facility.
3. Total: All facilities and staff are impacted and unavailable, nobody has expertise and knowledge of the business and systems - this is the "monkey boy" scenario. An example might be a natural disaster (hurricane, tornado, earthquake, etc.) or act of terrorism.

The recovery system is prestaged on tape. It's a small-footprint z/VM system with EVERYTHING needed to perform the recovery (it senses available ARM libraries, networks, DASD, etc.). Restoring the recovery system to disk and IPL'ing it is the one part of the process that is not menu driven, but it's well documented and really only involves a few steps:

1. Mount recovery tape.
2. IPL DDR on the head of the recovery tape.
3. Restore the recovery system to disk.
4. IPL the recovery system from disk and start up the menu-based recovery system.

The 4 steps above are the most "technical" part of the process. The "monkey boy" simply needs to execute commands and supply the responses listed on a Recovery Worksheet completed by the host site technical staff, so even though it is a "technical" process, it's really more a question of following instructions (whether you understand them or not). :)

I think we probably use around 30 or 40 full 3590 tapes, but since the entire process is automated and you can run multiple concurrent restore streams it becomes a moot point (basically you just tell the process how many 3590 drives are available for your use in the available ARMs and it will dispatch a number of dynamic DDR recovery slaves, one per tape drive).
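The "one recovery slave per tape drive" dispatch described above can be pictured with a small sketch. This is only an illustration in Python, not the actual z/VM implementation from this thread: `restore_volume`, `run_restore`, and the drive and tape names are all hypothetical, and a real slave would drive DDR rather than return a string.

```python
import queue
import threading

def restore_volume(drive, tape):
    """Hypothetical stand-in for one DDR restore; returns a log entry."""
    return f"{tape} restored via {drive}"

def run_restore(drives, tapes):
    """Start one worker per tape drive; each pulls tapes until none remain."""
    pending = queue.Queue()
    for tape in tapes:
        pending.put(tape)

    log = []
    lock = threading.Lock()

    def worker(drive):
        while True:
            try:
                tape = pending.get_nowait()  # next tape, in whatever order they were loaded
            except queue.Empty:
                return                       # nothing left for this drive to do
            entry = restore_volume(drive, tape)
            with lock:
                log.append(entry)

    workers = [threading.Thread(target=worker, args=(d,)) for d in drives]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return log
```

With this shape the number of tapes matters much less than the number of drives: adding a drive adds a concurrent stream, and the order in which tapes are queued does not affect the result.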
Years ago I wrote an automated recovery system for BellCore (the research and development arm of the old "phone company"). Back then, it might take 3 or 4 3480 carts to hold a single DASD spindle. That process was pretty elegant too: the SL on the tapes (and yes, all of my DDR tapes use SL tapes - which of course is a challenge in and of itself!) specified what contents were on that cartridge. The operators would fire up the DR Recovery process and simply start opening buckets of tapes and stuffing them into 3480 autoloaders in any order. The recovery system would piece it all together, and of course had a nice status monitor showing the progress of each spindle recovery and the ability to quiesce/restart slaves and such.

I remember one time when we were at SunGard for a DR drill. The z/OS guys (MVS back in those days) were busily responding to a million human-interactive steps, locating and mounting specific tapes, initiating jobs to read tapes/write to DASD, etc. - the VM guys were just sitting there having a coffee and watching the automated recovery monitor, occasionally opening a fresh tub of tapes and stuffing them into 3480 autoloaders in no particular order. Since we also had multiple concurrent streams going, our recovery was done painlessly and QUICKLY compared to the poor ol' MVS guys. :)

-Mike

-----Original Message-----
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Tom Duerbusch
Sent: Tuesday, June 17, 2008 1:04 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: DDR'ing 3390 DASD To Remote Location

I do like the "monkey boy" concept. I keep trying to go towards that state.

You have a menu driven system, which implies you are not restoring to bare iron. So what was prestaged? Do you have a mini VM system on tape (or DVD) to be restored?

If this is a Flex or P/390 type system, never mind. They have some other, easier, more interesting options.
For one site, which doesn't do disaster recovery tests, I'm thinking of a TS1120 backup and a script, which will be tested on one of our LPARs, for recovery of our systems.

At another site, which does disaster recovery tests, and Sungard has VM, I'm looking at scripts for the standalone side, to eliminate the people errors when restoring their systems. BTW, they keep the most recent 2 generations offsite, just in case of a media failure. They are 3480 based, and the standalone restores take some 80 tapes. Uggg!

Tom Duerbusch
THD Consulting

>>> Michael Coffin <[EMAIL PROTECTED]> 6/17/2008 11:49 AM >>>
Hi Tom,

Yep, I have that all covered. This is actually a DR process that I developed about 8 years ago (and have been improving over the years) that is a complete "soup to nuts" system, automating both the backups AND the recovery. The system assumes a "worst case scenario" where the computer center is gone, and all of the people having detailed knowledge of the system are likewise "gone". It allows someone with virtually no knowledge about the specifics of the system being recovered and very minimal mainframe knowledge to fully recover the system.

When we conduct DR drills we typically recruit a management type who has minimal computing skills and little specific knowledge about the system (someone we affectionately refer to as the "monkey boy", with the understanding that a slightly trained monkey could actually complete this task) to actually conduct the recovery. The programming staff provides no input to the "monkey boy", instead taking notes of anything in the documentation that he found unclear and/or any technical problems that may arise.

The entire process is menu-driven and pretty slick (including a Recovery Monitor that reports what each DDR slave is doing, what its ETA to completion of the current task is, what the total ETA to full system recovery is, the ability to quiesce and restart slaves/streams/devices, etc.).
In 2006 there was a massive flood in Washington DC that required implementation of this DR Plan. I'm pleased to say it worked without a hitch, and from the time we got the green light to start spinning tapes we were back up and running in something like 3 or 4 hours (I think we had about 20 DDR slaves running simultaneously).

While this process works extremely well, I now want to remove tapes from the process - there are a number of reasons why this makes sense:

1. There is a Federal mandate to encrypt all removable media which this site is subject to, and we don't presently have TS1120 tape drives/cartridges.
2. Tapes can be lost and/or damaged (damage used to happen with ALARMING frequency!), and one bad tape could jeopardize your entire recovery.

Ultimately, I'd like to have our production DASD replicate (either in realtime or via a nightly batch job) to a remote DASD array using PPRC - but until such time as I get funding to do that (perhaps never!) I need to eliminate the darned tapes. :)

-Mike

-----Original Message-----
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Tom Duerbusch
Sent: Tuesday, June 17, 2008 12:31 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: DDR'ing 3390 DASD To Remote Location

The back half of this also needs to be considered. In case of an actual disaster, the process you are using requires a running VM system before you can do the restore.

Disaster recovery sites have VM running. If you have your own replacement hardware, you can bring up the VM starter system, but you may still need software, along with scripts etc, on that site, in order for you to connect to your backup site, and bring the volumes back.

Just something to consider before getting too far into the backup half of the project.

Tom Duerbusch
THD Consulting

>>> Michael Coffin <[EMAIL PROTECTED]> 6/17/2008 10:42 AM >>>
(Cross-posted on VMESA-L and LINUX-390)

Hi Folks,

I want to eliminate use of tapes in my weekly DR process.
Currently we DDR numerous 3390 spindles to 3590 tape cartridges. I have set up a Linux server at our DR site with a ton of free disk space, but the question becomes what is the best method to get images of our DASD stored on it?

I've modified our procedures to use DDR2CMS to create CMS files representing the 3390 DASD images, which are then FTP'd to the Linux server - but the process is VERY inefficient:

1. DDR2CMS produces RECFM=V files which are unsuitable for FTP (I've NEVER had any luck successfully FTP'ing RECFM=V files to a non-CMS environment and getting them back in the correct format later), so I have to COPYFILE (PACK) the output from DDR2CMS. DDR2CMS takes around 47 minutes/spindle, and the COPYFILE takes around 38 minutes - the FTP only takes around 17 minutes! So we are really wasting nearly 90 minutes/spindle just prepping the data to be transmitted.
2. The output from DDR2CMS for a 3390-3 spindle may actually be LARGER than a 3390-3 spindle (even using COMPACT), so we need to use 3390-9 spindles as "work space", something I'm not fond of doing (as a general rule we don't use 3390-9's at this site, but I configured a string of them just for this purpose).

There is a great tool on the VM download page called PIPEDDR which basically does what DDR2CMS does using PIPE TRACKREAD - and it can write the output to a TCPIP stage. This is exactly what I'm looking for, with ONE important difference - PIPEDDR only talks to a remote VM/CMS system running PIPEDDR to receive the output, and I need to be able to PIPE the output to a remote Linux storage server.

Can anyone recommend a nice client that can run on Linux and listen on a TCPIP port, accept some authorization credentials and host commands (i.e. MKDIR, CD to dir, etc.) and receive/write to disk a stream of data similar to what PIPEDDR might write to its TCPIP stage? I could then skip creating the DDR2CMS file and the COPYFILE (PACK) step, writing "indirectly" to the Linux server.
I'd rather not reinvent the wheel if there's already something out there. :)

PS: It would be sweet if there were just a way to mount a remote EXT3 filesystem somehow on CMS, but it looks like the only way to do this is with NFS, which is a problem because it is considered an "unsafe protocol". :(

-Mike
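For readers wondering what the Linux side of such a listener might look like, here is a minimal sketch in Python. It is an assumption-laden illustration, not an existing tool and not PIPEDDR's wire format (which is not described in this thread): the one-line `TOKEN VOLSER` header, the names `receive_image`, `ImageHandler` and `serve`, the shared token and the target directory are all hypothetical. The sender would need a matching pipeline that writes the header line first and then the raw image bytes.

```python
import hmac
import os
import re
import shutil
import socketserver

SHARED_TOKEN = "change-me"      # hypothetical shared secret
DEST_DIR = "/srv/dasd-images"   # hypothetical target directory

def receive_image(stream, dest_dir, token):
    """Read a 'TOKEN VOLSER' header line, then copy the rest of the stream to disk."""
    header = stream.readline(256).split()
    if len(header) != 2 or not hmac.compare_digest(header[0], token.encode()):
        raise PermissionError("bad header or token")
    volser = header[1].decode("ascii", "replace")
    if not re.fullmatch(r"[A-Z0-9]{1,6}", volser):
        raise ValueError("bad volume serial")   # also blocks path tricks like '../'
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, volser + ".img")
    with open(path + ".part", "wb") as out:
        shutil.copyfileobj(stream, out, 1024 * 1024)   # copy until the sender closes
    os.replace(path + ".part", path)   # rename only after the stream has ended
    return path

class ImageHandler(socketserver.StreamRequestHandler):
    def handle(self):
        try:
            receive_image(self.rfile, DEST_DIR, SHARED_TOKEN)
            self.wfile.write(b"OK\n")
        except (PermissionError, ValueError, OSError) as exc:
            self.wfile.write(f"ERR {exc}\n".encode())

def serve(host="0.0.0.0", port=5000):
    """Accept one connection per volume image, each handled in its own thread."""
    with socketserver.ThreadingTCPServer((host, port), ImageHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. Because end-of-stream marks the end of the image, the sender has to close (or half-close) its side before reading the `OK` reply, and a dropped connection is indistinguishable from a complete transfer; a real version would want a length or checksum. The token and data also travel in clear text here, so something like an SSH tunnel or TLS would be needed to satisfy an encryption requirement.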