Tom Duerbusch wrote:
> There are multiple layers to this...
>
> 1. As others have said, you can switch to another LPAR on the same or
> different hardware.
> 2. I've read about, but haven't actually used, a facility in Linux to
> switch a workload from one Linux machine to another. I think that
> started in SLES8.
> 3. The major applications, such as Oracle, have a facility to switch
> to another Oracle image running somewhere else (with both being in
> sync).
>
> It all depends on what you need and what you are willing to pay for.
> Those last few minutes towards true 24X7 are rather costly.
>
> It isn't really a VM issue because VM isn't a true 24X7 operating
> system. With VM, we can fake things and come close to true 24X7, but
> compared to z/OS and parallel sysplex, not close.
Some recent postings with old historical references, including a couple of recent posts about fast recovery after failure:
http://www.garlic.com/~lynn/2007c.html#21
http://www.garlic.com/~lynn/2007c.html#41

I had worked on a complete rewrite of the I/O subsystem for the disk engineering and product test labs (bldgs. 14 & 15). When I started, they were doing all their "testcell" testing in a stand-alone environment. They had made some attempts with MVS, but at the time MVS MTBF (system crash or hang) was 15 minutes. The point of the I/O subsystem rewrite was to make it absolutely bulletproof so that they could have "on-demand" testing with multiple concurrent "testcells".
http://www.garlic.com/~lynn/subtopic.html#disk

Some of this got me in lots of trouble with the manager of MVS RAS when I happened to mention some of it (like it was my fault that MVS was crashing) ... reference to one example:
http://www.garlic.com/~lynn/2007.html#2
with old email example
http://www.garlic.com/~lynn/2007.html#email801015
another recent post with an old reference:
http://www.garlic.com/~lynn/2007b.html#28

Then there is the internal distribution of a combination of lots of enhancements
http://www.garlic.com/~lynn/2007c.html#12
that includes old email about it drastically improving uptime at STL (now called Silicon Valley Lab)
http://www.garlic.com/~lynn/2007c.html#email830709

There was a project in this time frame at SJR to do a high-availability configuration with multiple VM/4341s ... however it ran into a lot of internal corporate political problems. Pieces had to be significantly modified (resulting in functional and performance degradation) before any of it shipped. This was sort of in the same time frame as R-Star and Starburst. System/R was the original relational/SQL implementation, done on VM; R-Star and Starburst were follow-on distributed efforts
http://www.garlic.com/~lynn/subtopic.html#systemr

Of course, when we got around to doing a real HA product, it wasn't with VM or mainframes
http://www.garlic.com/~lynn/subtopic.html#hacmp

In HA/CMP we had actually done the work on scaleup with Oracle ... some past references
http://www.garlic.com/~lynn/95.html#13
http://www.garlic.com/~lynn/96.html#15
http://www.garlic.com/~lynn/2007b.html#16
with old email
http://www.garlic.com/~lynn/2007b.html#email910928

A lot of the work involved doing the DLM (distributed lock manager) for HA/CMP; since then, similar implementations with similar fall-over recovery mechanisms have appeared (a rough sketch of the lock-mode semantics involved follows at the end of this post).

My wife had earlier been conned into going to POK to be in charge of loosely-coupled architecture. While there she created the peer-coupled shared data architecture
http://www.garlic.com/~lynn/subtopic.html#shareddata
However, except for IMS hot-standby, there wasn't a lot of uptake until sysplex.
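Regarding the DLM mention above: what follows is a minimal sketch of distributed-lock-manager lock-mode compatibility, assuming the classic six VMS-style lock modes that later DLMs generally followed. It is illustrative only and is not the actual HA/CMP DLM interface; the mode names and the main() example are my own for demonstration.

/* Sketch: DLM lock-mode compatibility, assuming VMS-style modes.
 * Not the HA/CMP DLM API -- illustrative only. */
#include <stdio.h>

/* Lock modes, weakest to strongest:
 * NL = null, CR = concurrent read, CW = concurrent write,
 * PR = protected read, PW = protected write, EX = exclusive. */
enum lock_mode { NL, CR, CW, PR, PW, EX, NMODES };

static const char *mode_name[NMODES] = { "NL", "CR", "CW", "PR", "PW", "EX" };

/* compat[requested][held] == 1 means the requested mode can be granted
 * while another holder already has the resource in the "held" mode. */
static const int compat[NMODES][NMODES] = {
    /*           NL CR CW PR PW EX   (held) */
    /* NL */   {  1, 1, 1, 1, 1, 1 },
    /* CR */   {  1, 1, 1, 1, 1, 0 },
    /* CW */   {  1, 1, 1, 0, 0, 0 },
    /* PR */   {  1, 1, 0, 1, 0, 0 },
    /* PW */   {  1, 1, 0, 0, 0, 0 },
    /* EX */   {  1, 0, 0, 0, 0, 0 },
};

int main(void)
{
    /* Example: one node holds a protected read (PR) lock on a resource;
     * can another node be granted concurrent write (CW) without waiting? */
    enum lock_mode held = PR, requested = CW;

    printf("request %s while %s is held: %s\n",
           mode_name[requested], mode_name[held],
           compat[requested][held] ? "granted" : "must wait (queued)");
    return 0;
}

In a real cluster lock manager, incompatible requests are queued rather than refused, and the fall-over recovery aspect comes from the surviving nodes rebuilding the lock state for resources mastered on a failed node before granting new requests.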