> On Saturday, August 26, 2023 at 02:18:12 AM PDT, David Crayford
> <dcrayf...@gmail.com> wrote:
> My bank runs a mainframe and I couldn't use internet banking
> when abroad because they were running month-end scheduled maintenance.

You're naive if you think this had anything to do with regular scheduled z/OS system maintenance. It's been many years, but it was extremely rare that I needed to shut down every LPAR in a sysplex at one time. I've known companies that always disable access for a specific timeslot so that customers are accustomed to the downtime, on the off chance the window is actually needed for some reason. More likely it was an application that needed to move data at the start of a new month. Maybe they were being overcautious, but it's ridiculous to say this was about z/OS being down.

>> On 26 Aug 2023, at 9:55 am, Jon Perryman <jperr...@pacbell.net> wrote:
>> I think z/OS uptime is 99.9999%.

> I don't think so. IBM claim 99.999% single server uptime for z and that's
> just the hardware.

You are confusing hardware uptime with software uptime; they have very different definitions and meanings. More importantly, you don't understand the difference between "single z/OS server", "single Linux server", and "single LPAR". I've worked on High Availability solutions that can get a little flexible with terminology. For instance, SAP HA can cancel a few in-flight transactions during recovery but is still considered fully functional for the entire time.

A z/OS sysplex is a single server unless a customer chooses otherwise. Hardware is typically identical for all LPARs in a z/OS sysplex, and workload is shifted according to definitions. Linux, on the other hand, requires you to jump through hoops with additional software to achieve similar results. I couldn't find a z/OS availability figure, but at https://www.ibm.com/z/resiliency IBM says Linux is 99.999999%; z/OS must be at least the same. Why do you doubt IBM's claim of 99.999% System z uptime, especially considering their automated hardware recovery and hot swap?
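For anyone not used to translating "nines" into wall-clock time, here is a quick sketch of the arithmetic. The percentages are the figures quoted in this thread (99.999% for the hardware claim, 99.999999% for the Linux on Z claim); the script itself is just an illustration.

```python
# Convert an availability percentage into maximum allowed downtime
# per year. Uses a 365.25-day year; the figures are from the thread.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

def downtime_per_year(availability_pct: float) -> float:
    """Maximum minutes of downtime per year at a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for label, pct in [("five nines ", 99.999),
                   ("six nines  ", 99.9999),
                   ("eight nines", 99.999999)]:
    print(f"{label} ({pct}%): {downtime_per_year(pct):.4f} min/year")
```

Five nines allows a little over five minutes of downtime a year; eight nines allows roughly a third of a second, which is why the two claims are not remotely comparable.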
Hardware uptime is calculated using equipment MTBF, life expectancy, the number of machines, and probably more, but it excludes downtime due to customer choice. 99.999% is a believable number, especially compared to non-IBM equipment.

> That's the same as they claim for POWER running either AIX or Linux
> on RedHat OpenShift and what HP claim for Superdomes running HP-UX.
> They all claim higher than five-nines running in clusters.

z/OS is not a cluster solution. It doesn't rely on data replication or on servers to provide common access to data. It doesn't require a server to assign workload, although it does rely on a coupling facility for intercommunication between LPARs. A z/OS LPAR is a fully functional entity that participates within the sysplex. IBM says eight nines for Linux on System z, and I suspect they mean clustering. That is well beyond the five nines of AIX or HP-UX. Still, as you say, those require implementing clustering, which is not out of the box and not as simple for customers as a z/OS sysplex.

> Many providers claiming five-nines availability will add small print to get
> around this problem.
> By excluding scheduled downtime, five-nines becomes a lot easier.

This is absurd. You never schedule a shutdown of every LPAR in the sysplex except on the very rare occasion that there is a compatibility issue. IBM is not guaranteeing you eight-nines availability, because they do not have complete control. IBM has set high expectations for their z/OS customers, and this affects OEM product developers too. The worst call I took involved 35 managers screaming at me because they thought my product had trashed one z/OS LPAR. Imagine how bad it would have been if the entire sysplex had been trashed. It turned out that another product did a storage stomp on my address space, which was vital to the entire z/OS LPAR.

>> You get what you pay for. Unix maint philosophy may be acceptable on $10,000
>> computers but highly unacceptable on multi-million $ computers.
>> We don't tolerate unintentional downtime.

> That doesn't stand up to scrutiny! Just ask Air New Zealand in 2009, HSBC in
> 2011, or the Royal Bank of Scotland in 2013.

Again with the absurd statements. If you crash your car, do you claim it's a manufacturing defect? Without details, we have no clue whether these outages are even relevant. Those companies will have taken steps to ensure it doesn't happen again. On a $10,000 computer, they would ignore it and carry on with business as usual.

> The fact is that even five-nines availability for an entire computing service
> is impossible to guarantee.

No one is offering a guarantee, but eight-nines availability is doable when planned correctly. No one can stop customers from compressing the active SYS1.LINKLIB. No one can stop customers from hiring unqualified people.

> There is too little room for error and Black Swan or unexpected events are
> impossible to eliminate.

There is plenty of room for error, but you have processes in place to catch those errors before they become a problem. If you doubt me, talk to a doctor or nurse, because their lives revolve around tedious tasks that capture errors.

> If you have access to the IBM support portal go and do a search for z/OS Red
> Alerts.

Red Alerts exist to warn customers. Imagine that: a process that stops customers from encountering a known problem that may not even apply to their situation. This is something vendors want to help you avoid; it's generally something that was not uncovered during testing. Gee, another process to capture errors.

> Software has bugs, applications and subsystems fail.

It's impossible to catch every bug, but the process works well. If you get a Red Alert, you can generally install one small PTF that has been tested. As a customer, you only need to test a very small and specific change. Gee, another procedure that Unix ignores, because Unix distributions ship package installs instead of fixes targeted at the specific problem that affects you.
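The availability arithmetic behind this thread (steady-state availability from MTBF and MTTR, and why redundant LPARs push the nines much higher than a single box) can be sketched as follows. The MTBF and MTTR numbers here are hypothetical, chosen only to land near five nines, and the redundancy formula assumes independent failures, which real sysplexes only approximate; this is not IBM's actual calculation method.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time a unit is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def parallel(*avails: float) -> float:
    """Availability of redundant units where any one suffices:
    the system is down only when every unit is down at once.
    Assumes independent failures (an idealization)."""
    down = 1.0
    for a in avails:
        down *= (1.0 - a)
    return 1.0 - down

# Hypothetical numbers for illustration only:
# a unit failing every ~11 years with a 1-hour repair is ~five nines.
single = availability(mtbf_hours=100_000, mttr_hours=1)
two_way = parallel(single, single)  # two redundant LPARs
print(f"single: {single:.6%}, two-way: {two_way:.10%}")
```

Under these toy assumptions a single five-nines unit doubled up lands near ten nines, which is the intuition behind cluster and sysplex availability claims exceeding what any single server can offer.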
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN