At a relatively small MF shop where I used to work, the cost of downtime was 
pegged at 100K/hour.
I used that figure to justify developing a Parallel Sysplex.

That let me reduce a quarterly 12-hour event to zero, which saved 4.8 
million annually (4 events x 12 hours x 100K/hour).
It also made my job easier, with monthly deployments instead of 
quarterly.
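
For anyone who wants to check the arithmetic, here is a minimal sketch in 
Python. The function name is made up for illustration; the inputs are just 
the figures quoted above (12 hours, four times a year, 100K per hour).

```python
def annual_outage_cost(hours_per_event, events_per_year, cost_per_hour):
    """Yearly cost of a recurring outage (hypothetical helper, illustration only)."""
    return hours_per_event * events_per_year * cost_per_hour


# Figures from the message above: a 12-hour event, quarterly, at 100K/hour.
savings = annual_outage_cost(hours_per_event=12, events_per_year=4,
                             cost_per_hour=100_000)
print(f"{savings:,}")  # 4,800,000
```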

Try that on open systems!




-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of 
Phil Smith III
Sent: Sunday, March 2, 2025 9:43 AM
To: [email protected]
Subject: Cost of an outage (was: The mainframe is alive)

Re outages:
> Microsoft had a major outage today that banks, Walmart, insurance
> companies, airlines, and other companies can't

Well...*we* would say "can't". *They* would say "can't". But the reality is 
that if those happen--and they do (cf. Delta's outage last July, for one)--the 
world and the business don't stop. It makes us SMH and leaves their management 
screaming at people, but they don't go out of business. Of that list of 
industries, the airlines are the most critical in terms of real-time 
lost revenue: if I can't complete my purchase at walmart.com, I might go to 
target.com, but also might just say "I'll try again later". Same with banks, 
insurers, and most other companies. An airline trip has a firm expiration date.

But Delta is still in business, so what does "can't go down" even mean any 
more? Definitely not what it used to.

I'm convinced that some or all of this is because the industry has shifted to 
this "move fast and break things" mantra that even bleeds into areas where 
people say "No, we don't/can't/won't do that". E.g., the Delta outage was 
apparently caused by a CrowdStrike problem. Back in the day, would Delta have 
allowed a third-party tool to be used in such a critical way? I'd say "Probably 
not", or if they did, they would have insisted on being able to test any 
updates well enough to be sure that such an outage was impossible. Nowadays 
that just isn't practical, so it isn't done, and we see the result. I haz a sad.

None of this is the mainframe's fault, of course, which is why I moved this to 
a different topic.

Back in 1989, SABRE had a 12-hour outage that made the front pages. That was 
rare enough that I remember it almost four decades later. At the time, the 
quote was that it cost SABRE $20,000 per minute, a huge deal, ~$15M total.

I had to Google the Delta outage, and not just because I'm old--it's just not 
THAT remarkable any more.

Delta is suing ClownStrike for $550M for the July follies, which is about 1/120 
of their annual revenue. This might actually be about right, since the outage 
lasted five days for them and a third of that figure appears to be actual 
costs, not just lost revenue. Hmm, doing the math--CPI is about 2.6x 1989-2024, 
and the outage was 10x as long as SABRE's: $15M x 2.6 x 10 = $390M; Delta claims 
the lost revenue portion is $380M. Amazingly close!
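
The back-of-envelope comparison above can be sketched in Python. This is 
only an illustration: the function names are invented here, the 2.6x CPI 
multiplier is the rough figure quoted above rather than an official number, 
and "five days" is assumed to mean five full 24-hour days.

```python
def outage_cost(cost_per_minute, hours):
    """Total cost of an outage billed per minute (hypothetical helper)."""
    return cost_per_minute * 60 * hours


def scaled_estimate(base_cost, cpi_multiplier, duration_ratio):
    """Scale a historical cost for inflation and a longer outage."""
    return base_cost * cpi_multiplier * duration_ratio


# SABRE, 1989: $20,000 per minute for 12 hours.
sabre_1989 = outage_cost(20_000, 12)  # 14,400,000, i.e. the "~$15M" above

# Delta, 2024: five days versus SABRE's 12 hours (assumes 24-hour days).
duration_ratio = (5 * 24) / 12  # 10.0

# Use the rounded $15M figure, as the text does.
estimate = scaled_estimate(15_000_000, 2.6, duration_ratio)
print(f"SABRE 1989: ${sabre_1989:,}")
print(f"Scaled estimate: ${estimate:,.0f}")
```

The scaled estimate comes out at roughly $390M, against the $380M in lost 
revenue that Delta claims.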

BTW, for those who might be wondering, I was told by someone at SABRE that the 
1989 outage was caused by a rogue TPF job that clipped thousands of volumes 
(see http://catless.ncl.ac.uk/Risks/8.74.html for something that hints at 
this), which predictably made MVS very unhappy. People were basically running 
around with their hair on fire. The VM guy, Mike Roegner, quietly went off and 
wrote a Rexx program to drive re-clipping the packs and the rest of the outage 
was just the time it took to run that program against all those volumes. I of 
course cannot verify this and Mike is long retired; perhaps someone else here 
remembers?

So...how much does "can't go down" even mean any more? Did anyone at Delta lose 
their job over this? Or was the blame just pushed to ClownStrike--convenient, 
if so. One wonders.

OK, this turned into a bit of a ramble, but it's a topic that I often think 
about!

...phsiii

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email to 
[email protected] with the message: INFO IBM-MAIN