Why not simply fix the problems in the power systems and test them regularly? If crashes like these had happened at any of the last 3 places I have worked, I would have been escorted off the premises, and my stuff thrown out after me.



Kelly Bert Manning wrote:
Please don't laugh.

I work with applications on a non-sysplex, non-XRF, supported z/OS system
where there have been 3 cases of UPS batteries draining flat, followed by
uncontrolled server crashes, in the past 17 years.

They all happened in October and November, gale season (Cue background
music with the "Gales of November" line by Gordon Lightfoot)

After the first one the data center operator said that they would consider
giving operators authority to shut down OS/390 if they were unable to
make immediate contact with the "Duty Manager" after discovering that
UPS batteries were draining during a power failure and that generator
power was not available or failed after starting.

Four weeks later a carbon copy crash occurred, inspiring a promise that
operators would start draining CICS and IMS message queues and stopping
and rolling back BMPs and DB2 online jobs, while there was still power
in batteries.

Roll forward to this decade, power off during gale season, generators
start, but one fails and goes offline, followed by other mayhem in the
power hardware. Back on batteries for 22 minutes, until they drain and
the z server crashes. The current operator says, "What promise to shut
everything down cleanly before the batteries drain?"

Is 22 minutes an unreasonable time figure for purging IMS message
queues, bringing down CICS regions, draining initiators, abending
and rolling back online IMS and DB2 jobs to the last checkpoint,
swapping logs, writing and dismounting log backups, and turning off
power before sudden power loss starts to play mayhem with disk and
other hardware?
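
One way to put numbers on that question is a simple time budget. The sketch below is only an illustration: the step names follow the list above, but every per-step duration, the 2 minute safety margin, and the names Step, STEPS and latest_start are hypothetical placeholders, not measurements from any real system. A site would substitute timings from its own shutdown rehearsals.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    est_minutes: float  # hypothetical estimate, to be replaced by rehearsal timings


def latest_start(steps, battery_minutes, margin_minutes=2.0):
    """Return how many minutes after going on battery the shutdown may begin
    and still finish with the safety margin intact, or None if it cannot fit."""
    total = sum(step.est_minutes for step in steps)
    slack = battery_minutes - margin_minutes - total
    return slack if slack >= 0 else None


# Placeholder durations only; none of these figures come from the post.
STEPS = [
    Step("purge IMS message queues", 3),
    Step("bring down CICS regions", 3),
    Step("drain initiators", 2),
    Step("abend and roll back online IMS and DB2 jobs to last checkpoint", 5),
    Step("swap logs, write and dismount log backups", 4),
    Step("turn off power", 1),
]

if __name__ == "__main__":
    slack = latest_start(STEPS, battery_minutes=22)
    if slack is None:
        print("Shutdown does not fit in the battery window")
    else:
        print(f"Shutdown must begin within {slack:.0f} minutes of going on battery")
```

With these made-up figures the steps total 18 minutes, so against a 22 minute battery window the decision to shut down would have to be taken almost immediately, which is the point of giving operators the authority up front.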

Oh, did I mention? The 2-CPU single processor was only about 30% busy at
the time, the weekly Sunday low CPU use period.

We had a different sort of power outage after the first of the 2 crashes
last decade. Somebody working for one of the potential bidders used
a metal tape measure in an attempt to measure clearance around the
power cable entrance to the building. The resulting demonstration of
how much power moves through the space around a high voltage cable
destroyed several 3380 clone drives, in addition to crashing all
the OS/390 processors. I earned my DBA pay that day.

Bottom line, what should happen when UPS batteries start to drain and
there is no prospect of reliable, high quality, utility power being
restored quickly? Leave it up and roll the dice about losing work
in progress and log data (head crashes and cache controller microcode
bugs) or shut it down cleanly?

For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
