On Oct 23, 2007, at 2:54 AM, Kenny Fogarty wrote:


How would you handle a person who scrutinizes blood for a living and
mistakes a diagnosis?
In some cases an operator is just as "guilty" as the blood analyzer.
If you say that's not the same, I would agree but not in all

I wouldn't even begin to make the analogy. But, mistakes happen.
That's a fact of life, and, putting an unbearable amount of strain on
someone, as in - make a mistake and you're fired, will not, under any
circumstances, help that person to not make mistakes. In fact, I'd go
as far as to say it would only make things worse.

Well, I am not sure I agree with you about "worse". The operators usually had a great time (especially off day shift). The day shift only had one occurrence after that bad IPL. We (sysprogs) were consulted before any first-time commands were entered, so we were by the operator's side as it was keyed in. Issues? Not really; the operators enjoyed their jobs and everyone was a little more productive. We did have an issue with delays of tape mounts, and we ended up buying hardware to monitor those events. The finger pointing was put squarely on the operators. But other than that, the operators really liked their jobs (of course there were one or two exceptions) and everyone went about their own jobs happily. I think that department had the lowest turnover of any.

If an operator put in a wrong date at IPL and (because
of that) RACF refuses to come up and there is no backout or, even
worse, datasets get scratched because of the operator error, which
leads to a fine from, say, the SEC (or take your pick of agency).

See all of those issues? All perfectly valid. But, if I were having to
unravel the mess that came about from the wrong input at the console,
the operator would not be the person who should be blamed. There
should be contingency in place so that if RACF refuses to come up, we
get alerted very early on as to why, and have steps in place to remedy
the situation. Perhaps by re-IPL'ing. After all, that's what you're
going to do in 99% of cases if a wrong parameter is passed at IPL
time.

See above reply.

If datasets get scratched, where's the backup? What's the contingency
in place to restore the data? If there isn't one, that's not the fault
of the guy who entered 'U' on the console instead of 'N'.

Backup in our case was 24 hours ago; I can't speak for the actual company that it happened to.


There are degrees of error, of course, ranging from "who cares" to a
company possibly going bankrupt. In the last case there are MANY people
out of work (possibly 1000s or more); would you not fire the person?

If the company went bankrupt, it wouldn't be because someone varied
off the wrong device.

Hmmm, well, how about this scenario: System A is writing the master file to tape drive d. System B varies the same tape drive online and starts to write to it. You would have a clobbered tape and not know it for some time. If it was discovered during the database load that the tape was no good, the database would not be loaded and the firm could not open the next day. Not far-fetched at all.


I think you are comparing apples and oranges. An operator can, by mistake, put the company out of business; a programmer can cause lost revenue and, yes,
possibly a fine.

I'd love to see how the wrong prompt on the console was traced back to
the one thing that put the company out of business. Seriously, if
anyone has any stories along those lines, I'd love to hear it. As
would any maker of automation software, because it would be the most
amazing sales pitch ever.

See above and wait to see if the sysprog it happened to will pipe up.

 BUT that should have been found in
QA before the program goes live. In other words their work is checked
by others.

QA can pick up a lot of things, but, for example, can QA pick up an
application program that performs ten million inserts into a DB2 table
with no commit, then, for whatever reason, abends, so that DB2 rolls
back all its work, thus rendering the objects unavailable for x
hours? I've seen it done. It didn't make the company go bankrupt,
though.

It depends on the company. If it's a matter of opening (or not) for daily trading, chances are good they will be out of business. If it's for a small business I might agree, but small businesses probably don't have mainframes either.



An operator does not have this luxury. Yes, programmers can
make mistakes, but (in most cases) it's not a case of "shut the front
doors and turn off the power, whoever is the last one to leave". An
operator can do so with a small "oops". That is why an operator, IMO, must go
through several years of training so they CAN'T make stupid mistakes.

I agree that console commands are free from any sort of QA, however,
there are ways and means to ensure that mistakes are minimised.
Automation products can help here, or, if they're not available, an
application program can write out WTO or WTOR messages with meaningful
text, which can also help an operator make a decision.

Not always. The program that might look at commands before they are executed doesn't come into play until the IPL is essentially over, so there is no way you can catch a bad date.

Training does not, and never will, ensure that mistakes are never made.
Training educates, and helps people understand better, but it never,
ever eradicates mistakes from any process.

True, but it also makes people think before they act, and a thinking operator is a LOT less likely to make errors.


It's possible that a programmer could write a program that
misdiagnoses a test (health) result and yes, that could lead to the
person's death, but presumably there are other fingers in the stew to
catch the errors.

I agree with that, and, broadly, that's the point I was trying to
make. There should be enough tech support/ops support/sys progs around
to see what went wrong, and implement some sort of contingency to
rectify the mistake with the minimum of outage/cost to the company, be
that restoring data, re-IPLing a system, or whatever.

In the case of an operator there is no way to catch all errors that could cause a major issue.

There are ways to catch all operator entries from the console via
various automation products which can interrogate what has been
entered, and take appropriate measures.

NOT early on in the IPL.

Catching a Vary is a small part of any possible error. Catching a bad date, let's say early on in the IPL process, is impossible by any of the suggestions
mentioned, as the exits (programs) are not available then.

I agree, but, if the wrong date, or IPL parm, or whatever is entered,
then the chances are you're going to have to re-IPL to rectify the
situation. As you said above, if RACF doesn't start, you can go back
to see why, and take steps to fix the issue.

There must always be contingency plans in place to catch human errors,
but, to go back to the original point, sacking someone for entering
the wrong reply is not, and never has been, the answer. It reads (to me
at least) pretty much as "They (operators) are easily replaced, sack
him".


A destroyed (or damaged) RACF database (or ACF2 or whatever) could make completion of the IPL sequence impossible.

Ed

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
