Re: handling of errors

Victor Gil Thu, 17 Feb 2011 08:54:27 -0800

Frank,

Error handling becomes much easier once you have a proper infrastructure to 
report errors.  Years ago I convinced our CICS systems programmers to add 
the following line to the standard CICS JCL:


//PTRACE    DD SYSOUT=*,DCB=(RECFM=V,BLKSIZE=137,DSORG=PS)

This DD is pointed to by the definition of an extrapartition TDQ, PTRC, so when 
a program issues WRITEQ TD Q(PTRC) the output goes into the above SYSOUT.
This is where we dump our program traces, informational messages and 
basically anything else we want. [Eventually this SYSOUT became so 
important in debugging various application issues that it now even gets 
archived along with the CICS jobs.]

Over the years we've built various automation rules. For example, a message 
with a specific prefix is echoed to MSGUSR, and some of those even become 
alerts and go to the console.
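
For readers who want to picture such a rule, here is a rough sketch in Python (not the language of the actual rules). The prefixes "ECHO" and "ALRT" and the destination name "CONSOLE" are hypothetical, chosen only for illustration; only PTRACE and MSGUSR come from the description above.

```python
# Hypothetical prefix-based routing of trace messages.
# The prefixes "ECHO" and "ALRT" are invented for this example;
# the real rules and prefixes are not shown in this post.
ROUTES = [
    ("ALRT", ("PTRACE", "MSGUSR", "CONSOLE")),  # echoed and raised as an alert
    ("ECHO", ("PTRACE", "MSGUSR")),             # echoed only
]


def destinations(message: str) -> tuple:
    """Return the destinations for a message; every message goes to PTRACE."""
    for prefix, targets in ROUTES:
        if message.startswith(prefix):
            return targets
    return ("PTRACE",)
```

The point of the design is that the application only ever writes to one place; where a message ends up is decided by the prefix, outside the application code.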

Then we added a "standard" subroutine that deciphers both the EIBFN and 
EIBRESP fields into a human-readable format and logs them to the TDQ along 
with the whole EIB block for further analysis.
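
The decoding step can be pictured with a small Python sketch. This is not the actual subroutine: the function name `describe` is hypothetical, and the lookup tables hold only a handful of EIBFN and EIBRESP values as I understand them from the CICS documentation, so check them against the manual before relying on them.

```python
# Illustrative, deliberately incomplete lookup tables.
# Values are a small subset; verify against the CICS documentation.
EIBFN_NAMES = {
    0x0602: "READ",
    0x0604: "WRITE",
    0x0802: "WRITEQ TD",
    0x0A02: "WRITEQ TS",
    0x0E02: "LINK",
    0x0E04: "XCTL",
}

EIBRESP_NAMES = {
    0: "NORMAL",
    13: "NOTFND",
    16: "INVREQ",
    22: "LENGERR",
    27: "PGMIDERR",
    70: "NOTAUTH",
}


def describe(eibfn: bytes, eibresp: int) -> str:
    """Turn a 2-byte EIBFN and a numeric EIBRESP into readable text."""
    code = int.from_bytes(eibfn, "big")
    fn = EIBFN_NAMES.get(code, f"UNKNOWN FUNCTION X'{code:04X}'")
    resp = EIBRESP_NAMES.get(eibresp, f"UNKNOWN RESP {eibresp}")
    return f"{fn}: {resp} (EIBRESP={eibresp})"
```

For example, `describe(b"\x0e\x02", 27)` describes a LINK that failed because the program could not be found. Unknown codes are reported in raw form rather than dropped, so the log line is still useful.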

So, now the application code does just this:

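* LINK to the program named in ACTION.  NOHANDLE tells CICS to take
* no action on exception conditions, so EIBRESP is tested inline.
     EXEC CICS LINK PROGRAM(ACTION)
          COMMAREA(ACTION-COMMAREA)
          LENGTH  (LENGTH OF ACTION-COMMAREA)
          NOHANDLE
     END-EXEC.
*
* On failure: decode the EIB, build a message, log it and quit.
     IF EIBRESP NOT = DFHRESP(NORMAL)
        PERFORM SHOW-CICS-ERROR
        STRING MY-PGM-NAME
            ' - Unable to LINK to =' ACTION
          BLANK-PAD DELIMITED BY SIZE INTO MESSAGE-TO-LOG
        PERFORM LOG-AND-QUIT
     END-IF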

where SHOW-CICS-ERROR is the place where we call the above "standard" 
subroutine. 

Notice that in this case we perform paragraph LOG-AND-QUIT, which 
terminates the task normally. However, in other cases the application may 
want to LOG-AND-DIE [i.e. ABEND with a specific abend code], or just 
LOG-MESSAGE and keep going [retry in 1 sec, send input to "error queue", etc.]
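
The three outcomes can be sketched in Python as follows. This is an analogy only, not our COBOL paragraphs: the exception classes `QuitTask` and `AbendTask` and the logger name are hypothetical stand-ins for normal task termination, an ABEND with a code, and the TDQ write.

```python
import logging

# Stand-in for the WRITEQ TD to the trace queue (hypothetical logger name).
log = logging.getLogger("ptrace")


class QuitTask(Exception):
    """Hypothetical signal: end the task normally after logging."""


class AbendTask(Exception):
    """Hypothetical signal: end the task abnormally with an abend code."""

    def __init__(self, message: str, abend_code: str):
        super().__init__(message)
        self.abend_code = abend_code


def log_message(text: str) -> None:
    """Log and return, so the caller can retry or divert the input."""
    log.error(text)


def log_and_quit(text: str) -> None:
    """Log, then terminate the task normally."""
    log_message(text)
    raise QuitTask(text)


def log_and_die(text: str, abend_code: str) -> None:
    """Log, then terminate the task with a specific abend code."""
    log_message(text)
    raise AbendTask(text, abend_code)
```

Every outcome logs first; the caller chooses only what happens after the message is safely recorded.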

The same approach is also used for abend handling, and there we always 
produce a transaction dump ourselves, because CICS itself does not dump 
when an abend is HANDLEd.

HTH,
-Victor-

On Wed, 16 Feb 2011 16:30:18 -0700, Frank Swarbrick 
<frank.swarbr...@efirstbank.com> wrote:

>>>> On 2/16/2011 at 8:00 AM, in message
><listserv%201102160900125806.0...@bama.ua.edu>, Tom Marchant
><m42tom-ibmm...@yahoo.com> wrote:
>> On Tue, 15 Feb 2011 17:53:30 -0700, Frank Swarbrick wrote:
>>
>>>In my 15 years of application programming I've always hated the need
>>>to check 'status result' fields for conditions that, well, should not occur.
>>
>>
>> The point of those return codes is that those conditions *do* occur.
>> Checking them is part of the job of programming.  The reason z/OS
>> is such a robust system is because of all that error checking.
>
>I'm not suggesting (god forbid!) that status codes not be checked.  I am
>suggesting that status codes not be used at all; rather exceptions should be
>generated that cause everything to come to a halt, with some useful
>messages and a useful dump.  That way the users of routines need not be
>bothered with conditions that they have no possibility of handling, other
>than notifying the user of the unexpected condition and terminating.
>
>>>Most (not all, of course!) of the time it's going to be the case that
>>>the application is simply going to have some code to check for
>>>expected status conditions, and if the status is not expected print
>>>out an error message and exit (hopefully with a 'non-successful'
>>>exit code of some sort) or force an abend.  So why not just allow
>>>the invoked routine to abend itself?
>>
>> The invoked routine can only protect itself.  It can not issue a message
>> that includes the context in which it is called.  That context is necessary
>> when the code is to be changed to deal with the situation.
>
>Running under LE, in any case, you get both.  You know the routine that
>recognised the condition and you know the routine that called the routine
>where the condition occurred.  An LE dump, especially with symbolics, is a
>thing of elegance and beauty.
>I can't speak for non-LE dumps.
>
>>>For batch I will then have a global USRHDLR that will query the
>>>operator as to what they want to do
>>
>> How is the operator going to know what to do?  Will you provide
>> sufficient documentation so that he can make a meaningful decision?
>> If you can provide proper documentation, you can also code it to not
>> require operator intervention.  IMO, asking the operator is the lazy
>> way out.  In 1970, I had to have a very good reason to ask for an
>> operator reply, especially from a Cobol program, and we only had
>> three jobs running on the system at a time.  Today it is not unusual
>> to have hundreds.
>
>The operator probably will not know what to do.  They will call the on-call
>application support person who will analyze the issue and instruct the
>operator on what to do.
>
>Let me give a very specific example.  A change to a program has been made
>to access a new file.  The JCL with the new DD for whatever reason was not
>implemented.  When the program runs and attempts to open the file the open
>fails.  What kind of automatic recovery can resolve this issue?  None, of
>course.  Therefore the application must do *something*.  So what are the
>somethings?
>
>1) Issue a message to the console and terminate with a return code
>   indicating the job did not complete successfully.
>2) Issue a message and abend.
>3) Issue a message and don't set a return-code, thus making it appear the
>   job completed successfully.  (Not recommended!!  <g>)
>4) Do any of the above but after issuing the message wait for an operator
>   reply.
>
>The only reason I'm suggesting operator intervention prior to the job
>completing (abnormally) is that you can give an opportunity to actually
>continue, if it makes sense to continue, or to give the opportunity
>
>Let me give an example of why you may want to have the OPTION to
>continue in the case of a particular condition.  We have a job that should
>not run until after 3:30pm.  If the program detects that this rule has been
>violated (this was put in place before we had a scheduler!) it queries the
>operator whether to continue or to abort.  My point is, there are some
>situations where a decision needs to be made, and that decision cannot be
>made by a computer.  It must be made by a human who has analyzed the
>situation.  So rather than having to code all of the logic to write the
>message, receive the response, and act on the response in each program, why
>not simply signal an exception and have the global exception handler write
>the exception, wait for a response, and act on the response?
>
>>>By doing all of this it seems to me the applications will need a lot
>>>less checking for errors from which the application cannot recover anyway.
>>
>> It sounds to me as if you are saying that you want to simplify your
>> code by removing meaningful error messages and replacing them
>> with a dump when unexpected situations occur.  IMO, this is a
>> mistake.
>
>I'm not saying remove meaningful error messages.  I am saying signal an
>appropriate exception condition which is related to a meaningful error
>message.
>
>>>what point is there of having the routine return back to your code
>>>when all your code is going to do is say "unexpected error in call
>>>to xxx" and terminate/abend.
>>
>> If that is all your code is going to do, without even giving the return
>> code that was issued or where the call was made, there isn't much
>> point.  In the end you'll spend more time looking at dumps than you
>> would have spent coding it correctly in the first place.
>
>What?  Of course it should have been coded properly in the first place.
>The point is to handle, without a lot of extra redundant coding, the
>"unexpected".  I don't expect that I will forget to put a DD in my JCL.
>
>>>would this not make life simpler?  And in most cases no less robust.
>>
>> Simpler to code, yes, because the code is less robust.  And more
>> difficult to diagnose errors.  It's ok.  I'm here to develop tools to
>> make it easier for you to figure out what went wrong.
>>
>> If you are looking for a justification for lazy programming, you won't
>> get it from me.  I started in this business over 40 years ago as an
>> applications programmer, coding mostly in Cobol with a bit of
>> assembler.  Validating data before using them and checking return
>> codes was always part of the job.
>
>I'm not looking to justify lazy programming.  I am looking to simplify
>programming.
>
>> Today, I am doing software development in assembler and checking
>> for error conditions is still a big part of the job.  When I neglect to
>> check for a condition, it usually causes problems.
>
>I guess I wasn't clear.  I am certainly not saying that you should not check
>for conditions.  I am saying that there are many cases where it makes sense
>for the routine that detects the condition to "generate an exception" that
>can be handled in a general manner, rather than having each program that
>uses the routine having to check to see if its call failed and then, well,
>doing something so that the issue can be addressed.  If the program can
>usefully continue to process when an exception occurs then certainly it
>should do so.  My point is that there shouldn't have to be a lot of things
>like this...
>
>CALL ROUTINE USING PARM1, PARM2, PARM3, RC
>IF RC-SUCCESSFUL OR RC-EXPECTED-CONDITION
>    CONTINUE
>ELSE
>    DISPLAY 'An unexpected condition ' RC ' occurred in the call to ROUTINE'
>        UPON CONSOLE
>    MOVE 16 TO RETURN-CODE
>    STOP RUN
>END-IF
>
>or worse
>EVALUATE TRUE
>WHEN RC-SUCCESSFUL
>    CONTINUE
>WHEN RC-EXPECTED-CONDITION
>    ...DO SOMETHING...
>WHEN RC = 1
>    DISPLAY 'Invalid PARM1 in call to ROUTINE: ' PARM1 UPON CONSOLE
>    MOVE 16 TO RETURN-CODE
>    STOP RUN
>WHEN RC = 2
>    DISPLAY 'Invalid PARM2 in call to ROUTINE: ' PARM2 UPON CONSOLE
>    MOVE 16 TO RETURN-CODE
>    STOP RUN
>...ETC, ETC...
>WHEN OTHER
>    DISPLAY 'An unexpected condition ' RC ' occurred in the call to ROUTINE'
>        UPON CONSOLE
>    MOVE 16 TO RETURN-CODE
>    STOP RUN
>END-EVALUATE
>
>If I called ROUTINE with an invalid PARM1 wouldn't it be better if the
>routine that knows what is valid and what is not simply produce a useful
>error message and then terminate, perhaps with a dump that can be analyzed
>if the message itself doesn't tell the whole story?
>
>Frank
>
>--
>
>Frank Swarbrick
>Applications Architect - Mainframe Applications Development
>FirstBank Data Corporation - Lakewood, CO  USA
>P: 303-235-1403
>
>
>
>
>
>----------------------------------------------------------------------
>For IBM-MAIN subscribe / signoff / archive access instructions,
>send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
>Search the archives at http://bama.ua.edu/archives/ibm-main.html

