Re: Detect the loop for batch job

Joel C. Ewing Sat, 05 Dec 2009 09:19:04 -0800

On 12/04/2009 10:14 AM, bjbxd wrote:
> Hello List,
> We are looking for a tool to detect the loop for batch application,
> any suggestions are appreciated.
> 
> My shop is running z/OS, application is C/C++.
> Bob.


Considering the speed of today's processors, all batch programs that run
for more than a few seconds must of necessity contain one or more logic
loops; so I assume the intended question is how do you determine if the
program is in a non-productive loop.

In general, making this determination automatically with any tool, without
knowing something about the expected or historic behaviour of the
program, is an impossible task.  And in the practical world it's not just
applications that will never terminate that are a problem, but also
those which cost more in resources than the end user can afford.  You
also want to catch a batch job step that is so badly designed or poorly
tuned that it consumes an order of magnitude more CPU than required,
even though it may not be in an infinite loop.
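
As a rough illustration of using historic behaviour, the sketch below
compares a step's CPU time with the median of its earlier runs and flags
anything more than ten times higher.  The function name, the factor of 10
and the sample figures are all assumptions made up for the example; in
practice the history would have to come from whatever job-step accounting
data the shop keeps.

```python
from statistics import median

def exceeds_baseline(history_cpu_secs, current_cpu_secs, factor=10.0):
    """Return True if the current CPU time is more than `factor` times
    the median CPU time of earlier runs of the same job step."""
    if not history_cpu_secs:
        # No history: nothing to compare against, so make no judgement.
        return False
    return current_cpu_secs > factor * median(history_cpu_secs)

# Hypothetical CPU seconds from five earlier runs of one job step.
history = [42.0, 39.5, 44.1, 40.8, 41.3]

print(exceeds_baseline(history, 55.0))    # modest growth over the median
print(exceeds_baseline(history, 600.0))   # more than ten times the median
```

A check like this says nothing about whether the step is looping, only
that it has left its usual range and deserves a look.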

If the general issue is one of jobs wasting resources (which with
sub-capacity licensing may cost real money), then the simplest first
step is to impose and enforce standards requiring reasonable CPU TIME
limits on jobs and job steps in JCL, and possibly also require OUTLIM to
restrict SYSOUT loops in testing (needless to say "NOLIMIT" for CPU time
should not be allowed for any batch job).  Different default limits can
be set for testing vs production via JES2 definitions based on job
classes, and it is possible to use an IEFUTL exit to provide for
unanticipated application growth by allowing Operators the option of
granting CPU time extensions or cancelling production jobs that reach
the limit based on job class.  JCL overrides for higher limits could be
allowed for specific job steps that have known higher requirements where
the cost is acceptable to the end user.

Rate of CPU consumption by itself is an unreliable indicator of
problems.  A single-threaded program would be limited to 100% of one CP,
but even a solid, infinite CPU loop could show up as a much lower value
on a loaded system, and some very efficiently-designed,
computationally-intense programs might be able to approach 100% of a CP
on a lightly loaded system and still be doing productive work.

Some very simple tools, like the SDSF DA display, that show both CPU and
EXCP resources used, are sometimes sufficient to provide clues.  If the
program is known to require periodic I/O to do anything useful and it is
consuming an unusual amount of CPU time with no EXCPs, that would be
strongly suggestive of a problem; likewise, if the program is generating
much more SYSOUT than usual, or repetitive SYSOUT lines, that again
points to a likely problem.
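
The CPU-without-EXCPs clue can be sketched as a simple check over
successive readings.  This is only an illustration: the readings are
assumed to have been noted by hand or by some monitor as cumulative
(CPU seconds, EXCP count) pairs, the function name and the 30-second
threshold are invented for the example, and nothing here is an SDSF
interface.

```python
def cpu_without_io(samples, min_cpu_delta=30.0):
    """samples: cumulative (cpu_seconds, excp_count) pairs, oldest first.

    Return True if, between the first and last sample, CPU time grew by
    at least `min_cpu_delta` seconds while the EXCP count did not move.
    """
    if len(samples) < 2:
        # One reading cannot show a trend.
        return False
    cpu_first, excp_first = samples[0]
    cpu_last, excp_last = samples[-1]
    return (cpu_last - cpu_first) >= min_cpu_delta and excp_last == excp_first

# Hypothetical readings taken a minute or so apart.
stuck = [(100.0, 5000), (130.0, 5000), (161.5, 5000)]
busy = [(100.0, 5000), (130.0, 5400), (161.5, 5900)]

print(cpu_without_io(stuck))   # CPU climbing, EXCPs flat
print(cpu_without_io(busy))    # CPU and EXCPs both climbing
```

A True result is only a hint: a program that legitimately computes for
long stretches between I/Os would trip the same check.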

If the program is using both CPU and EXCPs, but a lot more than expected,
and shows no other obviously perverse behaviour, it is more difficult to
make a determination.  Other tools, like Omegamon, that show EXCPs on
specific DDs in the job step may allow you to see whether the program is
continuing to progress through sequential data.  If the total number of
blocks is known, you may also be able to estimate whether it will complete
in an acceptable time and at what total cost.  EXCPs on a file far in
excess of the total number of blocks in the file may point to a poorly
tuned or poorly designed application.
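
The arithmetic behind that estimate is simple enough to sketch.  The
example assumes a sequential file read exactly once, front to back, with
roughly one EXCP per block, which will not hold for every access method
or buffering setup; the function name and the figures are hypothetical.

```python
def project_step(excps_so_far, total_blocks, cpu_secs_so_far):
    """Estimate progress through a sequential file that is read once.

    Returns (fraction_done, projected_total_cpu_secs), or None when the
    EXCP count already exceeds the block count, which suggests the file
    is being reread and the simple projection no longer applies.
    """
    if total_blocks <= 0 or excps_so_far <= 0:
        raise ValueError("need positive EXCP and block counts")
    fraction_done = excps_so_far / total_blocks
    if fraction_done > 1.0:
        return None
    # Assume CPU cost per block stays about the same for the rest of the file.
    return fraction_done, cpu_secs_so_far / fraction_done

# Hypothetical step: 25,000 EXCPs on a 100,000-block input, 120 CPU seconds used.
print(project_step(25_000, 100_000, 120.0))    # a quarter done, about 480 s total
print(project_step(150_000, 100_000, 120.0))   # more EXCPs than blocks
```

Whether the projected total is acceptable is still a judgement for
whoever is paying for the CPU.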

In our experience, there is no substitute for having human intelligence
in the monitoring loop when resources get tight.


-- 
Joel C. Ewing, Fort Smith, AR        jremoveccapsew...@acm.org

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
