Re: Strange performance issues again

Marcy Cortes Mon, 16 Jul 2007 11:25:32 -0700

Does your VSE have more than 1 virtual CPU?

Marcy Cortes


"This message may contain confidential and/or privileged information. If
you are not the addressee or authorized to receive this for the
addressee, you must not use, copy, disclose, or take any action based on
this message or any information herein. If you have received this
message in error, please advise the sender immediately by reply e-mail
and delete this message. Thank you for your cooperation."

 

________________________________

From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED]] On
Behalf Of Tim Joyce
Sent: Monday, July 16, 2007 10:40 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] Strange performance issues again


Hey Guys,
 
I am sending this email to both the VSE and VM discussion lists, so I
apologize for any duplicate posts.
 
Back in March, I started a thread on the discussion lists (Subject:
"Strange performance with Batch on z/9 and V=V") that prompted a lot of
discussion. That issue revolved around performance problems we had when
migrating from a 9672-X27 (335 MIPS) under z/VM 4.3 with production VSE
V=R to a z9 2096-Q02 (335 MIPS) under z/VM 5.2, where we kept our VSE
2.7.1 (with PTFs) running V=V until we were ready to migrate to z/VSE.
Because of the excellent responses, both on and off list, we ended up
changing many things, including utilizing MDCache and migrating from our
2096-Q02 to a 2096-W01. The migration to a UNI was quite dramatic. Many
of our batch run times were cut in half! It seemed that we had solved
all our issues.

Life was good, or so it seemed, until last month, when we noticed
transactions in one of our production CICS systems timing out around the
same time every day. We looked at all sorts of things and finally
determined that a certain utility job was using up all the resources on
the VSE and causing the CICS to time out while it was running. We have
used these CHECKIP jobs for years to ping various IP elements in our
network to verify that the elements are responding. If a ping to a
resource fails, the job sends an email to the responsible party to
investigate. Due to a scheduler glitch, the CHECKIP job had not run for
a long time. I had our scheduler fixed, and the jobs started running
every 30 minutes on June 13th. It seems that when this job runs (at a
lower priority than CACICS), it still causes the SYSTEM CPU to climb
enough to stall all other functions in that VSE machine. We removed the
job from the scheduler and started working with CSI (TCPIP for VSE) to
determine why their PING client was doing this.

We then discovered that a MULTI VOLUME DYNVTOC job would do the same
thing. Again, this is a utility we have used for years to print a VTOC
of our various VSE attached volumes. After much testing, we found that
multi step jobs that quickly jump from step to step seem to cause
excessive system CPU. We have tested the same multi step jobs on our
test VSE and noticed similar performance issues. TMON (our VSE monitor)
will show the partition running the problem jobs using "normal" CPU, but
overall CPU and DELAY will jump much higher, and all other higher
priority partitions (including TMON) will drag and even stall. We
normally see our test VSE machine at 10% or less CPU with little to no
delay. Here are some examples of the TMON data we are seeing:
 
Before DYNVTOC :
 
**Jobname: LSSSTART*********** Job Execution Monitor *********Date: 07/16/2007**
*  Screen: TVSE2601 (+)                                       Time:  9:15:33   *
* Command: __________________________________________________ Cycle: MMSS      *
*   STATS:  CPU=    .8 DELAY=    .2 PAGING=    .0 IORATE =   7.9               *
*                                        Sorted On: % CPU BUSY                 *
*                              %CPU Busy Page Rate I/O Rate  %Pag Fram         *
*   Jobname Ptn AS Pr Phase    <--100--> <---20--> <--200--> <---30--> Status  *
* _ LSSSTART S3 S3  2 $TMGT01    .56       .00       .00       .35     READY   *
* _          SP       $$A$SUPX   .17       .00       .00     >  1.98   N/A     *
* _ TCPDOSD2 Z1 Z1  8 IPNET      .01       .00       1.49      .22     W-I/O   *
* _ TCPIP    U1 U1  6 IPNET      .01       .00       1.49      .31     W-I/O   *
* _ OPTIWORK S1 S1  2 BSOWMAIN   .01       .00       .00       .02     W-I/O   *
* _ ACCTCICS H1 H1  9 DFHSIP     .01       .00       .00       1.58    W-I/O   *
* _ TCACICS  E2 E2 13 DFHSIP     .01       .00       .00     >  2.01   W-I/O   *
* _ TC24     E1 E1 13 DFHSIP     .01       .00       .00     >  2.10   W-I/O   *
* _ TESTCICS C1 C1 12 DFHSIP     .01       .00       .00     >  2.47   W-I/O   *
* _ VTAMPRTT Z2 Z2  8 M4VAPRNT   .00       .00       .00       .30     W-I/O   *
* _ VPT      Y1 Y1 11 M4VAVPT    .00       .00       .00       .08     W-I/O   *
* _ HMSK3000 T3 T3 10 SOKY300    .00       .00       .00       .06     W-I/O   *
* _ HMSK6000 T2 T2 10 SOKY600    .00       .00       .00       .06     W-I/O   *
* _ HMSK0001 T1 T1 10 SOKY001    .00       .00       .00       .06     W-I/O   *
* _ NETDOS   S4 S4  2 ADARUN     .00       .00       .00       .16     W-I/O   *
* _ FAQSTEST S2 S2  2 FAQSMAIN   .00       .00       .00       .09     W-I/O   *
** Help Information = PF1 *********TMONTEST******** PF Key Assignments = PA1 ***
 
During DYNVTOC:
 
**Jobname: LSSSTART*********** Job Execution Monitor *********Date: 07/16/2007**
*  Screen: TVSE2601 (+)                                       Time: 13:30:31   *
* Command: __________________________________________________ Cycle: MMSS      *
*   STATS:  CPU=  19.3 DELAY=  73.8 PAGING=    .0 IORATE = 205.0               *
*                                        Sorted On: % CPU BUSY                 *
*                              %CPU Busy Page Rate I/O Rate  %Pag Fram         *
*   Jobname Ptn AS Pr Phase    <--100--> <---20--> <--200--> <---30--> Status  *
* _ TIMVTOC  P1 P1 24 DYNVTOC  >  16.13    .00       .00       .01     READY   *
* _ TCPIP    U1 U1  6 IPNET      .85       .00     ==>57.88    .31     READY   *
* _          SP       $$A$SUPX   .75       .00       1.39    >  2.02   N/A     *
* _ TESTFTP  F3 3  23 FTP        .50       .00       9.06      .15     READY   *
* _ POWSTART F1 1   2 TESTPOWR   .28       .00       9.76      .28     W-I/O   *
* _ LSSSTART S3 S3  1 $TMGT01    .04       .00       1.39      .35     READY   *
* _ ACCTCICS H1 H1  8 DFHSIP     .03       .00       .00       1.58    W-I/O   *
* _ TC24     E1 E1 13 DFHSIP     .03       .00       .00     >  2.10   W-I/O   *
* _ TESTCICS C1 C1 12 DFHSIP     .03       .00       .00     >  2.65   READY   *
* _ TCACICS  E2 E2 13 DFHSIP     .02       .00       .00     >  2.01   W-I/O   *
* _ TCPDOSD2 Z1 Z1  9 IPNET      .01       .00       .00       .22     W-I/O   *
* _ VPT      Y1 Y1 11 M4VAVPT    .01       .00       .00       .08     W-I/O   *
* _ VTAMSTRT F2 2   3 ISTINCVT   .01       .00       11.85     .46     READY   *
* _ VTAMPRTT Z2 Z2  9 M4VAPRNT   .00       .00       .00       .30     W-I/O   *
* _ HMSK3000 T3 T3 10 SOKY300    .00       .00       .00       .06     W-I/O   *
* _ HMSK6000 T2 T2 10 SOKY600    .00       .00       .00       .06     W-I/O   *
** Help Information = PF1 *********TMONTEST******** PF Key Assignments = PA1 ***
 
As you can see, when running the DYNVTOC, TMON shows a large amount of
CPU and Delay. The job seems to run through normally, even quickly, so
the temporary CPU demand is not as much of a concern as the fact that
the job seems to cause all other resources in the VSE machine to bog
down. It does not matter how low we set the priority of the job. As you
can see above, the job (DYNVTOC) has a priority of 24, much lower than
all the other active resources. Notice that the TMON partition
(LSSSTART) has a priority of 1, and still I had to wait 10 to 15
seconds, through several "TVSE05005I - THE REQUESTED DATA IS NOT
AVAILABLE. RC= X'0000001C'" messages from TMON, before I finally got a
screen back.
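
To put numbers on the jump between the two screens, here is a minimal
Python sketch that parses the two STATS lines shown above and prints the
change in each counter. It is only an illustration: it is not part of
TMON or any VSE tooling, and the function name parse_stats is made up
for this example.

```python
import re

# STATS lines copied from the two TMON screens in this message.
BEFORE = "STATS:  CPU=    .8 DELAY=    .2 PAGING=    .0 IORATE =   7.9"
DURING = "STATS:  CPU=  19.3 DELAY=  73.8 PAGING=    .0 IORATE = 205.0"


def parse_stats(line):
    """Return a dict such as {'CPU': 0.8, 'DELAY': 0.2, ...}.

    parse_stats is a hypothetical helper, not a TMON interface.
    It picks up every NAME = number pair on the line.
    """
    pairs = re.findall(r"([A-Z]+)\s*=\s*(\d*\.?\d+)", line)
    return {name: float(value) for name, value in pairs}


before = parse_stats(BEFORE)
during = parse_stats(DURING)

# Print each counter before and during the DYNVTOC run, with the change.
for name in before:
    change = during[name] - before[name]
    print(f"{name:7s} before={before[name]:6.1f} "
          f"during={during[name]:6.1f} change={change:+7.1f}")
```

On the figures above, DELAY goes from .2 to 73.8 and the I/O rate from
7.9 to 205.0, while PAGING stays at .0 in both screens.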
 
Since higher priority applications are so bogged down by these multi
step jobs, I feel that the problem lies in either the VSE or VM
dispatcher. Has anyone else seen anything like this?
 
Sorry for the long email, but I felt I needed to explain with examples
to get the problem across.
 
Tim
 
~~~~~~~~~~~~~~~~~~~~~~~~

Tim Joyce
Sr. Systems Programmer 
Alex Lee, Inc. 
Email: [EMAIL PROTECTED]
Phone: (828) 725-4448  
Fax: (828) 725-4800
