Does your VSE have more than 1 virtual CPU?

Marcy Cortes

________________________________
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED]] On Behalf Of Tim Joyce
Sent: Monday, July 16, 2007 10:40 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] Strange performance issues again

Hey Guys,

I am sending this email to both the VSE and VM discussion lists, so I apologize for any duplicate posts.

Back in March, I started a thread on the discussion lists (Subject: Strange performance with Batch on z/9 and V=V) that prompted a lot of discussion. The old issue revolved around performance problems we had when migrating from a 9672-X27 (335 MIPS) under z/VM 4.3 with production VSE running V=R to a z/9 2096-Q02 (335 MIPS) under z/VM 5.2, where we kept our VSE 2.7.1 (with PTFs) running V=V until we were ready to migrate to z/VSE. Because of the excellent response, both on and off the list, we ended up changing many things, including utilizing MDCache and migrating from our 2096-Q02 to a 2096-W01. The migration to a UNI was quite dramatic. Many of our batch run times were cut in half! It seemed that we had solved all our issues.

Life was good, or so it seemed, until last month, when we noticed transactions in one of our production CICSs timing out around the same time every day. We looked at all sorts of things and finally determined that a certain utility job, CHECKIP, was using up all the resources on the VSE and causing the CICS to time out while it was running. We have used these CHECKIP jobs for years to ping various IP elements in our network to verify that the elements were responding.
If a ping to a resource fails, the job sends an email to the responsible party to investigate. Due to a scheduler glitch, the CHECKIP job had not run for a long time. I had our scheduler fixed, and the jobs started running every 30 minutes on June 13th. It seems that when this job runs (at a lower priority than CACICS), it still causes the SYSTEM CPU to climb enough to stall all other functions in that VSE machine. We removed the job from the scheduler and started working with CSI (TCPIP for VSE) to determine why their PING client was doing this.

We then discovered that a multi-volume DYNVTOC job would do the same thing. Again, this is a utility we have used for years to print a VTOC of our various VSE-attached volumes. After much testing, we found that multi-step jobs that quickly jump from step to step seem to cause excessive system CPU.

We have tested the same multi-step jobs on our test VSE and noticed similar performance issues. TMON (our VSE monitor) shows the partition running the problem jobs using "normal" CPU, but overall CPU and DELAY jump much higher, and all other higher-priority partitions (including TMON) drag and even stall. We normally see our test VSE machine at 10% CPU or less, with little to no delay.
Here are some examples of the TMON data we are seeing:

Before DYNVTOC:

**Jobname: LSSSTART*********** Job Execution Monitor *********Date: 07/16/2007**
* Screen: TVSE2601 (+)                                        Time: 9:15:33    *
* Command: __________________________________________________ Cycle: MMSS      *
* STATS: CPU=   .8  DELAY=   .2  PAGING=   .0  IORATE =   7.9                  *
* Sorted On: % CPU BUSY                                                        *
*                               %CPU Busy Page Rate  I/O Rate %Pag Fram        *
*   Jobname  Ptn AS Pr Phase    <--100--> <---20--> <--200--> <---30--> Status *
* _ LSSSTART S3  S3  2 $TMGT01        .56       .00       .00       .35 READY  *
* _          SP         $$A$SUPX       .17       .00       .00    > 1.98 N/A    *
* _ TCPDOSD2 Z1  Z1  8 IPNET          .01       .00      1.49       .22 W-I/O  *
* _ TCPIP    U1  U1  6 IPNET          .01       .00      1.49       .31 W-I/O  *
* _ OPTIWORK S1  S1  2 BSOWMAIN       .01       .00       .00       .02 W-I/O  *
* _ ACCTCICS H1  H1  9 DFHSIP         .01       .00       .00      1.58 W-I/O  *
* _ TCACICS  E2  E2 13 DFHSIP         .01       .00       .00    > 2.01 W-I/O  *
* _ TC24     E1  E1 13 DFHSIP         .01       .00       .00    > 2.10 W-I/O  *
* _ TESTCICS C1  C1 12 DFHSIP         .01       .00       .00    > 2.47 W-I/O  *
* _ VTAMPRTT Z2  Z2  8 M4VAPRNT       .00       .00       .00       .30 W-I/O  *
* _ VPT      Y1  Y1 11 M4VAVPT        .00       .00       .00       .08 W-I/O  *
* _ HMSK3000 T3  T3 10 SOKY300        .00       .00       .00       .06 W-I/O  *
* _ HMSK6000 T2  T2 10 SOKY600        .00       .00       .00       .06 W-I/O  *
* _ HMSK0001 T1  T1 10 SOKY001        .00       .00       .00       .06 W-I/O  *
* _ NETDOS   S4  S4  2 ADARUN         .00       .00       .00       .16 W-I/O  *
* _ FAQSTEST S2  S2  2 FAQSMAIN       .00       .00       .00       .09 W-I/O  *
** Help Information = PF1 *********TMONTEST******** PF Key Assignments = PA1 ***

During DYNVTOC:

**Jobname: LSSSTART*********** Job Execution Monitor *********Date: 07/16/2007**
* Screen: TVSE2601 (+)                                        Time: 13:30:31   *
* Command: __________________________________________________ Cycle: MMSS      *
* STATS: CPU= 19.3  DELAY= 73.8  PAGING=   .0  IORATE = 205.0                  *
* Sorted On: % CPU BUSY                                                        *
*                               %CPU Busy Page Rate  I/O Rate %Pag Fram        *
*   Jobname  Ptn AS Pr Phase    <--100--> <---20--> <--200--> <---30--> Status *
* _ TIMVTOC  P1  P1 24 DYNVTOC    > 16.13       .00       .00       .01 READY  *
* _ TCPIP    U1  U1  6 IPNET          .85       .00  ==>57.88       .31 READY  *
* _          SP         $$A$SUPX       .75       .00      1.39    > 2.02 N/A    *
* _ TESTFTP  F3  3  23 FTP            .50       .00      9.06       .15 READY  *
* _ POWSTART F1  1   2 TESTPOWR       .28       .00      9.76       .28 W-I/O  *
* _ LSSSTART S3  S3  1 $TMGT01        .04       .00      1.39       .35 READY  *
* _ ACCTCICS H1  H1  8 DFHSIP         .03       .00       .00      1.58 W-I/O  *
* _ TC24     E1  E1 13 DFHSIP         .03       .00       .00    > 2.10 W-I/O  *
* _ TESTCICS C1  C1 12 DFHSIP         .03       .00       .00    > 2.65 READY  *
* _ TCACICS  E2  E2 13 DFHSIP         .02       .00       .00    > 2.01 W-I/O  *
* _ TCPDOSD2 Z1  Z1  9 IPNET          .01       .00       .00       .22 W-I/O  *
* _ VPT      Y1  Y1 11 M4VAVPT        .01       .00       .00       .08 W-I/O  *
* _ VTAMSTRT F2  2   3 ISTINCVT       .01       .00     11.85       .46 READY  *
* _ VTAMPRTT Z2  Z2  9 M4VAPRNT       .00       .00       .00       .30 W-I/O  *
* _ HMSK3000 T3  T3 10 SOKY300        .00       .00       .00       .06 W-I/O  *
* _ HMSK6000 T2  T2 10 SOKY600        .00       .00       .00       .06 W-I/O  *
** Help Information = PF1 *********TMONTEST******** PF Key Assignments = PA1 ***

As you can see, when running the DYNVTOC, TMON shows a large amount of CPU and DELAY. The job seems to run through normally, even quickly, so the temporary CPU demand is not as much of a concern as the fact that the job seems to cause all other resources in the VSE machine to bog down. It does not matter how low we set the priority of the job. As you can see above, the job (DYNVTOC) has a priority of 24, much lower than all the other active resources. Notice that the TMON partition (LSSSTART) has a priority of 1, and still I had to wait 10 to 15 seconds, through several "TVSE05005I - THE REQUESTED DATA IS NOT AVAILABLE. RC= X'0000001C'" messages from TMON, before I could finally get a screen back.

Since higher-priority applications are so bogged down by these multi-step jobs, I feel that the problem lies in either the VSE or the VM dispatcher. Has anyone else seen anything like this?

Sorry for the long email, but I felt I needed to explain with examples to get the problem across.

Tim

~~~~~~~~~~~~~~~~~~~~~~~~
Tim Joyce
Sr. Systems Programmer
Alex Lee, Inc.
Email: [EMAIL PROTECTED]
Phone: (828) 725-4448
Fax: (828) 725-4800