Re: VM lockup due to storage typo

Lee Stewart Tue, 15 Sep 2009 13:55:45 -0700

From the tn3270 sessions hanging to the phone call to me - 2-3 minutes. From then till we decided we had to IPL - maybe 15-20 minutes. But 30 minutes (maybe 45-60 till all the apps were back up) on a major online system is a lot. It was 35 minutes from the message capping the virtual storage at 8TB till the IPL time from Q CPLEVEL. So no, not long considering the size. And yes, I suspect it would PGT004 eventually.

And yes, if CP unceremoniously chopped my wrong size from 9.7TB to 8TB, why could it not do the same to either a user specified system limit or a "this is the biggest machine this CP can run in this configuration"...


Lee

Gentry, Stephen wrote:

What Lee doesn't mention is how long he waited before doing the IPL.
Had he waited to see what happens maybe VM would have finally come
around, so to speak. We all have different thresholds of pain. I think I
would have done what Lee did, long day, not really wanting to wait
around to see if VM recovers, just IPL.  Lee did you have access to the
HMC and thus the SAD screen to see what was going on? Sort of my last
line of defense if I can't get logged in.  Granted all it will tell you
is if you have CPU or I/O utilization, but at least you have something
to go to IBM with.
Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE, if it's set then
guest machine size is verified, if not available PAGE area and SPOOL
size is checked (calculated) and if the guest exceeds that size then the
guest doesn't start or a severe warning is issued.
Steve
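
A rough sketch of the check Steve describes, in Python purely for illustration. MAX_USER_SIZE is his proposed option, not an existing SYSTEM CONFIG statement, and the function name, parameters, and sample sizes below are all made up for the example.

```python
def check_guest_size(guest_mb, page_mb, spool_mb, max_user_size_mb=None):
    """Hypothetical logon-time check; returns (ok, reason).

    If a MAX_USER_SIZE-style limit is set, verify the guest against it.
    Otherwise fall back to comparing against PAGE + SPOOL capacity.
    """
    if max_user_size_mb is not None:
        if guest_mb > max_user_size_mb:
            return False, "exceeds MAX_USER_SIZE"
        return True, "within MAX_USER_SIZE"
    backing_mb = page_mb + spool_mb
    if guest_mb > backing_mb:
        return False, "exceeds PAGE + SPOOL"
    return True, "within PAGE + SPOOL"


# Illustrative numbers only: an 8TB guest against 400GB of page
# space and 100GB of spool, with no explicit limit configured.
TB_IN_MB = 1024 * 1024
GB_IN_MB = 1024
result = check_guest_size(8 * TB_IN_MB, 400 * GB_IN_MB, 100 * GB_IN_MB)
print(result)
```

Whether a failed check should refuse the logon or only issue a severe warning is left open here, as it is in the suggestion above.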

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Tuesday, September 15, 2009 12:59 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Maybe CP couldn't know that the guest would do something bad, but it
should know that it has opened itself to the possibility that the guest
could, in normal operation, cause the problem.

One of Alan's first precepts of information security and integrity is
that the guest cannot be allowed to harm the CP. This clearly violates
that.
Regards,
Richard Schuh

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
Sent: Tuesday, September 15, 2009 9:19 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo
CP wouldn't know at IPL time that the guest would, not could, but would cause such harm.
Just because you say you can use xxx GB, doesn't mean you would actually use them.
When page fills, it overflows to spool.
When spool fills, CP abends on the next pageout.

Tom Duerbusch
THD Consulting

Marcy Cortes <marcy.d.cor...@wellsfargo.com> 9/15/2009 11:02 AM >>>
See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;)
You probably filled page space.
I still think IBM should refuse to IPL a guest that will cause such harm.
Marcy

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
Sent: Tuesday, September 15, 2009 8:39 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] VM lockup due to storage typo
Does anyone have an idea of how we might have gotten out of this without an IPL?
VM LPAR has 175G of memory and a flock of Linux Oracle guests... Several guests needed more memory added so the directory was updated and one by one the guests shut down, logged off and back on. So far, so good.
But... In changing the memory for many guests, and it being late at night after a long day, while meaning to set a guest's memory to 9728M, it got set to 9728G. When that guest was cycled we see the message on the console that its memory was limited to 8TB (HCPLGN093E), then the VM system appeared to freeze.
We couldn't get in via TCP/IP, or the HMC Operating System Messages screen, or the HMC Integrated 3270.
Finally had to IPL. Even that was weird as I'd have expected the Load Normal to shut down, it just IPLed. We did NoAutolog, fixed the typo and all came back up ok...
I suspect CP was scrambling paging everything in the world out as Linux tried to initialize that 8TB of memory... But I'm surprised I couldn't even get into the HMC consoles (to kill just that one guest as opposed to all of them)..
Any thoughts?
Lee
--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com
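
For a sense of scale, the sizes quoted in the original message can be compared with a few lines of Python. This is only an arithmetic sketch: it assumes the M/G/T suffixes are binary multiples, and parse_size is a helper invented for the example, not a z/VM facility.

```python
# Binary multipliers (assumption: M, G, T mean 2**20, 2**30, 2**40 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '9728M' into bytes (hypothetical helper)."""
    text = text.strip().upper()
    return int(text[:-1]) * UNITS[text[-1]]


intended = parse_size("9728M")   # what was meant
typed = parse_size("9728G")      # what went into the directory
cap = parse_size("8T")           # limit reported by HCPLGN093E
lpar = parse_size("175G")        # real memory in the LPAR

effective = min(typed, cap)      # size the guest actually logged on with

print(typed // intended)             # how many times larger the typo was
print(intended / UNITS["G"])         # intended size in GB
print(round(effective / lpar, 1))    # capped guest size relative to real memory
```

Even after the cap, the guest was defined at dozens of times the real memory of the LPAR, which fits the suspicion that CP was busy paging everything else out.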


