Re: VM lockup due to storage typo

2009-09-22 Thread David Boyes
 I don't think the analogy to a ping attack is a particularly fair
 one.  Yes, from the perspective of an innocent third user, they
 look the same, perhaps, but they aren't.  

??? In both cases, normal function of the innocent guest is disrupted by a 
force beyond its control through no fault of its own. The function is 
disrupted by a lack of shared resources available to the innocent guest, due 
to CP trying to service what appear to be legitimate resource requests from 
another, theoretically innocent, guest. 

 If the attack were made
 through some sort of security gate that defaults to closed state
 which the sysadmin had accidentally opened and left open, I think
 that would be a more fair analogy.  Quibbling over details,
 perhaps, but there is an important difference.

Network floods have nothing innately to do with security states. You can 
produce exactly the same effect within a local segment with no outside 
connection, FW or any other security gates involved (misconfigure any DECnet 
device that boots via MOP and see what happens), so I don't see the subtle 
difference here -- one device banging out traffic without regard for other 
systems on the same network segment starves access to the other systems on the 
same segment, denying them the ability to function normally. Barks like a duck, 
swims like a duck, it'll do for duck soup, as a friend of mine says.  

But, as you say, let's concentrate on fixing the problem, not blaming the 
symptoms. 


Re: VM lockup due to storage typo

2009-09-21 Thread Bill Holder
I don't think the analogy to a ping attack is a particularly fair 
one.  Yes, from the perspective of an innocent third user, they 
look the same, perhaps, but they aren't.  If the attack were made 
through some sort of security gate that defaults to closed state 
which the sysadmin had accidentally opened and left open, I think 
that would be a more fair analogy.  Quibbling over details, 
perhaps, but there is an important difference.  

On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes dbo...@sinenomine.net wrote:

On 9/18/09 9:32 AM, Bill Holder hold...@us.ibm.com wrote:

 That is indeed one important question, but there was another one, the
 question of whether this was a denial of service attack exposure, which it
 is not.  

I think that's a point of view question.

If I am another user on the same VM system, happy within my cozy little
class G box, and the hypervisor admin does something outside of my control
to some OTHER user that causes CP to choke, then from the original user's
perspective it IS a DOS attack because it's something that is out of my
control, starves ME, and causes ME to choke without reason.

An analogous parallel case in the distributed system world would be a ping
flood attack on a network segment. The innocent get hurt along with the
intended target by being starved of access to the network, and thus lose the
ability to function according to design.

From the hypervisor admin's POV, then yeah, it's just doing what it's told
to do. It's correct operation, working as documented.

I think Bill Schuh and Marcy and myself are arguing for the former
viewpoint. I think you and Adam are arguing from the latter view.

 I'm not disagreeing that it would be nice if there were some sort
 of "are you sure" safety net before the system proceeded to try to do
 something suicidal, but that's a design and requirements question, not a
 defect question.

I think we're all in violent agreement on that point. Now, the question is
what is the best way to put a safety on that gun? 



Re: VM lockup due to storage typo

2009-09-18 Thread Bill Holder
That is indeed one important question, but there was another one, the
question of whether this was a denial of service attack exposure, which it
is not.  I'm not disagreeing that it would be nice if there were some sort
of "are you sure" safety net before the system proceeded to try to do
something suicidal, but that's a design and requirements question, not a
defect question.

- Bill Holder, z/VM Development, IBM

On Thu, 17 Sep 2009 17:36:44 -0400, David Boyes dbo...@sinenomine.net wrote:

On 9/17/09 2:16 PM, Adam Thornton athorn...@sinenomine.net wrote:

 
 Administrator typo is not a failure mode the operating system is
 designed to protect you from.

That may be true now, but I think the point of the argument is that it
should not be. 

On VMS, if you have a SYSTEM priv bit set, the system will still warn you if
you're about to do something that seems stupid. If there is an architected
limit (note that the 9.7TB got clipped to 8TB, so SOMETHING noticed a
problem), then it's not too unreasonable for the system to take defensive
measures and issue a warning that all is not right in the kingdom of
Denmark, cream or no cream dresses.

It seems like a basic defense that if CP notices you starting something that
it KNOWS it may not have resources to complete, requiring confirmation that
you know what you're doing (or about to do) is a good defensive measure.


Did the system do what you told it to do when you told it to do it? Yes.

Whether it should march off a cliff without at least questioning the order
is the question at hand.

-- db



Re: VM lockup due to storage typo

2009-09-18 Thread Huegel, Thomas
A little OT, but curiosity calls.. What is the max. storage that z/LINUX
can use? 

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of David Boyes
Sent: Thursday, September 17, 2009 4:37 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

On 9/17/09 2:16 PM, Adam Thornton athorn...@sinenomine.net wrote:

 
 Administrator typo is not a failure mode the operating system is 
 designed to protect you from.

That may be true now, but I think the point of the argument is that it
should not be. 

On VMS, if you have a SYSTEM priv bit set, the system will still warn
you if you're about to do something that seems stupid. If there is an
architected limit (note that the 9.7TB got clipped to 8TB, so SOMETHING
noticed a problem), then it's not too unreasonable for the system to
take defensive measures and issue a warning that all is not right in
the kingdom of Denmark, cream or no cream dresses.

It seems like a basic defense that if CP notices you starting something
that it KNOWS it may not have resources to complete, requiring
confirmation that you know what you're doing (or about to do) is a good
defensive measure.

Did the system do what you told it to do when you told it to do it? Yes.
Whether it should march off a cliff without at least questioning the
order is the question at hand.

-- db


Re: VM lockup due to storage typo

2009-09-18 Thread Bill Holder
I see this as three separate questions (with my answers):

Is it a denial of service attack exposure?
- Clearly not.

Is it a defect?
- I don't believe so, for the base issue of whether VM
  should allow a privileged user to do something destructive,
  though there may well be defects or scalability / constraint
  shortcomings exposed by the hang (we'd need to see a dump to
  understand what's really happening).

Is this an area ripe for improvement, could/should VM be
smarter about preventing a privileged user from doing something
dangerous or destructive?
- Sure.  I won't tell you not to open a requirement.  

- Bill Holder, z/VM Development, IBM


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 9:32 AM, Bill Holder hold...@us.ibm.com wrote:

 That is indeed one important question, but there was another one, the
 question of whether this was a denial of service attack exposure, which it
 is not.  

I think that's a point of view question.

If I am another user on the same VM system, happy within my cozy little
class G box, and the hypervisor admin does something outside of my control
to some OTHER user that causes CP to choke, then from the original user's
perspective it IS a DOS attack because it's something that is out of my
control, starves ME, and causes ME to choke without reason.

An analogous parallel case in the distributed system world would be a ping
flood attack on a network segment. The innocent get hurt along with the
intended target by being starved of access to the network, and thus lose the
ability to function according to design.

From the hypervisor admin's POV, then yeah, it's just doing what it's told
to do. It's correct operation, working as documented.

I think Bill Schuh and Marcy and myself are arguing for the former
viewpoint. I think you and Adam are arguing from the latter view.

 I'm not disagreeing that it would be nice if there were some sort
 of "are you sure" safety net before the system proceeded to try to do
 something suicidal, but that's a design and requirements question, not a
 defect question.

I think we're all in violent agreement on that point. Now, the question is
what is the best way to put a safety on that gun? 


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 9:38 AM, Huegel, Thomas thue...@kable.com wrote:

 A little OT, but curiosity calls.. What is the max. storage that z/LINUX
 can use? 

Last time I looked at the Linux memory management code (a while back) it was
4TB, but that's probably expanded by now. The documented z/VM limit of 8TB
has been around for a while; I think that appeared in 5.2.


Re: VM lockup due to storage typo

2009-09-18 Thread Adam Thornton

On Sep 18, 2009, at 9:11 AM, David Boyes wrote:



I think we're all in violent agreement on that point. Now, the question is
what is the best way to put a safety on that gun?


Oooh!  Oooh!  Pick me!  Mandatory User Access Control dialog boxes  
that pop up and make you click OK any time you want to breathe.


Adam


Re: VM lockup due to storage typo

2009-09-18 Thread Rob van der Heij
On Fri, Sep 18, 2009 at 4:11 PM, David Boyes dbo...@sinenomine.net wrote:

 I think we're all in violent agreement on that point. Now, the question is
 what is the best way to put a safety on that gun?

IMHO the suggested solutions so far merely bend the barrel upwards.
This may deflect the bullet from your own foot in some usage
scenarios, but likely hurts other feet and makes the thing in general
hard to aim ;-)

Rob


Re: VM lockup due to storage typo

2009-09-18 Thread Brian Nielsen
On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes dbo...@sinenomine.net wrote:

I think we're all in violent agreement on that point. Now, the question is
what is the best way to put a safety on that gun? 

Poetic Justice
Since the Linux OOM model is to kill a process, just kill some Linux 
virtual machine to free up space...
/Poetic Justice

Brian Nielsen


Re: VM lockup due to storage typo

2009-09-18 Thread Bill Holder
On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes dbo...@sinenomine.net wrote:

...
I think we're all in violent agreement on that point. Now, the question is
what is the best way to put a safety on that gun? 

Is this a procedural or technical implementation question (or both)?  

For the former, I'd say a requirement is appropriate.  For the latter,
let's have at it.  :)


Re: VM lockup due to storage typo

2009-09-18 Thread Schuh, Richard
Hey Zeke Boyes, who is Bill Schuh? I don't even know of a relative by that name 
:-)

Working as Documented is another version of WAD. My stance is that if the 
system dies because of a design feature, then perhaps that feature ought to 
be reconsidered. Certainly, there is no way to anticipate all possible feature 
failures, but when one comes up that is preventable, then the design ought to 
be tweaked. All of the discussion about whether it is or is not a DOS is 
totally irrelevant, especially to those who have been victimized.   

(I thought that Lyn Hadley eliminated WAD and BAD from the IBM vernacular years 
ago.)

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of David Boyes
 Sent: Friday, September 18, 2009 7:12 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 On 9/18/09 9:32 AM, Bill Holder hold...@us.ibm.com wrote:
 
  That is indeed one important question, but there was another
  one, the question of whether this was a denial of service attack
  exposure, which it is not.
 
 I think that's a point of view question.
 
 If I am another user on the same VM system, happy within my 
 cozy little class G box, and the hypervisor admin does 
 something outside of my control to some OTHER user that 
 causes CP to choke, then from the original user's perspective 
 it IS a DOS attack because it's something that is out of my 
 control, starves ME, and causes ME to choke without reason.
 
 An analogous parallel case in the distributed system world 
 would be a ping flood attack on a network segment. The 
 innocent get hurt along with the intended target by being 
 starved of access to the network, and thus lose the ability 
 to function according to design.
 
 From the hypervisor admin's POV, then yeah, it's just doing 
 what it's told to do. It's correct operation, working as documented.
 
 I think Bill Schuh and Marcy and myself are arguing for the 
 former viewpoint. I think you and Adam are arguing from the 
 latter view.
 
  I'm not disagreeing that it would be nice if there were some sort
  of "are you sure" safety net before the system proceeded to try to do
  something suicidal, but that's a design and requirements question,
  not a defect question.
 
 I think we're all in violent agreement on that point. Now, 
 the question is what is the best way to put a safety on that gun? 
 

Re: VM lockup due to storage typo

2009-09-18 Thread John P. Baker
Personally, I have always preferred BAC (Broken As Coded).

John P. Baker

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Friday, September 18, 2009 11:58 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Hey Zeke Boyes, who is Bill Schuh? I don't even know of a relative by that
name :-)

Working as Documented is another version of WAD. My stance is that if the
system dies because of a design feature, then perhaps that feature ought
to be reconsidered. Certainly, there is no way to anticipate all possible
feature failures, but when one comes up that is preventable, then the design
ought to be tweaked. All of the discussion about whether it is or is not a
DOS is totally irrelevant, especially to those who have been victimized.   

(I thought that Lyn Hadley eliminated WAD and BAD from the IBM vernacular
years ago.)

Regards, 
Richard Schuh


Re: VM lockup due to storage typo

2009-09-18 Thread Bob Levad
I think the real problem here is that when CP is thrashing about for
whatever reason, it can be very hard to get control of a VM prompt to
manually fix things.  Perhaps if CP could determine that some resource is
being sorely abused, it could degrade the offending machine at least to the
point that a favored user can do a bit of problem determination and possibly
force the offender(s).

Our operator (PROPST) machine has option quickdsp and share rel 1.  I
hope it never goes astray, but I also have a bit of hope that I will be able
to re-connect to it if some other virtual machine buggers the system so I
can straighten things out.

Bob.


-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Friday, September 18, 2009 12:11 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

While you are at it, make it self-healing, including the updating of the
source code. Or at least include a Medical Tricorder with each system.:-)

 We recognize that CP must be more forgiving and we are working to that 
 end, examining a variety of solutions that include inertial dampening, 
 tritanium plating, Kevlar(R), stacks of phone books, as well as taking 
 the gun away from you and beating you over the head with it (aka the 
 retaliatory baseball bat subroutine).
 
You may need dedicated DUMP packs in order to be able to do this. CP may
have outgrown the size of the dump space and cannot allocate a larger space
as a result of the problem. 

 The bottom line is that none of us want the system to go out to lunch.
 That doesn't serve anyone's purposes.  If it happens, get a restart 
 dump and let us know.  Sometimes it's *not* your fault.  Really!  :-)
 
 Alan Altmark
 z/VM Development
 IBM Endicott


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 11:58 AM, Schuh, Richard rsc...@visa.com wrote:

 Hey Zeke Boyes, who is Bill Schuh? I don't even know of a relative by that
 name :-)

It's your lawful good alter ego, arch nemesis of Chuckie. The Saturday
morning cartoon starring the Billster debuts next TV season, along with
Danger at Rockland Island: Endicott in Peril and The Poughkeepsie Seven,
a drama about seven virtualization protestors illegally imprisoned and
tortured in building 705 for resisting the One True OS for System z. 8-)


Re: VM lockup due to storage typo

2009-09-18 Thread Brian Nielsen
On Fri, 18 Sep 2009 13:49:27 -0400, David Boyes dbo...@sinenomine.net wrote:

On 9/18/09 11:38 AM, Bill Holder hold...@us.ibm.com wrote:

 On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes dbo...@sinenomine.net wrote:
 I think we're all in violent agreement on that point. Now, the question is
 what is the best way to put a safety on that gun?
 Is this a procedural or technical implementation question (or both)?
 For the former, I'd say a requirement is appropriate.

OK, got that covered and done.

 For the latter,  
 let's have at it.  :)

As I suggested in the requirement:

Possible solution would be to provide a SYSTEM CONFIG option
(Check_Resource_Alloc_Sanity for discussion purposes) and associated SET
command to check LOGIN, DEF STOR, and IPL events to determine whether the
requested resources (default virtual storage size for LOGIN, new value for
virtual storage for DEF STOR, and current virtual storage size at time of
issue for IPL) are greater than the current physical storage and defined
paging space. If check is true, then issue a warning message and cancel the
action. 

Option defaults to ON, can be turned off by class A user SET command.

Not perfect, but would catch most of the scenarios that have been discussed
so far. 

A scenario that hasn't been mentioned deals with draining a PAGE volume.

The calculation of defined paging space might be considered fuzzy if a
PAGE volume is being DRAINed.  Of course, you could be strict and consider
such a volume as undefined, but there will be cases where storage
requirements for a guest are less than the available page space but put
the total demand above defined paging space.

Brian
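The check described in the requirement might look roughly like the following. This is a minimal sketch under stated assumptions, not CP code: `PageVolume`, `available_backing_gib`, and `check_resource_alloc_sanity` are hypothetical names, sizes are in GiB, expanded storage is ignored, and PAGE volumes being DRAINed are counted as unavailable (the strict reading of the DRAIN scenario above).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageVolume:
    """Hypothetical model of one CP-owned PAGE volume."""
    name: str
    size_gib: int
    draining: bool = False  # True once a DRAIN has been started


def available_backing_gib(real_gib: int, volumes: List[PageVolume]) -> int:
    # Capacity = real storage + paging space; volumes being drained are
    # treated as unavailable from the moment the DRAIN starts.
    return real_gib + sum(v.size_gib for v in volumes if not v.draining)


def check_resource_alloc_sanity(
    requested_gib: int,
    real_gib: int,
    volumes: List[PageVolume],
    enabled: bool = True,
) -> Tuple[bool, Optional[str]]:
    """Decide whether a LOGIN, DEF STOR, or IPL storage request may proceed.

    Returns (allowed, warning). With the check switched off (the assumed
    class A SET override), every request proceeds.
    """
    if not enabled:
        return True, None
    capacity = available_backing_gib(real_gib, volumes)
    if requested_gib > capacity:
        return False, (
            f"Requested {requested_gib} GiB exceeds {capacity} GiB of real "
            f"storage plus paging space; action cancelled"
        )
    return True, None


if __name__ == "__main__":
    vols = [
        PageVolume("PAGE01", 100),
        PageVolume("PAGE02", 100),
        PageVolume("PAGE03", 100, draining=True),
    ]
    # A 9.7 TB typo (about 9933 GiB) against 64 GiB real + 200 GiB usable paging.
    print(check_resource_alloc_sanity(9933, 64, vols))
    print(check_resource_alloc_sanity(4, 64, vols))
```

As written, each request is compared only against total capacity, not against what other logged-on guests already hold.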


Re: VM lockup due to storage typo

2009-09-18 Thread Tom Duerbusch
The problem I would have is that my MAINT user is defined with 1 GB, so that I 
can process large reader files.
The vast majority of the time, I'm only using a few MB.

Would your fix prevent MAINT from logging on when we are at, or near, the 
discussed problem?
Operations also has some userids of a similar nature.

Tom Duerbusch
THD Consulting

 David Boyes dbo...@sinenomine.net 9/18/2009 12:49 PM 
On 9/18/09 11:38 AM, Bill Holder hold...@us.ibm.com wrote:

 On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes dbo...@sinenomine.net wrote:
 I think we're all in violent agreement on that point. Now, the question is
 what is the best way to put a safety on that gun?
 Is this a procedural or technical implementation question (or both)?
 For the former, I'd say a requirement is appropriate.

OK, got that covered and done.

 For the latter,  
 let's have at it.  :)

As I suggested in the requirement:

Possible solution would be to provide a SYSTEM CONFIG option
(Check_Resource_Alloc_Sanity for discussion purposes) and associated SET
command to check LOGIN, DEF STOR, and IPL events to determine whether the
requested resources (default virtual storage size for LOGIN, new value for
virtual storage for DEF STOR, and current virtual storage size at time of
issue for IPL) are greater than the current physical storage and defined
paging space. If check is true, then issue a warning message and cancel the
action. 

Option defaults to ON, can be turned off by class A user SET command.

Not perfect, but would catch most of the scenarios that have been discussed
so far. 


Re: VM lockup due to storage typo

2009-09-18 Thread Schuh, Richard
Does the current physical storage refer to main or main + xstore? Also, is 
there any consideration of the total virtual storage or working sets of the 
in-Queue, in-memory, or logged-on users in the calculation? I wouldn't want a 
dozen users of 991G each logging on to my system that has only 1.02TB total 
page+physical memory.

It might be better to have a config file maximum and simply measure VM size 
against it - a MAXSTORE directory option that has been generalized, so to 
speak. Of course, any MAXSTORE directory entry that is lower would be 
respected. SET commands could temporarily lift or lower the limit for the 
system or for specific users. 

Regards, 
Richard Schuh 
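Both points above can be illustrated with a small sketch under assumed numbers. Here 1.02 TB is taken as roughly 1044 GiB, and `effective_max_gib` is a hypothetical stand-in for a generalized MAXSTORE, not an existing CP setting. The sketch shows that each 991G logon passes a per-request capacity comparison while the dozen together ask for more than eleven times the backing storage, and how a system-wide maximum could combine with a lower directory MAXSTORE.

```python
from typing import List, Optional


def overcommit_ratio(guest_sizes_gib: List[int], capacity_gib: int) -> float:
    # Total defined virtual storage divided by real storage + paging space.
    return sum(guest_sizes_gib) / capacity_gib


def effective_max_gib(
    system_max_gib: int,
    directory_maxstore_gib: Optional[int] = None,
    set_override_gib: Optional[int] = None,
) -> int:
    """Hypothetical generalized MAXSTORE.

    A SET override, if present, replaces the SYSTEM CONFIG maximum; a lower
    directory MAXSTORE for the user is always respected.
    """
    limit = set_override_gib if set_override_gib is not None else system_max_gib
    if directory_maxstore_gib is not None:
        limit = min(limit, directory_maxstore_gib)
    return limit


if __name__ == "__main__":
    capacity = 1044            # ~1.02 TB of page + physical memory (assumed)
    guests = [991] * 12        # a dozen 991G logons
    print(all(g <= capacity for g in guests))            # each passes alone
    print(round(overcommit_ratio(guests, capacity), 2))  # together: 11.39
    print(effective_max_gib(64, directory_maxstore_gib=2))
```

How a per-user SET should interact with a lower directory MAXSTORE is left open in the thread; the sketch simply lets the lower directory value win.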

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of David Boyes
 Sent: Friday, September 18, 2009 10:49 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 On 9/18/09 11:38 AM, Bill Holder hold...@us.ibm.com wrote:
 
  On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes 
  dbo...@sinenomine.net wrote:
  I think we're all in violent agreement on that point. Now, the 
  question is what is the best way to put a safety on that gun?
  Is this a procedural or technical implementation question (or both)?
  For the former, I'd say a requirement is appropriate.
 
 OK, got that covered and done.
 
  For the latter,
  let's have at it.  :)
 
 As I suggested in the requirement:
 
 Possible solution would be to provide a SYSTEM CONFIG option 
 (Check_Resource_Alloc_Sanity for discussion purposes) and 
 associated SET command to check LOGIN, DEF STOR, and IPL 
 events to determine whether the requested resources (default 
 virtual storage size for LOGIN, new value for virtual storage 
 for DEF STOR, and current virtual storage size at time of 
 issue for IPL) are greater than the current physical storage 
 and defined paging space. If check is true, then issue a 
 warning message and cancel the action. 
 
 Option defaults to ON, can be turned off by class A user SET command.
 
 Not perfect, but would catch most of the scenarios that have 
 been discussed so far. 
 

Re: VM lockup due to storage typo

2009-09-18 Thread Marcy Cortes
VM64461 puts the brakes on console spooling by detecting that something crazy 
is going on and may exhaust all of VM's memory and pauses the virtual machine 
to allow the writes to disk to take place and the memory to get back under 
control. 
I believe messages are put out.   My understanding of that may be off a little, 
but that's the gist of it.

I'd like to see something like that.  If a virtual machine is up and running 
and CP sees that it is grabbing all of the page space at an excessive rate or 
if it is in danger of not getting its page management blocks into memory, then stun 
it (or maybe even a parm that says no one user can use more than x% of page).   
Put out a message to the operator such as "Userid BIGBAD has been halted due to 
excessive memory consumption" or something like that.


Marcy 
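The per-user cap suggested above might be sketched as follows. This is hypothetical throughout: `find_page_hogs` and its inputs are not CP interfaces, the percentage threshold stands in for the suggested parm, and only each user's share of page space is modeled, not the allocation rate or page management block residency also mentioned. The function only produces the operator messages; it does not stop anything.

```python
from typing import Dict, List


def find_page_hogs(
    page_slots_by_user: Dict[str, int],
    total_page_slots: int,
    max_share_pct: float,
) -> List[str]:
    """Return operator messages for users holding more than max_share_pct
    of all paging slots; a real implementation would also stop them."""
    messages = []
    for userid, slots in sorted(page_slots_by_user.items()):
        share = 100.0 * slots / total_page_slots
        if share > max_share_pct:
            messages.append(
                f"Userid {userid} has been halted due to excessive memory "
                f"consumption ({share:.0f}% of page space)"
            )
    return messages


if __name__ == "__main__":
    usage = {"BIGBAD": 900_000, "LINUX01": 40_000, "MAINT": 1_000}
    for msg in find_page_hogs(usage, total_page_slots=1_000_000, max_share_pct=25):
        print(msg)
```

Stopping the guest only halts further growth; the slots it already holds stay allocated until it logs off or is reset.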



-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf 
Of Schuh, Richard
Sent: Friday, September 18, 2009 1:28 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: [IBMVM] VM lockup due to storage typo

Does the current physical storage refer to main or main + xstore? Also, is 
there any consideration of the total virtual storage or working sets of the 
in-Queue, in-memory, or logged-on users in the calculation? I wouldn't want a 
dozen users of 991G each logging on to my system that has only 1.02TB total 
page+physical memory.

It might be better to have a config file maximum and simply measure VM size 
against it - a MAXSTORE directory option that has been generalized, so to 
speak. Of course, any MAXSTORE directory entry that is lower would be 
respected. SET commands could temporarily lift or lower the limit for the 
system or for specific users. 

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of David Boyes
 Sent: Friday, September 18, 2009 10:49 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 On 9/18/09 11:38 AM, Bill Holder hold...@us.ibm.com wrote:
 
  On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes 
  dbo...@sinenomine.net wrote:
  I think we're all in violent agreement on that point. Now, the 
  question is what is the best way to put a safety on that gun?
  Is this a procedural or technical implementation question (or both)?
  For the former, I'd say a requirement is appropriate.
 
 OK, got that covered and done.
 
  For the latter,
  let's have at it.  :)
 
 As I suggested in the requirement:
 
 Possible solution would be to provide a SYSTEM CONFIG option 
 (Check_Resource_Alloc_Sanity for discussion purposes) and 
 associated SET command to check LOGIN, DEF STOR, and IPL 
 events to determine whether the requested resources (default 
 virtual storage size for LOGIN, new value for virtual storage 
 for DEF STOR, and current virtual storage size at time of 
 issue for IPL) are greater than the current physical storage 
 and defined paging space. If check is true, then issue a 
 warning message and cancel the action. 
 
 Option defaults to ON, can be turned off by class A user SET command.
 
 Not perfect, but would catch most of the scenarios that have 
 been discussed so far. 
 

Re: VM lockup due to storage typo

2009-09-18 Thread Schuh, Richard
The action when spool fills has been to make the virtual printers and punches 
not ready for any user attempting to write. That does keep the system from 
crashing, but most systems running in the various VMs do not know how to handle 
it. Recovery can be a problem. It is almost as bad as recovering from a crashed 
SFS server. Pausing the spool hog(s) is a good idea, especially if it can be 
done early enough to prevent devices from being made not ready. 

Pausing page space hogs may be tougher to do. I can IPL a TPF system that is 
streaming dumps and not do whatever caused it to dump. I can also purge the 
individual dump files. I have no such action that I can take for a page space 
hog. In fact, the space it occupies will remain allocated until it either 
logs off or does a system reset. About the only thing I can do is force it. I 
suppose it would be possible to redefine its storage, but that would leave it in a 
virtual system reset state, so I might as well force it. 


Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
 Sent: Friday, September 18, 2009 1:42 PM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 VM64461 puts the brakes on console spooling by detecting that 
 something crazy is going on and may exhaust all of VM's 
 memory and pauses the virtual machine to allow the writes to 
 disk to take place and the memory to get back under control. 
 I believe messages are put out.   My understanding of that 
 may be off a little, but that's the gist of it.
 
 I'd like to see something like that.  If a virtual machine is 
 up and running and CP sees that it is grabbing all of the 
 page space at an excessive rate or if it is in danger of not 
 getting its page management blocks into memory, then stun it 
 (or maybe even a parm that says no one user can use more than 
 x% of page).   Put out a message to the operator such as "Userid 
 BIGBAD has been halted due to excessive memory consumption" 
 or something like that.
 
 
 Marcy 
 
 
 
 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Schuh, Richard
 Sent: Friday, September 18, 2009 1:28 PM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: [IBMVM] VM lockup due to storage typo
 
 Does the current physical storage refer to main or main + 
 xstore? Also, is there any consideration of the total virtual 
 storage or working sets of the in-Queue, in-memory, or 
 logged-on users in the calculation? I wouldn't want a dozen 
 users of 991G each logging on to my system that has only 
 1.02TB total page+physical memory.
 
 It might be better to have a config file maximum and simply 
 measure VM size against it - a MAXSTORE directory option that 
 has been generalized, so to speak. Of course, any MAXSTORE 
 directory entry that is lower would be respected. SET 
 commands could temporarily lift or lower the limit for the 
 system or for specific users. 
 
 Regards,
 Richard Schuh 
 
  
 
  -Original Message-
  From: The IBM z/VM Operating System
  [mailto:ib...@listserv.uark.edu] On Behalf Of David Boyes
  Sent: Friday, September 18, 2009 10:49 AM
  To: IBMVM@LISTSERV.UARK.EDU
  Subject: Re: VM lockup due to storage typo
  
  On 9/18/09 11:38 AM, Bill Holder hold...@us.ibm.com wrote:
  
   On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes 
   dbo...@sinenomine.net wrote:
   I think we're all in violent agreement on that point. Now, the 
   question is what is the best way to put a safety on that gun?
   Is this a procedural or technical implementation question (or both)?
   For the former, I'd say a requirement is appropriate.
  
  OK, got that covered and done.
  
   For the latter,
   let's have at it.  :)
  
  As I suggested in the requirement:
  
  Possible solution would be to provide a SYSTEM CONFIG option 
  (Check_Resource_Alloc_Sanity for discussion purposes) and associated 
  SET command to check LOGIN, DEF STOR, and IPL events to determine 
  whether the requested resources (default virtual storage size for 
  LOGIN, new value for virtual storage for DEF STOR, and current virtual 
  storage size at time of issue for IPL) are greater than the current 
  physical storage and defined paging space. If check is true, then 
  issue a warning message and cancel the action.
  
  Option defaults to ON, can be turned off by class A user SET command.
  
  Not perfect, but would catch most of the scenarios that have been 
  discussed so far.
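As a rough illustration only, the proposed check might look like the sketch below. It is Python, not CP code; sizes are in megabytes, `check_resource_alloc_sanity` is just a name borrowed from the discussion placeholder Check_Resource_Alloc_Sanity, and "greater than physical storage and paging space" is assumed to mean greater than their sum.

```python
from enum import Enum


class Event(Enum):
    """The three events the proposed requirement would screen."""
    LOGIN = "LOGIN"
    DEF_STOR = "DEF STOR"
    IPL = "IPL"


def check_resource_alloc_sanity(event, requested_mb, real_mb, paging_mb,
                                enabled=True):
    """Return (allowed, message) for a virtual storage request.

    requested_mb is the directory default size for LOGIN, the new size for
    DEF STOR, or the current virtual storage size for IPL.
    """
    if not enabled:
        # A (hypothetical) class A SET command has turned the check off.
        return True, "check disabled"
    capacity_mb = real_mb + paging_mb
    if requested_mb > capacity_mb:
        # Warn and cancel the action, as the requirement suggests.
        return False, (f"{event.value} cancelled: {requested_mb}M requested, "
                       f"real storage plus paging space is {capacity_mb}M")
    return True, "ok"


# Lee's LPAR: 175G real and roughly 175G of page space.
print(check_resource_alloc_sanity(Event.IPL, 9728, 175 * 1024, 175 * 1024))
print(check_resource_alloc_sanity(Event.IPL, 8 * 1024 * 1024,
                                  175 * 1024, 175 * 1024))
```

The intended 9728M guest passes; the 8T guest (the clipped 9728G) is refused before CP starts building structures for it.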
  

Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 4:27 PM, Schuh, Richard rsc...@visa.com wrote:

 Does the current physical storage refer to main or main + xstore? Also, is
 there any consideration of the total virtual storage or working sets of the
 in-Queue, in-memory, or logged-on users in the calculation? I wouldn't want a
 dozen users of 991G each logging on to my system that has only 1.02TB total
 page+physical memory.
 
 It might be better to have a config file maximum and simply measure VM size
 against it - a MAXSTORE directory option that has been generalized, so to
 speak. Of course, any MAXSTORE directory entry that is lower would be
 respected. SET commands could temporarily lift or lower the limit for the
 system or for specific users.

AFAICT, most of the Xstore I see out there is configured to be page cache,
so I usually would think of it as configured online paging space.

I posed the problem in the requirement as generally as possible. In most
cases, IBM doesn't like overly specific suggestions in requirements, so I
kept my suggestion pretty general.

If others submit requirements, I suspect it'll be more likely to get their
attention and get a solution created. 
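Richard's alternative (a configurable system ceiling, with any lower MAXSTORE in the directory still honored and SET able to lift or lower the limit) plus his aggregate concern could be sketched roughly as follows. Everything here is hypothetical: the class and function names, the MB units, and the assumption that a per-user SET override replaces the system ceiling but never a lower directory MAXSTORE.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class StoragePolicy:
    """Hypothetical generalized MAXSTORE; all sizes in MB."""
    system_max_mb: int  # ceiling read from a config file
    # Temporary per-user limits, as a SET command might establish them.
    user_overrides_mb: Dict[str, int] = field(default_factory=dict)

    def limit_for(self, userid: str,
                  directory_maxstore_mb: Optional[int]) -> int:
        # A SET override replaces the system ceiling for that user...
        limit = self.user_overrides_mb.get(userid, self.system_max_mb)
        # ...but a lower MAXSTORE in the directory entry is still respected.
        if directory_maxstore_mb is not None:
            limit = min(limit, directory_maxstore_mb)
        return limit


def admit(policy: StoragePolicy, userid: str, requested_mb: int,
          directory_maxstore_mb: Optional[int], logged_on_mb: Iterable[int],
          real_mb: int, paging_mb: int) -> bool:
    if requested_mb > policy.limit_for(userid, directory_maxstore_mb):
        return False
    # Richard's aggregate concern: count guests that are already logged on,
    # not only the one being admitted.
    return sum(logged_on_mb) + requested_mb <= real_mb + paging_mb


G = 1024
policy = StoragePolicy(system_max_mb=1024 * G)
# About 1.02TB of real plus page space; the first 991G user fits,
# the twelfth does not.
print(admit(policy, "LINUX01", 991 * G, None, [], 512 * G, 532 * G))
print(admit(policy, "LINUX12", 991 * G, None, [991 * G] * 11, 512 * G, 532 * G))
```

Summing full virtual sizes is deliberately conservative; a real implementation might weigh working sets instead, which the thread leaves open.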


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 3:41 PM, Brian Nielsen bniel...@sco.idaho.gov wrote:


 A scenario that hasn't been mentioned deals with draining a PAGE volume.
 
 The calculation of defined paging space might be considered fuzzy if a
 PAGE volume is being DRAINed.  Of course, you could be strict and consider
 such a volume as undefined, but there will be cases where storage
 requirements for a guest are less than the available page space but put
 the total demand above defined paging space.

Good point. I think that I would consider a page volume marked as draining
as unavailable space as soon as CP starts the DRAIN operation, but you're
right that the wording I used is ambiguous. I'll change it to read
"available and online paging space."

I'll wait a day or so and see if anyone else has comments and resubmit it.
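A minimal sketch of counting only "available and online" paging space, assuming a volume stops counting as soon as a DRAIN begins. The `PageVolume` record, volsers, and sizes are made up for illustration; they are not CP control blocks or output from any CP command.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageVolume:
    volser: str
    total_mb: int
    used_mb: int
    online: bool = True
    draining: bool = False


def available_paging_mb(volumes: List[PageVolume]) -> int:
    """Free page space on volumes that are online and not being DRAINed.

    A draining volume is treated as unavailable from the moment the DRAIN
    starts, so its free slots are not counted.
    """
    return sum(v.total_mb - v.used_mb
               for v in volumes
               if v.online and not v.draining)


volumes = [
    PageVolume("PAG001", total_mb=7000, used_mb=1000),
    PageVolume("PAG002", total_mb=7000, used_mb=0, draining=True),
    PageVolume("PAG003", total_mb=7000, used_mb=0, online=False),
]
print(available_paging_mb(volumes))  # only PAG001's free space counts
```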


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 3:50 PM, Tom Duerbusch duerbus...@stlouiscity.com wrote:

 The problem I would have is that my MAINT user is defined with 1 GB, so that I
 can process large reader files.
 The vast majority of the time, I'm only using a few MB.
 Would your fix prevent MAINT from logging on when we are at, or near, the
 discussed problem?
 Operations also has some userids of a similar nature.

I don't want to be too prescriptive here -- gotta give Alan something to
chew on -- but I would expect that there would need to be some exemption
mechanism for userids that are known to need extra humungous virtual machine
sizes and are known to be reasonably well behaved.

If IBM shipped an ESM by default (even an awful one), I'd say that should be
done in the ESM, but that's another crusade. 
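The exemption idea, sketched under stated assumptions: a hypothetical hard-coded exemption set (the thread leaves open whether it would live in SYSTEM CONFIG, the directory, or an ESM) and an aggregate check against the virtual storage already defined by logged-on guests. MAINT at 1G comes from Tom's example; LINUX07 is invented.

```python
from typing import FrozenSet

# Hypothetical exemption list for known, well-behaved large userids.
EXEMPT_USERIDS: FrozenSet[str] = frozenset({"MAINT"})


def allow_logon(userid: str, defined_mb: int, committed_mb: int,
                real_mb: int, paging_mb: int,
                exempt: FrozenSet[str] = EXEMPT_USERIDS) -> bool:
    """Aggregate size check that never blocks exempt userids.

    committed_mb is the virtual storage already defined by logged-on guests.
    """
    if userid.upper() in exempt:
        return True
    return committed_mb + defined_mb <= real_mb + paging_mb


G = 1024
near_full = 175 * G + 175 * G - 100  # system just short of its limit
print(allow_logon("MAINT", 1 * G, near_full, 175 * G, 175 * G))    # exempt
print(allow_logon("LINUX07", 1 * G, near_full, 175 * G, 175 * G))  # refused
```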


Re: VM lockup due to storage typo

2009-09-18 Thread Lee Stewart
While I agree it's not a DoS attack exposure, the system issued no 
messages and allowed no input on any console (via tn3270, OSA ICC 
console, HMC 3270 or HMC Operating system messages).  If we had a way to 
enter a command or two (probably an IND first), we could have forced off 
the offender and not hard crashed 30+ other Oracle servers.


As someone suggested, CP was probably busy allocating paging structures 
etc.  But should that be to the exclusion of any console input or 
operator control?   To have an entire LPAR appear hung to all consoles, 
and all Linuxes become non-responsive for 15-20-30 minutes certainly 
seems like a DoS to me...


Lee

Bill Holder wrote:

I see this as three separate questions (with my answers):

Is it a denial of service attack exposure?
- Clearly not.

Is it a defect?
- I don't believe so, for the base issue of whether VM
  should allow a privileged user to do something destructive,
  though there may well be defects or scalability / constraint
  shortcomings exposed by the hang (we'd need to see a dump to
  understand what's really happening).

Is this an area ripe for improvement, could/should VM be
smarter about preventing a privileged user from doing something
dangerous or destructive?
- Sure.  I won't tell you not to open a requirement.  


- Bill Holder, z/VM Development, IBM




--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com


Re: VM lockup due to storage typo

2009-09-18 Thread Rich Smrcina

Adam Thornton wrote:

On Sep 18, 2009, at 9:11 AM, David Boyes wrote:



I think we're all in violent agreement on that point. Now, the
question is what is the best way to put a safety on that gun?


Oooh!  Oooh!  Pick me!  Mandatory User Access Control dialog boxes 
that pop up and make you click OK any time you want to breathe.


Adam


Would those be 3270 flower boxes?

--
Rich Smrcina


Re: VM lockup due to storage typo

2009-09-18 Thread Alan Altmark
On Friday, 09/18/2009 at 10:13 EDT, David Boyes dbo...@sinenomine.net 
wrote:
 On 9/18/09 9:32 AM, Bill Holder hold...@us.ibm.com wrote:
 
  That is indeed one important question, but there was another one, the
  question of whether this was a denial of service attack exposure,
  which it is not.
 
 I think that's a point of view question.

It's all very Humpty Dumpty.  :-)  Integrity has a precise meaning with 
regard to APARs.   The *guest* is not doing anything to annoy CP.  CP is 
actually annoying himself trying to instantiate the guest.  Until control 
is given to the guest, nothing can be attributed to the guest.  The walls 
between guests and between the guest and CP have not been breached.  Ergo, 
no integrity problem.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
I don't entirely agree.  The action of the guest did not cause harm
to CP, it was the action of the operations staff which did.  This
is not a denial of service case that I can see.

Bill Holder
z/VM Development, Memory Management team leader, IBM

On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard rsc...@visa.com wrote:

Maybe CP couldn't know that the guest would do something bad, but it should
know that it has opened itself to the possibility that the guest could, in
normal operation, cause the problem.
One of Alan's first precepts of information security and integrity is that
the guest cannot be allowed to harm the CP. This clearly violates that.

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
 Sent: Tuesday, September 15, 2009 9:19 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 CP wouldn't know at IPL time that the guest would (not could, but
 would) cause such harm.
 
 Just because you say you can use xxx GB doesn't mean you
 would actually use them.
 
 When page fills, it overflows to spool.
 When spool fills, CP abends on the next pageout.
 
 Tom Duerbusch
 THD Consulting
 
  Marcy Cortes marcy.d.cor...@wellsfargo.com 9/15/2009 11:02 AM
 See a thread on this list with subject "Sanity check?" from
 Oct 2007 for what happened when I did the same thing ;)
 
 You probably filled page space.
 
 I still think IBM should refuse to IPL a guest that will 
 cause such harm.
 
 
 Marcy 
 
 This message may contain confidential and/or privileged 
 information. If you are not the addressee or authorized to 
 receive this for the addressee, you must not use, copy, 
 disclose, or take any action based on this message or any 
 information herein. If you have received this message in 
 error, please advise the sender immediately by reply e-mail 
 and delete this message. Thank you for your cooperation.
 
 
 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
 Sent: Tuesday, September 15, 2009 8:39 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: [IBMVM] VM lockup due to storage typo
 
 Does anyone have an idea of how we might have gotten out of 
 this without an IPL?
 
 VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
 Several guests needed more memory added so the directory was 
 updated and one by one the guests shutdown, logged off and 
 back on.  So far, so good.
 
 But... In changing the memory for many guests, and it being 
 late at night after a long day, while meaning to set a 
 guest's memory to 9728M, it got set to 9728G.  When that 
 guest was cycled we see the message on the console that it's 
 memory was limited to 8TB (HCPLGN093E), then the VM system 
 appeared to freeze.
 
 We couldn't get in via TCP/IP, or the HMC Operating System 
 Messages screen, or the HMC Integrated 3270.
 
 Finally had to IPL.   Even that was wierd as I'd have 
 expected the Load 
 Normal to shutdown, it just IPLed.   We did NoAutolog, fixed the typo 

 and all came back up ok...
 
 I suspect CP was scrambling paging everything in the world 
 out as Linux 
 tried to initialize that 8TB of memory...   But I'm surprised 
 I couldn't 
 even get into the HMC consoles (to kill just that one guest 
 as opposed to all of them)..
 
 Any thoughts?
 Lee
 -- 
 
 Lee Stewart, Senior SE
 Sirius Computer Solutions
 Phone: (303) 996-7122
 Email: lee.stew...@siriuscom.com 
 Web:   www.siriuscom.com
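One way a site might catch a 9728M/9728G slip before the guest is cycled is a review step on directory storage changes. The sketch below assumes sizes written as a number plus a K/M/G/T suffix with binary multiples, an arbitrary 4x growth threshold, and a hypothetical previous value of 8G; it is not an IBM tool or a DIRMAINT feature.

```python
import re
from typing import List

# Binary multiples expressed in megabytes (assumption: K/M/G/T suffixes).
_MB_PER_UNIT = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def to_mb(size: str) -> float:
    """Convert a size such as '9728M' or '9728G' to megabytes."""
    match = re.fullmatch(r"(\d+)([KMGT])", size.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized storage size: {size!r}")
    return int(match.group(1)) * _MB_PER_UNIT[match.group(2)]


def review_storage_change(old: str, new: str, real_storage: str,
                          max_growth: float = 4.0) -> List[str]:
    """Return warnings for a storage change; an empty list means none."""
    warnings = []
    old_mb, new_mb = to_mb(old), to_mb(new)
    if new_mb > to_mb(real_storage):
        warnings.append(f"{new} is larger than real storage ({real_storage})")
    if old_mb > 0 and new_mb / old_mb > max_growth:
        warnings.append(f"{old} -> {new} is a {new_mb / old_mb:.0f}x increase")
    return warnings


print(review_storage_change("8G", "9728M", "175G"))  # the intended change
print(review_storage_change("8G", "9728G", "175G"))  # the typo
```

The intended change produces no warnings, while the typo trips both checks (it is 1024 times the intended 9728M).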
 


Re: VM lockup due to storage typo

2009-09-17 Thread P S
On Thu, Sep 17, 2009 at 9:14 AM, Bill Holder hold...@us.ibm.com wrote:
 I don't entirely agree.  The action of the guest did not cause harm
 to CP, it was the action of the operations staff which did.  This
 is not a denial of service case that I can see.

Hm. So by that rationale, we can make STORE H class G, because it
won't be the *guest* harming CP, it will be the end-user who types the
command.


Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
I should point out that this hang is likely being misunderstood here.

While this scenario will indeed drive paging over the edge, that's not
likely what happened.  If paging had been driven to that point, the
system would have quickly taken a PGT004 abend and restarted.  Instead,
I believe what happened is likely a most-difficult-to-solve variant of
something that was mentioned before: that is, difficulty allocating CP
structures required to represent the massive amount of storage.  Page
tables are only part of the problem.  The upper level DAT tables (region
and segment) can be up to 4 frames long, and once storage utilization
becomes heavy enough, it becomes fragmented (PGMBK allocation being
a factor here), making it very difficult for CP to allocate contiguous
sets of 3 or 4 frames.  We spent quite a bit of effort in z/VM 5.3.0
addressing the PGMBK side of this issue, but the harder problem of
the upper level tables remains as a likely constraint point.

Occurrences of this sort of problem are likely to result in temporary
or permanent hangs of both individual users and eventually the entire
system, which supports the theory in this case.  I'd really need to
see a dump of the system in question to confirm this hypothesis,
however.

Bill Holder
z/VM Development, Memory Management team lead, IBM
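A toy model of the fragmentation Bill describes: free 4K frames as a list of booleans and a first-fit search for a run of contiguous free frames. It only illustrates why half of storage can be free while no 3- or 4-frame run exists; it is not how CP actually manages its frame pool.

```python
from typing import List


def find_contiguous_free(free: List[bool], n: int) -> int:
    """Return the index of the first run of n free frames, or -1."""
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0  # an in-use frame breaks the run
    return -1


# 1024 frames, half free, but every other frame is in use.
fragmented = [i % 2 == 1 for i in range(1024)]
# Same amount free, all in one block.
compact = [i >= 512 for i in range(1024)]

print(sum(fragmented), find_contiguous_free(fragmented, 3))
print(sum(compact), find_contiguous_free(compact, 4))
```

Both layouts have 512 free frames; only the compact one can satisfy a multi-frame request such as a 3- or 4-frame region or segment table.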


Re: VM lockup due to storage typo

2009-09-17 Thread Quay, Jonathan (IHG)
It sounds very similar in symptom to my minidisk cache overcommitment
problem that resulted in CP thrashing (and an APAR).

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Bill Holder
Sent: Thursday, September 17, 2009 12:34 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo



Re: VM lockup due to storage typo

2009-09-17 Thread Rob van der Heij
On Thu, Sep 17, 2009 at 6:34 PM, Bill Holder hold...@us.ibm.com wrote:

 Occurrences of this sort of problem are likely to result in temporary
 or permanent hangs of both individual users and eventually the entire
 system, which supports the theory in this case.  I'd really need to
 see a dump of the system in question to confirm this hypothesis,
 however.

And I think Lee has not yet mentioned how much paging space he had
allocated. With a 175G LPAR you would think he has at least 175G worth
of virtual machines, so 350G of paging space... for the moment the
next virtual machine went over the edge. I very much doubt he was that
well prepared. With that amount of space, things might have gotten
slow but there's a fair chance CP would have survived the abuse.

Rob


Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
No, not at all, that's not what I was saying; what you propose would
obviously be an exposure.  A privileged user (operations staff) can issue
that today.  Putting a loaded gun in the hands of a class G user is not at
all the same thing.  Anything a user at a keyboard can do, a guest program
can do, generally, and they all have to be protected.

On Thu, 17 Sep 2009 09:23:11 -0700, P S zosw...@gmail.com wrote:

On Thu, Sep 17, 2009 at 9:14 AM, Bill Holder hold...@us.ibm.com wrote:

 I don't entirely agree.  The action of the guest did not cause harm
 to CP, it was the action of the operations staff which did.  This
 is not a denial of service case that I can see.

Hm. So by that rationale, we can make STORE H class G, because it
won't be the *guest* harming CP, it will be the end-user who types the
command.


Re: VM lockup due to storage typo

2009-09-17 Thread Schuh, Richard
An IPL isn't an action? True, the guest was not aware that it would harm the 
system, but absent that action by the guest, there would not have been a 
problem. The guest was an unwitting agent, a part of a bot net, as it were.

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
 Sent: Thursday, September 17, 2009 9:14 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 I don't entirely agree.  The action of the guest did not 
 cause harm to CP, it was the action of the operations staff 
 which did.  This is not a denial of service case that I can see.
 
 Bill Holder
 z/VM Development, Memory Management team leader, IBM
 

Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
Sure, true enough, but the exposure was not caused by the guest
action.  Yes, it wouldn't have happened had the guest not logged
on and IPLed, but that wasn't the root cause; the typo was.
The action of the class G user didn't cause the problem; therefore
it's not a Denial of Service attack case.  Note that I'm not
saying it's not APARable, however.

Regards,
- Bill Holder

On Thu, 17 Sep 2009 10:21:05 -0700, Schuh, Richard rsc...@visa.com wrote:

An IPL isn't an action? True, the guest was not aware that it would harm
the system, but absent that action by the guest, there would not have been a
problem. The guest was an unwitting agent, a part of a bot net, as it were.

Regards, 
Richard Schuh 

 




Re: VM lockup due to storage typo

2009-09-17 Thread Schuh, Richard
I don't think you can differentiate between the root cause and the immediate 
cause when it comes to security and integrity. You may not necessarily be able 
to detect the root cause, but you must protect the system against the immediate 
cause if at all possible.

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
 Sent: Thursday, September 17, 2009 10:35 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 Sure, true enough, but the exposure was not caused by the 
 guest action.  Yes, it wouldn't have happened had the guest 
 not logged on and IPLed, but that wasn't the root cause; the typo was.
 The action of the class G user didn't cause the problem, 
 therefore it's not a Denial of Service attack case.  Note 
 that I'm not saying it's not APARable, however.
 
 Regards,
 - Bill Holder
 

Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
I'd agree with that point in cases where it's less clear, but in
this case, it's perfectly clear that the user action would have
been harmless if not for the administrator typo.  I don't disagree
that more protection at the user action level would be nice in
this case, but that's really a different discussion than whether this
constitutes a denial of service exposure.

There's a reason trusted users are called that: they have the power
to shoot themselves, and the entire system, in the foot.
We cannot protect against every possible harmful act by trusted
users, whether accidental or malicious.

Regards,
- Bill Holder

On Thu, 17 Sep 2009 10:48:53 -0700, Schuh, Richard rsc...@visa.com wrote:

I don't think you can differentiate between the root cause and the
immediate cause when it comes to security and integrity. You may not
necessarily be able to detect the root cause, but you must protect the
system against the immediate cause if at all possible.

Regards, 
Richard Schuh 




Re: VM lockup due to storage typo

2009-09-17 Thread P S
On Thu, Sep 17, 2009 at 10:58 AM, Bill Holder hold...@us.ibm.com wrote:
 I'd agree with that point in cases where it's less clear, but in
 this case, it's perfectly clear that the user action would have
 been harmless if not for the administrator typo.  I don't disagree
 that more protection at the user action level would be nice in
 this case, that's really a different discussion than whether this
 constitutes a denial of service exposure.

OK, I buy that. If the sysprog does a UCR to make SHUTDOWN class G, it
isn't VM's fault if a user issues SHUTDOWN.


Re: VM lockup due to storage typo

2009-09-17 Thread Lee Stewart
FYI, the system in question had about 175GB of page space - 22 mod 9s. 
Currently the system does NO paging.  All the guests fit within real 
storage.  (Of course there will eventually be more guests on that LPAR, 
so sooner or later we'll start to page.)


Lee

Rob van der Heij wrote:

On Thu, Sep 17, 2009 at 6:34 PM, Bill Holder hold...@us.ibm.com wrote:


Occurrences of this sort of problem are likely to result in temporary
or permanent hangs of both individual users and eventually the entire
system, which supports the theory in this case.  I'd really need to
see a dump of the system in question to confirm this hypothesis,
however.


And I think Lee has not yet mentioned how much paging space he had
allocated. With a 175G LPAR you would think he has at least 175G worth
of virtual machines, so 350G of paging space... for the moment the
next virtual machine went over the edge. I very much doubt he was that
well prepared. With that amount of space, things might have gotten
slow but there's a fair chance CP would have survived the abuse.

Rob




--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com
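The scale of the typo, as rough arithmetic. This treats G and T as binary units and counts only Lee's stated real storage and page space, plus Rob's 2x page-space estimate for comparison. It is an upper-bound view that only matters if Linux actually touched every page; Bill's reading above is that the hang more likely came from DAT table allocation well before page space ran out.

```python
def backing_ratio(guest_g: float, real_g: float, paging_g: float) -> float:
    """How many times larger the guest is than real storage plus page space."""
    return guest_g / (real_g + paging_g)


requested_g = 9728                       # the typo: 9728G instead of 9728M
clipped_g = min(requested_g, 8 * 1024)   # HCPLGN093E: limited to 8TB (8192G)

print(round(backing_ratio(clipped_g, 175, 175), 1))  # Lee's actual: 23.4
print(round(backing_ratio(clipped_g, 175, 350), 1))  # Rob's 2x estimate: 15.6
```

Even with Rob's more generous page space, the clipped guest is more than fifteen times larger than everything available to back it.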


Re: VM lockup due to storage typo

2009-09-17 Thread Adam Thornton

On Sep 17, 2009, at 1:58 PM, Bill Holder wrote:


I'd agree with that point in cases where it's less clear, but in
this case, it's perfectly clear that the user action would have
been harmless if not for the administrator typo


Yabbut

Administrator typo is not a failure mode the operating system is  
designed to protect you from.  If you have authority to edit the user  
directory, then, well, your gun, your foot.


Adam


Re: VM lockup due to storage typo

2009-09-17 Thread David Boyes
On 9/17/09 2:16 PM, Adam Thornton athorn...@sinenomine.net wrote:

 
 Administrator typo is not a failure mode the operating system is
 designed to protect you from.

That may be true now, but I think the point of the argument is that it
should not be. 

On VMS, if you have a SYSTEM priv bit set, the system will still warn you if
you're about to do something that seems stupid. If there is an architected
limit (note that the 9.7TB got clipped to 8TB, so SOMETHING noticed a
problem), then it's not too unreasonable for the system to take defensive
measures and issue a warning that all is not right in the kingdom of
Denmark, cream or no cream dresses.

It seems like a basic defense that if CP notices you starting something that
it KNOWS it may not have resources to complete, requiring confirmation that
you know what you're doing (or about to do) is a good defensive measure.

Did the system do what you told it to do when you told it to do it? Yes.
Whether it should march off a cliff without at least questioning the order
is the question at hand.

-- db


Re: VM lockup due to storage typo

2009-09-17 Thread Adam Thornton

On Sep 17, 2009, at 5:36 PM, David Boyes wrote:

Whether it should march off a cliff without at least questioning the
order is the question at hand.


Of course it should.

Yes, my Unix is showing.

Adam


Re: VM lockup due to storage typo

2009-09-17 Thread Marcy Cortes
Well, there is precedent here of VM dev fixing things that are too large/too 
much and take down VM.
See VM64461 and VM6
 
I'll probably look into the possibility of a vmsecure exit to add a safety to 
my gun for now.

Marcy 

This message may contain confidential and/or privileged information. If you 
are not the addressee or authorized to receive this for the addressee, you must 
not use, copy, disclose, or take any action based on this message or any 
information herein. If you have received this message in error, please advise 
the sender immediately by reply e-mail and delete this message. Thank you for 
your cooperation.

Re: VM lockup due to storage typo

2009-09-16 Thread Alan Altmark
On Tuesday, 09/15/2009 at 04:50 EDT, Marcy Cortes 
marcy.d.cor...@wellsfargo.com wrote:
 So are you saying that what Lee and I both did to shoot our systems
 should be APAR'able?  Or should it be a requirement?  Or is it going to be
 a "your gun, your foot" answer?

I was just answering the "Is it an integrity problem?" question:  No, it 
isn't an integrity problem.  The sysadmin did something that ultimately 
caused the system to lock up.  (That doesn't mean it was the sysadmin's 
fault, however.)

If you feel you have found a defect, open a PMR.  That's how you find out 
if something is really APARable.  :-)

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-16 Thread Kris Buelens
2009/9/15 Schuh, Richard rsc...@visa.com

 The same might be said for page space. Someone could access a dataspace
 enabled directory and take up page space. We could easily take up 48G of
 page space here by starting 24 machines that each access different d/s
 directories at 2G each.


Dataspace-enabled directories are not paged out to paging space; the CP
paging operations for them are issued against the minidisks of the SFS
servers.  Nor are all dataspace pages brought into storage at the moment of
ACCESS.  The SFS dataspaces are called mapped dataspaces.  A small
exception: the structures holding the FST blocks are not mapped to the
SFS server minidisks, so they can be paged out to CP paging space (and
obviously CP's page management blocks occupy some storage too).
DB2/VM, on the other hand, can also use non-mapped dataspaces.

-- 
Kris Buelens,
IBM Belgium, VM customer support


Re: VM lockup due to storage typo

2009-09-16 Thread RPN01
I don't think, in this case, it is the user causing the problem at all. The
user didn't define their storage allocation, and in practice can't do that
at all. So the user didn't set up the situation which caused the integrity
issue, the system administrator did.

The system administrator is in control of the CP Directory, and as such,
decisions are left to him. The system doesn't question what he does, within
the definition of the syntax, semantics and limitations of the directory
entries and commands. If you want to define a large virtual machine, should
the system question your authority?

The system could check the memory and page space against each directory
entry as the binary directory is built, but this would add time to the
directory build, and does not account for the situation of planning to add
more page space before logging in the new directory entry. Maybe a warning
of User  exceeds paging space could have averted this situation, but
again, each user would have to be checked against the running system. It
shouldn't keep you from creating the entry, just let you know that there
might be an issue if you actually use it.

To my mind, if this requires addressing, it should be in the DIRECTXA
command, so as to help the system administrator in avoiding aiming the gun
at his toes.
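To make the directory-build warning concrete, here is a minimal Python sketch of the kind of check described above. It is not how DIRECTXA works: `check_directory` and `parse_size` are hypothetical helpers, the directory source is assumed to contain only simple one-line `USER userid password storage maxstorage ...` statements (no continuation, PROFILE or INCLUDE handling), and the page-space figure is passed in rather than queried from a running system. In keeping with the suggestion, it only warns and never rejects an entry.

```python
import re

# Byte multipliers for CP-style size suffixes (assumption: binary units).
_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert a size such as '512M' or '9728G' to bytes; return None if it is not a size."""
    match = re.fullmatch(r"(\d+)([KMGT])", text.upper())
    if match is None:
        return None
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_directory(source_lines, page_space_bytes):
    """Return warnings for USER entries whose storage or max storage exceeds page space.

    Entries are flagged, never rejected: page space may be added before the guest logs on.
    """
    warnings = []
    for line in source_lines:
        tokens = line.split()
        if len(tokens) < 4 or tokens[0].upper() != "USER":
            continue  # comments, blank lines and non-USER statements are ignored
        userid = tokens[1]
        # tokens[3] and tokens[4] are assumed to hold the initial and maximum storage.
        for label, size_text in zip(("storage", "max storage"), tokens[3:5]):
            size = parse_size(size_text)
            if size is not None and size > page_space_bytes:
                warnings.append(f"User {userid} {label} {size_text} exceeds paging space")
    return warnings
```

With 175G of page space, an entry such as `USER BIGPIG XXXXXXXX 2G 9728G G` would produce one warning for its maximum storage, while a `512M 2G` guest would pass silently.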

-- 
Robert P. Nix  Mayo Foundation.~.
RO-OE-5-55 200 First Street SW/V\
507-284-0844   Rochester, MN 55905   /( )\
-^^-^^
In theory, theory and practice are the same, but
 in practice, theory and practice are different.




On 9/15/09 3:44 PM, Alan Altmark alan_altm...@us.ibm.com wrote:

 On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak sama...@gizmoworks.com
 wrote:
 I agree with that (the guest cannot be allowed to harm CP) but has that
 actually been formally - or even informally - accepted by the Powers That
 Be?
 
 Yes, it is in the Statement of System Integrity in the General Information
 Manual.
 
 I ask because I still remember, as though it were yesterday, opening a
 security/integrity APAR against VM back in the mid-1980's because any
 class G user could knock CP down by defining a shared and a nonshared
 device on the same virtual control unit, and being told that that was NOT
 a security or integrity issue, and that no fix would be forthcoming.
 
 Under today's rules, that would be an Integrity problem.
 
 o If a class G (only) user can repeatedly or with malice of forethought
 hang or abend CP, it WILL be classified as an integrity problem (denial of
 service).
 
 o If a class G user happens to do something that triggers an abend or hang
 due to a system malfunction, it will NOT be classified as an integrity
 problem.
 
 o If the system abends or hangs because it is overloaded (memory, CPU), it
 will NOT be classified as an integrity problem.
 
 o Just because it isn't an integrity problem doesn't mean it isn't a
 defect.
 
 Alan Altmark
 z/VM Development
 IBM Endicott


Re: VM lockup due to storage typo

2009-09-16 Thread Mike Walter
I can't support DIRECTXA as the sole examination.  Paging volumes can be 
added at any time.  DIRECTXA only gets a chance to look when it is run. 

If this even needs to be addressed (hence, this thoughtful thread), IMHO 
comparing the min and max virtual machine memory specification would be 
better done when the virtual machine is being built during 
logon/autolog/xautolog. 

OTOH, it would not hurt to have DIRECTXA provide that early warning so 
that when one finally does attempt to create the virtual machine, any 
typos might already have been displayed and corrected when DIRECTXA 
provided an early warning.  It's just plain embarrassing for an existing 
virtual machine to cause a problem because the sysprog made a wild (or 
uninformed) keystroke while editing the directory source ... another 
source of sysprog collateral damage.

Mike Walter
Hewitt Associates
The opinions expressed herein are mine alone, not my employer's.









The information contained in this e-mail and any accompanying documents may 
contain information that is confidential or otherwise protected from 
disclosure. If you are not the intended recipient of this message, or if this 
message has been addressed to you in error, please immediately alert the sender 
by reply e-mail and then delete this message, including any attachments. Any 
dissemination, distribution or other use of the contents of this message by 
anyone other than the intended recipient is strictly prohibited. All messages 
sent to and from this e-mail address may be monitored as permitted by 
applicable law and regulations to ensure compliance with our internal policies 
and to protect our business. E-mails are not secure and cannot

Re: VM lockup due to storage typo

2009-09-16 Thread Brian Nielsen
And you also have to check during DEFINE STORAGE, DEFINE FB-512, and any 
other command or function that creates a pageable CP structure.

Brian Nielsen

On Wed, 16 Sep 2009 09:03:43 -0500, Mike Walter mike.wal...@hewitt.com wrote:

I can't support DIRECTXA as the sole examination.  Paging volumes can be
added at any time.  DIRECTXA only gets a chance to look when it is run.

If this even needs to be addressed (hence, this thoughtful thread), IMHO
comparing the min and max virtual machine memory specification would be
better done when the virtual machine is being built during
logon/autolog/xautolog.

OTOH, it would not hurt to have DIRECTXA provide that early warning so
that when one finally does attempt to create the virtual machine, any
typos might already have been displayed and corrected when DIRECTXA
provided an early warning.  It's just plain embarrassing for an existing
virtual machine to cause a problem because the sysprog made a wild (or
uninformed) keystroke while editing the directory source ... another
source of sysprog collateral damage.


Re: VM lockup due to storage typo

2009-09-16 Thread Rob van der Heij
This gun has been pointing in the same direction forever, but it *is*
a fact that with 64-bit CP the bullets are a lot bigger.

I am sure folks in Endicott are as creative as most of us (or worse,
take a look at ... ;-)  but we know that any safety that CP adds will
annoy people because they forgot to disable it when they still had the
option to do so, or because they drive with the safety off all day
anyway (how many are not using a highly privileged CP userid for things
that don't need it - and really, it *is* dangerous).

The problem with the suggested check is that it is stronger than what
most people need. Also, the check is likely to be unfair (aiming at
the wrong victim) and potentially cause a Denial of Service. Would you
want MAINT unable to logon because that 5th Linux guest now logged on
(and you could only add the page pack if you could logon...)  So we
need an option for some users to override it, or an option to enforce
the check only for some users. One means that you may forget the
option, and the other means that within weeks people will ask "why
can't I log on my Linux guest?" and the word will spread that you need
to issue a SET SRM OVERCOMM .

Linux has a similar check in that a process can't allocate more
virtual memory than you have available (in main and on swap, or you
get out-of-memory). This ensures that this process could eventually
get all it asks for. But when it does not immediately reference that
memory, it appears to be still available when the next process
allocates memory. So the check is pretty useless and does not protect
you at all.
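The weakness of an allocation-time check can be illustrated with a toy model. This is a deliberately simplified sketch, not the Linux kernel's actual overcommit heuristic: `ToyMemory` is a hypothetical class whose `allocate` check compares a request only against pages already touched, so two processes that each reserve 70 of 100 pages both pass, and the shortfall only shows up once they start referencing that memory.

```python
class ToyMemory:
    """Toy model of a check that only counts memory that has already been touched."""

    def __init__(self, total_pages):
        self.total = total_pages
        self.touched = 0

    def allocate(self, pages):
        # The check compares the request with memory that looks free right now;
        # earlier allocations that have not been referenced yet are not counted.
        return pages <= self.total - self.touched

    def touch(self, pages):
        # Referencing pages is what really consumes memory.
        self.touched += pages
        return self.touched <= self.total  # False once more pages are touched than exist


mem = ToyMemory(total_pages=100)
first_ok = mem.allocate(70)   # process 1 reserves 70 pages but touches none yet
second_ok = mem.allocate(70)  # process 2 also passes: all 100 pages still look free
mem.touch(70)                 # process 1 starts using its memory
fits = mem.touch(70)          # process 2 does too: 140 pages touched, only 100 exist
```

Both allocations succeed and the failure appears later, at reference time, which is the point being made about the check offering little real protection.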

I don't do operational work these days, so consider this a view from the peanut gallery.
Maybe I grew up in a rather unique shop (or maybe staff reductions
have gotten rid of that luxury there too) but we had pretty strict
rules to minimize mistakes. Most configuration changes would be
checked by another pair of eyes or some code. Configuration files to
be replaced ran through XDIFF to inspect the changes. The nucleus map
was scanned for text decks picked up from the A-disk, etc. Various
health checks ran to compare RACF and the directory, check for certain
disks filling up, and many more. With CMS Pipelines it is often easy
to get an extra pair of eyes oversee your actions.

Rob


Re: VM lockup due to storage typo

2009-09-16 Thread Tom Duerbusch
If you bought the Dirmaint product or a similar product from another vendor, 
couldn't a rule be set up to prevent this?

Anyway, there is not gonna be a way of preventing a systems programmer from 
doing anything we do.  We are supposed to be thinking.

For example, when I initialize, format or copy to a pack, I go thru at least 3 
checks to make sure I have not transposed the CUA.  It has saved me a lot of times.

A system programmer IS dangerous.  We can shutdown the system.  We can destroy 
the system (and then go peacefully in retirement).

You can't fix stupid and we are all, occasionally, stupid.

Now that you had this kind of problem, we all should learn from it.  After defining 
a new guest, log on to that guest and do a Q V ALL and see if it is right.

Been there, done that.

Tom Duerbusch
THD Consulting

Sent via BlackBerry by AT&T



Re: VM lockup due to storage typo

2009-09-16 Thread Alan Altmark
On Wednesday, 09/16/2009 at 09:14 EDT, RPN01 nix.rob...@mayo.edu wrote:
 I don't think, in this case, it is the user causing the problem at all. The
 user didn't define their storage allocation, and in practice can't do that
 at all. So the user didn't set up the situation which caused the integrity
 issue, the system administrator did.

That was my point to Marcy: Not an integrity problem.  The system is 
obeying the sysadmin's instructions.

 To my mind, if this requires addressing, it should be in the DIRECTXA
 command, so as to help the system administrator in avoiding aiming the gun
 at his toes.

DIRECTXA has no context in which to make such warnings.  Placing limits at 
LOGON would only apply to resource availability to hold the needed control 
structures.  When the guest begins to run and actually use all that 
memory, then another line of defense is needed.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-16 Thread Schuh, Richard
Logon would not be the right or only place to put it. DEF STOR is another 
possible place to err if the maximum storage was too high. Perhaps a check of 
virtual storage at IPL time. That is a common point that must be traversed no 
matter where the error occurred. 

Regards, 
Richard Schuh 

 


Re: VM lockup due to storage typo

2009-09-16 Thread P S
On Wed, Sep 16, 2009 at 10:42 AM, Schuh, Richard rsc...@visa.com wrote:
 Logon would not be the right or only place to put it. DEF STOR is another 
 possible place to err if the maximum storage was too high. Perhaps a check of 
 virtual storage at IPL time. That is a common point that must be traversed no 
 matter where the error occurred.

Suggest this not get hung up on "But it won't be perfect" ideas. For
DIRMAINT, perhaps a site configuration option could say "Warn me if a
userid is defined with either storage limit above x." Similarly, at
LOGON or DEFINE STORAGE, if the VMsize is greater than the total page space
defined, a warning would be useful.

This doesn't help for aggregate overload (20x1GB with 4GB of page
space), doesn't guarantee that XAUTOLOG BIGPIG won't spiral the system
into the ground before the operator (what operator?) can react, etc.,
but it would at least give some more informed consent.

In this era of Big Numbers and big Linux guests, this is probably more
important than it used to be -- in days of yore, if you accidentally
defined a 32MB guest on an 8MB system, (a) there probably WAS enough
page space, and (b) the user was probably CMS and wouldn't touch the
pages that fast anyway.
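The gap between a per-guest warning and aggregate overload is easy to show with the numbers from the message. In this minimal Python sketch, `per_guest_warning` and `aggregate_overload` are hypothetical names, and guest sizes are compared only with page space (real storage is ignored for simplicity).

```python
GIB = 2**30


def per_guest_warning(vm_size, page_space):
    """Proposed warning: a single guest's storage is larger than total page space."""
    return vm_size > page_space


def aggregate_overload(vm_sizes, page_space):
    """The case a per-guest check misses: guests that fit one at a time but not together."""
    return sum(vm_sizes) > page_space


# 20 guests of 1 GB each against 4 GB of page space.
guests = [1 * GIB] * 20
page_space = 4 * GIB
any_warning = any(per_guest_warning(size, page_space) for size in guests)
overloaded = aggregate_overload(guests, page_space)

print(any_warning, overloaded)  # False True
```

A typo like 9728G against 175G of page space would trip the per-guest warning by itself; the 20x1GB case would not, which is why the warning is "informed consent" rather than protection.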


Re: VM lockup due to storage typo

2009-09-16 Thread Huegel, Thomas
I don't know that I want CP to do anything different than it does now
EXCEPT I want z/VM to a) keep running and b) have some facility that I
can use to be able to examine the system to find/fix the problem... I
don't know/care how that gets done, maybe reserving some page space for
CP and/or a special 'hook' into the HMC.. I'll leave that up to the
developers.   



Re: VM lockup due to storage typo

2009-09-16 Thread David Boyes
On 9/15/09 12:09 PM, Daniel P. Martin dmar...@gizmoworks.com wrote:

 *cough*SHARE requirement?*cough*

WAVV requirement WRIBDB04 submitted.

I suggested a SYSTEM CONFIG option and corresponding SET command to warn
user/operator and optionally halt IPL if a user requested LOGON or issued an
IPL command with a default VM size greater than the sum of real memory and
configured PAGE space. Normal setting would be MEMSANITY ON, but the SET
MEMSANITY OFF command would still allow experienced admins to shoot
themselves in the foot if necessary.
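One possible reading of the MEMSANITY proposal, as a minimal Python sketch. MEMSANITY is the setting proposed in the requirement, not an existing CP option; `SystemConfig` and `check_logon` are hypothetical names, and the sketch assumes that ON means "warn and refuse" while OFF means "warn but proceed".

```python
from dataclasses import dataclass

GIB = 2**30


@dataclass
class SystemConfig:
    real_storage: int       # bytes of real memory available to the system
    page_space: int         # bytes of configured PAGE space
    memsanity: bool = True  # models the proposed SET MEMSANITY ON/OFF


def check_logon(config, default_vm_size):
    """Return (allowed, warning) for a LOGON or IPL with the given default VM size."""
    limit = config.real_storage + config.page_space
    if default_vm_size <= limit:
        return True, None
    warning = f"VM size {default_vm_size} exceeds real storage plus page space ({limit})"
    if config.memsanity:
        return False, warning  # MEMSANITY ON: warn and stop
    return True, warning       # MEMSANITY OFF: warn, but let the admin proceed
```

Whether ON should also halt an IPL already in progress, or only refuse the LOGON, is left open in the proposal; the sketch models only the LOGON-time decision.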

IBM: Since I seem to be Designated Requirements Dude these days, maybe you
should just give me direct login access to the requirements DB. It'd save
time, and you'd get requirements earlier in the planning cycle. 8-)

-- db


Re: VM lockup due to storage typo

2009-09-16 Thread Lee Stewart

I guess as the one who got bit, I'd offer one easy suggestion...

The finger check asked for 9728G (9.7+T); VM unceremoniously chopped it 
to 8T as the architecture limit.  Why not have an option (not enabled by 
default) in the SYSTEM CONFIG file called Max_Virt_Size?   It could 
take numbers (like the USER storage specification), or OFF to indicate 
no checking.   And maybe something like RSS for Real Storage Size to say 
you can't logon with or define storage to more than the amount of Real 
Storage.


And if you really wanted a full circle, then a directory option that 
said this one user could override that setting.
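The Max_Virt_Size idea, including the per-user override, might look like this as a minimal Python sketch. Max_Virt_Size is a proposed SYSTEM CONFIG option, not an existing one; `effective_limit` and `define_storage` are hypothetical, the setting is taken as a plain byte count (rather than a suffixed size like the USER statement) for brevity, and the 8T clip simply mirrors the behavior observed in this incident.

```python
GIB = 2**30
ARCH_LIMIT = 8 * 2**40  # requests above 8T were clipped to this in the incident


def effective_limit(max_virt_size, real_storage):
    """Interpret a Max_Virt_Size value: a byte count, 'OFF' (no check), or 'RSS'."""
    if max_virt_size == "OFF":
        return None
    if max_virt_size == "RSS":
        return real_storage
    return int(max_virt_size)


def define_storage(requested, max_virt_size, real_storage, user_may_override=False):
    """Return the storage granted, or raise ValueError if the site limit is exceeded."""
    size = min(requested, ARCH_LIMIT)  # existing behavior: clip to the architecture limit
    limit = effective_limit(max_virt_size, real_storage)
    if limit is not None and size > limit and not user_may_override:
        raise ValueError(f"requested {requested} bytes exceeds Max_Virt_Size ({limit} bytes)")
    return size
```

The `user_may_override` flag stands in for the per-user directory option, which also speaks to the concern raised elsewhere in the thread that a blanket check could lock out a user such as MAINT.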


That said I'm kind of swamped for the next two weeks, but after that if 
someone wants to coach me on writing a requirement, I will...


Lee





--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com


Re: VM lockup due to storage typo

2009-09-16 Thread Ethan Lanz
On Wed, Sep 16, 2009 at 3:06 PM, Huegel, Thomas thue...@kable.com wrote:

 I don't know that I want CP to do anything different than it does now
 EXCEPT I want z/VM to a) keep running and b) have some facility that I
 can use to be able to examine the system to find/fix the problem... I


I agree.  The mainframe has a long history of managing overcommitted
resources, but Linux is presenting new challenges since it was not written
to be virtualized.

Rob noted earlier:
 One of the problems with booting Linux is that it determines the size
 of the virtual machine by testing pages rather than asking CP about it.

It seems to me that this will become a problem in other virtual environments
as well and, similar to the timer tick problem, another opportunity for the
mainframe to show Linux a better way to behave.

If Linux does not use up all available space when it starts, there is
opportunity to monitor and intervene before it gets critical. Then we do not
have to worry about making sure all our virtual blocks fit in the virtual
toy box.


 don't know/care how that gets done; maybe reserving some page space for
 CP and/or a special 'hook' into the HMC.. I'll leave that up to the
 developers.

 -Original Message-

 From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
 Behalf Of P S
 Sent: Wednesday, September 16, 2009 12:53 PM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo

 On Wed, Sep 16, 2009 at 10:42 AM, Schuh, Richard rsc...@visa.com
 wrote:
  Logon would not be the right or only place to put it. DEF STOR is
 another possible place to err if the maximum storage was too high.
 Perhaps a check of virtual storage at IPL time. That is a common point
 that must be traversed no matter where the error occurred.

 Suggest this not get hung up on "But it won't be perfect" ideas. For
 DIRMAINT, perhaps a site configuration option could say "Warn me if a
 userid is defined with either storage limit above x." Similarly, at
 LOGON or DEFINE STORAGE, if the VMsize is greater than the total page
 space defined, a warning would be useful.

 This doesn't help for aggregate overload (20x1GB with 4GB of page
 space), doesn't guarantee that XAUTOLOG BIGPIG won't spiral the system
 into the ground before the operator (what operator?) can react, etc.,
 but it would at least give some more informed consent.

 In this era of Big Numbers and big Linux guests, this is probably more
 important than it used to be -- in days of yore, if you accidentally
 defined a 32MB guest on an 8MB system, (a) there probably WAS enough
 page space, and (b) the user was probably CMS and wouldn't touch the
 pages that fast anyway.
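A rough Python sketch of the per-guest page-space warning suggested above, plus an aggregate comparison that illustrates the 20x1GB-on-4GB case the per-guest test cannot catch. The function and guest names are hypothetical, and comparing only against page space is a simplification: real storage also holds guest pages.

```python
from typing import Dict, List

GIB = 2**30


def page_space_warnings(guests: Dict[str, int], page_space: int) -> List[str]:
    """Hypothetical LOGON-time warnings; all sizes are in bytes.

    The per-guest test is the one suggested in the post; the aggregate
    test shows the overload case that the per-guest test misses.
    """
    warnings = []
    for name, size in sorted(guests.items()):
        if size > page_space:
            warnings.append(f"{name}: {size // GIB}G exceeds page space {page_space // GIB}G")
    total = sum(guests.values())
    if total > page_space:
        warnings.append(f"aggregate: {total // GIB}G exceeds page space {page_space // GIB}G")
    return warnings


if __name__ == "__main__":
    # 20 x 1GB guests against 4GB of page space: no single guest is flagged.
    small = {f"LNX{i:02d}": 1 * GIB for i in range(20)}
    print(page_space_warnings(small, 4 * GIB))
    # One BIGPIG at the 8T cap trips both checks.
    print(page_space_warnings({"BIGPIG": 8192 * GIB}, 4 * GIB))
```

The first call reports only the aggregate overload. This matches the point that a per-guest warning alone gives "informed consent" but no protection against many moderately sized guests.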


Ethan


Re: VM lockup due to storage typo

2009-09-16 Thread Alan Altmark
On Wednesday, 09/16/2009 at 04:44 EDT, Lee Stewart 
lstewart.dsgr...@attglobal.net wrote:
 I guess as the one who got bit, I'd offer one easy suggestion...
 
 The finger check asked for 9728G (9.7+T), VM unceremoniously chopped it
 to 8T as the architecture limit.  Why not have an option (not enabled by
 default) in the SYSTEM CONFIG file that says Max_Virt_Size.   It could
 take numbers (like the USER storage specification), or OFF to indicate
 no checking.   And maybe something like RSS for Real Storage Size to say
 you can't logon with or define storage to more than the amount of Real
 Storage.
 
 And if you really wanted a full circle, then a directory option that
 said this one user could override that setting.
 
 That said I'm kind of swamped for the next two weeks, but after that if
 someone wants to coach me on writing a requirement, I will...

For DIRMAINT, look at the DVHXRA/B/C exits to implement whatever kind of 
policy limits you like.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-16 Thread Ron Schmiedge
I've been trying to follow the discussion and wondering if the
directory control statement

MAXSTORAGE

would have provided some protection from the finger check problem?




Re: VM lockup due to storage typo

2009-09-16 Thread Schuh, Richard
Only if it were included in every directory entry, or at least the one in 
question. Having a global MAXSTORAGE would be better protection.

Regards, 
Richard Schuh 

 

 

Re: VM lockup due to storage typo

2009-09-16 Thread Lee Stewart
Not really, as we were dealing with a lot of guests.  So the only 
practical place to put it would be in a profile.  But according to usage 
note #1: "A maximum storage setting on a USER statement overrides a 
MAXSTORAGE statement in a profile."


So it would have no effect...

Lee

Ron Schmiedge wrote:

I've been trying to follow the discussion and wondering if the
directory control statement

MAXSTORAGE

would have provided some protection from the finger check problem?




Re: VM lockup due to storage typo

2009-09-16 Thread Ron Schmiedge
Unless you set MAXSTORAGE in the profile and used * as the upper limit
in the USER entry. Then if you change the lower limit to be higher
than the setting in the profile, you get an error.
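A small Python model of the directory rules as Lee and Ron describe them: an explicit maximum on USER overrides the profile's MAXSTORAGE, and "*" on USER defers to the profile. This is only an illustration of the behavior stated in the thread, not DirMaint or DIRECTXA logic. The 16G profile value and all function names are hypothetical, and binary units are assumed.

```python
from typing import Optional

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text: str) -> int:
    """Convert a size such as '9728M' to bytes (binary units assumed)."""
    text = text.strip().upper()
    return int(text[:-1]) * UNITS[text[-1]]


def effective_max(user_max: Optional[str], profile_max: Optional[str]) -> Optional[str]:
    """Maximum storage that applies to a user under the rules quoted above."""
    if user_max is not None and user_max != "*":
        return user_max  # explicit USER maximum wins (usage note #1)
    return profile_max   # '*' defers to the profile's MAXSTORAGE


def directory_error(initial: str, user_max: Optional[str],
                    profile_max: Optional[str]) -> Optional[str]:
    """Return an error message if initial storage exceeds the effective maximum."""
    limit = effective_max(user_max, profile_max)
    if limit is not None and parse_size(initial) > parse_size(limit):
        return f"initial storage {initial} exceeds maximum {limit}"
    return None
```

With `*` as the USER maximum and a 16G profile limit, a 9728G initial size is flagged. If the typo lands in an explicit USER maximum instead, the profile cannot catch it, which is Lee's objection.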



Re: VM lockup due to storage typo

2009-09-15 Thread Marcy Cortes
See a thread on this list with subject "Sanity check?" from Oct 2007 for what 
happened when I did the same thing ;)

You probably filled page space.

I still think IBM should refuse to IPL a guest that will cause such harm.


Marcy 

This message may contain confidential and/or privileged information. If you 
are not the addressee or authorized to receive this for the addressee, you must 
not use, copy, disclose, or take any action based on this message or any 
information herein. If you have received this message in error, please advise 
the sender immediately by reply e-mail and delete this message. Thank you for 
your cooperation.


-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf 
Of Lee Stewart
Sent: Tuesday, September 15, 2009 8:39 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] VM lockup due to storage typo

Does anyone have an idea of how we might have gotten out of this without 
an IPL?

VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
Several guests needed more memory, so the directory was updated and 
one by one the guests were shut down, logged off and logged back on.  So far, so good.

But... In changing the memory for many guests, late at night after a 
long day, while meaning to set a guest's memory to 9728M, 
it got set to 9728G.  When that guest was cycled we saw the message on 
the console that its memory was limited to 8TB (HCPLGN093E), then the 
VM system appeared to freeze.

We couldn't get in via TCP/IP, the HMC Operating System Messages 
screen, or the HMC Integrated 3270.

Finally had to IPL.   Even that was weird: I'd have expected the Load 
Normal to shut down first, but it just IPLed.   We did NoAutolog, fixed the typo 
and all came back up OK...

I suspect CP was scrambling, paging everything in the world out as Linux 
tried to initialize that 8TB of memory...   But I'm surprised I couldn't 
even get into the HMC consoles (to kill just that one guest as opposed 
to all of them).

Any thoughts?
Lee
-- 

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com


Re: VM lockup due to storage typo

2009-09-15 Thread O'Brien, Dennis L
Lee,
Do the userid you were trying to log onto and your external security manager 
both have OPTION QUICKDSP in the directory?  Your operator userid should also 
have QUICKDSP.

         Dennis O'Brien

My computer beat me at chess, but it was no match for me in kickboxing.



Re: VM lockup due to storage typo

2009-09-15 Thread Lee Stewart
No ESM (yet)...  Operator and Maint both have QUICKDSP, the Linux guests 
do NOT have it...

Lee

O'Brien, Dennis L wrote:

Lee,
Do the userid you were trying to log onto and your external security manager 
both have OPTION QUICKDSP in the directory?  Your operator userid should also 
have QUICKDSP.

 Dennis O'Brien

My computer beat me at chess, but it was no match for me in kickboxing.



Re: VM lockup due to storage typo

2009-09-15 Thread Tom Duerbusch
CP wouldn't know at IPL time that the guest would (not could, but would) cause 
such harm.

Just because you say you can use xxx GB doesn't mean you will actually use 
them.

When page space fills, it overflows to spool.
When spool fills, CP abends on the next pageout.
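A toy Python model of the cascade Tom describes: pages go to page space first, overflow to spool, and fail once both are full. It ignores real storage entirely and is not how CP allocates slots. The 200G page and 100G spool figures are hypothetical, and 4K pages are assumed. It does show the scale involved, since an 8T guest touching every page needs 2**31 page slots.

```python
from typing import Tuple

PAGE = 4096  # bytes per page (assumed)
GIB = 2**30


def place_pages(pages_needed: int, page_slots: int, spool_slots: int) -> Tuple[int, int, str]:
    """Toy model: fill page space, overflow to spool, then fail.

    Returns (pages written to page space, pages written to spool, outcome).
    """
    to_page = min(pages_needed, page_slots)
    to_spool = min(pages_needed - to_page, spool_slots)
    left_over = pages_needed - to_page - to_spool
    return to_page, to_spool, "abend" if left_over > 0 else "ok"


if __name__ == "__main__":
    guest_pages = 8 * 2**40 // PAGE  # an 8T guest touching every page
    print(guest_pages)               # 2147483648
    # Hypothetical 200G of page space and 100G of spool.
    print(place_pages(guest_pages, 200 * GIB // PAGE, 100 * GIB // PAGE))
```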

Tom Duerbusch
THD Consulting

 Marcy Cortes marcy.d.cor...@wellsfargo.com 9/15/2009 11:02 AM 
See a thread on this list with subject Sanity check? from Oct 2007 for what 
happened when I did the same thing ;)

You probably filled page space.

I still think IBM should refuse to IPL a guest that will cause such harm.


Marcy 



Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
Both Page and Spool space!!! When you get to the end of spool, there is nothing 
further that can be done. This ought to be considered a bug. Surely CP has the 
information it needs to determine that the virtual storage size is way too big 
to be accommodated and should reject the logon. This ought to be reported. If 
the logon is not rejected, CP ought to keep enough storage in reserve so that 
the operator can still get in and force the offender. 

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
 Sent: Tuesday, September 15, 2009 9:03 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 See a thread on this list with subject Sanity check? from 
 Oct 2007 for what happened when I did the same thing ;)
 
 You probably filled page space.
 
 I still think IBM should refuse to IPL a guest that will 
 cause such harm.
 
 
 Marcy 
 

Re: VM lockup due to storage typo

2009-09-15 Thread Daniel P. Martin

*cough*SHARE requirement?*cough*

Marcy Cortes wrote:

See a thread on this list with subject Sanity check? from Oct 2007 for what 
happened when I did the same thing ;)

You probably filled page space.

I still think IBM should refuse to IPL a guest that will cause such harm.


Marcy 




Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
This should be treated as a bug. It is not an enhancement or new feature; it 
brought a running system down. And it probably did not take a dump. 

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Daniel P. Martin
 Sent: Tuesday, September 15, 2009 9:09 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 *cough*SHARE requirement?*cough*
 

 

Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
Maybe CP couldn't know that the guest would do something bad, but it should 
know that it has opened itself to the possibility that the guest could, in 
normal operation, cause the problem. 
One of Alan's first precepts of information security and integrity is that the 
guest cannot be allowed to harm the CP. This clearly violates that.

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
 Sent: Tuesday, September 15, 2009 9:19 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 CP wouldn't know at IPL time, the guest would, not could, but 
 would cause such harm.
 
 Just because you say you can use xxx GB, doesn't mean you 
 would actually use them.
 
 When page fills, it over flows to spool.
 When spool fills, CP abends on the next pageout.
 
 Tom Duerbusch
 THD Consulting
 

Re: VM lockup due to storage typo

2009-09-15 Thread Thomas Kern
The difference between CMS and Linux in this case is just a matter of time
before problems occur. Linux wants to use all of its storage early, CMS uses
all of its storage over time. Both will use all of their storage eventually. 

CP is built to overcommit storage. It just lets you REALLY overcommit
storage. But it would be nice if there was some sort of sanity check in
there somewhere. 

/Tom Kern

On Tue, 15 Sep 2009 13:12:38 -0400, Bruce Hayden bjhay...@gmail.com wrote:

The problem isn't that you did an IPL, it is that you IPLed Linux.  An
IPL of CMS in an 8 TB machine doesn't have any delay or cause a
problem:

def stor 8t
STORAGE = 8T
Storage cleared - system reset.
i cms
z/VM V5.4.0    2009-07-13 11:58

Ready; T=0.01/0.01 13:06:21
q v stor
STORAGE = 8T
Ready; T=0.01/0.01 13:06:26
q stor
STORAGE = 4G CONFIGURED = 4G INC = 128M STANDBY = 8G RESERVED = 0
Ready; T=0.01/0.01 13:06:57

An IPL of ZCMS blows up, though.  Maybe they didn't test it with that
large storage.

On Tue, Sep 15, 2009 at 12:02 PM, Marcy Cortes
marcy.d.cor...@wellsfargo.com wrote:
 See a thread on this list with subject "Sanity check?" from Oct 2007 for
what happened when I did the same thing ;)

 You probably filled page space.

 I still think IBM should refuse to IPL a guest that will cause such harm.


 Marcy



--
Bruce Hayden
Linux on System z Advanced Technical Support
IBM, Endicott, NY


Re: VM lockup due to storage typo

2009-09-15 Thread Tom Duerbusch
CMS will free its storage after the command is complete.

However, do a PEEK on a very large reader file, such as an OS dump, and CMS 
just might use up all of its storage, just like any other guest might.

It isn't a matter of time, it is a matter of usage.

Tom Duerbusch
THD Consulting

 Thomas Kern tlk_sysp...@yahoo.com 9/15/2009 12:48 PM 
The difference between CMS and Linux in this case is just a matter of time
before problems occur. Linux wants to use all of its storage early, CMS uses
all of its storage over time. Both will use all of their storage eventually. 

CP is built to overcommit storage. It just lets you REALLY overcommit
storage. But it would be nice if there was some sort of sanity check in
there somewhere. 

/Tom Kern



Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
CMS, being a 32-bit system, will probably never use 3TB of memory. Perhaps 
z/CMS, when it becomes a reality, might, but the current CMS is another story. 

Regards, 
Richard Schuh 

 

  CMS uses all of its storage over 
 time. Both will use all of their storage eventually. 
 


Re: VM lockup due to storage typo

2009-09-15 Thread Gentry, Stephen
What Lee doesn't mention is how long he waited before doing the IPL.
Had he waited to see what happens maybe VM would have finally come
around, so to speak. We all have different thresholds of pain. I think I
would have done what Lee did, long day, not really wanting to wait
around to see if VM recovers, just IPL.  Lee, did you have access to the
HMC and thus the SAD screen to see what was going on? Sort of my last
line of defense if I can't get logged in.  Granted all it will tell you
is if you have CPU or I/O utilization, but at least you have something
to go to IBM with.
Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE: if it's set, the
guest machine size is verified against it; if not, the available PAGE area
and SPOOL size are checked (calculated), and if the guest exceeds that
size, the guest doesn't start or a severe warning is issued.
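A minimal Python sketch of the check Steve proposes. MAX_USER_SIZE is his hypothetical option, not an existing SYSTEM CONFIG statement, and the free PAGE and SPOOL figures are assumed inputs that a real implementation would have to get from CP. The 400GB/100GB numbers in the example are purely illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

GIB = 1024 ** 3


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    REJECT = "reject"


@dataclass
class SystemLimits:
    max_user_size: Optional[int]  # hypothetical MAX_USER_SIZE in bytes; None = not set
    free_page_bytes: int          # assumed input: free PAGE space, in bytes
    free_spool_bytes: int         # assumed input: free SPOOL space, in bytes


def check_guest_size(requested: int, limits: SystemLimits, strict: bool = True) -> Verdict:
    """Hypothetical logon-time sanity check on a guest's defined storage size."""
    if limits.max_user_size is not None:
        # An explicit site limit, if set, is the only thing consulted.
        return Verdict.ALLOW if requested <= limits.max_user_size else Verdict.REJECT
    # No limit set: compare against the PAGE + SPOOL space that could back the guest.
    backing = limits.free_page_bytes + limits.free_spool_bytes
    if requested <= backing:
        return Verdict.ALLOW
    # Too big: refuse to start it, or just issue the severe warning.
    return Verdict.REJECT if strict else Verdict.WARN


if __name__ == "__main__":
    site = SystemLimits(max_user_size=None, free_page_bytes=400 * GIB, free_spool_bytes=100 * GIB)
    print(check_guest_size(9728 * 1024 ** 2, site))  # the intended 9728M
    print(check_guest_size(9728 * GIB, site))        # the 9728G typo
```

The `strict` flag covers the two outcomes Steve mentions: refuse the logon, or let it proceed with a severe warning.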
Steve

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Tuesday, September 15, 2009 12:59 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Maybe CP couldn't know that the guest would do something bad, but it
should know that it has opened itself to the possibility that the guest
could, in normal operation, cause the problem. 
One of Alan's first precepts of information security and integrity is
that the guest cannot be allowed to harm the CP. This clearly violates
that.

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
 Sent: Tuesday, September 15, 2009 9:19 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 CP wouldn't know at IPL time that the guest would (not could, but 
 would) cause such harm.
 
 Just because you say you can use xxx GB doesn't mean you 
 would actually use it.
 
 When page fills, it overflows to spool.
 When spool fills, CP abends on the next pageout.
 
 Tom Duerbusch
 THD Consulting
 
  Marcy Cortes marcy.d.cor...@wellsfargo.com 9/15/2009 
 11:02 AM 
 See a thread on this list with subject Sanity check? from 
 Oct 2007 for what happened when I did the same thing ;)
 
 You probably filled page space.
 
 I still think IBM should refuse to IPL a guest that will 
 cause such harm.
 
 
 Marcy 
 
 This message may contain confidential and/or privileged 
 information. If you are not the addressee or authorized to 
 receive this for the addressee, you must not use, copy, 
 disclose, or take any action based on this message or any 
 information herein. If you have received this message in 
 error, please advise the sender immediately by reply e-mail 
 and delete this message. Thank you for your cooperation.
 
 
 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
 Sent: Tuesday, September 15, 2009 8:39 AM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: [IBMVM] VM lockup due to storage typo
 
 Does anyone have an idea of how we might have gotten out of 
 this without an IPL?
 
 VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
 Several guests needed more memory added so the directory was 
 updated and one by one the guests shutdown, logged off and 
 back on.  So far, so good.
 
 But... In changing the memory for many guests, and it being 
 late at night after a long day, while meaning to set a 
 guest's memory to 9728M, it got set to 9728G.  When that 
 guest was cycled we saw the message on the console that its 
 memory was limited to 8TB (HCPLGN093E), then the VM system 
 appeared to freeze.
 
 We couldn't get in via TCP/IP, or the HMC Operating System 
 Messages screen, or the HMC Integrated 3270.
 
 Finally had to IPL.  Even that was weird, as I'd have expected
 the Load Normal to shut down; it just IPLed.  We did NoAutolog,
 fixed the typo, and all came back up OK...
 
 I suspect CP was scrambling, paging everything in the world out
 as Linux tried to initialize that 8TB of memory...  But I'm
 surprised I couldn't even get into the HMC consoles (to kill just
 that one guest as opposed to all of them).
 
 Any thoughts?
 Lee
 -- 
 
 Lee Stewart, Senior SE
 Sirius Computer Solutions
 Phone: (303) 996-7122
 Email: lee.stew...@siriuscom.com 
 Web:   www.siriuscom.com
 


Re: VM lockup due to storage typo

2009-09-15 Thread Steve Marak
I agree with that (the guest cannot be allowed to harm CP) but has that 
actually been formally - or even informally - accepted by the Powers That 
Be?

I ask because I still remember, as though it were yesterday, opening a 
security/integrity APAR against VM back in the mid-1980's because any 
class G user could knock CP down by defining a shared and a nonshared 
device on the same virtual control unit, and being told that that was NOT 
a security or integrity issue, and that no fix would be forthcoming. 

But at least I'm not bitter about it. 

Steve

On Tue, 15 Sep 2009, Schuh, Richard wrote:

 One of Alan's first precepts of information security and integrity is 
 that the guest cannot be allowed to harm the CP. This clearly violates 
 that.
 
 Regards, 
 Richard Schuh 

-- Steve Marak
-- sama...@gizmoworks.com


Re: VM lockup due to storage typo

2009-09-15 Thread Tom Duerbusch
Good point.

When I have hit this, I got a PAGxxx type error and CP automatically re-IPLed.

Like I said, when the offending user starts allocating pages, all the other
machines will abend on a paging error when CP tries to page out their recently
used pages.  Eventually, some of CP's pageable pages will be the least
recently used pages and BAM!  PAGxxx CP abend.  Automatic restart in progress...
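A toy Python model of the sequence Tom describes, assuming nothing more than two pools of free slots: pageouts take PAGE slots first, overflow to SPOOL (per his earlier note), and once both are empty the next pageout, whoever it is for, raises a stand-in abend. It only counts slots; it is not how CP actually allocates them, and the slot counts are illustrative.

```python
from dataclasses import dataclass


class CPAbend(Exception):
    """Stand-in for the PGTxxx/PAGxxx abend mentioned in the thread."""


@dataclass
class PagingModel:
    page_slots: int   # free slots on PAGE volumes (illustrative)
    spool_slots: int  # free SPOOL slots CP may overflow into (illustrative)

    def page_out(self, owner: str, n: int = 1) -> None:
        """Find a slot for each of n pages; who owns the page does not change the outcome."""
        for _ in range(n):
            if self.page_slots > 0:
                self.page_slots -= 1
            elif self.spool_slots > 0:
                self.spool_slots -= 1
            else:
                raise CPAbend(f"no slot left paging out a page for {owner}")


if __name__ == "__main__":
    dasd = PagingModel(page_slots=1000, spool_slots=200)
    dasd.page_out("BIGGUEST", 1200)  # the runaway guest uses up PAGE, then SPOOL
    try:
        dasd.page_out("CP", 1)       # the next pageout, even one of CP's own pages
    except CPAbend as err:
        print("abend:", err)
```

The point the model makes is Tom's: the pageout that finally fails need not belong to the guest that caused the shortage.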

Tom Duerbusch
THD Consulting

 Gentry, Stephen stephen.gen...@lafayettelife.com 9/15/2009 12:13 PM 
What Lee doesn't mention is how long he waited before doing the IPL.
Had he waited to see what happens maybe VM would have finally come
around, so to speak. We all have different thresholds of pain. I think I
would have done what Lee did, long day, not really wanting to wait
around to see if VM recovers, just IPL.  Lee, did you have access to the
HMC and thus the SAD screen to see what was going on? Sort of my last
line of defense if I can't get logged in.  Granted all it will tell you
is if you have CPU or I/O utilization, but at least you have something
to go to IBM with.
Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE: if it's set, the
guest machine size is verified against it; if not, the available PAGE area
and SPOOL size are checked (calculated), and if the guest exceeds that
size, the guest doesn't start or a severe warning is issued.
Steve



Re: VM lockup due to storage typo

2009-09-15 Thread Alan Altmark
On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak sama...@gizmoworks.com 
wrote:
 I agree with that (the guest cannot be allowed to harm CP) but has that
 actually been formally - or even informally - accepted by the Powers That
 Be?

Yes, it is in the Statement of System Integrity in the General Information 
Manual.

 I ask because I still remember, as though it were yesterday, opening a
 security/integrity APAR against VM back in the mid-1980's because any
 class G user could knock CP down by defining a shared and a nonshared
 device on the same virtual control unit, and being told that that was NOT
 a security or integrity issue, and that no fix would be forthcoming.

Under today's rules, that would be an Integrity problem.

o If a class G (only) user can repeatedly or with malice aforethought 
hang or abend CP, it WILL be classified as an integrity problem (denial of 
service).

o If a class G user happens to do something that triggers an abend or hang 
due to a system malfunction, it will NOT be classified as an integrity 
problem.

o If the system abends or hangs because it is overloaded (memory, CPU), it 
will NOT be classified as an integrity problem.

o Just because it isn't an integrity problem doesn't mean it isn't a 
defect.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-15 Thread Marcy Cortes
So are you saying that what Lee and I both did to shoot our systems should be
APAR'able?  Or should it be a requirement?  Or is it going to be a "your gun,
your foot" answer?


Marcy 
 


Re: VM lockup due to storage typo

2009-09-15 Thread Lee Stewart
From the tn3270 sessions hanging to the phone call to me - 2-3 minutes. 
From then till we decided we had to IPL - maybe 15-20 minutes.  But 30 
minutes (maybe 45-60 till all the apps were back up) on a major online 
system is a lot.   It was 35 minutes from the message capping the 
virtual storage at 8TB till the IPL time from Q CPLEVEL.  So no, not 
long considering the size.  And yes, I suspect it would PGT004 eventually.


And yes, if CP unceremoniously chopped my wrong size from 9.7TB to 8TB,
why could it not do the same with either a user-specified system limit or
a "this is the biggest machine this CP can run in this configuration" limit?
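A short Python sketch of the capping Lee asks for, assuming a hypothetical site limit and a hypothetical configuration-derived ceiling alongside the 8TB cap that HCPLGN093E reported; the guest simply gets the smallest of whichever caps apply. Neither extra limit exists in CP today as far as this thread shows.

```python
from typing import Optional

GIB = 1024 ** 3
TIB = 1024 ** 4

# The 8TB ceiling CP reported with HCPLGN093E in this thread.
CP_MAX_GUEST = 8 * TIB


def effective_guest_size(requested: int,
                         site_limit: Optional[int] = None,
                         config_ceiling: Optional[int] = None) -> int:
    """Cap a requested guest size to the tightest limit that applies.

    site_limit and config_ceiling are hypothetical: a user-specified system
    limit and a "biggest machine this configuration can run" figure.
    """
    caps = [CP_MAX_GUEST] + [c for c in (site_limit, config_ceiling) if c is not None]
    return min(requested, *caps)


if __name__ == "__main__":
    print(effective_guest_size(9728 * GIB) // GIB)                     # only the 8TB cap
    print(effective_guest_size(9728 * GIB, site_limit=64 * GIB) // GIB)  # site limit wins
```

A silent cap keeps the guest running but still hides the typo; refusing the logon outright is the other option raised in this thread.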


Lee

Gentry, Stephen wrote:

What Lee doesn't mention is how long he waited before doing the IPL.
Had he waited to see what happens maybe VM would have finally come
around, so to speak. We all have different thresholds of pain. I think I
would have done what Lee did, long day, not really wanting to wait
around to see if VM recovers, just IPL.  Lee, did you have access to the
HMC and thus the SAD screen to see what was going on? Sort of my last
line of defense if I can't get logged in.  Granted all it will tell you
is if you have CPU or I/O utilization, but at least you have something
to go to IBM with.
Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE: if it's set, the
guest machine size is verified against it; if not, the available PAGE area
and SPOOL size are checked (calculated), and if the guest exceeds that
size, the guest doesn't start or a severe warning is issued.
Steve


Re: VM lockup due to storage typo

2009-09-15 Thread Lee Stewart

Gee, I guess we're in good company!   ;-)

It does seem to me that CP should be smart enough to look at 175GB of
real storage, 4GB of Xstor, and xx page packs and say, "Not in our
wildest dreams can we run an 8TB virtual guest"...


Or maybe at the point that the 8TB guest starts choking off all other 
activity and wildly filling page space


Lee

Marcy Cortes wrote:

See a thread on this list with subject Sanity check? from Oct 2007 for what 
happened when I did the same thing ;)

You probably filled page space.

I still think IBM should refuse to IPL a guest that will cause such harm.


Marcy 




Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
Seems to me that he said it was either an integrity problem or a defect. I 
would think that either would be meat for the APAR grinder.

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
 Sent: Tuesday, September 15, 2009 1:50 PM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
 So are you saying that what Lee and I both did to shoot our 
 systems should be APAR'able?  Or should it be a requirement?  Or 
 is it going to be a "your gun, your foot" answer?
 
 
 Marcy 
  

Re: VM lockup due to storage typo

2009-09-15 Thread Huegel, Thomas
I would think that IBM would be scurrying to fix what is obviously a
problem.
After all they are not Microsoft... 

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Tuesday, September 15, 2009 4:13 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Seems to me that he said it was either an integrity problem or a defect.
I would think that either would be meat for the APAR grinder.

Regards,
Richard Schuh 

 


Re: VM lockup due to storage typo (OT)

2009-09-15 Thread Schuh, Richard
Marcy,

Did you get to attend any of those parties at the Malibu mansion?

Regards, 
Richard Schuh 

 

 -Original Message-
 From: The IBM z/VM Operating System 
 [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
 Sent: Tuesday, September 15, 2009 2:16 PM
 To: IBMVM@LISTSERV.UARK.EDU
 Subject: Re: VM lockup due to storage typo
 
  
 Gee, I guess we're in good company!   ;-)
 You betcha! (I'm in MN today, I can say that).
 
 At least mine was a test/dev system :)  If I had done it to a 
 prod system, I'm sure someone here would have had IBM 
 answering questions ...  It's one of those things that fell 
 down low on the to-pursue list - bigger fish frying.
 
 
 Marcy 
 

Re: VM lockup due to storage typo

2009-09-15 Thread Rob van der Heij
On Tue, Sep 15, 2009 at 11:18 PM, Robert J Brenneman bren...@gmail.com wrote:

 Admittedly - not 8TB in a 200G box, as Lee tried to do, and it was on
 z/VM 5.1, so it didn't have the system execution space stuff of later
 z/VM releases. It did teach the lesson that more page packs can only
 get you so far. At some point the system data structures needed to
 support the enormous guest just won't fit. This may be a reasonable
 calculation to make within CP as a sanity check.

If a factor of 2 does not make a difference, then try an order of
magnitude. :-)

One of the problems with booting Linux is that it determines the size
of the virtual machine by testing pages rather than asking CP about it.
If I remember right, it tries the first page of every architected
segment. And to make it worse, it uses a test that also forces CP to
initialize the page frame. This means that CP must also allocate a
PGMBK to hold the page tables that span that segment. So for each MB of
virtual machine storage, 3 pages must be allocated.
If I have the math right, the 8 TB virtual machine will very quickly
require 96 GB worth of page frames. That needs to come from
somewhere... A decent paging subsystem can fill up a single 3390-3 in
a minute or two.
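Rob's arithmetic, spelled out as a short Python sketch. It assumes 4K pages and 1MB segments (the z/Architecture sizes) and takes his count of 3 pages per segment touched: the probed page itself plus the PGMBK. The intended 9728M guest is included for comparison.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

PAGE = 4 * KIB     # z/Architecture page size
SEGMENT = 1 * MIB  # z/Architecture segment size


def boot_probe_cost(guest_bytes: int, pages_per_segment: int = 3) -> int:
    """Bytes of page frames CP must supply if the guest touches one page per segment.

    pages_per_segment=3 is Rob's figure: the touched page plus the PGMBK.
    """
    segments = guest_bytes // SEGMENT
    return segments * pages_per_segment * PAGE


if __name__ == "__main__":
    print(boot_probe_cost(8 * TIB) // GIB, "GB for the 8TB typo")         # 96
    print(boot_probe_cost(9728 * MIB) // MIB, "MB for the intended 9728M")  # 114
```

8TB is 8M segments, and 8M x 12KB comes to 96GB, matching Rob's figure.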

And although we tell people that you need to add one 3390-3 page pack
for every GB of Linux server you define, there are still folks who think
we talk nonsense because with the first few Linux guests their z/VM
system did not page at all. But once you do start to page, page space
utilization growth is not subtle. It's more like shifting your cup of
coffee towards the edge of the table.
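A Python sketch of that rule of thumb, plus how many packs the 96 GB Rob computes above would fill on its own. It assumes 3390-3 geometry of 3339 cylinders x 15 tracks x 12 4K slots per track (about 2.3GB of page slots per pack), so check that against your own DASD; the ten-guest example is hypothetical.

```python
PAGE = 4096
GIB = 1024 ** 3

# Assumed 3390-3 geometry: 3339 cylinders x 15 tracks x 12 4K page slots per track.
SLOTS_PER_3390_3 = 3339 * 15 * 12
PACK_BYTES = SLOTS_PER_3390_3 * PAGE


def packs_by_rule_of_thumb(defined_guest_mib: int) -> int:
    """One 3390-3 page pack per GB of defined Linux guest storage, rounded up."""
    return -(-defined_guest_mib // 1024)


def packs_filled_by(nbytes: int) -> int:
    """How many 3390-3 page packs nbytes of pageouts would occupy, rounded up."""
    return -(-nbytes // PACK_BYTES)


if __name__ == "__main__":
    # Hypothetical: ten guests at the intended 9728M each.
    print(packs_by_rule_of_thumb(10 * 9728), "packs by the rule of thumb")
    # The ~96GB of page frames the 8TB guest needs just to boot.
    print(packs_filled_by(96 * GIB), "packs filled by the 8TB guest's boot alone")
```

With these assumptions the 8TB guest's boot alone would fill about 42 page packs.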

Rob


Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
 
 One of the problems with booting Linux is that it determines 
 the size of the virtual machine by testing pages rather than 
 asking CP about it.

It only took TPF and its predecessors 35 years to get this right. :-)
Way back in VM/370 R3 I had a diag that could be used. We did talk 
the ACP Systems folks at TWA into using the diag instead of touching
every page. We also had a mod in SVS to do the same (among other things).

 
 If I remember right, it tries the first page of every 
 architected segment. 

It could be worse. Earlier systems (OS/360, MVS line of systems, ACP,   
VM, etc.) touched every page. The touching was usually done by setting
the storage key. 
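The difference between touching every page and touching the first page of every segment is easy to quantify. A small Python sketch, assuming 4K pages and 1MB segments, that counts probes for an 8TB guest:

```python
PAGE = 4 * 1024
SEGMENT = 1024 * 1024
TIB = 1024 ** 4


def probes(guest_bytes: int, every_page: bool) -> int:
    """Storage touches a size-discovery loop makes: one per page or one per segment."""
    step = PAGE if every_page else SEGMENT
    return guest_bytes // step


if __name__ == "__main__":
    per_page = probes(8 * TIB, every_page=True)      # the older "touch every page" style
    per_segment = probes(8 * TIB, every_page=False)  # first page of every segment
    print(f"{per_page:,} vs {per_segment:,} ({per_page // per_segment}x)")
```

Either loop still scales with the defined size; asking CP through a diagnose, as Richard describes, does not.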


Regards, 
Richard Schuh