Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-22 Thread Schuh, Richard
Now THAT would have exposed the problem. :-) Actually, he already tried that 
and the result has been this discussion.

Regards, 
Richard Schuh 

 

> I'm sure that there are a couple other ways of preventing the 
> problem, like IPL'ing the machine first and doing a Q V ALL 
> to see what resources you really did ask for, could have 
> stopped the problem... if the Systems Programmer did it.


Re: VM lockup due to storage typo

2009-09-22 Thread David Boyes
> I don't think the analogy to a ping attack is a particularly fair
> one.  Yes, from the perspective of an innocent third user, they
> look the same, perhaps, but they aren't.  

??? In both cases, normal function of the "innocent" guest is disrupted by a 
force beyond its control through no fault of its own. The function is 
disrupted by a lack of shared resources available to the "innocent" guest due 
to trying to service what appear to be "legitimate" resource requests to 
another theoretically "innocent" guest. 

> If the attack were made
> through some sort of security gate that defaults to "closed" state
> which the sysadmin had accidentally opened and left open, I think
> that would  be a more fair analogy.  Quibbling over details,
> perhaps, but there is an important difference.

Network floods have nothing innately to do with security states. You can 
produce exactly the same effect within a local segment with no outside 
connection, FW or any other "security" gates involved (misconfigure any DECnet 
device that boots via MOP and see what happens), so I don't see the subtle 
difference here -- one device banging out traffic without regard for other 
systems on the same network segment starves access to the other systems on the 
same segment, denying them the ability to function normally. Barks like a duck, 
swims like a duck, it'll do for duck soup, as a friend of mine says.  

But, as you say, let's concentrate on fixing the problem, not blaming the 
symptoms. 


Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-21 Thread Tom Duerbusch
I mentioned earlier some sort of preferred paging space for CP areas, kind of 
like the DUMP area and SPOOL.

But either way, it still depends on a Systems Programmer, which was the weak 
link in this discussion.

Recall that a Systems Programmer caused the problem by authorizing an 8 TB 
guest.  And that Systems Programmer will never do that again, IMHO.

So setting up a preferred paging area, or paging pools, is just another thing 
that most of us will never do, until we get shot in the foot.

I bet that there are more VM systems that are running without a DUMP area than 
with.  And they are the smaller shops that may be able to handle an outage 
better than others.

The DIRMAINT exit to prevent this amount of storage from being authorized 
would have stopped it... that is, if the Systems Programmer did it.

Your VM performance monitor could have purged the machine and stopped it... if 
the Systems Programmer did it.

I'm sure that there are a couple of other ways of preventing the problem. For 
example, IPL'ing the machine first and doing a Q V ALL to see what resources 
you really did ask for could have stopped the problem... if the Systems 
Programmer did it.

Perhaps we are just too dangerous to be around anymore.  Time to hide us 
behind panels and such.

Tom Duerbusch
THD Consulting



Re: VM lockup due to storage typo

2009-09-21 Thread Bill Holder
I don't think the analogy to a ping attack is a particularly fair 
one.  Yes, from the perspective of an innocent third user, they 
look the same, perhaps, but they aren't.  If the attack were made 
through some sort of security gate that defaults to "closed" state 
which the sysadmin had accidentally opened and left open, I think 
that would  be a more fair analogy.  Quibbling over details, 
perhaps, but there is an important difference.  

On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote:

>On 9/18/09 9:32 AM, "Bill Holder" wrote:
>
>> That is indeed one important question, but there was another one, the
>> question of whether this was a denial of service attack exposure, which
>> it is not.
>
>I think that's a point of view question.
>
>If I am another user on the same VM system, happy within my cozy little
>class G box, and the hypervisor admin does something outside of my control
>to some OTHER user that causes CP to choke, then from the original user's
>perspective it IS a DOS attack because it's something that is out of my
>control, starves ME, and causes ME to choke without reason.
>
>An analogous parallel case in the distributed system world would be a ping
>flood attack on a network segment. The innocent get hurt along with the
>intended target by being starved of access to the network, and thus lose the
>ability to function according to design.
>
>From the hypervisor admin's POV, then yeah, it's just doing what it's told
>to do. It's correct operation, working as documented.
>
>I think Bill Schuh and Marcy and myself are arguing for the former
>viewpoint. I think you and Adam are arguing from the latter view.
>
>> I'm not disagreeing that it would be nice if there were some sort
>> of "are you sure" safety net before the system proceeded to try to do
>> something suicidal, but that's a design and requirements question, not a
>> defect question.
>
>I think we're all in violent agreement on that point. Now, the question is
>what is the best way to put a safety on that gun?


Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-21 Thread John P. Baker
Bill,

You may well be correct.  Of course, that permits me to pose the question of
how such a condition could effectively be avoided.  Ideas, anyone?

John P. Baker



Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-21 Thread Bill Holder
These are very interesting ideas, but I suspect (no way to prove, since no
doc will be forthcoming) that the hang was not a paging issue, but rather a
central storage fragmentation issue involving attempts to allocate four
contiguous frames for region and segment tables.  Don't let me throw cold
water on the current discussion, though, I just wanted to point out that all
of the interesting paging ideas probably wouldn't help the situation that
triggered this entire discussion.

- Bill Holder, z/VM Development, IBM
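
For a sense of scale, the arithmetic behind the "four contiguous frames" remark can be sketched as below. This is a back-of-envelope illustration, not a description of CP internals. It uses the z/Architecture table sizes (4 KB frames, 2 KB page tables that each map 1 MB, 16 KB segment and region tables) and assumes, purely for illustration, that every table for the guest gets built. The helper `dat_tables` and its key names are hypothetical.

```python
# Back-of-envelope DAT table arithmetic for one guest.
# Assumption (not from the thread): every table is instantiated. CP may
# well build tables lazily, so read the results as an upper bound.

FRAME = 4 * 1024               # one 4 KB page frame
PAGE_TABLE_BYTES = 256 * 8     # 256 entries x 8 bytes = 2 KB, maps 1 MB
BIG_TABLE_BYTES = 2048 * 8     # segment or region table = 16 KB
SEGMENT_SPAN = 1 << 20         # storage mapped by one page table (1 MB)
SEGMENT_TABLE_SPAN = 1 << 31   # storage mapped by one segment table (2 GB)
REGION3_TABLE_SPAN = 1 << 42   # storage mapped by one region-third table (4 TB)


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def dat_tables(guest_bytes: int) -> dict:
    """Hypothetical helper: count translation tables for a guest size."""
    page_tables = ceil_div(guest_bytes, SEGMENT_SPAN)
    segment_tables = ceil_div(guest_bytes, SEGMENT_TABLE_SPAN)
    # Region-third tables are only needed above 2 GB,
    # and a region-second table only above 4 TB.
    region3 = ceil_div(guest_bytes, REGION3_TABLE_SPAN) if segment_tables > 1 else 0
    region2 = 1 if region3 > 1 else 0
    big_tables = segment_tables + region3 + region2
    return {
        "frames_per_big_table": BIG_TABLE_BYTES // FRAME,
        "four_frame_blocks": big_tables,
        "big_table_bytes": big_tables * BIG_TABLE_BYTES,
        "page_table_bytes": page_tables * PAGE_TABLE_BYTES,
    }


if __name__ == "__main__":
    for name, value in dat_tables(8 << 40).items():   # the 8 TB guest
        print(f"{name}: {value}")
```

Under those assumptions, each segment or region table is one request for four contiguous frames, which is the kind of allocation that becomes hard to satisfy once central storage is fragmented.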


Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-20 Thread David Boyes
On 9/20/09 4:26 AM, "Rob van der Heij"  wrote:
> 
> Most performance tuning gets harder when you split resources and
> consumers in different groups and manage them separately. Sharing is
> easier with large numbers.
> Rob

Although with SSD coming back into vogue, the idea of swap vs page (shades
of HPO) might be worth considering again. If the goal is to get a very large
number of pages out of the way quickly and/or adding some additional levels
of paging hierarchy back into CP, I can see where that would have merit. 


Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-20 Thread John P. Baker
Rob,

In many instances you would be correct.  However, in this case, the
decisions targeting a specific backing storage pool are made either at LOGON
time or during a DEFINE STORAGE command.  This is actually a very simple
approach to the problem.  Also, once the backing storage pool placement
decision is made, there should be no impact on the instruction path length.

John P. Baker



Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-20 Thread Rob van der Heij
On Sat, Sep 19, 2009 at 6:21 PM, John P. Baker  wrote:

> I recommend that the idea of splitting page space into multiple pools be
> considered, where individual users can be assigned to different pools.  For
> the purposes of discussion, let us consider that following enhancement:

I don't like the idea of using only a subset of your paging capacity for
part of the workload. It's not just about space but also about
throughput. This is imho a very complicated approach to exclude some
(small) important users from an OOM killer. The real question is
whether you can do an OOM killer at all and achieve something useful
by doing so.

Most performance tuning gets harder when you split resources and
consumers in different groups and manage them separately. Sharing is
easier with large numbers.

Rob
-- 
Rob van der Heij
Velocity Software
http://www.velocitysoftware.com/


Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-19 Thread John P. Baker
Rich,

Something else that comes to mind is that page space spills into spool space
when page space fills up.

It may be worth providing system configuration options (both a default and
one for each backing storage pool) that would determine whether page
over-allocation could be spilled into spool space.

John P. Baker



Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-19 Thread Rich Smrcina

Nicely written


--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2010 - Apr 9-14, 2010 Covington, KY


Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

2009-09-19 Thread John P. Baker
All,

 

Since we have now beaten the issue of storage management to death, I would
like to set forth some concrete ideas for consideration.

 

First, it has been pointed out that it may not currently be possible to
LOGON to MAINT or OPERATOR or to some other service machine in order to
diagnose the problem.

 

I recommend that the idea of splitting page space into multiple pools be
considered, where individual users can be assigned to different pools.  For
the purposes of discussion, let us consider the following enhancement:

 

* In the SYSTEM CONFIG file
    o DEFBACKSTGPOOL pool-id-8
    o BACKSTGPOOL pool-id-8 volser-6
* In the CP directory
    o OPTION BACKSTGPOOL pool-name-8
* Extend the CLASS B CP QUERY command
    o QUERY BACKSTGPOOL user-id-8
    o QUERY DEFBACKSTGPOOL
* Extend the CLASS B CP SET command
    o SET BACKSTGPOOL user-id-8 {DEFAULT | pool-name-8}
* Extend the CLASS G CP QUERY command
    o QUERY BACKSTGPOOL

 

Each paging volume will be allocated to a specific backing storage pool.

 

A LOGON will be rejected if the backing storage pool does not exist.

 

The SET BACKSTGPOOL command will be rejected if the backing storage pool
does not exist.

 

Second, provide a specification on whether a virtual machine requires full
backing storage for its defined memory size.

 

* In the SYSTEM CONFIG file
    o DEFBACKSTG {SYSTEM | VMSIZE}
* In the CP directory
    o OPTION BACKSTG {DEFAULT | SYSTEM | VMSIZE}
* Extend the CLASS B CP QUERY command
    o QUERY BACKSTG user-id-8
    o QUERY DEFBACKSTG
* Extend the CLASS B CP SET command
    o SET BACKSTG user-id-8 {DEFAULT | SYSTEM | VMSIZE}
* Extend the CLASS G CP QUERY command
    o QUERY BACKSTG

 

If BACKSTG is set or defaulted to SYSTEM, page allocation will continue to
operate as it does today.

 

If BACKSTG is set or defaulted to VMSIZE, there must be available within the
backing storage pool sufficient space to accommodate the entirety of the
specified VMSIZE, otherwise the LOGON, DEFINE STORAGE, or SET BACKSTG
command will fail.

 

The SET BACKSTG command will force a virtual machine reset to occur.
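
To make the two rules above concrete (reject a LOGON when the pool does not exist, and reject it when BACKSTG is VMSIZE and the pool cannot back the whole virtual machine), here is a minimal sketch. None of this exists in CP. The names `BackingPool`, `logon`, and `LogonRejected`, and the idea of tracking reservations as a slot count, are assumptions made only for illustration.

```python
from dataclasses import dataclass

PAGE = 4096  # bytes per page slot


class LogonRejected(Exception):
    """Raised when this sketch refuses a LOGON."""


@dataclass
class BackingPool:
    name: str
    total_slots: int          # page slots on the volumes assigned to the pool
    reserved_slots: int = 0   # slots promised to BACKSTG VMSIZE guests

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.reserved_slots


def logon(pools: dict, pool_name: str, vm_bytes: int,
          backstg: str = "SYSTEM") -> BackingPool:
    """Apply the proposed LOGON rules; return the pool the guest lands in."""
    pool = pools.get(pool_name)
    if pool is None:
        raise LogonRejected(f"backing storage pool {pool_name} does not exist")
    if backstg == "VMSIZE":
        needed = -(-vm_bytes // PAGE)   # round up to whole page slots
        if needed > pool.free_slots:
            raise LogonRejected(
                f"pool {pool_name} cannot back {needed} slots "
                f"({pool.free_slots} free)")
        pool.reserved_slots += needed
    # BACKSTG SYSTEM: no reservation, paging behaves as it does today.
    return pool
```

With a small pool reserved for MAINT and OPERATOR and a separate pool for everyone else, a mistyped 8 TB guest defined with BACKSTG VMSIZE would be refused at LOGON rather than admitted.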

 

These changes will address some of the issues raised.  I am certain that
other changes would be required, and that other ideas should be considered.
Please post your ideas.  Don't hesitate to point out any problems.

 

John P. Baker



Re: VM lockup due to storage typo

2009-09-18 Thread Alan Altmark
On Friday, 09/18/2009 at 10:13 EDT, David Boyes wrote:
> On 9/18/09 9:32 AM, "Bill Holder" wrote:
> 
> > That is indeed one important question, but there was another one, the
> > question of whether this was a denial of service attack exposure, which
> > it is not.
> 
> I think that's a point of view question.

It's all very Humpty Dumpty.  :-)  "Integrity" has a precise meaning with 
regard to APARs.   The *guest* is not doing anything to annoy CP.  CP is 
actually annoying himself trying to instantiate the guest.  Until control 
is given to the guest, nothing can be attributed to the guest.  The walls 
between guests and between the guest and CP have not been breached.  Ergo, 
no integrity problem.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-18 Thread Rich Smrcina

Adam Thornton wrote:
> On Sep 18, 2009, at 9:11 AM, David Boyes wrote:
>
>> I think we're all in violent agreement on that point. Now, the question is
>> what is the best way to put a safety on that gun?
>
> Oooh!  Oooh!  Pick me!  Mandatory User Access Control dialog boxes
> that pop up and make you click OK any time you want to breathe.
>
> Adam

Would those be 3270 flower boxes?

--
Rich Smrcina


Re: VM lockup due to storage typo

2009-09-18 Thread Lee Stewart
While I agree it's not a DoS "attack" exposure, the system issued no 
messages and allowed no input on any console (via tn3270, OSA ICC 
console, HMC 3270 or HMC Operating system messages).  If we had a way to 
enter a command or two (probably an IND first), we could have forced off 
the offender and not hard crashed 30+ other Oracle servers.


As someone suggested, CP was probably busy allocating paging structures 
etc.  But should that be to the exclusion of any console input or 
operator control?   To have an entire LPAR appear hung to all consoles, 
and all Linuxes become non-responsive for 15-20-30 minutes certainly 
seems like a DoS to me...


Lee

Bill Holder wrote:

> I see this as three separate questions (with my answers):
>
> Is it a denial of service attack exposure?
> - Clearly not.
>
> Is it a defect?
> - I don't believe so, for the base issue of whether VM
>   should allow a privileged user to do something destructive,
>   though there may well be defects or scalability / constraint
>   shortcomings exposed by the hang (we'd need to see a dump to
>   understand what's really happening).
>
> Is this an area ripe for improvement, could/should VM be
> smarter about preventing a privileged user from doing something
> dangerous or destructive?
> - Sure.  I won't tell you not to open a requirement.
>
> - Bill Holder, z/VM Development, IBM




--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 3:41 PM, "Brian Nielsen"  wrote:


> A scenario that hasn't been mentioned deals with draining a PAGE volume.
> 
> The calculation of "defined paging space" might be considered fuzzy if a
> PAGE volume is being DRAINed.  Of course, you could be strict and consider
> such a volume as undefined, but there will be cases where storage
> requirements for a guest are less than the available page space but put
> the total demand above "defined paging space".

Good point. I think that I would consider a page volume marked as draining
as unavailable space as soon as CP starts the DRAIN operation, but you're
right that the wording I used is ambiguous. I'll change it to read
"available and online paging space".

I'll wait a day or so and see if anyone else has comments and resubmit it.
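
One way to read "available and online paging space" is sketched below: a volume counts only if it is online and not draining. The record layout and the sample volume labels are invented for this example and do not correspond to any CP control block or real configuration.

```python
# Hypothetical volume records; field names and labels are made up.
volumes = [
    {"volser": "PAGE01", "total": 600_000, "used": 150_000, "online": True,  "draining": False},
    {"volser": "PAGE02", "total": 600_000, "used": 400_000, "online": True,  "draining": True},
    {"volser": "PAGE03", "total": 600_000, "used": 0,       "online": False, "draining": False},
]


def available_page_slots(vols) -> int:
    """Free slots on volumes that are online and not being drained."""
    return sum(v["total"] - v["used"]
               for v in vols
               if v["online"] and not v["draining"])


if __name__ == "__main__":
    print(available_page_slots(volumes))
```

Treating a draining volume as gone from the moment the DRAIN starts is the strict reading. A looser rule would have to decide how much of its remaining free space is still safe to count.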


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 3:50 PM, "Tom Duerbusch"  wrote:

> The problem I would have, is my MAINT user is defined with 1 GB.  That is so I
> can process large reader files.
> The very vast majority of the time, I'm only using a few MB.
> Would your fix prevent MAINT from logging on when we are at, or near, the
> discussed problem?
> Operations also has some userids of a similar nature.

I don't want to be too prescriptive here -- gotta give Alan something to
chew on -- but I would expect that there would need to be some exemption
mechanism for userids that are known to need extra humungous virtual machine
sizes and are known to be reasonably well behaved.

If IBM shipped an ESM by default (even an awful one), I'd say that should be
done in the ESM, but that's another crusade. 


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 4:27 PM, "Schuh, Richard"  wrote:

> Does "the current physical storage" refer to main or main + xstore? Also, is
> there any consideration of the total virtual storage or working sets of the
> in-Queue, in-memory, or logged-on users in the calculation? I wouldn't want a
> dozen users of 991G each logging on to my system that has only 1.02TB total
> page+physical memory.
> 
> It might be better to have a config file maximum and simply measure VM size
> against it - a MAXSTORE directory option that has been generalized, so to
> speak. Of course, any MAXSTORE directory entry that is lower would be
> respected. SET commands could temporarily lift or lower the limit for the
> system or for specific users.

AFAICT, most of the Xstore I see out there is configured to be page cache,
so I usually would think of it as configured online paging space.

I posed the problem in the requirement as generally as possible. In most
cases, IBM doesn't like overly specific suggestions in requirements, so I
kept my suggestion pretty generalized.

If others submit requirements, I suspect it'll be more likely to get their
attention and get a solution created. 


Re: VM lockup due to storage typo

2009-09-18 Thread Schuh, Richard
The action when spool fills has been to make the virtual printers and punches 
not ready for any user attempting to write. That does keep the system from 
crashing, but most systems running in the various VMs do not know how to handle 
it. Recovery can be a problem. It is almost as bad as recovering from a crashed 
SFS server. Pausing the spool hog(s) is a good idea, especially if it can be 
done early enough to prevent devices from being made not ready. 

Pausing page space hogs may be tougher to do. I can IPL a TPF system that is 
streaming dumps and not do whatever caused it to dump. I can also purge the 
individual dump files. I have no such action that I can take for a page space 
hog. In fact, the space it occupies will remain allocated to it until it either 
logs off or does a system reset. About the only thing I can do is force it. I 
suppose it would be possible to redefine its storage, but that would leave it 
in a virtual system reset state, so I might as well force it. 


Regards, 
Richard Schuh 

 


Re: VM lockup due to storage typo

2009-09-18 Thread Marcy Cortes
VM64461 puts the brakes on console spooling by detecting that something crazy 
is going on and may exhaust all of VM's memory, and pauses the virtual machine 
to allow the writes to disk to take place and the memory to get back under 
control.  I believe messages are put out.  My understanding of that may be off 
a little, but that's the gist of it.

I'd like to see something like that.  If a virtual machine is up and running 
and CP sees that it is grabbing all of the page space at an excessive rate, or 
if it is in danger of not getting its page management blocks into memory, then 
stun it (or maybe even a parm that says no one user can use more than x% of 
page).  Put out a message to Operator about "Userid BIGBAD has been halted due 
to excessive memory consumption" or something like that.
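
The "no one user can use more than x% of page" idea could look something like the sketch below. It is illustrative only: `guests_to_halt` and `operator_message` are hypothetical names, and a real implementation would live inside CP rather than in a script polling usage figures.

```python
def guests_to_halt(page_slots_by_user: dict, total_page_slots: int,
                   max_pct: float) -> list:
    """Return the userids whose page usage exceeds max_pct of all page space."""
    limit = total_page_slots * max_pct / 100
    return sorted(user for user, used in page_slots_by_user.items()
                  if used > limit)


def operator_message(userid: str) -> str:
    """Format the operator message suggested in the post."""
    return (f"Userid {userid} has been halted due to "
            f"excessive memory consumption")


if __name__ == "__main__":
    usage = {"BIGBAD": 700_000, "LINUX01": 40_000, "LINUX02": 55_000}
    for user in guests_to_halt(usage, total_page_slots=1_000_000, max_pct=50):
        print(operator_message(user))
```

A rate-based trigger (slots consumed per interval) would need two samples per guest instead of one, but the shape of the check is the same.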


Marcy 


> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of David Boyes
> Sent: Friday, September 18, 2009 10:49 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> On 9/18/09 11:38 AM, "Bill Holder"  wrote:
> 
> > On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote:
> >> I think we're all in violent agreement on that point. Now, the 
> >> question is what is the best way to put a safety on that gun?
> > Is this a procedural or technical implementation question (or both)?
> > For the former, I'd say a requirement is appropriate.
> 
> OK, got that covered and done.
> 
> > For the latter,
> > let's have at it.  :)
> 
> As I suggested in the requirement:
> 
> Possible solution would be to provide a SYSTEM CONFIG option 
> (Check_Resource_Alloc_Sanity for discussion purposes) and 
> associated SET command to check LOGIN, DEF STOR, and IPL 
> events to determine whether the requested resources (default 
> virtual storage size for LOGIN, new value for virtual storage 
> for DEF STOR, and current virtual storage size at time of 
> issue for IPL) are greater than the current physical storage 
> and defined paging space. If check is true, then issue a 
> warning message and cancel the action. 
> 
> Option defaults to ON, can be turned off by class A user SET command.
> 
> Not perfect, but would catch most of the scenarios that have 
> been discussed so far. 
> 

Re: VM lockup due to storage typo

2009-09-18 Thread Schuh, Richard
Does "the current physical storage" refer to main or main + xstore? Also, is 
there any consideration of the total virtual storage or working sets of the 
in-Queue, in-memory, or logged-on users in the calculation? I wouldn't want a 
dozen users of 991G each logging on to my system that has only 1.02TB total 
page+physical memory.

It might be better to have a config file maximum and simply measure VM size 
against it - a MAXSTORE directory option that has been generalized, so to 
speak. Of course, any MAXSTORE directory entry that is lower would be 
respected. SET commands could temporarily lift or lower the limit for the 
system or for specific users. 

Regards, 
Richard Schuh 
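
A rough sketch of the generalized-MAXSTORE idea above, written in Python purely for illustration. CP has no such interface; the names `effective_limit`, `system_max`, and `override` are hypothetical. The rule shown is that the lower of a system-wide maximum and the user's own directory MAXSTORE applies, unless a temporary SET-style override is in force. The last lines work through the dozen-users arithmetic.

```python
# Illustrative sketch only: none of these names exist in CP.
GB = 1
TB = 1024 * GB  # all sizes below are expressed in GB


def effective_limit(system_max, user_maxstore=None, override=None):
    """Return the storage ceiling that applies to one user.

    A temporary override (the hypothetical SET command) wins outright;
    otherwise the lower of the system maximum and the user's directory
    MAXSTORE is respected.
    """
    if override is not None:
        return override
    if user_maxstore is None:
        return system_max
    return min(system_max, user_maxstore)


def allowed(requested, system_max, user_maxstore=None, override=None):
    """True when a requested virtual storage size fits under the ceiling."""
    return requested <= effective_limit(system_max, user_maxstore, override)


# The example from the message: a dozen 991G users on a system with
# 1.02TB of page + physical memory.
total_demand = 12 * 991 * GB
capacity = 1.02 * TB
print(total_demand, capacity, total_demand > capacity)
```

Note that each 991G user fits under 1.02TB on its own, so a per-user ceiling by itself would admit all twelve; only a test against aggregate demand catches that case.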

 


Re: VM lockup due to storage typo

2009-09-18 Thread Tom Duerbusch
The problem I would have is that my MAINT user is defined with 1 GB.  That is so I 
can process large reader files.
The vast majority of the time, I'm only using a few MB.

Would your fix prevent MAINT from logging on when we are at, or near, the 
discussed problem?
Operations also has some userids of a similar nature.

Tom Duerbusch
THD Consulting



Re: VM lockup due to storage typo

2009-09-18 Thread Brian Nielsen
On Fri, 18 Sep 2009 13:49:27 -0400, David Boyes wrote:

>On 9/18/09 11:38 AM, "Bill Holder" wrote:
>
>> On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote:
>>> I think we're all in violent agreement on that point. Now, the question is
>>> what is the best way to put a safety on that gun?
>> Is this a procedural or technical implementation question (or both)?
>> For the former, I'd say a requirement is appropriate.
>
>OK, got that covered and done.
>
>> For the latter,
>> let's have at it.  :)
>
>As I suggested in the requirement:
>
>Possible solution would be to provide a SYSTEM CONFIG option
>(Check_Resource_Alloc_Sanity for discussion purposes) and associated SET
>command to check LOGIN, DEF STOR, and IPL events to determine whether the
>requested resources (default virtual storage size for LOGIN, new value for
>virtual storage for DEF STOR, and current virtual storage size at time of
>issue for IPL) are greater than the current physical storage and defined
>paging space. If check is true, then issue a warning message and cancel the
>action.
>
>Option defaults to ON, can be turned off by class A user SET command.
>
>Not perfect, but would catch most of the scenarios that have been discussed
>so far.

A scenario that hasn't been mentioned deals with draining a PAGE volume.

The calculation of "defined paging space" might be considered fuzzy if a
PAGE volume is being DRAINed.  Of course, you could be strict and consider
such a volume as undefined, but there will be cases where storage
requirements for a guest are less than the available page space but put
the total demand above "defined paging space".

Brian
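
To make the draining case concrete, here is a small Python sketch of the two ways "defined paging space" could be counted. It is an assumption-laden illustration, not CP behavior: the `PageVolume` type, the volume labels, and the sizes are all made up.

```python
# Illustrative sketch only; volume labels and sizes are invented.
from dataclasses import dataclass


@dataclass
class PageVolume:
    label: str
    size_gb: float
    draining: bool = False


def defined_paging_space(volumes, strict=True):
    """Sum paging space in GB.

    In strict mode a volume that is being DRAINed is treated as undefined
    and contributes nothing; otherwise it still counts in full.
    """
    return sum(v.size_gb for v in volumes if not (strict and v.draining))


volumes = [
    PageVolume("PAGE01", 8.0),
    PageVolume("PAGE02", 8.0),
    PageVolume("PAGE03", 8.0, draining=True),
]

strict_total = defined_paging_space(volumes, strict=True)
lenient_total = defined_paging_space(volumes, strict=False)
print(strict_total, lenient_total)
```

Any request that falls between the two totals is accepted or rejected purely according to how the draining volume is counted, which is the fuzziness described above.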


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 11:38 AM, "Bill Holder" wrote:

> On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote:
>> I think we're all in violent agreement on that point. Now, the question is
>> what is the best way to put a safety on that gun?
> Is this a procedural or technical implementation question (or both)?
> For the former, I'd say a requirement is appropriate.

OK, got that covered and done.

> For the latter,  
> let's have at it.  :)

As I suggested in the requirement:

Possible solution would be to provide a SYSTEM CONFIG option
(Check_Resource_Alloc_Sanity for discussion purposes) and associated SET
command to check LOGIN, DEF STOR, and IPL events to determine whether the
requested resources (default virtual storage size for LOGIN, new value for
virtual storage for DEF STOR, and current virtual storage size at time of
issue for IPL) are greater than the current physical storage and defined
paging space. If check is true, then issue a warning message and cancel the
action. 

Option defaults to ON, can be turned off by class A user SET command.

Not perfect, but would catch most of the scenarios that have been discussed
so far. 
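
The proposed check can be sketched in a few lines of Python. This is only an illustration of the rule as worded above: Check_Resource_Alloc_Sanity is a proposal rather than an existing CP option, and the function names and message text here are invented.

```python
# Illustrative sketch only: models the proposed check, not real CP code.

CHECKED_EVENTS = ("LOGIN", "DEF STOR", "IPL")


def exceeds_resources(requested_gb, physical_gb, paging_gb):
    """True when the request is larger than physical storage plus paging space."""
    return requested_gb > physical_gb + paging_gb


def check_event(event, requested_gb, physical_gb, paging_gb, check_enabled=True):
    """Return (proceed, message) for a LOGIN, DEF STOR, or IPL event.

    check_enabled stands in for the proposed option, which defaults to ON
    and could be turned off by a class A user.
    """
    if event not in CHECKED_EVENTS:
        raise ValueError(f"unchecked event type: {event}")
    if check_enabled and exceeds_resources(requested_gb, physical_gb, paging_gb):
        available = physical_gb + paging_gb
        return False, f"{event} cancelled: {requested_gb}G requested, {available}G available"
    return True, ""


# Figures quoted elsewhere in this thread: an 8TB guest on a 175G LPAR
# with about 175GB of page space.
print(check_event("IPL", 8192, 175, 175))
print(check_event("IPL", 8192, 175, 175, check_enabled=False))
```

With the check on, the 8TB IPL is refused; with it off, the request goes through exactly as it does today.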


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 11:58 AM, "Schuh, Richard"  wrote:

> Hey Zeke Boyes, who is Bill Schuh? I don't even know of a relative by that
> name :-)

It's your lawful good alter ego, arch nemesis of Chuckie. The Saturday
morning cartoon starring the Billster debuts next TV season, along with
"Danger at Rockland Island: Endicott in Peril" and "The Poughkeepsie Seven",
a drama about seven virtualization protestors illegally imprisoned and
tortured in building 705 for resisting the One True OS for System z. 8-)


Re: VM lockup due to storage typo

2009-09-18 Thread Bob Levad
I think the real problem here is that when CP is thrashing about for
whatever reason, it can be very hard to get control of a VM prompt to
manually fix things.  Perhaps if CP could determine that some resource is
being sorely abused, it could degrade the offending machine at least to the
point that a favored user can do a bit of problem determination and possibly
force the offender(s).

Our operator (PROPST) machine has option quickdsp and share rel 1.  I
hope it never goes astray, but I also have a bit of hope that I will be able
to re-connect to it if some other virtual machine buggers the system so I
can straighten things out.

Bob.




Re: VM lockup due to storage typo

2009-09-18 Thread Schuh, Richard
While you are at it, make it self-healing, including the updating of the source 
code. Or at least include a Medical Tricorder with each system.:-)

> We recognize that CP must be more forgiving and we are working to that
> end, examining a variety of solutions that include inertial dampening,
> tritanium plating, Kevlar(R), stacks of phone books, as well as taking the
> gun away from you and beating you over the head with it (aka "the
> retaliatory baseball bat subroutine").
 
You may need dedicated DUMP packs in order to be able to do this. CP may have 
outgrown the size of the dump space and cannot allocate a larger space as a 
result of the problem. 

> The bottom line is that none of us want the system to go out to lunch.
> That doesn't serve anyone's purposes.  If it happens, get a restart dump
> and let us know.  Sometimes it's *not* your fault.  Really!  :-)
> 
> Alan Altmark
> z/VM Development
> IBM Endicott
> 

Re: VM lockup due to storage typo

2009-09-18 Thread Alan Altmark
On Thursday, 09/17/2009 at 01:22 EDT, "Schuh, Richard" wrote:
> An IPL isn't an action? True, the guest was not aware that it would harm the
> system, but absent that action by the guest, there would not have been a
> problem. The guest was an unwitting agent, a part of a bot net, as it were.

The case where the administrator loads the chamber and the user pulls the 
trigger to cause an outage is, admittedly, near a line between "normal 
defect" and "integrity defect".  Who, exactly, caused the problem?  I 
can't blame the user - they just logged on with no opportunity (or 
responsibility!) to review their directory prior to login (how?).  This 
particular problem must be laid at the feet of the sysadmin with all due 
ceremony, along with any other administrative snafu.

But I assert that even that is a red herring.  The central issue is not 
who chambered the weapon or who pulled the trigger.  Rather, it is an 
issue centered on how much shielding is or should be present to mitigate 
mistakes or errors in judgement by the sysadmins, and, to some extent, 
from CP's own attempts to make you happy.

We recognize that CP must be more forgiving and we are working to that 
end, examining a variety of solutions that include inertial dampening, 
tritanium plating, Kevlar(R), stacks of phone books, as well as taking the 
gun away from you and beating you over the head with it (aka "the 
retaliatory baseball bat subroutine").

The bottom line is that none of us want the system to go out to lunch. 
That doesn't serve anyone's purposes.  If it happens, get a restart dump 
and let us know.  Sometimes it's *not* your fault.  Really!  :-)

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-18 Thread John P. Baker
Personally, I have always preferred BAC (Broken As Coded).

John P. Baker



Re: VM lockup due to storage typo

2009-09-18 Thread Schuh, Richard
Hey Zeke Boyes, who is Bill Schuh? I don't even know of a relative by that name 
:-)

Working as Documented is another version of WAD. My stance is that if the 
system dies because of a design "feature", then perhaps that feature ought to 
be reconsidered. Certainly, there is no way to anticipate all possible feature 
failures, but when one comes up that is preventable, then the design ought to 
be tweaked. All of the discussion about whether it is or is not a DOS is 
totally irrelevant, especially to those who have been victimized.   

(I thought that Lyn Hadley eliminated WAD and BAD from the IBM vernacular years 
ago.)

Regards, 
Richard Schuh 

 


Re: VM lockup due to storage typo

2009-09-18 Thread Bill Holder
On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote:

...
>I think we're all in violent agreement on that point. Now, the question is
>what is the best way to put a safety on that gun?
>

Is this a procedural or technical implementation question (or both)?

For the former, I'd say a requirement is appropriate.  For the latter,
let's have at it.  :)


Re: VM lockup due to storage typo

2009-09-18 Thread Brian Nielsen
On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote:

>I think we're all in violent agreement on that point. Now, the question is
>what is the best way to put a safety on that gun?


Since the Linux OOM model is to kill a process, just kill some Linux 
virtual machine to free up space...


Brian Nielsen


Re: VM lockup due to storage typo

2009-09-18 Thread Rob van der Heij
On Fri, Sep 18, 2009 at 4:11 PM, David Boyes  wrote:

> I think we're all in violent agreement on that point. Now, the question is
> what is the best way to put a safety on that gun?

IMHO the suggested solutions so far merely bend the barrel upwards.
This may deflect the bullet from your own foot in some usage
scenarios, but likely hurts other feet and makes the thing in general
hard to aim ;-)

Rob


Re: VM lockup due to storage typo

2009-09-18 Thread Adam Thornton

On Sep 18, 2009, at 9:11 AM, David Boyes wrote:

> I think we're all in violent agreement on that point. Now, the question is
> what is the best way to put a safety on that gun?

Oooh!  Oooh!  Pick me!  Mandatory User Access Control dialog boxes
that pop up and make you click OK any time you want to breathe.


Adam


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 9:38 AM, "Huegel, Thomas"  wrote:

> A little OT, but curiosity calls.. What is the max. storage that z/LINUX
> can use? 

Last time I looked at the Linux memory management code (a while back) it was
4TB, but that's probably expanded by now. The documented z/VM limit of 8TB
has been around for a while; I think that appeared in 5.2.


Re: VM lockup due to storage typo

2009-09-18 Thread David Boyes
On 9/18/09 9:32 AM, "Bill Holder" wrote:

> That is indeed one important question, but there was another one, the
> question of whether this was a denial of service attack exposure, which it
> is not.

I think that's a point of view question.

If I am another user on the same VM system, happy within my cozy little
class G box, and the hypervisor admin does something outside of my control
to some OTHER user that causes CP to choke, then from the original user's
perspective it IS a DOS attack because it's something that is out of my
control, starves ME, and causes ME to choke without reason.

An analogous parallel case in the distributed system world would be a ping
flood attack on a network segment. The innocent get hurt along with the
intended target by being starved of access to the network, and thus lose the
ability to function according to design.

From the hypervisor admin's POV, then yeah, it's just doing what it's told
to do. It's correct operation, working as documented.

I think Bill Schuh and Marcy and myself are arguing for the former
viewpoint. I think you and Adam are arguing from the latter view.

> I'm not disagreeing that it would be nice if there were some sort
> of "are you sure" safety net before the system proceeded to try to do
> something suicidal, but that's a design and requirements question, not a
> defect question.

I think we're all in violent agreement on that point. Now, the question is
what is the best way to put a safety on that gun? 


Re: VM lockup due to storage typo

2009-09-18 Thread Bill Holder
I see this as three separate questions (with my answers):

Is it a denial of service attack exposure?
- Clearly not.

Is it a defect?
- I don't believe so, for the base issue of whether VM
  should allow a privileged user to do something destructive,
  though there may well be defects or scalability / constraint
  shortcomings exposed by the hang (we'd need to see a dump to
  understand what's really happening).

Is this an area ripe for improvement, could/should VM be
smarter about preventing a privileged user from doing something
dangerous or destructive?
- Sure.  I won't tell you not to open a requirement.  

- Bill Holder, z/VM Development, IBM


Re: VM lockup due to storage typo

2009-09-18 Thread Huegel, Thomas
A little OT, but curiosity calls.. What is the max. storage that z/LINUX
can use? 



Re: VM lockup due to storage typo

2009-09-18 Thread Bill Holder
That is indeed one important question, but there was another one, the
question of whether this was a denial of service attack exposure, which it
is not.  I'm not disagreeing that it would be nice if there were some sort
of "are you sure" safety net before the system proceeded to try to do
something suicidal, but that's a design and requirements question, not a
defect question.

- Bill Holder, z/VM Development, IBM



Re: VM lockup due to storage typo

2009-09-17 Thread Marcy Cortes
Well, there is precedent here of VM dev fixing things that are too large/too 
much that take down VM.
See VM64461 and VM6
 
I'll probably look into the possibility of a vmsecure exit to add a safety to 
my gun for now.

Marcy 

"This message may contain confidential and/or privileged information. If you 
are not the addressee or authorized to receive this for the addressee, you must 
not use, copy, disclose, or take any action based on this message or any 
information herein. If you have received this message in error, please advise 
the sender immediately by reply e-mail and delete this message. Thank you for 
your cooperation."

Re: VM lockup due to storage typo

2009-09-17 Thread Adam Thornton

On Sep 17, 2009, at 5:36 PM, David Boyes wrote:

> Whether it should march off a cliff without at least questioning the order
> is the question at hand.


Of course it should.

Yes, my Unix is showing.

Adam


Re: VM lockup due to storage typo

2009-09-17 Thread David Boyes
On 9/17/09 2:16 PM, "Adam Thornton"  wrote:

 
> "Administrator typo" is not a failure mode the operating system is
> designed to protect you from.

That may be true now, but I think the point of the argument is that it
should not be. 

On VMS, if you have a SYSTEM priv bit set, the system will still warn you if
you're about to do something that seems stupid. If there is an architected
limit (note that the 9.7TB got clipped to 8TB, so SOMETHING noticed a
problem), then it's not too unreasonable for the system to take defensive
measures and issue a warning that all is not right in in the kingdom of
Denmark, cream or no cream dresses.

It seems like a basic defense that if CP notices you starting something that
it KNOWS it may not have resources to complete, requiring confirmation that
you know what you're doing (or about to do) is a good defensive measure.

Did the system do what you told it to do when you told it to do it? Yes.
Whether it should march off a cliff without at least questioning the order
is the question at hand.

-- db


Re: VM lockup due to storage typo

2009-09-17 Thread Adam Thornton

On Sep 17, 2009, at 1:58 PM, Bill Holder wrote:

> I'd agree with that point in cases where it's less clear, but in
> this case, it's perfectly clear that the user action would have
> been harmless if not for the administrator typo


Yabbut

"Administrator typo" is not a failure mode the operating system is  
designed to protect you from.  If you have authority to edit the user  
directory, then, well, your gun, your foot.


Adam


Re: VM lockup due to storage typo

2009-09-17 Thread Lee Stewart
FYI, the system in question had about 175GB of page space - 22 mod 9s. 
Currently the system does NO paging.  All the guests fit within real 
storage.  (Of course there will eventually be more guests on that LPAR, 
so sooner or later we'll start to page.)


Lee

Rob van der Heij wrote:

> On Thu, Sep 17, 2009 at 6:34 PM, Bill Holder wrote:
>
>> Occurrences of this sort of problem are likely to result in temporary
>> or permanent hangs of both individual users and eventually the entire
>> system, which supports the theory in this case.  I'd really need to
>> see a dump of the system in question to confirm this hypothesis,
>> however.
>
> And I think Lee has not yet mentioned how much paging space he had
> allocated. With a 175G LPAR you would think he has at least 175G worth
> of virtual machines, so 350G of paging space... for the moment the
> next virtual machine went over the edge. I very much doubt he was that
> well prepared. With that amount of space, things might have gotten
> slow but there's a fair chance CP would have survived the abuse.
>
> Rob




--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com
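
For reference, the arithmetic behind the figures in this message can be sketched as follows. The factor of two is inferred from Rob's numbers (175G of virtual machines, 350G of paging space); it is treated here as an assumption, and the function name is made up.

```python
# Illustrative arithmetic only, using the figures quoted in the message above.

def suggested_paging_gb(total_virtual_gb, factor=2.0):
    """Paging space implied by a 'factor times total virtual storage' rule of thumb."""
    return factor * total_virtual_gb


lpar_real_gb = 175      # Rob's "175G LPAR", taken as the total virtual storage
actual_paging_gb = 175  # Lee's "about 175GB of page space"

suggested = suggested_paging_gb(lpar_real_gb)
shortfall = suggested - actual_paging_gb
print(suggested, shortfall)
```

By that rule of thumb the system had roughly half the paging space Rob had in mind, which made no difference while nothing paged and a great deal of difference once a single guest asked for terabytes.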


Re: VM lockup due to storage typo

2009-09-17 Thread P S
On Thu, Sep 17, 2009 at 10:58 AM, Bill Holder  wrote:
> I'd agree with that point in cases where it's less clear, but in
> this case, it's perfectly clear that the user action would have
> been harmless if not for the administrator typo.  I don't disagree
> that more protection at the user action level would be nice in
> this case, that's really different discussion than whether this
> constitutes a denial of service exposure.

OK, I buy that. If the sysprog does a UCR to make SHUTDOWN class G, it
isn't VM's fault if a user issues SHUTDOWN.


Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
I'd agree with that point in cases where it's less clear, but in
this case, it's perfectly clear that the user action would have
been harmless if not for the administrator typo.  I don't disagree
that more protection at the user action level would be nice in 
this case, that's really a different discussion than whether this
constitutes a denial of service exposure.  

There's a reason that trusted users are called that, because 
they have the power to shoot themselves, and the entire system.  
We cannot protect against every possible harmful act by trusted
users, whether accidental or malicious. 

Regards,
- Bill Holder

On Thu, 17 Sep 2009 10:48:53 -0700, Schuh, Richard wrote:

>I don't think you can differentiate between the root cause and the
>immediate cause when it comes to security and integrity. You may not
>necessarily be able to detect the root cause, but you must protect the
>system against the immediate cause if at all possible.
>
>Regards,
>Richard Schuh
>
>


Re: VM lockup due to storage typo

2009-09-17 Thread Schuh, Richard
I don't think you can differentiate between the root cause and the immediate 
cause when it comes to security and integrity. You may not necessarily be able 
to detect the root cause, but you must protect the system against the immediate 
cause if at all possible.

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
> Sent: Thursday, September 17, 2009 10:35 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> Sure, true enough, but the exposure was not caused by the 
> guest action.  Yes, it wouldn't have happened had the guest 
> not logged on an IPLed, but that wasn't the root cause, the typo was.
> The action of the class G user didn't cause the problem, 
> therefore it's not a Denial of Service attack case.  Note 
> that I'm not saying it's not APARable, however.
> 
> Regards,
> - Bill Holder
> 
> On Thu, 17 Sep 2009 10:21:05 -0700, Schuh, Richard wrote:
> 
> >An IPL isn't an action? True, the guest was not aware that it would
> >harm the system, but absent that action by the guest, there would
> >not have been a problem. The guest was an unwitting agent,
> >a part of a bot net, as it were.
> >
> >Regards,
> >Richard Schuh
> >
> > 
> >
> >> -Original Message-
> >> From: The IBM z/VM Operating System
> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
> >> Sent: Thursday, September 17, 2009 9:14 AM
> >> To: IBMVM@LISTSERV.UARK.EDU
> >> Subject: Re: VM lockup due to storage typo
> >> 
> >> I don't entirely agree.  The action of the guest did not cause harm
> >> to CP, it was the action of the operations staff which did.  This is
> >> not a denial of service case that I can see.
> >> 
> >> Bill Holder
> >> z/VM Development, Memory Management team leader, IBM
> >> 
> >> On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard wrote:
> >> 
> >> >Maybe CP couldn't know that the guest would do something bad, but it
> >> >should know that it has opened itself to the possibility that the guest
> >> >could, in normal operation, cause the problem.
> >> >One of Alan's first precepts of information security and integrity is
> >> >that the guest cannot be allowed to harm the CP. This clearly violates
> >> >that.
> >> >
> >> >Regards,
> >> >Richard Schuh
> >> >
> >> > 
> >> >
> >> >> -Original Message-
> >> >> From: The IBM z/VM Operating System 
> >> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> >> >> Sent: Tuesday, September 15, 2009 9:19 AM
> >> >> To: IBMVM@LISTSERV.UARK.EDU
> >> >> Subject: Re: VM lockup due to storage typo
> >> >> 
> >> >> CP wouldn't know at IPL time, the guest would, not could, but would
> >> >> cause such harm.
> >> >> 
> >> >> Just because you say you can use xxx GB, doesn't mean you would 
> >> >> actually use them.
> >> >> 
> >> >> When page fills, it over flows to spool.
> >> >> When spool fills, CP abends on the next pageout.
> >> >> 
> >> >> Tom Duerbusch
> >> >> THD Consulting
> >> >> 
> >> >> >>> Marcy Cortes  9/15/2009
> >> >> 11:02 AM >>>
> >> >> See a thread on this list with subject "Sanity check?" from Oct 2007
> >> >> for what happened when I did the same thing ;)
> >> >> 
> >> >> You probably filled page space.
> >> >> 
> >> >> I still think IBM should refuse to IPL a guest that will cause such
> >> >> harm.
> >> >> 
> >> >> 
> >> >> Marcy
> >> >> 
> >> >> "This message may contain confidential and/or privileged information.
> >> >> If you are not the addressee or authorized to receive this for the
> >> >> addressee, you must not use, copy, disclose, or take any action based
> >> >> on this me

Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
Sure, true enough, but the exposure was not caused by the guest
action.  Yes, it wouldn't have happened had the guest not logged
on and IPLed, but that wasn't the root cause, the typo was.
The action of the class G user didn't cause the problem, therefore
it's not a Denial of Service attack case.  Note that I'm not
saying it's not APARable, however.

Regards,
- Bill Holder

On Thu, 17 Sep 2009 10:21:05 -0700, Schuh, Richard wrote:

>An IPL isn't an action? True, the guest was not aware that it would harm
>the system, but absent that action by the guest, there would not have been
>a problem. The guest was an unwitting agent, a part of a bot net, as it
>were.
>
>Regards, 
>Richard Schuh 
>
> 
>
>> -Original Message-
>> From: The IBM z/VM Operating System 
>> [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
>> Sent: Thursday, September 17, 2009 9:14 AM
>> To: IBMVM@LISTSERV.UARK.EDU
>> Subject: Re: VM lockup due to storage typo
>> 
>> I don't entirely agree.  The action of the guest did not 
>> cause harm to CP, it was the action of the operations staff 
>> which did.  This is not a denial of service case that I can see.
>> 
>> Bill Holder
>> z/VM Development, Memory Management team leader, IBM
>> 
>> On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard wrote:
>> 
>> >Maybe CP couldn't know that the guest would do something bad, but it
>> >should know that it has opened itself to the possibility that the
>> >guest could, in normal operation, cause the problem.
>> >One of Alan's first precepts of information security and integrity
>> >is that the guest cannot be allowed to harm the CP. This clearly
>> >violates that.
>> >
>> >Regards,
>> >Richard Schuh
>> >
>> > 
>> >
>> >> -Original Message-
>> >> From: The IBM z/VM Operating System
>> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
>> >> Sent: Tuesday, September 15, 2009 9:19 AM
>> >> To: IBMVM@LISTSERV.UARK.EDU
>> >> Subject: Re: VM lockup due to storage typo
>> >> 
>> >> CP wouldn't know at IPL time, the guest would, not could, but would
>> >> cause such harm.
>> >> 
>> >> Just because you say you can use xxx GB, doesn't mean you would 
>> >> actually use them.
>> >> 
>> >> When page fills, it over flows to spool.
>> >> When spool fills, CP abends on the next pageout.
>> >> 
>> >> Tom Duerbusch
>> >> THD Consulting
>> >> 
>> >> >>> Marcy Cortes  9/15/2009
>> >> 11:02 AM >>>
>> >> See a thread on this list with subject "Sanity check?" from Oct 2007
>> >> for what happened when I did the same thing ;)
>> >> 
>> >> You probably filled page space.
>> >> 
>> >> I still think IBM should refuse to IPL a guest that will cause such
>> >> harm.
>> >> 
>> >> 
>> >> Marcy
>> >> 
>> >> "This message may contain confidential and/or privileged information.
>> >> If you are not the addressee or authorized to receive this for the
>> >> addressee, you must not use, copy, disclose, or take any action based
>> >> on this message or any information herein. If you have received this
>> >> message in error, please advise the sender immediately by reply
>> >> e-mail and delete this message. Thank you for your cooperation."
>> >> 
>> >> 
>> >> -Original Message-
>> >> From: The IBM z/VM Operating System 
>> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
>> >> Sent: Tuesday, September 15, 2009 8:39 AM
>> >> To: IBMVM@LISTSERV.UARK.EDU
>> >> Subject: [IBMVM] VM lockup due to storage typo
>> >> 
>> >> Does anyone have an idea of how we might have gotten out of 
>> >> this without an IPL?
>> >> 
>> >> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
>> >> Several guests needed more memory added so the directory was 
>> >> updated and one by one the guests shutdown, logged off and 
>> >> back on.  So far, so good.
>> >> 
>> >> But... In changing the memory for many guests, and it being 

Re: VM lockup due to storage typo

2009-09-17 Thread Schuh, Richard
An IPL isn't an action? True, the guest was not aware that it would harm the 
system, but absent that action by the guest, there would not have been a 
problem. The guest was an unwitting agent, a part of a bot net, as it were.

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
> Sent: Thursday, September 17, 2009 9:14 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> I don't entirely agree.  The action of the guest did not 
> cause harm to CP, it was the action of the operations staff 
> which did.  This is not a denial of service case that I can see.
> 
> Bill Holder
> z/VM Development, Memory Management team leader, IBM
> 
> On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard wrote:
> 
> >Maybe CP couldn't know that the guest would do something bad, but it
> >should know that it has opened itself to the possibility that the
> >guest could, in normal operation, cause the problem.
> >One of Alan's first precepts of information security and integrity
> >is that the guest cannot be allowed to harm the CP. This clearly
> >violates that.
> >
> >Regards,
> >Richard Schuh
> >
> > 
> >
> >> -Original Message-
> >> From: The IBM z/VM Operating System
> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> >> Sent: Tuesday, September 15, 2009 9:19 AM
> >> To: IBMVM@LISTSERV.UARK.EDU
> >> Subject: Re: VM lockup due to storage typo
> >> 
> >> CP wouldn't know at IPL time, the guest would, not could, but would
> >> cause such harm.
> >> 
> >> Just because you say you can use xxx GB, doesn't mean you would 
> >> actually use them.
> >> 
> >> When page fills, it over flows to spool.
> >> When spool fills, CP abends on the next pageout.
> >> 
> >> Tom Duerbusch
> >> THD Consulting
> >> 
> >> >>> Marcy Cortes  9/15/2009
> >> 11:02 AM >>>
> >> See a thread on this list with subject "Sanity check?" from Oct 2007
> >> for what happened when I did the same thing ;)
> >> 
> >> You probably filled page space.
> >> 
> >> I still think IBM should refuse to IPL a guest that will cause such
> >> harm.
> >> 
> >> 
> >> Marcy
> >> 
> >> "This message may contain confidential and/or privileged information.
> >> If you are not the addressee or authorized to receive this for the
> >> addressee, you must not use, copy, disclose, or take any action based
> >> on this message or any information herein. If you have received this
> >> message in error, please advise the sender immediately by reply
> >> e-mail and delete this message. Thank you for your cooperation."
> >> 
> >> 
> >> -Original Message-
> >> From: The IBM z/VM Operating System 
> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> >> Sent: Tuesday, September 15, 2009 8:39 AM
> >> To: IBMVM@LISTSERV.UARK.EDU
> >> Subject: [IBMVM] VM lockup due to storage typo
> >> 
> >> Does anyone have an idea of how we might have gotten out of 
> >> this without an IPL?
> >> 
> >> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
> >> Several guests needed more memory added so the directory was 
> >> updated and one by one the guests shutdown, logged off and 
> >> back on.  So far, so good.
> >> 
> >> But... In changing the memory for many guests, and it being 
> >> late at night after a long day, while meaning to set a 
> >> guest's memory to 9728M, it got set to 9728G.  When that 
> >> guest was cycled we see the message on the console that its
> >> memory was limited to 8TB (HCPLGN093E), then the VM system
> >> appeared to freeze.
> >> 
> >> We couldn't get in via TCP/IP, or the HMC Operating System
> >> Messages screen, or the HMC Integrated 3270.
> >> 
> >> Finally had to IPL.   Even that was weird as I'd have expected
> >> the Load Normal to shutdown, it just IPLed.   We did NoAutolog,
> >> fixed the typo and all came back up ok...
> >> 
> >> I suspect CP was scrambling paging everything in the world out as
> >> Linux tried to initialize that 8TB of memory...   But I'm surprised
> >> I couldn't even get into the HMC consoles (to kill just that one
> >> guest as opposed to all of them)..
> >> 
> >> Any thoughts?
> >> Lee
> >> -- 
> >> 
> >> Lee Stewart, Senior SE
> >> Sirius Computer Solutions
> >> Phone: (303) 996-7122
> >> Email: lee.stew...@siriuscom.com 
> >> Web:   www.siriuscom.com

Re: VM lockup due to storage typo

2009-09-17 Thread Rob van der Heij
On Thu, Sep 17, 2009 at 6:34 PM, Bill Holder  wrote:

> Occurrences of this sort of problem are likely to result in temporary
> or permanent hangs of both individual users and eventually the entire
> system, which supports the theory in this case.  I'd really need to
> see a dump of the system in question to confirm this hypothesis,
> however.

And I think Lee has not yet mentioned how much paging space he had
allocated. With a 175G LPAR you would think he has at least 175G worth
of virtual machines, so 350G of paging space... for the moment the
next virtual machine went over the edge. I very much doubt he was that
well prepared. With that amount of space, things might have gotten
slow but there's a fair chance CP would have survived the abuse.

Rob


Re: VM lockup due to storage typo

2009-09-17 Thread Quay, Jonathan (IHG)
It sounds very similar in symptom to my minidisk cache overcommitment
problem that resulted in CP thrashing (and an APAR).

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Bill Holder
Sent: Thursday, September 17, 2009 12:34 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

I should point out that this hang is likely being misunderstood here.
While this scenario will indeed drive paging over the edge, that's not
likely what happened.  If paging had been driven to that point, the
system would have quickly taken a PGT004 abend and restarted.  Instead,
I believe what happened is likely a most difficult to solve variant on
something that was mentioned before: that is, difficulty allocating CP
structures required to represent the massive amount of storage.  Page
tables are only part of the problem.  The upper level DAT tables (region
and segment) can be up to 4 frames long, and once storage utilization
becomes heavy enough, it becomes fragmented (PGMBK allocation being
a factor here), making it very difficult for CP to allocate contiguous
sets of 3s and 4s.  We spent quite a bit of effort in z/VM 5.3.0
addressing the PGMBK side of this issue, but the harder problem of
the upper level tables remains as a likely constraint point.

Occurrences of this sort of problem are likely to result in temporary 
or permanent hangs of both individual users and eventually the entire 
system, which supports the theory in this case.  I'd really need to 
see a dump of the system in question to confirm this hypothesis, 
however.  

Bill Holder
z/VM Development, Memory Management team lead, IBM  


Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
I should point out that this hang is likely being misunderstood here.
While this scenario will indeed drive paging over the edge, that's not
likely what happened.  If paging had been driven to that point, the
system would have quickly taken a PGT004 abend and restarted.  Instead,
I believe what happened is likely a most difficult to solve variant on
something that was mentioned before: that is, difficulty allocating CP
structures required to represent the massive amount of storage.  Page
tables are only part of the problem.  The upper level DAT tables (region
and segment) can be up to 4 frames long, and once storage utilization
becomes heavy enough, it becomes fragmented (PGMBK allocation being
a factor here), making it very difficult for CP to allocate contiguous
sets of 3s and 4s.  We spent quite a bit of effort in z/VM 5.3.0
addressing the PGMBK side of this issue, but the harder problem of
the upper level tables remains as a likely constraint point.

Occurrences of this sort of problem are likely to result in temporary 
or permanent hangs of both individual users and eventually the entire 
system, which supports the theory in this case.  I'd really need to 
see a dump of the system in question to confirm this hypothesis, 
however.  

Bill Holder
z/VM Development, Memory Management team lead, IBM
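
To put rough numbers on the structures described above, here is a back-of-envelope sketch in Python. It uses the z/Architecture layout (4 KB pages, 1 MB segments, 2048 eight-byte entries in a full segment or region table), which is where the "up to 4 frames" figure comes from. The 8 KB PGMBK size is an assumption rather than something stated in this thread, and the totals assume every page of the guest gets referenced, so treat them as an upper bound and not a measurement of the system in question.

```python
# Back-of-envelope sizes for the translation tables behind an 8 TB guest.
# z/Architecture figures: 4 KB pages, 1 MB segments, 2 GB per segment table.
# The 8 KB PGMBK size is an assumption, not taken from the thread.

KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

FRAME = 4 * KB                 # one real storage frame
SEGMENT = 1 * MB               # storage mapped by one page table
SEGMENT_TABLE_SPAN = 2 * GB    # storage mapped by one full segment table
TABLE_ENTRIES = 2048           # entries in a full segment or region table
ENTRY_BYTES = 8                # bytes per table entry
PGMBK_BYTES = 8 * KB           # assumed size of one PGMBK (maps 1 MB)


def table_estimate(guest_bytes):
    """Return table counts and sizes if every page of the guest is touched."""
    segments = guest_bytes // SEGMENT
    segment_tables = -(-guest_bytes // SEGMENT_TABLE_SPAN)  # ceiling division
    full_table_bytes = TABLE_ENTRIES * ENTRY_BYTES
    return {
        "pgmbks": segments,
        "pgmbk_bytes": segments * PGMBK_BYTES,
        "segment_tables": segment_tables,
        "frames_per_full_table": full_table_bytes // FRAME,
        "segment_table_bytes": segment_tables * full_table_bytes,
    }


if __name__ == "__main__":
    est = table_estimate(8 * TB)
    print(est["pgmbks"], "PGMBKs,", est["pgmbk_bytes"] // GB, "GB")
    print(est["segment_tables"], "segment tables of",
          est["frames_per_full_table"], "contiguous frames each")
```

Under these assumptions the segment tables total only tens of megabytes, but each full table needs four contiguous frames, which is the allocation that fragmentation makes hard.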


Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
No, not at all, that's not what I was saying; what you propose would
obviously be an exposure.  A privileged user (operations staff) can issue
that today.  Putting a loaded gun in the hands of a class G user is not at
all the same thing.  Anything a user at a keyboard can do, a guest program
can do, generally, and they all have to be protected.

On Thu, 17 Sep 2009 09:23:11 -0700, P S  wrote:

>On Thu, Sep 17, 2009 at 9:14 AM, Bill Holder  wrote:

>> I don't entirely agree.  The action of the guest did not cause harm
>> to CP, it was the action of the operations staff which did.  This
>> is not a denial of service case that I can see.
>
>Hm. So by that rationale, we can make STORE H class G, because it
>won't be the *guest* harming CP, it will be the end-user who types the
>command.


Re: VM lockup due to storage typo

2009-09-17 Thread P S
On Thu, Sep 17, 2009 at 9:14 AM, Bill Holder  wrote:
> I don't entirely agree.  The action of the guest did not cause harm
> to CP, it was the action of the operations staff which did.  This
> is not a denial of service case that I can see.

Hm. So by that rationale, we can make STORE H class G, because it
won't be the *guest* harming CP, it will be the end-user who types the
command.


Re: VM lockup due to storage typo

2009-09-17 Thread Bill Holder
I don't entirely agree.  The action of the guest did not cause harm
to CP, it was the action of the operations staff which did.  This
is not a denial of service case that I can see.

Bill Holder
z/VM Development, Memory Management team leader, IBM

On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard wrote:

>Maybe CP couldn't know that the guest would do something bad, but it should
>know that it has opened itself to the possibility that the guest could, in
>normal operation, cause the problem.
>One of Alan's first precepts of information security and integrity is that
>the guest cannot be allowed to harm the CP. This clearly violates that.
>
>Regards, 
>Richard Schuh 
>
> 
>
>> -Original Message-
>> From: The IBM z/VM Operating System 
>> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
>> Sent: Tuesday, September 15, 2009 9:19 AM
>> To: IBMVM@LISTSERV.UARK.EDU
>> Subject: Re: VM lockup due to storage typo
>> 
>> CP wouldn't know at IPL time, the guest would, not could, but 
>> would cause such harm.
>> 
>> Just because you say you can use xxx GB, doesn't mean you 
>> would actually use them.
>> 
>> When page fills, it over flows to spool.
>> When spool fills, CP abends on the next pageout.
>> 
>> Tom Duerbusch
>> THD Consulting
>> 
>> >>> Marcy Cortes  9/15/2009 
>> 11:02 AM >>>
>> See a thread on this list with subject "Sanity check?" from 
>> Oct 2007 for what happened when I did the same thing ;)
>> 
>> You probably filled page space.
>> 
>> I still think IBM should refuse to IPL a guest that will 
>> cause such harm.
>> 
>> 
>> Marcy 
>> 
>> "This message may contain confidential and/or privileged 
>> information. If you are not the addressee or authorized to 
>> receive this for the addressee, you must not use, copy, 
>> disclose, or take any action based on this message or any 
>> information herein. If you have received this message in 
>> error, please advise the sender immediately by reply e-mail 
>> and delete this message. Thank you for your cooperation."
>> 
>> 
>> -Original Message-
>> From: The IBM z/VM Operating System 
>> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
>> Sent: Tuesday, September 15, 2009 8:39 AM
>> To: IBMVM@LISTSERV.UARK.EDU
>> Subject: [IBMVM] VM lockup due to storage typo
>> 
>> Does anyone have an idea of how we might have gotten out of 
>> this without an IPL?
>> 
>> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
>> Several guests needed more memory added so the directory was 
>> updated and one by one the guests shutdown, logged off and 
>> back on.  So far, so good.
>> 
>> But... In changing the memory for many guests, and it being 
>> late at night after a long day, while meaning to set a 
>> guest's memory to 9728M, it got set to 9728G.  When that 
>> guest was cycled we see the message on the console that its
>> memory was limited to 8TB (HCPLGN093E), then the VM system
>> appeared to freeze.
>> 
>> We couldn't get in via TCP/IP, or the HMC Operating System
>> Messages screen, or the HMC Integrated 3270.
>> 
>> Finally had to IPL.   Even that was weird as I'd have expected
>> the Load Normal to shutdown, it just IPLed.   We did NoAutolog,
>> fixed the typo and all came back up ok...
>> 
>> I suspect CP was scrambling paging everything in the world out as
>> Linux tried to initialize that 8TB of memory...   But I'm surprised
>> I couldn't even get into the HMC consoles (to kill just that one
>> guest as opposed to all of them)..
>> 
>> Any thoughts?
>> Lee
>> -- 
>> 
>> Lee Stewart, Senior SE
>> Sirius Computer Solutions
>> Phone: (303) 996-7122
>> Email: lee.stew...@siriuscom.com 
>> Web:   www.siriuscom.com


Re: VM lockup due to storage typo

2009-09-16 Thread Ron Schmiedge
Unless you set MAXSTORAGE in the profile and used * as the upper limit
in the USER entry. Then if you change the lower limit to be higher
than the setting in the profile, you get an error.
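
A small Python sketch of the precedence rule as it is described in this thread: a maximum on the USER statement overrides MAXSTORAGE in the profile, and `*` on the USER statement defers to the profile. This only models the rule. It is not CP's directory processing, the function names are invented, and the 16G profile limit is a made-up value.

```python
# Sketch of the MAXSTORAGE precedence rule as described in this thread.
# Not CP's directory code; the function names and the 16G limit are made up.

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert a directory-style size such as '9728M' into bytes."""
    return int(text[:-1]) * UNITS[text[-1].upper()]


def effective_max(user_max, profile_max):
    """A maximum on the USER statement wins; '*' defers to the profile."""
    if user_max != "*":
        return parse_size(user_max)
    return parse_size(profile_max)


def check_entry(default, user_max, profile_max):
    """Return the effective limit, or raise if the default exceeds it."""
    limit = effective_max(user_max, profile_max)
    if parse_size(default) > limit:
        raise ValueError(f"default {default} exceeds maximum {limit} bytes")
    return limit


if __name__ == "__main__":
    print(check_entry("9728M", "*", "16G"))      # fits under the profile limit
    print(check_entry("9728G", "9728G", "16G"))  # explicit USER maximum hides the typo
    try:
        check_entry("9728G", "*", "16G")         # '*' lets the profile catch it
    except ValueError as err:
        print("rejected:", err)
```

The second call shows why the profile setting alone gives no protection when the USER statement carries its own maximum.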

On Wed, Sep 16, 2009 at 3:48 PM, Lee Stewart wrote:
> Not really as we were dealing with a lot of guests.  So the only practical
> place to put it would be in a profile.  But according to usage note #1:  A
> maximum storage setting on a USER statement overrides a MAXSTORAGE statement
> in a profile.
>
> So it would have no effect...
>
> Lee
>
> Ron Schmiedge wrote:
>>
>> I've been trying to follow the discussion and wondering if the
>> directory control statement
>>
>> MAXSTORAGE
>>
>> would have provided some protection from the finger check problem?
>
> --
>
> Lee Stewart, Senior SE
> Sirius Computer Solutions
> Phone: (303) 996-7122
> Email: lee.stew...@siriuscom.com
> Web:   www.siriuscom.com
>


Re: VM lockup due to storage typo

2009-09-16 Thread Lee Stewart
Not really as we were dealing with a lot of guests.  So the only 
practical place to put it would be in a profile.  But according to usage 
note #1:  A maximum storage setting on a USER statement overrides a 
MAXSTORAGE statement in a profile.


So it would have no effect...

Lee

Ron Schmiedge wrote:

> I've been trying to follow the discussion and wondering if the
> directory control statement
>
> MAXSTORAGE
>
> would have provided some protection from the finger check problem?


--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com


Re: VM lockup due to storage typo

2009-09-16 Thread Schuh, Richard
Only if it were included in every directory entry, or at least the one in 
question. Having a global MAXSTORAGE would be better protection.

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Ron Schmiedge
> Sent: Wednesday, September 16, 2009 2:20 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> I've been trying to follow the discussion and wondering if 
> the directory control statement
> 
> MAXSTORAGE
> 
> would have provided some protection from the finger check problem?
> 
> 
> 
> On Wed, Sep 16, 2009 at 2:59 PM, Alan Altmark wrote:
> > On Wednesday, 09/16/2009 at 04:44 EDT, Lee Stewart wrote:
> >> I guess as the one who got bit, I'd offer one easy suggestion...
> >>
> >> The finger check asked for 9728G (9.7+T), VM unceremoniously chopped
> >> it to 8T as the architecture limit.  Why not have an option (not
> >> enabled by default) in the SYSTEM CONFIG file that says
> >> Max_Virt_Size.   It could take numbers (like the USER storage
> >> specification), or OFF to indicate no checking.   And maybe
> >> something like RSS for Real Storage Size to say you can't logon
> >> with or define storage to more than the amount of Real Storage.
> >>
> >> And if you really wanted a full circle, then a directory option that
> >> said this one user could override that setting.
> >>
> >> That said I'm kind of swamped for the next two weeks, but after that
> >> if someone wants to coach me on writing a requirement, I will...
> >
> > For DIRMAINT, look at the DVHXRA/B/C exits to implement whatever kind
> > of policy limits you like.
> >
> > Alan Altmark
> > z/VM Development
> > IBM Endicott
> >
> 

Re: VM lockup due to storage typo

2009-09-16 Thread Ron Schmiedge
I've been trying to follow the discussion and wondering if the
directory control statement

MAXSTORAGE

would have provided some protection from the finger check problem?



On Wed, Sep 16, 2009 at 2:59 PM, Alan Altmark  wrote:
> On Wednesday, 09/16/2009 at 04:44 EDT, Lee Stewart wrote:
>> I guess as the one who got bit, I'd offer one easy suggestion...
>>
>> The finger check asked for 9728G (9.7+T), VM unceremoniously chopped it
>> to 8T as the architecture limit.  Why not have an option (not enabled by
>> default) in the SYSTEM CONFIG file that says Max_Virt_Size.   It could
>> take numbers (like the USER storage specification), or OFF to indicate
>> no checking.   And maybe something like RSS for Real Storage Size to say
>> you can't logon with or define storage to more than the amount of Real
>> Storage.
>>
>> And if you really wanted a full circle, then a directory option that
>> said this one user could override that setting.
>>
>> That said I'm kind of swamped for the next two weeks, but after that if
>> someone wants to coach me on writing a requirement, I will...
>
> For DIRMAINT, look at the DVHXRA/B/C exits to implement whatever kind of
> policy limits you like.
>
> Alan Altmark
> z/VM Development
> IBM Endicott
>


Re: VM lockup due to storage typo

2009-09-16 Thread Alan Altmark
On Wednesday, 09/16/2009 at 04:44 EDT, Lee Stewart wrote:
> I guess as the one who got bit, I'd offer one easy suggestion...
> 
> The finger check asked for 9728G (9.7+T), VM unceremoniously chopped it
> to 8T as the architecture limit.  Why not have an option (not enabled by
> default) in the SYSTEM CONFIG file that says Max_Virt_Size.   It could
> take numbers (like the USER storage specification), or OFF to indicate
> no checking.   And maybe something like RSS for Real Storage Size to say
> you can't logon with or define storage to more than the amount of Real
> Storage.
> 
> And if you really wanted a full circle, then a directory option that
> said this one user could override that setting.
> 
> That said I'm kind of swamped for the next two weeks, but after that if
> someone wants to coach me on writing a requirement, I will...

For DIRMAINT, look at the DVHXRA/B/C exits to implement whatever kind of 
policy limits you like.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-16 Thread Ethan Lanz
On Wed, Sep 16, 2009 at 3:06 PM, Huegel, Thomas  wrote:

> I don't know that I want CP to do anything different than it does now
> EXCEPT I want z/VM to a) keep running and b) have some facility that I
> can use to be able to examine the system to find/fix the problem... I
>

I agree.  The mainframe has a long history of managing over committed
resources, but Linux is presenting new challenges since it was not written
to be virtualized.

Rob noted earlier:
> One of the problems with booting Linux is that it determines the size
> of the virtual machine by testing pages rather than ask CP about it.

It seems to me that this will become a problem in other virtual environments
as well and, similar to the timer tick problem, another opportunity for the
mainframe to show Linux a better way to behave.

If Linux does not use up all available space when it starts, there is
opportunity to monitor and intervene before it gets critical. Then we do not
have to worry about making sure all our virtual blocks fit in the virtual
toy box.


> don't know/care how that gets done, maybe reserving some page space for
> CP and/or a special 'hook' into the HMC.. I'll leave that up to the
> developers.
>
> -Original Message-

> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
> Behalf Of P S
> Sent: Wednesday, September 16, 2009 12:53 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> On Wed, Sep 16, 2009 at 10:42 AM, Schuh, Richard wrote:
> > Logon would not be the right or only place to put it. DEF STOR is
> > another possible place to err if the maximum storage was too high.
> > Perhaps a check of virtual storage at IPL time. That is a common point
> > that must be traversed no matter where the error occurred.
>
> Suggest this not get hung up on "But it won't be perfect" ideas. For
> DIRMAINT, perhaps a site configuration option could say "Warn me if a
> userid is defined with either storage limit above x". Similarly, at
> LOGON or DEFINE STORAGE, if the VMsize is > than the total page space
> defined, a warning would be useful.
>
> This doesn't help for aggregate overload (20x1GB with 4GB of page
> space), doesn't guarantee that XAUTOLOG BIGPIG won't spiral the system
> into the ground before the operator (what operator?) can react, etc.,
> but it would at least give some more informed consent.
>
> In this era of Big Numbers and big Linux guests, this is probably more
> important than it used to be -- in days of yore, if you accidentally
> defined a 32MB guest on an 8MB system, (a) there probably WAS enough
> page space, and (b) the user was probably CMS and wouldn't touch the
> pages that fast anyway.
>

Ethan


Re: VM lockup due to storage typo

2009-09-16 Thread Lee Stewart

I guess as the one who got bit, I'd offer one easy suggestion...

The finger check asked for 9728G (9.7+T), VM unceremoniously chopped it 
to 8T as the architecture limit.  Why not have an option (not enabled by 
default) in the SYSTEM CONFIG file that says Max_Virt_Size.   It could 
take numbers (like the USER storage specification), or OFF to indicate 
no checking.   And maybe something like RSS for Real Storage Size to say 
you can't logon with or define storage to more than the amount of Real 
Storage.


And if you really wanted a full circle, then a directory option that 
said this one user could override that setting.


That said I'm kind of swamped for the next two weeks, but after that if 
someone wants to coach me on writing a requirement, I will...


Lee

Alan Altmark wrote:

> On Wednesday, 09/16/2009 at 09:14 EDT, RPN01 wrote:
> > I don't think, in this case, it is the user causing the problem at all.
> > The user didn't define their storage allocation, and in practice can't
> > do that at all. So the user didn't set up the situation which caused
> > the integrity issue, the system administrator did.
>
> That was my point to Marcy: Not an integrity problem.  The system is
> obeying the sysadmin's instructions.
>
> > To my mind, if this requires addressing, it should be in the DIRECTXA
> > command, so as to help the system administrator in avoiding aiming the
> > gun at his toes.
>
> DIRECTXA has no context in which to make such warnings.  Placing limits at
> LOGON would only apply to resource availability to hold the needed control
> structures.  When the guest begins to run and actually use all that
> memory, then another line of defense is needed.
>
> Alan Altmark
> z/VM Development
> IBM Endicott




--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com


Re: VM lockup due to storage typo

2009-09-16 Thread David Boyes
On 9/15/09 12:09 PM, "Daniel P. Martin"  wrote:

> *cough*SHARE requirement?*cough*

WAVV requirement WRIBDB04 submitted.

I suggested a SYSTEM CONFIG option and corresponding SET command to warn
user/operator and optionally halt IPL if a user requested LOGON or issued an
IPL command with a default VM size greater than the sum of real memory and
configured PAGE space. Normal setting would be MEMSANITY ON, but the SET
MEMSANITY OFF command would still allow experienced admins to shoot
themselves in the foot if necessary.

IBM: Since I seem to be Designated Requirements Dude these days, maybe you
should just give me direct login access to the requirements DB. It'd save
time, and you'd get requirements earlier in the planning cycle. 8-)

-- db
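
The check proposed above can be sketched in a few lines of Python. MEMSANITY exists only as a proposal in this thread, so the function, its arguments, and the return values here are illustrative assumptions. The 175G real storage figure is from Lee's report; the 350G of page space is a guess borrowed from elsewhere in the thread, not a known value.

```python
# Sketch of the proposed MEMSANITY check. MEMSANITY is only a proposal in
# this thread; the function, its arguments, and the 350G figure are assumptions.

MB = 2**20
GB = 2**30
TB = 2**40


def memsanity(vm_size, real_storage, page_space, enabled=True, halt=False):
    """Return 'ok', 'warn', or 'halt' for a requested virtual machine size."""
    if not enabled:
        return "ok"                      # SET MEMSANITY OFF: no checking
    if vm_size <= real_storage + page_space:
        return "ok"                      # fits in real memory plus page space
    return "halt" if halt else "warn"    # optionally refuse the IPL


if __name__ == "__main__":
    real, page = 175 * GB, 350 * GB
    print(memsanity(9728 * MB, real, page))          # the intended size
    print(memsanity(8 * TB, real, page))             # the typo, warn only
    print(memsanity(8 * TB, real, page, halt=True))  # the typo, IPL refused
```

A check like this looks at one guest at a time, so it says nothing about the combined demand of all logged-on guests.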


Re: VM lockup due to storage typo

2009-09-16 Thread Huegel, Thomas
I don't know that I want CP to do anything different than it does now
EXCEPT I want z/VM to a) keep running and b) have some facility that I
can use to be able to examine the system to find/fix the problem... I
don't know/care how that gets done, maybe reserving some page space for
CP and/or a special 'hook' into the HMC.. I'll leave that up to the
developers.   

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of P S
Sent: Wednesday, September 16, 2009 12:53 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

On Wed, Sep 16, 2009 at 10:42 AM, Schuh, Richard wrote:
> Logon would not be the right or only place to put it. DEF STOR is
> another possible place to err if the maximum storage was too high.
> Perhaps a check of virtual storage at IPL time. That is a common point
> that must be traversed no matter where the error occurred.

Suggest this not get hung up on "But it won't be perfect" ideas. For
DIRMAINT, perhaps a site configuration option could say "Warn me if a
userid is defined with either storage limit above x". Similarly, at
LOGON or DEFINE STORAGE, if the VMsize is > than the total page space
defined, a warning would be useful.

This doesn't help for aggregate overload (20x1GB with 4GB of page
space), doesn't guarantee that XAUTOLOG BIGPIG won't spiral the system
into the ground before the operator (what operator?) can react, etc.,
but it would at least give some more informed consent.

In this era of Big Numbers and big Linux guests, this is probably more
important than it used to be -- in days of yore, if you accidentally
defined a 32MB guest on an 8MB system, (a) there probably WAS enough
page space, and (b) the user was probably CMS and wouldn't touch the
pages that fast anyway.


Re: VM lockup due to storage typo

2009-09-16 Thread P S
On Wed, Sep 16, 2009 at 10:42 AM, Schuh, Richard  wrote:
> Logon would not be the right or only place to put it. DEF STOR is another 
> possible place to err if the maximum storage was too high. Perhaps a check of 
> virtual storage at IPL time. That is a common point that must be traversed no 
> matter where the error occurred.

Suggest this not get hung up on "But it won't be perfect" ideas. For
DIRMAINT, perhaps a site configuration option could say "Warn me if a
userid is defined with either storage limit above x". Similarly, at
LOGON or DEFINE STORAGE, if the VM size is greater than the total page
space defined, a warning would be useful.

This doesn't help for aggregate overload (20x1GB with 4GB of page
space), doesn't guarantee that XAUTOLOG BIGPIG won't spiral the system
into the ground before the operator (what operator?) can react, etc.,
but it would at least give some more informed consent.
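
To make the limitation concrete, here is a small Python sketch of a per-guest warning next to an aggregate calculation. Both function names and the guest names are hypothetical; nothing here is an existing CP or DIRMAINT facility, and real memory is ignored to keep the arithmetic plain.

```python
def guests_over_page_space(guest_sizes_gb, page_space_gb):
    """Return the guests whose own size exceeds total page space."""
    return [name for name, size in guest_sizes_gb.items()
            if size > page_space_gb]


def aggregate_overcommit_gb(guest_sizes_gb, page_space_gb):
    """Return how far the sum of all guests exceeds page space (0 if it fits)."""
    return max(0, sum(guest_sizes_gb.values()) - page_space_gb)


# The example from the post: twenty 1 GB guests with 4 GB of page space.
guests = {f"LINUX{n:02d}": 1 for n in range(20)}
print(guests_over_page_space(guests, 4))   # [] : no single guest trips the warning
print(aggregate_overcommit_gb(guests, 4))  # 16 : yet the total is 16 GB over
```

The per-guest check catches a typo like a wildly oversized single guest, while the aggregate figure is what it cannot see.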

In this era of Big Numbers and big Linux guests, this is probably more
important than it used to be -- in days of yore, if you accidentally
defined a 32MB guest on an 8MB system, (a) there probably WAS enough
page space, and (b) the user was probably CMS and wouldn't touch the
pages that fast anyway.


Re: VM lockup due to storage typo

2009-09-16 Thread Schuh, Richard
Logon would not be the right or only place to put it. DEF STOR is another 
possible place to err if the maximum storage was too high. Perhaps a check of 
virtual storage at IPL time. That is a common point that must be traversed no 
matter where the error occurred. 

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Alan Altmark
> Sent: Wednesday, September 16, 2009 10:20 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> On Wednesday, 09/16/2009 at 09:14 EDT, RPN01 wrote:
> > I don't think, in this case, it is the user causing the problem at
> > all. The user didn't define their storage allocation, and in practice
> > can't do that at all. So the user didn't set up the situation which
> > caused the integrity issue, the system administrator did.
> 
> That was my point to Marcy: Not an integrity problem.  The 
> system is obeying the sysadmin's instructions.
> 
> > To my mind, if this requires addressing, it should be in the DIRECTXA
> > command, so as to help the system administrator in avoiding aiming
> > the gun at his toes.
> 
> DIRECTXA has no context in which to make such warnings.  
> Placing limits at LOGON would only apply to resource 
> availability to hold the needed control structures.  When the 
> guest begins to run and actually use all that memory, then 
> another line of defense is needed.
> 
> Alan Altmark
> z/VM Development
> IBM Endicott
> 

Re: VM lockup due to storage typo

2009-09-16 Thread Alan Altmark
On Wednesday, 09/16/2009 at 09:14 EDT, RPN01  wrote:
> I don't think, in this case, it is the user causing the problem at all.
> The user didn't define their storage allocation, and in practice can't
> do that at all. So the user didn't set up the situation which caused the
> integrity issue, the system administrator did.

That was my point to Marcy: Not an integrity problem.  The system is 
obeying the sysadmin's instructions.

> To my mind, if this requires addressing, it should be in the DIRECTXA
> command, so as to help the system administrator in avoiding aiming the
> gun at his toes.

DIRECTXA has no context in which to make such warnings.  Placing limits at 
LOGON would only apply to resource availability to hold the needed control 
structures.  When the guest begins to run and actually use all that 
memory, then another line of defense is needed.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-16 Thread Tom Duerbusch
If you bought the Dirmaint product or a similar product from another vendor, 
couldn't a rule be set up to prevent this?

Anyway, there is not gonna be a way of preventing a systems programmer from 
doing anything we do.  We are supposed to be thinking.

For example, when I initialize, format or copy to a pack, I go through at least 
3 checks to make sure I have not transposed the CUA.  Saved me a lot of times.

A system programmer IS dangerous.  We can shut down the system.  We can destroy 
the system (and then go peacefully into retirement).

You can't fix stupid and we are all, occasionally, stupid.

Now that you have had this kind of problem, we should all learn from it.  After 
defining a new guest, log on to that guest and do a Q V ALL and see if it is 
right.

Been there, done that.

Tom Duerbusch
THD Consulting

-Original Message-
From: RPN01 

Date: Wed, 16 Sep 2009 08:13:57 
To: 
Subject: Re: VM lockup due to storage typo


I don't think, in this case, it is the user causing the problem at all. The
user didn't define their storage allocation, and in practice can't do that
at all. So the user didn't set up the situation which caused the integrity
issue, the system administrator did.

The system administrator is in control of the CP Directory, and as such,
decisions are left to him. The system doesn't question what he does, within
the definition of the syntax, semantics and limitations of the directory
entries and commands. If you want to define a large virtual machine, should
the system question your authority?

The system could check the memory and page space against each directory
entry as the binary directory is built, but this would add time to the
directory build, and does not account for the situation of planning to add
more page space before logging in the new directory entry. Maybe a warning
of "User  exceeds paging space" could have averted this situation, but
again, each user would have to be checked against the running system. It
shouldn't keep you from creating the entry, just let you know that there
might be an issue if you actually use it.

To my mind, if this requires addressing, it should be in the DIRECTXA
command, so as to help the system administrator in avoiding aiming the gun
at his toes.

-- 
Robert P. Nix          Mayo Foundation        .~.
RO-OE-5-55             200 First Street SW    /V\
507-284-0844           Rochester, MN 55905   /( )\
-                                            ^^-^^
"In theory, theory and practice are the same, but
 in practice, theory and practice are different."




On 9/15/09 3:44 PM, "Alan Altmark"  wrote:

> On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
>> I agree with that ("the guest cannot be allowed to harm CP") but has
>> that actually been formally - or even informally - accepted by the
>> Powers That Be?
> 
> Yes, it is in the Statement of System Integrity in the General Information
> Manual.
> 
>> I ask because I still remember, as though it were yesterday, opening a
>> security/integrity APAR against VM back in the mid-1980's because any
>> class G user could knock CP down by defining a shared and a nonshared
>> device on the same virtual control unit, and being told that that was
>> NOT a security or integrity issue, and that no fix would be
>> forthcoming.
> 
> Under "today's" rules, that would be an Integrity problem.
> 
> o If a class G (only) user can repeatedly or with malice of forethought
> hang or abend CP, it WILL be classified as an integrity problem (denial of
> service).
> 
> o If a class G user happens to do something that triggers an abend or hang
> due to a "system malfunction", it will NOT be classified as an integrity
> problem.
> 
> o If the system abends or hangs because it is overloaded (memory, CPU), it
> will NOT be classified as an integrity problem.
> 
> o Just because it isn't an integrity problem doesn't mean it isn't a
> defect.
> 
> Alan Altmark
> z/VM Development
> IBM Endicott


Re: VM lockup due to storage typo

2009-09-16 Thread Rob van der Heij
This gun has been pointing in the same direction forever, but it *is*
a fact that with 64-bit CP the bullets are a lot bigger.

I am sure folks in Endicott are as creative as most of us (or worse,
take a look at ... ;-)  but we know that any safety that CP adds will
annoy people because they forgot to disable it when they still had the
option to do so, or because they drive with the safety off all day
anyway (how many are not using highly privileged CP userid for things
that don't need it - and really, it *is* dangerous)

The problem with the suggested check is that it is stronger than what
most people need. Also, the check is likely to be unfair (aiming at
the wrong victim) and potentially cause a Denial of Service. Would you
want MAINT unable to logon because that 5th Linux guest now logged on
(and you could only add the page pack if you could logon...)  So we
need an option for some users to override it, or an option to enforce
the check only for some users. One means that you may forget the
option, and the other means that within weeks people will ask "why
can't I logon my Linux guest" and the word will spread that you need
to issue a SET SRM OVERCOMM .

Linux has a similar check in that a process can't allocate more
virtual memory than you have available (in main and on swap, or you
get out-of-memory). This ensures that this process could eventually
get all it asks for. But when it does not immediately reference that
memory, it appears to be still available when the next process
allocates memory. So the check is pretty useless and does not protect
you at all.
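
The argument can be illustrated with a toy model in Python. This is not how the Linux kernel actually accounts for memory (its overcommit handling is more involved); the function name and the numbers are made up solely to show why a check at allocation time says little when pages are not touched right away.

```python
def admit_allocations(requests_mb, total_mb):
    """Admit each request if it fits in memory not yet *referenced*.

    Toy model: admitted processes have not touched their memory yet, so
    every check still sees the full total_mb as available.
    """
    referenced_mb = 0  # nothing has been touched at allocation time
    admitted = []
    for request in requests_mb:
        if request <= total_mb - referenced_mb:
            admitted.append(request)
    return admitted


# Three processes each ask for 3000 MB on a system with 4000 MB in total.
admitted = admit_allocations([3000, 3000, 3000], total_mb=4000)
print(sum(admitted))  # 9000 MB promised against 4000 MB available
```

Every request passes its own check, yet the promises add up to more than twice what exists, which is the same shape as the aggregate problem with virtual machine sizes.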

I don't do operational work these days, so I am in the peanut gallery here.
Maybe I grew up in a rather unique shop (or maybe staff reductions
have gotten rid of that luxury there too) but we had pretty strict
rules to minimize mistakes. Most configuration changes would be
checked by another pair of eyes or some code. Configuration files to
be replaced ran through XDIFF to inspect the changes. The nucleus map
was scanned for text decks picked up from the A-disk, etc. Various
health checks ran to compare RACF and the directory, check for certain
disks filling up, and many more. With CMS Pipelines it is often easy
to have an extra pair of eyes oversee your actions.

Rob


Re: VM lockup due to storage typo

2009-09-16 Thread Brian Nielsen
And you also have to check during DEFINE STORAGE, DEFINE FB-512, and any
other command or function that creates a pageable CP structure.

Brian Nielsen

On Wed, 16 Sep 2009 09:03:43 -0500, Mike Walter wrote:

>I can't support DIRECTXA as the sole examination.  Paging volumes can be
>added at any time.  DIRECTXA only gets a chance to look when it is run.
>
>If this even needs to be addressed (hence, this thoughtful thread), IMHO
>comparing the min and max virtual machine memory specification would be
>better done when the virtual machine is being built during
>logon/autolog/xautolog.
>
>OTOH, it would not hurt to have DIRECTXA provide that early warning so
>that when one finally does attempt to create the virtual machine, any
>typos might already have been displayed and corrected when DIRECTXA
>provided an early warning.  It's just plain embarrassing for an existing
>virtual machine to cause a problem because the sysprog made a wild (or
>uninformed) keystroke while editing the directory source ... another
>source of sysprog "collateral damage".
>
>Mike Walter
>Hewitt Associates
>The opinions expressed herein are mine alone, not my employer's.
>
>
>
>RPN01 
>
>Sent by: "The IBM z/VM Operating System" 
>09/16/2009 08:13 AM
>Please respond to
>"The IBM z/VM Operating System" 
>
>
>
>To
>IBMVM@LISTSERV.UARK.EDU
>cc
>
>Subject
>Re: VM lockup due to storage typo
>
>
>
>
>
>
>I don't think, in this case, it is the user causing the problem at all.
>The user didn't define their storage allocation, and in practice can't do
>that at all. So the user didn't set up the situation which caused the
>integrity issue, the system administrator did.
>
>The system administrator is in control of the CP Directory, and as such,
>decisions are left to him. The system doesn't question what he does,
>within the definition of the syntax, semantics and limitations of the
>directory entries and commands. If you want to define a large virtual
>machine, should the system question your authority?
>
>The system could check the memory and page space against each directory
>entry as the binary directory is built, but this would add time to the
>directory build, and does not account for the situation of planning to
>add more page space before logging in the new directory entry. Maybe a
>warning of "User  exceeds paging space" could have averted this
>situation, but again, each user would have to be checked against the
>running system. It shouldn't keep you from creating the entry, just let
>you know that there might be an issue if you actually use it.
>
>To my mind, if this requires addressing, it should be in the DIRECTXA
>command, so as to help the system administrator in avoiding aiming the
>gun at his toes.
>
>--
>Robert P. Nix          Mayo Foundation        .~.
>RO-OE-5-55             200 First Street SW    /V\
>507-284-0844           Rochester, MN 55905   /( )\
>-                                            ^^-^^
>"In theory, theory and practice are the same, but
> in practice, theory and practice are different."
>
>
>
>
>On 9/15/09 3:44 PM, "Alan Altmark"  wrote:
>
>> On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
>>> I agree with that ("the guest cannot be allowed to harm CP") but has
>>> that actually been formally - or even informally - accepted by the
>>> Powers That Be?
>>
>> Yes, it is in the Statement of System Integrity in the General
>> Information Manual.
>>
>>> I ask because I still remember, as though it were yesterday, opening a
>>> security/integrity APAR against VM back in the mid-1980's because any
>>> class G user could knock CP down by defining a shared and a nonshared
>>> device on the same virtual control unit, and being told that that was
>>> NOT a security or integrity issue, and that no fix would be
>>> forthcoming.
>>
>> Under "today's" rules, that would be an Integrity problem.
>>
>> o If a class G (only) user can repeatedly or with malice of forethought
>> hang or abend CP, it WILL be classified as an integrity problem (denial
>> of service).
>>
>> o If a class G user happens to do something that triggers an abend or
>> hang due to a "system malfunction", it will NOT be classified as an
>> integrity problem.
>>
>> o If the system abends or hangs because it is overloaded (memory, CPU),
>> it will NOT be classified as an integrity problem.

Re: VM lockup due to storage typo

2009-09-16 Thread Mike Walter
I can't support DIRECTXA as the sole examination.  Paging volumes can be 
added at any time.  DIRECTXA only gets a chance to look when it is run. 

If this even needs to be addressed (hence, this thoughtful thread), IMHO 
comparing the min and max virtual machine memory specification would be 
better done when the virtual machine is being built during 
logon/autolog/xautolog. 

OTOH, it would not hurt to have DIRECTXA provide that early warning so 
that when one finally does attempt to create the virtual machine, any 
typos might already have been displayed and corrected when DIRECTXA 
provided an early warning.  It's just plain embarrassing for an existing 
virtual machine to cause a problem because the sysprog made a wild (or 
uninformed) keystroke while editing the directory source ... another 
source of sysprog "collateral damage".

Mike Walter
Hewitt Associates
The opinions expressed herein are mine alone, not my employer's.



RPN01  

Sent by: "The IBM z/VM Operating System" 
09/16/2009 08:13 AM
Please respond to
"The IBM z/VM Operating System" 



To
IBMVM@LISTSERV.UARK.EDU
cc

Subject
Re: VM lockup due to storage typo






I don't think, in this case, it is the user causing the problem at all.
The user didn't define their storage allocation, and in practice can't do
that at all. So the user didn't set up the situation which caused the
integrity issue, the system administrator did.

The system administrator is in control of the CP Directory, and as such,
decisions are left to him. The system doesn't question what he does,
within the definition of the syntax, semantics and limitations of the
directory entries and commands. If you want to define a large virtual
machine, should the system question your authority?

The system could check the memory and page space against each directory
entry as the binary directory is built, but this would add time to the
directory build, and does not account for the situation of planning to add
more page space before logging in the new directory entry. Maybe a warning
of "User  exceeds paging space" could have averted this situation, but
again, each user would have to be checked against the running system. It
shouldn't keep you from creating the entry, just let you know that there
might be an issue if you actually use it.

To my mind, if this requires addressing, it should be in the DIRECTXA
command, so as to help the system administrator in avoiding aiming the gun
at his toes.

-- 
Robert P. Nix          Mayo Foundation        .~.
RO-OE-5-55             200 First Street SW    /V\
507-284-0844           Rochester, MN 55905   /( )\
-                                            ^^-^^
"In theory, theory and practice are the same, but
 in practice, theory and practice are different."




On 9/15/09 3:44 PM, "Alan Altmark"  wrote:

> On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
>> I agree with that ("the guest cannot be allowed to harm CP") but has
>> that actually been formally - or even informally - accepted by the
>> Powers That Be?
> 
> Yes, it is in the Statement of System Integrity in the General
> Information Manual.
> 
>> I ask because I still remember, as though it were yesterday, opening a
>> security/integrity APAR against VM back in the mid-1980's because any
>> class G user could knock CP down by defining a shared and a nonshared
>> device on the same virtual control unit, and being told that that was
>> NOT a security or integrity issue, and that no fix would be
>> forthcoming.
> 
> Under "today's" rules, that would be an Integrity problem.
> 
> o If a class G (only) user can repeatedly or with malice of forethought
> hang or abend CP, it WILL be classified as an integrity problem (denial
> of service).
> 
> o If a class G user happens to do something that triggers an abend or
> hang due to a "system malfunction", it will NOT be classified as an
> integrity problem.
> 
> o If the system abends or hangs because it is overloaded (memory, CPU),
> it will NOT be classified as an integrity problem.
> 
> o Just because it isn't an integrity problem doesn't mean it isn't a
> defect.
> 
> Alan Altmark
> z/VM Development
> IBM Endicott







Re: VM lockup due to storage typo

2009-09-16 Thread RPN01
I don't think, in this case, it is the user causing the problem at all. The
user didn't define their storage allocation, and in practice can't do that
at all. So the user didn't set up the situation which caused the integrity
issue, the system administrator did.

The system administrator is in control of the CP Directory, and as such,
decisions are left to him. The system doesn't question what he does, within
the definition of the syntax, semantics and limitations of the directory
entries and commands. If you want to define a large virtual machine, should
the system question your authority?

The system could check the memory and page space against each directory
entry as the binary directory is built, but this would add time to the
directory build, and does not account for the situation of planning to add
more page space before logging in the new directory entry. Maybe a warning
of "User  exceeds paging space" could have averted this situation, but
again, each user would have to be checked against the running system. It
shouldn't keep you from creating the entry, just let you know that there
might be an issue if you actually use it.

To my mind, if this requires addressing, it should be in the DIRECTXA
command, so as to help the system administrator in avoiding aiming the gun
at his toes.

-- 
Robert P. Nix          Mayo Foundation        .~.
RO-OE-5-55             200 First Street SW    /V\
507-284-0844           Rochester, MN 55905   /( )\
-                                            ^^-^^
"In theory, theory and practice are the same, but
 in practice, theory and practice are different."




On 9/15/09 3:44 PM, "Alan Altmark"  wrote:

> On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
>> I agree with that ("the guest cannot be allowed to harm CP") but has
>> that actually been formally - or even informally - accepted by the
>> Powers That Be?
> 
> Yes, it is in the Statement of System Integrity in the General Information
> Manual.
> 
>> I ask because I still remember, as though it were yesterday, opening a
>> security/integrity APAR against VM back in the mid-1980's because any
>> class G user could knock CP down by defining a shared and a nonshared
>> device on the same virtual control unit, and being told that that was
>> NOT a security or integrity issue, and that no fix would be
>> forthcoming.
> 
> Under "today's" rules, that would be an Integrity problem.
> 
> o If a class G (only) user can repeatedly or with malice of forethought
> hang or abend CP, it WILL be classified as an integrity problem (denial of
> service).
> 
> o If a class G user happens to do something that triggers an abend or hang
> due to a "system malfunction", it will NOT be classified as an integrity
> problem.
> 
> o If the system abends or hangs because it is overloaded (memory, CPU), it
> will NOT be classified as an integrity problem.
> 
> o Just because it isn't an integrity problem doesn't mean it isn't a
> defect.
> 
> Alan Altmark
> z/VM Development
> IBM Endicott


Re: VM lockup due to storage typo

2009-09-15 Thread Kris Buelens
2009/9/15 Schuh, Richard 

> The same might be said for page space. Someone could access a dataspace
> enabled directory and take up page space. We could easily take up 48G of
> page space here by starting 24 machines that each access different d/s
> directories at 2G each.


Dataspace-enabled directories are not paged out to paging space; the CP
paging operations for them are issued against the minidisks of the SFS
servers. Nor are all dataspace pages brought into storage at the moment of
ACCESS.  The SFS dataspaces are called "mapped dataspaces".  A small
exception: the structures holding the FST blocks are not mapped to the
SFS server minidisks, so they can be paged out to CP paging space (and
obviously CP's page management blocks occupy some storage too).
DB2/VM, on the other hand, can also use non-mapped dataspaces.

-- 
Kris Buelens,
IBM Belgium, VM customer support


Re: VM lockup due to storage typo

2009-09-15 Thread Alan Altmark
On Tuesday, 09/15/2009 at 04:50 EDT, Marcy Cortes wrote:
> So are you saying that what Lee and I both did to shoot our systems
> should be APAR'able?  Or should it be a requirement?  Or is it going to
> be a "your gun, your foot" answer?

I was just answering the "Is it an integrity problem?" question:  No, it 
isn't an integrity problem.  The sysadmin did something that ultimately 
caused the system to lock up.  (That doesn't mean it was the sysadmin's 
fault, however.)

If you feel you have found a defect, open a PMR.  That's how you find out 
if something is really APARable.  :-)

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
> 
> One of the problems with booting Linux is that it determines 
> the size of the virtual machine by testing pages rather than 
> ask CP about it.

It only took TPF and its predecessors 35 years to get this right. :-)
Way back in VM/370 R3 I had a diag that could be used. We did talk
the ACP Systems folks at TWA into using the diag instead of touching
every page. We also had a mod in SVS to do the same (among other things).

 
> If I remember right, it tries the first page of every 
> architectured segment. 

It could be worse. Earlier systems (OS/360, MVS line of systems, ACP,   
VM, etc.) touched every page. The touching was usually done by setting
the storage key. 


Regards, 
Richard Schuh 

Re: VM lockup due to storage typo

2009-09-15 Thread Rob van der Heij
On Tue, Sep 15, 2009 at 11:18 PM, Robert J Brenneman  wrote:

> Admittedly - not 8TB in a 200G box, as Lee tried to do, and it was on
> z/VM 5.1, so it didn't have the system execution space stuff of later
> z/VM releases. It did teach the lesson that more page packs can only
> get you so far. At some point the system data structures needed to
> support the enormous guest just won't fit. This may be a reasonable
> calculation to make within CP as a sanity check.

If a factor of 2 does not make a difference, then try an order of
magnitude. :-)

One of the problems with booting Linux is that it determines the size
of the virtual machine by testing pages rather than asking CP about it.
If I remember right, it tries the first page of every architectured
segment. And to make it worse, it uses a test that also forces CP to
initialize the page frame. Which means that CP must also allocate a
PGMBK to hold the page tables to span that segment. So for each MB of
virtual machine storage, 3 pages must be allocated.
If I have the math right, the 8 TB virtual machine will very quickly
require 96 GB worth of page frames. That needs to come from
somewhere... A decent paging subsystem can fill up a single 3390-3 in
a minute or two.
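
The arithmetic behind that figure can be checked with a few lines of Python. The three-pages-per-megabyte cost and the 8 TB size come from the paragraph above; the 4 KB page size and the function name are assumptions added for the sketch.

```python
PAGE_KB = 4        # assumed size of one page frame, in KB
PAGES_PER_MB = 3   # per the post: three pages allocated per MB of guest


def probe_cost_gb(vm_size_tb):
    """Return the GB of page frames needed just to probe a guest of this size."""
    vm_size_mb = vm_size_tb * 1024 * 1024
    cost_kb = vm_size_mb * PAGES_PER_MB * PAGE_KB
    return cost_kb / (1024 * 1024)


print(probe_cost_gb(8))  # 96.0
```

So each megabyte of defined guest storage costs 12 KB up front, and 8 TB of it comes to the 96 GB quoted.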

And although we tell people that you need to add one 3390-3 page pack
for every GB of Linux server you define, there are still folks who think
we talk nonsense because with the first few Linux guests their z/VM
system did not page at all. But once you do start to page, page space
utilization growth is not subtle. It's more like shifting your cup of
coffee towards the edge of the table.

Rob


Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
We all know that they are not M$ and we are glad they aren't.

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Huegel, Thomas
> Sent: Tuesday, September 15, 2009 2:18 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> I would think that IBM would be scurrying to fix what is 
> obviously a problem.
> After all they are not Microsoft... 
> 
> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Schuh, Richard
> Sent: Tuesday, September 15, 2009 4:13 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> Seems to me that he said it was either an integrity problem 
> or a defect.
> I would think that either would be meat for the APAR grinder.
> 
> Regards,
> Richard Schuh 
> 
>  
> 
> > -Original Message-
> > From: The IBM z/VM Operating System
> > [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
> > Sent: Tuesday, September 15, 2009 1:50 PM
> > To: IBMVM@LISTSERV.UARK.EDU
> > Subject: Re: VM lockup due to storage typo
> > 
> > So are you saying that what Lee and I both did to shoot our systems 
> > should be APAR'able?  Or should it be a requirement?  Or is it going to 
> > be a "your gun, your foot" answer?
> > 
> > 
> > Marcy
> >  
> > "This message may contain confidential and/or privileged information. 
> > If you are not the addressee or authorized to receive this for the 
> > addressee, you must not use, copy, disclose, or take any action based 
> > on this message or any information herein. If you have received this 
> > message in error, please advise the sender immediately by reply e-mail 
> > and delete this message. Thank you for your cooperation."
> > 
> > 
> > -Original Message-
> > From: The IBM z/VM Operating System
> > [mailto:ib...@listserv.uark.edu] On Behalf Of Alan Altmark
> > Sent: Tuesday, September 15, 2009 1:45 PM
> > To: IBMVM@LISTSERV.UARK.EDU
> > Subject: Re: [IBMVM] VM lockup due to storage typo
> > 
> > On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
> > > I agree with that ("the guest cannot be allowed to harm CP") but has
> > > that actually been formally - or even informally - accepted by the
> > > Powers That Be?
> > 
> > Yes, it is in the Statement of System Integrity in the General 
> > Information Manual.
> > 
> > > I ask because I still remember, as though it were yesterday, opening
> > > a security/integrity APAR against VM back in the mid-1980's because
> > > any class G user could knock CP down by defining a shared and a
> > > nonshared device on the same virtual control unit, and being told
> > > that that was NOT a security or integrity issue, and that no fix
> > > would be forthcoming.
> > 
> > Under "today's" rules, that would be an Integrity problem.
> > 
> > o If a class G (only) user can repeatedly or with malice of 
> > forethought hang or abend CP, it WILL be classified as an integrity 
> > problem (denial of service).
> > 
> > o If a class G user happens to do something that triggers an abend or 
> > hang due to a "system malfunction", it will NOT be classified as an 
> > integrity problem.
> > 
> > o If the system abends or hangs because it is overloaded (memory, 
> > CPU), it will NOT be classified as an integrity problem.
> > 
> > o Just because it isn't an integrity problem doesn't mean it isn't a 
> > defect.
> > 
> > Alan Altmark
> > z/VM Development
> > IBM Endicott
> > 
> 

Re: VM lockup due to storage typo

2009-09-15 Thread Robert J Brenneman
I've tried wacky things like this before to see if I could run a 250G
guest on an LPAR with ~140GB of memory and oodles of page space,
running z/VM 5.1

It came up, the guest initialized and Linux IPLed fine. It didn't have
a problem till I started running a memory thrasher in the Linux guest.
It sucked up all available memory and VM started paging, as you'd
guess. It kept making progress till it had used about 20% of the
paging space, but eventually VM itself started thrashing in its memory
management routines. Like a %SY of 500 or so  ( 5 processors running
memory management stuff?? ) I'd guess that VM itself ran out of space
below the 2G bar for page tables or something along that line. It
never abended though - it just thrashed itself for days.

Admittedly - not 8TB in a 200G box, as Lee tried to do, and it was on
z/VM 5.1, so it didn't have the system execution space stuff of later
z/VM releases. It did teach the lesson that more page packs can only
get you so far. At some point the system data structures needed to
support the enormous guest just won't fit. This may be a reasonable
calculation to make within CP as a sanity check.


-- 
Jay Brenneman


Re: VM lockup due to storage typo (OT)

2009-09-15 Thread Schuh, Richard
Marcy,

Did you get to attend any of those parties at the Malibu mansion?

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
> Sent: Tuesday, September 15, 2009 2:16 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
>  
> >Gee, I guess we're in good company!   ;-)
> You betcha! (I'm in MN today, I can say that).
> 
> At least mine was a test/dev system :)  If I had done it to a 
> prod system, I'm sure someone here would have had IBM 
> answering questions ...  It's one of those things that fell 
> down low on the to pursue list - bigger fish frying.
> 
> 
> Marcy 
> 
> "This message may contain confidential and/or privileged 
> information. If you are not the addressee or authorized to 
> receive this for the addressee, you must not use, copy, 
> disclose, or take any action based on this message or any 
> information herein. If you have received this message in 
> error, please advise the sender immediately by reply e-mail 
> and delete this message. Thank you for your cooperation."
> 

Re: VM lockup due to storage typo

2009-09-15 Thread Huegel, Thomas
I would think that IBM would be scurrying to fix what is obviously a
problem.
After all they are not Microsoft... 

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Tuesday, September 15, 2009 4:13 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Seems to me that he said it was either an integrity problem or a defect.
I would think that either would be meat for the APAR grinder.

Regards,
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System
> [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
> Sent: Tuesday, September 15, 2009 1:50 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> So are you saying that what Lee and I both did to shoot our systems 
> should be APAR'able?  Or should it be a requirement?  Or is it going to 
> be a "your gun, your foot" answer?
> 
> 
> Marcy
>  
> "This message may contain confidential and/or privileged information. 
> If you are not the addressee or authorized to receive this for the 
> addressee, you must not use, copy, disclose, or take any action based 
> on this message or any information herein. If you have received this 
> message in error, please advise the sender immediately by reply e-mail
> and delete this message. Thank you for your cooperation."
> 
> 
> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Alan Altmark
> Sent: Tuesday, September 15, 2009 1:45 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: [IBMVM] VM lockup due to storage typo
> 
> On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
> > I agree with that ("the guest cannot be allowed to harm CP") but has
> > that actually been formally - or even informally - accepted by the
> > Powers That Be?
> 
> Yes, it is in the Statement of System Integrity in the 
> General Information Manual.
> 
> > I ask because I still remember, as though it were yesterday, opening a
> > security/integrity APAR against VM back in the mid-1980's because any
> > class G user could knock CP down by defining a shared and a nonshared
> > device on the same virtual control unit, and being told that that was
> > NOT a security or integrity issue, and that no fix would be forthcoming.
> 
> Under "today's" rules, that would be an Integrity problem.
> 
> o If a class G (only) user can repeatedly or with malice of 
> forethought hang or abend CP, it WILL be classified as an 
> integrity problem (denial of service).
> 
> o If a class G user happens to do something that triggers an 
> abend or hang due to a "system malfunction", it will NOT be 
> classified as an integrity problem.
> 
> o If the system abends or hangs because it is overloaded 
> (memory, CPU), it will NOT be classified as an integrity problem.
> 
> o Just because it isn't an integrity problem doesn't mean it 
> isn't a defect.
> 
> Alan Altmark
> z/VM Development
> IBM Endicott
> 


Re: VM lockup due to storage typo

2009-09-15 Thread Marcy Cortes
 
>Gee, I guess we're in good company!   ;-)
You betcha! (I'm in MN today, I can say that).

At least mine was a test/dev system :)  If I had done it to a prod system, I'm 
sure someone here would have had IBM answering questions ...  It's one of those 
things that fell down low on the to-pursue list - bigger fish frying.


Marcy 

"This message may contain confidential and/or privileged information. If you 
are not the addressee or authorized to receive this for the addressee, you must 
not use, copy, disclose, or take any action based on this message or any 
information herein. If you have received this message in error, please advise 
the sender immediately by reply e-mail and delete this message. Thank you for 
your cooperation."


Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
Seems to me that he said it was either an integrity problem or a defect. I 
would think that either would be meat for the APAR grinder.

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
> Sent: Tuesday, September 15, 2009 1:50 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> So are you saying that what Lee and I both did to shoot our 
> systems should be APAR'able?  Or should it be a requirement?  Or 
> is it going to be a "your gun, your foot" answer?
> 
> 
> Marcy 
>  
> "This message may contain confidential and/or privileged 
> information. If you are not the addressee or authorized to 
> receive this for the addressee, you must not use, copy, 
> disclose, or take any action based on this message or any 
> information herein. If you have received this message in 
> error, please advise the sender immediately by reply e-mail 
> and delete this message. Thank you for your cooperation."
> 
> 
> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Alan Altmark
> Sent: Tuesday, September 15, 2009 1:45 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: [IBMVM] VM lockup due to storage typo
> 
> On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
> > I agree with that ("the guest cannot be allowed to harm CP") but has
> > that actually been formally - or even informally - accepted by the
> > Powers That Be?
> 
> Yes, it is in the Statement of System Integrity in the 
> General Information Manual.
> 
> > I ask because I still remember, as though it were yesterday, opening a
> > security/integrity APAR against VM back in the mid-1980's because any
> > class G user could knock CP down by defining a shared and a nonshared
> > device on the same virtual control unit, and being told that that was
> > NOT a security or integrity issue, and that no fix would be forthcoming.
> 
> Under "today's" rules, that would be an Integrity problem.
> 
> o If a class G (only) user can repeatedly or with malice of 
> forethought hang or abend CP, it WILL be classified as an 
> integrity problem (denial of service).
> 
> o If a class G user happens to do something that triggers an 
> abend or hang due to a "system malfunction", it will NOT be 
> classified as an integrity problem.
> 
> o If the system abends or hangs because it is overloaded 
> (memory, CPU), it will NOT be classified as an integrity problem.
> 
> o Just because it isn't an integrity problem doesn't mean it 
> isn't a defect.
> 
> Alan Altmark
> z/VM Development
> IBM Endicott
> 

Re: VM lockup due to storage typo

2009-09-15 Thread John P. Baker
First, since CP should know at all times how much space of each category
(PAGE, SPOL, etc.) is allocated, it should be able to immediately reject any
request (LOGON, DEFINE STOR, etc.) where the amount of storage requested
exceeds the amount of secondary storage configured.

Second, since CP "should" know at all times how much space of each category
(PAGE, SPOL, etc.) is in use, it should be able to immediately reject any
request (LOGON, DEFINE STOR, etc.) where the amount of storage requested
exceeds the amount of secondary storage available.

If this is not happening, I would argue that the situation should be
APAR'able as a system integrity bug.

Now, we can debate whether pages allocated, but not used, should be counted.
Should such pages require secondary storage backing availability, or should
secondary storage backing availability be required only when the page is
used?  Should this be a system configurable option?  Should this be a
virtual machine configurable option?
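
To make the two checks concrete, here is a minimal sketch of the
allocation-based variant, where the full requested size must be backed.
It is not how CP works today; the class, the function and the sample
numbers are all made up for illustration, and only PAGE space is
modelled.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
TIB = 1024 ** 4


@dataclass
class PagingSpace:
    """Hypothetical snapshot of PAGE space as CP might track it."""
    configured: int  # bytes of PAGE space defined to the system
    in_use: int      # bytes currently occupied

    @property
    def available(self) -> int:
        return self.configured - self.in_use


def check_storage_request(requested: int, page: PagingSpace) -> str:
    """Apply the two checks in order: configured first, then available."""
    if requested > page.configured:
        return "REJECT: exceeds configured PAGE space"
    if requested > page.available:
        return "REJECT: exceeds available PAGE space"
    return "ACCEPT"


if __name__ == "__main__":
    page = PagingSpace(configured=400 * GIB, in_use=150 * GIB)  # made-up numbers
    for size in (8 * TIB, 300 * GIB, 10 * GIB):
        print(size // GIB, "GB ->", check_storage_request(size, page))
```

The "count only pages actually used" variant would run the same
comparison at page-out time instead of at LOGON or DEFINE STOR time.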

John P. Baker

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Lee Stewart
Sent: Tuesday, September 15, 2009 4:56 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Gee, I guess we're in good company!   ;-)

It does seem to me that CP should be smart enough to look at a 175GB 
real storage, 4GB Xstor, and xx number of page packs and say not in our 
wildest dreams can we run an 8TB virtual guest...

Or maybe at the point that the 8TB guest starts choking off all other 
activity and wildly filling page space

Lee


Re: VM lockup due to storage typo

2009-09-15 Thread Lee Stewart

Gee, I guess we're in good company!   ;-)

It does seem to me that CP should be smart enough to look at a 175GB 
real storage, 4GB Xstor, and xx number of page packs and say not in our 
wildest dreams can we run an 8TB virtual guest...


Or maybe at the point that the 8TB guest starts choking off all other 
activity and wildly filling page space


Lee

Marcy Cortes wrote:

See a thread on this list with subject "Sanity check?" from Oct 2007 for what 
happened when I did the same thing ;)

You probably filled page space.

I still think IBM should refuse to IPL a guest that will cause such harm.


Marcy 


"This message may contain confidential and/or privileged information. If you are not 
the addressee or authorized to receive this for the addressee, you must not use, copy, 
disclose, or take any action based on this message or any information herein. If you have 
received this message in error, please advise the sender immediately by reply e-mail and 
delete this message. Thank you for your cooperation."


-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf 
Of Lee Stewart
Sent: Tuesday, September 15, 2009 8:39 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] VM lockup due to storage typo

Does anyone have an idea of how we might have gotten out of this without 
an IPL?


VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
Several guests needed more memory added so the directory was updated and 
one by one the guests were shut down, logged off and back on.  So far, so good.


But... In changing the memory for many guests, and it being late at 
night after a long day, while meaning to set a guest's memory to 9728M, 
it got set to 9728G.  When that guest was cycled we saw the message on 
the console that its memory was limited to 8TB (HCPLGN093E), then the 
VM system appeared to freeze.


We couldn't get in via TCP/IP, or the HMC Operating System Messages 
screen, or the HMC Integrated 3270.


Finally had to IPL.   Even that was weird as I'd have expected the Load 
Normal to shut down, it just IPLed.   We did NoAutolog, fixed the typo 
and all came back up ok...


I suspect CP was scrambling paging everything in the world out as Linux 
tried to initialize that 8TB of memory...   But I'm surprised I couldn't 
even get into the HMC consoles (to kill just that one guest as opposed 
to all of them)..


Any thoughts?
Lee


--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web:   www.siriuscom.com


Re: VM lockup due to storage typo

2009-09-15 Thread Lee Stewart
From the tn3270 sessions hanging to the phone call to me - 2-3 minutes. 
From then till we decided we had to IPL - maybe 15-20 minutes.  But 30 
minutes (maybe 45-60 till all the apps were back up) on a major online 
system is a lot.   It was 35 minutes from the message capping the 
virtual storage at 8TB till the IPL time from Q CPLEVEL.  So no, not 
long considering the size.  And yes, I suspect it would PGT004 eventually.


And yes, if CP unceremoniously chopped my wrong size from 9.7TB to 8TB, 
why could it not do the same to either a user specified system limit or 
a "this is the biggest machine this CP can run in this configuration"...
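
As a rough illustration of that idea, the sketch below parses a storage
size, applies the 8TB cap from the message above, and then applies an
optional site limit. The site limit is purely hypothetical (no such
option exists as far as this thread establishes), and the function names
are invented.

```python
from typing import Optional

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}
CP_MAX = 8 * UNITS["T"]  # the 8TB cap reported in this thread


def parse_storage(text: str) -> int:
    """Turn a size such as '9728M' into bytes."""
    if len(text) < 2:
        raise ValueError(f"bad storage size: {text!r}")
    value, unit = text[:-1], text[-1].upper()
    if unit not in UNITS or not value.isdigit():
        raise ValueError(f"bad storage size: {text!r}")
    return int(value) * UNITS[unit]


def effective_size(text: str, site_limit: Optional[int] = None) -> int:
    """Clamp a request to CP's cap and, if set, to a hypothetical site limit."""
    size = min(parse_storage(text), CP_MAX)
    if site_limit is not None:
        size = min(size, site_limit)
    return size


if __name__ == "__main__":
    limit = 64 * UNITS["G"]  # made-up site limit
    for request in ("9728M", "9728G"):
        print(request, "->", effective_size(request, limit) // UNITS["M"], "MB")
```

With a limit like that in place the one-letter typo would have cost a
guest that was bigger than intended, not the whole system.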


Lee

Gentry, Stephen wrote:

What Lee doesn't mention is how long he waited before doing the IPL.
Had he waited to see what happens maybe VM would have finally come
around, so to speak. We all have different thresholds of pain. I think I
would have done what Lee did, long day, not really wanting to wait
around to see if VM recovers, just IPL.  Lee did you have access to the
HMC and thus the SAD screen to see what was going on? Sort of my last
line of defense if I can't get logged in.  Granted all it will tell you
is if you have CPU or I/O utilization, but at least you have something
to go to IBM with.
Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE, if it's set then
guest machine size is verified, if not available PAGE area and SPOOL
size is checked (calculated) and if the guest exceeds that size then the
guest doesn't start or a severe warning is issued.
Steve

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Tuesday, September 15, 2009 12:59 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Maybe CP couldn't know that the guest would do something bad, but it
should know that it has opened itself to the possibility that the guest
could, in normal operation, cause the problem. 
One of Alan's first precepts of information security and integrity is
that the guest cannot be allowed to harm the CP. This clearly violates
that.

Regards, 
Richard Schuh 

 


-Original Message-
From: The IBM z/VM Operating System 
[mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch

Sent: Tuesday, September 15, 2009 9:19 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

CP wouldn't know at IPL time that the guest would (not could, but 
would) cause such harm.


Just because you say you can use xxx GB, doesn't mean you 
would actually use them.


When page fills, it overflows to spool.
When spool fills, CP abends on the next pageout.

Tom Duerbusch
THD Consulting

>>> Marcy Cortes 9/15/2009 11:02 AM >>>
See a thread on this list with subject "Sanity check?" from 
Oct 2007 for what happened when I did the same thing ;)


You probably filled page space.

I still think IBM should refuse to IPL a guest that will 
cause such harm.



Marcy 

"This message may contain confidential and/or privileged 
information. If you are not the addressee or authorized to 
receive this for the addressee, you must not use, copy, 
disclose, or take any action based on this message or any 
information herein. If you have received this message in 
error, please advise the sender immediately by reply e-mail 
and delete this message. Thank you for your cooperation."



-Original Message-
From: The IBM z/VM Operating System 
[mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart

Sent: Tuesday, September 15, 2009 8:39 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] VM lockup due to storage typo

Does anyone have an idea of how we might have gotten out of 
this without an IPL?


VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
Several guests needed more memory added so the directory was 
updated and one by one the guests were shut down, logged off and 
back on.  So far, so good.


But... In changing the memory for many guests, and it being 
late at night after a long day, while meaning to set a 
guest's memory to 9728M, it got set to 9728G.  When that 
guest was cycled we saw the message on the console that its 
memory was limited to 8TB (HCPLGN093E), then the VM system 
appeared to freeze.


We couldn't get in via TCP/IP, or the HMC Operating System 
Messages screen, or the HMC Integrated 3270.


Finally had to IPL.   Even that was weird as I'd have expected the Load 
Normal to shut down, it just IPLed.   We did NoAutolog, fixed the typo 
and all came back up ok...


I suspect CP was scrambling paging everything in the world 
out as Linux 
tried to initialize that 8TB of memory...   But I'm surprised 
I couldn't 
even get into the HMC consoles (to kill just that one guest 
as opposed to all of them)..


Any thoughts?
Lee
--

Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com 
Web:   www.si

Re: VM lockup due to storage typo

2009-09-15 Thread Marcy Cortes
So are you saying that what Lee and I both did to shoot our systems should 
be APAR'able?  Or should it be a requirement?  Or is it going to be a "your gun, 
your foot" answer?


Marcy 
 
"This message may contain confidential and/or privileged information. If you 
are not the addressee or authorized to receive this for the addressee, you must 
not use, copy, disclose, or take any action based on this message or any 
information herein. If you have received this message in error, please advise 
the sender immediately by reply e-mail and delete this message. Thank you for 
your cooperation."


-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf 
Of Alan Altmark
Sent: Tuesday, September 15, 2009 1:45 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: [IBMVM] VM lockup due to storage typo

On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
> I agree with that ("the guest cannot be allowed to harm CP") but has
> that actually been formally - or even informally - accepted by the
> Powers That Be?

Yes, it is in the Statement of System Integrity in the General Information 
Manual.

> I ask because I still remember, as though it were yesterday, opening a
> security/integrity APAR against VM back in the mid-1980's because any
> class G user could knock CP down by defining a shared and a nonshared
> device on the same virtual control unit, and being told that that was
> NOT a security or integrity issue, and that no fix would be forthcoming.

Under "today's" rules, that would be an Integrity problem.

o If a class G (only) user can repeatedly or with malice of forethought 
hang or abend CP, it WILL be classified as an integrity problem (denial of 
service).

o If a class G user happens to do something that triggers an abend or hang 
due to a "system malfunction", it will NOT be classified as an integrity 
problem.

o If the system abends or hangs because it is overloaded (memory, CPU), it 
will NOT be classified as an integrity problem.

o Just because it isn't an integrity problem doesn't mean it isn't a 
defect.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-15 Thread Alan Altmark
On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
> I agree with that ("the guest cannot be allowed to harm CP") but has
> that actually been formally - or even informally - accepted by the
> Powers That Be?

Yes, it is in the Statement of System Integrity in the General Information 
Manual.

> I ask because I still remember, as though it were yesterday, opening a
> security/integrity APAR against VM back in the mid-1980's because any
> class G user could knock CP down by defining a shared and a nonshared
> device on the same virtual control unit, and being told that that was
> NOT a security or integrity issue, and that no fix would be forthcoming.

Under "today's" rules, that would be an Integrity problem.

o If a class G (only) user can repeatedly or with malice of forethought 
hang or abend CP, it WILL be classified as an integrity problem (denial of 
service).

o If a class G user happens to do something that triggers an abend or hang 
due to a "system malfunction", it will NOT be classified as an integrity 
problem.

o If the system abends or hangs because it is overloaded (memory, CPU), it 
will NOT be classified as an integrity problem.

o Just because it isn't an integrity problem doesn't mean it isn't a 
defect.

Alan Altmark
z/VM Development
IBM Endicott


Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
The same might be said for page space. Someone could access a dataspace enabled 
directory and take up page space. We could easily take up 48G of page space 
here by starting 24 machines that each access different d/s directories at 2G 
each. And others could define storage from default to max. Then there are those 
pesky V-disk users - they could increase the load on page space. 

But I do agree that spool space should not enter into the equation when 
determining if there is enough page space.
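
For what it's worth, the arithmetic behind that can be sketched as below.
The consumer table and helper names are made up for illustration, the
only figure from this post is the 24 machines at 2G each, and spool is
left out of the capacity on purpose.

```python
GIB = 1024 ** 3


def worst_case_page_demand(consumers: dict) -> int:
    """Sum what each consumer class could take if it touched everything.

    consumers maps a label to a (count, bytes_each) pair.
    """
    return sum(count * size for count, size in consumers.values())


def fits_in_page_space(demand: int, page_bytes: int) -> bool:
    """Compare demand against PAGE space only; spool is not counted."""
    return demand <= page_bytes


if __name__ == "__main__":
    consumers = {"dataspace directories": (24, 2 * GIB)}
    demand = worst_case_page_demand(consumers)
    print("worst case:", demand // GIB, "GB")
    print("fits in 40 GB of PAGE:", fits_in_page_space(demand, 40 * GIB))  # made-up capacity
```

V-disks and users who DEFINE STORAGE up to their maximum would simply be
more rows in the same table.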

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Brian Nielsen
> Sent: Tuesday, September 15, 2009 12:31 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> You can't include SPOOL space in the check for "is there 
> enough page space to allow this guest" decision.  SPOOL 
> space that was available earlier may not be there later 
> when you "need" it as overflow PAGE space.  Any guest 
> can fill up your SPOOL space at any time.
> 
> Brian Nielsen
> 
> 
> 
> On Tue, 15 Sep 2009 13:13:40 -0400, Gentry, Stephen wrote:
> 
> >What Lee doesn't mention is how long he waited before doing the IPL.
> >Had he waited to see what happens maybe VM would have finally come
> >around, so to speak. We all have different thresholds of pain. I think
> >I would have done what Lee did, long day, not really wanting to wait
> >around to see if VM recovers, just IPL.  Lee did you have access to the
> >HMC and thus the SAD screen to see what was going on? Sort of my last
> >line of defense if I can't get logged in.  Granted all it will tell you
> >is if you have CPU or I/O utilization, but at least you have something
> >to go to IBM with.
> >Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE, if it's set then
> >guest machine size is verified, if not available PAGE area and SPOOL
> >size is checked (calculated) and if the guest exceeds that size then
> >the guest doesn't start or a severe warning is issued.
> >Steve
> >
> >-----Original Message-
> >From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On 
> >Behalf Of Schuh, Richard
> >Sent: Tuesday, September 15, 2009 12:59 PM
> >To: IBMVM@LISTSERV.UARK.EDU
> >Subject: Re: VM lockup due to storage typo
> >
> >Maybe CP couldn't know that the guest would do something bad, but it
> >should know that it has opened itself to the possibility that the guest
> >could, in normal operation, cause the problem.
> >One of Alan's first precepts of information security and integrity is
> >that the guest cannot be allowed to harm the CP. This clearly violates
> >that.
> >
> >Regards,
> >Richard Schuh
> >
> > 
> >
> >> -Original Message-
> >> From: The IBM z/VM Operating System
> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> >> Sent: Tuesday, September 15, 2009 9:19 AM
> >> To: IBMVM@LISTSERV.UARK.EDU
> >> Subject: Re: VM lockup due to storage typo
> >> 
> >> CP wouldn't know at IPL time that the guest would (not could, but
> >> would) cause such harm.
> >> 
> >> Just because you say you can use xxx GB, doesn't mean you would 
> >> actually use them.
> >> 
> >> When page fills, it overflows to spool.
> >> When spool fills, CP abends on the next pageout.
> >> 
> >> Tom Duerbusch
> >> THD Consulting
> >> 
> >> >>> Marcy Cortes 9/15/2009 11:02 AM >>>
> >> See a thread on this list with subject "Sanity check?" from Oct 2007
> >> for what happened when I did the same thing ;)
> >> 
> >> You probably filled page space.
> >> 
> >> I still think IBM should refuse to IPL a guest that will cause such
> >> harm.
> >> 
> >> 
> >> Marcy
> >> 
> >> "This message may contain confidential and/or privileged information.
> >> If you are not the addressee or authorized to receive this for the
> >> addressee, you must not use, copy, disclose, or take any action based
> >> on this message or any information herein. If you have received this
> >> message in error, please advise the sender immediately by reply
> >> e-mail and delete this message. Thank you for your cooperation."

Re: VM lockup due to storage typo

2009-09-15 Thread Tom Duerbusch
Good point.

When I have hit this, I got a PAGxxx type error and CP automatically reipl'ed.

Like I said, when the offending user starts allocating pages, all the other 
machines will abend on a paging error when CP tries to page out their recently 
used pages.  Eventually, some of CP's pageable pages will be the least 
recently used pages and BAM!  PAGxxx CP abend.  Automatic restart in progress...
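
A toy model of that sequence, just to show the order things go wrong in:
page slots fill first, pageouts then spill into spool, and the first
pageout with nowhere to go is fatal. The class and exception are invented
for illustration and say nothing about CP internals.

```python
class PagingAbend(Exception):
    """Stands in for the PAGxxx abend mentioned above."""


class ToyPager:
    """Counts free slots only; no real paging happens here."""

    def __init__(self, page_slots: int, spool_slots: int):
        self.page_free = page_slots
        self.spool_free = spool_slots

    def page_out(self) -> str:
        """Place one page, preferring PAGE space and overflowing to SPOOL."""
        if self.page_free > 0:
            self.page_free -= 1
            return "PAGE"
        if self.spool_free > 0:
            self.spool_free -= 1
            return "SPOOL"
        raise PagingAbend("no PAGE or SPOOL slot left for this pageout")


if __name__ == "__main__":
    pager = ToyPager(page_slots=2, spool_slots=1)  # tiny made-up capacities
    try:
        while True:
            print("pageout went to", pager.page_out())
    except PagingAbend as err:
        print("abend:", err)
```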

Tom Duerbusch
THD Consulting

>>> "Gentry, Stephen"  9/15/2009 12:13 PM >>>
What Lee doesn't mention is how long he waited before doing the IPL.
Had he waited to see what happens maybe VM would have finally come
around, so to speak. We all have different thresholds of pain. I think I
would have done what Lee did, long day, not really wanting to wait
around to see if VM recovers, just IPL.  Lee did you have access to the
HMC and thus the SAD screen to see what was going on? Sort of my last
line of defense if I can't get logged in.  Granted all it will tell you
is if you have CPU or I/O utilization, but at least you have something
to go to IBM with.
Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE, if it's set then
guest machine size is verified, if not available PAGE area and SPOOL
size is checked (calculated) and if the guest exceeds that size then the
guest doesn't start or a severe warning is issued.
Steve

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Tuesday, September 15, 2009 12:59 PM
To: IBMVM@LISTSERV.UARK.EDU 
Subject: Re: VM lockup due to storage typo

Maybe CP couldn't know that the guest would do something bad, but it
should know that it has opened itself to the possibility that the guest
could, in normal operation, cause the problem. 
One of Alan's first precepts of information security and integrity is
that the guest cannot be allowed to harm the CP. This clearly violates
that.

Regards, 
Richard Schuh 

 

> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> Sent: Tuesday, September 15, 2009 9:19 AM
> To: IBMVM@LISTSERV.UARK.EDU 
> Subject: Re: VM lockup due to storage typo
> 
> CP wouldn't know at IPL time that the guest would (not could, but 
> would) cause such harm.
> 
> Just because you say you can use xxx GB, doesn't mean you 
> would actually use them.
> 
> When page fills, it overflows to spool.
> When spool fills, CP abends on the next pageout.
> 
> Tom Duerbusch
> THD Consulting
> 
> >>> Marcy Cortes  9/15/2009 
> 11:02 AM >>>
> See a thread on this list with subject "Sanity check?" from 
> Oct 2007 for what happened when I did the same thing ;)
> 
> You probably filled page space.
> 
> I still think IBM should refuse to IPL a guest that will 
> cause such harm.
> 
> 
> Marcy 
> 
> "This message may contain confidential and/or privileged 
> information. If you are not the addressee or authorized to 
> receive this for the addressee, you must not use, copy, 
> disclose, or take any action based on this message or any 
> information herein. If you have received this message in 
> error, please advise the sender immediately by reply e-mail 
> and delete this message. Thank you for your cooperation."
> 
> 
> -Original Message-
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> Sent: Tuesday, September 15, 2009 8:39 AM
> To: IBMVM@LISTSERV.UARK.EDU 
> Subject: [IBMVM] VM lockup due to storage typo
> 
> Does anyone have an idea of how we might have gotten out of 
> this without an IPL?
> 
> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
> Several guests needed more memory added so the directory was 
> updated and one by one the guests were shut down, logged off and 
> back on.  So far, so good.
> 
> But... In changing the memory for many guests, and it being 
> late at night after a long day, while meaning to set a 
> guest's memory to 9728M, it got set to 9728G.  When that 
> guest was cycled we saw the message on the console that its 
> memory was limited to 8TB (HCPLGN093E), then the VM system 
> appeared to freeze.
> 
> We couldn't get in via TCP/IP, or the HMC Operating System 
> Messages screen, or the HMC Integrated 3270.
> 
> Finally had to IPL.   Even that was weird as I'd have 
> expected the Load 
> Normal to shut down, it just IPLed.   We did NoAutolog, fixed the typo 
> and all came back up ok...
> 
> I suspect CP was scrambling paging everything in the world 
> out as Linux 
> tried to initialize that 8TB of memory...   But I'm surprised 
> I couldn't 
> even get into the HMC consoles (to kill just that one guest 
> as opposed to all of them)..
> 
> Any thoughts?
> Lee
> -- 
> 
> Lee Stewart, Senior SE
> Sirius Computer Solutions
> Phone: (303) 996-7122
> Email: lee.stew...@siriuscom.com 
> Web:   www.siriuscom.com 
> 


Re: VM lockup due to storage typo

2009-09-15 Thread Brian Nielsen
You can't include SPOOL space in the check for "is there enough page space
to allow this guest" decision.  SPOOL space that was available earlier may
not be there later when you "need" it as overflow PAGE space.  Any guest
can fill up your SPOOL space at any time.

Brian Nielsen



On Tue, 15 Sep 2009 13:13:40 -0400, Gentry, Stephen wrote:

>What Lee doesn't mention is how long he waited before doing the IPL.
>Had he waited to see what happens maybe VM would have finally come
>around, so to speak. We all have different thresholds of pain. I think I
>would have done what Lee did, long day, not really wanting to wait
>around to see if VM recovers, just IPL.  Lee did you have access to the
>HMC and thus the SAD screen to see what was going on? Sort of my last
>line of defense if I can't get logged in.  Granted all it will tell you
>is if you have CPU or I/O utilization, but at least you have something
>to go to IBM with.
>Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE, if it's set then
>guest machine size is verified, if not available PAGE area and SPOOL
>size is checked (calculated) and if the guest exceeds that size then the
>guest doesn't start or a severe warning is issued.
>Steve
>
>-Original Message-
>From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
>Behalf Of Schuh, Richard
>Sent: Tuesday, September 15, 2009 12:59 PM
>To: IBMVM@LISTSERV.UARK.EDU
>Subject: Re: VM lockup due to storage typo
>
>Maybe CP couldn't know that the guest would do something bad, but it
>should know that it has opened itself to the possibility that the guest
>could, in normal operation, cause the problem. 
>One of Alan's first precepts of information security and integrity is
>that the guest cannot be allowed to harm the CP. This clearly violates
>that.
>
>Regards, 
>Richard Schuh 
>
> 
>
>> -Original Message-
>> From: The IBM z/VM Operating System 
>> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
>> Sent: Tuesday, September 15, 2009 9:19 AM
>> To: IBMVM@LISTSERV.UARK.EDU
>> Subject: Re: VM lockup due to storage typo
>> 
>> CP wouldn't know at IPL time that the guest would (not could, but 
>> would) cause such harm.
>> 
>> Just because you say you can use xxx GB, doesn't mean you 
>> would actually use them.
>> 
>> When page fills, it overflows to spool.
>> When spool fills, CP abends on the next pageout.
>> 
>> Tom Duerbusch
>> THD Consulting
>> 
>> >>> Marcy Cortes  9/15/2009 
>> 11:02 AM >>>
>> See a thread on this list with subject "Sanity check?" from 
>> Oct 2007 for what happened when I did the same thing ;)
>> 
>> You probably filled page space.
>> 
>> I still think IBM should refuse to IPL a guest that will 
>> cause such harm.
>> 
>> 
>> Marcy 
>> 
>> "This message may contain confidential and/or privileged 
>> information. If you are not the addressee or authorized to 
>> receive this for the addressee, you must not use, copy, 
>> disclose, or take any action based on this message or any 
>> information herein. If you have received this message in 
>> error, please advise the sender immediately by reply e-mail 
>> and delete this message. Thank you for your cooperation."
>> 
>> 
>> -----Original Message-----
>> From: The IBM z/VM Operating System 
>> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
>> Sent: Tuesday, September 15, 2009 8:39 AM
>> To: IBMVM@LISTSERV.UARK.EDU
>> Subject: [IBMVM] VM lockup due to storage typo
>> 
>> Does anyone have an idea of how we might have gotten out of 
>> this without an IPL?
>> 
>> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
>> Several guests needed more memory added so the directory was 
>> updated and one by one the guests shutdown, logged off and 
>> back on.  So far, so good.
>> 
>> But... In changing the memory for many guests, and it being 
>> late at night after a long day, while meaning to set a 
>> guest's memory to 9728M, it got set to 9728G.  When that 
>> guest was cycled we see the message on the console that its
>> memory was limited to 8TB (HCPLGN093E), then the VM system 
>> appeared to freeze.
>> 
>> We couldn't get in via TCP/IP, or the HMC Operating System 
>> Messages screen, or the HMC Integrated 3270.
>> 
>> Finally had to IPL.   Even that was weird as I'd have expected the Load
>> Normal to shutdown, it just IPLed.   We did NoAutolog, fixed the typo
>> and all came back up ok...
>> 
>> I suspect CP was scrambling paging everything in the world out as Linux
>> tried to initialize that 8TB of memory...   But I'm surprised I couldn't
>> even get into the HMC consoles (to kill just that one guest as opposed
>> to all of them)..
>> 
>> Any thoughts?
>> Lee
>> -- 
>> 
>> Lee Stewart, Senior SE
>> Sirius Computer Solutions
>> Phone: (303) 996-7122
>> Email: lee.stew...@siriuscom.com 
>> Web:   www.siriuscom.com
>> 


Re: VM lockup due to storage typo

2009-09-15 Thread Steve Marak
I agree with that ("the guest cannot be allowed to harm CP") but has that 
actually been formally - or even informally - accepted by the Powers That 
Be?

I ask because I still remember, as though it were yesterday, opening a 
security/integrity APAR against VM back in the mid-1980's because any 
class G user could knock CP down by defining a shared and a nonshared 
device on the same virtual control unit, and being told that that was NOT 
a security or integrity issue, and that no fix would be forthcoming. 

But at least I'm not bitter about it. 

Steve

On Tue, 15 Sep 2009, Schuh, Richard wrote:

> One of Alan's first precepts of information security and integrity is 
> that the guest cannot be allowed to harm the CP. This clearly violates 
> that.
> 
> Regards, 
> Richard Schuh 

-- Steve Marak
-- sama...@gizmoworks.com


Re: VM lockup due to storage typo

2009-09-15 Thread Gentry, Stephen
What Lee doesn't mention is how long he waited before doing the IPL.
Had he waited to see what happens maybe VM would have finally come
around, so to speak. We all have different thresholds of pain. I think I
would have done what Lee did, long day, not really wanting to wait
around to see if VM recovers, just IPL.  Lee, did you have access to the
HMC and thus the SAD screen to see what was going on? Sort of my last
line of defense if I can't get logged in.  Granted all it will tell you
is if you have CPU or I/O utilization, but at least you have something
to go to IBM with.
Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE: if it's set, then
guest machine size is verified against it; if not, available PAGE area and
SPOOL size is checked (calculated), and if the guest exceeds that size then
the guest doesn't start or a severe warning is issued.
Steve
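
To make the suggestion above concrete, here is a rough sketch of the logic in Python. It is only an illustration: MAX_USER_SIZE is the option proposed in this note and does not exist in SYSTEM CONFIG, the function names are invented, and the PAGE and SPOOL sizes in the usage lines are made-up numbers.

```python
from typing import Optional

# Multipliers for directory-style size suffixes (K, M, G, T).
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def to_bytes(size: str) -> int:
    """Convert a size such as '9728M' or '8T' to bytes."""
    return int(size[:-1]) * UNITS[size[-1].upper()]


def check_guest_size(guest: str, page: str, spool: str,
                     max_user_size: Optional[str] = None) -> str:
    """Hypothetical check: return 'ok' or 'reject' for a guest size.

    If the (proposed, not real) MAX_USER_SIZE value is given, the guest
    is verified against it. Otherwise the limit is calculated as the
    available PAGE area plus SPOOL size.
    """
    if max_user_size is not None:
        limit = to_bytes(max_user_size)
    else:
        limit = to_bytes(page) + to_bytes(spool)
    return "ok" if to_bytes(guest) <= limit else "reject"


# Made-up capacities: 200G of PAGE and 50G of SPOOL.
print(check_guest_size("9728M", "200G", "50G"))  # intended size
print(check_guest_size("8T", "200G", "50G"))     # size after the typo
```

Whether the result should stop the logon or only produce a severe warning is a policy choice; the sketch just returns a string so either could be wired in.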

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On
Behalf Of Schuh, Richard
Sent: Tuesday, September 15, 2009 12:59 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Maybe CP couldn't know that the guest would do something bad, but it
should know that it has opened itself to the possibility that the guest
could, in normal operation, cause the problem. 
One of Alan's first precepts of information security and integrity is
that the guest cannot be allowed to harm the CP. This clearly violates
that.

Regards, 
Richard Schuh 

 

> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> Sent: Tuesday, September 15, 2009 9:19 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> CP wouldn't know at IPL time, the guest would, not could, but 
> would cause such harm.
> 
> Just because you say you can use xxx GB, doesn't mean you 
> would actually use them.
> 
> When page fills, it over flows to spool.
> When spool fills, CP abends on the next pageout.
> 
> Tom Duerbusch
> THD Consulting
> 
> >>> Marcy Cortes  9/15/2009 11:02 AM >>>
> See a thread on this list with subject "Sanity check?" from 
> Oct 2007 for what happened when I did the same thing ;)
> 
> You probably filled page space.
> 
> I still think IBM should refuse to IPL a guest that will 
> cause such harm.
> 
> 
> Marcy 
> 
> "This message may contain confidential and/or privileged 
> information. If you are not the addressee or authorized to 
> receive this for the addressee, you must not use, copy, 
> disclose, or take any action based on this message or any 
> information herein. If you have received this message in 
> error, please advise the sender immediately by reply e-mail 
> and delete this message. Thank you for your cooperation."
> 
> 
> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> Sent: Tuesday, September 15, 2009 8:39 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: [IBMVM] VM lockup due to storage typo
> 
> Does anyone have an idea of how we might have gotten out of 
> this without an IPL?
> 
> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
> Several guests needed more memory added so the directory was 
> updated and one by one the guests shutdown, logged off and 
> back on.  So far, so good.
> 
> But... In changing the memory for many guests, and it being 
> late at night after a long day, while meaning to set a 
> guest's memory to 9728M, it got set to 9728G.  When that 
> guest was cycled we see the message on the console that its
> memory was limited to 8TB (HCPLGN093E), then the VM system 
> appeared to freeze.
> 
> We couldn't get in via TCP/IP, or the HMC Operating System 
> Messages screen, or the HMC Integrated 3270.
> 
> Finally had to IPL.   Even that was weird as I'd have expected the Load
> Normal to shutdown, it just IPLed.   We did NoAutolog, fixed the typo
> and all came back up ok...
> 
> I suspect CP was scrambling paging everything in the world out as Linux
> tried to initialize that 8TB of memory...   But I'm surprised I couldn't
> even get into the HMC consoles (to kill just that one guest as opposed
> to all of them)..
> 
> Any thoughts?
> Lee
> -- 
> 
> Lee Stewart, Senior SE
> Sirius Computer Solutions
> Phone: (303) 996-7122
> Email: lee.stew...@siriuscom.com 
> Web:   www.siriuscom.com
> 


Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
CMS, being a 31-bit system, will probably never use 3TB of memory. Perhaps 
z/CMS, when it becomes a reality, might but the current CMS is another story. 

Regards, 
Richard Schuh 

 

>  CMS uses all of its storage over 
> time. Both will use all of their storage eventually. 
> 


Re: VM lockup due to storage typo

2009-09-15 Thread Tom Duerbusch
CMS will free its storage after the command is complete.

However, do a peek on a very large reader element, such as an OS dump, and CMS 
just might use up all of its storage, just like any other guest might.

It isn't a matter of time, it is a matter of usage.

Tom Duerbusch
THD Consulting

>>> Thomas Kern  9/15/2009 12:48 PM >>>
The difference between CMS and Linux in this case is just a matter of time
before problems occur. Linux wants to use all of its storage early, CMS uses
all of its storage over time. Both will use all of their storage eventually. 

CP is built to overcommit storage. It just lets you REALLY overcommit
storage. But it would be nice if there was some sort of sanity check in
there somewhere. 

/Tom Kern

On Tue, 15 Sep 2009 13:12:38 -0400, Bruce Hayden  wrote:

>The problem isn't that you did an IPL, it is that you IPLed Linux.  An
>IPL of CMS in an 8 TB machine doesn't have any delay or cause a
>problem:
>
>def stor 8t
>STORAGE = 8T
>Storage cleared - system reset.
>i cms
>z/VM V5.4.0    2009-07-13 11:58
>
>Ready; T=0.01/0.01 13:06:21
>q v stor
>STORAGE = 8T
>Ready; T=0.01/0.01 13:06:26
>q stor
>STORAGE = 4G CONFIGURED = 4G INC = 128M STANDBY = 8G RESERVED = 0
>Ready; T=0.01/0.01 13:06:57
>
>An IPL of ZCMS blows up, though.  Maybe they didn't test it with that
>large storage.
>
>On Tue, Sep 15, 2009 at 12:02 PM, Marcy Cortes
> wrote:
>> See a thread on this list with subject "Sanity check?" from Oct 2007 for
>> what happened when I did the same thing ;)
>>
>> You probably filled page space.
>>
>> I still think IBM should refuse to IPL a guest that will cause such harm.
>>
>>
>> Marcy
>>
>
>
>--
>Bruce Hayden
>Linux on System z Advanced Technical Support
>IBM, Endicott, NY


Re: VM lockup due to storage typo

2009-09-15 Thread Thomas Kern
The difference between CMS and Linux in this case is just a matter of time
before problems occur. Linux wants to use all of its storage early, CMS uses
all of its storage over time. Both will use all of their storage eventually.

CP is built to overcommit storage. It just lets you REALLY overcommit
storage. But it would be nice if there was some sort of sanity check in
there somewhere. 
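
As a rough idea of what such a sanity check might look at, the sketch below computes an overcommit ratio (total defined guest storage divided by real storage) and flags it past a threshold. The 175G LPAR and the 9728M / 8T guest sizes come from Lee's note; the count of twenty guests, the 3.0 threshold, and the function names are assumptions made up for the illustration, not anything CP provides.

```python
def overcommit_ratio(guest_sizes_gb, real_gb):
    """Total defined guest storage divided by real storage (both in GB)."""
    if real_gb <= 0:
        raise ValueError("real storage must be positive")
    return sum(guest_sizes_gb) / real_gb


def needs_review(guest_sizes_gb, real_gb, threshold=3.0):
    """Flag a configuration whose overcommit ratio exceeds the threshold.

    The default threshold of 3.0 is an arbitrary example value.
    """
    return overcommit_ratio(guest_sizes_gb, real_gb) > threshold


REAL_GB = 175.0                          # LPAR size from the original note
intended = [9.5] * 20                    # assumed: twenty guests at 9728M each
with_typo = [9.5] * 19 + [8192.0]        # one guest capped at 8T instead

print(needs_review(intended, REAL_GB))   # modest overcommit
print(needs_review(with_typo, REAL_GB))  # one guest dwarfs the whole LPAR
```

A check like this would not stop deliberate overcommitment below the threshold, which is the normal way of running; it only catches the case where a single entry is wildly out of proportion.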

/Tom Kern

On Tue, 15 Sep 2009 13:12:38 -0400, Bruce Hayden  wrote:

>The problem isn't that you did an IPL, it is that you IPLed Linux.  An
>IPL of CMS in an 8 TB machine doesn't have any delay or cause a
>problem:
>
>def stor 8t
>STORAGE = 8T
>Storage cleared - system reset.
>i cms
>z/VM V5.4.0    2009-07-13 11:58
>
>Ready; T=0.01/0.01 13:06:21
>q v stor
>STORAGE = 8T
>Ready; T=0.01/0.01 13:06:26
>q stor
>STORAGE = 4G CONFIGURED = 4G INC = 128M STANDBY = 8G RESERVED = 0
>Ready; T=0.01/0.01 13:06:57
>
>An IPL of ZCMS blows up, though.  Maybe they didn't test it with that
>large storage.
>
>On Tue, Sep 15, 2009 at 12:02 PM, Marcy Cortes
> wrote:
>> See a thread on this list with subject "Sanity check?" from Oct 2007 for
>> what happened when I did the same thing ;)
>>
>> You probably filled page space.
>>
>> I still think IBM should refuse to IPL a guest that will cause such harm.
>>
>>
>> Marcy
>>
>
>
>--
>Bruce Hayden
>Linux on System z Advanced Technical Support
>IBM, Endicott, NY


Re: VM lockup due to storage typo

2009-09-15 Thread Bruce Hayden
The problem isn't that you did an IPL, it is that you IPLed Linux.  An
IPL of CMS in an 8 TB machine doesn't have any delay or cause a
problem:

def stor 8t
STORAGE = 8T
Storage cleared - system reset.
i cms
z/VM V5.4.0    2009-07-13 11:58

Ready; T=0.01/0.01 13:06:21
q v stor
STORAGE = 8T
Ready; T=0.01/0.01 13:06:26
q stor
STORAGE = 4G CONFIGURED = 4G INC = 128M STANDBY = 8G RESERVED = 0
Ready; T=0.01/0.01 13:06:57

An IPL of ZCMS blows up, though.  Maybe they didn't test it with that
large storage.

On Tue, Sep 15, 2009 at 12:02 PM, Marcy Cortes
 wrote:
> See a thread on this list with subject "Sanity check?" from Oct 2007 for what 
> happened when I did the same thing ;)
>
> You probably filled page space.
>
> I still think IBM should refuse to IPL a guest that will cause such harm.
>
>
> Marcy
>


-- 
Bruce Hayden
Linux on System z Advanced Technical Support
IBM, Endicott, NY


Re: VM lockup due to storage typo

2009-09-15 Thread Tom Duerbusch
Thinking about this a little further.

How could 1 error cause this?

In the user direct, the user statement has:

USER LINUX27  xx  32M 600M G 

There are two memory-related parms.  The first is the size your guest machine 
is built with, in this case 32 MB.
The other is the maximum memory size for your guest, in this case 600 MB.

With either the initial size, or the dynamically defined size via a DEF STOR 
command, you can't exceed the maximum size.

So to define 8 TB of storage, you have to change the max size to be something 
very large.
And then define the machine to use that size.
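
To illustrate the two values, here is a small Python sketch that pulls the default and maximum sizes out of a USER statement and tests whether a requested size would fit under the maximum. It only mirrors the rule described above; the function names are invented, the parsing assumes the simple five-field layout shown in the example, and none of this is CP code.

```python
# Multipliers for directory-style size suffixes (K, M, G, T).
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def to_bytes(size):
    """Convert a size such as '32M' or '8T' to bytes."""
    return int(size[:-1]) * UNITS[size[-1].upper()]


def parse_user_statement(line):
    """Split 'USER userid password default max class' into its sizes."""
    fields = line.split()
    if len(fields) < 5 or fields[0].upper() != "USER":
        raise ValueError("not a USER statement: " + line)
    return {
        "userid": fields[1],
        "default": to_bytes(fields[3]),   # size the guest is built with
        "maximum": to_bytes(fields[4]),   # ceiling for DEF STOR
    }


def define_storage_allowed(line, requested):
    """True if the requested size does not exceed the directory maximum."""
    return to_bytes(requested) <= parse_user_statement(line)["maximum"]


entry = "USER LINUX27  xx  32M 600M G"
print(define_storage_allowed(entry, "512M"))  # within the 600M maximum
print(define_storage_allowed(entry, "8T"))    # far above the maximum
```

With the maximum left at 600M the 8T request is refused, which is the point: the large size only becomes possible once the maximum itself has been raised.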

So it seems to me that there are two mistakes.  

You told CP you might want a very large size, and when you finally asked for 
it, it obeyed.
That isn't a CP error.

The same problem occurs when you tell CP that you are ok with TB sized vdisks.  
And then you define one.
And then use it up.

Of course, anything that can cause CP to crash isn't a good thing.
Perhaps we need a dedicated paging area for CP, i.e. something like the DUMP 
area for CP dumps, instead of using SPOOL.  The guest machines are still going 
to crash, and the offending machine will be the last of many machines to bite 
the dust.  But, CP would survive.  It might be easier to IPL to get everything 
back running again.

Tom Duerbusch
THD Consulting


>>> "Schuh, Richard"  9/15/2009 11:59 AM >>>
Maybe CP couldn't know that the guest would do something bad, but it should 
know that it has opened itself to the possibility that the guest could, in 
normal operation, cause the problem. 
One of Alan's first precepts of information security and integrity is that the 
guest cannot be allowed to harm the CP. This clearly violates that.

Regards, 
Richard Schuh 

 

> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> Sent: Tuesday, September 15, 2009 9:19 AM
> To: IBMVM@LISTSERV.UARK.EDU 
> Subject: Re: VM lockup due to storage typo
> 
> CP wouldn't know at IPL time, the guest would, not could, but 
> would cause such harm.
> 
> Just because you say you can use xxx GB, doesn't mean you 
> would actually use them.
> 
> When page fills, it over flows to spool.
> When spool fills, CP abends on the next pageout.
> 
> Tom Duerbusch
> THD Consulting
> 
> >>> Marcy Cortes  9/15/2009 11:02 AM >>>
> See a thread on this list with subject "Sanity check?" from 
> Oct 2007 for what happened when I did the same thing ;)
> 
> You probably filled page space.
> 
> I still think IBM should refuse to IPL a guest that will 
> cause such harm.
> 
> 
> Marcy 
> 
> "This message may contain confidential and/or privileged 
> information. If you are not the addressee or authorized to 
> receive this for the addressee, you must not use, copy, 
> disclose, or take any action based on this message or any 
> information herein. If you have received this message in 
> error, please advise the sender immediately by reply e-mail 
> and delete this message. Thank you for your cooperation."
> 
> 
> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> Sent: Tuesday, September 15, 2009 8:39 AM
> To: IBMVM@LISTSERV.UARK.EDU 
> Subject: [IBMVM] VM lockup due to storage typo
> 
> Does anyone have an idea of how we might have gotten out of 
> this without an IPL?
> 
> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
> Several guests needed more memory added so the directory was 
> updated and one by one the guests shutdown, logged off and 
> back on.  So far, so good.
> 
> But... In changing the memory for many guests, and it being 
> late at night after a long day, while meaning to set a 
> guest's memory to 9728M, it got set to 9728G.  When that 
> guest was cycled we see the message on the console that its
> memory was limited to 8TB (HCPLGN093E), then the VM system 
> appeared to freeze.
> 
> We couldn't get in via TCP/IP, or the HMC Operating System 
> Messages screen, or the HMC Integrated 3270.
> 
> Finally had to IPL.   Even that was weird as I'd have expected the Load
> Normal to shutdown, it just IPLed.   We did NoAutolog, fixed the typo
> and all came back up ok...
> 
> I suspect CP was scrambling paging everything in the world out as Linux
> tried to initialize that 8TB of memory...   But I'm surprised I couldn't
> even get into the HMC consoles (to kill just that one guest as opposed
> to all of them)..
> 
> Any thoughts?
> Lee
> -- 
> 
> Lee Stewart, Senior SE
> Sirius Computer Solutions
> Phone: (303) 996-7122
> Email: lee.stew...@siriuscom.com 
> Web:   www.siriuscom.com 
> 


Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
Maybe CP couldn't know that the guest would do something bad, but it should 
know that it has opened itself to the possibility that the guest could, in 
normal operation, cause the problem. 
One of Alan's first precepts of information security and integrity is that the 
guest cannot be allowed to harm the CP. This clearly violates that.

Regards, 
Richard Schuh 

 

> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> Sent: Tuesday, September 15, 2009 9:19 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> CP wouldn't know at IPL time, the guest would, not could, but 
> would cause such harm.
> 
> Just because you say you can use xxx GB, doesn't mean you 
> would actually use them.
> 
> When page fills, it over flows to spool.
> When spool fills, CP abends on the next pageout.
> 
> Tom Duerbusch
> THD Consulting
> 
> >>> Marcy Cortes  9/15/2009 11:02 AM >>>
> See a thread on this list with subject "Sanity check?" from 
> Oct 2007 for what happened when I did the same thing ;)
> 
> You probably filled page space.
> 
> I still think IBM should refuse to IPL a guest that will 
> cause such harm.
> 
> 
> Marcy 
> 
> "This message may contain confidential and/or privileged 
> information. If you are not the addressee or authorized to 
> receive this for the addressee, you must not use, copy, 
> disclose, or take any action based on this message or any 
> information herein. If you have received this message in 
> error, please advise the sender immediately by reply e-mail 
> and delete this message. Thank you for your cooperation."
> 
> 
> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> Sent: Tuesday, September 15, 2009 8:39 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: [IBMVM] VM lockup due to storage typo
> 
> Does anyone have an idea of how we might have gotten out of 
> this without an IPL?
> 
> VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
> Several guests needed more memory added so the directory was 
> updated and one by one the guests shutdown, logged off and 
> back on.  So far, so good.
> 
> But... In changing the memory for many guests, and it being 
> late at night after a long day, while meaning to set a 
> guest's memory to 9728M, it got set to 9728G.  When that 
> guest was cycled we see the message on the console that its
> memory was limited to 8TB (HCPLGN093E), then the VM system 
> appeared to freeze.
> 
> We couldn't get in via TCP/IP, or the HMC Operating System 
> Messages screen, or the HMC Integrated 3270.
> 
> Finally had to IPL.   Even that was weird as I'd have expected the Load
> Normal to shutdown, it just IPLed.   We did NoAutolog, fixed the typo
> and all came back up ok...
> 
> I suspect CP was scrambling paging everything in the world out as Linux
> tried to initialize that 8TB of memory...   But I'm surprised I couldn't
> even get into the HMC consoles (to kill just that one guest as opposed
> to all of them)..
> 
> Any thoughts?
> Lee
> -- 
> 
> Lee Stewart, Senior SE
> Sirius Computer Solutions
> Phone: (303) 996-7122
> Email: lee.stew...@siriuscom.com 
> Web:   www.siriuscom.com
> 

Re: VM lockup due to storage typo

2009-09-15 Thread Schuh, Richard
This should be treated as a bug. It is not an enhancement or new feature; it 
brought a running system down. And it probably did not take a dump. 

Regards, 
Richard Schuh 

 

> -----Original Message-----
> From: The IBM z/VM Operating System 
> [mailto:ib...@listserv.uark.edu] On Behalf Of Daniel P. Martin
> Sent: Tuesday, September 15, 2009 9:09 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
> 
> *cough*SHARE requirement?*cough*
> 
> Marcy Cortes wrote:
> > See a thread on this list with subject "Sanity check?" from Oct 2007
> > for what happened when I did the same thing ;)
> >
> > You probably filled page space.
> >
> > I still think IBM should refuse to IPL a guest that will cause such harm.
> >
> >
> > Marcy
> >
> > "This message may contain confidential and/or privileged
> > information. If you are not the addressee or authorized to
> > receive this for the addressee, you must not use, copy,
> > disclose, or take any action based on this message or any
> > information herein. If you have received this message in
> > error, please advise the sender immediately by reply e-mail
> > and delete this message. Thank you for your cooperation."
> >
> >
> > -----Original Message-----
> > From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu]
> > On Behalf Of Lee Stewart
> > Sent: Tuesday, September 15, 2009 8:39 AM
> > To: IBMVM@LISTSERV.UARK.EDU
> > Subject: [IBMVM] VM lockup due to storage typo
> >
> > Does anyone have an idea of how we might have gotten out of this 
> > without an IPL?
> >
> > VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
> > Several guests needed more memory added so the directory was updated
> > and one by one the guests shutdown, logged off and back on.  So far,
> > so good.
> >
> > But... In changing the memory for many guests, and it being late at 
> > night after a long day, while meaning to set a guest's memory to 
> > 9728M, it got set to 9728G.  When that guest was cycled we see the 
> > message on the console that its memory was limited to 8TB 
> > (HCPLGN093E), then the VM system appeared to freeze.
> >
> > We couldn't get in via TCP/IP, or the HMC Operating System Messages 
> > screen, or the HMC Integrated 3270.
> >
> > Finally had to IPL.   Even that was weird as I'd have expected the Load
> > Normal to shutdown, it just IPLed.   We did NoAutolog, fixed the typo
> > and all came back up ok...
> >
> > I suspect CP was scrambling paging everything in the world out as Linux
> > tried to initialize that 8TB of memory...   But I'm surprised I couldn't
> > even get into the HMC consoles (to kill just that one guest as opposed
> > to all of them)..
> >
> > Any thoughts?
> > Lee
> >   
> 

Re: VM lockup due to storage typo

2009-09-15 Thread Daniel P. Martin

*cough*SHARE requirement?*cough*

Marcy Cortes wrote:

See a thread on this list with subject "Sanity check?" from Oct 2007 for what 
happened when I did the same thing ;)

You probably filled page space.

I still think IBM should refuse to IPL a guest that will cause such harm.


Marcy 


"This message may contain confidential and/or privileged information. If you are not 
the addressee or authorized to receive this for the addressee, you must not use, copy, 
disclose, or take any action based on this message or any information herein. If you have 
received this message in error, please advise the sender immediately by reply e-mail and 
delete this message. Thank you for your cooperation."


-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf 
Of Lee Stewart
Sent: Tuesday, September 15, 2009 8:39 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] VM lockup due to storage typo

Does anyone have an idea of how we might have gotten out of this without 
an IPL?


VM LPAR has 175G of memory and a flock of Linux Oracle guests... 
Several guests needed more memory added so the directory was updated and 
one by one the guests shutdown, logged off and back on.  So far, so good.


But... In changing the memory for many guests, and it being late at 
night after a long day, while meaning to set a guest's memory to 9728M, 
it got set to 9728G.  When that guest was cycled we see the message on 
the console that its memory was limited to 8TB (HCPLGN093E), then the 
VM system appeared to freeze.


We couldn't get in via TCP/IP, or the HMC Operating System Messages 
screen, or the HMC Integrated 3270.


Finally had to IPL.   Even that was weird as I'd have expected the Load 
Normal to shutdown, it just IPLed.   We did NoAutolog, fixed the typo 
and all came back up ok...


I suspect CP was scrambling paging everything in the world out as Linux 
tried to initialize that 8TB of memory...   But I'm surprised I couldn't 
even get into the HMC consoles (to kill just that one guest as opposed 
to all of them)..


Any thoughts?
Lee
  

