Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
Now THAT would have exposed the problem. :-) Actually, he already tried that and the result has been this discussion.

Regards,
Richard Schuh

> I'm sure that there are a couple other ways of preventing the
> problem, like IPL'ing the machine first and doing a Q V ALL
> to see what resources you really did ask for, could have
> stopped the problem, if the Systems Programmer did it.
Re: VM lockup due to storage typo
> I don't think the analogy to a ping attack is a particularly fair
> one. Yes, from the perspective of an innocent third user, they
> look the same, perhaps, but they aren't.

??? In both cases, normal function of the "innocent" guest is disrupted by a force beyond its control through no fault of its own. The function is disrupted by a lack of shared resources available to the "innocent" guest, due to the system trying to service what appear to be "legitimate" resource requests to another theoretically "innocent" guest.

> If the attack were made
> through some sort of security gate that defaults to "closed" state
> which the sysadmin had accidentally opened and left open, I think
> that would be a more fair analogy. Quibbling over details,
> perhaps, but there is an important difference.

Network floods have nothing innately to do with security states. You can produce exactly the same effect within a local segment with no outside connection, FW or any other "security" gates involved (misconfigure any DECnet device that boots via MOP and see what happens), so I don't see the subtle difference here -- one device banging out traffic without regard for other systems on the same network segment starves access to the other systems on the same segment, denying them the ability to function normally.

Barks like a duck, swims like a duck, it'll do for duck soup, as a friend of mine says.

But, as you say, let's concentrate on fixing the problem, not blaming the symptoms.
Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
I mentioned earlier some sort of preferred paging space for CP areas, kind of like the DUMP area and SPOL.

But either way, it still depends on a Systems Programmer, which was the weak link in this discussion. Recall that a Systems Programmer caused the problem of authorizing an 8 TB guest. And that Systems Programmer will never do that again, IMHO.

So setting up preferred paging area, or paging pools, is just another thing that most of us will never do, until we get shot in the foot. I bet that there are more VM systems that are running without a DUMP area than with. And they are the smaller shops that may be able to handle an outage better than others.

The DIRMAINT exit to prevent this amount of storage from being authorized would have stopped it, that is, if the Systems Programmer did it.
Your VM performance monitor could have purged the machine and stopped it, if the Systems Programmer did it.
I'm sure that there are a couple other ways of preventing the problem, like IPL'ing the machine first and doing a Q V ALL to see what resources you really did ask for, could have stopped the problem, if the Systems Programmer did it.

Perhaps we are just too dangerous to be around anymore. Time to hide us behind panels and such.

Tom Duerbusch
THD Consulting

>>> "John P. Baker" 9/19/2009 11:21 AM >>>
All,

Since we have now beat the issue of storage management to death, I would like to set forth some concrete ideas for consideration.

First, it has been pointed out that it may not currently be possible to LOGON to MAINT or OPERATOR or to some other service machine in order to diagnose the problem.

I recommend that the idea of splitting page space into multiple pools be considered, where individual users can be assigned to different pools. For the purposes of discussion, let us consider the following enhancement:

. In the SYSTEM CONFIG file
    o DEFBACKSTGPOOL pool-id-8
    o BACKSTGPOOL pool-id-8 volser-6
. In the CP directory
    o OPTION BACKSTGPOOL pool-name-8
. Extend the CLASS B CP QUERY command
    o QUERY BACKSTGPOOL user-id-8
    o QUERY DEFBACKSTGPOOL
. Extend the CLASS B CP SET command
    o SET BACKSTGPOOL user-id-8 {DEFAULT | pool-name-8}
. Extend the CLASS G CP QUERY command
    o QUERY BACKSTGPOOL

Each paging volume will be allocated to a specific backing storage pool. A LOGON will be rejected if the backing storage pool does not exist. The SET BACKSTGPOOL command will be rejected if the backing storage pool does not exist.

Second, provide a specification on whether a virtual machine requires full backing storage for its defined memory size.

. In the SYSTEM CONFIG file
    o DEFBACKSTG {SYSTEM | VMSIZE}
. In the CP directory
    o OPTION BACKSTG {DEFAULT | SYSTEM | VMSIZE}
. Extend the CLASS B CP QUERY command
    o QUERY BACKSTG user-id-8
    o QUERY DEFBACKSTG
. Extend the CLASS B CP SET command
    o SET BACKSTG user-id-8 {DEFAULT | SYSTEM | VMSIZE}
. Extend the CLASS G CP QUERY command
    o QUERY BACKSTG

If BACKSTG is set or defaulted to SYSTEM, page allocation will continue to operate as it does today. If BACKSTG is set or defaulted to VMSIZE, there must be available within the backing storage pool sufficient space to accommodate the entirety of the specified VMSIZE; otherwise the LOGON, DEFINE STORAGE, or SET BACKSTG command will fail. The SET BACKSTG command will force a virtual machine reset to occur.

These changes will address some of the issues raised. I am certain that other changes would be required, and that other ideas should be considered.

Please post your ideas. Don't hesitate to point out any problems.

John P. Baker
Re: VM lockup due to storage typo
I don't think the analogy to a ping attack is a particularly fair one. Yes, from the perspective of an innocent third user, they look the same, perhaps, but they aren't. If the attack were made through some sort of security gate that defaults to "closed" state which the sysadmin had accidentally opened and left open, I think that would be a more fair analogy. Quibbling over details, perhaps, but there is an important difference.

On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote:

>On 9/18/09 9:32 AM, "Bill Holder" wrote:
>
>> That is indeed one important question, but there was another one, the
>> question of whether this was a denial of service attack exposure, which
>> it is not.
>
>I think that's a point of view question.
>
>If I am another user on the same VM system, happy within my cozy little
>class G box, and the hypervisor admin does something outside of my control
>to some OTHER user that causes CP to choke, then from the original user's
>perspective it IS a DOS attack because it's something that is out of my
>control, starves ME, and causes ME to choke without reason.
>
>An analogous parallel case in the distributed system world would be a ping
>flood attack on a network segment. The innocent get hurt along with the
>intended target by being starved of access to the network, and thus lose the
>ability to function according to design.
>
>From the hypervisor admin's POV, then yeah, it's just doing what it's told
>to do. It's correct operation, working as documented.
>
>I think Bill Schuh and Marcy and myself are arguing for the former
>viewpoint. I think you and Adam are arguing from the latter view.
>
>> I'm not disagreeing that it would be nice if there were some sort
>> of "are you sure" safety net before the system proceeded to try to do
>> something suicidal, but that's a design and requirements question, not a
>> defect question.
>
>I think we're all in violent agreement on that point. Now, the question is
>what is the best way to put a safety on that gun?
Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
Bill,

You may well be correct.

Of course, that permits me to pose the question of how such a condition could effectively be avoided.

Ideas, anyone?

John P. Baker

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
Sent: Monday, September 21, 2009 11:32 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

These are very interesting ideas, but I suspect (no way to prove, since no doc will be forthcoming) that the hang was not a paging issue, but rather a central storage fragmentation issue involving attempts to allocate four contiguous frames for region and segment tables. Don't let me throw cold water on the current discussion, though, I just wanted to point out that all of the interesting paging ideas probably wouldn't help the situation that triggered this entire discussion.

- Bill Holder, z/VM Development, IBM
Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
These are very interesting ideas, but I suspect (no way to prove, since no doc will be forthcoming) that the hang was not a paging issue, but rather a central storage fragmentation issue involving attempts to allocate four contiguous frames for region and segment tables. Don't let me throw cold water on the current discussion, though, I just wanted to point out that all of the interesting paging ideas probably wouldn't help the situation that triggered this entire discussion.

- Bill Holder, z/VM Development, IBM
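To illustrate why fragmentation can bite even when plenty of storage is free, here is a minimal sketch. It is a toy model only: the frame map, the function names, and the 16-frame size are assumptions made for illustration and say nothing about how CP actually manages frames. It shows that a request for four contiguous frames can fail while most frames are free.

```python
def longest_free_run(free):
    """Return the length of the longest run of consecutive free frames."""
    best = run = 0
    for is_free in free:
        run = run + 1 if is_free else 0
        best = max(best, run)
    return best


def can_allocate_contiguous(free, n=4):
    """True if at least n consecutive frames are free (hypothetical check)."""
    return longest_free_run(free) >= n


# Hypothetical 16-frame map: every fourth frame is in use.
fragmented = [True, True, True, False] * 4
# Hypothetical 16-frame map: only four frames free, but they are adjacent.
compact = [True] * 4 + [False] * 12

if __name__ == "__main__":
    print(sum(fragmented), can_allocate_contiguous(fragmented))
    print(sum(compact), can_allocate_contiguous(compact))
```

In the first map 12 of 16 frames are free and the four-frame request still fails; in the second only 4 are free and it succeeds. That is why extra paging space alone would not necessarily help with the condition described above.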
Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
On 9/20/09 4:26 AM, "Rob van der Heij" wrote:

> Most performance tuning gets harder when you split resources and
> consumers in different groups and manage them separately. Sharing is
> easier with large numbers.
>
> Rob

Although with SSD coming back into vogue, the idea of swap vs page (shades of HPO) might be worth considering again. If the goal is to get a very large number of pages out of the way quickly and/or to add some additional levels of paging hierarchy back into CP, I can see where that would have merit.
Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
Rob,

In many instances you would be correct. However, in this case, the decisions targeting a specific backing storage pool are made either at LOGON time or during a DEFINE STORAGE command.

This is actually a very simple approach to the problem.

Also, once the backing storage pool placement decision is made, there should be no impact on the instruction path length.

John P. Baker

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Rob van der Heij
Sent: Sunday, September 20, 2009 4:26 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

On Sat, Sep 19, 2009 at 6:21 PM, John P. Baker wrote:

I don't like the idea of using only a subset of your paging capacity for part of the workload. It's not just about space but also about throughput. This is imho a very complicated approach to exclude some (small) important users from an OOM killer. The real question is whether you can do an OOM killer at all and achieve something useful by doing so.

Most performance tuning gets harder when you split resources and consumers in different groups and manage them separately. Sharing is easier with large numbers.

Rob
--
Rob van der Heij
Velocity Software
http://www.velocitysoftware.com/
Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
On Sat, Sep 19, 2009 at 6:21 PM, John P. Baker wrote:

> I recommend that the idea of splitting page space into multiple pools be
> considered, where individual users can be assigned to different pools. For
> the purposes of discussion, let us consider that following enhancement:

I don't like the idea of using only a subset of your paging capacity for part of the workload. It's not just about space but also about throughput. This is imho a very complicated approach to exclude some (small) important users from an OOM killer. The real question is whether you can do an OOM killer at all and achieve something useful by doing so.

Most performance tuning gets harder when you split resources and consumers in different groups and manage them separately. Sharing is easier with large numbers.

Rob
--
Rob van der Heij
Velocity Software
http://www.velocitysoftware.com/
Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
Rich,

Something else that comes to mind is that page space spills into spool space when page space fills up.

It may be worth providing system configuration options (both a default and one for each backing storage pool) that would determine whether page over-allocation may spill into spool space.

John P. Baker

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Rich Smrcina
Sent: Saturday, September 19, 2009 1:19 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)

Nicely written

--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2010 - Apr 9-14, 2010 Covington, KY
Re: Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
Nicely written

--
Rich Smrcina
Phone: 414-491-6001
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2010 - Apr 9-14, 2010 Covington, KY
Storage Management Enhancement Ideas (was: VM lockup due to storage typo)
All,

Since we have now beat the issue of storage management to death, I would like to set forth some concrete ideas for consideration.

First, it has been pointed out that it may not currently be possible to LOGON to MAINT or OPERATOR or to some other service machine in order to diagnose the problem.

I recommend that the idea of splitting page space into multiple pools be considered, where individual users can be assigned to different pools. For the purposes of discussion, let us consider the following enhancement:

. In the SYSTEM CONFIG file
    o DEFBACKSTGPOOL pool-id-8
    o BACKSTGPOOL pool-id-8 volser-6
. In the CP directory
    o OPTION BACKSTGPOOL pool-name-8
. Extend the CLASS B CP QUERY command
    o QUERY BACKSTGPOOL user-id-8
    o QUERY DEFBACKSTGPOOL
. Extend the CLASS B CP SET command
    o SET BACKSTGPOOL user-id-8 {DEFAULT | pool-name-8}
. Extend the CLASS G CP QUERY command
    o QUERY BACKSTGPOOL

Each paging volume will be allocated to a specific backing storage pool. A LOGON will be rejected if the backing storage pool does not exist. The SET BACKSTGPOOL command will be rejected if the backing storage pool does not exist.

Second, provide a specification on whether a virtual machine requires full backing storage for its defined memory size.

. In the SYSTEM CONFIG file
    o DEFBACKSTG {SYSTEM | VMSIZE}
. In the CP directory
    o OPTION BACKSTG {DEFAULT | SYSTEM | VMSIZE}
. Extend the CLASS B CP QUERY command
    o QUERY BACKSTG user-id-8
    o QUERY DEFBACKSTG
. Extend the CLASS B CP SET command
    o SET BACKSTG user-id-8 {DEFAULT | SYSTEM | VMSIZE}
. Extend the CLASS G CP QUERY command
    o QUERY BACKSTG

If BACKSTG is set or defaulted to SYSTEM, page allocation will continue to operate as it does today. If BACKSTG is set or defaulted to VMSIZE, there must be available within the backing storage pool sufficient space to accommodate the entirety of the specified VMSIZE; otherwise the LOGON, DEFINE STORAGE, or SET BACKSTG command will fail. The SET BACKSTG command will force a virtual machine reset to occur.

These changes will address some of the issues raised. I am certain that other changes would be required, and that other ideas should be considered.

Please post your ideas. Don't hesitate to point out any problems.

John P. Baker
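The LOGON-time rules in the proposal above can be sketched roughly as follows. This is only an illustration of the proposed decision logic, not anything CP does today: the `Pool` class, the `admit` function, the pool names, and in particular the idea of tracking space "reserved" for VMSIZE guests are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    capacity_mb: int       # total page space on the pool's volumes (assumed unit: MB)
    reserved_mb: int = 0   # hypothetical: space already promised to VMSIZE guests


def admit(pools, default_pool, pool_name, backstg, vmsize_mb):
    """Return (ok, reason) for a LOGON under the proposed BACKSTGPOOL/BACKSTG rules."""
    name = default_pool if pool_name == "DEFAULT" else pool_name
    pool = pools.get(name)
    if pool is None:
        # "A LOGON will be rejected if the backing storage pool does not exist."
        return False, f"backing storage pool {name} does not exist"
    if backstg == "SYSTEM":
        # Page allocation continues to operate as it does today (on demand).
        return True, "allocated on demand"
    # BACKSTG VMSIZE: the pool must be able to back the whole virtual machine.
    if pool.capacity_mb - pool.reserved_mb < vmsize_mb:
        return False, f"pool {name} cannot back {vmsize_mb} MB"
    pool.reserved_mb += vmsize_mb
    return True, "fully backed"


if __name__ == "__main__":
    pools = {"SYSPOOL": Pool(4096), "GUESTS": Pool(65536)}
    eight_tb = 8 * 1024 * 1024  # the 8 TB typo, expressed in MB
    print(admit(pools, "GUESTS", "SYSPOOL", "VMSIZE", 1024))
    print(admit(pools, "GUESTS", "DEFAULT", "VMSIZE", eight_tb))
```

With BACKSTG VMSIZE, the 8 TB guest would be refused at LOGON, while MAINT in its own small pool would still get in. With BACKSTG SYSTEM nothing is checked up front, which matches the "as it does today" wording.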
Re: VM lockup due to storage typo
On Friday, 09/18/2009 at 10:13 EDT, David Boyes wrote:
> On 9/18/09 9:32 AM, "Bill Holder" wrote:
>
> > That is indeed one important question, but there was another one, the
> > question of whether this was a denial of service attack exposure, which
> > it is not.
>
> I think that's a point of view question.

It's all very Humpty Dumpty. :-) "Integrity" has a precise meaning with regard to APARs. The *guest* is not doing anything to annoy CP. CP is actually annoying himself trying to instantiate the guest. Until control is given to the guest, nothing can be attributed to the guest.

The walls between guests and between the guest and CP have not been breached. Ergo, no integrity problem.

Alan Altmark
z/VM Development
IBM Endicott
Re: VM lockup due to storage typo
Adam Thornton wrote:
> On Sep 18, 2009, at 9:11 AM, David Boyes wrote:
>> I think we're all in violent agreement on that point. Now, the question is
>> what is the best way to put a safety on that gun?
>
> Oooh! Oooh! Pick me! Mandatory User Access Control dialog boxes that pop up
> and make you click OK any time you want to breathe.
>
> Adam

Would those be 3270 flower boxes?

--
Rich Smrcina
Re: VM lockup due to storage typo
While I agree it's not a DoS "attack" exposure, the system issued no messages and allowed no input on any console (via tn3270, OSA ICC console, HMC 3270 or HMC Operating system messages). If we had a way to enter a command or two (probably an IND first), we could have forced off the offender and not hard crashed 30+ other Oracle servers.

As someone suggested, CP was probably busy allocating paging structures etc. But should that be to the exclusion of any console input or operator control?

To have an entire LPAR appear hung to all consoles, and all Linuxes become non-responsive for 15-20-30 minutes certainly seems like a DoS to me...

Lee

Bill Holder wrote:
> I see this as three separate questions (with my answers):
>
> Is it a denial of service attack exposure? - Clearly not.
>
> Is it a defect? - I don't believe so, for the base issue of whether VM
> should allow a privileged user to do something destructive, though there
> may well be defects or scalability / constraint shortcomings exposed by
> the hang (we'd need to see a dump to understand what's really happening).
>
> Is this an area ripe for improvement, could/should VM be smarter about
> preventing a privileged user from doing something dangerous or
> destructive? - Sure. I won't tell you not to open a requirement.
>
> - Bill Holder, z/VM Development, IBM

--
Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web: www.siriuscom.com
Re: VM lockup due to storage typo
On 9/18/09 3:41 PM, "Brian Nielsen" wrote:

> A scenario that hasn't been mentioned deals with draining a PAGE volume.
>
> The calculation of "defined paging space" might be considered fuzzy if a
> PAGE volume is being DRAINed. Of course, you could be strict and consider
> such a volume as undefined, but there will be cases where storage
> requirements for a guest are less than the available page space but put
> the total demand above "defined paging space".

Good point. I think that I would consider a page volume marked as draining as unavailable space as soon as CP starts the DRAIN operation, but you're right that the wording I used is ambiguous. I'll change it to read "available and online paging space".

I'll wait a day or so and see if anyone else has comments and resubmit it.
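A small sketch of what "available and online paging space" could mean in practice, counting a draining volume as unavailable from the moment the DRAIN starts. The `PageVolume` record, its fields, and the volsers are hypothetical; this is not a CP data structure or query.

```python
from dataclasses import dataclass


@dataclass
class PageVolume:
    volser: str
    size_mb: int
    online: bool = True
    draining: bool = False


def available_paging_mb(volumes):
    """Sum page space on volumes that are online and not being drained."""
    return sum(v.size_mb for v in volumes if v.online and not v.draining)


if __name__ == "__main__":
    volumes = [
        PageVolume("PAGE01", 7000),
        PageVolume("PAGE02", 7000, draining=True),   # excluded once DRAIN starts
        PageVolume("PAGE03", 7000, online=False),    # excluded while offline
    ]
    print(available_paging_mb(volumes))
```

Under this stricter reading, only PAGE01 counts, so a guest that fits in "defined" paging space might still be refused while a drain is in progress, which is the trade-off Brian describes.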
Re: VM lockup due to storage typo
On 9/18/09 3:50 PM, "Tom Duerbusch" wrote:

> The problem I would have, is my MAINT user is defined with 1 GB. That is so I
> can process large reader files.
> The very vast majority of the time, I'm only using a few MB.
> Would your fix prevent MAINT from logging on, when we are at, or near the
> discussed problem?
> Operations also has some userids of a similar nature.

I don't want to be too prescriptive here -- gotta give Alan something to chew on -- but I would expect that there would need to be some exemption mechanism for userids that are known to need extra humungous virtual machine sizes and are known to be reasonably well behaved.

If IBM shipped an ESM by default (even an awful one), I'd say that should be done in the ESM, but that's another crusade.
Re: VM lockup due to storage typo
On 9/18/09 4:27 PM, "Schuh, Richard" wrote:

> Does "the current physical storage" refer to main or main + xstore? Also, is
> there any consideration of the total virtual storage or working sets of the
> in-Queue, in-memory, or logged-on users in the calculation? I wouldn't want a
> dozen users of 991G each logging on to my system that has only 1.02TB total
> page+physical memory.
>
> It might be better to have a config file maximum and simply measure VM size
> against it - a MAXSTORE directory option that has been generalized, so to
> speak. Of course, any MAXSTORE directory entry that is lower would be
> respected. SET commands could temporarily lift or lower the limit for the
> system or for specific users.

AFAICT, most of the Xstore I see out there is configured to be page cache, so I usually would think of it as configured online paging space.

I posed the problem in the requirement as generally as possible. In most cases, IBM doesn't like overly specific suggestions in requirements, so I kept my suggestion pretty generalized.

If others submit requirements, I suspect it'll be more likely to get their attention and get a solution created.
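The "generalized MAXSTORE" idea quoted above boils down to picking the smallest applicable cap. Here is a rough sketch, assuming a hypothetical system-wide maximum from the config file, an optional per-user MAXSTORE from the directory, and an optional temporary override standing in for the suggested SET command. None of these names correspond to existing CP options.

```python
def effective_limit_mb(system_max_mb, directory_maxstore_mb=None, override_mb=None):
    """Return the virtual machine size cap that applies to one user.

    A temporary override (hypothetical SET command) replaces the system-wide
    cap; a lower MAXSTORE directory entry is always respected.
    """
    cap = override_mb if override_mb is not None else system_max_mb
    if directory_maxstore_mb is not None:
        cap = min(cap, directory_maxstore_mb)
    return cap


def size_allowed(requested_mb, system_max_mb, directory_maxstore_mb=None, override_mb=None):
    """True if the requested virtual storage size is within the effective cap."""
    return requested_mb <= effective_limit_mb(
        system_max_mb, directory_maxstore_mb, override_mb
    )


if __name__ == "__main__":
    eight_tb = 8 * 1024 * 1024  # MB
    # System cap of 64 GB; the directory entry mistakenly allows 8 TB.
    print(size_allowed(eight_tb, 65536, directory_maxstore_mb=eight_tb))
    # A 1 GB MAINT-style user with a 2 GB MAXSTORE is unaffected.
    print(size_allowed(1024, 65536, directory_maxstore_mb=2048))
```

The appeal of this variant is that it needs no knowledge of current paging load: a typo in one directory entry is caught by a single system-wide number.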
Re: VM lockup due to storage typo
The action when spool fills has been to make the virtual printers and punches not ready for any user attempting to write. That does keep the system from crashing, but most systems running in the various VMs do not know how to handle it. Recovery can be a problem. It is almost as bad as recovering from a crashed SFS server. Pausing the spool hog(s) is a good idea, especially if it can be done early enough to prevent devices from being made not ready.

Pausing page space hogs may be tougher to do. I can IPL a TPF system that is streaming dumps and not do whatever caused it to dump. I can also purge the individual dump files. I have no such action that I can take for a page space hog. In fact, the space it occupies will remain allocated until it either logs off or does a system reset. About the only thing I can do is force it. I suppose it would be possible to redefine its storage, but that would leave it in a virtual system reset state, so I might as well force it.

Regards,
Richard Schuh

> -----Original Message-----
> From: The IBM z/VM Operating System
> [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
> Sent: Friday, September 18, 2009 1:42 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> VM64461 puts the brakes on console spooling by detecting that
> something crazy is going on and may exhaust all of vm's
> memory and pauses the virtual machine to allow the writes to
> disk to take place and the memory to get back under control.
> I believe messages are put out. My understanding of that
> may be off a little, but that's the gist of it.
>
> I'd like to see something like that. If a virtual machine is
> up and running and CP sees that it is grabbing all of the
> page space at an excessive rate or if it is in danger of not
> getting its page management blocks into memory then stun it
> (or maybe even a parm that says no one user can use more than
> x% of page). Put out a message to Operator about "Userid
> BIGBAD has been halted due to excessive memory consumption"
> or something like that.
>
> Marcy
Re: VM lockup due to storage typo
VM64461 puts the brakes on console spooling by detecting that something crazy is going on and may exhaust all of vm's memory and pauses the virtual machine to allow the writes to disk to take place and the memory to get back under control. I believe messages are put out. My understanding of that may be off a little, but that's the gist of it.

I'd like to see something like that. If a virtual machine is up and running and CP sees that it is grabbing all of the page space at an excessive rate or if it is in danger of not getting its page management blocks into memory then stun it (or maybe even a parm that says no one user can use more than x% of page). Put out a message to Operator about "Userid BIGBAD has been halted due to excessive memory consumption" or something like that.

Marcy

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Schuh, Richard
Sent: Friday, September 18, 2009 1:28 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: [IBMVM] VM lockup due to storage typo

Does "the current physical storage" refer to main or main + xstore? Also, is there any consideration of the total virtual storage or working sets of the in-Queue, in-memory, or logged-on users in the calculation? I wouldn't want a dozen users of 991G each logging on to my system that has only 1.02TB total page+physical memory.

It might be better to have a config file maximum and simply measure VM size against it - a MAXSTORE directory option that has been generalized, so to speak. Of course, any MAXSTORE directory entry that is lower would be respected. SET commands could temporarily lift or lower the limit for the system or for specific users.

Regards,
Richard Schuh
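Marcy's "no one user can use more than x% of page" parm could look something like the sketch below. It only covers the share-of-page-space test, not the rate-of-growth or page-management-block conditions she also mentions, and every name in it (`users_to_halt`, `operator_message`, the 50% default) is an assumption for illustration. The message text is the one suggested in her note.

```python
def users_to_halt(page_slots_by_user, total_page_slots, max_share=0.5):
    """Return userids whose share of page space exceeds max_share, largest first."""
    if total_page_slots <= 0:
        raise ValueError("total_page_slots must be positive")
    over = [
        (used, userid)
        for userid, used in page_slots_by_user.items()
        if used / total_page_slots > max_share
    ]
    return [userid for used, userid in sorted(over, reverse=True)]


def operator_message(userid):
    """Build the operator notification suggested in the thread."""
    return f"Userid {userid} has been halted due to excessive memory consumption"


if __name__ == "__main__":
    usage = {"BIGBAD": 900, "LINUX01": 50, "LINUX02": 40}
    for userid in users_to_halt(usage, total_page_slots=1000):
        print(operator_message(userid))
```

A real implementation would have to run while CP can still dispatch work, which, as noted elsewhere in the thread, may not have been the case during the hang.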
Re: VM lockup due to storage typo
Does "the current physical storage" refer to main or main + xstore? Also, is there any consideration of the total virtual storage or working sets of the in-Queue, in-memory, or logged-on users in the calculation? I wouldn't want a dozen users of 991G each logging on to my system that has only 1.02TB total page+physical memory. It might be better to have a config file maximum and simply measure VM size against it - a MAXSTORE directory option that has been generalized, so to speak. Of course, any MAXSTORE directory entry that is lower would be respected. SET commands could temporarily lift or lower the limit for the system or for specific users. Regards, Richard Schuh > -Original Message- > From: The IBM z/VM Operating System > [mailto:ib...@listserv.uark.edu] On Behalf Of David Boyes > Sent: Friday, September 18, 2009 10:49 AM > To: IBMVM@LISTSERV.UARK.EDU > Subject: Re: VM lockup due to storage typo > > On 9/18/09 11:38 AM, "Bill Holder" wrote: > > > On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote: > >> I think we're all in violent agreement on that point. Now, the question is > >> what is the best way to put a safety on that gun? > > Is this a procedural or technical implementation question (or both)? > > For the former, I'd say a requirement is appropriate. > > OK, got that covered and done. > > > For the latter, > > let's have at it. :) > > As I suggested in the requirement: > > Possible solution would be to provide a SYSTEM CONFIG option (Check_Resource_Alloc_Sanity for discussion purposes) and associated SET command to check LOGIN, DEF STOR, and IPL events to determine whether the requested resources (default virtual storage size for LOGIN, new value for virtual storage for DEF STOR, and current virtual storage size at time of issue for IPL) are greater than the current physical storage and defined paging space. If check is true, then issue a warning message and cancel the action. > > Option defaults to ON, can be turned off by class A user SET command. > > Not perfect, but would catch most of the scenarios that have been discussed so far. >
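The two points in this reply (a generalized maximum that still respects a lower directory MAXSTORE, and the need to look at total demand and not only one guest at a time) can be sketched roughly as follows. This is only an illustration in Python, not how CP works: the function names, the override parameter, and the choice of decimal units are all assumptions made for the example.

```python
from typing import Iterable, Optional

GB = 10 ** 9  # decimal gigabyte, an assumption made for this example


def effective_limit(system_max: int,
                    directory_maxstore: Optional[int] = None,
                    set_override: Optional[int] = None) -> int:
    """Return the storage ceiling applied to one user (hypothetical helper).

    A temporary SET-style override replaces the system-wide maximum,
    and a lower directory MAXSTORE is always respected.
    """
    limit = system_max if set_override is None else set_override
    if directory_maxstore is not None:
        limit = min(limit, directory_maxstore)
    return limit


def aggregate_fits(vm_sizes: Iterable[int], capacity: int) -> bool:
    """True if the sum of all virtual machine sizes fits in capacity."""
    return sum(vm_sizes) <= capacity


capacity = 1020 * GB   # the 1.02TB of page + physical storage from the post
one_guest = 991 * GB   # one of the dozen 991G users

# Each guest passes a per-guest comparison on its own...
print(one_guest <= capacity)                       # True
# ...but a dozen of them together do not fit.
print(aggregate_fits([one_guest] * 12, capacity))  # False
```

A per-guest comparison alone would admit all twelve guests, which is why the question about total virtual storage matters.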
Re: VM lockup due to storage typo
The problem I would have is that my MAINT user is defined with 1 GB, so that I can process large reader files. The vast majority of the time, I'm only using a few MB. Would your fix prevent MAINT from logging on when we are at, or near, the problem being discussed? Operations also has some userids of a similar nature. Tom Duerbusch THD Consulting >>> David Boyes 9/18/2009 12:49 PM >>> On 9/18/09 11:38 AM, "Bill Holder" wrote: > On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote: >> I think we're all in violent agreement on that point. Now, the question is >> what is the best way to put a safety on that gun? > Is this a procedural or technical implementation question (or both)? > For the former, I'd say a requirement is appropriate. OK, got that covered and done. > For the latter, > let's have at it. :) As I suggested in the requirement: Possible solution would be to provide a SYSTEM CONFIG option (Check_Resource_Alloc_Sanity for discussion purposes) and associated SET command to check LOGIN, DEF STOR, and IPL events to determine whether the requested resources (default virtual storage size for LOGIN, new value for virtual storage for DEF STOR, and current virtual storage size at time of issue for IPL) are greater than the current physical storage and defined paging space. If check is true, then issue a warning message and cancel the action. Option defaults to ON, can be turned off by class A user SET command. Not perfect, but would catch most of the scenarios that have been discussed so far.
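The concern raised here is that a check based on defined size would lock out a userid that is defined large but normally uses very little. One possible way to picture it, purely as a sketch: the exemption list, the function name, and the headroom figure below are all hypothetical and are not part of the proposal in the thread.

```python
from typing import FrozenSet

MIB = 1024 ** 2
GIB = 1024 ** 3


def should_block_logon(userid: str,
                       defined_size: int,
                       headroom: int,
                       exempt: FrozenSet[str] = frozenset()) -> bool:
    """Decide whether a size-based check would refuse a logon.

    defined_size is the directory storage size, headroom is whatever
    capacity the check considers still available. Userids in the
    hypothetical exempt set always get through.
    """
    if userid in exempt:
        return False
    return defined_size > headroom


headroom = 512 * MIB  # assumed: the system is close to the limit

# MAINT is defined with 1 GB even though it usually touches a few MB.
print(should_block_logon("MAINT", 1 * GIB, headroom))                     # True
print(should_block_logon("MAINT", 1 * GIB, headroom, frozenset({"MAINT"})))  # False
```

Without some exemption or override, the userid most needed for recovery is the one a strict size check would turn away.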
Re: VM lockup due to storage typo
On Fri, 18 Sep 2009 13:49:27 -0400, David Boyes wrote: >On 9/18/09 11:38 AM, "Bill Holder" wrote: > >> On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote: >>> I think we're all in violent agreement on that point. Now, the question is >>> what is the best way to put a safety on that gun? >> Is this a procedural or technical implementation question (or both)? >> For the former, I'd say a requirement is appropriate. > >OK, got that covered and done. > >> For the latter, >> let's have at it. :) > >As I suggested in the requirement: > >Possible solution would be to provide a SYSTEM CONFIG option (Check_Resource_Alloc_Sanity for discussion purposes) and associated SET command to check LOGIN, DEF STOR, and IPL events to determine whether the requested resources (default virtual storage size for LOGIN, new value for virtual storage for DEF STOR, and current virtual storage size at time of issue for IPL) are greater than the current physical storage and defined paging space. If check is true, then issue a warning message and cancel the action. > >Option defaults to ON, can be turned off by class A user SET command. > >Not perfect, but would catch most of the scenarios that have been discussed so far. A scenario that hasn't been mentioned deals with draining a PAGE volume. The calculation of "defined paging space" might be considered fuzzy if a PAGE volume is being DRAINed. Of course, you could be strict and consider such a volume as undefined, but there will be cases where storage requirements for a guest are less than the available page space but put the total demand above "defined paging space". Brian
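The fuzziness described above comes down to whether a DRAINed volume still counts toward "defined paging space". A minimal sketch of the strict and lenient readings follows. The `PageVolume` record, the volume labels, and the page counts are invented for illustration and do not reflect any CP data structure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PageVolume:
    label: str              # hypothetical volume label
    pages: int              # page slots defined on the volume
    draining: bool = False  # True while the volume is being DRAINed


def defined_paging_pages(volumes: Iterable[PageVolume], strict: bool = True) -> int:
    """Sum page slots, optionally treating draining volumes as undefined."""
    return sum(v.pages for v in volumes if not (strict and v.draining))


volumes = [
    PageVolume("PAGE01", 1_000_000),
    PageVolume("PAGE02", 1_000_000),
    PageVolume("PAGE03", 1_000_000, draining=True),
]

print(defined_paging_pages(volumes, strict=True))   # 2000000
print(defined_paging_pages(volumes, strict=False))  # 3000000
```

A request that falls between the two totals is exactly the kind of case where the two readings of the check give different answers.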
Re: VM lockup due to storage typo
On 9/18/09 11:38 AM, "Bill Holder" wrote: > On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote: >> I think we're all in violent agreement on that point. Now, the question is >> what is the best way to put a safety on that gun? > Is this a procedural or technical implementation question (or both)? > For the former, I'd say a requirement is appropriate. OK, got that covered and done. > For the latter, > let's have at it. :) As I suggested in the requirement: Possible solution would be to provide a SYSTEM CONFIG option (Check_Resource_Alloc_Sanity for discussion purposes) and associated SET command to check LOGIN, DEF STOR, and IPL events to determine whether the requested resources (default virtual storage size for LOGIN, new value for virtual storage for DEF STOR, and current virtual storage size at time of issue for IPL) are greater than the current physical storage and defined paging space. If check is true, then issue a warning message and cancel the action. Option defaults to ON, can be turned off by class A user SET command. Not perfect, but would catch most of the scenarios that have been discussed so far.
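The proposed check can be sketched in a few lines. This is an illustration in Python only, under the assumption that "physical storage and defined paging space" means the sum of the two. The class, the function name (borrowed from the placeholder option name above), and the message text are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

GIB = 1024 ** 3
TIB = 1024 ** 4


@dataclass
class SystemResources:
    """Hypothetical snapshot of what the system has to back guest storage."""
    physical_storage: int      # bytes of real storage
    defined_paging_space: int  # bytes of defined paging space


def check_resource_alloc_sanity(requested: int,
                                system: SystemResources,
                                enabled: bool = True) -> Tuple[bool, str]:
    """Return (allowed, message) for a LOGIN, DEF STOR, or IPL event.

    requested is the virtual storage size relevant to the event. When the
    option is on and the request exceeds physical storage plus paging
    space, the action is cancelled with a warning.
    """
    if not enabled:
        return True, ""
    capacity = system.physical_storage + system.defined_paging_space
    if requested > capacity:
        return False, (f"requested {requested // GIB}G exceeds "
                       f"available {capacity // GIB}G; action cancelled")
    return True, ""


system = SystemResources(physical_storage=175 * GIB,
                         defined_paging_space=175 * GIB)

print(check_resource_alloc_sanity(4 * GIB, system))
print(check_resource_alloc_sanity(8 * TIB, system))
print(check_resource_alloc_sanity(8 * TIB, system, enabled=False))
```

The `enabled` flag stands in for the option that defaults to ON and can be turned off by a privileged SET command.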
Re: VM lockup due to storage typo
On 9/18/09 11:58 AM, "Schuh, Richard" wrote: > Hey Zeke Boyes, who is Bill Schuh? I don't even know of a relative by that > name :-) It's your lawful good alter ego, arch nemesis of Chuckie. The Saturday morning cartoon starring the Billster debuts next TV season, along with "Danger at Rockland Island: Endicott in Peril" and "The Poughkeepsie Seven", a drama about seven virtualization protestors illegally imprisoned and tortured in building 705 for resisting the One True OS for System z. 8-)
Re: VM lockup due to storage typo
I think the real problem here is that when CP is thrashing about for whatever reason, it can be very hard to get control of a VM prompt to manually fix things. Perhaps if CP could determine that some resource is being sorely abused, it could degrade the offending machine at least to the point that a favored user can do a bit of problem determination and possibly force the offender(s). Our operator (PROPST) machine has option quickdsp and share rel 1. I hope it never goes astray, but I also have a bit of hope that I will be able to re-connect to it if some other virtual machine buggers the system so I can straighten things out. Bob. -Original Message- From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Schuh, Richard Sent: Friday, September 18, 2009 12:11 PM To: IBMVM@LISTSERV.UARK.EDU Subject: Re: VM lockup due to storage typo While you are at it, make it self-healing, including the updating of the source code. Or at least include a Medical Tricorder with each system.:-) > We recognize that CP must be more forgiving and we are working to that > end, examining a variety of solutions that include inertial dampening, > tritanium plating, Kevlar(R), stacks of phone books, as well as taking > the gun away from you and beating you over the head with it (aka "the > retaliatory baseball bat subroutine"). You may need dedicated DUMP packs in order to be able to do this. CP may have outgrown the size of the dump space and cannot allocate a larger space as a result of the problem. > The bottom line is that none of us want the system to go out to lunch. > That doesn't serve anyone's purposes. If it happens, get a restart > dump and let us know. Sometimes it's *not* your fault. Really! :-) > > Alan Altmark > z/VM Development > IBM Endicott
Re: VM lockup due to storage typo
While you are at it, make it self-healing, including the updating of the source code. Or at least include a Medical Tricorder with each system.:-) > We recognize that CP must be more forgiving and we are > working to that > end, examining a variety of solutions that include inertial > dampening, > tritanium plating, Kevlar(R), stacks of phone books, as well > as taking the > gun away from you and beating you over the head with it (aka "the > retaliatory baseball bat subroutine"). You may need dedicated DUMP packs in order to be able to do this. CP may have outgrown the size of the dump space and cannot allocate a larger space as a result of the problem. > The bottom line is that none of us want the system to go out > to lunch. > That doesn't serve anyone's purposes. If it happens, get a > restart dump > and let us know. Sometimes it's *not* your fault. Really! :-) > > Alan Altmark > z/VM Development > IBM Endicott >
Re: VM lockup due to storage typo
On Thursday, 09/17/2009 at 01:22 EDT, "Schuh, Richard" wrote: > An IPL isn't an action? True, the guest was not aware that it would harm the > system, but absent that action by the guest, there would not have been a > problem. The guest was an unwitting agent, a part of a bot net, as it were. The case where the administrator loads the chamber and the user pulls the trigger to cause an outage is, admittedly, near a line between "normal defect" and "integrity defect". Who, exactly, caused the problem? I can't blame the user - they just logged on with no opportunity (or responsibility!) to review their directory prior to login (how?). This particular problem must be laid at the feet of the sysadmin with all due ceremony, along with any other administrative snafu. But I assert that even that is a red herring. The central issue is not who chambered the weapon or who pulled the trigger. Rather, it is an issue centered on how much shielding is or should be present to mitigate mistakes or errors in judgement by the sysadmins, and, to some extent, from CP's own attempts to make you happy. We recognize that CP must be more forgiving and we are working to that end, examining a variety of solutions that include inertial dampening, tritanium plating, Kevlar(R), stacks of phone books, as well as taking the gun away from you and beating you over the head with it (aka "the retaliatory baseball bat subroutine"). The bottom line is that none of us want the system to go out to lunch. That doesn't serve anyone's purposes. If it happens, get a restart dump and let us know. Sometimes it's *not* your fault. Really! :-) Alan Altmark z/VM Development IBM Endicott
Re: VM lockup due to storage typo
Personally, I have always preferred BAC (Broken As Coded). John P. Baker -Original Message- From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Schuh, Richard Sent: Friday, September 18, 2009 11:58 AM To: IBMVM@LISTSERV.UARK.EDU Subject: Re: VM lockup due to storage typo Hey Zeke Boyes, who is Bill Schuh? I don't even know of a relative by that name :-) Working as Documented is another version of WAD. My stance is that if the system dies because of a design "feature", then perhaps that feature ought to be reconsidered. Certainly, there is no way to anticipate all possible feature failures, but when one comes up that is preventable, then the design ought to be tweaked. All of the discussion about whether it is or is not a DOS is totally irrelevant, especially to those who have been victimized. (I thought that Lyn Hadley eliminated WAD and BAD from the IBM vernacular years ago.) Regards, Richard Schuh
Re: VM lockup due to storage typo
Hey Zeke Boyes, who is Bill Schuh? I don't even know of a relative by that name :-) Working as Documented is another version of WAD. My stance is that if the system dies because of a design "feature", then perhaps that feature ought to be reconsidered. Certainly, there is no way to anticipate all possible feature failures, but when one comes up that is preventable, then the design ought to be tweaked. All of the discussion about whether it is or is not a DOS is totally irrelevant, especially to those who have been victimized. (I thought that Lyn Hadley eliminated WAD and BAD from the IBM vernacular years ago.) Regards, Richard Schuh > -Original Message- > From: The IBM z/VM Operating System > [mailto:ib...@listserv.uark.edu] On Behalf Of David Boyes > Sent: Friday, September 18, 2009 7:12 AM > To: IBMVM@LISTSERV.UARK.EDU > Subject: Re: VM lockup due to storage typo > > On 9/18/09 9:32 AM, "Bill Holder" wrote: > > > That is indeed one important question, but there was another one, the > > question of whether this was a denial of service attack exposure, > > which it is not. > > I think that's a point of view question. > > If I am another user on the same VM system, happy within my cozy little class G box, and the hypervisor admin does something outside of my control to some OTHER user that causes CP to choke, then from the original user's perspective it IS a DOS attack because it's something that is out of my control, starves ME, and causes ME to choke without reason. > > An analogous parallel case in the distributed system world would be a ping flood attack on a network segment. The innocent get hurt along with the intended target by being starved of access to the network, and thus lose the ability to function according to design. > > From the hypervisor admin's POV, then yeah, it's just doing what it's told to do. It's correct operation, working as documented. > > I think Bill Schuh and Marcy and myself are arguing for the former viewpoint. I think you and Adam are arguing from the latter view. > > > I'm not disagreeing that it would be nice if there were some sort of > > "are you sure" safety net before the system proceeded to try to do > > something suicidal, but that's a design and requirements question, not > > a defect question. > > I think we're all in violent agreement on that point. Now, the question is what is the best way to put a safety on that gun? >
Re: VM lockup due to storage typo
On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote: ... >I think we're all in violent agreement on that point. Now, the question is >what is the best way to put a safety on that gun? Is this a procedural or technical implementation question (or both)? For the former, I'd say a requirement is appropriate. For the latter, let's have at it. :)
Re: VM lockup due to storage typo
On Fri, 18 Sep 2009 10:11:58 -0400, David Boyes wrote: >I think we're all in violent agreement on that point. Now, the question is >what is the best way to put a safety on that gun? Since the Linux OOM model is to kill a process, just kill some Linux virtual machine to free up space... Brian Nielsen
Re: VM lockup due to storage typo
On Fri, Sep 18, 2009 at 4:11 PM, David Boyes wrote: > I think we're all in violent agreement on that point. Now, the question is > what is the best way to put a safety on that gun? IMHO the suggested solutions so far merely bend the barrel upwards. This may deflect the bullet from your own foot in some usage scenarios, but likely hurts other feet and makes the thing in general hard to aim ;-) Rob
Re: VM lockup due to storage typo
On Sep 18, 2009, at 9:11 AM, David Boyes wrote: I think we're all in violent agreement on that point. Now, the question is what is the best way to put a safety on that gun? Oooh! Oooh! Pick me! Mandatory User Access Control dialog boxes that pop up and make you click OK any time you want to breathe. Adam
Re: VM lockup due to storage typo
On 9/18/09 9:38 AM, "Huegel, Thomas" wrote: > A little OT, but curiosity calls.. What is the max. storage that z/LINUX > can use? Last time I looked at the Linux memory management code (a while back) it was 4TB, but that's probably expanded by now. The documented z/VM limit of 8TB has been around for a while; I think that appeared in 5.2.
Re: VM lockup due to storage typo
On 9/18/09 9:32 AM, "Bill Holder" wrote: > That is indeed one important question, but there was another one, the > question of whether this was a denial of service attack exposure, which it > is not. I think that's a point of view question. If I am another user on the same VM system, happy within my cozy little class G box, and the hypervisor admin does something outside of my control to some OTHER user that causes CP to choke, then from the original user's perspective it IS a DOS attack because it's something that is out of my control, starves ME, and causes ME to choke without reason. An analogous parallel case in the distributed system world would be a ping flood attack on a network segment. The innocent get hurt along with the intended target by being starved of access to the network, and thus lose the ability to function according to design. From the hypervisor admin's POV, then yeah, it's just doing what it's told to do. It's correct operation, working as documented. I think Bill Schuh and Marcy and myself are arguing for the former viewpoint. I think you and Adam are arguing from the latter view. > I'm not disagreeing that it would be nice if there were some sort > of "are you sure" safety net before the system proceeded to try to do > something suicidal, but that's a design and requirements question, not a > defect question. I think we're all in violent agreement on that point. Now, the question is what is the best way to put a safety on that gun?
Re: VM lockup due to storage typo
I see this as three separate questions (with my answers): Is it a denial of service attack exposure? - Clearly not. Is it a defect? - I don't believe so, for the base issue of whether VM should allow a privileged user to do something destructive, though there may well be defects or scalability / constraint shortcomings exposed by the hang (we'd need to see a dump to understand what's really happening). Is this an area ripe for improvement, could/should VM be smarter about preventing a privileged user from doing something dangerous or destructive? - Sure. I won't tell you not to open a requirement. - Bill Holder, z/VM Development, IBM
Re: VM lockup due to storage typo
A little OT, but curiosity calls.. What is the max. storage that z/LINUX can use? -Original Message- From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of David Boyes Sent: Thursday, September 17, 2009 4:37 PM To: IBMVM@LISTSERV.UARK.EDU Subject: Re: VM lockup due to storage typo On 9/17/09 2:16 PM, "Adam Thornton" wrote: > "Administrator typo" is not a failure mode the operating system is > designed to protect you from. That may be true now, but I think the point of the argument is that it should not be. On VMS, if you have a SYSTEM priv bit set, the system will still warn you if you're about to do something that seems stupid. If there is an architected limit (note that the 9.7TB got clipped to 8TB, so SOMETHING noticed a problem), then it's not too unreasonable for the system to take defensive measures and issue a warning that all is not right in the kingdom of Denmark, cream or no cream dresses. It seems like a basic defense that if CP notices you starting something that it KNOWS it may not have resources to complete, requiring confirmation that you know what you're doing (or about to do) is a good defensive measure. Did the system do what you told it to do when you told it to do it? Yes. Whether it should march off a cliff without at least questioning the order is the question at hand. -- db
Re: VM lockup due to storage typo
That is indeed one important question, but there was another one, the question of whether this was a denial of service attack exposure, which it is not. I'm not disagreeing that it would be nice if there were some sort of "are you sure" safety net before the system proceeded to try to do something suicidal, but that's a design and requirements question, not a defect question. - Bill Holder, z/VM Development, IBM On Thu, 17 Sep 2009 17:36:44 -0400, David Boyes wrote: >On 9/17/09 2:16 PM, "Adam Thornton" wrote: > >> "Administrator typo" is not a failure mode the operating system is >> designed to protect you from. > >That may be true now, but I think the point of the argument is that it should not be. > >On VMS, if you have a SYSTEM priv bit set, the system will still warn you if you're about to do something that seems stupid. If there is an architected limit (note that the 9.7TB got clipped to 8TB, so SOMETHING noticed a problem), then it's not too unreasonable for the system to take defensive measures and issue a warning that all is not right in the kingdom of Denmark, cream or no cream dresses. > >It seems like a basic defense that if CP notices you starting something that it KNOWS it may not have resources to complete, requiring confirmation that you know what you're doing (or about to do) is a good defensive measure. > >Did the system do what you told it to do when you told it to do it? Yes. >Whether it should march off a cliff without at least questioning the order is the question at hand. > >-- db
Re: VM lockup due to storage typo
Well, there is precedent here of VM dev fixing things that are too large/too much that take down VM. See VM64461 and VM6 I'll probably look into the possibility of a vmsecure exit to add a safety to my gun for now. Marcy
Re: VM lockup due to storage typo
On Sep 17, 2009, at 5:36 PM, David Boyes wrote: Whether it should march off a cliff without at least questioning the order is the question at hand. Of course it should. Yes, my Unix is showing. Adam
Re: VM lockup due to storage typo
On 9/17/09 2:16 PM, "Adam Thornton" wrote: > "Administrator typo" is not a failure mode the operating system is > designed to protect you from. That may be true now, but I think the point of the argument is that it should not be. On VMS, if you have a SYSTEM priv bit set, the system will still warn you if you're about to do something that seems stupid. If there is an architected limit (note that the 9.7TB got clipped to 8TB, so SOMETHING noticed a problem), then it's not too unreasonable for the system to take defensive measures and issue a warning that all is not right in the kingdom of Denmark, cream or no cream dresses. It seems like a basic defense that if CP notices you starting something that it KNOWS it may not have resources to complete, requiring confirmation that you know what you're doing (or about to do) is a good defensive measure. Did the system do what you told it to do when you told it to do it? Yes. Whether it should march off a cliff without at least questioning the order is the question at hand. -- db
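The point about the 9.7TB value being silently clipped to 8TB can be illustrated with a short sketch: the same code that notices the limit could also say so. This is Python for illustration only. The function name and message are hypothetical, and the 8TB limit is assumed here to be a binary figure.

```python
from typing import Optional, Tuple

TIB = 1024 ** 4
ARCHITECTED_LIMIT = 8 * TIB  # the 8TB limit from the thread, assumed binary


def clip_storage(requested: int,
                 limit: int = ARCHITECTED_LIMIT) -> Tuple[int, Optional[str]]:
    """Clip a requested size to the limit and report when that happens.

    Returns (size_to_use, warning). The warning is None when the request
    already fits, so a caller could require confirmation only when it is set.
    """
    if requested > limit:
        warning = (f"requested {requested / TIB:.1f}TB exceeds the "
                   f"{limit / TIB:.0f}TB limit; clipped")
        return limit, warning
    return requested, None


size, warning = clip_storage(int(9.7 * TIB))
print(size == 8 * TIB)  # True
print(warning)
```

Whether the warning should block the action or only ask for confirmation is the design question being argued in the thread.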
Re: VM lockup due to storage typo
On Sep 17, 2009, at 1:58 PM, Bill Holder wrote: I'd agree with that point in cases where it's less clear, but in this case, it's perfectly clear that the user action would have been harmless if not for the administrator typo Yabbut "Administrator typo" is not a failure mode the operating system is designed to protect you from. If you have authority to edit the user directory, then, well, your gun, your foot. Adam
Re: VM lockup due to storage typo
FYI, the system in question had about 175GB of page space - 22 mod 9s. Currently the system does NO paging. All the guests fit within real storage. (Of course there will eventually be more guests on that LPAR, so sooner or later we'll start to page.) Lee Rob van der Heij wrote: On Thu, Sep 17, 2009 at 6:34 PM, Bill Holder wrote: Occurrences of this sort of problem are likely to result in temporary or permanent hangs of both individual users and eventually the entire system, which supports the theory in this case. I'd really need to see a dump of the system in question to confirm this hypothesis, however. And I think Lee has not yet mentioned how much paging space he had allocated. With a 175G LPAR you would think he has at least 175G worth of virtual machines, so 350G of paging space... for the moment the next virtual machine went over the edge. I very much doubt he was that well prepared. With that amount of space, things might have gotten slow but there's a fair chance CP would have survived the abuse. Rob -- Lee Stewart, Senior SE Sirius Computer Solutions Phone: (303) 996-7122 Email: lee.stew...@siriuscom.com Web: www.siriuscom.com
Re: VM lockup due to storage typo
On Thu, Sep 17, 2009 at 10:58 AM, Bill Holder wrote: > I'd agree with that point in cases where it's less clear, but in > this case, it's perfectly clear that the user action would have > been harmless if not for the administrator typo. I don't disagree > that more protection at the user action level would be nice in > this case, that's really different discussion than whether this > constitutes a denial of service exposure. OK, I buy that. If the sysprog does a UCR to make SHUTDOWN class G, it isn't VM's fault if a user issues SHUTDOWN.
Re: VM lockup due to storage typo
I'd agree with that point in cases where it's less clear, but in this case, it's perfectly clear that the user action would have been harmless if not for the administrator typo. I don't disagree that more protection at the user action level would be nice in this case, that's really a different discussion than whether this constitutes a denial of service exposure. There's a reason that trusted users are called that, because they have the power to shoot themselves, and the entire system. We cannot protect against every possible harmful act by trusted users, whether accidental or malicious. Regards, - Bill Holder On Thu, 17 Sep 2009 10:48:53 -0700, Schuh, Richard wrote: >I don't think you can differentiate between the root cause and the immediate cause when it comes to security and integrity. You may not necessarily be able to detect the root cause, but you must protect the system against the immediate cause if at all possible. > >Regards, >Richard Schuh
Re: VM lockup due to storage typo
I don't think you can differentiate between the root cause and the immediate cause when it comes to security and integrity. You may not necessarily be able to detect the root cause, but you must protect the system against the immediate cause if at all possible. Regards, Richard Schuh > -Original Message- > From: The IBM z/VM Operating System > [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder > Sent: Thursday, September 17, 2009 10:35 AM > To: IBMVM@LISTSERV.UARK.EDU > Subject: Re: VM lockup due to storage typo > > Sure, true enough, but the exposure was not caused by the guest action. Yes, it wouldn't have happened had the guest not logged on and IPLed, but that wasn't the root cause, the typo was. > The action of the class G user didn't cause the problem, therefore it's not a Denial of Service attack case. Note that I'm not saying it's not APARable, however. > > Regards, > - Bill Holder > > On Thu, 17 Sep 2009 10:21:05 -0700, Schuh, Richard wrote: > > >An IPL isn't an action? True, the guest was not aware that it would harm the system, but absent that action by the guest, there would not have been a problem. The guest was an unwitting agent, a part of a bot net, as it were. > > > >Regards, > >Richard Schuh > > > >> -Original Message- > >> From: The IBM z/VM Operating System > >> [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder > >> Sent: Thursday, September 17, 2009 9:14 AM > >> To: IBMVM@LISTSERV.UARK.EDU > >> Subject: Re: VM lockup due to storage typo > >> > >> I don't entirely agree. The action of the guest did not cause harm to CP, it was the action of the operations staff which did. This is not a denial of service case that I can see. > >> > >> Bill Holder > >> z/VM Development, Memory Management team leader, IBM > >> > >> On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard wrote: > >> > >> >Maybe CP couldn't know that the guest would do something bad, but it should know that it has opened itself to the possibility that the guest could, in normal operation, cause the problem. > >> >One of Alan's first precepts of information security and integrity is that the guest cannot be allowed to harm the CP. This clearly violates that. > >> > > >> >Regards, > >> >Richard Schuh > >> > > >> >> -Original Message- > >> >> From: The IBM z/VM Operating System > >> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch > >> >> Sent: Tuesday, September 15, 2009 9:19 AM > >> >> To: IBMVM@LISTSERV.UARK.EDU > >> >> Subject: Re: VM lockup due to storage typo > >> >> > >> >> CP wouldn't know at IPL time, the guest would, not could, but would cause such harm. > >> >> > >> >> Just because you say you can use xxx GB, doesn't mean you would actually use them. > >> >> > >> >> When page fills, it over flows to spool. > >> >> When spool fills, CP abends on the next pageout. > >> >> > >> >> Tom Duerbusch > >> >> THD Consulting > >> >> > >> >> >>> Marcy Cortes 9/15/2009 11:02 AM >>> > >> >> See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;) > >> >> > >> >> You probably filled page space. > >> >> > >> >> I still think IBM should refuse to IPL a guest that will cause such harm. > >> >> > >> >> Marcy > >> >> > >> >> "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this me
Re: VM lockup due to storage typo
Sure, true enough, but the exposure was not caused by the guest action. Yes, it wouldn't have happened had the guest not logged on and IPLed, but that wasn't the root cause, the typo was. The action of the class G user didn't cause the problem, therefore it's not a Denial of Service attack case. Note that I'm not saying it's not APARable, however. Regards, - Bill Holder On Thu, 17 Sep 2009 10:21:05 -0700, Schuh, Richard wrote: >An IPL isn't an action? True, the guest was not aware that it would harm the system, but absent that action by the guest, there would not have been a problem. The guest was an unwitting agent, a part of a bot net, as it were. > >Regards, >Richard Schuh > >> -Original Message- >> From: The IBM z/VM Operating System >> [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder >> Sent: Thursday, September 17, 2009 9:14 AM >> To: IBMVM@LISTSERV.UARK.EDU >> Subject: Re: VM lockup due to storage typo >> >> I don't entirely agree. The action of the guest did not cause harm to CP, it was the action of the operations staff which did. This is not a denial of service case that I can see. >> >> Bill Holder >> z/VM Development, Memory Management team leader, IBM >> >> On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard wrote: >> >> >Maybe CP couldn't know that the guest would do something bad, but it should know that it has opened itself to the possibility that the guest could, in normal operation, cause the problem. >> >One of Alan's first precepts of information security and integrity is that the guest cannot be allowed to harm the CP. This clearly violates that. >> > >> >Regards, >> >Richard Schuh >> > >> >> -Original Message- >> >> From: The IBM z/VM Operating System >> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch >> >> Sent: Tuesday, September 15, 2009 9:19 AM >> >> To: IBMVM@LISTSERV.UARK.EDU >> >> Subject: Re: VM lockup due to storage typo >> >> >> >> CP wouldn't know at IPL time, the guest would, not could, but would cause such harm. >> >> >> >> Just because you say you can use xxx GB, doesn't mean you would actually use them. >> >> >> >> When page fills, it over flows to spool. >> >> When spool fills, CP abends on the next pageout. >> >> >> >> Tom Duerbusch >> >> THD Consulting >> >> >> >> >>> Marcy Cortes 9/15/2009 11:02 AM >>> >> >> See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;) >> >> >> >> You probably filled page space. >> >> >> >> I still think IBM should refuse to IPL a guest that will cause such harm. >> >> >> >> Marcy >> >> >> >> "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation." >> >> >> >> -Original Message- >> >> From: The IBM z/VM Operating System >> >> [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart >> >> Sent: Tuesday, September 15, 2009 8:39 AM >> >> To: IBMVM@LISTSERV.UARK.EDU >> >> Subject: [IBMVM] VM lockup due to storage typo >> >> >> >> Does anyone have an idea of how we might have gotten out of this without an IPL? >> >> >> >> VM LPAR has 175G of memory and a flock of Linux Oracle guests... >> >> Several guests needed more memory added so the directory was updated and one by one the guests shutdown, logged off and back on. So far, so good. >> >> >> >> But... In changing the memory for many guests, and it being &
Re: VM lockup due to storage typo
An IPL isn't an action? True, the guest was not aware that it would harm the system, but absent that action by the guest, there would not have been a problem. The guest was an unwitting agent, a part of a bot net, as it were.

Regards,
Richard Schuh

> -Original Message-
> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
> Sent: Thursday, September 17, 2009 9:14 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> I don't entirely agree. The action of the guest did not cause harm to CP, it was the action of the operations staff which did. This is not a denial of service case that I can see.
>
> Bill Holder
> z/VM Development, Memory Management team leader, IBM
>
> On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard wrote:
>
> >Maybe CP couldn't know that the guest would do something bad, but it should know that it has opened itself to the possibility that the guest could, in normal operation, cause the problem.
> >One of Alan's first precepts of information security and integrity is that the guest cannot be allowed to harm the CP. This clearly violates that.
> >
> >Regards,
> >Richard Schuh
> >
> >> -Original Message-
> >> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> >> Sent: Tuesday, September 15, 2009 9:19 AM
> >> To: IBMVM@LISTSERV.UARK.EDU
> >> Subject: Re: VM lockup due to storage typo
> >>
> >> CP wouldn't know at IPL time, the guest would, not could, but would cause such harm.
> >>
> >> Just because you say you can use xxx GB, doesn't mean you would actually use them.
> >>
> >> When page fills, it over flows to spool.
> >> When spool fills, CP abends on the next pageout.
> >>
> >> Tom Duerbusch
> >> THD Consulting
> >>
> >> >>> Marcy Cortes 9/15/2009 11:02 AM >>>
> >> See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;)
> >>
> >> You probably filled page space.
> >>
> >> I still think IBM should refuse to IPL a guest that will cause such harm.
> >>
> >> Marcy
> >>
> >> "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."
> >>
> >> -Original Message-
> >> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> >> Sent: Tuesday, September 15, 2009 8:39 AM
> >> To: IBMVM@LISTSERV.UARK.EDU
> >> Subject: [IBMVM] VM lockup due to storage typo
> >>
> >> Does anyone have an idea of how we might have gotten out of this without an IPL?
> >>
> >> VM LPAR has 175G of memory and a flock of Linux Oracle guests... Several guests needed more memory added so the directory was updated and one by one the guests shutdown, logged off and back on. So far, so good.
> >>
> >> But... In changing the memory for many guests, and it being late at night after a long day, while meaning to set a guest's memory to 9728M, it got set to 9728G. When that guest was cycled we see the message on the console that its memory was limited to 8TB (HCPLGN093E), then the VM system appeared to freeze.
> >>
> >> We couldn't get in via TCP/IP, or the HMC Operating System Messages screen, or the HMC Integrated 3270.
> >>
> >> Finally had to IPL. Even that was weird as I'd have expected the Load Normal to shutdown, it just IPLed. We did NoAutolog, fixed the typo and all came back up ok...
> >>
> >> I suspect CP was scrambling paging everything in the world out as Linux tried to initialize that 8TB of memory... But I'm surprised I couldn't even get into the HMC consoles (to kill just that one guest as opposed to all of them)..
> >>
> >> Any thoughts?
> >> Lee
> >> --
> >> Lee Stewart, Senior SE
> >> Sirius Computer Solutions
> >> Phone: (303) 996-7122
> >> Email: lee.stew...@siriuscom.com
> >> Web: www.siriuscom.com
Re: VM lockup due to storage typo
On Thu, Sep 17, 2009 at 6:34 PM, Bill Holder wrote: > Occurrences of this sort of problem are likely to result in temporary > or permanent hangs of both individual users and eventually the entire > system, which supports the theory in this case. I'd really need to > see a dump of the system in question to confirm this hypothesis, > however. And I think Lee has not yet mentioned how much paging space he had allocated. With a 175G LPAR you would think he has at least 175G worth of virtual machines, so 350G of paging space... for the moment the next virtual machine went over the edge. I very much doubt he was that well prepared. With that amount of space, things might have gotten slow but there's a fair chance CP would have survived the abuse. Rob
Re: VM lockup due to storage typo
It sounds very similar in symptom to my minidisk cache overcommitment problem that resulted in CP thrashing (and an APAR).

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Bill Holder
Sent: Thursday, September 17, 2009 12:34 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

I should point out that this hang is likely being misunderstood here. While this scenario will indeed drive paging over the edge, that's not likely what happened. If paging had been driven to that point, the system would have quickly taken a PGT004 abend and restarted. Instead, I believe what happened is likely a most difficult to solve variant on something that was mentioned before: that is, difficulty allocating CP structures required to represent the massive amount of storage. Page tables are only part of the problem. The upper level DAT tables (region and segment) can be up to 4 frames long, and once storage utilization becomes heavy enough, it becomes fragmented (PGMBK allocation being a factor here), making it very difficult for CP to allocate contiguous sets of 3s and 4s. We spent quite a bit of effort in z/VM 5.3.0 addressing the PGMBK side of this issue, but the harder problem of the upper level tables remains as a likely constraint point. Occurrences of this sort of problem are likely to result in temporary or permanent hangs of both individual users and eventually the entire system, which supports the theory in this case. I'd really need to see a dump of the system in question to confirm this hypothesis, however.

Bill Holder
z/VM Development, Memory Management team lead, IBM
Re: VM lockup due to storage typo
I should point out that this hang is likely being misunderstood here. While this scenario will indeed drive paging over the edge, that's not likely what happened. If paging had been driven to that point, the system would have quickly taken a PGT004 abend and restarted. Instead, I believe what happened is likely a most difficult to solve variant on something that was mentioned before: that is, difficulty allocating CP structures required to represent the massive amount of storage. Page tables are only part of the problem. The upper level DAT tables (region and segment) can be up to 4 frames long, and once storage utilization becomes heavy enough, it becomes fragmented (PGMBK allocation being a factor here), making it very difficult for CP to allocate contiguous sets of 3s and 4s. We spent quite a bit of effort in z/VM 5.3.0 addressing the PGMBK side of this issue, but the harder problem of the upper level tables remains as a likely constraint point. Occurrences of this sort of problem are likely to result in temporary or permanent hangs of both individual users and eventually the entire system, which supports the theory in this case. I'd really need to see a dump of the system in question to confirm this hypothesis, however. Bill Holder z/VM Development, Memory Management team lead, IBM
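To put rough numbers on the structures described above, here is a back-of-the-envelope sketch. It is not CP's actual accounting. The sizes are assumptions: 4 KB frames, one page-management block per 1 MB segment at an assumed 8 KB each (the `PGMBK_BYTES` figure is not taken from this post), and one 4-frame segment table per 2 GB of guest storage. It gives an upper bound that applies only if the guest touches every segment.

```python
# Rough upper bound on translation structures for one guest.
# All sizes below are assumptions for illustration, not CP internals.

KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40

FRAME = 4 * KB                 # real storage frame size
SEGMENT = 1 * MB               # guest storage mapped by one page table
SEGMENT_TABLE_SPAN = 2 * GB    # guest storage mapped by one full segment table
SEGMENT_TABLE_FRAMES = 4       # "up to 4 frames long", per the post
PGMBK_BYTES = 8 * KB           # assumed size of one per-segment block


def translation_overhead(guest_bytes):
    """Return structure counts and sizes if every segment is touched."""
    segments = -(-guest_bytes // SEGMENT)                  # ceiling division
    segment_tables = -(-guest_bytes // SEGMENT_TABLE_SPAN)
    return {
        "segments": segments,
        "pgmbk_bytes": segments * PGMBK_BYTES,
        "segment_tables": segment_tables,
        # each full segment table needs one contiguous run of 4 frames
        "contiguous_4_frame_sets": segment_tables,
        "segment_table_bytes": segment_tables * SEGMENT_TABLE_FRAMES * FRAME,
    }


if __name__ == "__main__":
    for label, size in (("9728M", 9728 * MB), ("8T", 8 * TB)):
        o = translation_overhead(size)
        print(label,
              "PGMBK GB:", o["pgmbk_bytes"] / GB,
              "contiguous 4-frame sets:", o["contiguous_4_frame_sets"])
```

Under these assumptions the intended 9728M guest needs 5 contiguous 4-frame sets, while the 8T guest could need up to 4096 of them, which is the kind of contiguous allocation the post says becomes hard once storage is fragmented.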
Re: VM lockup due to storage typo
No, not at all, that's not what I was saying; what you propose would obviously be an exposure. A privileged user (operations staff) can issue that today. Putting a loaded gun in the hands of a class G user is not at all the same thing. Anything a user at a keyboard can do, a guest program can do, generally, and they all have to be protected.

On Thu, 17 Sep 2009 09:23:11 -0700, P S wrote:
>On Thu, Sep 17, 2009 at 9:14 AM, Bill Holder wrote:
>> I don't entirely agree. The action of the guest did not cause harm to CP, it was the action of the operations staff which did. This is not a denial of service case that I can see.
>
>Hm. So by that rationale, we can make STORE H class G, because it won't be the *guest* harming CP, it will be the end-user who types the command.
Re: VM lockup due to storage typo
On Thu, Sep 17, 2009 at 9:14 AM, Bill Holder wrote: > I don't entirely agree. The action of the guest did not cause harm > to CP, it was the action of the operations staff which did. This > is not a denial of service case that I can see. Hm. So by that rationale, we can make STORE H class G, because it won't be the *guest* harming CP, it will be the end-user who types the command.
Re: VM lockup due to storage typo
I don't entirely agree. The action of the guest did not cause harm to CP, it was the action of the operations staff which did. This is not a denial of service case that I can see.

Bill Holder
z/VM Development, Memory Management team leader, IBM

On Tue, 15 Sep 2009 09:59:09 -0700, Schuh, Richard wrote:

>Maybe CP couldn't know that the guest would do something bad, but it should know that it has opened itself to the possibility that the guest could, in normal operation, cause the problem.
>One of Alan's first precepts of information security and integrity is that the guest cannot be allowed to harm the CP. This clearly violates that.
>
>Regards,
>Richard Schuh
>
>> -Original Message-
>> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
>> Sent: Tuesday, September 15, 2009 9:19 AM
>> To: IBMVM@LISTSERV.UARK.EDU
>> Subject: Re: VM lockup due to storage typo
>>
>> CP wouldn't know at IPL time, the guest would, not could, but would cause such harm.
>>
>> Just because you say you can use xxx GB, doesn't mean you would actually use them.
>>
>> When page fills, it over flows to spool.
>> When spool fills, CP abends on the next pageout.
>>
>> Tom Duerbusch
>> THD Consulting
>>
>> >>> Marcy Cortes 9/15/2009 11:02 AM >>>
>> See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;)
>>
>> You probably filled page space.
>>
>> I still think IBM should refuse to IPL a guest that will cause such harm.
>>
>> Marcy
>>
>> "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."
>>
>> -Original Message-
>> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
>> Sent: Tuesday, September 15, 2009 8:39 AM
>> To: IBMVM@LISTSERV.UARK.EDU
>> Subject: [IBMVM] VM lockup due to storage typo
>>
>> Does anyone have an idea of how we might have gotten out of this without an IPL?
>>
>> VM LPAR has 175G of memory and a flock of Linux Oracle guests... Several guests needed more memory added so the directory was updated and one by one the guests shutdown, logged off and back on. So far, so good.
>>
>> But... In changing the memory for many guests, and it being late at night after a long day, while meaning to set a guest's memory to 9728M, it got set to 9728G. When that guest was cycled we see the message on the console that its memory was limited to 8TB (HCPLGN093E), then the VM system appeared to freeze.
>>
>> We couldn't get in via TCP/IP, or the HMC Operating System Messages screen, or the HMC Integrated 3270.
>>
>> Finally had to IPL. Even that was weird as I'd have expected the Load Normal to shutdown, it just IPLed. We did NoAutolog, fixed the typo and all came back up ok...
>>
>> I suspect CP was scrambling paging everything in the world out as Linux tried to initialize that 8TB of memory... But I'm surprised I couldn't even get into the HMC consoles (to kill just that one guest as opposed to all of them)..
>>
>> Any thoughts?
>> Lee
>> --
>> Lee Stewart, Senior SE
>> Sirius Computer Solutions
>> Phone: (303) 996-7122
>> Email: lee.stew...@siriuscom.com
>> Web: www.siriuscom.com
Re: VM lockup due to storage typo
Unless you set MAXSTORAGE in the profile and used * as the upper limit in the USER entry. Then if you change the lower limit to be higher than the setting in the profile, you get an error. On Wed, Sep 16, 2009 at 3:48 PM, Lee Stewart wrote: > Not really as we were dealing with a lot of guests. So the only practical > place to put it would be in a profile. But according to usage note #1: A > maximum storage setting on a USER statement overrides a MAXSTORAGE statement > in a profile. > > So it would have no effect... > > Lee > > Ron Schmiedge wrote: >> >> I've been trying to follow the discussion and wondering if the >> directory control statement >> >> MAXSTORAGE >> >> would have provided some protection from the finger check problem? > > -- > > Lee Stewart, Senior SE > Sirius Computer Solutions > Phone: (303) 996-7122 > Email: lee.stew...@siriuscom.com > Web: www.siriuscom.com >
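The precedence rules discussed in this exchange can be modelled in a few lines. This is only a sketch of the rules as the posters describe them (a USER maximum overrides a profile MAXSTORAGE, unless the USER maximum is given as `*`), not a description of how CP or the directory program implements them. The function names and the 16G profile value are hypothetical.

```python
# Toy model of the storage-limit precedence described in the thread.
# parse_size, effective_max and check_entry are hypothetical helpers.

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert a size such as '9728M' to bytes."""
    return int(text[:-1]) * UNITS[text[-1].upper()]


def effective_max(user_max, profile_max):
    """USER maximum wins unless it is '*'; then the profile value applies."""
    if user_max != "*":
        return parse_size(user_max)
    return parse_size(profile_max) if profile_max else None


def check_entry(default, user_max, profile_max):
    """Return 'ok', or an error if the default exceeds the effective maximum."""
    limit = effective_max(user_max, profile_max)
    if limit is not None and parse_size(default) > limit:
        return "error: default %s exceeds maximum" % default
    return "ok"


if __name__ == "__main__":
    # Typo caught: '*' in the USER entry lets the profile limit apply.
    print(check_entry("9728G", "*", "16G"))
    # Typo missed: an explicit USER maximum overrides the profile.
    print(check_entry("9728G", "9728G", "16G"))
```

In this model the profile limit only protects entries that leave the USER maximum as `*`, which matches both Lee's objection and the workaround described in the reply.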
Re: VM lockup due to storage typo
Not really, as we were dealing with a lot of guests. So the only practical place to put it would be in a profile. But according to usage note #1: A maximum storage setting on a USER statement overrides a MAXSTORAGE statement in a profile.

So it would have no effect...

Lee

Ron Schmiedge wrote:
> I've been trying to follow the discussion and wondering if the directory control statement
>
> MAXSTORAGE
>
> would have provided some protection from the finger check problem?

--
Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web: www.siriuscom.com
Re: VM lockup due to storage typo
Only if it were included in every directory entry, or at least the one in question. Having a global MAXSTORAGE would be better protection. Regards, Richard Schuh > -Original Message- > From: The IBM z/VM Operating System > [mailto:ib...@listserv.uark.edu] On Behalf Of Ron Schmiedge > Sent: Wednesday, September 16, 2009 2:20 PM > To: IBMVM@LISTSERV.UARK.EDU > Subject: Re: VM lockup due to storage typo > > I've been trying to follow the discussion and wondering if > the directory control statement > > MAXSTORAGE > > would have provided some protection from the finger check problem? > > > > On Wed, Sep 16, 2009 at 2:59 PM, Alan Altmark > wrote: > > On Wednesday, 09/16/2009 at 04:44 EDT, Lee Stewart > > wrote: > >> I guess as the one who got bit, I'd offer one easy suggestion... > >> > >> The finger check asked for 9728G (9.7+T), VM > unceremoniously chopped > >> it to 8T as the architecture limit. Why not have an option (not > >> enabled by > >> default) in the SYSTEM CONFIG file that says Max_Virt_Size. It > >> could take numbers (like the USER storage specification), > or OFF to > >> indicate no checking. And maybe something like RSS for > Real Storage > >> Size to say you can't logon with or define storage to more > than the > >> amount of Real Storage. > >> > >> And if you really wanted a full circle, then a directory > option that > >> said this one user could override that setting. > >> > >> That said I'm kind of swamped for the next two weeks, but > after that > >> if someone wants to coach me on writing a requirement, I will... > > > > For DIRMAINT, look at the DVHXRA/B/C exits to implement > whatever kind > > of policy limits you like. > > > > Alan Altmark > > z/VM Development > > IBM Endicott > > >
Re: VM lockup due to storage typo
I've been trying to follow the discussion and wondering if the directory control statement MAXSTORAGE would have provided some protection from the finger check problem? On Wed, Sep 16, 2009 at 2:59 PM, Alan Altmark wrote: > On Wednesday, 09/16/2009 at 04:44 EDT, Lee Stewart > wrote: >> I guess as the one who got bit, I'd offer one easy suggestion... >> >> The finger check asked for 9728G (9.7+T), VM unceremoniously chopped it >> to 8T as the architecture limit. Why not have an option (not enabled by >> default) in the SYSTEM CONFIG file that says Max_Virt_Size. It could >> take numbers (like the USER storage specification), or OFF to indicate >> no checking. And maybe something like RSS for Real Storage Size to say >> you can't logon with or define storage to more than the amount of Real >> Storage. >> >> And if you really wanted a full circle, then a directory option that >> said this one user could override that setting. >> >> That said I'm kind of swamped for the next two weeks, but after that if >> someone wants to coach me on writing a requirement, I will... > > For DIRMAINT, look at the DVHXRA/B/C exits to implement whatever kind of > policy limits you like. > > Alan Altmark > z/VM Development > IBM Endicott >
Re: VM lockup due to storage typo
On Wednesday, 09/16/2009 at 04:44 EDT, Lee Stewart wrote: > I guess as the one who got bit, I'd offer one easy suggestion... > > The finger check asked for 9728G (9.7+T), VM unceremoniously chopped it > to 8T as the architecture limit. Why not have an option (not enabled by > default) in the SYSTEM CONFIG file that says Max_Virt_Size. It could > take numbers (like the USER storage specification), or OFF to indicate > no checking. And maybe something like RSS for Real Storage Size to say > you can't logon with or define storage to more than the amount of Real > Storage. > > And if you really wanted a full circle, then a directory option that > said this one user could override that setting. > > That said I'm kind of swamped for the next two weeks, but after that if > someone wants to coach me on writing a requirement, I will... For DIRMAINT, look at the DVHXRA/B/C exits to implement whatever kind of policy limits you like. Alan Altmark z/VM Development IBM Endicott
Re: VM lockup due to storage typo
On Wed, Sep 16, 2009 at 3:06 PM, Huegel, Thomas wrote:
> I don't know that I want CP to do anything different than it does now EXCEPT I want z/VM to a) keep running and b) have some facility that I can use to be able to examine the system to find/fix the problem... I

I agree. The mainframe has a long history of managing over committed resources, but Linux is presenting new challenges since it was not written to be virtualized. Rob noted earlier:

> One of the problems with booting Linux is that it determines the size of the virtual machine by testing pages rather than ask CP about it.

It seems to me that this will become a problem in other virtual environments as well and, similar to the timer tick problem, another opportunity for the mainframe to show Linux a better way to behave. If Linux does not use up all available space when it starts, there is opportunity to monitor and intervene before it gets critical. Then we do not have to worry about making sure all our virtual blocks fit in the virtual toy box.

> don't know/care how that gets done, maybe reserving some page space for CP and/or a special 'hook' into the HMC.. I'll leave that up to the developers.
>
> -Original Message-
> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of P S
> Sent: Wednesday, September 16, 2009 12:53 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> On Wed, Sep 16, 2009 at 10:42 AM, Schuh, Richard wrote:
> > Logon would not be the right or only place to put it. DEF STOR is another possible place to err if the maximum storage was too high. Perhaps a check of virtual storage at IPL time. That is a common point that must be traversed no matter where the error occurred.
>
> Suggest this not get hung up on "But it won't be perfect" ideas. For DIRMAINT, perhaps a site configuration option could say "Warn me if a userid is defined with either storage limit above x". Similarly, at LOGON or DEFINE STORAGE, if the VMsize is greater than the total page space defined, a warning would be useful.
>
> This doesn't help for aggregate overload (20x1GB with 4GB of page space), doesn't guarantee that XAUTOLOG BIGPIG won't spiral the system into the ground before the operator (what operator?) can react, etc., but it would at least give some more informed consent.
>
> In this era of Big Numbers and big Linux guests, this is probably more important than it used to be -- in days of yore, if you accidentally defined a 32MB guest on an 8MB system, (a) there probably WAS enough page space, and (b) the user was probably CMS and wouldn't touch the pages that fast anyway.

Ethan
Re: VM lockup due to storage typo
I guess as the one who got bit, I'd offer one easy suggestion...

The finger check asked for 9728G (9.7+T), VM unceremoniously chopped it to 8T as the architecture limit. Why not have an option (not enabled by default) in the SYSTEM CONFIG file that says Max_Virt_Size. It could take numbers (like the USER storage specification), or OFF to indicate no checking. And maybe something like RSS for Real Storage Size to say you can't logon with or define storage to more than the amount of Real Storage.

And if you really wanted a full circle, then a directory option that said this one user could override that setting.

That said I'm kind of swamped for the next two weeks, but after that if someone wants to coach me on writing a requirement, I will...

Lee

Alan Altmark wrote:
> On Wednesday, 09/16/2009 at 09:14 EDT, RPN01 wrote:
>> I don't think, in this case, it is the user causing the problem at all. The user didn't define their storage allocation, and in practice can't do that at all. So the user didn't set up the situation which caused the integrity issue, the system administrator did.
>
> That was my point to Marcy: Not an integrity problem. The system is obeying the sysadmin's instructions.
>
>> To my mind, if this requires addressing, it should be in the DIRECTXA command, so as to help the system administrator in avoiding aiming the gun at his toes.
>
> DIRECTXA has no context in which to make such warnings. Placing limits at LOGON would only apply to resource availability to hold the needed control structures. When the guest begins to run and actually use all that memory, then another line of defense is needed.
>
> Alan Altmark
> z/VM Development
> IBM Endicott

--
Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web: www.siriuscom.com
Re: VM lockup due to storage typo
On 9/15/09 12:09 PM, "Daniel P. Martin" wrote: > *cough*SHARE requirement?*cough* WAVV requirement WRIBDB04 submitted. I suggested a SYSTEM CONFIG option and corresponding SET command to warn user/operator and optionally halt IPL if a user requested LOGON or issued an IPL command with a default VM size greater than the sum of real memory and configured PAGE space. Normal setting would be MEMSANITY ON, but the SET MEMSANITY OFF command would still allow experienced admins to shoot themselves in the foot if necessary. IBM: Since I seem to be Designated Requirements Dude these days, maybe you should just give me direct login access to the requirements DB. It'd save time, and you'd get requirements earlier in the planning cycle. 8-) -- db
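The check proposed in this requirement is simple enough to sketch. The following is only an illustration of the idea, not anything CP provides: `memsanity_check`, `Verdict`, and the `halt` flag are hypothetical names, and the 350G page space figure is an assumed value (the original poster never stated how much page space was defined).

```python
# Illustrative model of the proposed MEMSANITY check:
# warn (and optionally refuse) when a guest's default storage size
# exceeds real memory plus configured page space.
from dataclasses import dataclass

GIB = 2**30


@dataclass
class Verdict:
    allowed: bool   # may the LOGON/IPL proceed?
    message: str    # warning text, empty when nothing is wrong


def memsanity_check(vm_size, real_memory, page_space,
                    memsanity_on=True, halt=False):
    """Compare the requested VM size with real memory plus page space."""
    capacity = real_memory + page_space
    if not memsanity_on or vm_size <= capacity:
        return Verdict(True, "")
    message = ("VM size %dG exceeds real memory plus page space (%dG)"
               % (vm_size // GIB, capacity // GIB))
    # With halt=False the check only warns; with halt=True it refuses.
    return Verdict(not halt, message)


if __name__ == "__main__":
    # 8T guest (the clipped typo) on a 175G LPAR with an assumed 350G of page space.
    print(memsanity_check(8192 * GIB, 175 * GIB, 350 * GIB, halt=True))
    # The intended 9728M guest passes silently.
    print(memsanity_check(9728 * 2**20, 175 * GIB, 350 * GIB))
```

Keeping "warn" and "halt" as separate switches mirrors the requirement's wording: the default warns, and an administrator who really wants the oversized guest can still turn the check off.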
Re: VM lockup due to storage typo
I don't know that I want CP to do anything different than it does now EXCEPT I want z/VM to a) keep running and b) have some facility that I can use to be able to examine the system to find/fix the problem... I don't know/care how that gets done, maybe reserving some page space for CP and/or a special 'hook' into the HMC.. I'll leave that up to the developers.

-Original Message-
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of P S
Sent: Wednesday, September 16, 2009 12:53 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

On Wed, Sep 16, 2009 at 10:42 AM, Schuh, Richard wrote:
> Logon would not be the right or only place to put it. DEF STOR is another possible place to err if the maximum storage was too high. Perhaps a check of virtual storage at IPL time. That is a common point that must be traversed no matter where the error occurred.

Suggest this not get hung up on "But it won't be perfect" ideas. For DIRMAINT, perhaps a site configuration option could say "Warn me if a userid is defined with either storage limit above x". Similarly, at LOGON or DEFINE STORAGE, if the VMsize is greater than the total page space defined, a warning would be useful.

This doesn't help for aggregate overload (20x1GB with 4GB of page space), doesn't guarantee that XAUTOLOG BIGPIG won't spiral the system into the ground before the operator (what operator?) can react, etc., but it would at least give some more informed consent.

In this era of Big Numbers and big Linux guests, this is probably more important than it used to be -- in days of yore, if you accidentally defined a 32MB guest on an 8MB system, (a) there probably WAS enough page space, and (b) the user was probably CMS and wouldn't touch the pages that fast anyway.
Re: VM lockup due to storage typo
On Wed, Sep 16, 2009 at 10:42 AM, Schuh, Richard wrote:
> Logon would not be the right or only place to put it. DEF STOR is another possible place to err if the maximum storage was too high. Perhaps a check of virtual storage at IPL time. That is a common point that must be traversed no matter where the error occurred.

Suggest this not get hung up on "But it won't be perfect" ideas. For DIRMAINT, perhaps a site configuration option could say "Warn me if a userid is defined with either storage limit above x". Similarly, at LOGON or DEFINE STORAGE, if the VMsize is greater than the total page space defined, a warning would be useful.

This doesn't help for aggregate overload (20x1GB with 4GB of page space), doesn't guarantee that XAUTOLOG BIGPIG won't spiral the system into the ground before the operator (what operator?) can react, etc., but it would at least give some more informed consent.

In this era of Big Numbers and big Linux guests, this is probably more important than it used to be -- in days of yore, if you accidentally defined a 32MB guest on an 8MB system, (a) there probably WAS enough page space, and (b) the user was probably CMS and wouldn't touch the pages that fast anyway.
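The difference between the per-guest warning and the aggregate case mentioned above can be shown with a small sketch. Both function names are hypothetical, sizes are in GB, and the guest names are made up. The ratio ignores real memory, so a value above 1 is not by itself a problem; it only shows what a per-guest check cannot see.

```python
# Per-guest warning versus aggregate demand, using the 20 x 1GB example.
# per_guest_warnings and aggregate_overcommit are illustrative names only.

def per_guest_warnings(guests_gb, page_space_gb):
    """Names of guests whose own size exceeds the total page space."""
    return [name for name, size in guests_gb.items() if size > page_space_gb]


def aggregate_overcommit(guests_gb, page_space_gb):
    """Ratio of total defined guest storage to page space."""
    return sum(guests_gb.values()) / page_space_gb


if __name__ == "__main__":
    guests = {"LINUX%02d" % i: 1 for i in range(20)}   # 20 guests of 1GB each
    print(per_guest_warnings(guests, 4))                # no single guest is flagged
    print(aggregate_overcommit(guests, 4))              # but total demand is 5x page space
    print(per_guest_warnings({"BIGPIG": 8192}, 4))      # the typo case is flagged
```

So the per-guest check would have caught the 9728G typo, but as the post says, it offers no protection against many modest guests adding up.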
Re: VM lockup due to storage typo
Logon would not be the right or only place to put it. DEF STOR is another possible place to err if the maximum storage was too high. Perhaps a check of virtual storage at IPL time. That is a common point that must be traversed no matter where the error occurred. Regards, Richard Schuh > -Original Message- > From: The IBM z/VM Operating System > [mailto:ib...@listserv.uark.edu] On Behalf Of Alan Altmark > Sent: Wednesday, September 16, 2009 10:20 AM > To: IBMVM@LISTSERV.UARK.EDU > Subject: Re: VM lockup due to storage typo > > On Wednesday, 09/16/2009 at 09:14 EDT, RPN01 > wrote: > > I don't think, in this case, it is the user causing the > problem at all. > The > > user didn't define their storage allocation, and in > practice can't do > that > > at all. So the user didn't set up the situation which caused the > integrity > > issue, the system administrator did. > > That was my point to Marcy: Not an integrity problem. The > system is obeying the sysadmin's instructions. > > > To my mind, if this requires addressing, it should be in > the DIRECTXA > > command, so as to help the system administrator in avoiding > aiming the > gun > > at his toes. > > DIRECTXA has no context in which to make such warnings. > Placing limits at LOGON would only apply to resource > availability to hold the needed control structures. When the > guest begins to run and actually use all that memory, then > another line of defense is needed. > > Alan Altmark > z/VM Development > IBM Endicott >
Re: VM lockup due to storage typo
On Wednesday, 09/16/2009 at 09:14 EDT, RPN01 wrote: > I don't think, in this case, it is the user causing the problem at all. The > user didn't define their storage allocation, and in practice can't do that > at all. So the user didn't set up the situation which caused the integrity > issue, the system administrator did. That was my point to Marcy: Not an integrity problem. The system is obeying the sysadmin's instructions. > To my mind, if this requires addressing, it should be in the DIRECTXA > command, so as to help the system administrator in avoiding aiming the gun > at his toes. DIRECTXA has no context in which to make such warnings. Placing limits at LOGON would only apply to resource availability to hold the needed control structures. When the guest begins to run and actually use all that memory, then another line of defense is needed. Alan Altmark z/VM Development IBM Endicott
Re: VM lockup due to storage typo
If you bought the Dirmaint product or a similar product from another vendor, couldn't a rule be set up to prevent this?

Anyway, there is not going to be a way of preventing a systems programmer from doing anything we do. We are supposed to be thinking. For example, when I initialize, format or copy to a pack, I go through at least 3 checks to make sure I have not transposed the CUA. Saved me a lot of times.

A system programmer IS dangerous. We can shut down the system. We can destroy the system (and then go peacefully into retirement).

You can't fix stupid, and we are all, occasionally, stupid.

Now that you have had this kind of problem, we all should learn from it. After defining a new guest, log on to that guest and do a Q V ALL and see if it is right.

Been there, done that.

Tom Duerbusch
THD Consulting
Sent via BlackBerry by AT&T

-Original Message-
From: RPN01
Date: Wed, 16 Sep 2009 08:13:57
Subject: Re: VM lockup due to storage typo

I don't think, in this case, it is the user causing the problem at all. The user didn't define their storage allocation, and in practice can't do that at all. So the user didn't set up the situation which caused the integrity issue, the system administrator did.

The system administrator is in control of the CP Directory, and as such, decisions are left to him. The system doesn't question what he does, within the definition of the syntax, semantics and limitations of the directory entries and commands. If you want to define a large virtual machine, should the system question your authority?

The system could check the memory and page space against each directory entry as the binary directory is built, but this would add time to the directory build, and does not account for the situation of planning to add more page space before logging in the new directory entry.

Maybe a warning of "User exceeds paging space" could have averted this situation, but again, each user would have to be checked against the running system. It shouldn't keep you from creating the entry, just let you know that there might be an issue if you actually use it.

To my mind, if this requires addressing, it should be in the DIRECTXA command, so as to help the system administrator in avoiding aiming the gun at his toes.

--
Robert P. Nix, Mayo Foundation
RO-OE-5-55, 200 First Street SW
507-284-0844, Rochester, MN 55905
"In theory, theory and practice are the same, but in practice, theory and practice are different."

On 9/15/09 3:44 PM, "Alan Altmark" wrote:

> On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
>> I agree with that ("the guest cannot be allowed to harm CP") but has that actually been formally - or even informally - accepted by the Powers That Be?
>
> Yes, it is in the Statement of System Integrity in the General Information Manual.
>
>> I ask because I still remember, as though it were yesterday, opening a security/integrity APAR against VM back in the mid-1980's because any class G user could knock CP down by defining a shared and a nonshared device on the same virtual control unit, and being told that that was NOT a security or integrity issue, and that no fix would be forthcoming.
>
> Under "today's" rules, that would be an Integrity problem.
>
> o If a class G (only) user can repeatedly or with malice aforethought hang or abend CP, it WILL be classified as an integrity problem (denial of service).
>
> o If a class G user happens to do something that triggers an abend or hang due to a "system malfunction", it will NOT be classified as an integrity problem.
>
> o If the system abends or hangs because it is overloaded (memory, CPU), it will NOT be classified as an integrity problem.
>
> o Just because it isn't an integrity problem doesn't mean it isn't a defect.
>
> Alan Altmark
> z/VM Development
> IBM Endicott
Re: VM lockup due to storage typo
This gun has been pointing in the same direction forever, but it *is* a fact that with 64-bit CP the bullets are a lot bigger. I am sure folks in Endicott are as creative as most of us (or worse, take a look at ... ;-) but we know that any safety that CP adds will annoy people because they forgot to disable it when they still had the option to do so, or because they drive with the safety off all day anyway (how many are not using a highly privileged CP userid for things that don't need it? And really, it *is* dangerous).
The problem with the suggested check is that it is stronger than what most people need. Also, the check is likely to be unfair (aiming at the wrong victim) and could potentially cause a Denial of Service. Would you want MAINT unable to log on because that 5th Linux guest is now logged on (and you could only add the page pack if you could log on...)? So we need an option for some users to override it, or an option to enforce the check only for some users. One means that you may forget the option, and the other means that within weeks people will ask "why can't I logon my Linux guest" and the word will spread that you need to issue a SET SRM OVERCOMM.
Linux has a similar check in that a process can't allocate more virtual memory than you have available (in main and on swap, or you get out-of-memory). This ensures that this process could eventually get all it asks for. But when it does not immediately reference that memory, it appears to be still available when the next process allocates memory. So the check is pretty useless and does not protect you at all.
I don't do operational work these days, so I feel like I'm in the peanut gallery. Maybe I grew up in a rather unique shop (or maybe staff reductions have gotten rid of that luxury there too) but we had pretty strict rules to minimize mistakes. Most configuration changes would be checked by another pair of eyes or some code. Configuration files to be replaced ran through XDIFF to inspect the changes. The nucleus map was scanned for text decks picked up from the A-disk, etc. Various health checks ran to compare RACF and the directory, check for certain disks filling up, and many more. With CMS Pipelines it is often easy to get an extra pair of eyes to oversee your actions.
Rob
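Rob's point about an allocation-time check can be illustrated with a toy model. This is only a sketch under stated assumptions: the `ToyMemory` class, its method names and the numbers are all made up for illustration, and it is not how Linux or CP actually account for memory. It models a check that compares each request against storage that has not yet been referenced.

```python
class ToyMemory:
    """Toy model of backing store (main + swap), sizes in MB. Hypothetical."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.referenced_mb = 0   # storage actually touched so far
        self.promised_mb = 0     # sum of all allocations that were granted

    def allocate(self, size_mb):
        # The check under discussion: the request only has to fit in what
        # is currently unreferenced, not in what is still unpromised.
        if size_mb > self.capacity_mb - self.referenced_mb:
            return False
        self.promised_mb += size_mb
        return True

    def touch(self, size_mb):
        # Referencing the storage is what really consumes backing store.
        if self.referenced_mb + size_mb > self.capacity_mb:
            raise MemoryError("out of backing store")
        self.referenced_mb += size_mb


mem = ToyMemory(capacity_mb=1000)
# Three processes each ask for 800 MB before any of them touches a page.
granted = [mem.allocate(800) for _ in range(3)]
print(granted, mem.promised_mb)  # [True, True, True] 2400
```

Every request passes the check, yet 2400 MB has been promised against 1000 MB of backing store. The shortage only shows up later, when the storage is referenced.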
Re: VM lockup due to storage typo
And you also have to check during DEFINE STORAGE, DEFINE FB-512, and any other command or function that creates a pageable CP structure.
Brian Nielsen
On Wed, 16 Sep 2009 09:03:43 -0500, Mike Walter wrote:
>I can't support DIRECTXA as the sole examination. Paging volumes can be
>added at any time. DIRECTXA only gets a chance to look when it is run.
>
>If this even needs to be addressed (hence, this thoughtful thread), IMHO
>comparing the min and max virtual machine memory specification would be
>better done when the virtual machine is being built during
>logon/autolog/xautolog.
>
>OTOH, it would not hurt to have DIRECTXA provide that early warning so
>that when one finally does attempt to create the virtual machine, any
>typos might already have been displayed and corrected when DIRECTXA
>provided an early warning. It's just plain embarrassing for an existing
>virtual machine to cause a problem because the sysprog made a wild (or
>uninformed) keystroke while editing the directory source ... another
>source of sysprog "collateral damage".
>
>Mike Walter
>Hewitt Associates
>The opinions expressed herein are mine alone, not my employer's.
Re: VM lockup due to storage typo
I can't support DIRECTXA as the sole examination. Paging volumes can be added at any time. DIRECTXA only gets a chance to look when it is run.
If this even needs to be addressed (hence, this thoughtful thread), IMHO comparing the min and max virtual machine memory specification would be better done when the virtual machine is being built during logon/autolog/xautolog.
OTOH, it would not hurt to have DIRECTXA provide that early warning so that when one finally does attempt to create the virtual machine, any typos might already have been displayed and corrected when DIRECTXA provided an early warning. It's just plain embarrassing for an existing virtual machine to cause a problem because the sysprog made a wild (or uninformed) keystroke while editing the directory source ... another source of sysprog "collateral damage".
Mike Walter
Hewitt Associates
The opinions expressed herein are mine alone, not my employer's.
Re: VM lockup due to storage typo
I don't think, in this case, it is the user causing the problem at all. The user didn't define their storage allocation, and in practice can't do that at all. So the user didn't set up the situation which caused the integrity issue, the system administrator did.
The system administrator is in control of the CP Directory, and as such, decisions are left to him. The system doesn't question what he does, within the definition of the syntax, semantics and limitations of the directory entries and commands. If you want to define a large virtual machine, should the system question your authority?
The system could check the memory and page space against each directory entry as the binary directory is built, but this would add time to the directory build, and does not account for the situation of planning to add more page space before logging in the new directory entry. Maybe a warning of "User exceeds paging space" could have averted this situation, but again, each user would have to be checked against the running system. It shouldn't keep you from creating the entry, just let you know that there might be an issue if you actually use it.
To my mind, if this requires addressing, it should be in the DIRECTXA command, so as to help the system administrator in avoiding aiming the gun at his toes.
--
Robert P. Nix
Mayo Foundation
RO-OE-5-55, 200 First Street SW, Rochester, MN 55905
507-284-0844
"In theory, theory and practice are the same, but in practice, theory and practice are different."
On 9/15/09 3:44 PM, "Alan Altmark" wrote:
> On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
>> I agree with that ("the guest cannot be allowed to harm CP") but has that
>> actually been formally - or even informally - accepted by the Powers That
>> Be?
>
> Yes, it is in the Statement of System Integrity in the General Information
> Manual.
>
>> I ask because I still remember, as though it were yesterday, opening a
>> security/integrity APAR against VM back in the mid-1980's because any
>> class G user could knock CP down by defining a shared and a nonshared
>> device on the same virtual control unit, and being told that that was NOT
>> a security or integrity issue, and that no fix would be forthcoming.
>
> Under "today's" rules, that would be an Integrity problem.
>
> o If a class G (only) user can repeatedly or with malice of forethought
> hang or abend CP, it WILL be classified as an integrity problem (denial of
> service).
>
> o If a class G user happens to do something that triggers an abend or hang
> due to a "system malfunction", it will NOT be classified as an integrity
> problem.
>
> o If the system abends or hangs because it is overloaded (memory, CPU), it
> will NOT be classified as an integrity problem.
>
> o Just because it isn't an integrity problem doesn't mean it isn't a
> defect.
>
> Alan Altmark
> z/VM Development
> IBM Endicott
Re: VM lockup due to storage typo
2009/9/15 Schuh, Richard
> The same might be said for page space. Someone could access a dataspace
> enabled directory and take up page space. We could easily take up 48G of
> page space here by starting 24 machines that each access different d/s
> directories at 2G each.
Dataspace-enabled directories are not paged out to paging space; the CP paging operations for them are issued against the minidisks of the SFS servers. Nor are all dataspace pages brought into storage at the moment of ACCESS. The SFS dataspaces are called "mapped dataspaces".
A small exception: the structures holding the FST blocks are not mapped to the SFS server minidisks, so they can be paged out to CP space (and obviously CP's page management blocks occupy some storage too).
DB2/VM, on the other hand, can also use non-mapped dataspaces.
--
Kris Buelens, IBM Belgium, VM customer support
Re: VM lockup due to storage typo
On Tuesday, 09/15/2009 at 04:50 EDT, Marcy Cortes wrote: > So are you saying that what Lee and I both did to shoot our systems should > APAR'able? Or should it be a requirement? Or is it going to be a "your gun, > your foot" answer? I was just answering the "Is it an integrity problem?" question: No, it isn't an integrity problem. The sysadmin did something that ultimately caused the system to lock up. (That doesn't mean it was the sysadmin's fault, however.) If you feel you have found a defect, open a PMR. That's how you find out if something is really APARable. :-) Alan Altmark z/VM Development IBM Endicott
Re: VM lockup due to storage typo
> One of the problems with booting Linux is that it determines
> the size of the virtual machine by testing pages rather than
> ask CP about it.
It only took TPF and its predecessors 35 years to get this right. :-) Way back in VM/370 R3 I had a diag that could be used. We did talk the ACP Systems folks at TWA into using the diag instead of touching every page. We also had a mod in SVS to do the same (among other things).
> If I remember right, it tries the first page of every
> architectured segment.
It could be worse. Earlier systems (OS/360, MVS line of systems, ACP, VM, etc.) touched every page. The touching was usually done by setting the storage key.
Regards,
Richard Schuh
Re: VM lockup due to storage typo
On Tue, Sep 15, 2009 at 11:18 PM, Robert J Brenneman wrote: > Admittedly - not 8TB in a 200G box, as Lee tried to do, and it was on > z/VM 5.1, so it didn't have the system execution space stuff of later > z/VM releases. It did teach the lesson that more page packs can only > get you so far. At some point the system data structures needed to > support the enormous guest just wont fit. This may be a reasonable > calculation to make within CP as a sanity check. If a factor of 2 does not make a difference, then try an order of magnitude. :-) One of the problems with booting Linux is that it determines the size of the virtual machine by testing pages rather than ask CP about it. If I remember right, it tries the first page of every architectured segment. And to make it worse, it uses a test that also forces CP to initialize the page frame. Which means that CP must also allocate a PGMBK to hold the page tables to span that segment. So for each MB of virtual machine storage, 3 pages must be allocated. When I get the math right, the 8 TB virtual machine will very quickly require 96 GB worth of page frames. That needs to come from somewhere... A decent paging subsystem can fill up a single 3390-3 in a minute or two. And although we tell people that you need to add one 3390-3 page pack for every GB of Linux server you define, there's still folks who think we talk nonsense because with the first few Linux guests their z/VM system did not page at all. But once you do start to page, page space utilization growth is not subtle. It's more like shifting your cup of coffee towards the edge of the table. Rob
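Rob's 96 GB figure can be checked with a few lines of arithmetic. The sketch below only restates the numbers from his message (one probe per 1 MB segment, 3 pages allocated per probe) and assumes 4 KB page frames. The constant and function names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB

PAGE_SIZE = 4 * KIB        # assumed 4 KB page frames
SEGMENT_SIZE = 1 * MIB     # one segment per MB of virtual machine storage
PAGES_PER_PROBE = 3        # Rob's figure: 3 pages allocated per MB probed


def probe_cost_bytes(guest_bytes):
    """Storage CP must find when the first page of every segment is touched."""
    segments = guest_bytes // SEGMENT_SIZE
    return segments * PAGES_PER_PROBE * PAGE_SIZE


print(probe_cost_bytes(8 * TIB) / GIB)  # 96.0
```

So 12 KB of page frames per MB of defined storage, times roughly 8.4 million segments in an 8 TB guest, gives the 96 GB Rob mentions, before the guest has done any real work.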
Re: VM lockup due to storage typo
We all know that they are not M$ and we are glad they aren't.
Regards,
Richard Schuh
> -----Original Message-----
> From: The IBM z/VM Operating System
> [mailto:ib...@listserv.uark.edu] On Behalf Of Huegel, Thomas
> Sent: Tuesday, September 15, 2009 2:18 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> I would think that IBM would be scurrying to fix what is
> obviously a problem.
> After all they are not Microsoft...
Re: VM lockup due to storage typo
I've tried wacky things like this before to see if I could run a 250G guest on an LPAR with ~140GB of memory and oodles of page space, running z/VM 5.1.
It came up, the guest initialized and Linux IPLed fine. It didn't have a problem till I started running a memory thrasher in the Linux guest. It sucked up all available memory and VM started paging, as you'd guess. It kept making progress till it had used about 20% of the paging space, but eventually VM itself started thrashing in its memory management routines. Like a %SY of 500 or so (5 processors running memory management stuff??). I'd guess that VM itself ran out of space below the 2G bar for page tables or something along that line. It never abended though - it just thrashed itself for days.
Admittedly - not 8TB in a 200G box, as Lee tried to do, and it was on z/VM 5.1, so it didn't have the system execution space stuff of later z/VM releases. It did teach the lesson that more page packs can only get you so far. At some point the system data structures needed to support the enormous guest just won't fit. This may be a reasonable calculation to make within CP as a sanity check.
--
Jay Brenneman
Re: VM lockup due to storage typo (OT)
Marcy,
Did you get to attend any of those parties at the Malibu mansion?
Regards,
Richard Schuh
> -----Original Message-----
> From: The IBM z/VM Operating System
> [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
> Sent: Tuesday, September 15, 2009 2:16 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> >Gee, I guess we're in good company! ;-)
> You betcha! (I'm in MN today, I can say that).
>
> At least mine was a test/dev system :) If had done it to a
> prod system, I'm sure someone here would have had IBM
> answering questions ... It's one of those things that fell
> down low on the to pursue list - bigger fish frying.
>
> Marcy
Re: VM lockup due to storage typo
I would think that IBM would be scurrying to fix what is obviously a problem. After all, they are not Microsoft...
Re: VM lockup due to storage typo
>Gee, I guess we're in good company! ;-)
You betcha! (I'm in MN today, I can say that).
At least mine was a test/dev system :) If I had done it to a prod system, I'm sure someone here would have had IBM answering questions ... It's one of those things that fell down low on the to-pursue list - bigger fish frying.
Marcy
"This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."
Re: VM lockup due to storage typo
Seems to me that he said it was either an integrity problem or a defect. I would think that either would be meat for the APAR grinder.
Regards,
Richard Schuh
> -----Original Message-----
> From: The IBM z/VM Operating System
> [mailto:ib...@listserv.uark.edu] On Behalf Of Marcy Cortes
> Sent: Tuesday, September 15, 2009 1:50 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> So are you saying that what Lee and I both did to shoot our
> systems should APAR'able? Or should it be a requirement? Or
> is it going to be a "your gun, your foot" answer?
>
> Marcy
Re: VM lockup due to storage typo
First, since CP should know at all times how much space of each category (PAGE, SPOL, etc.) is allocated, it should be able to immediately reject any request (LOGON, DEFINE STOR, etc.) where the amount of storage requested exceeds the amount of secondary storage configured. Second, since CP "should" know at all times how much space of each category (PAGE, SPOL, etc.) is in use, it should be able to immediately reject any request (LOGON, DEFINE STOR, etc.) where the amount of storage requested exceeds the amount of secondary storage available. If this is not happening, I would argue that the situation should be APAR'able as a system integrity bug. Now, we can debate whether pages allocated, but not used, should be counted. Should such pages require secondary storage backing availability, or should secondary storage backing availability be required only when the page is used? Should this be a system configurable option? Should this be a virtual machine configurable option? John P. Baker -Original Message- From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart Sent: Tuesday, September 15, 2009 4:56 PM To: IBMVM@LISTSERV.UARK.EDU Subject: Re: VM lockup due to storage typo Gee, I guess we're in good company! ;-) It does seem to me that CP should be smart enough to look at a 175GB real storage, 4GB Xstor, and xx number of page packs and say not in our wildest dreams can we run an 8TB virtual guest... Or maybe at the point that the 8TB guest starts choking off all other activity and wildly filling page space Lee
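John's two proposed checks (against configured paging space, then against available paging space) could be sketched roughly as below. This is an illustration only, not CP's logic: the `PagingState` fields, the `check_request` function and the `require_backing_now` switch are hypothetical names. The switch stands in for his open question of whether allocated-but-unused pages should need backing at request time. Real storage is ignored here, as it is in his description.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KB pages


@dataclass
class PagingState:
    """Hypothetical snapshot of paging space, counted in page slots."""
    configured_pages: int   # slots defined on all paging volumes
    in_use_pages: int       # slots currently occupied

    @property
    def available_pages(self):
        return self.configured_pages - self.in_use_pages


def check_request(requested_bytes, paging, require_backing_now=True):
    """Return 'ok', 'exceeds-available' or 'exceeds-configured'."""
    pages = -(-requested_bytes // PAGE_SIZE)  # ceiling division
    # First check: more than could ever be backed by configured space.
    if pages > paging.configured_pages:
        return "exceeds-configured"
    # Second check: more than is free right now. Optional, because it is
    # debatable whether unused pages should need backing up front.
    if require_backing_now and pages > paging.available_pages:
        return "exceeds-available"
    return "ok"


state = PagingState(configured_pages=1000, in_use_pages=900)
print(check_request(2000 * PAGE_SIZE, state))  # exceeds-configured
print(check_request(200 * PAGE_SIZE, state))   # exceeds-available
```

Whether the second check should reject, warn, or be skipped per user is exactly the policy question raised in the rest of the thread.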
Re: VM lockup due to storage typo
Gee, I guess we're in good company! ;-)
It does seem to me that CP should be smart enough to look at a 175GB real storage, 4GB Xstor, and xx number of page packs and say not in our wildest dreams can we run an 8TB virtual guest... Or maybe at the point that the 8TB guest starts choking off all other activity and wildly filling page space
Lee
Marcy Cortes wrote:
See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;)
You probably filled page space. I still think IBM should refuse to IPL a guest that will cause such harm.
Marcy
-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
Sent: Tuesday, September 15, 2009 8:39 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: [IBMVM] VM lockup due to storage typo
Does anyone have an idea of how we might have gotten out of this without an IPL?
VM LPAR has 175G of memory and a flock of Linux Oracle guests... Several guests needed more memory added, so the directory was updated and one by one the guests were shut down, logged off and logged back on. So far, so good.
But... In changing the memory for many guests, and it being late at night after a long day, while meaning to set a guest's memory to 9728M, it got set to 9728G. When that guest was cycled we saw the message on the console that its memory was limited to 8TB (HCPLGN093E), then the VM system appeared to freeze. We couldn't get in via TCP/IP, or the HMC Operating System Messages screen, or the HMC Integrated 3270.
Finally had to IPL. Even that was weird, as I'd have expected the Load Normal to shut down; it just IPLed. We did NoAutolog, fixed the typo and all came back up ok...
I suspect CP was scrambling, paging everything in the world out as Linux tried to initialize that 8TB of memory... But I'm surprised I couldn't even get into the HMC consoles (to kill just that one guest as opposed to all of them).. Any thoughts?
Lee
--
Lee Stewart, Senior SE
Sirius Computer Solutions
Phone: (303) 996-7122
Email: lee.stew...@siriuscom.com
Web: www.siriuscom.com
Re: VM lockup due to storage typo
From the tn3270 sessions hanging to the phone call to me - 2-3 minutes. From then till we decided we had to IPL - maybe 15-20 minutes. But 30 minutes (maybe 45-60 till all the apps were back up) on a major online system is a lot.

It was 35 minutes from the message capping the virtual storage at 8TB till the IPL time from Q CPLEVEL. So no, not long considering the size. And yes, I suspect it would PGT004 eventually.

And yes, if CP unceremoniously chopped my wrong size from 9.7TB to 8TB, why could it not do the same to either a user specified system limit or a "this is the biggest machine this CP can run in this configuration"...

Lee
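Lee's question in this message, why CP could not cap a guest at a site-defined limit the same way it capped his request at 8TB, amounts to taking the minimum of the request and every applicable limit. A minimal sketch, assuming a hypothetical site_limit setting that does not exist in CP as described in the thread:

```python
GIB = 2**30
ARCH_LIMIT = 8 * 2**40  # the 8TB cap that the thread reports CP applying


def effective_storage(requested, site_limit=None):
    """Return (granted_bytes, was_clamped) for a requested guest size.

    site_limit is a hypothetical installation-wide maximum; None means
    only the 8TB cap applies, which is the behaviour described in the thread.
    """
    limit = ARCH_LIMIT if site_limit is None else min(ARCH_LIMIT, site_limit)
    granted = min(requested, limit)
    return granted, granted < requested


# The typo from the thread: 9728G requested where 9728M was intended.
print(effective_storage(9728 * GIB))                       # capped at 8TB only
print(effective_storage(9728 * GIB, site_limit=64 * GIB))  # capped at the assumed site limit
print(effective_storage(9728 * 2**20, site_limit=64 * GIB))  # intended size passes untouched
```

Silently clamping hides the typo, so returning the was_clamped flag (and presumably issuing a message, as CP did with HCPLGN093E) matters as much as the clamp itself.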
Re: VM lockup due to storage typo
So are you saying that what Lee and I both did to shoot our systems should be APAR'able? Or should it be a requirement? Or is it going to be a "your gun, your foot" answer?

Marcy
Re: VM lockup due to storage typo
On Tuesday, 09/15/2009 at 03:27 EDT, Steve Marak wrote:
> I agree with that ("the guest cannot be allowed to harm CP") but has that
> actually been formally - or even informally - accepted by the Powers That
> Be?

Yes, it is in the Statement of System Integrity in the General Information Manual.

> I ask because I still remember, as though it were yesterday, opening a
> security/integrity APAR against VM back in the mid-1980's because any
> class G user could knock CP down by defining a shared and a nonshared
> device on the same virtual control unit, and being told that that was NOT
> a security or integrity issue, and that no fix would be forthcoming.

Under "today's" rules, that would be an Integrity problem.

o If a class G (only) user can repeatedly or with malice of forethought hang or abend CP, it WILL be classified as an integrity problem (denial of service).
o If a class G user happens to do something that triggers an abend or hang due to a "system malfunction", it will NOT be classified as an integrity problem.
o If the system abends or hangs because it is overloaded (memory, CPU), it will NOT be classified as an integrity problem.
o Just because it isn't an integrity problem doesn't mean it isn't a defect.

Alan Altmark
z/VM Development
IBM Endicott
Re: VM lockup due to storage typo
The same might be said for page space. Someone could access a dataspace enabled directory and take up page space. We could easily take up 48G of page space here by starting 24 machines that each access different d/s directories at 2G each. And others could define storage from default to max. Then there are those pesky V-disk users - they could increase the load on page space.

But I do agree that spool space should not enter into the equation when determining if there is enough page space.

Regards,
Richard Schuh
Re: VM lockup due to storage typo
Good point. When I have hit this, I got a PAGxxx type error and CP automatically reipl'ed.

Like I said, when the offending user starts allocating pages, all the other machines will abend on a paging error when CP tries to page out their recently used pages. Eventually, some of CP's pageable pages will be the least recently used pages and BAM! PAGxxx CP abend. Automatic restart in progress...

Tom Duerbusch
THD Consulting
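The failure sequence Tom describes in this thread (page space fills, page-outs overflow to spool, and CP abends on the next page-out once spool is also full) can be modelled with a toy counter. This is a deliberately crude sketch: the function name and the slot counts are invented, and real CP paging is far more involved.

```python
def simulate_pageouts(page_free, spool_free, pageouts):
    """Toy model of page-outs consuming PAGE slots, then SPOOL slots.

    Returns (page_free, spool_free, abended). abended is True when a
    page-out finds both areas full, which the thread says ends in a CP abend.
    """
    for _ in range(pageouts):
        if page_free > 0:
            page_free -= 1          # normal case: a PAGE slot is free
        elif spool_free > 0:
            spool_free -= 1         # PAGE is full: overflow into SPOOL
        else:
            return page_free, spool_free, True  # nowhere left to put the page
    return page_free, spool_free, False


# Three PAGE slots and two SPOOL slots, chosen only to keep the example small.
print(simulate_pageouts(3, 2, 4))  # (0, 1, False): already spilling into spool
print(simulate_pageouts(3, 2, 5))  # (0, 0, False): both areas now full
print(simulate_pageouts(3, 2, 6))  # (0, 0, True): the next page-out fails
```

The model also shows why spool cannot be relied on as headroom: anything else that consumes spool_free in the meantime brings the failing page-out forward.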
Re: VM lockup due to storage typo
You can't include SPOOL space in the check for "is there enough page space to allow this guest" decision. SPOOL space that was available earlier may not be there later when you "need" it as overflow PAGE space. Any guest can fill up your SPOOL space at any time.

Brian Nielsen
Re: VM lockup due to storage typo
I agree with that ("the guest cannot be allowed to harm CP") but has that actually been formally - or even informally - accepted by the Powers That Be? I ask because I still remember, as though it were yesterday, opening a security/integrity APAR against VM back in the mid-1980's because any class G user could knock CP down by defining a shared and a nonshared device on the same virtual control unit, and being told that that was NOT a security or integrity issue, and that no fix would be forthcoming. But at least I'm not bitter about it. Steve On Tue, 15 Sep 2009, Schuh, Richard wrote: > One of Alan's first precepts of information security and integrity is > that the guest cannot be allowed to harm the CP. This clearly violates > that. > > Regards, > Richard Schuh -- Steve Marak -- sama...@gizmoworks.com
Re: VM lockup due to storage typo
What Lee doesn't mention is how long he waited before doing the IPL. Had he waited to see what happens maybe VM would have finally come around, so to speak. We all have different thresholds of pain. I think I would have done what Lee did, long day, not really wanting to wait around to see if VM recovers, just IPL.

Lee, did you have access to the HMC and thus the SAD screen to see what was going on? Sort of my last line of defense if I can't get logged in. Granted all it will tell you is if you have CPU or I/O utilization, but at least you have something to go to IBM with.

Maybe a SYSTEM CONFIG file option, like MAX_USER_SIZE: if it's set, then guest machine size is verified; if not, available PAGE area and SPOOL size is checked (calculated), and if the guest exceeds that size then the guest doesn't start or a severe warning is issued.

Steve

-----Original Message-----
From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Schuh, Richard
Sent: Tuesday, September 15, 2009 12:59 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: VM lockup due to storage typo

Maybe CP couldn't know that the guest would do something bad, but it should know that it has opened itself to the possibility that the guest could, in normal operation, cause the problem. One of Alan's first precepts of information security and integrity is that the guest cannot be allowed to harm the CP. This clearly violates that.

Regards,
Richard Schuh

> -----Original Message-----
> From: The IBM z/VM Operating System
> [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> Sent: Tuesday, September 15, 2009 9:19 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> CP wouldn't know at IPL time, the guest would, not could, but
> would cause such harm.
>
> Just because you say you can use xxx GB, doesn't mean you
> would actually use them.
>
> When page fills, it over flows to spool.
> When spool fills, CP abends on the next pageout.
>
> Tom Duerbusch
> THD Consulting
Re: VM lockup due to storage typo
CMS, being a 32-bit system, will probably never use 3TB of memory. Perhaps z/CMS, when it becomes a reality, might but the current CMS is another story.

Regards,
Richard Schuh

> CMS uses all of its storage over
> time. Both will use all of their storage eventually.
Re: VM lockup due to storage typo
CMS will free its storage after the command is complete. However, do a peek on a very large reader element, such as an OS dump, and CMS just might use up all of its storage, just like any other guest might. It isn't a matter of time, it is a matter of usage.

Tom Duerbusch
THD Consulting
Re: VM lockup due to storage typo
The difference between CMS and Linux in this case is just a matter of time before problems occur. Linux wants to use all of its storage early, CMS uses all of its storage over time. Both will use all of their storage eventually.

CP is built to overcommit storage. It just lets you REALLY overcommit storage. But it would be nice if there was some sort of sanity check in there somewhere.

/Tom Kern
Re: VM lockup due to storage typo
The problem isn't that you did an IPL, it is that you IPLed Linux. An IPL of CMS in an 8 TB machine doesn't have any delay or cause a problem:

def stor 8t
STORAGE = 8T
Storage cleared - system reset.
i cms
z/VM V5.4.0    2009-07-13 11:58

Ready; T=0.01/0.01 13:06:21
q v stor
STORAGE = 8T
Ready; T=0.01/0.01 13:06:26
q stor
STORAGE = 4G CONFIGURED = 4G INC = 128M STANDBY = 8G RESERVED = 0
Ready; T=0.01/0.01 13:06:57

An IPL of ZCMS blows up, though. Maybe they didn't test it with that large storage.

On Tue, Sep 15, 2009 at 12:02 PM, Marcy Cortes wrote:
> See a thread on this list with subject "Sanity check?" from Oct 2007 for what
> happened when I did the same thing ;)
>
> You probably filled page space.
>
> I still think IBM should refuse to IPL a guest that will cause such harm.
>
> Marcy

--
Bruce Hayden
Linux on System z Advanced Technical Support
IBM, Endicott, NY
Re: VM lockup due to storage typo
Thinking about this a little further...

How could one error cause this?

In the user directory, the USER statement has:

USER LINUX27 xx 32M 600M G

There are two memory-related parms. The first is the size your guest machine is built with, in this case 32 MB. The other is the maximum memory size for your guest, in this case 600 MB.

With either the initial size, or the dynamically defined size via a DEF STOR command, you can't exceed the maximum size.

So to define 8 TB of storage, you have to change the max size to be something very large. And then define the machine to use that size.

So it seems to me that there are two mistakes. You told CP you might want a very large size, and when you finally asked for it, it obeyed. That isn't a CP error.

The same problem occurs when you tell CP that you are OK with TB-sized vdisks. And then you define one. And then use it up.

Of course, anything that can cause CP to crash isn't a good thing. Perhaps we need a dedicated paging area for CP, i.e. something like the DUMP area for CP dumps, instead of using SPOOL. The guest machines are still going to crash, and the offending machine will be the last of many machines to bite the dust. But CP would survive. It might be easier to IPL to get everything back running again.

Tom Duerbusch
THD Consulting

>>> "Schuh, Richard" 9/15/2009 11:59 AM >>>
Maybe CP couldn't know that the guest would do something bad, but it should know that it has opened itself to the possibility that the guest could, in normal operation, cause the problem. One of Alan's first precepts of information security and integrity is that the guest cannot be allowed to harm the CP. This clearly violates that.
Regards,
Richard Schuh

> -----Original Message-----
> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> Sent: Tuesday, September 15, 2009 9:19 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> CP wouldn't know at IPL time that the guest would (not could, but would) cause such harm.
>
> Just because you say you can use xxx GB, doesn't mean you would actually use them.
>
> When page fills, it overflows to spool.
> When spool fills, CP abends on the next pageout.
>
> Tom Duerbusch
> THD Consulting
>
> >>> Marcy Cortes 9/15/2009 11:02 AM >>>
> See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;)
>
> You probably filled page space.
>
> I still think IBM should refuse to IPL a guest that will cause such harm.
>
> Marcy
>
> "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."
>
> -----Original Message-----
> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> Sent: Tuesday, September 15, 2009 8:39 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: [IBMVM] VM lockup due to storage typo
>
> Does anyone have an idea of how we might have gotten out of this without an IPL?
>
> VM LPAR has 175G of memory and a flock of Linux Oracle guests... Several guests needed more memory added so the directory was updated and one by one the guests shutdown, logged off and back on. So far, so good.
>
> But... In changing the memory for many guests, and it being late at night after a long day, while meaning to set a guest's memory to 9728M, it got set to 9728G. When that guest was cycled we see the message on the console that its memory was limited to 8TB (HCPLGN093E), then the VM system appeared to freeze.
>
> We couldn't get in via TCP/IP, or the HMC Operating System Messages screen, or the HMC Integrated 3270.
>
> Finally had to IPL. Even that was weird as I'd have expected the Load Normal to shutdown, it just IPLed. We did NoAutolog, fixed the typo and all came back up ok...
>
> I suspect CP was scrambling paging everything in the world out as Linux tried to initialize that 8TB of memory... But I'm surprised I couldn't even get into the HMC consoles (to kill just that one guest as opposed to all of them)..
>
> Any thoughts?
> Lee
> --
>
> Lee Stewart, Senior SE
> Sirius Computer Solutions
> Phone: (303) 996-7122
> Email: lee.stew...@siriuscom.com
> Web: www.siriuscom.com
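The two-value rule described at the top of this message (an initial size and a maximum size on the USER statement, with DEF STOR unable to exceed the maximum) can be sketched as follows. This is a Python illustration only, not a CP or DirMaint interface: the function and class names are invented, only the first five fields of the USER statement are read, and LINUX28 in the usage note is a made-up entry standing in for the guest with the typo.

```python
# Illustrative sketch of the initial/maximum storage rule on a USER
# directory statement. All names here are hypothetical helpers.
from dataclasses import dataclass

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def to_bytes(size: str) -> int:
    """Convert a size such as '32M' or '8T' to bytes."""
    size = size.strip().upper()
    return int(size[:-1]) * UNITS[size[-1]]


@dataclass
class UserEntry:
    userid: str
    initial: int   # storage the guest is built with at logon
    maximum: int   # largest size DEF STOR may request


def parse_user_statement(line: str) -> UserEntry:
    """Parse 'USER userid password initial maximum classes'."""
    fields = line.split()
    if len(fields) < 5 or fields[0].upper() != "USER":
        raise ValueError(f"not a USER statement: {line!r}")
    _, userid, _password, initial, maximum = fields[:5]
    return UserEntry(userid, to_bytes(initial), to_bytes(maximum))


def define_storage_allowed(entry: UserEntry, requested: str) -> bool:
    """A DEF STOR request is accepted only up to the directory maximum."""
    return to_bytes(requested) <= entry.maximum


def oversized_guests(entries, limit: str):
    """List userids whose directory maximum exceeds the given limit."""
    cap = to_bytes(limit)
    return [e.userid for e in entries if e.maximum > cap]
```

For example, parsing "USER LINUX27 xx 32M 600M G" gives a guest for which define_storage_allowed(entry, "8T") is false, while a hypothetical "USER LINUX28 xx 9728G 9728G G" would be reported by oversized_guests(entries, "175G"). Running a check like this over the directory before cycling guests is one way the typo might have been caught ahead of time.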
Re: VM lockup due to storage typo
Maybe CP couldn't know that the guest would do something bad, but it should know that it has opened itself to the possibility that the guest could, in normal operation, cause the problem. One of Alan's first precepts of information security and integrity is that the guest cannot be allowed to harm the CP. This clearly violates that.

Regards,
Richard Schuh

> -----Original Message-----
> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Tom Duerbusch
> Sent: Tuesday, September 15, 2009 9:19 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> CP wouldn't know at IPL time that the guest would (not could, but would) cause such harm.
>
> Just because you say you can use xxx GB, doesn't mean you would actually use them.
>
> When page fills, it overflows to spool.
> When spool fills, CP abends on the next pageout.
>
> Tom Duerbusch
> THD Consulting
>
> >>> Marcy Cortes 9/15/2009 11:02 AM >>>
> See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;)
>
> You probably filled page space.
>
> I still think IBM should refuse to IPL a guest that will cause such harm.
>
> Marcy
>
> "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."
>
> -----Original Message-----
> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> Sent: Tuesday, September 15, 2009 8:39 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: [IBMVM] VM lockup due to storage typo
>
> Does anyone have an idea of how we might have gotten out of this without an IPL?
>
> VM LPAR has 175G of memory and a flock of Linux Oracle guests... Several guests needed more memory added so the directory was updated and one by one the guests shutdown, logged off and back on. So far, so good.
>
> But... In changing the memory for many guests, and it being late at night after a long day, while meaning to set a guest's memory to 9728M, it got set to 9728G. When that guest was cycled we see the message on the console that its memory was limited to 8TB (HCPLGN093E), then the VM system appeared to freeze.
>
> We couldn't get in via TCP/IP, or the HMC Operating System Messages screen, or the HMC Integrated 3270.
>
> Finally had to IPL. Even that was weird as I'd have expected the Load Normal to shutdown, it just IPLed. We did NoAutolog, fixed the typo and all came back up ok...
>
> I suspect CP was scrambling paging everything in the world out as Linux tried to initialize that 8TB of memory... But I'm surprised I couldn't even get into the HMC consoles (to kill just that one guest as opposed to all of them)..
>
> Any thoughts?
> Lee
> --
>
> Lee Stewart, Senior SE
> Sirius Computer Solutions
> Phone: (303) 996-7122
> Email: lee.stew...@siriuscom.com
> Web: www.siriuscom.com
Re: VM lockup due to storage typo
This should be treated as a bug. It is not an enhancement or new feature; it brought a running system down. And it probably did not take a dump.

Regards,
Richard Schuh

> -----Original Message-----
> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Daniel P. Martin
> Sent: Tuesday, September 15, 2009 9:09 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: VM lockup due to storage typo
>
> *cough*SHARE requirement?*cough*
>
> Marcy Cortes wrote:
> > See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;)
> >
> > You probably filled page space.
> >
> > I still think IBM should refuse to IPL a guest that will cause such harm.
> >
> > Marcy
> >
> > "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."
> >
> > -----Original Message-----
> > From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> > Sent: Tuesday, September 15, 2009 8:39 AM
> > To: IBMVM@LISTSERV.UARK.EDU
> > Subject: [IBMVM] VM lockup due to storage typo
> >
> > Does anyone have an idea of how we might have gotten out of this without an IPL?
> >
> > VM LPAR has 175G of memory and a flock of Linux Oracle guests... Several guests needed more memory added so the directory was updated and one by one the guests shutdown, logged off and back on. So far, so good.
> >
> > But... In changing the memory for many guests, and it being late at night after a long day, while meaning to set a guest's memory to 9728M, it got set to 9728G. When that guest was cycled we see the message on the console that its memory was limited to 8TB (HCPLGN093E), then the VM system appeared to freeze.
> >
> > We couldn't get in via TCP/IP, or the HMC Operating System Messages screen, or the HMC Integrated 3270.
> >
> > Finally had to IPL. Even that was weird as I'd have expected the Load Normal to shutdown, it just IPLed. We did NoAutolog, fixed the typo and all came back up ok...
> >
> > I suspect CP was scrambling paging everything in the world out as Linux tried to initialize that 8TB of memory... But I'm surprised I couldn't even get into the HMC consoles (to kill just that one guest as opposed to all of them)..
> >
> > Any thoughts?
> > Lee
Re: VM lockup due to storage typo
*cough*SHARE requirement?*cough*

Marcy Cortes wrote:
> See a thread on this list with subject "Sanity check?" from Oct 2007 for what happened when I did the same thing ;)
>
> You probably filled page space.
>
> I still think IBM should refuse to IPL a guest that will cause such harm.
>
> Marcy
>
> "This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation."
>
> -----Original Message-----
> From: The IBM z/VM Operating System [mailto:ib...@listserv.uark.edu] On Behalf Of Lee Stewart
> Sent: Tuesday, September 15, 2009 8:39 AM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: [IBMVM] VM lockup due to storage typo
>
> Does anyone have an idea of how we might have gotten out of this without an IPL?
>
> VM LPAR has 175G of memory and a flock of Linux Oracle guests... Several guests needed more memory added so the directory was updated and one by one the guests shutdown, logged off and back on. So far, so good.
>
> But... In changing the memory for many guests, and it being late at night after a long day, while meaning to set a guest's memory to 9728M, it got set to 9728G. When that guest was cycled we see the message on the console that its memory was limited to 8TB (HCPLGN093E), then the VM system appeared to freeze.
>
> We couldn't get in via TCP/IP, or the HMC Operating System Messages screen, or the HMC Integrated 3270.
>
> Finally had to IPL. Even that was weird as I'd have expected the Load Normal to shutdown, it just IPLed. We did NoAutolog, fixed the typo and all came back up ok...
>
> I suspect CP was scrambling paging everything in the world out as Linux tried to initialize that 8TB of memory... But I'm surprised I couldn't even get into the HMC consoles (to kill just that one guest as opposed to all of them)..
>
> Any thoughts?
> Lee