Re: zVM crash update
Randy,

Thank you for the update.

munson

From: "Burton, Randy" To: IBMVM@LISTSERV.UARK.EDU Date: 06/15/2011 09:52 AM Subject: zVM crash update Sent by: The IBM z/VM Operating System

*** IMPORTANT NOTE*-- The opinions expressed in this message and/or any attachments are those of the author and not necessarily those of Brown Brothers Harriman & Co., its subsidiaries and affiliates ("BBH"). There is no guarantee that this message is either private or confidential, and it may have been altered by unauthorized sources without your or our knowledge. 
Nothing in the message is capable or intended to create any legally binding obligations on either party and it is not intended to provide legal advice. BBH accepts no responsibility for loss or damage from its use, including damage from virus.
zVM crash update
The LPAR had been up and running for weeks, minding its own business and chugging right along. Then, at 3:30 PM Monday, kaboom: disabled wait.

IBM has determined that we had a tight loop condition, which caused the processor to be taken offline. We did get an MCW002 abend and dump. Dump analysis should lead IBM and us to a fix for the loop, which should stop it from happening again. The best theory so far is that zVM tried to restart following the MCW002 abend, couldn't find a console, and went into the 1010 disabled wait.

Thanks for all the suggestions!

-----Original Message-----
From: Burton, Randy
Sent: Tuesday, June 14, 2011 9:47 AM
To: 'IBMVM@LISTSERV.UARK.EDU'
Subject: zVM crash
Re: zVM crash
Do you have an MCW002 dump in the OPERATNS reader? If so, you got an MCW002 abend, VM restarted after the abend, and the console was not there.

Regards

From: "Burton, Randy" To: IBMVM@LISTSERV.UARK.EDU Date: 06/14/2011 06:49 AM Subject: zVM crash Sent by: The IBM z/VM Operating System
Re: zVM crash
When I see a disabled wait PSW of 1010, I think of no console available. Did you try to vary the processor back on after the 9152E message?

Bill Munson
Sr. z/VM Systems Programmer
Brown Brothers Harriman & CO.
525 Washington Blvd.
Jersey City, NJ 07310
201-418-7588

From: "Burton, Randy" To: IBMVM@LISTSERV.UARK.EDU Date: 06/14/2011 09:46 AM Subject: zVM crash Sent by: The IBM z/VM Operating System
Re: zVM crash
That's odd. Wait state 1010 means 'no console available'. Any chance someone was deactivating your LPAR or trying to reload it?

Scott Rohling

In reply to Burton, Randy (Tue, Jun 14, 2011 at 7:46 AM), "zVM crash".
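For anyone scripting a first look at these wait PSWs, here is a minimal Python sketch of pulling the wait code out of the 00021010 value from the original post. It assumes (unverified) that the wait-state code is the rightmost four hex digits of the PSW as displayed, and its lookup table holds only the one meaning given by the replies in this thread. The names `wait_code_from_psw`, `describe_wait`, and `KNOWN_WAIT_CODES` are hypothetical, not an IBM tool; the CP wait state code documentation remains the authority.

```python
# Hypothetical lookup table: only the code discussed in this thread,
# with the meaning the list replies gave. Not an official IBM table.
KNOWN_WAIT_CODES = {
    0x1010: "no operator console available",
}


def wait_code_from_psw(psw_hex: str) -> int:
    """Return the low-order 16 bits of a displayed PSW fragment.

    Assumption (unverified): the wait-state code is carried in the
    rightmost four hex digits of the PSW value shown on the HMC.
    """
    value = int(psw_hex.replace(" ", ""), 16)
    return value & 0xFFFF


def describe_wait(psw_hex: str) -> str:
    """Format the extracted code with a meaning, if the table knows it."""
    code = wait_code_from_psw(psw_hex)
    meaning = KNOWN_WAIT_CODES.get(
        code, "unknown; check the CP wait state code documentation"
    )
    return f"wait state {code:04X}: {meaning}"


if __name__ == "__main__":
    # The disabled wait PSW reported in the original post.
    print(describe_wait("00021010"))
```

Run against the reported PSW, this prints `wait state 1010: no operator console available`. Any code missing from the table is reported as unknown rather than guessed.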
Re: zVM crash
Were you IPLing the system at the time? 1010 usually means it could not find the console.

Larry Davis

-----Original Message-----
From: The IBM z/VM Operating System [mailto:IBMVM@LISTSERV.UARK.EDU] On Behalf Of Burton, Randy
Sent: Tuesday, June 14, 2011 9:47 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: zVM crash
zVM crash
I'm curious whether this error rings a bell with any of you. We have, of course, opened an ETR and are working with IBM. There were no hardware errors on the HMC, so we believe this was a software problem, not hardware. Here's the last operator log message before the LPAR went into a disabled wait:

HCPMPG9152E PROCESSOR 01 IS BEING VARIED OFFLINE BECAUSE IT IS NOT RESPONSIVE.

Disabled wait PSW was:
00021010

HMC message was:
Central processor (CP) 0 in partition VMD1, entered disabled wait state.

Fortunately, this was our development (test) zVM system, running a bunch of test zLinux guests. We're running zVM 6.1 on a z10. Of course we are nervous, because what happens in test can happen in production. We IPLed, and so far so good.

Thanks in advance for any help/suggestions!

Randy Burton
BB&T Bank