Re: zVM crash update

2011-06-15 Thread Bill Munson
Randy,

Thank you for the update.

munson


From:   "Burton, Randy" 
To: IBMVM@LISTSERV.UARK.EDU
Date:   06/15/2011 09:52 AM
Subject: zVM crash update
Sent by: The IBM z/VM Operating System



The LPAR had been up and running for weeks, minding its own business
and chugging right along.  Then, at 3:30 PM Monday: kaboom, disabled wait.


IBM has determined that we had a tight loop condition, which caused the
processor to be taken offline.  We did get an MCW002 abend and dump.
Dump analysis should help IBM and us find and fix the loop so that it
doesn't happen again.

Our best theory so far is that z/VM tried to restart after the MCW002
abend, couldn't find a console, and so went into the 1010 disabled wait.
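The thread reads the wait-state code straight off the PSW display: 00021010 is treated as wait state 1010, which several posters recognize as "no console available". A minimal Python sketch of that triage step follows, together with a split of the HCPMPG9152E message ID into its parts. The function names, the one-entry lookup table, the assumption that the wait code is the low-order four hex digits of the display, and the assumed HCP message-ID layout are all illustrative; this is not an IBM tool or a documented format.

```python
import re

# Hypothetical lookup table holding only the code discussed in this thread.
# The meaning is as reported by list members, not copied from IBM manuals.
KNOWN_WAIT_CODES = {
    "1010": "no console available",
}

# Assumed CP message-ID layout: "HCP", a three-letter module code,
# a four-digit message number, and a one-letter severity suffix.
MSG_ID = re.compile(r"^(HCP)([A-Z]{3})(\d{4})([A-Z])\b")


def wait_code_from_psw(psw_display: str) -> str:
    """Return the low-order four hex digits of a PSW display string.

    Assumption: as the posters read it, the wait-state code is the last
    four hex digits of the displayed PSW (00021010 -> 1010).
    """
    digits = psw_display.replace(" ", "").upper()
    if len(digits) < 4 or not re.fullmatch(r"[0-9A-F]+", digits):
        raise ValueError(f"not a hex PSW display: {psw_display!r}")
    return digits[-4:]


def describe_wait(psw_display: str) -> str:
    """Format the wait code with its meaning, if this table knows it."""
    code = wait_code_from_psw(psw_display)
    meaning = KNOWN_WAIT_CODES.get(code, "unknown (check CP Messages and Codes)")
    return f"wait state {code}: {meaning}"


def parse_message_id(line: str):
    """Split a CP message line into (component, module, number, severity)."""
    m = MSG_ID.match(line.strip())
    return m.groups() if m else None


if __name__ == "__main__":
    print(describe_wait("00021010"))
    print(parse_message_id(
        "HCPMPG9152E PROCESSOR 01 IS BEING VARIED OFFLINE BECAUSE IT IS NOT RESPONSIVE."
    ))
```

Any code not in the table is reported as unknown rather than guessed, so the lookup has to be checked against IBM's documentation before it is trusted.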

Thanks for all the suggestions!



-----Original Message-----
From: Burton, Randy 
Sent: Tuesday, June 14, 2011 9:47 AM
To: 'IBMVM@LISTSERV.UARK.EDU'
Subject: zVM crash

I'm curious whether this error rings a bell with any of you.  We of course
have an ETR open and are working with IBM.  The HMC shows no hardware
errors, so we believe this was a software problem, not hardware.  Here's
the last operator log message before the LPAR went into a disabled wait:

HCPMPG9152E PROCESSOR 01 IS BEING VARIED OFFLINE BECAUSE IT IS NOT
RESPONSIVE. 

Disabled wait PSW was:
00021010

HMC message was:
Central processor (CP) 0 in partition VMD1, entered disabled wait state.

Fortunately, this was our development (test) z/VM system, running a bunch
of test zLinux guests.  We're running z/VM 6.1 on a z10.  Of course, we
are nervous, because what happens in test can happen in production.  We
IPLed, and so far so good.

Thanks in advance for any help/suggestions!

Randy Burton
BB&T Bank



*** IMPORTANT NOTE *** The opinions expressed in this
message and/or any attachments are those of the author and not
necessarily those of Brown Brothers Harriman & Co., its
subsidiaries and affiliates ("BBH"). There is no guarantee that
this message is either private or confidential, and it may have
been altered by unauthorized sources without your or our knowledge.
Nothing in the message is capable or intended to create any legally
binding obligations on either party and it is not intended to
provide legal advice. BBH accepts no responsibility for loss or
damage from its use, including damage from virus.


Re: zVM crash

2011-06-14 Thread Michel Raicher
Do you have an MCW002 dump in the OPERATNS reader?
If so, you got an MCW002 abend; VM then restarted after the abend, and the
console was not there.
Regards


Re: zVM crash

2011-06-14 Thread Bill Munson
When I see a disabled wait PSW of 1010, I think of "no console available".

Did you try to vary the processor back on after the 9152E message?

Bill Munson 
Sr. z/VM Systems Programmer 
Brown Brothers Harriman & CO.
525 Washington Blvd. 
Jersey City, NJ 07310 
201-418-7588


Re: zVM crash

2011-06-14 Thread Scott Rohling
That's odd.  Wait state 1010 means 'no console available'.  Any chance
someone was deactivating your LPAR or trying to reload it?

Scott Rohling


Re: zVM crash

2011-06-14 Thread Davis, Larry (National VM/VSE Capability)
Were you IPLing the system at the time?  1010 usually means it could not find
the console.

Larry Davis
