Re: Strange behaviour.

2010-05-27 Thread Bob Atton
Thanks to everyone for your responses.  It does look like it was a paging 
problem of some sort.

Regards
 
Bob


Re: Strange behaviour.

2010-05-26 Thread Jim Bohnsack

Time to call IBM service.  The processor may have called home.
Jim

On 5/26/2010 10:05 AM, Bob Atton wrote:


We have had a very strange and unusual message today on our z/VM 5.4
system.

I had a report that our zLinux guests were not accessible (putty, ping,
ssh all no good).  I logged on to MAINT OK but soon after when I had
issued a command I got the following messages.

  10:29:47  * MSG FROM MAINT   : DMSITM319T Machine check interrupt was
encountered; MCIC = X'4002AFBD 400B  '
  10:29:47  * MSG FROM MAINT   : DMSITM319T Disabled wait entered, please
re-IPL CMS.
HCPGIR450W CP entered; disabled wait PSW 000A 817C009E

CMS would not IPL but I could IPL 190.

When I tried to start the zLinux guests they initially failed with the
messages above because they IPL CMS but when I IPLed 190 and ran the
profile etc to start linux I saw

HCPMCV1459E The virtual machine is placed in check-stop state due to a
system malfunction with CPU 00.

After a while we IPLed the VM system and all appears to be OK.

Has anyone else seen such strange behaviour?


Regards

Bob
   


--
James Bohnsack
(972) 596-6377 home/office
(972) 342-5823 cell


Re: Strange behaviour.

2010-05-26 Thread Alan Altmark
On Wednesday, 05/26/2010 at 10:06 EDT, Bob Atton bob.j.at...@rrd.com wrote:
 We have had a very strange and unusual message today on our z/VM 5.4
 system.

 I had a report that our zLinux guests were not accessible (putty, ping,
 ssh all no good).  I logged on to MAINT OK but soon after when I had
 issued a command I got the following messages.
 
  10:29:47  * MSG FROM MAINT   : DMSITM319T Machine check interrupt was 
 encountered; MCIC = X'4002AFBD 400B  '

40 = System processing damage
02 = Backed up

That means that the CPU had an error and 'backed up' to the most recent 
internal consistent checkpoint, avoiding damage to memory, registers, 
timers, or the PSW.  The condition is fatal.
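
As a rough illustration of the reading above, here is a minimal Python sketch that picks those two flags out of the MCIC as it was printed.  The function names and the two small tables are made up for this example; the labels are copied from the lines above and this is not a full MCIC decode, since only the first two bytes are looked at.

```python
# Byte values and labels copied from the explanation above.
# Only these two flags are covered; this is not a complete MCIC table.
BYTE0_FLAGS = {0x40: "System processing damage"}
BYTE1_FLAGS = {0x02: "Backed up"}


def mcic_bytes(text):
    """Convert an MCIC as printed in the message, e.g. '4002AFBD 400B', to bytes."""
    digits = "".join(text.split())
    if len(digits) % 2:          # drop a dangling half byte from truncated output
        digits = digits[:-1]
    return bytes.fromhex(digits)


def describe_mcic(text):
    """Return 'XX = label' strings for the flags set in the first two bytes."""
    raw = mcic_bytes(text)
    found = []
    for position, table in enumerate((BYTE0_FLAGS, BYTE1_FLAGS)):
        if position >= len(raw):
            break
        for mask, label in table.items():
            if raw[position] & mask:
                found.append(f"{mask:02X} = {label}")
    return found


for line in describe_mcic("4002AFBD 400B"):
    print(line)
```

The rest of the MCIC is ignored here because the message output was cut short and the remaining bytes are not discussed above.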

 CMS would not IPL but I could IPL 190. 
 
 When I tried to start the zLinux guests they initially failed with the
 messages above because they IPL CMS but when I IPLed 190 and ran the
 profile etc to start linux I saw

 HCPMCV1459E The virtual machine is placed in check-stop state due to a
 system malfunction with CPU 00.

 After a while we IPLed the VM system and all appears to be OK.

I'm fuzzy on how CPU sparing and recovery work, and I don't know why you 
couldn't IPL an NSS.  Certainly you should check the HMC for hardware 
messages.  It likely called home.

Alan Altmark
z/VM Development
IBM Endicott


Re: Strange behaviour.

2010-05-26 Thread Alan Altmark
On Wednesday, 05/26/2010 at 10:06 EDT, Bob Atton bob.j.at...@rrd.com wrote:
 We have had a very strange and unusual message today on our z/VM 5.4
 system.

 I had a report that our zLinux guests were not accessible (putty, ping,
 ssh all no good).  I logged on to MAINT OK but soon after when I had
 issued a command I got the following messages.
 
  10:29:47  * MSG FROM MAINT   : DMSITM319T Machine check interrupt was 
 encountered; MCIC = X'4002AFBD 400B  '
  10:29:47  * MSG FROM MAINT   : DMSITM319T Disabled wait entered, please
 re-IPL CMS.
 HCPGIR450W CP entered; disabled wait PSW 000A 817C009E 
 
 CMS would not IPL but I could IPL 190. 

D'oh:  One other thing, check the operator's console for paging errors 
(paging volume or spool).

 When I tried to start the zLinux guests they initially failed with the
 messages above because they IPL CMS but when I IPLed 190 and ran the
 profile etc to start linux I saw

 HCPMCV1459E The virtual machine is placed in check-stop state due to a
 system malfunction with CPU 00.

The machine check above is what you would get if the guest had page 0/1 
resident when the paging error occurred.

You can get this if CP cannot page in guest page 0 or 1 in order to store 
PSWs and logout data in order to present said machine check.  I do wish CP 
was just a tad more specific than simply system malfunction.

Alan Altmark
z/VM Development
IBM Endicott


Re: Strange behaviour.

2010-05-26 Thread Peter . Webb
I do wish CP was just a tad more specific than simply system
malfunction.

Hey, if it's good enough for Windows...

