Re: zLinux question

2007-10-31 Thread Alan Altmark
On Tuesday, 10/30/2007 at 09:16 EDT, [EMAIL PROTECTED] wrote:
 Thanks Alan... We do have TSA and do have GDPS set up, but the mirrored
 volume is 140 miles away via XRC. Would this still work?

Sorry, no.  TSA does not support the XRC connections managed by GDPS - it 
only deals with PPRC connections.

Alan Altmark
z/VM Development
IBM Endicott


Re: zLinux question

2007-10-30 Thread Rich Smrcina

Andy,

That's probably going to depend on what lives on 9DA1.  If it's a root 
filesystem or (gasp) a swap disk, then it's probably fair to say Linux 
may throw in the towel.  The z/VM messages don't tell the whole story; 
/var/log/messages may have more info.  But as I say, if it's the root 
filesystem, Linux may not be able to write the messages, and then 
there's a vicious cycle and boom!


I'd say check into what's causing the IFCC problems.
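For example (a rough sketch only; I'm assuming the guest sees the volume as an 
ordinary DASD with the usual s390 tools, and the device number inside the guest 
may differ from 9DA1 if it's a minidisk), something like this on the Linux guest 
shows what lives on the device and whether the driver logged anything:

   cat /proc/dasd/devices             # map device numbers to dasd device names
   df -h                              # see which filesystems sit on those devices
   grep -i dasd /var/log/messages     # look for DASD/channel error messages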

[EMAIL PROTECTED] wrote:


My background is z/OS, so please excuse me.

Question:
We have recently taken many hits (paths being lost, CHPIDs, etc.; 
see below) on some of the DASD that z/VM and Red Hat sit on, and we are 
looking into why. Each time we take such a hit we have lost different 
instances of zLinux. Now, I understand the concept that the OS, z/VM, does the 
I/O and recovery through MIH, etc. I assume then that Red Hat or SUSE, or any 
Linux running under z/VM, is dependent on the operating system for 
recovery. So is it normal, when taking a hit like the one below, to lose a 
zLinux instance?


 To me this sounds normal, but I wanted to make sure it wasn't something we 
missed. We are going to try to set up something in TSA to better monitor 
it, but again, just checking.


For example:

10:24:23 HCPERP602I  DASD  9DA1 AN INTERFACE CONTROL CHECK OCCURRED
10:24:23 HCPERP6303I SENSE = INVALID
10:24:23 HCPERP6304I IRB = 04C24017 46B22670 0002 0010E480
10:24:23 HCPERP6305I USERID = LINUX1
10:24:23 HCPERP2216I CHANNEL PATH ID = C4
10:24:23 HCPERP2220I PHYSICAL CHANNEL PATH ID = 0403
10:24:31 HCPERP2252I DEV   6A42 PATH 6D NOT OPERATIONAL
10:24:31 HCPERP602I  DEV   6A42 AN INTERFACE CONTROL CHECK OCCURRED
10:24:31 HCPERP6303I SENSE =     000
10:24:31 HCPERP6304I IRB = 04824017  0002 00084000
10:24:31 HCPERP6305I USERID = SYSTEM
10:24:31 HCPERP2216I CHANNEL PATH ID = 6D
10:24:31 HCPERP2220I PHYSICAL CHANNEL PATH ID = 0240
10:24:49 HCPERP2252I DEV   F2B1 PATH 65 NOT OPERATIONAL
10:24:49 HCPERP2252I DEV   F2B1 PATH 65 NOT OPERATIONAL


Thanks
Andy
Internet: Mailto:[EMAIL PROTECTED]




--
Rich Smrcina
VM Assist, Inc.
Phone: 414-491-6001
Ans Service:  360-715-2467
rich.smrcina at vmassist.com
http://www.linkedin.com/in/richsmrcina

Catch the WAVV!  http://www.wavv.org
WAVV 2008 - Chattanooga - April 18-22, 2008


Re: zLinux question

2007-10-30 Thread awhite
Rich - 
Thanks for replying. In zLinux, is there a way to build in tolerance 
(say, a setting to retry 'x' times before taking the error), or is that 
all under the covers?

Thanks

Andy 
Internet: Mailto:[EMAIL PROTECTED]




Re: zLinux question

2007-10-30 Thread Macioce, Larry
So I take it the systems come back (recover) until another hit is taken?

Mace

 






Re: zLinux question

2007-10-30 Thread Rich Smrcina
Actually with a hardware error like that, the z/VM messages tell most of 
the story (I misspoke) and z/VM is your best bet at recovery.  It should 
handle the error condition better than Linux will (assuming you are 
using minidisks).


Fixing your IFCC problem is the quickest route to a cure.

Unless there's something in the newer DASD drivers, I don't know of any 
configurable retry mechanism.  But that IFCC issue may cause you some 
real problems if it isn't corrected.
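As a sketch of what you *can* check from the guest after one of these hits 
(assuming a 2.6 kernel, and assuming the guest's device number matches the 
real 9DA1, which it may not if it's a minidisk):

   cat /proc/dasd/devices                          # list DASD devices and their current status
   cat /sys/bus/ccw/devices/0.0.9da1/online        # 1 = online, 0 = offline
   echo 1 > /sys/bus/ccw/devices/0.0.9da1/online   # attempt to bring the device back online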





--
Rich Smrcina
VM Assist, Inc.
Phone: 414-491-6001
Ans Service:  360-715-2467
rich.smrcina at vmassist.com
http://www.linkedin.com/in/richsmrcina

Catch the WAVV!  http://www.wavv.org
WAVV 2008 - Chattanooga - April 18-22, 2008


Re: zLinux question

2007-10-30 Thread Macioce, Larry
Dumb question: I'm assuming the packs can be shared between VM and MVS.
Have you checked MVS syslog to make sure someone hasn't varied a pack
off/on, or tried to access a pack they shouldn't? I know it seems like
they wouldn't do it as often as this has happened, but it's worth a look.

The machine itself hasn't complained of any problems, has it?

Mace






Re: zLinux question

2007-10-30 Thread RPN01
Note also that EREP may have information that would be useful in diagnosing
the problem. Get into the EREP manual and figure out how to get at the
information it's hoarding; give it to your systems or hardware people...
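
If it helps, a hedged starting point (from a suitably privileged CP user) is
just to confirm that error recording is actually on before you go digging; the
EREP formatting itself is done from CMS (CPEREPXA, if I remember the command
right), and the manual covers the reports:

   CP QUERY RECORDING        (shows whether EREP recording is on and who collects it)
   CP RECORDING EREP ON      (turns it on if it is not)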

-- 
   .~.    Robert P. Nix      Mayo Foundation
   /V\    RO-OE-5-55         200 First Street SW
  /( )\   507-284-0844       Rochester, MN 55905
  ^^-^^
In theory, theory and practice are the same, but
 in practice, theory and practice are different.




 


Re: zLinux question

2007-10-30 Thread awhite
Larry - 
I'm missing something; we don't share these packs/DASD with any MVS 
system, so what log am I checking?

Thanks
Andy 
Internet: Mailto:[EMAIL PROTECTED]




Re: zLinux question

2007-10-30 Thread awhite
Thanks, Robert, but we already went down this route. As I explained, we know 
it was a hardware hit to a Brocade device. That really wasn't my question; 
we were told the CHPIDs came back within a few minutes and VM stayed up, but 
zLinux crashed on the VM system (those instances, of course, were the ones 
involved with the CHPIDs taking the errors). I'm just trying to find out 
whether this is normal: are there no timeout values for zLinux to wait before 
going down hard, or does it simply die once it can't get to root? It sounds 
that way, but I want to see if there is anything we can do to prevent this 
other than the obvious: make sure we don't take hardware hits ;)


Andy
Internet: Mailto:[EMAIL PROTECTED]




Re: zLinux question

2007-10-30 Thread Rich Smrcina

Is this an FCP device?  I wonder if this is an MIH problem.

To the group: Are there special MIH settings required for FCP devices?
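
For what it's worth, the missing-interrupt interval for a device can be
displayed and changed from CP; a rough sketch (the device number and interval
are only examples, and the exact syntax is in the CP Commands reference):

   CP QUERY MITIME 9DA1      (display the missing interrupt interval for the device)
   CP SET MITIME 9DA1 2:30   (set it to 2 minutes 30 seconds)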



--
Rich Smrcina
VM Assist, Inc.
Phone: 414-491-6001
Ans Service:  360-715-2467
rich.smrcina at vmassist.com
http://www.linkedin.com/in/richsmrcina

Catch the WAVV!  http://www.wavv.org
WAVV 2008 - Chattanooga - April 18-22, 2008


Re: zLinux question

2007-10-30 Thread awhite
Rich - 
Sorry, what is an FCP device? They are DASD, IBM DS8300- or DS8000-type 
devices, in XRC mode. We have MIH set to 2:30 for them, and that was 
confirmed to be correct by our hardware person.
Andy 
Internet: Mailto:[EMAIL PROTECTED]




Re: zLinux question

2007-10-30 Thread Mark Post
 On Tue, Oct 30, 2007 at  7:15 PM, in message
[EMAIL PROTECTED],
[EMAIL PROTECTED] wrote: 
 Thanks Robert But we already went down this route as I explained we know 
 it was a hardware hit to a brocade device. That really wasnt my question 
 we were told within a few minutes the chip'ds came back VM stayed up. But 
 zLinux crashed on the VM system. Those of course involved with the chip'd 
 taking the errors. Im just trying to find out is this normal, there are no 

As just about every response on mailing lists starts with: it depends... on 
what the OS was using a particular device for.  If it had been one of z/VM's 
paging packs involved, things could have gotten ugly.  (Not necessarily terminal, 
but certainly a little scary.)  If it was one of Linux's application data volumes, 
more than likely Linux would have stayed up while the application died.  
There's no hard and fast rule here.

 time out values for zLinux to wait before going down hard or if it cant 
 get to Root lets say once it dies? Sounds that way but wanting to see if 
 there is anything we can do to prevent this other then the obvious make 
 sure we dont take hardware hits ;)

The Linux DASD device drivers have a fair amount of their own error recovery 
code in them.  Not as good as z/VM's, probably (I'm in no position to judge 
that), but Linux doesn't just fall over with the first I/O error, either, since 
it also runs in an LPAR, and can't count on z/VM doing error recovery for it.

It's usually going to be something fairly serious that causes a Linux system to 
crash.  For example, I've been following an internal mailing list thread that 
was talking about a customer's midrange SLES system having the root file system 
get re-mounted as read-only.  Various people confirmed that if Linux 
experiences non-temporary errors writing to a file system, even /, it will 
re-mount the file system as read-only in an effort to prevent any (further) 
data corruption on that file system.  If Linux is no longer able to even _read_ 
things from a file system that it needs to keep running, then yeah, your system 
is likely to throw a kernel panic and die.
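
That behaviour sounds like the ext3 'errors=' mount option (errors=remount-ro); 
if I have that right, a minimal fstab entry using it would look something like 
this, with a made-up device name:

   /dev/dasdb1   /   ext3   errors=remount-ro   0 1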

In your case, it sounds like you had something important to Linux go away for a 
long while (a few minutes is an eternity when you're talking about computers 
and I/O), and z/VM wasn't depending on any of those devices for its own 
continued functioning.  Just be glad it wasn't the other way around.  :)

What you'll want to look at is redundancy everywhere: in your paths to 
the switches (plural!), from the switches to the storage arrays (plural!), and 
so on.  If an application is important enough, then you need to be looking at 
High Availability clustering techniques, and so on.  With mainframe hardware, 
simply eliminating single points of failure gets you most of the way there.


Mark Post


Re: zLinux question

2007-10-30 Thread Rich Smrcina
So are the devices being accessed in 3390 mode?  If so then they are not 
FCP devices.





--
Rich Smrcina
VM Assist, Inc.
Phone: 414-491-6001
Ans Service:  360-715-2467
rich.smrcina at vmassist.com
http://www.linkedin.com/in/richsmrcina

Catch the WAVV!  http://www.wavv.org
WAVV 2008 - Chattanooga - April 18-22, 2008


Re: zLinux question

2007-10-30 Thread awhite
Rich - yes in 3390-3 emulation.
Andy 
Internet: Mailto:[EMAIL PROTECTED]




Re: zLinux question

2007-10-30 Thread awhite
Thanks Alan... We do have TSA and do have GDPS set up, but the mirrored 
volume is 140 miles away via XRC. Would this still work?
Andy 
Internet: Mailto:[EMAIL PROTECTED]




Re: zLinux question

2007-10-30 Thread Alan Altmark

To answer your question, an interface control check is a permanent I/O 
error (hence the HCPERP notifications and an EREP record was likely 
created).  The channel subsystem has already tried all available paths to 
get to the device.  There's nothing the guest can do to fix it.
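
(As an aside, you can at least see which paths and CHPIDs CP still considers 
usable; roughly, using the device and CHPID from your console log:)

   CP QUERY PATHS TO 9DA1    (logical paths to the device and their status)
   CP QUERY CHPID C4         (status of the CHPID named in the HCPERP messages)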

This is exactly the kind of thing that Linux-HA and/or Tivoli System 
Automation for Linux [I think] can address using z/VM's HYPERSWAP command, 
if you have a z/OS GDPS solution.  The I/O error would be trapped by the 
monitoring [Linux] guest and the failing volume replaced by a mirrored 
volume.  (GDPS manages the mirroring.)

Of course, if the primary and secondary volumes are coming through the 
same FICON switch then it won't help as much (protecting you only from 
port failures).

Alan Altmark
z/VM Development
IBM Endicott