Re: zLinux question
On Tuesday, 10/30/2007 at 09:16 EDT, [EMAIL PROTECTED] wrote:
> Thanks Alan... We do have TSA and do have GDPS set up but the mirrored volume is 140 miles away via XRC. Would this still work?

Sorry, no. TSA does not support the XRC connections managed by GDPS - it only deals with PPRC connections.

Alan Altmark
z/VM Development
IBM Endicott
zLinux question
My background is z/OS, so please excuse me.

Question: We have recently taken many hits (paths being lost, CHPIDs, etc.; see below) on some of the DASD that z/VM and Red Hat sit on, and we are looking into why. Each time we take such a hit we lose different instances of zLinux. I understand the concept that the OS, z/VM, does the I/O and recovery through MIH etc. I assume, then, that Red Hat, SUSE, or any Linux running under z/VM is dependent on the operating system for recovery. So is it normal to lose a zLinux instance when taking a hit like the one below? To me this sounds normal, but I wanted to make sure it wasn't something we missed. We are going to try to set up something in TSA to monitor it better, but again, just checking.

For example:

10:24:23 HCPERP602I DASD 9DA1 AN INTERFACE CONTROL CHECK OCCURRED
10:24:23 HCPERP6303I SENSE = INVALID
10:24:23 HCPERP6304I IRB = 04C24017 46B22670 0002 0010E480
10:24:23 HCPERP6305I USERID = LINUX1
10:24:23 HCPERP2216I CHANNEL PATH ID = C4
10:24:23 HCPERP2220I PHYSICAL CHANNEL PATH ID = 0403
10:24:31 HCPERP2252I DEV 6A42 PATH 6D NOT OPERATIONAL
10:24:31 HCPERP602I DEV 6A42 AN INTERFACE CONTROL CHECK OCCURRED
10:24:31 HCPERP6303I SENSE = 000
10:24:31 HCPERP6303I
10:24:31 HCPERP6304I IRB = 04824017 0002 00084000
10:24:31 HCPERP6305I USERID = SYSTEM
10:24:31 HCPERP2216I CHANNEL PATH ID = 6D
10:24:31 HCPERP2220I PHYSICAL CHANNEL PATH ID = 0240
10:24:49 HCPERP2252I DEV F2B1 PATH 65 NOT OPERATIONAL
10:24:49 HCPERP2252I DEV F2B1 PATH 65 NOT OPERATIONAL

Thanks
Andy
Internet: Mailto:[EMAIL PROTECTED]

The information contained in this message may be CONFIDENTIAL and is for the intended addressee only. Any unauthorized use, dissemination of the information, or copying of this message is prohibited. If you are not the intended addressee, please notify the sender immediately and delete this message.
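When a console log like the one in this post runs to hundreds of lines, it can help to group the HCPERP messages by device before deciding which guest lost which disk. The following is only a minimal sketch: it assumes each message sits on its own line in the "HH:MM:SS HCPERPnnnnI text" layout seen in the post, and the function name `summarize` and the shortened `SAMPLE` text are made up for illustration, not part of any z/VM tooling.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the sample in the post: "HH:MM:SS HCPERPnnnnI text".
MSG = re.compile(r"^(\d{2}:\d{2}:\d{2}) (HCPERP\d+I)\s*(.*)$")
# A message that names a device starts a new group ("DASD 9DA1 ..." or "DEV 6A42 ...").
DEVICE = re.compile(r"^(?:DASD|DEV) ([0-9A-F]{4}) (.*)$")


def summarize(log_text):
    """Group HCPERP messages by device number (hypothetical helper)."""
    events = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        m = MSG.match(line.strip())
        if not m:
            continue
        time, msgid, text = m.groups()
        dev = DEVICE.match(text)
        if dev:
            current = dev.group(1)
            events[current].append((time, msgid, dev.group(2)))
        elif current is not None and text:
            # Detail lines (SENSE, IRB, USERID, CHPID) belong to the last device seen.
            events[current].append((time, msgid, text))
    return dict(events)


# Shortened excerpt of the messages quoted in the post.
SAMPLE = """\
10:24:23 HCPERP602I DASD 9DA1 AN INTERFACE CONTROL CHECK OCCURRED
10:24:23 HCPERP6305I USERID = LINUX1
10:24:23 HCPERP2216I CHANNEL PATH ID = C4
10:24:31 HCPERP2252I DEV 6A42 PATH 6D NOT OPERATIONAL
10:24:31 HCPERP6305I USERID = SYSTEM
"""

if __name__ == "__main__":
    for device, items in summarize(SAMPLE).items():
        print(device, [text for _, _, text in items])
```

Grouping by device makes it easier to see that 9DA1 was owned by LINUX1 while 6A42 was owned by SYSTEM, which is the distinction the replies in this thread turn on.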
Re: zLinux question
Andy,

That's probably going to depend upon what lives on 9DA1. If it's a root filesystem or (gasp) a swap disk, then it's probably fair to say Linux may throw in the towel. But the z/VM messages don't tell the whole story; /var/log/messages may have more info. But as I say, if it's the root filesystem it may not be able to write the messages, then there's a vicious cycle and boom!

I'd say check into what's causing the IFCC problems.

--
Rich Smrcina
VM Assist, Inc.
Phone: 414-491-6001
Ans Service: 360-715-2467
rich.smrcina at vmassist.com
http://www.linkedin.com/in/richsmrcina

Catch the WAVV! http://www.wavv.org
WAVV 2008 - Chattanooga - April 18-22, 2008
Re: zLinux question
Rich - Thanks for replying. In zLinux, is there a way to build in tolerance, such as a setting that says to retry 'x' number of times before taking the error, or is that all under the covers?

Thanks
Andy
Internet: Mailto:[EMAIL PROTECTED]
Re: zLinux question
So I take it the systems come back (recover) until another hit is taken?

Mace
- The information transmitted is intended solely for the individual or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of or taking action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you have received this email in error please contact the sender and delete the material from any computer.
Re: zLinux question
Actually with a hardware error like that, the z/VM messages tell most of the story (I misspoke) and z/VM is your best bet at recovery. It should handle the error condition better than Linux will (assuming you are using minidisks). Fixing your IFCC problem is the quickest route to a cure.

Unless there's something in the newer DASD drivers, I don't know of any configurable retry mechanism. But that IFCC issue may cause you some real problems if it isn't corrected.

--
Rich Smrcina
Re: zLinux question
Dumb question: I'm assuming the packs can be shared between VM and MVS. Have you checked the MVS syslog to make sure someone hasn't varied a pack off/on, or tried to access a pack they shouldn't? I know it seems like they wouldn't do it as often as it has happened, but it's worth a look. The machine itself hasn't complained of any problems, has it?

Mace
Re: zLinux question
Note also that EREP may have information that would be useful in diagnosing the problem. Get into the EREP manual and figure out how to get the information it's hoarding; give it to your systems or hardware people...

--
 .~.    Robert P. Nix         Mayo Foundation
 /V\    RO-OE-5-55            200 First Street SW
/( )\   507-284-0844          Rochester, MN 55905
^^-^^
- In theory, theory and practice are the same, but in practice, theory and practice are different.
Re: zLinux question
Larry - I'm missing something. We don't share these packs/DASD with any MVS system, so what log am I checking?

Thanks
Andy
Internet: Mailto:[EMAIL PROTECTED]
Re: zLinux question
Thanks Robert, but we already went down this route. As I explained, we know it was a hardware hit to a Brocade device. That really wasn't my question. We were told that within a few minutes the CHPIDs came back and VM stayed up, but zLinux crashed on the VM system (those instances, of course, that were involved with the CHPIDs taking the errors). I'm just trying to find out whether this is normal. Are there no timeout values for zLinux to wait before going down hard, or does it die as soon as it can't get to root, let's say? It sounds that way, but I wanted to see if there is anything we can do to prevent this other than the obvious: make sure we don't take hardware hits ;)

Andy
Internet: Mailto:[EMAIL PROTECTED]
Re: zLinux question
Is this an FCP device? I wonder if this is an MIH problem.

To the group: Are there special MIH settings required for FCP devices?

--
Rich Smrcina
Re: zLinux question
Rich - Sorry, what is an FCP device? They are IBM DS8300 or DS8000 type DASD in XRC mode. We have MIH set to 2:30 for them, and our hardware person confirmed that this is correct.

Andy
Internet: Mailto:[EMAIL PROTECTED]
Re: zLinux question
On Tue, Oct 30, 2007 at 7:15 PM, in message [EMAIL PROTECTED], [EMAIL PROTECTED] wrote:
> Thanks Robert But we already went down this route as I explained we know it was a hardware hit to a brocade device. That really wasnt my question we were told within a few minutes the chip'ds came back VM stayed up. But zLinux crashed on the VM system. Those of course involved with the chip'd taking the errors. Im just trying to find out is this normal, there are no

As just about every response on mailing lists starts with, it depends... on what the OS was using a particular device for. If it had been one of z/VM's paging packs involved, things could have gotten ugly. (Not necessarily terminal, but certainly a little scary.) If it was one of Linux's application data volumes, more than likely Linux would have stayed up while the application died. There's no hard and fast rule here.

> time out values for zLinux to wait before going down hard or if it cant get to Root lets say once it dies? Sounds that way but wanting to see if there is anything we can do to prevent this other then the obvious make sure we dont take hardware hits ;)

The Linux DASD device drivers have a fair amount of their own error recovery code in them. Not as good as z/VM's, probably (I'm in no position to judge that), but Linux doesn't just fall over with the first I/O error, either, since it also runs in an LPAR, and can't count on z/VM doing error recovery for it.

It's usually going to be something fairly serious that causes a Linux system to crash. For example, I've been following an internal mailing list thread that was talking about a customer's midrange SLES system having the root file system get re-mounted as read-only. Various people confirmed that if Linux experiences non-temporary errors writing to a file system, even /, it will re-mount the file system as read-only in an effort to prevent any (further) data corruption on that file system. If Linux is no longer able to even _read_ things from a file system that it needs to keep running, then yeah, your system is likely to throw a kernel panic and die.

In your case, it sounds like you had something important to Linux go away for a long while (a few minutes is an eternity when you're talking about computers and I/O), and z/VM wasn't depending on any of those devices for its own continued functioning. Just be glad it wasn't the other way around. :)

The things you'll want to look at are redundancy everywhere: in your paths to the switches (plural!), from the switches to the storage arrays (plural!), and so on. If an application is important enough, then you need to be looking at High Availability clustering techniques, and so on. With mainframe hardware, simply eliminating single points of failure gets you most of the way there.

Mark Post
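The read-only re-mount described above is something a monitoring job can look for before a guest goes down completely. Here is a minimal sketch, assuming the standard /proc/mounts layout (device, mount point, file system type, options, dump, pass). The function name `read_only_mounts` and the sample device names are illustrative assumptions, not taken from anyone's system in this thread.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        options = fields[3].split(",")
        if "ro" in options:
            found.append(fields[1])
    return found


# Made-up sample in /proc/mounts format; the DASD partition names are examples only.
SAMPLE = """\
/dev/dasda1 / ext3 ro,relatime 0 0
/dev/dasdb1 /data ext3 rw,relatime 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    print(read_only_mounts(SAMPLE))
    # On a live system: read_only_mounts(open("/proc/mounts").read())
```

Some file systems are mounted read-only on purpose, so a real check would compare the result against the mounts that are expected to be writable rather than alerting on every 'ro' entry.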
Re: zLinux question
So are the devices being accessed in 3390 mode? If so then they are not FCP devices.

--
Rich Smrcina
Re: zLinux question
Rich - yes, in 3390-3 emulation.

Andy
Internet: Mailto:[EMAIL PROTECTED]
Re: zLinux question
Thanks Alan... We do have TSA and do have GDPS set up, but the mirrored volume is 140 miles away via XRC. Would this still work?

Andy
Internet: Mailto:[EMAIL PROTECTED]
Re: zLinux question
On Tuesday, 10/30/2007 at 07:16 EDT, [EMAIL PROTECTED] wrote:
> Thanks Robert But we already went down this route as I explained we know it was a hardware hit to a brocade device. That really wasnt my question we were told within a few minutes the chip'ds came back VM stayed up. But zLinux crashed on the VM system. Those of course involved with the chip'd taking the errors. Im just trying to find out is this normal, there are no time out values for zLinux to wait before going down hard or if it cant get to Root lets say once it dies? Sounds that way but wanting to see if there is anything we can do to prevent this other then the obvious make sure we dont take hardware hits ;)

To answer your question, an interface control check is a permanent I/O error (hence the HCPERP notifications, and an EREP record was likely created). The channel subsystem has already tried all available paths to get to the device. There's nothing the guest can do to fix it.

This is exactly the kind of thing that Linux-HA and/or Tivoli System Automation for Linux [I think] can address using z/VM's HYPERSWAP command, if you have a z/OS GDPS solution. The I/O error would be trapped by the monitoring [Linux] guest and the failing volume replaced by a mirrored volume. (GDPS manages the mirroring.) Of course, if the primary and secondary volumes are coming through the same FICON switch then it won't help as much (protecting you only from port failures).

Alan Altmark
z/VM Development
IBM Endicott
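The caveat about primary and secondary volumes sharing a FICON switch can be checked on paper once you know which switches each volume's channel paths traverse. The sketch below is purely illustrative: the inventory is hand-written, the switch names and the secondary device number are invented, and nothing here queries z/VM or the channel subsystem.

```python
def shared_switches(primary_switches, secondary_switches):
    """Return the switches that both halves of a mirror pair depend on."""
    return sorted(set(primary_switches) & set(secondary_switches))


# Hypothetical inventory: volume -> FICON switches its channel paths traverse.
# The switch names and the secondary device number are made up for illustration.
PATHS = {
    "9DA1": ["SWITCH-A", "SWITCH-B"],  # primary volume
    "B9A1": ["SWITCH-B", "SWITCH-C"],  # mirrored (secondary) volume
}

if __name__ == "__main__":
    common = shared_switches(PATHS["9DA1"], PATHS["B9A1"])
    if common:
        print("Mirror pair shares:", ", ".join(common))
    else:
        print("No shared switch between primary and secondary")
```

Any switch that shows up in the result is a single point of failure for the pair, so swapping to the mirrored volume would not help if that switch is the component that fails.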