Re: Recommendations for RECOVERY options
NTAC:3NS-20 IBM told me about the default. It is only changeable when you use SCOPE=CU. I didn't mean to upset anyone, and I didn't account for it being a non-work week for many. -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Tom Marchant Sent: Thursday, December 29, 2016 12:50 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Recommendations for RECOVERY options
-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Recommendations for RECOVERY options
Amen! It's also a non-work week for some. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Tom Marchant Sent: Thursday, December 29, 2016 10:50 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Recommendations for RECOVERY options
Re: Recommendations for RECOVERY options
On Thu, 29 Dec 2016 13:45:53 +, James Peddycord wrote: >In this forum with so many people who have so many opinions, I can't >believe that nobody is offering suggestions for the RECOVERY parameter >when asked. This, less than one day after asking the initial question. If you want to piss people off and get ignored, a good way to do it is to assume that you are entitled to a rapid response to your question, or even any response at all. The people on this list are here voluntarily. We are here to learn and to offer help where we can. AFAIK, even the people from IBM who are here do so voluntarily. There are no Service Level Agreements and no support contracts requiring a particular response time to your questions. This particular question is a rather esoteric one, and perhaps not many of us have experience in this particular area. I, for one, do not. You didn't even say that you were looking for the IOS Recovery options. I suppose we should have been smart enough to figure that out. >The default is: >RECOVERY,PATH_SCOPE=DEVICE,PATH_INTERVAL=10,PATH_THRESHOLD=100 Is it? According to the manual, PATH_INTERVAL and PATH_THRESHOLD can only be specified with SCOPE=CU. D IOS,RECOVERY will only display PATH_INTERVAL and PATH_THRESHOLD when SCOPE=CU. https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieae200/ieae200362.htm https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieae200/ieae200360.htm -- Tom Marchant
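Tom's point can be turned into a simple lint check to run before a parmlib change goes through paperwork. This is a hypothetical Python helper, not an IBM tool: it splits a RECOVERY statement into keyword=value pairs and flags PATH_INTERVAL or PATH_THRESHOLD whenever PATH_SCOPE is not CU, as Tom quotes the manual. It assumes PATH_SCOPE=DEVICE applies when the keyword is omitted (as stated in the thread) and does not check value ranges, continuation lines, or any other IECIOSxx keywords.

```python
from typing import Dict, List


def parse_recovery(statement: str) -> Dict[str, str]:
    """Split 'RECOVERY,KEY=VALUE,...' into an upper-cased keyword map."""
    verb, *operands = [part.strip() for part in statement.split(",")]
    if verb.upper() != "RECOVERY":
        raise ValueError(f"not a RECOVERY statement: {verb!r}")
    params: Dict[str, str] = {}
    for operand in operands:
        key, sep, value = operand.partition("=")
        if not sep:
            raise ValueError(f"expected KEYWORD=VALUE, got {operand!r}")
        params[key.upper()] = value.upper()
    return params


def check_recovery(statement: str) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    params = parse_recovery(statement)
    problems: List[str] = []
    # Assumption: PATH_SCOPE=DEVICE applies when the keyword is omitted.
    scope = params.get("PATH_SCOPE", "DEVICE")
    for key in ("PATH_INTERVAL", "PATH_THRESHOLD"):
        if key not in params:
            continue
        # Rule as quoted by Tom: these two keywords require SCOPE=CU.
        if scope != "CU":
            problems.append(f"{key} is only valid with PATH_SCOPE=CU")
        if not params[key].isdigit():
            problems.append(f"{key} must be a number")
    return problems


if __name__ == "__main__":
    print(check_recovery("RECOVERY,PATH_SCOPE=DEVICE,PATH_INTERVAL=10,PATH_THRESHOLD=100"))
    print(check_recovery("RECOVERY,PATH_SCOPE=CU,PATH_INTERVAL=5,PATH_THRESHOLD=50"))
```

Run against the statement Jim posted as the default, it reports both keywords; his test-system statement with PATH_SCOPE=CU passes.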
Re: Recommendations for RECOVERY options
On 2016-12-29 at 14:45, James Peddycord wrote: [...] In this forum with so many people who have so many opinions, I can't believe that nobody is offering suggestions for the RECOVERY parameter when asked. Well, I have noticed that this topic, let's call it "system parameters related to given hardware" (HCD, IECIOS, SE panels), is not popular here. Regarding my settings: there is no RECOVERY statement in IECIOS, so the defaults apply. I've been using this setting with FICON-attached VMAX, DMX, Shark 800 and HDS9900 (over the years). A bad cable was rare, but it sometimes happened. Apart from the error messages (few, no flood), no problems were observed. Obviously YMMV. -- Radoslaw Skorupka Lodz, Poland
Re: Recommendations for RECOVERY options
After your first post I was happy to check our values for comparison, but you have not said (up to now) where 'RECOVERY' is specified. I find none of the values you cite in PARMLIB, so we may be taking all defaults. I'm not the I/O guy but have access to most everything on the mainframe proper. If RECOVERY is specified in the Brocade itself, I have colleagues who could check there for reference. In order to get a good answer, you have to start with a good question. As for the chaos caused by failing hardware, I've seen many instances over the years. What amazes me today is the resilience that z/OS exhibits. There's lots of cacophony and confusion, but MVS tends to ride it out with minimal if any effect on applications. That was not always the case. One problem is that if a device cannot be reached, IOS waits for the Missing Interrupt Handler interval to expire before moving on to the next device. For a few devices this is not a major problem. For hundreds or thousands of devices, it can extend the agony for a long time. Getting an offending CHPID offline as quickly as possible is by far the best response. J.O.Skip Robinson Southern California Edison Company Electric Dragon Team Paddler SHARE MVS Program Co-Manager 323-715-0595 Mobile 626-543-6132 Office ⇐=== NEW robin...@sce.com -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of James Peddycord Sent: Thursday, December 29, 2016 6:57 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: (External):Re: Recommendations for RECOVERY options
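Taking Skip's description literally, the worst case grows linearly with the number of unreachable devices. The Python sketch below only does that arithmetic, assuming each device costs one full MIH interval and devices are handled strictly one after another (real IOS recovery may overlap). The 3000 devices come from Jim's post; the 15-second MIH interval is a hypothetical value, not one from the thread.

```python
def serial_mih_wait(devices: int, mih_seconds: int) -> str:
    """Worst-case wait if each unreachable device costs one full MIH
    interval and devices are handled strictly one after another
    (a simplifying assumption, not a description of IOS internals)."""
    total = devices * mih_seconds
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


if __name__ == "__main__":
    # 3000 devices (from Jim's post) and a hypothetical 15-second MIH interval.
    print(serial_mih_wait(3000, 15))  # 12h 30m 00s
    print(serial_mih_wait(100, 15))   # 0h 25m 00s
```

Even if the real figure is far smaller, the linear growth is why configuring the CHPID offline quickly beats waiting for per-device recovery.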
Re: Recommendations for RECOVERY options
NTAC:3NS-20 Thanks for the suggestion. I will repeat the question in a couple of weeks. Jim -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Elardus Engelbrecht Sent: Thursday, December 29, 2016 8:14 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Recommendations for RECOVERY options
Re: Recommendations for RECOVERY options
I've gone through pain with new hardware, moving from two z/10 processors, with an ATL running 2G fiber and VIRT tape and DASD running over 4G FICON, to two z/12s with a new DLM and Data Domain and new 8870s, all running 8G FICON through new Brocade switches and two old patch panels. I've seen slowdowns, and I've seen channel paths go offline due to an excess of IOS errors. I never looked at the recovery options. Instead, over many weeks, using a fiber test kit, I found the issues and replaced cables between the DASD and the switch and to/from both patch panels. Most of my problems were due to bad ports on the patch panel, and fortunately I had spare ports. I worked the issue from the hardware standpoint, so I have no experience with changing the IECIOS recovery options for FICON channels. I'll be interested in what others have done in the same situation: SETIOS with new recovery options? Or, if the performance is not killing the system, work the problem on the hardware side? Carmen - Original Message - From: "Elardus Engelbrecht" <elardus.engelbre...@sita.co.za> To: IBM-MAIN@LISTSERV.UA.EDU Sent: Thursday, December 29, 2016 8:13:58 AM Subject: Re: Recommendations for RECOVERY options
Re: Recommendations for RECOVERY options
As pointed out, there may be people taking year-end holidays. Also, the list is not required to answer any question asked; people can provide guidance when they have time. If your question is urgent, the best option is to contact the vendor for the equipment or software you are having issues with. They can provide the support that the list cannot, since we do not have access to your shop, your configuration, or your details. The list is more like a group of people sitting around chatting about the mainframe and supporting functions. I would suggest you start a problem ticket with your vendor(s) on this. The z13 has had issues reported on other lists, so I would definitely start with IBM and z13 support. This may be another manifestation of microcode issues. Lizette > -Original Message- > From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of James Peddycord > Sent: Thursday, December 29, 2016 6:46 AM > To: IBM-MAIN@LISTSERV.UA.EDU > Subject: Re: Recommendations for RECOVERY options
Re: Recommendations for RECOVERY options
James Peddycord wrote: >In this forum with so many people who have so many opinions, I can't believe >that nobody is offering suggestions for the RECOVERY parameter when asked. Perhaps the people with that specific hardware combination are still on holiday. Or someone with a combination like yours may indeed read your message but never have experienced this specific bad cable problem, and so can't contribute. Others, like me, are working on other things like RACF (myself) and general z/OS + JES2 + TSO/ISPF matters (also myself). There are programmers in various languages also active here. Now and then you get someone with problems with a specific product like DFDSS, DFSORT, CICS, DB2, etc. Please note: I was previously responsible for storage (SMS/HSM) and managing hardware (3380 and later 3390, 3490 tapes and Magstar, etc.). I have some experience with ESCON but not with FICON, but I do read ALL posts on IBM-MAIN, except those from a madman earlier this month. I humbly suggest that you repeat your question in the second week of January 2017, when most are hopefully back from their holidays. Good luck in finding a good solution. Groete / Greetings Elardus Engelbrecht
Re: Recommendations for RECOVERY options
NTAC:3NS-20 Our SAN people said they saw no errors on the switch, which is dedicated to the mainframe (do I believe this? IDK). The problem was a bad cable between the switch and the storage, which both mainframes share, so there was an issue on one CHPID on each system. In this forum with so many people who have so many opinions, I can't believe that nobody is offering suggestions for the RECOVERY parameter when asked. The default is: RECOVERY,PATH_SCOPE=DEVICE,PATH_INTERVAL=10,PATH_THRESHOLD=100 This is what caused our pain. z/OS has to see 100 errors per minute for 10 minutes before taking the path offline to a single device, and this was happening to every device. On our test system I started with: RECOVERY,PATH_SCOPE=CU,PATH_INTERVAL=5,PATH_THRESHOLD=50 With that, z/OS has to see 50 errors per minute for 5 minutes before taking the path offline to every device in the LCU. I was hoping to see some real-world examples that work for others. Jim -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Alan (GMAIL) Watthey Sent: Wednesday, December 28, 2016 11:55 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Recommendations for RECOVERY options
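Jim's reading of the two settings can be made concrete with a small model. The Python sketch below is hypothetical and encodes only his interpretation: a path is removed once PATH_THRESHOLD errors per minute are seen for PATH_INTERVAL consecutive minutes, counted per device with PATH_SCOPE=DEVICE and summed across the control unit with PATH_SCOPE=CU. It is not IBM's documented algorithm, it ignores Tom's point that the interval and threshold only apply with SCOPE=CU, and the device numbers and error counts are made up.

```python
from typing import Dict, List, Optional


def removal_minute(per_minute: List[int], interval: int, threshold: int) -> Optional[int]:
    """Return the 1-based minute that completes `interval` consecutive
    minutes with at least `threshold` errors each, or None if never."""
    run = 0
    for minute, errors in enumerate(per_minute, start=1):
        # Extend the run of "bad" minutes, or reset it on a quieter minute.
        run = run + 1 if errors >= threshold else 0
        if run >= interval:
            return minute
    return None


def simulate(errors: Dict[str, List[int]], scope: str,
             interval: int, threshold: int) -> Dict[str, Optional[int]]:
    """Map each device to the minute its failing path is removed (or None).

    errors: device number -> errors seen on the bad path in each minute.
    Every device is assumed to have the same number of minutes of data.
    """
    if scope == "DEVICE":
        # Each device is judged only on its own error count.
        return {dev: removal_minute(counts, interval, threshold)
                for dev, counts in errors.items()}
    if scope == "CU":
        # All devices behind the control unit share one combined count,
        # and the path is removed from all of them at the same time.
        totals = [sum(minute) for minute in zip(*errors.values())]
        hit = removal_minute(totals, interval, threshold)
        return {dev: hit for dev in errors}
    raise ValueError(f"unknown scope: {scope!r}")


if __name__ == "__main__":
    # Hypothetical LCU: one busy volume and one quiet volume, 15 minutes each.
    errors = {"0A00": [120] * 15, "0A01": [5] * 15}
    print(simulate(errors, "DEVICE", interval=10, threshold=100))
    print(simulate(errors, "CU", interval=5, threshold=50))
```

With a busy device at 120 errors per minute and a quiet one at 5, the DEVICE run with 10/100 removes the path from the busy device at minute 10 and never from the quiet one, while the CU run with 5/50 removes it from both at minute 5. Under this model, a per-device rule can leave a bad path in use on lightly used volumes for a long time.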
Re: Recommendations for RECOVERY options
Jim, As the network guy who looks after the SAN, I would not expect my z/OS guys to do anything in this situation. In fact, z/OS cannot see our whole SAN, as I have other things on it (e.g., ISLs, back-end tapes). Fortunately, the Brocade switches here are dedicated to the mainframes and the devices they use. The Brocade will see issues that it recovers from well before z/OS sees anything. Of course, I don't know exactly what your problem was or what your Brocade would have seen, but I'd suggest the bottleneckmon command to detect latency issues. Running the porterrshow command from time to time would also give you a good idea of whether your SFPs and fibres are performing as well as they should. If you have the Fabric Vision/Watch licenses, then you can do fancy stuff like fencing errant ports before they impact anything. Regards, Alan Watthey -Original Message- From: James Peddycord [mailto:j...@ntrs.com] Sent: 28 December 2016 4:56 pm Subject: Recommendations for RECOVERY options
Re: Recommendations for RECOVERY options
NTAC:3NS-20 The performance impact was slow I/O to that group of devices, one at a time, until the path was taken offline. To the users, transactions that usually take less than a second were in some cases taking minutes: bad enough to be considered an 'outage' from the users' perspective. There was no issue with console flooding; we have been suppressing IOS050I from the console for a long time. Jim -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jousma, David Sent: Wednesday, December 28, 2016 8:03 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Recommendations for RECOVERY options
Re: Recommendations for RECOVERY options
What specifically was the performance impact? The loss of the FICON channel and reduced I/O bandwidth? Or was it the console message flooding? If the latter, implementing Message Flood automation will stop the flooding of messages. It is pretty easy to implement. Dave _ Dave Jousma Manager Mainframe Engineering, Assistant Vice President david.jou...@53.com 1830 East Paris, Grand Rapids, MI 49546 MD RSCB2H p 616.653.8429 f 616.653.2717 -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of James Peddycord Sent: Wednesday, December 28, 2016 8:56 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Recommendations for RECOVERY options NTAC:3NS-20 We had a situation with a bad cable that resulted in a huge performance impact due to the default way that z/OS (we are at 1.13) handles error recovery on FICON paths. The symptoms were many (thousands) of IOS050I messages in the task's joblog, followed by an IOS450E message, which took the path offline to a single device. This was happening for every device (around 3000) that the affected path was attached to. As soon as I saw the messages, I configured the CHPID offline and the problem stopped. We have put in automation that will immediately configure a CHPID offline as soon as a single IOS450E message is detected, and now I am experimenting with RECOVERY options. IBM recommended setting RECOVERY,PATH_SCOPE=CU, setting PATH_INTERVAL to 1, leaving PATH_THRESHOLD=10, and adjusting from there. Due to the paperwork involved in making any change in our environment, I would like to implement this with a minimum of 'adjustment'. Does anyone have any recommendations? We are running on z13s, 16G FICON through Brocade switches to IBM DS88xx DASD. Thanks, Jim
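Jim's automation rule (configure the CHPID offline on the first IOS450E) can be sketched as a pure function that an automation product's exit might call. This is a hypothetical Python sketch: it assumes the message ID and CHPID have already been extracted from each console message (the IOS450E text itself is not parsed here), and the `max_offline` cap is an illustrative safety limit that is not mentioned in the thread. `CF CHP(xx),OFFLINE` is the standard z/OS CONFIG command form.

```python
from typing import Iterable, List, Set, Tuple


def chpid_offline_commands(messages: Iterable[Tuple[str, str]],
                           already_offline: Iterable[str] = (),
                           max_offline: int = 1) -> List[str]:
    """Return one CONFIG command per CHPID, on its first IOS450E.

    messages: (message_id, chpid) pairs, assumed to be pre-extracted.
    already_offline: CHPIDs that need no further action.
    max_offline: hypothetical cap on how many CHPIDs the automation may
    configure offline on its own before leaving the rest to an operator.
    """
    commands: List[str] = []
    handled: Set[str] = set(already_offline)
    for msg_id, chpid in messages:
        # Ignore other messages (e.g. IOS050I) and CHPIDs already handled.
        if msg_id != "IOS450E" or chpid in handled:
            continue
        if len(commands) >= max_offline:
            break
        handled.add(chpid)
        commands.append(f"CF CHP({chpid}),OFFLINE")
    return commands


if __name__ == "__main__":
    msgs = [("IOS050I", "4A"), ("IOS450E", "4A"), ("IOS450E", "4A"), ("IOS450E", "5B")]
    print(chpid_offline_commands(msgs))
    print(chpid_offline_commands(msgs, max_offline=2))
```

The cap reflects a design choice: automatically removing one failing path is low risk, but a burst of IOS450E messages across several CHPIDs may point to a wider problem that an operator should look at before more paths are removed.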