Re: Recommendations for RECOVERY options

2016-12-29 Thread James Peddycord
NTAC:3NS-20

IBM told me about the default. It is only changeable when you use
SCOPE=CU.
Didn't mean to upset anyone. Also didn't account for it being a non-work
week for many.


-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Tom Marchant
Sent: Thursday, December 29, 2016 12:50 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recommendations for RECOVERY options

On Thu, 29 Dec 2016 13:45:53 +, James Peddycord wrote:

>In this forum with so many people who have so many opinions, I can't
>believe that nobody is offering suggestions for the RECOVERY parameter
>when asked.

This less than one day after asking the initial question.

If you want to piss people off and get ignored, a good way to do it is
to assume that you are entitled to a rapid response to your question. Or
even any response at all.

The people on this list are here voluntarily. We are here to learn and
to offer help where we can. AFAIK, even the people from IBM who are here
do so voluntarily.

There are no Service Level Agreements, and no support contracts
requiring a particular response time to your questions.

This particular question is a rather esoteric one, and perhaps not many
of us have experience in this particular area. I, for one, do not. You
didn't even say that you were looking for the IOS Recovery options. I
suppose we should have been smart enough to figure that out.

>The default is:
>RECOVERY,PATH_SCOPE=DEVICE,PATH_INTERVAL=10,PATH_THRESHOLD=100

Is it? According to the manual, PATH_INTERVAL and PATH_THRESHOLD can
only be specified with SCOPE=CU. D IOS,RECOVERY will only display
PATH_INTERVAL and PATH_THRESHOLD when SCOPE=CU.

https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieae200/ieae200362.htm

https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieae200/ieae200360.htm

--
Tom Marchant

--
For IBM-MAIN subscribe / signoff / archive access instructions, send
email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN



Re: Recommendations for RECOVERY options

2016-12-29 Thread Charles Mills
Amen! It's also a non-work week for some.

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Tom Marchant
Sent: Thursday, December 29, 2016 10:50 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recommendations for RECOVERY options

On Thu, 29 Dec 2016 13:45:53 +, James Peddycord wrote:

>In this forum with so many people who have so many opinions, I can't 
>believe that nobody is offering suggestions for the RECOVERY parameter 
>when asked.

This less than one day after asking the initial question.

If you want to piss people off and get ignored, a good way to do it is to 
assume that you are entitled to a rapid response to your question. Or even any 
response at all.

The people on this list are here voluntarily. We are here to learn and to offer 
help where we can. AFAIK, even the people from IBM who are here do so 
voluntarily.

There are no Service Level Agreements, and no support contracts requiring a 
particular response time to your questions.

This particular question is a rather esoteric one, and perhaps not many of us 
have experience in this particular area. I, for one, do not. You didn't even 
say that you were looking for the IOS Recovery options. I suppose we should 
have been smart enough to figure that out.



Re: Recommendations for RECOVERY options

2016-12-29 Thread Tom Marchant
On Thu, 29 Dec 2016 13:45:53 +, James Peddycord wrote:

>In this forum with so many people who have so many opinions, I can't
>believe that nobody is offering suggestions for the RECOVERY parameter
>when asked.

This less than one day after asking the initial question.

If you want to piss people off and get ignored, a good way to do it is to 
assume that you are entitled to a rapid response to your question. Or 
even any response at all.

The people on this list are here voluntarily. We are here to learn and to 
offer help where we can. AFAIK, even the people from IBM who are here 
do so voluntarily.

There are no Service Level Agreements, and no support contracts 
requiring a particular response time to your questions.

This particular question is a rather esoteric one, and perhaps not many 
of us have experience in this particular area. I, for one, do not. You didn't 
even say that you were looking for the IOS Recovery options. I suppose 
we should have been smart enough to figure that out.

>The default is:
>RECOVERY,PATH_SCOPE=DEVICE,PATH_INTERVAL=10,PATH_THRESHOLD=100

Is it? According to the manual, PATH_INTERVAL and PATH_THRESHOLD can 
only be specified with SCOPE=CU. D IOS,RECOVERY will only display 
PATH_INTERVAL and PATH_THRESHOLD when SCOPE=CU.

https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieae200/ieae200362.htm

https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieae200/ieae200360.htm
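
The rule quoted above (PATH_INTERVAL and PATH_THRESHOLD only with SCOPE=CU) can be checked mechanically before a statement goes into IECIOSxx. What follows is only a rough sketch in Python, not anything IBM ships: the function names are made up for illustration, and treating DEVICE as the scope when PATH_SCOPE is omitted is an assumption taken from this thread, not from the manual.

```python
def parse_recovery(statement: str) -> dict:
    """Split 'RECOVERY,KEY=VALUE,...' into a dict of keyword -> value."""
    parts = [p.strip() for p in statement.split(",")]
    if not parts or parts[0].upper() != "RECOVERY":
        raise ValueError("statement must start with RECOVERY")
    options = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {part!r}")
        options[key.upper()] = value.upper()
    return options


def check_scope_rule(options: dict) -> list:
    """Return complaints based on the rule as quoted in this thread."""
    problems = []
    # Assumption: DEVICE is the scope when PATH_SCOPE is not coded.
    scope = options.get("PATH_SCOPE", "DEVICE")
    if scope != "CU":
        for key in ("PATH_INTERVAL", "PATH_THRESHOLD"):
            if key in options:
                problems.append(f"{key} given but PATH_SCOPE={scope}")
    return problems
```

Run against the statement quoted as "the default", this would flag both PATH_INTERVAL and PATH_THRESHOLD; a statement with PATH_SCOPE=CU would pass.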

-- 
Tom Marchant



Re: Recommendations for RECOVERY options

2016-12-29 Thread R.S.

On 2016-12-29 at 14:45, James Peddycord wrote:

[...]
In this forum with so many people who have so many opinions, I can't
believe that nobody is offering suggestions for the RECOVERY parameter
when asked.


Well, I have noticed that this topic, let's call it "system parameters related 
to given hardware" (HCD, IECIOS, SE panels), is not popular here.


Regarding my settings: no RECOVERY statement in IECIOS, so everything is at the
defaults. I've been using this setting with FICON-attached VMAX, DMX, Shark 800 
and HDS9900 (over the years). A bad cable was rare, but it did sometimes 
happen. Apart from error messages (a few, no flood), no problems were 
observed. Obviously YMMV.


--
Radoslaw Skorupka
Lodz, Poland







This e-mail may contain legally privileged information of the Bank and is 
intended solely for business use of the addressee. This e-mail may only be 
received by the addressee and may not be disclosed to any third parties. If you 
are not the intended addressee of this e-mail or the employee authorized to 
forward it to the addressee, be advised that any dissemination, copying, 
distribution or any other similar activity is legally prohibited and may be 
punishable. If you received this e-mail by mistake please advise the sender 
immediately by using the reply facility in your e-mail software and delete 
permanently this e-mail including any copies of it either printed or saved to 
hard drive.

mBank S.A. with its registered office in Warsaw, ul. Senatorska 18, 00-950 Warszawa, 
www.mBank.pl, e-mail: kont...@mbank.pl
District Court for the Capital City of Warsaw, 12th Commercial Division of the 
National Court Register, entrepreneurs' register no. KRS 025237, NIP: 526-021-50-88. 
As of 01.01.2016 the share capital of mBank S.A. (fully paid up) amounts to 
PLN 168,955,696.




Re: Recommendations for RECOVERY options

2016-12-29 Thread Jesse 1 Robinson
I was happy at your first post to check our values for comparison, but you have 
not said (up to now) where 'RECOVERY' is specified. I find none of the values 
you cite in PARMLIB, so we may be taking all defaults. I'm not the I/O guy but 
have access to most everything on the mainframe proper. If RECOVERY is 
specified in the Brocade itself, I have colleagues who could check there for 
reference. In order to get a good answer, you have to start with a good question. 

As for the chaos caused by failing hardware, I've seen many instances over the 
years. What amazes me today is the resilience that z/OS exhibits. There's lots 
of cacophony and confusion, but MVS tends to ride it out with minimal if any 
effect on applications. That was not always the case. 

One problem is that if a device cannot be reached, IOS waits for the 
Missing Interrupt Handler interval to expire before moving on to the next device. 
For a few devices this is not a major problem. For hundreds or thousands of 
devices, it can extend the agony for a long time. Getting an offending CHPID 
offline as quickly as possible is by far the best response.

.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler 
SHARE MVS Program Co-Manager
323-715-0595 Mobile
626-543-6132 Office ⇐=== NEW
robin...@sce.com

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of James Peddycord
Sent: Thursday, December 29, 2016 6:57 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: (External):Re: Recommendations for RECOVERY options

NTAC:3NS-20

Thanks for the suggestion. I will repeat the question in a couple of weeks.

Jim

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Elardus Engelbrecht
Sent: Thursday, December 29, 2016 8:14 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recommendations for RECOVERY options

James Peddycord wrote:

>In this forum with so many people who have so many opinions, I can't
believe that nobody is offering suggestions for the RECOVERY parameter when 
asked.

Perhaps the people with that specific hardware combination are still on 
holiday. Or someone who has the same combination as yours may indeed have read 
your message, but has never experienced this specific bad cable problem and 
so can't contribute.

Others like me are working on other things like RACF (myself), general z/OS + 
JES2 + TSO/ISPF matters (also myself). There are programmers in various 
languages also active here. Now and then you get someone with problems with a 
specific product like DFDSS, DFSORT, CICS, DB2 etc.

Please note: I was previously responsible for storage (SMS/HSM) and managing 
hardware (3380 and later 3390, 3490 tapes and Magstar, etc.). I have some 
experience with ESCON, but not with Ficon, but I do read ALL posts on IBM, 
except those from a madman earlier this month.

I humbly suggest that you repeat your question in the second week of January
2017, when most people will hopefully be back from their holidays.

Good luck in finding a good solution.

Groete / Greetings
Elardus Engelbrecht




Re: Recommendations for RECOVERY options

2016-12-29 Thread James Peddycord
NTAC:3NS-20

Thanks for the suggestion. I will repeat the question in a couple of
weeks.

Jim

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Elardus Engelbrecht
Sent: Thursday, December 29, 2016 8:14 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recommendations for RECOVERY options

James Peddycord wrote:

>In this forum with so many people who have so many opinions, I can't
believe that nobody is offering suggestions for the RECOVERY parameter
when asked.

Perhaps the people with that specific hardware combination are still
on holiday. Or someone who has the same combination as yours may
indeed have read your message, but has never experienced this specific
bad cable problem and so can't contribute.

Others like me are working on other things like RACF (myself), general
z/OS + JES2 + TSO/ISPF matters (also myself). There are programmers in
various languages also active here. Now and then you get someone with
problems with a specific product like DFDSS, DFSORT, CICS, DB2 etc.

Please note: I was previously responsible for storage (SMS/HSM) and
managing hardware (3380 and later 3390, 3490 tapes and Magstar, etc.). I
have some experience with ESCON, but not with Ficon, but I do read ALL
posts on IBM, except those from a madman earlier this month.

I humbly suggest that you repeat your question in the second week of
January 2017, when most people will hopefully be back from their holidays.

Good luck in finding a good solution.

Groete / Greetings
Elardus Engelbrecht



Re: Recommendations for RECOVERY options

2016-12-29 Thread Carmen Vitullo
I've gone through pain with new hardware, coming from 2 z/10 processors, with an 
ATL running 2G fiber and virtual tape and DASD running over 4G FICON, to 2 z/12's, 
a new DLM and Data Domain, and new 8870's, all running 8G FICON through new 
Brocade switches and 2 old patch panels. I've seen slowdowns, and I've seen 
channel paths go offline due to an excess of IOS errors. 
I never looked at the recovery options. Over many weeks, using a fiber test 
kit, I found the issues and replaced cables to and from the DASD, the switch 
and both patch panels. Most of my problems were due to bad ports on the patch 
panel, and fortunately I had spare ports. 
I worked the issue from the hardware standpoint, so I have no experience with 
changing the IECIOS recovery options for FICON channels. 
I'll be interested in what others have done in the same situation: SETIOS with 
new recovery options? Or, if the performance is not killing the system, work 
the problem with the hardware? 
Carmen 


- Original Message -

From: "Elardus Engelbrecht" <elardus.engelbre...@sita.co.za> 
To: IBM-MAIN@LISTSERV.UA.EDU 
Sent: Thursday, December 29, 2016 8:13:58 AM 
Subject: Re: Recommendations for RECOVERY options 

James Peddycord wrote: 

>In this forum with so many people who have so many opinions, I can't believe 
>that nobody is offering suggestions for the RECOVERY parameter when asked. 

Perhaps the people with that specific hardware combination are still on 
holiday. Or someone who has the same combination as yours may indeed have read 
your message, but has never experienced this specific bad cable problem and 
so can't contribute. 

Others like me are working on other things like RACF (myself), general z/OS + 
JES2 + TSO/ISPF matters (also myself). There are programmers in various 
languages also active here. Now and then you get someone with problems with a 
specific product like DFDSS, DFSORT, CICS, DB2 etc. 

Please note: I was previously responsible for storage (SMS/HSM) and managing 
hardware (3380 and later 3390, 3490 tapes and Magstar, etc.). I have some 
experience with ESCON, but not with Ficon, but I do read ALL posts on IBM, 
except those from a madman earlier this month. 

I humbly suggest that you repeat your question in the second week of January 
2017, when most people will hopefully be back from their holidays. 

Good luck in finding a good solution. 

Groete / Greetings 
Elardus Engelbrecht 



Re: Recommendations for RECOVERY options

2016-12-29 Thread Lizette Koehler
As pointed out, there may be people taking year-end holidays.

Also, the list is not required to answer any question asked.  They can, when
they have time, provide guidance.

If your question is urgent, then the best option is to contact the vendor for
the equipment or software you are having issues with.  They can provide the
support needed.  They provide the support that the list cannot do since we do
not have access to your shop, your configuration, or your details.

The list is more like a group of people sitting around chatting about the
mainframe and supporting functions.  

I would suggest you may wish to start a problem ticket with your vendor(s) on
this.


Issues with the z13 have been reported on other lists, so I would definitely
start with IBM and z13 support.  This may be another manifestation of microcode
issues.


Lizette


> -Original Message-
> From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
> Behalf Of James Peddycord
> Sent: Thursday, December 29, 2016 6:46 AM
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: Re: Recommendations for RECOVERY options
> 
> NTAC:3NS-20
> 
> Our SAN people said they saw no errors on the switch (do I believe this?
> IDK) , which is dedicated to the mainframe. The problem was a bad cable
> between the switch and the storage, which both mainframes share, so there was
> an issue on one CHPID on each system.
> 
> In this forum with so many people who have so many opinions, I can't believe
> that nobody is offering suggestions for the RECOVERY parameter when asked.
> 
> The default is:
> RECOVERY,PATH_SCOPE=DEVICE,PATH_INTERVAL=10,PATH_THRESHOLD=100
> This is what caused our pain.
> z/OS has to see 100 errors per minute for 10 minutes before taking the path
> off of a single device. This was happening to every device.
> 
> On our test system I started with:
> RECOVERY,PATH_SCOPE=CU,PATH_INTERVAL=5,PATH_THRESHOLD=50
> z/OS has to see 50 errors per minute for 5 minutes before taking the path off
> of every device in the LCU.
> 
> I was hoping to see some real world examples that work for others.
> 
> Jim
> 
> -Original Message-
> From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
> Behalf Of Alan (GMAIL) Watthey
> Sent: Wednesday, December 28, 2016 11:55 PM
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: Re: Recommendations for RECOVERY options
> 
> Jim,
> 
> As the network guy who looks after the SAN I would not be expecting my z/OS
> guys to do anything in this situation.  In fact z/OS cannot see our whole SAN
> as I have other things on it (eg. ISLs, backend tapes).
> Fortunately, the Brocade switches are dedicated to the mainframes and devices
> used by the mainframes here.  The Brocade will see issues that it recovers
> from well before z/OS sees anything.
> 
> Of course, I don't know exactly what your problem was and what your Brocade
> would have seen but I'd suggest the bottleneckmon command to detect latency
> issues.  Running the porterrshow command from time to time would also give you
> a good idea as to whether your SFPs and fibres are performing as well as they
> should.  If you have the Fabric Vision/Watch licenses then you do fancy stuff
> like fence errant ports before they impact anything.
> 
> 
> Regards,
> Alan Watthey
> -Original Message-
> From: James Peddycord [mailto:j...@ntrs.com]
> Sent: 28 December 2016 4:56 pm
> Subject: Recommendations for RECOVERY options
> 
> NTAC:3NS-20
> We had a situation with a bad cable that resulted in a huge performance impact
> due to the default way that z/OS (we are at 1.13) handles error recovery on
> Ficon paths.
> The symptoms were many (thousands) of IOS050I messages in the task's joblog,
> followed by an IOS450E message, which took the path offline to a single
> device.
> This was happening for every device (around 3000) that the affected path was
> attached to.
> As soon as I saw the messages I configured the CHPID offline and the problem
> stopped.
> We have put in automation that will immediately configure a CHPID offline as
> soon as a single IOS450E message is detected, and now I am experimenting with
> RECOVERY options.
> IBM recommended to set RECOVERY,PATH_SCOPE=CU, set the PATH_INTERVAL to
> 1 and leave PATH_THRESHOLD=10, and adjust from there.
> 
> Due to the paperwork involved with making any change in our environment, I
> would like to implement this with a minimum of 'adjustment'.
> 
> Does anyone have any recommendations?
> We are running on z13s, 16G Ficon through Brocade switches to IBM DS88xx DASD.
> 
> Thanks,
> Jim
> 



Re: Recommendations for RECOVERY options

2016-12-29 Thread Elardus Engelbrecht
James Peddycord wrote:

>In this forum with so many people who have so many opinions, I can't believe 
>that nobody is offering suggestions for the RECOVERY parameter when asked.

Perhaps the people with that specific hardware combination are still on 
holiday. Or someone who has the same combination as yours may indeed have read 
your message, but has never experienced this specific bad cable problem and 
so can't contribute.

Others like me are working on other things like RACF (myself), general z/OS + 
JES2 + TSO/ISPF matters (also myself). There are programmers in various 
languages also active here. Now and then you get someone with problems with a 
specific product like DFDSS, DFSORT, CICS, DB2 etc. 

Please note: I was previously responsible for storage (SMS/HSM) and managing 
hardware (3380 and later 3390, 3490 tapes and Magstar, etc.). I have some 
experience with ESCON, but not with Ficon, but I do read ALL posts on IBM, 
except those from a madman earlier this month.

I humbly suggest that you repeat your question in the second week of January 
2017, when most people will hopefully be back from their holidays.

Good luck in finding a good solution. 

Groete / Greetings
Elardus Engelbrecht



Re: Recommendations for RECOVERY options

2016-12-29 Thread James Peddycord
NTAC:3NS-20

Our SAN people said they saw no errors on the switch (do I believe this?
IDK) , which is dedicated to the mainframe. The problem was a bad cable
between the switch and the storage, which both mainframes share, so
there was an issue on one CHPID on each system.

In this forum with so many people who have so many opinions, I can't
believe that nobody is offering suggestions for the RECOVERY parameter
when asked.

The default is:
RECOVERY,PATH_SCOPE=DEVICE,PATH_INTERVAL=10,PATH_THRESHOLD=100
This is what caused our pain.
z/OS has to see 100 errors per minute for 10 minutes before taking the
path off of a single device. This was happening to every device.

On our test system I started with:
RECOVERY,PATH_SCOPE=CU,PATH_INTERVAL=5,PATH_THRESHOLD=50
z/OS has to see 50 errors per minute for 5 minutes before taking the
path off of every device in the LCU.
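
To compare settings like the two above without touching a live system, the reading given in this message ("N errors per minute for M minutes") can be modelled in a few lines of Python. This is only a sketch of that reading, not of what IOS actually does internally; the function name is made up, and whether the real comparison is "at least" or "more than" the threshold is an assumption.

```python
def path_should_be_removed(errors_per_minute, interval, threshold):
    """Model of the reading in this thread, not of the real IOS algorithm.

    errors_per_minute: list of error counts, one per elapsed minute.
    Returns True if each of the last `interval` minutes saw at least
    `threshold` errors (the "at least" comparison is an assumption).
    """
    if interval < 1:
        raise ValueError("interval must be at least 1 minute")
    if len(errors_per_minute) < interval:
        return False  # not enough history yet
    recent = errors_per_minute[-interval:]
    return all(count >= threshold for count in recent)


# Hypothetical error burst: 120 errors in each of five consecutive minutes.
burst = [120] * 5
print(path_should_be_removed(burst, interval=5, threshold=50))
print(path_should_be_removed(burst, interval=10, threshold=100))
```

Under this model the five-minute burst is enough to trip the test-system values (5 and 50) but not the values quoted as the default (10 and 100), which need ten full minutes of history.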

I was hoping to see some real world examples that work for others.

Jim

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Alan (GMAIL) Watthey
Sent: Wednesday, December 28, 2016 11:55 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recommendations for RECOVERY options

Jim,

As the network guy who looks after the SAN I would not be expecting my
z/OS guys to do anything in this situation.  In fact z/OS cannot see our
whole SAN as I have other things on it (eg. ISLs, backend tapes).
Fortunately, the Brocade switches are dedicated to the mainframes and
devices used by the mainframes here.  The Brocade will see issues that
it recovers from well before z/OS sees anything.

Of course, I don't know exactly what your problem was and what your
Brocade would have seen but I'd suggest the bottleneckmon command to
detect latency issues.  Running the porterrshow command from time to
time would also give you a good idea as to whether your SFPs and fibres
are performing as well as they should.  If you have the Fabric
Vision/Watch licenses then you do fancy stuff like fence errant ports
before they impact anything.


Regards,
Alan Watthey
-Original Message-
From: James Peddycord [mailto:j...@ntrs.com]
Sent: 28 December 2016 4:56 pm
Subject: Recommendations for RECOVERY options

NTAC:3NS-20
We had a situation with a bad cable that resulted in a huge performance
impact due to the default way that z/OS (we are at 1.13) handles error
recovery on Ficon paths.
The symptoms were many (thousands) of IOS050I messages in the task's
joblog, followed by an IOS450E message, which took the path offline to a
single device.
This was happening for every device (around 3000) that the affected path
was attached to.
As soon as I saw the messages I configured the CHPID offline and the
problem stopped.
We have put in automation that will immediately configure a CHPID
offline as soon as a single IOS450E message is detected, and now I am
experimenting with RECOVERY options.
IBM recommended to set RECOVERY,PATH_SCOPE=CU, set the PATH_INTERVAL to
1 and leave PATH_THRESHOLD=10, and adjust from there.

Due to the paperwork involved with making any change in our environment,
I would like to implement this with a minimum of 'adjustment'.

Does anyone have any recommendations?
We are running on z13s, 16G Ficon through Brocade switches to IBM DS88xx
DASD.

Thanks,
Jim




Re: Recommendations for RECOVERY options

2016-12-28 Thread Alan (GMAIL) Watthey
Jim,

As the network guy who looks after the SAN I would not be expecting my z/OS
guys to do anything in this situation.  In fact z/OS cannot see our whole
SAN as I have other things on it (eg. ISLs, backend tapes).  Fortunately,
the Brocade switches are dedicated to the mainframes and devices used by the
mainframes here.  The Brocade will see issues that it recovers from well
before z/OS sees anything.

Of course, I don't know exactly what your problem was and what your Brocade
would have seen but I'd suggest the bottleneckmon command to detect latency
issues.  Running the porterrshow command from time to time would also give
you a good idea as to whether your SFPs and fibres are performing as well as
they should.  If you have the Fabric Vision/Watch licenses then you do fancy
stuff like fence errant ports before they impact anything.


Regards,
Alan Watthey
-Original Message-
From: James Peddycord [mailto:j...@ntrs.com] 
Sent: 28 December 2016 4:56 pm
Subject: Recommendations for RECOVERY options

NTAC:3NS-20
We had a situation with a bad cable that resulted in a huge performance
impact due to the default way that z/OS (we are at 1.13) handles error
recovery on Ficon paths.
The symptoms were many (thousands) of IOS050I messages in the task's joblog,
followed by an IOS450E message, which took the path offline to a single
device.
This was happening for every device (around 3000) that the affected path was
attached to.
As soon as I saw the messages I configured the CHPID offline and the problem
stopped.
We have put in automation that will immediately configure a CHPID offline as
soon as a single IOS450E message is detected, and now I am experimenting
with RECOVERY options.
IBM recommended to set RECOVERY,PATH_SCOPE=CU, set the PATH_INTERVAL to 1
and leave PATH_THRESHOLD=10, and adjust from there.
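
The automation described above (configure the CHPID offline on the first IOS450E) could be sketched roughly as follows in Python. This is an illustration only and not the automation actually in use here: the assumed message layout (device number, then CHPID, separated by a comma), the sample text and the function name are all assumptions, and a real shop would do this in its automation product rather than in a script. CF CHP(xx),OFFLINE is the operator command meant by "configured the CHPID offline".

```python
import re

# Assumed layout: message id, 4-digit hex device number, comma, 2-digit hex CHPID.
IOS450E = re.compile(r"\bIOS450E\s+([0-9A-Fa-f]{4}),([0-9A-Fa-f]{2})\b")


def chpid_offline_commands(log_lines):
    """Return one CONFIG command per distinct CHPID named in an IOS450E line."""
    seen = []
    for line in log_lines:
        match = IOS450E.search(line)
        if match:
            chpid = match.group(2).upper()
            if chpid not in seen:  # react once per CHPID, not once per device
                seen.append(chpid)
    return [f"CF CHP({chpid}),OFFLINE" for chpid in seen]


# Hypothetical sample lines; the text after the ids is illustrative only.
sample = [
    "IOS050I CHANNEL DETECTED ERROR ON 1A00,5C",
    "IOS450E 1A00,5C, PATH TAKEN OFFLINE",
    "IOS450E 1A01,5C, PATH TAKEN OFFLINE",
]
print(chpid_offline_commands(sample))
```

Deduplicating by CHPID matters here because the same path was failing for around 3000 devices, and one command per CHPID is all that is wanted.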

Due to the paperwork involved with making any change in our environment, I
would like to implement this with a minimum of 'adjustment'.

Does anyone have any recommendations?
We are running on z13s, 16G Ficon through Brocade switches to IBM DS88xx
DASD.

Thanks,
Jim




Re: Recommendations for RECOVERY options

2016-12-28 Thread James Peddycord
NTAC:3NS-20

The performance impact was slow I/O to that group of devices, one at a
time, until the path was taken offline. To the users, transactions that
usually take less than a second were taking in some cases minutes. Bad
enough to be considered an 'outage' from the user's perspective.
No issue with console flooding. We have been suppressing IOS050I from
the console for a long time.

Jim

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Jousma, David
Sent: Wednesday, December 28, 2016 8:03 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recommendations for RECOVERY options

What specifically was the performance impact?  The loss of the ficon
channel and reduced i/o bandwidth?  Or was it the console message
flooding?  If the latter, implementing Message Flood automation will
stop the flooding of messages.  It is pretty easy to implement.

Dave

_
Dave Jousma
Manager Mainframe Engineering, Assistant Vice President
david.jou...@53.com
1830 East Paris, Grand Rapids, MI  49546 MD RSCB2H p 616.653.8429 f
616.653.2717


-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of James Peddycord
Sent: Wednesday, December 28, 2016 8:56 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Recommendations for RECOVERY options

NTAC:3NS-20
We had a situation with a bad cable that resulted in a huge performance
impact due to the default way that z/OS (we are at 1.13) handles error
recovery on Ficon paths.
The symptoms were many (thousands) of IOS050I messages in the task's
joblog, followed by an IOS450E message, which took the path offline to a
single device.
This was happening for every device (around 3000) that the affected path
was attached to.
As soon as I saw the messages I configured the CHPID offline and the
problem stopped.
We have put in automation that will immediately configure a CHPID
offline as soon as a single IOS450E message is detected, and now I am
experimenting with RECOVERY options.
IBM recommended to set RECOVERY,PATH_SCOPE=CU, set the PATH_INTERVAL to
1 and leave PATH_THRESHOLD=10, and adjust from there.

Due to the paperwork involved with making any change in our environment,
I would like to implement this with a minimum of 'adjustment'.

Does anyone have any recommendations?
We are running on z13s, 16G Ficon through Brocade switches to IBM DS88xx
DASD.

Thanks,
Jim



This e-mail transmission contains information that is confidential and
may be privileged.   It is intended only for the addressee(s) named
above. If you receive this e-mail in error, please do not read, copy or
disseminate it in any manner. If you are not the intended recipient, any
disclosure, copying, distribution or use of the contents of this
information is prohibited. Please reply to the message immediately by
informing the sender that the message was misdirected. After replying,
please erase it from your computer system. Your assistance in correcting
this error is appreciated.



Re: Recommendations for RECOVERY options

2016-12-28 Thread Jousma, David
What specifically was the performance impact?  The loss of the ficon channel 
and reduced i/o bandwidth?  Or was it the console message flooding?  If the 
latter, implementing Message Flood automation will stop the flooding of 
messages.  It is pretty easy to implement.

Dave

_
Dave Jousma
Manager Mainframe Engineering, Assistant Vice President
david.jou...@53.com
1830 East Paris, Grand Rapids, MI  49546 MD RSCB2H
p 616.653.8429
f 616.653.2717


-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of James Peddycord
Sent: Wednesday, December 28, 2016 8:56 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Recommendations for RECOVERY options

NTAC:3NS-20
We had a situation with a bad cable that resulted in a huge performance impact 
due to the default way that z/OS (we are at 1.13) handles error recovery on 
Ficon paths.
The symptoms were many (thousands) of IOS050I messages in the task's joblog, 
followed by an IOS450E message, which took the path offline to a single device.
This was happening for every device (around 3000) that the affected path was 
attached to.
As soon as I saw the messages I configured the CHPID offline and the problem 
stopped.
We have put in automation that will immediately configure a CHPID offline as 
soon as a single IOS450E message is detected, and now I am experimenting with 
RECOVERY options.
IBM recommended to set RECOVERY,PATH_SCOPE=CU, set the PATH_INTERVAL to 1 and 
leave PATH_THRESHOLD=10, and adjust from there.

Due to the paperwork involved with making any change in our environment, I 
would like to implement this with a minimum of 'adjustment'.

Does anyone have any recommendations?
We are running on z13s, 16G Ficon through Brocade switches to IBM DS88xx DASD.

Thanks,
Jim

