>As a result, MVSB (running in site B) lost its connectivity to the
>primary SYSPLEX couple data set residing on dasd in site A, and issued the
>following message: IXC253I
>
>The above message was then issued by MVSA as well. Sadly enough, our
>alternate SYSPLEX couple data set resides on dasd in site B. So MVSA had
>no connectivity to it, which led to a Disabled Wait 0A2 RC 20 in MVSA.
When you lose the primary sysplex CDS, each MVS system will independently
attempt to switch to the alternate sysplex CDS. MVSB could connect since
the alternate was in site B; MVSA could not and entered the wait state
without further ado to preserve data integrity.

>After that, MVSB issued the following message:
>IXC256A REMOVAL OF PRIMARY COUPLE DATA SET XCF.COUPLE01 FOR SYSPLEX
>CANNOT COMPLETE UNTIL THE FOLLOWING SYSTEM(S) ACKNOWLEDGE THE REMOVAL:
>MVSA
>Of course, MVSA could never acknowledge since it was in a disabled wait.
And they could not communicate with each other. Does 'lost communication'
also mean that all (signalling) CTCs and all signalling structures were
lost? Answering my own question: Probably yes, since otherwise MVSB would
have known that MVSA had wait stated itself.

>IXC256A rolled off the MVSB console (which was in DEL=R mode), so by the
>time I got to the console I couldn't see it and didn't know it was issued.
>At MVSB's console, I issued a D R,R and didn't see anything.
Since IXC256A does not ask for a reply, D R,R would not show it. Do you use
AMRF? This is a critical eventual action (CE) message, and if AMRF is used,
it would have been retained.

>After I saw why MVSA entered the wait, I issued D XCF,C at MVSB's console
>and never got a response.
Strange. I think you should have seen that the system was in the middle of
the CDS switch.

>Eventually we IPLed both MVSB and MVSA because it seemed like MVSB was
>hung...
Yes, it would. All XCF requests get delayed. And you wouldn't believe what
needs XCF these days.

>I realize there were many mistakes done along the way here, my question
>is, how could I know that IXC256A was issued if it rolled off the console
>(TSO/E was hung too)?? If I knew it was issued, I would issue a V
>XCF,MVSA,OFFLINE,FORCE and let MVSB complete its couple data set switch...

Have automation in place that, on IXC256A, does the following:
1. Set all consoles to DEL=RD
2. Re-issue the message in red
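
To illustrate the two steps above, here is a minimal sketch in Python of the
matching logic only. It is not the syntax of any real automation product:
the function name actions_for and the action strings SET_ALL_CONSOLES and
REISSUE_HIGHLIGHTED are hypothetical placeholders that you would map onto
whatever your automation tool actually provides.

```python
import re

# Matches the message ID as a whole word anywhere in a console line.
IXC256A = re.compile(r"\bIXC256A\b")


def actions_for(message: str) -> list[str]:
    """Return the automation steps to take for one console message line.

    The action strings are hypothetical placeholders, not real commands.
    """
    if not IXC256A.search(message):
        return []  # not the message we care about: do nothing
    return [
        # Step 1: stop the message from rolling off the screen.
        "SET_ALL_CONSOLES DEL=RD",
        # Step 2: repeat the message text so it stands out (e.g. in red).
        "REISSUE_HIGHLIGHTED " + message.strip(),
    ]
```

Feeding it the IXC256A line from the log yields both actions; any other
message, such as IXC253I, yields an empty list.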

>Also, I don't understand the logic here. MVSA had access to the primary,
>but not to the alternate. MVSB had access to the alternate, but not to the
>primary. Still, MVSA disabled wait and MVSB stayed up, hung until MVSA
>cleanup...
Integrity requires that all members of the sysplex use the same primary
CDS. Without seeing the actual messages, it is unclear to me whether MVSA
issued the IXC253I for the alternate CDS because it couldn't get to it.
That would mean that sysplex communication was still going on via CF or
CTC, so MVSA had seen MVSB's request to switch to the alternate (that would
get communicated across the sysplex), but due to a later loss of
communication couldn't tell MVSB that it would wait state itself.

>In either case, I ended up with half a sysplex in a disabled wait and half
>hung. Which got me thinking... what if there were 7 systems on site A and
>only 1 system on site B?? would z/OS logic still be to enter 7 systems
>into a disabled wait instead of only the 1 system that lost access to the
>primary???
I believe that without SFM, yes, that would be the logic, on the premise
that all systems have to have the same primary and alternate CDS. But I
could be wrong on this.

>Basically you can say we learned the true value of SFM. Had we been using
>it, it would probably prevent the hang in MVSB, because it would clean up
>the mess left by MVSA after it entered the disabled wait. Would SFM also
>help in the 7-1 case??
SFM would have helped only in the sense that it would have detected that
MVSA was not updating its status anymore (since it was wait stated).
Depending on policy, you would have either gotten IXC402D or an automatic
removal from the sysplex.
I don't think that the 7-1 issue would be addressed by SFM at all, as SFM
weights are for 'status update missing' conditions, not for loss of
connectivity to the sysplex CDS due to I/O error.

Best regards, Barbara Nitz
