Bill,

Thanks a lot for the explanation.

Gil.

On 11/28/05, Bill Neiman <[EMAIL PROTECTED]> wrote:
>
> Gil,
>
> When any system detects a permanent I/O error during an attempt to
> access a couple data set, it initiates removal of that CDS from service.
> The removal protocol involves notifying all other systems of the error by
> XCF signal, which causes each of the other systems to remove the CDS from
> service as well. Although you say you lost connectivity between your
> sites, it must have been the case that signalling connectivity still
> existed between them. Otherwise, MVSA could not have reacted to the loss
> of the primary sysplex CDS detected by MVSB. The existence of signalling
> connectivity created a race condition, in which MVSA and MVSB were
> competing to detect and report the loss of access to the CDS at their
> respective sites. MVSB won the race, detecting and signalling the loss of
> the primary CDS before MVSA detected loss of the alternate. MVSA got
> MVSB's signal, initiated removal of the primary, and then detected the
> inaccessibility of the alternate. In that situation, with only one CDS
> remaining, MVSA wait states but does not signal loss of the remaining CDS,
> in the hope that its access problem is only a local issue (which it was).
> MVSB therefore remained alive, because it was still able to use the
> alternate CDS.
>
> The CDS removal protocol requires that each system acknowledge the
> removal signals sent by each other system. MVSA apparently died before
> acknowledging one of MVSB's signals, so MVSB was unable to complete
> removal of the primary CDS. Hence the IXC256A message. I'm not sure why
> a D R,R failed to display the outstanding message, since IXC256A is issued
> with descriptor code 11. Our usual recommendation is that either (1) the
> installation maintain a console defined with DEL(RD) and routecode and
> level attributes that collect action and eventual action messages,
> and / or (2) automate IXC256A.
>
> In the 7-1 case, the same race condition would exist. If the "1"
> system detected and signalled the loss of one CDS before any of the "7"
> systems detected and signalled the loss of the other, you'd wind up with 7
> systems down and 1 up but hung waiting for the resolution of IXC256A.
>
> To resolve IXC256A in this situation, it is necessary to partition
> the (wait-stated) systems named in it out of the sysplex. Since a
> permanent error involving the sysplex CDS is in progress, this would
> require the FORCE form of the V XCF command (V XCF,sysname,OFF,FORCE).
> This response is documented with IXC256A.
>
> Bill Neiman
> z/OS Development
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
> Search the archives at http://bama.ua.edu/archives/ibm-main.html
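
For anyone following along, the race Bill describes can be sketched as a toy model. This is only an illustration of the reasoning in the message, not XCF code: the `System` class, the `simulate` function, and the `detect_time` values are all hypothetical, and treating every wait-stated system as one that never acknowledged the removal signal is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """Hypothetical stand-in for one sysplex member (not a real z/OS structure)."""
    name: str
    lost_cds: str        # the CDS this system can no longer reach
    detect_time: float   # when it notices the permanent I/O error (made-up units)
    in_service: set = field(default_factory=lambda: {"primary", "alternate"})
    state: str = "active"


def simulate(systems):
    """Replay detections in time order.

    Returns the names of systems that end up in a wait state. In this toy
    model those are the systems a survivor would still be waiting on
    (the situation reported by IXC256A in the message above).
    """
    waiting = []
    for s in sorted(systems, key=lambda x: x.detect_time):
        if s.lost_cds not in s.in_service:
            # A peer's signal already removed this CDS; nothing to detect.
            continue
        if len(s.in_service) == 1:
            # The last remaining CDS is unreachable: wait-state, send no signal.
            s.state = "wait"
            waiting.append(s.name)
            continue
        # Winner of the race: every system removes the failed CDS from service.
        for peer in systems:
            peer.in_service.discard(s.lost_cds)
    return waiting


# Two-system case from the message: MVSB detects loss of the primary first.
mvsa = System("MVSA", lost_cds="alternate", detect_time=2.0)
mvsb = System("MVSB", lost_cds="primary", detect_time=1.0)
pending = simulate([mvsa, mvsb])
print(pending, mvsa.state, mvsb.state)
```

With these made-up timings MVSB wins the race, so MVSA is the one left in a wait state while MVSB stays active on the alternate CDS. Swapping the two `detect_time` values reverses the outcome, and building seven systems that lose one CDS plus one system that loses the other reproduces the 7-1 case.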