Barbara,

"Thank You!" for the heads-up. I'll be on that migration path in the next 
twelve months or so.

Bob


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@bama.ua.edu] On Behalf Of 
Barbara Nitz
Sent: Tuesday, October 11, 2011 11:53 PM
To: IBM-MAIN@bama.ua.edu
Subject: maxsystem in a sysplex - belated heads-up

When we migrated from a z9 to a z196 - big-bang swap - we had a lot of trouble 
getting one sysplex up. The first system came up fine, but the second or third 
system could not be IPL'd into that plex. It ended up in a GRS (!) wait0A3
rsn9C. The accompanying ISG message would only have been readable in a
standalone dump, but never on a real console. My colleagues had not taken an
SADUMP. This was on z/OS 1.10.

It turned out that there was nothing whatsoever wrong with the GRS lock 
structure. There was nothing wrong with GRS *at all*. The cause of this was an 
IBM design change in z/OS 1.9, where IBM unilaterally decided to give up on the 
concept of maxsystem determining how many systems can join a sysplex. Our 
sysplex CDS was formatted with maxsystem(5) (because there used to be 5 systems 
in that sysplex - 2 of them gone for more than a year, and both of them 
occupying the first two slots in the sysplex CDS). The CFRM CDS had to get 
reformatted for the big-bang replacement, and it got formatted with 
maxsystem(3), which reflected the true capacity of the sysplex.

Well, the capacity of *that* sysplex was exactly one: every other system would
get a 'CFRM CDS unusable' because the sysplex CDS had a higher maxsystem value.
Yet it was clearly visible that the incoming system *had* established
signalling connectivity with the system already in the sysplex, which it could
only do by successfully (!!!) reading the CFRM CDS to get at the names of the
signalling structures. In addition, both the reply 'I' to 're-initialize the
sysplex' when the first system was IPL'd and the accompanying explanation in
the docs that everything in the sysplex will be treated as residual are wrong.
*Nothing* is treated as residual.

The 1.12 documentation for the maxsystem parm (right, *everybody* looks at that
book *every* time a canned job that has existed since the dawn of sysplex is
submitted) says this:
"When formatting the couple data set to contain the CFRM policy data, ensure
that the value you specify for MAXSYSTEM matches the value for MAXSYSTEM that
was used when the sysplex couple data set was formatted. When coupling facility
structures are used for XCF signaling, if the MAXSYSTEM value specified for the
CFRM couple data set is less than that of the sysplex couple data set, systems
might not be able to join the sysplex. For example, if MAXSYSTEM=16 is
specified for the sysplex couple data set and MAXSYSTEM=8 is specified for the
CFRM couple data set, then only eight systems will be allowed in the sysplex."
This clearly implies that the lower of all the maxsystem values is the capacity
of the sysplex. IT IS NOT. It is unpredictable, especially if your sysplex CDS
is so old that it still has systems in it that are long gone. Those will be
preserved along with all the junk that might once have been in the sysplex.
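
The documented rule quoted above (same MAXSYSTEM on the sysplex CDS and the
CFRM CDS) is easy to check before a canned format job is submitted. What
follows is only a minimal sketch in Python, assuming the format utility
control statements are available as plain text. The function names, the
sysplex name PLEX1 and the abbreviated sample statements are hypothetical and
not taken from the post; only the maxsystem values 5 and 3 are.

```python
import re

# Matches MAXSYSTEM(n) in format utility control statements (case-insensitive).
MAXSYSTEM_RE = re.compile(r"MAXSYSTEM\s*\(\s*(\d+)\s*\)", re.IGNORECASE)


def maxsystem_of(control_statements):
    """Return the first MAXSYSTEM value found in the text, or None if absent."""
    match = MAXSYSTEM_RE.search(control_statements)
    return int(match.group(1)) if match else None


def mismatched(decks):
    """Given {label: control statement text}, return {label: value} when the
    MAXSYSTEM values are not all identical, otherwise an empty dict."""
    values = {label: maxsystem_of(text) for label, text in decks.items()}
    return values if len(set(values.values())) > 1 else {}


# Hypothetical, abbreviated control statements using the values from the post.
sample = {
    "SYSPLEX": "DEFINEDS SYSPLEX(PLEX1) MAXSYSTEM(5) DATA TYPE(SYSPLEX)",
    "CFRM": "DEFINEDS SYSPLEX(PLEX1) MAXSYSTEM(3) DATA TYPE(CFRM)",
}

if __name__ == "__main__":
    problems = mismatched(sample)
    if problems:
        print("MAXSYSTEM mismatch:", problems)
```

Such a check only catches differing values in the format jobs. It cannot see
stale system slots in an existing sysplex CDS, which is the other half of the
problem described here.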

IBM told me (and I give a big thanks to the lady who actually went beyond the
canned answer I first got and *looked* into this despite the fact that all I
had in terms of docs was a syslog) that all of this is broken as designed:

"Cluster MR support introduced the requirement to preserve information about
manageable resources (which includes sysplexes, systems, CF's, structures and
connectors) across a sysplex-wide IPL. To XCF, a sysplex, system, etc. that
terminates and is reIPLed is an entirely new entity. To the MR infrastructure,
however, it is the same entity transitioning between different states.
Therefore, XCF as of V1R9 needs to preserve and reuse system slots whenever
possible, regardless of the reply "I" to IXC405D, as we see." and
"We agree that the design change in R9 does make the actual system capacity 
unpredictable. Also, the MAXSYSTEM write up in Setting up a Sysplex needs to be 
cleaned up. Thus, a documentation update is in order."

The unpredictability of maxsystem is apparently addressed in 1.12 by 'fixing'
SUG APAR OA27634, which makes the GRS wait state message visible by spitting
out a readable message instead of wait-stating the system.

So, if you have old and long gone systems in your sysplex CDS, happen to get a 
lower maxsystem value in your CFRM CDS and end up in a wait0A3 - delete and 
reformat your sysplex CDS. That will 'fix' the problem that has nothing 
whatsoever to do with GRS. Do NOT be fooled by the fact that signalling is 
established and the CFRM CDS was usable for *that* - such inconsistencies we 
are not supposed to care about. 

In any case, we will never run into this again. A permanent part of our DR
setup is now to always delete both the sysplex CDS and the CFRM CDS and
redefine them freshly in order to avoid this unpredictability.

Barbara Nitz
