Re: maxsystem in a sysplex - belated heads-up
I have spoken with the reviewer to clarify the request, and requirement MR1026112735 has now been updated to a status of Recognized. This means that IBM concurs that it is a desirable function but makes no explicit commitment as to when or whether it can be incorporated into the product. I'll work with the planners to see if we can get it done. Bill Neiman Parallel Sysplex Development, IBM -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN
Re: maxsystem in a sysplex - belated heads-up
Barbara Nitz nitz-...@gmx.net wrote in message news:9193446080894951.wa.nitzibmgmx@bama.ua.edu...

>> Barbara - I've asked our account team to add us as concurring with the requirement and to be added to the Interested Parties list.
>
> The requirement has been rejected with this:
>
> User Group Number - MR1026112735
> Title - Provide a utility to remove residual systems from the sysplex CDS
> IBM believes that the request described has been solved with a current product / service / policy / etc. Please see apar OA37776
>
> That apar is closed DOC and describes the current behaviour. It does not 'solve' anything. Why am I not surprised?
>
> So to repeat the warning: If you have ever terminated any lpar and gotten rid of it never to IPL it again, make sure you do a sysplex IPL on freshly formatted sysplex CDSs. Otherwise you will need to increase MAXSYSTEM to higher and higher values on all your CDSs just to keep old junk in the sysplex CDS around.
>
> Barbara

Yes, sounds contradictory: IBM recommending a sysplex-wide IPL. What was a sysplex intended for in the first place?

They did solve a similar issue with RRS, where you could not get rid of your Archive logstream without an RRS cold start, until they decided to develop functionality to drop and remove the logstream dynamically.

Kees.
Re: maxsystem in a sysplex - belated heads-up
> Yes, sounds contradictory: IBM recommending a sysplex-wide IPL. What was a sysplex intended for in the first place?

I didn't read that *IBM* recommends a sysplex-wide IPL. *I* do, to get rid of the junk that IBM accumulates in the sysplex CDS. I believe IBM frowns upon sysplex-wide IPLs. On the other hand, they are unwilling to provide a utility that lets you do cleanup.

> They did solve a similar issue with RRS, where you could not get rid of your Archive logstream without an RRS cold start, until they decided to develop functionality to drop and remove the logstream dynamically.

Different components with wildly different attitudes. 'nuff said.

Barbara
Re: maxsystem in a sysplex - belated heads-up
> Barbara - I've asked our account team to add us as concurring with the requirement and to be added to the Interested Parties list.

The requirement has been rejected with this:

User Group Number - MR1026112735
Title - Provide a utility to remove residual systems from the sysplex CDS
IBM believes that the request described has been solved with a current product / service / policy / etc. Please see apar OA37776

That apar is closed DOC and describes the current behaviour. It does not 'solve' anything. Why am I not surprised?

So to repeat the warning: If you have ever terminated any lpar and gotten rid of it never to IPL it again, make sure you do a sysplex IPL on freshly formatted sysplex CDSs. Otherwise you will need to increase MAXSYSTEM to higher and higher values on all your CDSs just to keep old junk in the sysplex CDS around.

Barbara
Re: maxsystem in a sysplex - belated heads-up
Barbara - I've asked our account team to add us as concurring with the requirement and to be added to the Interested Parties list.

Jerry Whitteridge
Design Engineer
Safeway Inc.
925 951 4184
If you feel in control you just aren't going fast enough.

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@bama.ua.edu] On Behalf Of Barbara Nitz
Sent: Wednesday, October 26, 2011 9:08 PM
To: IBM-MAIN@bama.ua.edu
Subject: Re: maxsystem in a sysplex - belated heads-up

To finish this thread: There is now marketing requirement MR1026112735 that describes the need for a cleanup utility and asks for a way to really re-initialize the sysplex on the ixc405 message.

Barbara

-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO Search the archives at http://bama.ua.edu/archives/ibm-main.html
Re: maxsystem in a sysplex - belated heads-up
To finish this thread: There is now marketing requirement MR1026112735 that describes the need for a cleanup utility and asks for a way to really re-initialize the sysplex on the ixc405 message.

Barbara
Re: maxsystem in a sysplex - belated heads-up
Bill and Mark,

I am aware of the chicken-and-egg problem. It was explained to me back in the nineties when I first learned that CTCs were not required anymore to establish signalling, because it was the first thing I asked about.

IBM obviously does not get why I am saying that XCF lies to me/us. My problem is not that limited interface, the problem is that a customer sees that 'signalling is established', which in turn says that the CFRM CDS is usable. And *then* the system wait states saying that the CFRM CDS is NOT usable. That certainly did NOT make any sense to my colleagues that night (*if* they had been able to see the wait state message from GRS, which they could not because it disappears before a human can even see that there is a message. That is supposedly fixed in 1.12.)

So what's left is that a design change in XCF results in a GRS wait state saying that the GRS lock structure is unusable. That makes even less sense in such a situation, since one system is already using the GRS structure. And that was all the external information they had. It certainly does not give any hint to the true cause of the problem, much less point the way in the right direction.

From a customer perspective, all I care about in such a situation is getting my sysplex back up, and that is almost impossible with the contradicting information I get. I don't really care what component is 'at fault'. I am not looking for fault, I am looking to fix things.

But thanks for clarifying that this can happen with any CDS, not just the CFRM CDS.

Barbara
Re: maxsystem in a sysplex - belated heads-up
To clarify some technical issues that have been discussed in this thread:

On the second and subsequent systems to join the sysplex, XCF signaling has a chicken-and-egg problem when it attempts to determine what signaling connectivity exists with other systems. It may be that the only signaling paths available are via XCF signaling structures, but it must perform this evaluation at a point in IPL *before* the CFRM couple data set has been brought into use. It therefore does not use the CFRM CDS in the normal manner. It is permitted to perform a limited subset of the operations normally available, viz., unserialized reads of individual subrecords of the CFRM active policy, to determine what structures exist. Unserialized reads are permitted specifically because they do not depend on the MAXSYSTEM parameter with which the CDS was formatted, whereas serialized operations do depend on MAXSYSTEM. (The number of systems for which the CDS was formatted determines the number of lock blocks that are used to ensure that operations initiated by one system do not collide with operations initiated by another.) XCF is not lying to you when it establishes connectivity via signaling structures but later rejects the CFRM CDS because it is formatted for too few systems.

XCF does not and never will wait state simply because it is unable to use a function CDS (i.e., a CDS of type other than sysplex). In the specific case of the CFRM CDS, XCF cannot determine that the installation is trying to IPL in GRS star mode, nor does XCF even understand that the CFRM CDS is required for star mode. For other CDS types, it is perfectly possible for a system to continue running if it doesn't have access to that CDS. I don't think it would be popular if XCF wait stated because the CFRM CDS was formatted for too few systems when the plex was running in ring mode, or if it wait stated because the n'th system into the plex couldn't use the ARM (or SFM or Logger or WLM or BPX) CDS. Instead, we should - and do - permit the system to IPL into the plex, report that it can't use the function, and allow the installation to take corrective action by making a larger CDS available.

The change to retain information about inactive systems in the sysplex CDS was implemented in support of a business resilience function that provided infrastructure for automated management of sysplex resources. That structure required the ability to retain sysplex-related information even across sysplex-wide IPLs. Because of the retention requirement, I doubt that we would implement a function to automatically purge inactive systems after some period. We would have the same problem we always have when trying to choose a threshold - whatever we chose would be too short for some installations, too long for others, and not correct for anyone. We'd have to introduce yet another parameter for customers to manage, to control the retention period.

Bill Neiman
Parallel Sysplex development
IBM
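The combination of retained slots and a smaller function CDS can be illustrated with a toy model in Python. It is only a sketch under assumptions, not XCF's actual algorithm: it assumes that a joining system takes the first free slot in the sysplex CDS, that inactive systems keep their slots, and that a system can use a function CDS only if its slot number does not exceed the MAXSYSTEM value that CDS was formatted with. Those rules are inferred from the scenario in Barbara's original post (sysplex CDS with MAXSYSTEM(5), two long-gone systems in the first two slots, CFRM CDS with MAXSYSTEM(3)). The function names and system names are made up for illustration.

```python
def assign_slot(slots, name):
    """Return the 1-based slot for `name`, reusing a retained slot if present.

    `slots` has one entry per slot in the sysplex CDS (its length stands in
    for MAXSYSTEM). Assumption: entries of inactive systems are retained.
    """
    if name in slots:
        return slots.index(name) + 1
    for i, owner in enumerate(slots):
        if owner is None:
            slots[i] = name
            return i + 1
    raise RuntimeError("sysplex CDS full: no free slot for " + name)


def can_use_function_cds(slot, function_maxsystem):
    """Assumed rule: the slot number must fit within the function CDS's MAXSYSTEM."""
    return slot <= function_maxsystem


def simulate(sysplex_slots, function_maxsystem, ipl_order):
    """Map each IPLing system to (slot, function CDS usable?)."""
    slots = list(sysplex_slots)  # work on a copy
    result = {}
    for name in ipl_order:
        slot = assign_slot(slots, name)
        result[name] = (slot, can_use_function_cds(slot, function_maxsystem))
    return result


# Two long-gone systems (hypothetical names) still hold slots 1 and 2.
outcome = simulate(["OLD1", "OLD2", None, None, None], 3, ["SYSA", "SYSB", "SYSC"])
for name, (slot, usable) in outcome.items():
    print(name, "slot", slot, "CFRM CDS usable" if usable else "CFRM CDS NOT usable")
```

Under these assumptions only the first system to IPL lands in a slot the CFRM CDS covers, which matches the reported effective capacity of one.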
Re: maxsystem in a sysplex - belated heads-up
Hi,

I would also point out that from an XCF design perspective, it is perfectly acceptable to have a function CDS be accessible from a subset of the systems. In practice, the requirement imposed by the exploiters is that the function CDS be accessible from all systems that require the function. For many exploiters, that is tantamount to requiring that the function CDS be accessible from every system in the sysplex. But it is not a requirement that XCF imposes.

The XCF signalling service does NOT lie about establishing signalling connectivity with other systems. If we issue a message saying we established it, then we did. It might change a nanosecond later, but as of the moment that we decided to issue the message, it was truth.

Bill Neiman is correct about the chicken and egg situation. To be a little more precise, the IPLing system takes a look at the signalling structures that have been defined for its use. We then peek inside the CFRM CDS to see if those structures have been physically allocated. For those structures that exist, we then do a limited form of connect to the structure that permits us to go read XCF control data from the respective signalling structures. We use that information to determine what other systems are using the structure for signalling. Based on that data, we then forecast what signalling paths would be established if the IPL were to continue. If the predicted paths (along with any ones that have actually been established -- ie CTC paths) are not sufficient to establish connectivity, XCF complains about insufficient signalling paths and prompts the operator to resolve the problem. If the predicted paths appear to be sufficient for establishing connectivity, we allow the system to proceed to become active in the sysplex.

As soon as the system becomes active in the sysplex, it can now use the CFRM policy and related CF structures for real. So the first thing XCF does is connect to the signal structures for real and attempt to establish full signalling connectivity. If we do establish connectivity with every other system in the sysplex, we issue truthful messages to say so and the IPL proceeds. If we cannot establish signalling connectivity with every other system in the sysplex, the IPL does NOT proceed. And we engage the operator to express our displeasure and plead for resolution.

Only on the rarest of occasions have I ever seen situations where the predicted paths fail to become real paths and connectivity is not established. I can certainly envision cases where it wouldn't work. The simplest one would be to have the active systems stop using the structure(s) for signalling between the time that the IPLing system looks to see who was using the structures and the time it gets around to actually trying to establish those paths for real.

In short, an IPLing system will not make it into the sysplex unless it appears to have a reasonable chance of establishing signalling connectivity, and the IPL will not proceed unless it actually can.

The interior workings are likely a mystery to many. I suspect there is much that can be misinterpreted. But we do not lie. If a message was in fact issued to indicate that connectivity was established when it in fact had not, please open a defect and we will fix it. We do not intentionally issue false and misleading messages.

Mark A. Brooks
z/OS Sysplex design and development
845-435-5149 T/L 8-295-5149
Poughkeepsie, NY
mabr...@us.ibm.com
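The forecasting step Mark describes can be sketched as a small Python function. This is an illustration of the described sequence only (defined structures, check which are allocated, read who is using them, add paths that already exist, compare against the active systems); it is not how XCF is implemented. The structure and system names, the function names, and the plain dictionaries standing in for the CFRM CDS and the structure control data are all hypothetical.

```python
def forecast_reachable(defined, allocated, users_by_structure, ctc_partners):
    """Predict which systems the IPLing system could signal with.

    defined            - signalling structures defined for this system
    allocated          - structures found to be physically allocated
                         (the "peek" into the CFRM CDS)
    users_by_structure - systems seen using each structure for signalling
                         (control data read via the limited connect)
    ctc_partners       - systems already reachable over real paths, e.g. CTCs
    """
    reachable = set(ctc_partners)
    for structure in defined:
        if structure in allocated:
            reachable |= set(users_by_structure.get(structure, ()))
    return reachable


def check_join(active_systems, reachable):
    """Return (ok, missing): ok only if every active system is predicted reachable."""
    missing = set(active_systems) - set(reachable)
    return len(missing) == 0, missing


# Hypothetical example: one of two defined structures is allocated, no CTCs.
defined = ["IXC_SIG1", "IXC_SIG2"]
allocated = {"IXC_SIG1"}
users = {"IXC_SIG1": {"SYSA", "SYSB"}}
reachable = forecast_reachable(defined, allocated, users, ctc_partners=set())
ok, missing = check_join({"SYSA", "SYSB", "SYSC"}, reachable)
print("proceed" if ok else "prompt operator, no predicted path to " + ", ".join(sorted(missing)))
```

The sketch stops at the forecast. As the message explains, the real connect happens only after the system becomes active, and the IPL proceeds only if full connectivity is then actually established.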
Re: maxsystem in a sysplex - belated heads-up
Barbara,

If you were to open a requirement requesting a utility of some kind to report and delete residual system information from the sysplex couple data set, we might be able to do something about this. I don't think it would be hard to implement something similar to IXCMIAPU for this purpose.

Bill Neiman
Parallel Sysplex development
IBM
Re: maxsystem in a sysplex - belated heads-up
How about expiring the data from a slot not used for 30 days? An IPL after that would retrieve the needed info in a few seconds when it joins the sysplex?

On Thu, Oct 20, 2011 at 8:21 AM, Bill Neiman nei...@us.ibm.com wrote:

> Barbara,
>
> If you were to open a requirement requesting a utility of some kind to report and delete residual system information from the sysplex couple data set, we might be able to do something about this. I don't think it would be hard to implement something similar to IXCMIAPU for this purpose.
>
> Bill Neiman
> Parallel Sysplex development
> IBM

--
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?
Re: maxsystem in a sysplex - belated heads-up
> How about expiring the data from a slot not used for 30 days? An IPL after that would retrieve the needed info in a few seconds when it joins the sysplex?

I like that idea. But as I said in my (private) note to Bill, since the design change from z/OS 1.9 isn't externally documented (that I know of) this might not be possible.

My suggestion to get out of this mess is different: Introduce a new wait0A2 (the general XCF wait state) reason code whenever a functional CDS has a maxsystem smaller than the sysplex CDS and don't allow even the first system into the plex. By first system I mean the system that issued message IXC405D and that got the reply 'I' for re-initialization of the plex.

This would automatically take care of the other problem that isn't addressed at all as far as I can see: The CFRM CDS *was* usable enough for XCF to go there, read it, get the names of the signalling structures and *establish signalling* with the system that was already in the plex. If a CFRM CDS is declared not usable, then it cannot have been used to establish signalling via structures. In essence, XCF LIES when it says that signalling (both pathin and pathout) was established with the system just joining the plex. This will complicate *any* problem determination and send a customer in the wrong direction every time. It also means that I cannot trust *any* message content anymore - it may be a lie. Personally, I find that unacceptable. (But who am I?)

I take the MR for the utility (or rather, have the MR taken for me), if that's all IBM is willing/able to commit to.

As of yesterday, IBM has taken apar OA37776, which is supposed to address the documentation issue. Nothing will get fixed via this apar, AFAIK. The complete recreation scenario (including how to 'prime' the sysplex CDS) is in there.

Barbara
Re: maxsystem in a sysplex - belated heads-up
Kees,

> Thanks, we will be adding systems to our sysplexes soon, which fit within our maxsystems, but now I have to recheck if they really fit next to the current systems and ancient rubbish.

Maybe you can test for me if one can find out via simple display commands what rubbish is kept in the sysplex CDS (I obviously cannot, since I've cleaned up all sysplex CDSs with residual information):

D XCF,S,ALL supposedly lists all systems in the sysplex. It would be interesting to see if those are really all of them or only those that are active. The book isn't really clear on that distinction.

D XCF,GRP will give a list of all defined groups. That means also groups that are no longer valid in the plex (and will never become active again) will still be listed (XCF group member state changes when permanent status recording is on are complicated enough in theory and in the books, but that design change might have also changed this further, without documentation update). From here on out you have to specify each group name individually to see all member(s). So it is not exactly easy to determine how much rubbish might have accumulated.

For comparison, take a dump of XCFAS and all its dataspaces and then issue the ipcs couple sysplex detail and couple group detail commands to see what *that* might tell you (and where it differs from the displayed information).

The way I determined that there were residual systems in the sysplex CDS (which was inactive at the time, so no display commands possible) was to simply *review* browse (from the cbttape) the sysplex CDS. The system names in there fairly leap at you. Looking at group information is harder, as that requires switching left and right.

Thanks in advance,
Barbara
Re: maxsystem in a sysplex - belated heads-up
Barbara Nitz nitz-...@gmx.net wrote in message news:8637203851702442.wa.nitzibmgmx@bama.ua.edu...

> Kees,
>
>> Thanks, we will be adding systems to our sysplexes soon, which fit within our maxsystems, but now I have to recheck if they really fit next to the current systems and ancient rubbish.
>
> Maybe you can test for me if one can find out via simple display commands what rubbish is kept in the sysplex CDS (I obviously cannot, since I've cleaned up all sysplex CDSs with residual information):
>
> D XCF,S,ALL supposedly lists all systems in the sysplex. It would be interesting to see if those are really all of them or only those that are active. The book isn't really clear on that distinction.
>
> D XCF,GRP will give a list of all defined groups. That means also groups that are no longer valid in the plex (and will never become active again) will still be listed (XCF group member state changes when permanent status recording is on are complicated enough in theory and in the books, but that design change might have also changed this further, without documentation update). From here on out you have to specify each group name individually to see all member(s). So it is not exactly easy to determine how much rubbish might have accumulated.
>
> For comparison, take a dump of XCFAS and all its dataspaces and then issue the ipcs couple sysplex detail and couple group detail commands to see what *that* might tell you (and where it differs from the displayed information).
>
> The way I determined that there were residual systems in the sysplex CDS (which was inactive at the time, so no display commands possible) was to simply *review* browse (from the cbttape) the sysplex CDS. The system names in there fairly leap at you. Looking at group information is harder, as that requires switching left and right.
>
> Thanks in advance,
> Barbara

Barbara,

I already did some research. I used IDCAMS PRINT to check the contents of the CDSs and found only valid systemids. Checking back what we did when, I concluded that we could not have pollution in the CDSs. We converted our 'testsysplex' to a fully isolated sysplex, but this meant only full dasd isolation. The new testsysplex is pollution free, so is the prodsysplex, and the old testsysplex has been removed. Furthermore all this happened under z/OS 1.8, so I can't help you.

The only thing I can do for you is check D XCF,S,ALL when one of the systems has been brought down, but I suppose you have a testplex yourself?

Kees.
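Eyeballing a browse or an IDCAMS PRINT listing for system names can also be done mechanically on a binary copy of the CDS. The Python sketch below searches raw bytes for a list of candidate system names encoded in EBCDIC. It rests on assumptions: that system names appear in the data set as EBCDIC text (as the browse described in this thread suggests), that code page 037 is adequate for them, and that you already have a binary copy of the data set on a workstation. It knows nothing about the CDS record layout, the sample bytes are fabricated for the demonstration, and the helper name is made up.

```python
def find_system_names(cds_bytes, candidates):
    """Return {name: [offsets]} for each candidate name found in the raw bytes.

    Names are encoded with Python's built-in cp037 (EBCDIC) codec before
    searching. A hit only shows that the byte pattern occurs; it says
    nothing about which record or slot it belongs to.
    """
    found = {}
    for name in candidates:
        needle = name.encode("cp037")
        offsets = []
        start = cds_bytes.find(needle)
        while start != -1:
            offsets.append(start)
            start = cds_bytes.find(needle, start + 1)
        if offsets:
            found[name] = offsets
    return found


# Fabricated sample data: binary filler with two blank-padded EBCDIC names.
sample = (
    b"\x00" * 16
    + "SYSA".encode("cp037") + b"\x40" * 4   # 0x40 is the EBCDIC blank
    + b"\x00" * 8
    + "OLD1".encode("cp037") + b"\x40" * 4
)
hits = find_system_names(sample, ["SYSA", "SYSB", "OLD1"])
for name, offsets in hits.items():
    print(name, "found at offsets", offsets)
```

Searching for the names of systems that were decommissioned long ago is the interesting case: any hit suggests residual information of the kind discussed here. Short names can also match inside longer ones, so hits need a manual check.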
Re: maxsystem in a sysplex - belated heads-up
> The only thing I can do for you is check D XCF,S,ALL when one of the systems has been brought down, but I suppose you have a testplex yourself?

Yes, we do. I don't remember ever having issued this command with the ALL parm when one of the systems was down, but it will be easy to do that once the next IPL comes around. I had also forgotten about the idcams print job.

Barbara
Re: maxsystem in a sysplex - belated heads-up
I have two sysplexes with one or two members not currently running. None show up with D XCF,S,ALL. What I see is lots of detail about the one member that is running. Nothing about the others.

JO.Skip Robinson
SCE Infrastructure Technology Services
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com

From: Barbara Nitz nitz-...@gmx.net
To: IBM-MAIN@bama.ua.edu
Date: 10/18/2011 02:12 AM
Subject: Re: maxsystem in a sysplex - belated heads-up
Sent by: IBM Mainframe Discussion List IBM-MAIN@bama.ua.edu

> The only thing I can do for you is check D XCF,S,ALL when one of the systems has been brought down, but I suppose you have a testplex yourself?

Yes, we do. I don't remember ever to have issued this command with the ALL parm when one of the systems was down, but it will be easy to do that once the next IPL comes around. I had also forgotten about the idcams print job.

Barbara
Re: maxsystem in a sysplex - belated heads-up
> I have two sysplexes with one or two members not currently running. None show up with D XCF,S,ALL. What I see is lots of detail about the one member that is running. Nothing about the others.

Skip, thanks for testing that for me. Just as I figured - there isn't really an easy way to find out *what* junk might have accumulated in the sysplex CDS over the years. On the other hand, a D XCF,CPL would faithfully show an ARM CDS that hasn't been around for years, and that isn't used. (Unless that has been silently fixed since we experienced it.)

What a mess.

Barbara
Re: maxsystem in a sysplex - belated heads-up
Barbara,

Thanks, we will be adding systems to our sysplexes soon, which fit within our maxsystems, but now I have to recheck if they really fit next to the current systems and ancient rubbish.

Kees.

Richards, Robert B. robert.richa...@opm.gov wrote in message news:2d14e7856646224aacdda13ab1d35557193235d...@wdcv7exvs2.opm.gov...

Barbara, Thank You! for the heads-up. I'll be on that migration path in the next twelve months or so.

Bob

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@bama.ua.edu] On Behalf Of Barbara Nitz
Sent: Tuesday, October 11, 2011 11:53 PM
To: IBM-MAIN@bama.ua.edu
Subject: maxsystem in a sysplex - belated heads-up

When we migrated from a z9 to a z196 - big-bang swap - we had a lot of trouble getting one sysplex up. The first system came up fine, but the second or third system could not be ipl'd into that plex. It ended up on a GRS (!) wait0A3 rsn9C. The accompanying ISG message would only have been readable in a standalone dump, but never on a real console. My colleagues had not taken an sadump. z/OS 1.10.

It turned out that there was nothing whatsoever wrong with the GRS lock structure. There was nothing wrong with GRS *at all*. The cause of this was an IBM design change in z/OS 1.9, where IBM unilaterally decided to give up on the concept of maxsystem determining how many systems can join a sysplex.

Our sysplex CDS was formatted with maxsystem(5) (because there used to be 5 systems in that sysplex - 2 of them gone for more than a year, and both of them occupying the first two slots in the sysplex CDS). The CFRM CDS had to get reformatted for the big-bang replacement, and it got formatted with maxsystem(3), which reflected the true capacity of the sysplex. Well, the capacity of *that* sysplex was exactly one, because every other system would get a 'CFRM CDS unusable', due to the fact that the sysplex CDS had a higher maxsystem value.

In addition, it was clearly visible that the incoming system *had* established signalling connectivity with the system already in the sysplex, which it could only do by successfully!!! reading the CFRM CDS to get at the names of the signalling structures.

In addition, the reply I to 're-initialize the sysplex' when the first system was IPL'd (plus the accompanying explanation in the docs that everything in the sysplex will be treated as residual) is wrong. *Nothing* is treated as residual.

Looking at the 1.12 documentation for the maxsystem parm (right, *everybody* looks at that book *every* time a canned job that existed since the dawn of sysplex is submitted), it says this:

"When formatting the couple data set to contain the CFRM policy data, ensure that the value you specify for MAXSYSTEM matches the value for MAXSYSTEM that was used when the sysplex couple data set was formatted. When coupling facility structures are used for XCF signaling, if the MAXSYSTEM value specified for the CFRM couple data set is less than that of the sysplex couple data set, systems might not be able to join the sysplex. For example, if MAXSYSTEM=16 is specified for the sysplex couple data set and MAXSYSTEM=8 is specified for the CFRM couple data set, then only eight systems will be allowed in the sysplex."

This clearly implies that the lower of all the maxsystem values is the capacity of the sysplex. IT IS NOT. It is unpredictable, especially if your sysplex CDS is so old that it still has systems in it that are long gone. Which will be preserved along with all the junk that might have once been in the sysplex.

IBM told me (and I give a big thanks to the lady who actually went beyond the canned answer I first got and *looked* into this despite the fact that all I had in terms of docs was a syslog) that all of this is broken as designed:

"Cluster MR support introduced the requirement to preserve information about manageable resources (which includes sysplexes, systems, CF's, structures and connectors) across a sysplex-wide IPL. To XCF, a sysplex, system, etc. that terminates and is reIPLed is an entirely new entity. To the MR infrastructure, however, it is the same entity transitioning between different states. Therefore, XCF as of V1R9 needs to preserve and reuse system slots whenever possible, regardless of the reply I to IXC405D, as we see."

and

"We agree that the design change in R9 does make the actual system capacity unpredictable. Also, the MAXSYSTEM write up in Setting up a Sysplex needs to be cleaned up. Thus, a documentation update is in order."

The unpredictability of maxsystem is apparently addressed in 1.12 by 'fixing' SUG apar OA27634 which makes the GRS wait state message visible by spitting out a readable message instead of wait stating the system.

So, if you have old and long gone systems in your sysplex CDS, happen to get a lower maxsystem value in your CFRM CDS and end up in a wait0A3 - delete and reformat your sysplex CDS. That will 'fix' the problem that has nothing whatsoever
Re: maxsystem in a sysplex - belated heads-up
Barbara,

Thank you for the heads-up! I'll be on that migration path in the next twelve months or so.

Bob

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@bama.ua.edu] On Behalf Of Barbara Nitz
Sent: Tuesday, October 11, 2011 11:53 PM
To: IBM-MAIN@bama.ua.edu
Subject: maxsystem in a sysplex - belated heads-up
maxsystem in a sysplex - belated heads-up
When we migrated from a z9 to a z196 - big-bang swap - we had a lot of trouble getting one sysplex up. The first system came up fine, but the second or third system could not be IPL'd into that plex. It ended up in a GRS (!) wait0A3 rsn9C. The accompanying ISG message would only have been readable in a standalone dump, never on a real console, and my colleagues had not taken an sadump. This was z/OS 1.10.

It turned out that there was nothing whatsoever wrong with the GRS lock structure. There was nothing wrong with GRS *at all*. The cause was an IBM design change in z/OS 1.9, where IBM unilaterally decided to give up on the concept of maxsystem determining how many systems can join a sysplex.

Our sysplex CDS was formatted with maxsystem(5), because there used to be 5 systems in that sysplex. Two of them had been gone for more than a year, and both were still occupying the first two slots in the sysplex CDS. The CFRM CDS had to be reformatted for the big-bang replacement, and it was formatted with maxsystem(3), which reflected the true capacity of the sysplex. Well, the capacity of *that* sysplex was exactly one, because every other system would get a 'CFRM CDS unusable', due to the fact that the sysplex CDS had a higher maxsystem value.

In addition, it was clearly visible that the incoming system *had* established signalling connectivity with the system already in the sysplex, which it could only do by successfully (!!!) reading the CFRM CDS to get at the names of the signalling structures.

Also, the option to reply I to 're-initialize the sysplex' when the first system is IPL'd, plus the accompanying explanation in the docs that everything in the sysplex will be treated as residual, are wrong. *Nothing* is treated as residual.
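The capacity arithmetic in this case (maxsystem(5) on the sysplex CDS, maxsystem(3) on the CFRM CDS, two long-gone systems still holding slots 1 and 2) can be illustrated with a small Python sketch. This is a toy model, not XCF's actual algorithm: it assumes that residual slots are never handed to a new system, and that a system can use the CFRM CDS only if its slot number does not exceed the CFRM CDS maxsystem value. Both rules, and the function name `usable_capacity`, are assumptions made for illustration; they merely happen to reproduce the capacity of one that was observed.

```python
def usable_capacity(sysplex_maxsystem, cfrm_maxsystem, residual_slots):
    """Count the systems that could join under the toy model above."""
    # Assumption: slots still held by long-gone systems are preserved
    # and therefore not available to any newly IPL'd system.
    free_slots = [slot for slot in range(1, sysplex_maxsystem + 1)
                  if slot not in residual_slots]
    # Assumption: a system can only use the CFRM CDS if its slot number
    # fits within the CFRM CDS maxsystem value.
    return sum(1 for slot in free_slots if slot <= cfrm_maxsystem)


# Values from the post: sysplex CDS maxsystem(5), CFRM CDS maxsystem(3),
# two departed systems still occupying slots 1 and 2.
expected = min(5, 3)  # what "lowest maxsystem wins" would suggest
modelled = usable_capacity(5, 3, residual_slots={1, 2})
print(expected, modelled)
```

Under these assumptions the free slots are 3, 4 and 5, and only slot 3 fits within the CFRM CDS, so exactly one system can join. With a freshly formatted sysplex CDS (no residual slots) the same model gives the expected three.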
The 1.12 documentation for the maxsystem parm (right, *everybody* looks at that book *every* time a canned job that has existed since the dawn of sysplex is submitted) says this:

When formatting the couple data set to contain the CFRM policy data, ensure that the value you specify for MAXSYSTEM matches the value for MAXSYSTEM that was used when the sysplex couple data set was formatted. When coupling facility structures are used for XCF signaling, if the MAXSYSTEM value specified for the CFRM couple data set is less than that of the sysplex couple data set, systems might not be able to join the sysplex. For example, if MAXSYSTEM=16 is specified for the sysplex couple data set and MAXSYSTEM=8 is specified for the CFRM couple data set, then only eight systems will be allowed in the sysplex.

This clearly implies that the lowest of all the maxsystem values is the capacity of the sysplex. IT IS NOT. The capacity is unpredictable, especially if your sysplex CDS is so old that it still has systems in it that are long gone. Those will be preserved, along with all the junk that might once have been in the sysplex.

IBM told me (and I give a big thanks to the lady who actually went beyond the canned answer I first got and *looked* into this despite the fact that all I had in terms of docs was a syslog) that all of this is broken as designed:

Cluster MR support introduced the requirement to preserve information about manageable resources (which includes sysplexes, systems, CF's, structures and connectors) across a sysplex-wide IPL. To XCF, a sysplex, system, etc. that terminates and is reIPLed is an entirely new entity. To the MR infrastructure, however, it is the same entity transitioning between different states. Therefore, XCF as of V1R9 needs to preserve and reuse system slots whenever possible, regardless of the reply I to IXC405D, as we see.

and

We agree that the design change in R9 does make the actual system capacity unpredictable.
Also, the MAXSYSTEM write up in Setting up a Sysplex needs to be cleaned up. Thus, a documentation update is in order.

The unpredictability of maxsystem is apparently addressed in 1.12 by 'fixing' SUG apar OA27634, which makes the GRS wait state message visible by spitting out a readable message instead of wait-stating the system.

So, if you have old and long-gone systems in your sysplex CDS, happen to get a lower maxsystem value in your CFRM CDS, and end up in a wait0A3: delete and reformat your sysplex CDS. That will 'fix' the problem that has nothing whatsoever to do with GRS. Do NOT be fooled by the fact that signalling is established and the CFRM CDS was usable for *that* - such inconsistencies we are not supposed to care about.

In any case, we will never run into this again. A permanent part of our DR setup is now to always delete both the sysplex CDS and the CFRM CDS and redefine them freshly in order to avoid this unpredictability.

Barbara Nitz