Re: maxsystem in a sysplex - belated heads-up

2012-01-16 Thread Bill Neiman
I have spoken with the reviewer to clarify the request, and requirement 
MR1026112735 has now been updated to a status of Recognized.  This means that 
IBM concurs that it is a desirable function but makes no explicit commitment as 
to when or whether it can be incorporated into the product.  I'll work with the 
planners to see if we can get it done.

 Bill Neiman
 Parallel Sysplex Development, IBM

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN


Re: maxsystem in a sysplex - belated heads-up

2012-01-12 Thread Vernooij, CP - SPLXM
Barbara Nitz nitz-...@gmx.net wrote in message
news:9193446080894951.wa.nitzibmgmx@bama.ua.edu...
 Barbara - I've asked our account team to add us as concurring with
the requirement and to be added to the Interested Parties list.
 
 The requirement has been rejected with this:
 User Group Number - MR1026112735
 
 Title - Provide a utility to remove residual systems from the sysplex
CDS
 IBM believes that the request described has been solved with a current
 product / service / policy / etc.
 
 Please see apar OA37776
 
 That apar is closed DOC and describes the current behaviour. It does
not 'solve' anything. Why am I not surprised?
 
 So to repeat the warning: If you have ever terminated any lpar and
gotten rid of it never to IPL it again, make sure you do a sysplex IPL
on freshly formatted sysplex CDSs. Otherwise you will need to increase
MAXSYSTEM to higher and higher values on all your CDSs just to keep old
junk in the sysplex CDS around.
 
 Barbara
 

Yes, sounds contradictory: IBM recommending a sysplex wide IPL, what was
a sysplex intended for in the first place?

They did solve a similar issue with RRS, where you could not get rid of
your Archive logstream without an RRS cold start, until they decided to
develop functionality to drop and remove the logstream dynamically.

Kees.

For information, services and offers, please visit our web site: 
http://www.klm.com. This e-mail and any attachment may contain confidential and 
privileged material intended for the addressee only. If you are not the 
addressee, you are notified that no part of the e-mail or any attachment may be 
disclosed, copied or distributed, and that any other action related to this 
e-mail or attachment is strictly prohibited, and may be unlawful. If you have 
received this e-mail by error, please notify the sender immediately by return 
e-mail, and delete this message. 

Koninklijke Luchtvaart Maatschappij NV (KLM), its subsidiaries and/or its 
employees shall not be liable for the incorrect or incomplete transmission of 
this e-mail or any attachments, nor responsible for any delay in receipt. 
Koninklijke Luchtvaart Maatschappij N.V. (also known as KLM Royal Dutch 
Airlines) is registered in Amstelveen, The Netherlands, with registered number 
33014286





Re: maxsystem in a sysplex - belated heads-up

2012-01-12 Thread Barbara Nitz
Yes, sounds contradictory: IBM recommending a sysplex wide IPL, what was
a sysplex intended for in the first place?

I didn't read that *IBM* recommends a sysplex-wide IPL. *I* do to get rid of 
the junk that IBM accumulates in the sysplex CDS. I believe IBM frowns upon 
sysplex-wide IPLs. On the other hand, they are unwilling to provide a utility 
that lets you do cleanup.

They did solve a similar issue with RRS, where you could not get rid of
your Archive logstream without an RRS cold start, until they decided to
develop functionality to drop and remove the logstream dynamically.
Different components with wildly different attitudes. 'nuff said.

Barbara



Re: maxsystem in a sysplex - belated heads-up

2012-01-11 Thread Barbara Nitz
Barbara - I've asked our account team to add us as concurring with the 
requirement and to be added to the Interested Parties list.

The requirement has been rejected with this:
User Group Number - MR1026112735

Title - Provide a utility to remove residual systems from the sysplex CDS
IBM believes that the request described has been solved with a current
product / service / policy / etc.

Please see apar OA37776

That apar is closed DOC and describes the current behaviour. It does not 
'solve' anything. Why am I not surprised?

So to repeat the warning: If you have ever terminated any lpar and gotten rid 
of it never to IPL it again, make sure you do a sysplex IPL on freshly 
formatted sysplex CDSs. Otherwise you will need to increase MAXSYSTEM to higher 
and higher values on all your CDSs just to keep old junk in the sysplex CDS 
around.

Barbara



Re: maxsystem in a sysplex - belated heads-up

2011-10-27 Thread Jerry Whitteridge
Barbara - I've asked our account team to add us as concurring with the 
requirement and to be added to the Interested Parties list.

Jerry Whitteridge
Design Engineer
Safeway Inc.
925 951 4184

If you feel in control
you just aren't going fast enough.



--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html


Re: maxsystem in a sysplex - belated heads-up

2011-10-26 Thread Barbara Nitz
To finish this thread: There is now marketing requirement MR1026112735 that 
describes the need for a cleanup utility and asks for a way to really 
re-initialize the sysplex on the ixc405 message.
Barbara



Re: maxsystem in a sysplex - belated heads-up

2011-10-23 Thread Barbara Nitz
Bill and Mark,

I am aware of the chicken-and-egg problem. It was explained to me back in the 
nineties when I first learned that CTCs were not required anymore to establish 
signalling because it was the first thing I asked about.

IBM obviously does not get why I am saying that XCF lies to me/us. My 
problem is not that limited interface, the problem is that a customer sees that 
'signalling is established', which in turn says that the CFRM CDS is usable. 
And *then* the system wait states saying that the CFRM CDS is NOT usable. That 
certainly did NOT make any sense to my colleagues that night (*if* they had 
been able to see the wait state message from GRS, which they could not because 
it disappears before a human can even see that there is a message. That is 
supposedly fixed in 1.12.)
So what's left is that a design change in XCF results in a GRS wait state 
saying that the GRS lock structure is unusable. That makes even less sense in 
such a situation, since one system is already using the GRS structure. And that 
was all the external information they had. It certainly does not give any hint 
to the true cause of the problem, much less point the way in the right 
direction.

From a customer perspective, all I care about in such a situation is getting 
my sysplex back up, and that is almost impossible with the contradicting 
information I get. I don't really care what component is 'at fault'. I am not 
looking for fault, I am looking to fix things.

But thanks for clarifying that this can happen with any CDS, not just the CFRM 
CDS.

Barbara



Re: maxsystem in a sysplex - belated heads-up

2011-10-21 Thread Bill Neiman
To clarify some technical issues that have been discussed in this thread:

On the second and subsequent systems to join the sysplex, XCF signaling 
has a chicken-and-egg problem when it attempts to determine what signaling 
connectivity exists with other systems.  It may be that the only signaling 
paths available are via XCF signaling structures, but it must perform this 
evaluation at a point in IPL *before* the CFRM couple data set has been brought 
into use.  It therefore does not use the CFRM CDS in the normal manner.  It is 
permitted to perform a limited subset of the operations normally available, 
viz., unserialized reads of individual subrecords of the CFRM active policy, to 
determine what structures exist.  Unserialized reads are permitted specifically 
because they do not depend on the MAXSYSTEM parameter with which the CDS was 
formatted, whereas serialized operations do depend on MAXSYSTEM.  (The number 
of systems for which the CDS was formatted determines the number of lock blocks 
that are used to ensure that operations initiated by one system do not collide 
with operations initiated by another.)  XCF is not lying 
to you when it establishes connectivity via signaling structures but later 
rejects the CFRM CDS because it is formatted for too few systems.
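
As a rough illustration of the point about lock blocks, the following Python sketch models a sysplex CDS whose slots are retained and a CFRM CDS formatted for fewer systems. It is a toy model, not XCF code: the class and function names are made up, and it assumes that a serialized operation needs a lock block indexed by the system's slot number, which is an inference from the description above rather than documented behaviour. The slot layout (MAXSYSTEM(5) with two residual systems, CFRM CDS at MAXSYSTEM(3)) is the one reported in the original post of this thread.

```python
class FunctionCDS:
    """Toy model of a function couple data set (hypothetical, not an IBM API)."""

    def __init__(self, maxsystem):
        # One lock block per system the CDS was formatted for (assumption).
        self.lock_blocks = [None] * maxsystem

    def unserialized_read(self, slot):
        # Needs no lock block, so it works for any slot number.
        return True

    def serialized_access(self, slot):
        # Needs the lock block for this slot; fails if the CDS is too small.
        return 1 <= slot <= len(self.lock_blocks)


def assign_slot(slots, name):
    """Return the 1-based sysplex CDS slot for `name`, taking the first free one."""
    for i, owner in enumerate(slots):
        if owner is None:
            slots[i] = name
            return i + 1
    raise RuntimeError("sysplex CDS has no free slot")


# Sysplex CDS formatted with MAXSYSTEM(5); two long-gone systems still hold
# slots 1 and 2. The CFRM CDS was formatted with MAXSYSTEM(3).
sysplex_slots = ["OLD1", "OLD2", None, None, None]
cfrm = FunctionCDS(maxsystem=3)

report = []
for name in ["SYSA", "SYSB", "SYSC"]:
    slot = assign_slot(sysplex_slots, name)
    report.append(
        (name, slot, cfrm.unserialized_read(slot), cfrm.serialized_access(slot))
    )

for row in report:
    print(row)
```

In this model all three systems can peek at the CFRM CDS, but only SYSA (slot 3) can use it for real, which matches the "capacity of exactly one" reported in the original post.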

XCF does not and never will wait state simply because it is unable to 
use a function CDS (i.e., a CDS of type other than sysplex).  In the specific 
case of the CFRM CDS, XCF cannot determine that the installation is trying to 
IPL in GRS star mode, nor does XCF even understand that the CFRM CDS is 
required for star mode.  For other CDS types, it is perfectly possible for a 
system to continue running if it doesn't have access to that CDS.  I don't 
think it would be popular if XCF wait stated because the CFRM CDS was formatted 
for too few systems when the plex was running in ring mode, or if it wait 
stated because the n'th system into the plex couldn't use the ARM (or SFM or 
Logger or WLM or BPX) CDS.  Instead, we should - and do - permit the system to 
IPL into the plex, report that it can't use the function, and allow the 
installation to take corrective action by making a larger CDS available.

The change to retain information about inactive systems in the sysplex 
CDS was implemented in support of a business resilience function that provided 
infrastructure for automated management of sysplex resources.  That structure 
required the ability to retain sysplex-related information even across 
sysplex-wide IPLs.  Because of the retention requirement, I doubt that we would 
implement a function to automatically purge inactive systems after some period. 
 We would have the same problem we always have when trying to choose a 
threshold - whatever we chose would be too short for some installations, too 
long for others, and not correct for anyone.  We'd have to introduce yet 
another parameter for customers to manage, to control the retention period.

Bill Neiman
Parallel Sysplex development
IBM



Re: maxsystem in a sysplex - belated heads-up

2011-10-21 Thread Mark Brooks
Hi,
I would also point out that from an XCF design perspective, it is
perfectly acceptable to have a function CDS be accessible from a subset of
the systems.  In practice, the requirement imposed by the exploiters is
that the function CDS be accessible from all systems that require the
function.  For many exploiters, that is tantamount to requiring that the
function CDS be accessible from every system in the sysplex.  But it is not
a requirement that XCF imposes.
The XCF signalling service does NOT lie about establishing signalling
connectivity with other systems.  If we issue a message saying we
established it, then we did.  It might change a nanosecond later, but as of
the moment that we decided to issue the message, it was truth.
Bill Neiman is correct about the chicken and egg situation.  To be a
little more precise, the IPLing system takes a look at the signalling
structures that have been defined for its use.  We then peek inside the
CFRM CDS to see if those structures have been physically allocated.  For
those structures that exist, we then do a limited form of connect to the
structure that permits us to go read XCF control data from the respective
signalling structures.  We use that information to determine what other
systems are using the structure for signalling.  Based on that data, we
then forecast what signalling paths would be established if the IPL were to
continue.  If the predicted paths (along with any ones that have actually
been established -- ie CTC paths) are not sufficient to establish
connectivity, XCF complains about insufficient signalling paths and
prompts the operator to resolve the problem.  If the predicted paths appear
to be sufficient for establishing connectivity, we allow the system to
proceed to become active in the sysplex.
As soon as the system becomes active in the sysplex, it can now use
the CFRM policy and related CF structures for real.  So the first thing XCF
does is connect to the signal structures for real and attempt to
establish full signalling connectivity.  If we do establish connectivity
with every other system in the sysplex,  we issue truthful messages to say
so and the IPL proceeds.  If we cannot establish signalling connectivity
with every other system in the sysplex, the IPL does NOT proceed.  And we
engage the operator to express our displeasure and plead for resolution.
Only on the rarest of occasions have I ever seen situations where the
predicted paths fail to become real paths and connectivity is not
established.  I can certainly envision cases where it wouldn't work.  The
simplest one would be to have the active systems stop using the structure(s)
for signalling between the time that the IPLing system looks to see who
was using the structures and the time it gets around to actually trying to
establish those paths for real.
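
The two-phase check described above can be sketched roughly as follows. This is a simplified Python model under stated assumptions: the function names, the structure name IXC_SIG1, and the representation of paths as sets of system names are all invented for illustration, and the real XCF logic is certainly more involved.

```python
def covers_all(other_systems, ctc_paths, structure_users):
    """True if CTC paths plus signalling-structure users reach every other system."""
    reachable = set(ctc_paths)
    for users in structure_users.values():
        reachable |= set(users)
    return set(other_systems) <= reachable


def ipl(other_systems, ctc_paths, users_at_peek, users_at_connect):
    # Phase 1: forecast from a peek at the structures, before becoming active.
    if not covers_all(other_systems, ctc_paths, users_at_peek):
        return "operator prompted: insufficient signalling paths"
    # Phase 2: now active in the sysplex, connect for real and re-check.
    if not covers_all(other_systems, ctc_paths, users_at_connect):
        return "IPL halted: connectivity not established"
    return "IPL proceeds"


others = {"SYSA", "SYSB"}
peek = {"IXC_SIG1": {"SYSA", "SYSB"}}  # hypothetical structure name

normal = ipl(others, set(), peek, peek)
# The rare race: the active systems stop using the structure after the peek.
raced = ipl(others, set(), peek, {"IXC_SIG1": set()})
# No CTCs and no allocated structures at all.
no_paths = ipl(others, set(), {}, {})

print(normal, raced, no_paths, sep="\n")
```

The second call models the rare case where predicted paths fail to become real paths: the forecast passes, but the IPL still does not proceed.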

In short, an IPLing system will not make it into the sysplex unless
it appears to have a reasonable chance of establishing signalling
connectivity, and the IPL will not proceed unless it actually can.  The
interior workings are likely a mystery to many.  I suspect there is much
that can be misinterpreted.  But we do not lie.  If a message was in fact
issued to indicate that connectivity was established when it in fact had
not, please open a defect and we will fix it.  We do not intentionally
issue false and misleading messages.

Mark A. Brooks
z/OS Sysplex design and development
845-435-5149   T/L 8-295-5149
Poughkeepsie, NY
mabr...@us.ibm.com



Re: maxsystem in a sysplex - belated heads-up

2011-10-20 Thread Bill Neiman
Barbara,

  If you were to open a requirement requesting a utility of some kind 
to report and delete residual system information from the sysplex couple data 
set, we might be able to do something about this.  I don't think it would be 
hard to implement something similar to IXCMIAPU for this purpose.

  Bill Neiman
  Parallel Sysplex development
  IBM



Re: maxsystem in a sysplex - belated heads-up

2011-10-20 Thread Mike Schwab
How about expiring the data from any slot not used for 30 days?  An IPL
after that would retrieve the needed info in a few seconds when it
joins the sysplex?
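
A minimal sketch of what such an expiry rule could look like, in Python. Everything here is hypothetical: the slot representation, the function name, and the dates are made up, and nothing in this thread suggests the sysplex CDS records a last-used date in this form. Note that the retention period ends up as a parameter someone has to choose.

```python
from datetime import date, timedelta


def expire_slots(slots, today, retention_days=30):
    """Free every slot whose system was last active before the cutoff date.

    `slots` is a list of (system_name, last_active_date) tuples or None.
    """
    cutoff = today - timedelta(days=retention_days)
    kept = []
    for entry in slots:
        if entry is not None and entry[1] < cutoff:
            kept.append(None)  # residual system: free the slot
        else:
            kept.append(entry)
    return kept


today = date(2011, 10, 20)
slots = [
    ("OLD1", date(2010, 6, 1)),
    ("OLD2", date(2010, 7, 1)),
    ("SYSA", today),
    None,
    None,
]
cleaned = expire_slots(slots, today)
print(cleaned)
```

With a 30-day retention the two systems gone for more than a year are dropped and SYSA keeps slot 3; with a retention of, say, 1000 days nothing would be freed.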

On Thu, Oct 20, 2011 at 8:21 AM, Bill Neiman nei...@us.ibm.com wrote:
 Barbara,

          If you were to open a requirement requesting a utility of some kind 
 to report and delete residual system information from the sysplex couple data 
 set, we might be able to do something about this.  I don't think it would be 
 hard to implement something similar to IXCMIAPU for this purpose.

          Bill Neiman
          Parallel Sysplex development
          IBM

-- 
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?



Re: maxsystem in a sysplex - belated heads-up

2011-10-20 Thread Barbara Nitz
How about expiring the data from any slot not used for 30 days?  An IPL
after that would retrieve the needed info in a few seconds when it
joins the sysplex?

I like that idea. But as I said in my (private) note to Bill, since the design 
change from z/OS 1.9 isn't externally documented (that I know of) this might 
not be possible. 

My suggestion to get out of this mess is different: Introduce a new wait0A2 
(the general XCF wait state) reason code whenever a functional CDS has a 
maxsystem smaller than the sysplex CDS and don't allow even the first system 
into the plex. By first system I mean the system that issued message IXC405D 
and that got the reply 'I' for re-initialization of the plex. 
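
The proposed check amounts to comparing MAXSYSTEM values before letting the first system in. A minimal Python sketch, with a hypothetical function name and made-up values; it does not read real couple data sets and could just as well be run by hand before an IPL:

```python
def undersized_function_cds(sysplex_maxsystem, function_maxsystem):
    """Return the function CDS types formatted for fewer systems than the sysplex CDS.

    `function_maxsystem` maps a CDS type (e.g. "CFRM") to its MAXSYSTEM value.
    """
    return sorted(
        cds_type
        for cds_type, maxsystem in function_maxsystem.items()
        if maxsystem < sysplex_maxsystem
    )


# Made-up example: sysplex CDS at MAXSYSTEM(5), CFRM CDS reformatted at 3.
problems = undersized_function_cds(5, {"CFRM": 3, "ARM": 5, "WLM": 8})
if problems:
    print("refuse to initialize the plex; undersized CDS types:", problems)
```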

This would automatically take care of the other problem that isn't addressed at 
all as far as I can see: The CFRM CDS *was* usable enough for XCF to go there, 
read it, get the names of the signalling structures and *establish signalling* 
with the system that was already in the plex. If a CFRM CDS is declared not 
usable, then it cannot have been used to establish signalling via structures. 
In essence, XCF LIES when it says that signalling (both pathin and pathout) was 
established with the system just joining the plex. This will complicate *any* 
problem determination and send a customer in the wrong direction every time. It 
also means that I cannot trust *any* message content anymore - it may be a lie. 
Personally, I find that unacceptable. (But who am I?)

I take the MR for the utility (or rather, have the MR taken for me), if that's 
all IBM is willing/able to commit to.

As of yesterday, IBM has taken apar OA37776, which is supposed to address the 
documentation issue. Nothing will get fixed via this apar, AFAIK. The complete 
recreation scenario (including how to 'prime' the sysplex CDS) is in there.

Barbara



Re: maxsystem in a sysplex - belated heads-up

2011-10-18 Thread Barbara Nitz
Kees,

Thanks, we will be adding systems to our sysplexes soon, which fit
within our maxsystems, but now I have to recheck if they really fit next
to the current systems and ancient rubbish.

Maybe you can test for me whether one can find out via simple display commands what 
rubbish is kept in the sysplex CDS (I obviously cannot, since I've cleaned up 
all sysplex CDSs with residual information):

D XCF,S,ALL supposedly lists all systems in the sysplex. It would be 
interesting to see if those are really all of them or only those that are 
active. The book isn't really clear on that distinction.

D XCF,GRP will give a list of all defined groups. That means also groups that 
are no longer valid in the plex (and will never become active again) will still 
be listed (XCF group member state changes when permanent status recording is on 
are complicated enough in theory and in the books, but that design change might 
have also changed this further, without documentation update). From here on out 
you have to specify each group name individually to see all member(s). So it 
is not exactly easy to determine how much rubbish might have accumulated.

For comparison, take a dump of XCFAS and all its data spaces and then issue the 
IPCS COUPLE SYSPLEX DETAIL and COUPLE GROUP DETAIL commands to see what *that* 
might tell you (and where it differs from the displayed information).

The way I determined that there were residual systems in the sysplex CDS (which 
was inactive at the time, so no display commands possible) was to simply 
*review* browse (from the cbttape) the sysplex CDS. The system names in there 
fairly leap at you. Looking at group information is harder, as that requires 
switching left and right.
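
The same eyeball check can be approximated offline on a binary copy of the CDS. The sketch below is Python and makes several assumptions: the data is EBCDIC (code page 037 is used here), system names show up as plain runs of 4 to 8 uppercase alphanumeric or national characters, and the record layout is unknown, so group names and other strings will be reported as well. The function names and the sample bytes are fabricated for illustration.

```python
import re

# Name-like runs: uppercase letters, digits and the national characters $ # @.
NAME_RE = re.compile(r"[A-Z0-9$#@]{4,8}")


def candidate_names(raw):
    """Decode EBCDIC bytes and return the distinct name-like strings found."""
    text = raw.decode("cp037")
    return sorted(set(NAME_RE.findall(text)))


def residual_candidates(raw, active_systems):
    """Name-like strings in the data that are not currently active systems."""
    active = set(active_systems)
    return [name for name in candidate_names(raw) if name not in active]


# Fabricated sample data standing in for a copy of a sysplex CDS.
sample = (
    b"\x00" * 16
    + "OLDSYS1 ".encode("cp037")
    + b"\x00\x01"
    + "SYSA    ".encode("cp037")
)
found = residual_candidates(sample, ["SYSA"])
print(found)
```

Anything this reports still has to be judged by a human; it only narrows down where to look.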

Thanks in advance, Barbara



Re: maxsystem in a sysplex - belated heads-up

2011-10-18 Thread Vernooij, CP - SPLXM
Barbara,

I already did some research.
I used IDCAMS PRINT to check the contents of the CDSs and found only
valid system IDs.
Checking back on what we did and when, I concluded that we could not have
pollution in the CDSs. We converted our 'testsysplex' to a fully isolated
sysplex, but this meant only full DASD isolation. The new testsysplex is
pollution free, so is the prodsysplex, and the old testsysplex has been
removed. Furthermore, all this happened under z/OS 1.8, so I can't help
you.
The only thing I can do for you is check D XCF,S,ALL when one of the
systems has been brought down, but I suppose you have a testplex
yourself?

Kees.



Re: maxsystem in a sysplex - belated heads-up

2011-10-18 Thread Barbara Nitz
The only thing I can do for you is check D XCF,S,ALL when one of the
systems has been brought down, but I suppose you have a testplex
yourself?

Yes, we do. I don't remember ever having issued this command with the ALL parm 
when one of the systems was down, but it will be easy to do that once the next 
IPL comes around. I had also forgotten about the IDCAMS print job.

Barbara



Re: maxsystem in a sysplex - belated heads-up

2011-10-18 Thread Skip Robinson
I have two sysplexes with one or two members not currently running. None 
show up with D XCF,S,ALL . What I see is lots of detail about the one 
member that is running. Nothing about the others.

.
.
JO.Skip Robinson
SCE Infrastructure Technology Services
Electric Dragon Team Paddler 
SHARE MVS Program Co-Manager
626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com





Re: maxsystem in a sysplex - belated heads-up

2011-10-18 Thread Barbara Nitz
I have two sysplexes with one or two members not currently running. None
show up with D XCF,S,ALL . What I see is lots of detail about the one
member that is running. Nothing about the others.

Skip, thanks for testing that for me. Just as I figured - there isn't really an 
easy way to find out *what* junk might have accumulated in the sysplex CDS over 
the years. On the other hand, a D XCF,CPL would faithfully show an ARM CDS that 
hasn't been around for years, and that isn't used. (Unless that has been 
silently fixed since we experienced it.) What a mess. 

Barbara



Re: maxsystem in a sysplex - belated heads-up

2011-10-17 Thread Vernooij, CP - SPLXM
Barbara,

Thanks, we will be adding systems to our sysplexes soon, which fit
within our maxsystems, but now I have to recheck if they really fit next
to the current systems and ancient rubbish.

Kees.

Richards, Robert B. robert.richa...@opm.gov wrote in message
news:2d14e7856646224aacdda13ab1d35557193235d...@wdcv7exvs2.opm.gov...
 Barbara,
 
 Thank You! for the heads-up. I'll be on that migration path in the
next twelve months or so.
 
 Bob
 
 
 -Original Message-
 From: IBM Mainframe Discussion List [mailto:IBM-MAIN@bama.ua.edu] On
Behalf Of Barbara Nitz
 Sent: Tuesday, October 11, 2011 11:53 PM
 To: IBM-MAIN@bama.ua.edu
 Subject: maxsystem in a sysplex - belated heads-up
 
 When we migrated from a z9 to a z196 - big-bang swap - we had a lot of
trouble getting one sysplex up. The first system came up fine, but the
second or third system could not be ipl'd into that plex. It ended up on
a GRS (!) wait0A3 rsn9C. The accompanying ISG message would only have
been readable in a standalone dump, but never on a real console. My
colleagues had not taken an sadump. z/OS 1.10.
 
 It turned out that there was nothing whatsoever wrong with the GRS
lock structure. There was nothing wrong with GRS *at all*. The cause of
this was an IBM design change in z/OS 1.9, where IBM unilaterally
decided to give up on the concept of maxsystem determining how many
systems can join a sysplex. Our sysplex CDS was formatted with
maxsystem(5) (because there used to be 5 systems in that sysplex - 2 of
them gone for more than a year, and both of them occupying the first two
slots in the sysplex CDS). The CFRM CDS had to get reformatted for the
big-bang replacement, and it got formatted with maxsystem(3), which
reflected the true capacity of the sysplex.
 
 Well, the capacity of *that* sysplex was exactly one, because every
other system would get a 'CFRM CDS unusable', due to the fact that the
sysplex CDS had a higher maxsystem value. In addition, it was clearly
visible that the incoming system *had* established signalling
connectivity with the system already in the sysplex, which it could only
do by successfully!!! reading the CFRM CDS to get at the names of the
signalling structures. In addition, the reply I to 're-initialize the
sysplex' when the first system was IPL'd (plus the accompanying
explanation in the docs that everything in the sysplex will be treated
as residual) are wrong. *Nothing* is treated as residual.
 
 The 1.12 documentation for the maxsystem parm (right,
*everybody* looks at that book *every* time a canned job that existed
since the dawn of sysplex is submitted) says this:
 When formatting the couple data set to contain the CFRM policy data,
ensure that the value you specify for MAXSYSTEM matches the value for
MAXSYSTEM that was used when the sysplex couple data set was formatted.
When coupling facility structures are used for XCF signaling, if the
MAXSYSTEM value specified for the CFRM couple data set is less than that
of the sysplex couple data set, systems might not be able to join the
sysplex. For example, if MAXSYSTEM=16 is specified for the sysplex
couple data set and MAXSYSTEM=8 is specified for the CFRM couple data
set, then only eight systems will be allowed in the sysplex.
 This clearly implies that the lower of all the maxsystem values is the
capacity of the sysplex. IT IS NOT. It is unpredictable, especially if
your sysplex CDS is so old that it still has systems in it that are long
gone. Which will be preserved along with all the junk that might have
once been in the sysplex.
 
 IBM told me (and I give a big thanks to the lady who actually went
beyond the canned answer I first got and *looked* into this despite the
fact that all I had in terms of docs was a syslog) that all of this is
broken as designed:
 
"Cluster MR support introduced the requirement to preserve information
about manageable resources (which includes sysplexes, systems, CF's,
structures and connectors) across a sysplex-wide IPL. To XCF, a sysplex,
system, etc. that terminates and is reIPLed is an entirely new entity.
To the MR infrastructure, however, it is the same entity transitioning
between different states. Therefore, XCF as of V1R9 needs to preserve
and reuse system slots whenever possible,"

(regardless of the reply I to IXC405D, as we see), and

"We agree that the design change in R9 does make the actual system
capacity unpredictable. Also, the MAXSYSTEM write up in Setting up a
Sysplex needs to be cleaned up. Thus, a documentation update is in
order."
 
 The unpredictability of maxsystem is apparently addressed in 1.12 by
the 'fix' for SUG APAR OA27634, which makes the GRS problem visible by
issuing a readable message instead of wait-stating the system.
 
 So, if you have old and long gone systems in your sysplex CDS, happen
to get a lower maxsystem value in your CFRM CDS and end up in a wait0A3
- delete and reformat your sysplex CDS. That will 'fix' the problem that
has nothing whatsoever

Re: maxsystem in a sysplex - belated heads-up

2011-10-12 Thread Richards, Robert B.
Barbara,

Thank you for the heads-up! I'll be on that migration path in the next
twelve months or so.

Bob


-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@bama.ua.edu] On Behalf Of 
Barbara Nitz
Sent: Tuesday, October 11, 2011 11:53 PM
To: IBM-MAIN@bama.ua.edu
Subject: maxsystem in a sysplex - belated heads-up


maxsystem in a sysplex - belated heads-up

2011-10-11 Thread Barbara Nitz
When we migrated from a z9 to a z196 - big-bang swap - we had a lot of trouble
getting one sysplex up. The first system came up fine, but the second or third
system could not be IPL'd into that plex. It ended up in a GRS (!) wait state
0A3, reason code 9C. The accompanying ISG message would only have been readable
in a standalone dump, never on a real console, and my colleagues had not taken
a standalone dump. This was on z/OS 1.10.

It turned out that there was nothing whatsoever wrong with the GRS lock 
structure. There was nothing wrong with GRS *at all*. The cause of this was an 
IBM design change in z/OS 1.9, where IBM unilaterally decided to give up on the 
concept of maxsystem determining how many systems can join a sysplex. Our 
sysplex CDS was formatted with maxsystem(5) (because there used to be 5 systems 
in that sysplex - 2 of them gone for more than a year, and both of them 
occupying the first two slots in the sysplex CDS). The CFRM CDS had to get 
reformatted for the big-bang replacement, and it got formatted with 
maxsystem(3), which reflected the true capacity of the sysplex.

Well, the capacity of *that* sysplex was exactly one, because every other
system would get a 'CFRM CDS unusable', due to the fact that the sysplex CDS
had a higher maxsystem value. Yet it was clearly visible that the incoming
system *had* established signalling connectivity with the system already in
the sysplex, which it could only do by successfully!!! reading the CFRM CDS to
get at the names of the signalling structures. On top of that, replying I to
're-initialize the sysplex' when the first system was IPL'd does not do what
the docs explain (that everything in the sysplex will be treated as residual).
*Nothing* is treated as residual.

The 1.12 documentation for the maxsystem parm (right, *everybody* looks at
that book *every* time a canned job that has existed since the dawn of sysplex
is submitted) says this:

"When formatting the couple data set to contain the CFRM policy data, ensure
that the value you specify for MAXSYSTEM matches the value for MAXSYSTEM that
was used when the sysplex couple data set was formatted. When coupling facility
structures are used for XCF signaling, if the MAXSYSTEM value specified for the
CFRM couple data set is less than that of the sysplex couple data set, systems
might not be able to join the sysplex. For example, if MAXSYSTEM=16 is
specified for the sysplex couple data set and MAXSYSTEM=8 is specified for the
CFRM couple data set, then only eight systems will be allowed in the sysplex."

This clearly implies that the lowest of all the maxsystem values is the
capacity of the sysplex. IT IS NOT. It is unpredictable, especially if your
sysplex CDS is so old that it still has systems in it that are long gone. Those
will be preserved, along with all the junk that might once have been in the
sysplex.
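
A minimal sketch of the gap between the documented reading and what we saw,
written in Python purely for illustration. The slot rule in it is an assumption
that happens to reproduce the numbers above (sysplex CDS maxsystem 5, first two
slots held by long-gone systems, CFRM CDS maxsystem 3). It is not taken from
IBM documentation, and joinable_systems is a made-up name.

```python
def joinable_systems(sysplex_maxsystem, cfrm_maxsystem, residual_slots):
    """Count the systems that can join under the assumed slot model.

    Assumption (not confirmed by IBM documentation): slots are numbered
    from 1, long-gone systems keep their slots, and a slot is usable only
    if its number does not exceed the CFRM CDS maxsystem value.
    """
    usable = 0
    for slot in range(1, sysplex_maxsystem + 1):
        if slot in residual_slots:
            continue  # still held by a system that no longer exists
        if slot > cfrm_maxsystem:
            continue  # slot number beyond what the CFRM CDS was formatted for
        usable += 1
    return usable


# What the documentation implies: the lowest maxsystem value wins.
documented = min(5, 3)

# What the assumed slot model gives for the situation described above.
observed = joinable_systems(5, 3, residual_slots={1, 2})

print(documented, observed)
```

Under this assumed model, the same two maxsystem values on a freshly formatted
sysplex CDS (no residual slots) give three joinable systems again, which would
fit the observation that reformatting the sysplex CDS makes the problem go
away.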

IBM told me (and I give a big thanks to the lady who actually went beyond the 
canned answer I first got and *looked* into this despite the fact that all I 
had in terms of docs was a syslog) that all of this is broken as designed:

"Cluster MR support introduced the requirement to preserve information about
manageable resources (which includes sysplexes, systems, CF's, structures and
connectors) across a sysplex-wide IPL. To XCF, a sysplex, system, etc. that
terminates and is reIPLed is an entirely new entity. To the MR infrastructure,
however, it is the same entity transitioning between different states.
Therefore, XCF as of V1R9 needs to preserve and reuse system slots whenever
possible,"

(regardless of the reply I to IXC405D, as we see), and

"We agree that the design change in R9 does make the actual system capacity
unpredictable. Also, the MAXSYSTEM write up in Setting up a Sysplex needs to be
cleaned up. Thus, a documentation update is in order."
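
The "preserve and reuse system slots" behaviour in that statement can be
pictured with a small Python sketch. Everything in it is hypothetical: the
system names, the assign_slot helper, and in particular the choice of the
lowest free slot for a new system, which IBM's statement does not specify.

```python
def assign_slot(slots, system_name):
    """Return the 1-based slot for system_name, reusing its old slot if any.

    slots is a list with one entry per slot (its length stands in for
    maxsystem); each entry is a system name or None for a free slot.
    Assumption: a system that is new to the CDS gets the lowest free slot.
    """
    if system_name in slots:
        # Same entity coming back: it keeps the slot it had before.
        return slots.index(system_name) + 1
    for index, owner in enumerate(slots):
        if owner is None:
            slots[index] = system_name
            return index + 1
    raise RuntimeError("no free slot left in the sysplex CDS")


# Two long-gone systems still occupy the first two of five slots.
slots = ["OLD1", "OLD2", None, None, None]

first = assign_slot(slots, "SYSA")   # new system, takes the first free slot
second = assign_slot(slots, "SYSB")  # next new system, next free slot
again = assign_slot(slots, "SYSA")   # re-IPL of SYSA, same slot as before

print(first, second, again)
```

The point of the sketch is only that nothing ever frees the slots held by OLD1
and OLD2, so new systems start further up than the count of live systems would
suggest.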

The unpredictability of maxsystem is apparently addressed in 1.12 by the 'fix'
for SUG APAR OA27634, which makes the GRS problem visible by issuing a readable
message instead of wait-stating the system.

So, if you have old and long gone systems in your sysplex CDS, happen to get a 
lower maxsystem value in your CFRM CDS and end up in a wait0A3 - delete and 
reformat your sysplex CDS. That will 'fix' the problem that has nothing 
whatsoever to do with GRS. Do NOT be fooled by the fact that signalling is 
established and the CFRM CDS was usable for *that* - such inconsistencies we 
are not supposed to care about. 

In any case, we will never run into this again. A permanent part of our DR
setup is now to always delete both the sysplex CDS and the CFRM CDS and define
them afresh, in order to avoid this unpredictability.
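
Since the documentation quoted above asks for matching MAXSYSTEM values, a
check along these lines could sit next to the format jobs. It is only a Python
sketch: the values are typed in by hand, nothing here reads a real couple data
set, and maxsystem_mismatches is an invented helper name.

```python
def maxsystem_mismatches(values):
    """Return the CDS types whose maxsystem differs from the sysplex CDS.

    values maps a CDS type name to the maxsystem value it was formatted
    with; the "SYSPLEX" entry is taken as the reference.
    """
    reference = values["SYSPLEX"]
    return sorted(
        name
        for name, maxsystem in values.items()
        if name != "SYSPLEX" and maxsystem != reference
    )


# The combination that bit us: sysplex CDS at 5, CFRM CDS at 3.
formatted = {"SYSPLEX": 5, "CFRM": 3}
mismatched = maxsystem_mismatches(formatted)

print(mismatched)
```

An empty list means all values agree. It says nothing about residual systems
in an old sysplex CDS, which is why we reformat both data sets anyway.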

Barbara Nitz

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at