Re: Looking for help with an obscure C integer problem

Lizette Koehler Sun, 28 Jul 2013 07:40:54 -0700

Charles,

First let me apologize if it seemed I was beating you up.  That was not my 
intent.  Unfortunately in email the reader cannot always determine how the 
words are being said.  It was more of a "help a vendor out" type of 
commentary, not a "do it or else" type.  But if I did offend - I am sorry.

I am also sorry you have had some issues in the past.  I think most on this 
list can quote similar events.  Not with just IBM but other vendors as well.  
And as for IBM, they have to recreate the problem in a vanilla, IBM-only 
software stack.  They do not always have OEM software in their testing environment.  
Therefore, they rely on the reporter to help with the diagnosis sometimes.  And 
I guess the bottom line is - sometimes software will work and sometimes it has 
problems.  

When it doesn’t work, the question becomes how important it is to get it fixed 
and how repeatable the failure is.

FIN (Fixed in Next) is acceptable in some cases.  "Fix it now" is the right 
call in others.  You have to decide which is more important at the time.

I usually will continue with the vendor until the resolution is found; I then 
determine which way I go.

I believe that all Vendor Support groups want to fix issues as quickly as they 
can.  Depending on priority, the willingness of the customer to work with them, 
and the complexity of the code that needs to be validated and/or corrected, it 
can be an extraordinary effort.  Then they have to balance that with how many 
customers are reporting it and whether it is isolated or more common.  A lot of factors go 
into working on reported issues.  Having worked both sides - IBM level 2 
support and customer, I can understand some of the constraints IBM has with 
correcting problems.  Some are really easy to identify and fix.  Others are 
head scratchers.  And if there are OEM products involved, they have to get 
documentation from the customer and have the customer validate through parm 
changes or other data collection.

I just finished working on a problem with EMC on their storage array.  It took 
a couple of months and it turned out this particular issue went back a couple 
of code levels for them.  But they were not aware of it until I had my issue.  
I spent time sending in a lot of RMF and LOGREC data on my issue, but they 
found it and eventually fixed it.  They have also retrofitted it to the older 
release.  But it was an effort on both of our parts to resolve this.

One more story.  I was supporting our DR testing when Top Secret would not come 
up at IPL time.  It took an S0C4-04.  This was around 5pm.  I looked at the 
summary dump and called in a Sev1 to CA.  They were able to quickly identify 
the issue and provide a workaround (there was a PTF but we could not apply it 
at DR).  So the secondary workaround was to disable PDS Member Security.  It 
turned out that 9 months before, another shop running ACF2 had the same issue.  
They ran the problem with CA for several months (I think about 4) before CA was 
able to determine that a new instruction was used that only ran on z9 and 
above.  We were running on a z890 at DR but a z9 at our home site.  If that 
other shop had not pursued it and gotten it fixed, our DR test would have 
failed.  So I am very grateful they did what they had to do to get it fixed.  

I understand that some shops do not have the time to pursue issues with 
vendors.  I always hope they have the foresight to open the case, collect some 
doc, and then, if needed, defer pursuing it.  Then there is a record in the 
vendor's shop and if more show up, they are likely to pursue more quickly.  Or 
if the support team is bored, they have something to work on that might be 
interesting.

I hope your future interactions are better.

Lizette

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Charles Mills
Sent: Sunday, July 28, 2013 6:38 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Looking for help with an obscure C integer problem

Okay, everyone is beating me up for not reporting this problem. Let me tell you 
the story that put me off PMRs.

In January of 2012 or so I sent a note to IBM-MAIN saying that CEE3DMP was 
printing garbage and then S0C4ing. You can probably find my post if you search. 
At the urging of this group I opened a PMR on February 3. 

Here was the first battle I had to fight. The program in question is a STC that 
installs exits and so I implemented Signal handling so that in the event of a 
problem it could shut down gracefully. To test that code I added an 
undocumented MODIFY command to force a S0C4. I encountered the problem while 
testing that intentional S0C4/signal handling code. That was how I could 
reproduce the problem. I went around and around with the level 1 folks saying 
"we found your problem -- you are trying to store into low core" and me saying 
"I am doing that on purpose to force this problem. Look at the PMR description. 
I know what causes the original S0C4 -- it's the garbage and the S0C4 in 
CEE3DMP that I am PMRing" and they would come back and say "your problem is you 
are trying to store into low core."
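
For readers unfamiliar with the technique described above, here is a minimal 
POSIX C sketch of a fault handler with a deliberately forced fault for testing. 
It is not the code from the STC in question. It assumes the program check is 
delivered as SIGSEGV (which, as I understand it, is how LE presents a S0C4 to 
POSIX signal handling), and it uses raise() as a stand-in for the intentional 
store into low core. The names run_guarded, on_fault and last_fault_signal are 
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;              /* where the handler jumps back to */
static volatile sig_atomic_t last_signal = 0;  /* last fault signal caught, 0 if none */

/* Signal handler: record the signal and unwind to the recovery point. */
static void on_fault(int signo)
{
    last_signal = signo;
    siglongjmp(recovery_point, 1);
}

/* Returns the last fault signal seen by on_fault (0 if none so far). */
int last_fault_signal(void)
{
    return (int)last_signal;
}

/*
 * Runs the "work" under a SIGSEGV handler.
 * force_fault != 0 simulates the test command that deliberately faults.
 * Returns 0 if the work completed, 1 if a fault was caught and the
 * graceful-shutdown path ran, -1 if the handler could not be installed.
 */
int run_guarded(int force_fault)
{
    struct sigaction sa, old;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(recovery_point, 1) == 0) {
        if (force_fault)
            raise(SIGSEGV);   /* stand-in for the intentional bad store */
        result = 0;           /* normal completion */
    } else {
        /* Graceful shutdown would go here (e.g. remove installed exits). */
        result = 1;
    }

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous disposition */
    return result;
}
```

Calling run_guarded(0) takes the normal path and returns 0; run_guarded(1) 
forces the fault, lands in the shutdown branch and returns 1. Passing 1 as the 
second argument of sigsetjmp saves the signal mask, so SIGSEGV is not left 
blocked after the handler jumps out.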

After I got past that IBM wanted dump after dump after dump from me. They did 
not reproduce the problem -- it was all "change this option, try it again, and 
send us the dump." I sent them a total of nine different tersed dumps and 
similar files over the course of three months. Not a trivial thing with an STC 
that is intertwined with the z/OS kernel. Finally they said "okay, we have this 
figured out, we're going to work on it" and then in OCTOBER they sent me a 
local fix to test. I tested it and confirmed that they had fixed it.

Then IBM came back to me and said "do you really, really, REALLY need this 
fixed?" and I said "no, of course not, I've been living for eight months with 
it not fixed." They said "good, because if we really issue a PTF for this, we 
need to do regression testing and everything which is a lot of work. How about 
if we just roll the fix into the next release of LE (z/OS 2.0!)." At that point 
I said "sure, whatever." Whereupon IBM said "by the way, there's no guarantee 
your local fix won't go away if we happen to issue some other PTF that impacts 
it."

Needless to say I am not very encouraged to open PMRs based on that experience.

I think those of you who say "don't you care about the customers?" have it 
totally backwards. Of course I care about the customers. If someone posted here 
"that CorreLog SIEM agent is okay but it seems to S0C4 from time to time" I 
would be all over it. I would track the OP down, find out what he was doing, 
duplicate the problem, and fix it. I would not wait until the customer jumped 
through some particular process hoop, or proved how much the S0C4 was hurting 
their production, before I acknowledged the problem. I care about MY 
customers, and I care about IBM's customers, but you can't push a piece of 
string: I can't care about IBM's customers more than IBM does.

What IBM SHOULD be doing IMHO -- and I was shocked that for the first time in 
my experience they did it in the case of the error of this thread -- is have 
someone, an ombudsman, monitoring this list and with authority to open problems 
and get them fixed. I am very, very pleased to see that that is what is 
happening in this one particular case. 

Note that I was very forthcoming with information here even after my particular 
problem went away. I am not selfish. I posted the listings that enabled some 
very clever people to find the problem for IBM. I am very willing to contribute 
effort to solving problems on behalf of the community. I have just come to the 
conclusion that the PMR process is simply not sufficiently cost-effective 
to work for me.

Thanks for listening.

Charles
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Paul Gilmartin
Sent: Saturday, July 27, 2013 10:06 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Looking for help with an obscure C integer problem

On Sat, 27 Jul 2013 14:59:15 -0500, Ed Gould wrote:
>
>This is in my opinion a fuzzy error. Where the C issue is a classic
>bug (probably), your SR Policy is a gray area and it is harder to define 
>a right/wrong answer (although probably you are right).
>Spelling & such is an issue but not a real bug. It is a soft issue, in 
>my opinion, and a lot harder to get IBM interested.
> 
I certainly did not say that my "SR Policy" thread was motivated by the 
response to a report of a spelling error.
In fact, it was a report of the misbehavior I discussed in the
"Buffering: stdout ..." thread.  I suppose you're free to deem it cosmetic or 
not as you choose.

The good news is that IBM has reproduced the problem, even while prodding me 
for a statement of business impact (which I supplied, acknowledging that I was 
working on a PoC).

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
