Re: RETRY - was ARR and CSVQUERY

2023-12-24 Thread Seymour J Metz
Before CT and GTF, if they're running? Does your strategy change between SRB
and TCB?

--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3
עַם יִשְׂרָאֵל חַי
נֵ֣צַח יִשְׂרָאֵ֔ל לֹ֥א יְשַׁקֵּ֖ר


From: IBM Mainframe Discussion List on behalf of Ed Jaffe
Sent: Sunday, December 24, 2023 4:05 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: RETRY - was ARR and CSVQUERY

On 12/24/2023 9:35 AM, Tom Brennan wrote:
> Thanks Peter!  Yes, it was the surprise of an 0C4 when I expected 0C1.
> Sometimes when totally confusing things like that happen I first
> assume the computer itself is at fault, not the code I'm working on.
> And guess what, it's always the code :)

This is why the first thing I look at right after inspecting PSW and
registers in an SVC dump is the System Trace. I reiterate this on slide
12 of this presentation:

https://phoenixsoftware.com/ftp/demo/The%20MVS%20System%20Trace.pdf

I search from the top for the word "before" and then look from that
point in the trace for "RCVY" -- or just an asterisk ("*") if "RCVY" is not
found -- as part of my root cause analysis.

~95% of the time, I see in the trace exactly what I expect to see based
on the problem description. The other ~5% of the time I see something
unexpected that provides unique insight.

I don't know if my approach is a common or prevailing one, but it has
served me well in recent years...


--
Phoenix Software International
Edward E. Jaffe
831 Parkview Drive North
El Segundo, CA 90245
https://www.phoenixsoftware.com/




--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN





Re: RETRY - was ARR and CSVQUERY

2023-12-24 Thread Ed Jaffe

On 12/24/2023 9:35 AM, Tom Brennan wrote:
> Thanks Peter!  Yes, it was the surprise of an 0C4 when I expected 0C1.
> Sometimes when totally confusing things like that happen I first
> assume the computer itself is at fault, not the code I'm working on.
> And guess what, it's always the code :)


This is why the first thing I look at right after inspecting PSW and 
registers in an SVC dump is the System Trace. I reiterate this on slide 
12 of this presentation:


https://phoenixsoftware.com/ftp/demo/The%20MVS%20System%20Trace.pdf

I search from the top for the word "before" and then look from that
point in the trace for "RCVY" -- or just an asterisk ("*") if "RCVY" is not
found -- as part of my root cause analysis.
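
A rough sketch of that search order, for anyone who wants to script it
against a trace saved as text. This is not Ed's tooling: the function name,
the case-insensitive match on "before" and the sample lines are all
assumptions made up for illustration, and real formatted trace output will
look different.

```python
from typing import Optional, Sequence, Tuple


def find_first_recovery_entry(
    trace_lines: Sequence[str],
) -> Optional[Tuple[int, str]]:
    """Return (index, line) of the first entry of interest, or None.

    Search order: find the first line containing "before", then from that
    point prefer the first "RCVY" line, else the first line with "*".
    """
    # Step 1: search from the top for the word "before".
    start = next(
        (i for i, line in enumerate(trace_lines) if "before" in line.lower()),
        None,
    )
    if start is None:
        return None

    # Step 2: from that point on, prefer the first "RCVY" entry.
    for i in range(start, len(trace_lines)):
        if "RCVY" in trace_lines[i]:
            return i, trace_lines[i]

    # Step 3: no "RCVY" found, so fall back to the first asterisk.
    for i in range(start, len(trace_lines)):
        if "*" in trace_lines[i]:
            return i, trace_lines[i]

    return None


# Made-up sample lines, NOT real system trace output.
sample = [
    "01 0042 SVC  (hypothetical entry)",
    "--- hypothetical marker line containing the word before ---",
    "01 0042 *PGM (hypothetical entry)",
    "01 0042 *RCVY (hypothetical entry)",
]
hit = find_first_recovery_entry(sample)
```

With the sample above, the "RCVY" line is chosen even though an earlier line
carries an asterisk, because the asterisk is only the fallback.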


~95% of the time, I see in the trace exactly what I expect to see based 
on the problem description. The other ~5% of the time I see something 
unexpected that provides unique insight.


I don't know if my approach is a common or prevailing one, but it has 
served me well in recent years...



--
Phoenix Software International
Edward E. Jaffe
831 Parkview Drive North
El Segundo, CA 90245
https://www.phoenixsoftware.com/






Re: RETRY - was ARR and CSVQUERY

2023-12-24 Thread Binyamin Dissen
Or the hardware.

Certain machine checks while masked will force a check-stop.

On Sun, 24 Dec 2023 12:58:06 -0600 Jon Perryman  wrote:

:>Hi Tom,

:>I think Peter misinterpreted your question because you provided too much
:>information.

:>> Peter Relson wrote:
:>> I'm now thinking you just meant that you were surprised that the recovery
:>> routine did not complete successfully.

:>I think you are asking the academic question if there is a time when RTM is
:>disabled or inactive. Or if you can encounter a situation where RTM is disabled
:>or inactive. Maybe you can form your question in terms of RTM behavior that you
:>want to understand.

:>To clarify LPAR DISABLED WAIT state, I believe it's part of RTM. While it's a
:>drastic form of recovery, the LPAR should never be left running random
:>instructions. My point was that I believe that RTM always provides some sort of
:>recovery even if the recovery is not actual recovery.

--
Binyamin Dissen 
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel



Re: RETRY - was ARR and CSVQUERY

2023-12-24 Thread Jon Perryman
Hi Tom,

I think Peter misinterpreted your question because you provided too much 
information.

> Peter Relson wrote:
> I'm now thinking you just meant that you were surprised that the recovery 
> routine did not complete successfully.

I think you are asking the academic question if there is a time when RTM is 
disabled or inactive. Or if you can encounter a situation where RTM is disabled 
or inactive. Maybe you can form your question in terms of RTM behavior that you 
want to understand.

To clarify LPAR DISABLED WAIT state, I believe it's part of RTM. While it's a 
drastic form of recovery, the LPAR should never be left running random 
instructions. My point was that I believe that RTM always provides some sort of 
recovery even if the recovery is not actual recovery.



Re: RETRY - was ARR and CSVQUERY

2023-12-24 Thread Jay Maynard
Doggone computers...durn things always do what you tell them to.

On Sun, Dec 24, 2023 at 11:35 AM Tom Brennan wrote:

> Thanks Peter!  Yes, it was the surprise of an 0C4 when I expected 0C1.
> Sometimes when totally confusing things like that happen I first assume
> the computer itself is at fault, not the code I'm working on.  And guess
> what, it's always the code :)
>
> On 12/24/2023 5:58 AM, Peter Relson wrote:
> > Tom B wrote
> > 
> > I was referring to my experience with a JES2 exit which set up its own
> > recovery routine.  In that code you could see it free any getmain'd
> > memory, etc. like you mentioned.  But also in that code was an error
> > that caused an 0C4.  So when the x'00' I added for temporary debugging
> > ran that user-coded recovery routine, I was surprised to get an 0C4
> > instead and had to fix the recovery routine.
> >
> > So of course JES2 had its own recovery routine in place that handled
> > the 0C4 and we got a dump and JES2 went on its merry way, perhaps after
> > disabling that exit (I can't remember).
> > 
> >
> > I took a weird view of what I suspect you really meant by "0C4 instead".
> > I'm now thinking you just meant that you were surprised that the recovery
> > routine did not complete successfully.
> > But in case you were thinking of what happened to come to my mind,
> > here's some info:
> >
> > When the x'00' "instruction" was executed, it would have gotten an
> > operation exception and the most recently established recovery routine
> > (see "special-case" below) would have gotten control for the 0C1.
> > Its SDWA would have shown that. And TCBCMPC would be x'0C1000'.
> >
> > If that recovery routine then took some exception that resulted in an
> > 0C4, a newer recovery routine (established by this recovery routine) or,
> > in the absence of such, the next-oldest recovery routine would have gotten
> > control for the 0C4. Its SDWA would have shown that. TCBCMPC would now be
> > x'0C4000'.
> >
> > Special-Case: if you have established SPIE/ESPIE for a program
> > interrupt, that exit will get control even if there is a newer-established
> > ESTAE-type recovery routine.
> >
> > Peter Relson
> > z/OS Core Technology Design


-- 
Jay Maynard



Re: RETRY - was ARR and CSVQUERY

2023-12-24 Thread Tom Brennan
Thanks Peter!  Yes, it was the surprise of an 0C4 when I expected 0C1. 
Sometimes when totally confusing things like that happen I first assume 
the computer itself is at fault, not the code I'm working on.  And guess 
what, it's always the code :)


On 12/24/2023 5:58 AM, Peter Relson wrote:

Tom B wrote

I was referring to my experience with a JES2 exit which set up its own
recovery routine.  In that code you could see it free any getmain'd
memory, etc. like you mentioned.  But also in that code was an error
that caused an 0C4.  So when the x'00' I added for temporary debugging
ran that user-coded recovery routine, I was surprised to get an 0C4
instead and had to fix the recovery routine.

So of course JES2 had its own recovery routine in place that handled
the 0C4 and we got a dump and JES2 went on its merry way, perhaps after
disabling that exit (I can't remember).


I took a weird view of what I suspect you really meant by "0C4 instead".
I'm now thinking you just meant that you were surprised that the recovery
routine did not complete successfully.
But in case you were thinking of what happened to come to my mind, here's
some info:

When the x'00' "instruction" was executed, it would have gotten an operation
exception and the most recently established recovery routine (see
"special-case" below) would have gotten control for the 0C1.
Its SDWA would have shown that. And TCBCMPC would be x'0C1000'.

If that recovery routine then took some exception that resulted in an 0C4, a
newer recovery routine (established by this recovery routine) or, in the
absence of such, the next-oldest recovery routine would have gotten control
for the 0C4. Its SDWA would have shown that. TCBCMPC would now be x'0C4000'.

Special-Case: if you have established SPIE/ESPIE for a program interrupt,
that exit will get control even if there is a newer-established ESTAE-type
recovery routine.

Peter Relson
z/OS Core Technology Design




Re: RETRY - was ARR and CSVQUERY

2023-12-24 Thread Peter Relson
Tom B wrote

I was referring to my experience with a JES2 exit which set up its own
recovery routine.  In that code you could see it free any getmain'd
memory, etc. like you mentioned.  But also in that code was an error
that caused an 0C4.  So when the x'00' I added for temporary debugging
ran that user-coded recovery routine, I was surprised to get an 0C4
instead and had to fix the recovery routine.

So of course JES2 had its own recovery routine in place that handled
the 0C4 and we got a dump and JES2 went on its merry way, perhaps after
disabling that exit (I can't remember).


I took a weird view of what I suspect you really meant by "0C4 instead".
I'm now thinking you just meant that you were surprised that the recovery
routine did not complete successfully.
But in case you were thinking of what happened to come to my mind, here's
some info:

When the x'00' "instruction" was executed, it would have gotten an operation
exception and the most recently established recovery routine (see
"special-case" below) would have gotten control for the 0C1.
Its SDWA would have shown that. And TCBCMPC would be x'0C1000'.

If that recovery routine then took some exception that resulted in an 0C4, a
newer recovery routine (established by this recovery routine) or, in the
absence of such, the next-oldest recovery routine would have gotten control
for the 0C4. Its SDWA would have shown that. TCBCMPC would now be x'0C4000'.

Special-Case: if you have established SPIE/ESPIE for a program interrupt,
that exit will get control even if there is a newer-established ESTAE-type
recovery routine.
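
To make the sequence above concrete, here is a toy model in Python. It is
only a sketch under stated assumptions, not how RTM is implemented: recovery
routines are modelled as a simple list, a routine that does not fail is
treated as ending the sequence, routines established from inside a recovery
routine and the SPIE/ESPIE special case are left out, and the function names
are made up. The decoder assumes the system completion code occupies the
high-order 12 bits of the 3-byte TCBCMPC value, which matches the x'0C1000'
and x'0C4000' values mentioned above.

```python
from typing import List, Optional, Tuple


def system_completion_code(tcbcmpc: int) -> str:
    """Return the system completion code as three hex digits.

    Assumes a 24-bit value whose high-order 12 bits hold the system code.
    """
    return format((tcbcmpc >> 12) & 0xFFF, "03X")


def drive_recovery(
    first_abend: str,
    routines: List[Tuple[str, Optional[str]]],
) -> List[Tuple[str, str]]:
    """Simulate which routine sees which abend code.

    routines is in order of establishment, oldest first. Each entry is
    (name, abend code the routine itself raises, or None if it completes).
    Returns the list of (routine name, abend code it was entered for).
    """
    seen = []
    current = first_abend
    # The most recently established routine gets control first.
    for name, fails_with in reversed(routines):
        seen.append((name, current))
        if fails_with is None:
            break  # toy assumption: this routine handled the error
        # The routine itself abended: next-oldest routine sees the new code.
        current = fails_with
    return seen


# Tom's scenario: the exit's own recovery routine has a bug that causes an 0C4.
flow = drive_recovery(
    "0C1",
    [("JES2 recovery", None), ("exit recovery", "0C4")],
)
```

In this model the exit's recovery routine is entered for the 0C1, and the
older JES2 routine is then entered for the 0C4.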

Peter Relson
z/OS Core Technology Design




Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Jon Perryman
On Sat, 23 Dec 2023 21:02:18 +0200, Binyamin Dissen wrote:

>On Fri, 22 Dec 2023 15:07:33 -0800 Tom Brennan wrote:
>
>:>So are you implying that in z/OS there are environments where I can run
>:>a program without any built-in basic recovery?
>
>Yes. Most batch jobs run that way.

Recovery in batch jobs is not obvious. Consider step termination (e.g.
DISP=(NEW,CATLG,DELETE), freeing user storage, in-flight database transactions
and probably more).



Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Jon Perryman
On Sat, 23 Dec 2023 15:54:38 +, Peter Relson  wrote:

>I view there being two main reasons for recovery (and not necessarily in the
>order I show):

Everyone ignores the third main reason, which is stopping abends from becoming
catastrophic. IBM knows this is ignored and plans accordingly. IBM uses
multiple techniques that are not obvious.

For instance, consider how IBM handles the most dangerous z/OS user exit, which
is the last one anyone thinks of. It's not the allocation, job or interpreter
exit, nor any of the other exits that people think of.

It's the message user exit. To stop people from doing something dumb, IBM 
requires a user exit for a specific message id (see SETPROG). IBM knows a 
single message user exit would invite catastrophic situations because it's far 
too complicated and most people won't understand the recovery requirements. You 
can code specific IOS, JES, z/OS and more message exits but the concept is that 
recovery only affects a small portion of the system.

IBM allows messages thru the SSI but few people venture onto the SSI because of 
the complexity. The expectation is that anyone using the SSI will use advanced 
techniques for messages that can be issued from almost any environment. SSI 
programmers must understand how and when to handle locks, FRR, ESTAE, running 
disabled and everything else that can affect SSI code.



Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Binyamin Dissen
Only if you do tricky stuff.

Such as playing with the PFLIH. If you get a program check there you may get a
disabled wait. The FLIH will recognize unexpected recursion.

Don't know if there is a "standard" IBM supported way to do this, though.

On Sat, 23 Dec 2023 10:20:58 -0800 Tom Brennan 
wrote:

:>Thanks Peter!  I always appreciate your responses and also the responses 
:>from others at IBM.  But I was trying to ask a question that I may not 
:>be able to ask correctly.  Let me try anyway:
:>
:>I was referring to my experience with a JES2 exit which set up its own
:>recovery routine.  In that code you could see it free any getmain'd 
:>memory, etc. like you mentioned.  But also in that code was an error 
:>that caused an 0C4.  So when the x'00' I added for temporary debugging 
:>ran that user-coded recovery routine, I was surprised to get an 0C4 
:>instead and had to fix the recovery routine.
:>
:>So of course JES2 had its own recovery routine in place that handled
:>the 0C4 and we got a dump and JES2 went on its merry way, perhaps after 
:>disabling that exit (I can't remember).
:>
:>So my question to Jon was, is there any environment in z/OS where there 
:>is absolutely no recovery routine?  And if a program interrupt occurs I 
:>get no abend processing whatsoever and (I guess) a disabled wait. 
:>That's what I thought Jon was implying, that there are environments 
:>where I MUST code a recovery routine just to keep the system running at 
:>all.  I don't think there is such an environment.
:>
:>And after all this typing I still am not sure I can relay my question 
:>correctly.  So ignore it if I'm not making sense :)  It's academic anyway.
:>
:>On 12/23/2023 7:54 AM, Peter Relson wrote:
:>> Tom B wrote
:>>> So are you implying that in z/OS there are environments where I can run
:>>> a program without any built-in basic recovery?
:>> 
:>> To be a bit snide, you "can" run a program without any recovery, of course.
:>> Whether you should or not is an entirely different question.
:>>
:>> I view there being two main reasons for recovery (and not necessarily in
:>> the order I show):
:>>
:>> First, to deal with resources that might otherwise be left in an undesired
:>> state if you don't have recovery.
:>> Maybe that's storage you obtained or an ENQ or lock that you hold or any
:>> number of other things (perhaps even that you prefer to return to your
:>> caller with a return/reason code in that case rather than an abend).
:>> But if you know that the system will release the resource in question in
:>> a timely fashion, maybe you don't care.
:>> For example, suppose you know that you are the jobstep program and you
:>> obtain private storage in a jobstep or task-related subpool and blow up.
:>> Maybe you don't bother freeing it because you know that the task will
:>> terminate and the system will free the storage (in your mainline you would
:>> probably free the storage for cleanliness reasons, but maybe you take the
:>> cheap way out in an abend case).
:>> But if you might be called by something else, that's a different ballgame.
:>> In that case, you do not know that the task will terminate - the caller
:>> might have recovery that retries.
:>>
:>> Second, to capture serviceability data such as what was running and what
:>> was going on in order to help diagnosticians.
:>> That might be information in the SDWA and your use of recording to logrec;
:>> it could be a message written to the job log (but calling almost any
:>> service out of recovery might mention having recovery to protect something
:>> bad happening within that flow).
:>> It could be a dump of some type. In the "freeing storage" case,
:>> maybe the recovery isn't so much about freeing the storage but more about
:>> capturing data to help someone figure out what went wrong.
:>> Peter Relson
:>> z/OS Core Technology Design

--
Binyamin Dissen 
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel



Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Binyamin Dissen
On Fri, 22 Dec 2023 15:07:33 -0800 Tom Brennan 
wrote:

:>So are you implying that in z/OS there are environments where I can run 
:>a program without any built-in basic recovery?

Yes. 

Most batch jobs run that way.

--
Binyamin Dissen 
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel



Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Binyamin Dissen
On Fri, 22 Dec 2023 15:09:59 -0600 Jon Perryman  wrote:

:>On Fri, 22 Dec 2023 10:26:41 -0800, Tom Brennan wrote:

:>>But I think it's overkill for a recovery routine to have its own
:>>recovery routine (if that's even possible in a JES2 exit environment).

:>z/OS exits have built-in recovery, diagnostics and recursive abend handling.
:>Since IBM did the work for you in exits, there's no need to duplicate that part
:>of recovery.

SRBs without TCB percolation fail quite silently. The IBM recovery is simply to
clean up the SRB. So if you put recovery in an SRB and the recovery fails, it
will be silent.

Best to have the main recovery routine do the full stuff and attempt to retry
to finish what is needed.

The recovery routine for the recovery routine will dump, mark things in CSA
and percolate.

--
Binyamin Dissen 
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel



Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Tom Brennan

Yes, and I'd add:
if you get 4096 - free 4096

Don't free 1024 like I did once.  Code like that tests just fine but 
then dies 8 hours later when the address space runs out :)
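
The arithmetic behind that kind of slow failure can be sketched in a few
lines of Python. This is a toy counter, not a model of GETMAIN/FREEMAIN: the
function name and the region size in the example are made-up assumptions, and
fragmentation and subpools are ignored.

```python
from typing import Optional


def requests_until_exhausted(
    region_bytes: int, get_bytes: int, free_bytes: int
) -> Optional[int]:
    """Count how many get/free cycles complete before the next get fails.

    Returns None when every cycle frees at least what it obtained, since
    this toy model then never runs out of storage.
    """
    leak = get_bytes - free_bytes
    if leak <= 0:
        return None  # balanced: get 4096, free 4096

    in_use = 0
    completed = 0
    # Keep going while the next request still fits in the region.
    while in_use + get_bytes <= region_bytes:
        in_use += leak  # obtained get_bytes, gave back only free_bytes
        completed += 1
    return completed


# Tiny made-up region so the leak shows up quickly.
balanced = requests_until_exhausted(10_000, 4096, 4096)
leaky = requests_until_exhausted(10_000, 4096, 1024)
```

Each leaky cycle strands 3072 bytes, so a short test run passes while a
long-running address space eventually cannot satisfy the next request.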


On 12/23/2023 8:12 AM, Colin Paice wrote:

Expanding on what Peter said.  It is horses for courses.
If you are writing a program which can be running for months before restart
you need to clean up everything - for example ensure any storage obtained
is released.
Someone gave me some guidance
if you open it - close it
if you get it - free it
if you lock it - unlock it
if you change something (which is not yours) - unchange it
Leave nothing but footprints (trace entries)
You need to release locks (in the correct order) to maintain transactional
and data integrity.  Think of two phase commit, and with database updates -
all updates need to occur or none.

In some of the code of the product I was involved in - sometimes there was
the same amount or more code in the recovery routine than in the mainline
code!

Colin

On Sat, 23 Dec 2023 at 15:55, Peter Relson  wrote:


Tom B wrote

So are you implying that in z/OS there are environments where I can run
a program without any built-in basic recovery?


To be a bit snide, you "can" run a program without any recovery, of course.
Whether you should or not is an entirely different question.

I view there being two main reasons for recovery (and not necessarily in
the order I show):

First, to deal with resources that might otherwise be left in an undesired
state if you don't have recovery.
Maybe that's storage you obtained or an ENQ or lock that you hold or any
number of other things
(perhaps even that you prefer to return to your caller with a
return/reason code in that case rather than
an abend). But if you know that the system will release the resource in
question in a timely fashion, maybe you don't care.
For example, suppose you know that you are the jobstep program and you
obtain private
storage in a jobstep or task-related subpool and blow up,
Maybe you don't bother freeing it because you know that the task will
terminate and the system will free the storage
(in your mainline you would probably free the storage for cleanliness
reasons, but maybe you take the cheap way out in an abend case).
But if you might be called by something else, that's a different ballgame.
In that case,
you do not know that the task will terminate - the caller might have
recovery that retries.

Second, to capture serviceability data such as what was running and what
was going on in order to help diagnosticians.
That might be information in the SDWA and your use of recording to logrec;
it could be a message written to the job log (but calling almost any
service out of
recovery might mention having recovery to protect something bad happening
within that flow).
It could be a dump of some type. In the "freeing storage" case,
maybe the recovery isn't so much about freeing the storage but more about
capturing data to help someone figure out what went wrong

Peter Relson
z/OS Core Technology Design




Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Tom Brennan
Thanks Peter!  I always appreciate your responses and also the responses 
from others at IBM.  But I was trying to ask a question that I may not 
be able to ask correctly.  Let me try anyway:


I was referring to my experience with a JES2 exit which set up its own
recovery routine.  In that code you could see it free any getmain'd 
memory, etc. like you mentioned.  But also in that code was an error 
that caused an 0C4.  So when the x'00' I added for temporary debugging 
ran that user-coded recovery routine, I was surprised to get an 0C4 
instead and had to fix the recovery routine.


So of course JES2 had its own recovery routine in place that handled
the 0C4 and we got a dump and JES2 went on its merry way, perhaps after 
disabling that exit (I can't remember).


So my question to Jon was, is there any environment in z/OS where there 
is absolutely no recovery routine?  And if a program interrupt occurs I 
get no abend processing whatsoever and (I guess) a disabled wait. 
That's what I thought Jon was implying, that there are environments 
where I MUST code a recovery routine just to keep the system running at 
all.  I don't think there is such an environment.


And after all this typing I still am not sure I can relay my question 
correctly.  So ignore it if I'm not making sense :)  It's academic anyway.


On 12/23/2023 7:54 AM, Peter Relson wrote:

Tom B wrote

So are you implying that in z/OS there are environments where I can run
a program without any built-in basic recovery?


To be a bit snide, you "can" run a program without any recovery, of course.
Whether you should or not is an entirely different question.

I view there being two main reasons for recovery (and not necessarily in the
order I show):

First, to deal with resources that might otherwise be left in an undesired 
state if you don't have recovery.
Maybe that's storage you obtained or an ENQ or lock that you hold or any number 
of other things
(perhaps even that you prefer to return to your caller with a return/reason 
code in that case rather than
an abend). But if you know that the system will release the resource in 
question in a timely fashion, maybe you don't care.
For example, suppose you know that you are the jobstep program and you obtain 
private
storage in a jobstep or task-related subpool and blow up,
Maybe you don't bother freeing it because you know that the task will terminate 
and the system will free the storage
(in your mainline you would probably free the storage for cleanliness reasons, 
but maybe you take the cheap way out in an abend case).
But if you might be called by something else, that's a different ballgame. In 
that case,
you do not know that the task will terminate - the caller might have recovery 
that retries.

Second, to capture serviceability data such as what was running and what was 
going on in order to help diagnosticians.
That might be information in the SDWA and your use of recording to logrec;
it could be a message written to the job log (but calling almost any service 
out of
recovery might mention having recovery to protect something bad happening 
within that flow).
It could be a dump of some type. In the "freeing storage" case,
maybe the recovery isn't so much about freeing the storage but more about 
capturing data to help someone figure out what went wrong

Peter Relson
z/OS Core Technology Design




Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Colin Paice
Expanding on what Peter said.  It is horses for courses.
If you are writing a program which can be running for months before restart
you need to clean up everything - for example ensure any storage obtained
is released.
Someone gave me some guidance
if you open it - close it
if you get it - free it
if you lock it - unlock it
if you change something (which is not yours) - unchange it
Leave nothing but footprints (trace entries)
You need to release locks (in the correct order) to maintain transactional
and data integrity.  Think of two phase commit, and with database updates -
all updates need to occur or none.
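
As an illustration of the guidance above in a general-purpose language, here
is a small Python sketch. It is an analogy only, not z/OS recovery code: the
resource names are made up, the list named trace stands in for the
"footprints", and releasing in the reverse order of acquisition is an
assumption about what the correct order is.

```python
from contextlib import ExitStack, contextmanager

trace = []  # footprints: the only thing left behind


@contextmanager
def held(resource: str):
    """Acquire a (pretend) resource and always release it afterwards."""
    trace.append(f"get {resource}")
    try:
        yield resource
    finally:
        # Runs on normal exit and when an exception unwinds through here.
        trace.append(f"free {resource}")


def do_work(fail: bool) -> None:
    """Acquire three resources, optionally fail, release in reverse order."""
    with ExitStack() as stack:
        stack.enter_context(held("file"))
        stack.enter_context(held("storage"))
        stack.enter_context(held("lock"))
        if fail:
            raise RuntimeError("simulated abend")
        trace.append("work done")


try:
    do_work(fail=True)
except RuntimeError:
    pass  # the caller's own recovery would go here
```

Even on the failing path every "get" is paired with a "free", and the lock
obtained last is released first.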

In some of the code of the product I was involved in - sometimes there was
the same amount or more code in the recovery routine than in the mainline
code!

Colin



Re: RETRY - was ARR and CSVQUERY

2023-12-23 Thread Peter Relson
Tom B wrote
>So are you implying that in z/OS there are environments where I can run
>a program without any built-in basic recovery?

To be a bit snide, you "can" run a program without any recovery, of course.
Whether you should or not is an entirely different question.

I view there being two main reasons for recovery (and not necessarily in the 
order I show):

First, to deal with resources that might otherwise be left in an undesired 
state if you don't have recovery.
Maybe that's storage you obtained or an ENQ or lock that you hold or any number 
of other things (perhaps even that you prefer to return to your caller with a 
return/reason code in that case rather than an abend). But if you know that the 
system will release the resource in question in a timely fashion, maybe you 
don't care.
For example, suppose you know that you are the jobstep program and you obtain 
private storage in a jobstep or task-related subpool and blow up.
Maybe you don't bother freeing it because you know that the task will terminate 
and the system will free the storage (in your mainline you would probably free 
the storage for cleanliness reasons, but maybe you take the cheap way out in an 
abend case).
But if you might be called by something else, that's a different ballgame. In 
that case, you do not know that the task will terminate - the caller might have 
recovery that retries.

Second, to capture serviceability data such as what was running and what was 
going on in order to help diagnosticians.
That might be information in the SDWA and your use of recording to logrec;
it could be a message written to the job log (but calling almost any service 
out of recovery might merit having recovery to protect against something bad 
happening within that flow).
It could be a dump of some type. In the "freeing storage" case,
maybe the recovery isn't so much about freeing the storage but more about 
capturing data to help someone figure out what went wrong.

Peter Relson
z/OS Core Technology Design




Re: RETRY - was ARR and CSVQUERY

2023-12-22 Thread Jon Perryman
On Fri, 22 Dec 2023 18:35:52 -0800, Tom Brennan  
wrote:
>Never mind, my question wasn't clear and I don't know how to better explain it.
>>
>>> So are you implying that in z/OS there are environments where I can run
>>> a program without any built-in basic recovery?

Sorry I misunderstood that you are asking if there is a user exit that is 
called when all abend recovery is disabled.

What I intended to say was "IBM supplying useful recovery". For instance, 
recovery can range from causing machine wait states to fully recoverable 
environments like dynamic exits which rarely cause disruptions.

I suspect that all user exits run when PSA PROGRAM CHECK PSW is enabled so some 
type of recovery should be active although it may not be useful. If you want to 
know this, then you'll need to ask Peter Relson.



 



Re: RETRY - was ARR and CSVQUERY

2023-12-22 Thread Tom Brennan
Never mind, my question wasn't clear and I don't know how to better 
explain it.




Re: RETRY - was ARR and CSVQUERY

2023-12-22 Thread Jon Perryman
On Fri, 22 Dec 2023 15:07:33 -0800, Tom Brennan  
wrote:

>So are you implying that in z/OS there are environments where I can run
>a program without any built-in basic recovery?

I don't condone omitting recovery, but CBTTAPE.ORG has many exits that do not 
include any recovery. The vast majority of exit points have built-in recovery 
which does a great recovery job. Someone mentioned JES exits, but even those 
exits mostly ask (not require) that you simply set a return code. IBM avoids 
detrimental situations by disabling an exit if it has abended too often.
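
As a rough picture of "disabling an exit if it has abended too often", here
is a toy model in Python. It is not IBM's implementation: the class name
ExitPoint, the max_abends threshold and the choice to simply bypass a disabled
exit are all assumptions made for the sketch.

```python
class ExitPoint:
    """Toy model of an exit point that disables its exit after repeated abends."""

    def __init__(self, routine, max_abends=3):  # the threshold is an assumed value
        self.routine = routine
        self.max_abends = max_abends
        self.abends = 0
        self.enabled = True

    def call(self, *args):
        if not self.enabled:
            return None  # a disabled exit is bypassed
        try:
            return self.routine(*args)
        except Exception:
            # Stand-in for the built-in recovery wrapped around the exit call.
            self.abends += 1
            if self.abends >= self.max_abends:
                self.enabled = False
            return None
```

The exit routine itself carries no recovery here; the caller's wrapper absorbs
the failure and counts it, which is the division of labour described above.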



Re: RETRY - was ARR and CSVQUERY

2023-12-22 Thread Tom Brennan
So are you implying that in z/OS there are environments where I can run 
a program without any built-in basic recovery?




Re: RETRY - was ARR and CSVQUERY

2023-12-22 Thread Jon Perryman
On Fri, 22 Dec 2023 10:26:41 -0800, Tom Brennan  
wrote:

>But I think it's overkill for a recovery routine to have its own
>recovery routine (if that's even possible in a JES2 exit environment).

z/OS exits have built-in recovery, diagnostics and recursive abend handling. 
Since IBM did the work for you in exits, there's no need to duplicate that part 
of recovery.



Re: RETRY - was ARR and CSVQUERY

2023-12-22 Thread Tom Brennan
I remember adding X'00' to the instruction stream of a JES2 exit so it 
would abend on a test box, in order to dump data at that point.  I was 
very confused because it got an 0C4 instead.  Turned out the previous 
owner apparently never tested the recovery routine.


But I think it's overkill for a recovery routine to have its own 
recovery routine (if that's even possible in a JES2 exit environment).


On 12/21/2023 9:21 PM, Jon Perryman wrote:

All recovery risks secondary abends and at the very least must issue a message 
that recovery failed.




Re: RETRY - was ARR and CSVQUERY

2023-12-22 Thread Tony Harminc
On Fri, 22 Dec 2023 at 12:16, Jon Perryman  wrote:
[...]

> You missed my point where I was trying to get the OP to understand some
> fundamental mistakes made with ARR's and PC routines. Maybe you can help
> clarify what I did not make clear. My comment about SDWA and alet 2 (HOME)
> was prefaced with the code provided. It was not to say alet 2 should be
> used to reference SDWA but to point out duplication of alets.
>
> The OP's objective is to enhance CBTTAPE csect GRECOV which is recommended
> as a one size fits all recovery routine (e.g. ARR, FRR, ESTAEX, SRB and
> more). I would hope we could give him a little guidance to avoid confusing
> unsuspecting soles who decide to use this as their recovery concept. At the
> moment, GRECOV simply displays a few diagnostic messages. He's being
> secretive as you've noted but he's mentioned CSVQUERY, PRIMARY, SECONDARY,
> HOME and a few SDWA fields.
>
[...]

In the context of "unsuspecting soles" and recovery, a few lines from the
opening of Julius Caesar seem somehow appropriate:

[...]
MARULLUS.
But what trade art thou? Answer me directly.

SECOND CITIZEN.
A trade, sir, that, I hope, I may use with a safe
conscience, which is indeed, sir, a mender of bad soles.

MARULLUS.
What trade, thou knave? Thou naughty knave, what trade?

SECOND CITIZEN.
Nay, I beseech you, sir, be not out with me; yet,
if you be out, sir, I can mend you.

MARULLUS.
What mean'st thou by that? Mend me, thou saucy fellow!

SECOND CITIZEN.
Why, sir, cobble you.

FLAVIUS.
Thou art a cobbler, art thou?

SECOND CITIZEN.
Truly, Sir, all that I live by is with the awl; I meddle with
no tradesman's matters, nor women's matters, but with awl.
I am indeed, sir, a surgeon to old shoes; when they are in
great danger, I re-cover them.

[...]

Tony H.



Re: RETRY - was ARR and CSVQUERY

2023-12-22 Thread Jon Perryman
On Thu, 21 Dec 2023 13:37:21 +, Peter Relson  wrote:

>>Jon P wrote
>>The SDWA can be referenced by alet 0 or alet 2.

>if this was meant to indicate that you could choose either, that wouldn't be 
>true for an ESTAEX or ARR or IEAARR established for primary <> home.

Hi Peter,

You missed my point where I was trying to get the OP to understand some 
fundamental mistakes made with ARR's and PC routines. Maybe you can help 
clarify what I did not make clear. My comment about SDWA and alet 2 (HOME) was 
prefaced with the code provided. It was not to say alet 2 should be used to 
reference SDWA but to point out duplication of alets.

The OP's objective is to enhance CBTTAPE csect GRECOV which is recommended as a 
one size fits all recovery routine (e.g. ARR, FRR, ESTAEX, SRB and more). I 
would hope we could give him a little guidance to avoid confusing unsuspecting 
soles who decide to use this as their recovery concept. At the moment, GRECOV 
simply displays a few diagnostic messages. He's being secretive as you've noted 
but he's mentioned CSVQUERY, PRIMARY, SECONDARY, HOME and a few SDWA fields.

Problem 1: He believes primary, secondary and home are predictable to GRECOV 
without understanding the specific PC definition. For GRECOV, secondary and 
home should not be used because they are completely unpredictable (PC 
definable). Primary is somewhat predictable: it is where the executable code, 
the SDWA and storage obtained by the PC routine can be referenced.

For example, the OP mentioned SDWARBAD, which holds the RB address. Using this 
address requires GRECOV to analyze the PC definition to determine which (if 
any) ALET has access to the RB.

Possible problem 2: Does an abend in the PC routine percolate outside the PC 
routine? E.g. an ESTAE prior to the PC call. Either way, it requires special 
handling by the users of GRECOV.



Re: RETRY - was ARR and CSVQUERY

2023-12-21 Thread Jon Perryman
On Wed, 20 Dec 2023 00:56:14 -0500, Tony Harminc  wrote:

>On Wed, 20 Dec 2023 at 00:48, Jon Perryman  wrote:
>> I locate the base reg and verify the branch around the module eyecatcher

>The trick is to not abend when you try looking at the eyecatcher. 
> A wild branch can easily destroy what you think is the base

There is no trick. All recovery risks secondary abends and at the very least 
must issue a message that recovery failed. This just adds one more use case 
that you can easily add to your secondary recovery. Secondary abends are very 
rare but you must always plan for them. For ARR and SETFRR routines, it's a 
little more complicated than ESTAEX, but nonetheless you must plan how they 
will be handled.

>Where do you get the base reg?

You design according to what you know (e.g. abends, base regs, module 
eyecatchers and more). 

1. My product has a unique savearea (R13) eyecatcher. If it matches, the base 
reg is a known register. My code is what people call baseless but I chose to 
place the constants (including program eyecatcher with PTF) before the 
executable code. This makes it very IPCS friendly. There are other options you 
can use for baseless code which can also provide eyecatcher with offset but I 
won't delve into that here.

2. Base reg is always less than the abend address.

3. Base reg is always within a reasonable offset for assembler (e.g. 3 regs on 
a USING is 12K).

4. More but I can't be bothered to list them here.
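
Checks 2 and 3 above, plus the eyecatcher test, can be sketched as a filter
over the registers at the time of error. This is an illustration in Python,
not the product's code: plausible_bases is a made-up name, the eyecatcher is
assumed to sit at the base address, and the storage dictionary is a stand-in
for "storage I can safely look at" (a missing key models storage that would
give a secondary abend if referenced).

```python
MAX_OFFSET = 3 * 4096  # 12K: three base registers on one USING, as in point 3


def plausible_bases(regs, abend_addr, storage, eyecatcher):
    """Return the register numbers that pass the base-register checks.

    regs:       the 16 general register values at the time of error
    abend_addr: address of the failing instruction
    storage:    dict mapping an address to the bytes readable at that address
    eyecatcher: bytes expected at the start of the module
    """
    hits = []
    for number, value in enumerate(regs):
        if value >= abend_addr:                # point 2: base is below the abend address
            continue
        if abend_addr - value >= MAX_OFFSET:   # point 3: abend address within 12K of base
            continue
        found = storage.get(value)             # never touch storage we cannot read
        if found is not None and found.startswith(eyecatcher):
            hits.append(number)
    return hits
```

A wild branch can still leave no register passing the checks, in which case the
list comes back empty and the diagnostics have to say so.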

All the products I've worked on use this method because creating simple, 
useful diagnostics greatly simplifies problem solving for the customer and 
vendor. Consider when CSVQUERY works, giving you an offset into a LOAD MODULE 
name. You need to map the load module, find the CSECT at the module offset, 
subtract the CSECT's starting offset to determine the offset within the CSECT, 
and look up the PTF level for the CSECT. When CSVQUERY fails, I need you to 
send a dump to get this information. It's far easier for customers to simply 
cut and paste the diagnostic messages, which have the CSECT and offset.
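
To show how much work the load-module route is, here is the mapping step
sketched in Python. The CSECT map, the names and the PTF numbers are invented
for the example; in real life the map comes from the binder listing or similar,
and CSECT lengths would be checked as well.

```python
from bisect import bisect_right


def csect_for_offset(csect_map, module_offset):
    """Translate a load-module offset into (CSECT name, offset in CSECT, PTF level).

    csect_map: list of (start_offset, name, ptf_level), sorted by start_offset.
    """
    starts = [start for start, _, _ in csect_map]
    index = bisect_right(starts, module_offset) - 1  # last CSECT starting at or before
    if index < 0:
        raise ValueError("offset is before the first CSECT")
    start, name, ptf = csect_map[index]
    return name, module_offset - start, ptf


# Hypothetical module layout, for illustration only.
EXAMPLE_MAP = [
    (0x0000, "MAINCSCT", "UA00001"),
    (0x0800, "PARSER", "UA00002"),
    (0x1400, "IOROUTNE", "UA00003"),
]
```

With this map, module offset X'9A4' falls in PARSER at offset X'1A4'. A
diagnostic message that already carries the CSECT name and offset saves the
customer from doing this lookup at all.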

This process is the same for user programs but realize when you get annoyed 
with this process, you are the one to blame.



Re: RETRY - was ARR and CSVQUERY

2023-12-21 Thread Peter Relson
Jon P wrote

The SDWA can be referenced by alet 0 or alet 2.


Perhaps this was referring to the specific example.

But in general, if this was meant to indicate that you could choose either, 
that wouldn't be true for an ESTAEX or ARR or IEAARR established for primary <> 
home.
The SDWA is in the primary address space of the recovery routine given control 
(for an FRR, it's in common storage, so the ALET is not important).

Peter Relson
z/OS Core Technology




Re: RETRY - was ARR and CSVQUERY

2023-12-20 Thread Peter Relson
If you haven't changed the data or conceivably the environment, retrying the 
instruction will get the same result.
That shouldn't be a surprise. If you were to try, you'd likely have to decipher 
the instruction so that you could figure out what data and/or regs it was using 
and make sure to change something for "the next time".

I believe long ago it used to be the case that the PL/I compiler (this was 
before "C" was anything more than the 3rd letter of the alphabet), would get 
control for a page fault or segment fault on data that it knew about, set up 
the data so that a retry of the instruction would work, and do so.
Thinking about this now, we have no idea how it could do that in a predictable 
way (since it would get control for "normal" page faults too).

Of course, the architecture accommodates and allows retrying the instruction. 
That is used by the system when (for example) a page fault occurs. The system 
stops the work unit, asynchronously pages in the page (when the page actually 
was valid but paged out) and resets the work unit to re-execute the 
instruction. And the program old PSW's instruction address for a page fault 
doesn't even need to be backed up to get to the right place to re-execute.

Peter Relson
z/OS Core Technology Design




Re: RETRY - was ARR and CSVQUERY

2023-12-19 Thread Tony Harminc
On Wed, 20 Dec 2023 at 00:48, Jon Perryman  wrote:

> On Tue, 19 Dec 2023 21:20:29 -0500, Joseph Reichman 
> wrote:
> [...]
> >If you are looking for entry point modname if primary CSVQUERY would give
> you that
>
> CSVQUERY will not always work which is why IBM diagnostic messages do not
> include the load module name and entry point. This is why I locate the base
> reg and verify the branch around the module eyecatcher (including length).
> I use this same standard in my modules to be compatible.
>

The trick is to not abend when you try looking at the eyecatcher. Where do
you get the base reg? A wild branch can easily destroy what you think is
the base (and that may well be why you abended in the first place), and
then if you reference what you think is the eyecatcher you may have a
second failure to recover from.

Tony H.



Re: RETRY - was ARR and CSVQUERY

2023-12-19 Thread Jon Perryman
On Tue, 19 Dec 2023 21:20:29 -0500, Joseph Reichman  
wrote:

>It seems to me that SDWA has values from the home address space such as 
>SDWARBAD 

For the CBT recovery example, the PC definition has PRIMARY (alet 0) and HOME 
(alet 2) point to the PC owner ASID. Secondary (alet 1) is the PC caller 
address space. The SDWA can be referenced by alet 0 or alet 2. SDWARBAD is in 
the PC caller address space which must use alet 1 which is not HOME.

All this can change depending upon the PC def or if it's called from another 
PC. Referencing each of the SDWA fields is dependent upon the environment.
  
>If you are looking for entry point modname if primary CSVQUERY would give you 
>that 

CSVQUERY will not always work which is why IBM diagnostic messages do not 
include the load module name and entry point. This is why I locate the base reg 
and verify the branch around the module eyecatcher (including length). I use 
this same standard in my modules to be compatible.  



Re: RETRY - was ARR and CSVQUERY

2023-12-19 Thread Jon Perryman
On Tue, 19 Dec 2023 23:13:46 GMT, esst...@juno.com  wrote:
>I have been following the thread on ARR and CSVQUERY, and started   
>thinking about a retry routine.

The discussion is about a CBT file that has been referred to as "generic 
recovery". I took a 2 minute look at it. It does not do any type of recovery 
and I spotted multiple bugs. It only displays some diagnostic data which is not 
the hard part. 
If you are doing this for a specific project, then you should ask questions and 
describe what you want to accomplish. There are many approaches to solving 
recovery problems.

>When I hear the words "retry", I interpret that as: re-execute the  
>failing instruction.

Completely false interpretation. RETRY is a word that IBM chose because there 
was not a better word that describes the true functionality.  

Retry is the address where you want to continue. Consider IBM's use of abend 
recovery. For example, S0C4 abend is used to verify the existence of an 
address. 

In my case with a product, I close the current unit of work and then dispatch 
the next unit of work.
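
A sketch of "retry" in that sense, in Python rather than assembler. The
try/except stands in for the recovery routine and the loop top stands in for
the retry address; process_queue and the three sample units are made up for
the illustration.

```python
def process_queue(units):
    """Run each unit of work; on failure, close it out and resume with the next."""
    results = []
    for name, work in units:
        try:
            results.append((name, "ok", work()))
        except Exception as error:
            # "Recovery routine": record diagnostics and close the failed unit.
            results.append((name, "failed", type(error).__name__))
            # "Retry": execution resumes at the top of the loop with the next
            # unit of work, not at the statement that failed.
            continue
    return results


units = [
    ("u1", lambda: 1 + 1),
    ("u2", lambda: 1 // 0),  # simulated program check
    ("u3", lambda: 2 * 3),
]
```

Running process_queue(units) completes u1 and u3 and records u2 as failed.
Nothing is ever re-executed; the failing unit is simply abandoned in a
controlled way.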
 
>Whether it's an ARR or ESTAE, it is my understanding that a Recovery
>Routine may be capable of retrying a failed instruction. 

You can choose to retry the failed instruction but remember the definition of 
insanity is trying the same thing over and over again while expecting different 
results. 
 
>Does anyone have any experience in developing an ESTAEX or ARR
>where they actually retry the failing instruction?

Simply retrying the failed instruction serves no useful purpose.

> How should a recovery routine determine which register or data area is 
> invalid ?

You design your code to specifically handle this situation. In IBM's S0C4 case, 
they know the failing register or they look at the failing instruction. 
 
  
>How many times should a recovery routine try to re-execute the same
>failing instruction ? 

Your design dictates how many times you should retry.

>I guess I'm looking for a strategy for correcting the failing instruction.

It's very rare that you need to retry a failed instruction. Why don't you use a 
strategy that doesn't need to retry the failing instruction?



Re: RETRY - was ARR and CSVQUERY

2023-12-19 Thread Joseph Reichman
It seems to me that SDWA has values from the home address space such as 
SDWARBAD 

If you are looking for entry point modname if primary CSVQUERY would give you 
that 



Re: RETRY - was ARR and CSVQUERY

2023-12-19 Thread Binyamin Dissen
Retry means retry the UOW, not (necessarily) the failed instruction.

For example, if the code is aware that a control block / address space may be
gone while it would be referenced, retrying means branching to the logic where
the item was known to be gone.

Although I actually had a case where the failed instruction was reexecuted. It
was trying to run existing code on a zIIP/zAAP (all licensed) and the code did
something invalid in SRB mode. In that case I had the code "retry" the failing
instruction in TCB mode (more complicated than that) and continue on a CP.


--
Binyamin Dissen 
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel



RETRY - was ARR and CSVQUERY

2023-12-19 Thread esst...@juno.com
Hello,

I have been following the thread on ARR and CSVQUERY, and started
thinking about a retry routine.
Whether it's an ARR or ESTAE, it is my understanding that a Recovery
Routine may be capable of retrying a failed instruction.

When I hear the word "retry", I interpret that as: re-execute the
failing instruction. For a Recovery and/or Retry routine to re-execute
a failing instruction, I would think this logic would be a tad
complicated, and I suspect it would be necessary for the recovery routine
and retry routine to have inherent knowledge of the register
contents of the failing program at the time of failure.

Does anyone have any experience in developing an ESTAEX or ARR
where they actually retry the failing instruction?

To begin with -
How should a recovery routine determine which register or data area is invalid?

I suspect the main program could periodically save its registers
in a structure anchored in the Recovery Routine's Parameter List,
or some variation -
then use those saved registers to compare against the registers
in the SDWA at the time of the abend -
or something like that -
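
The comparison I have in mind could look roughly like this, sketched in Python
only to show the idea. The function name and the sample values are made up; a
real recovery routine would take the time-of-error registers from the SDWA and
the checkpoint from its parameter area. A register that differs from the
checkpoint is not necessarily the invalid one, it is only a candidate.

```python
def changed_registers(checkpoint, at_abend):
    """Return {register number: (saved value, value at abend)} where they differ."""
    return {
        number: (saved, current)
        for number, (saved, current) in enumerate(zip(checkpoint, at_abend))
        if saved != current
    }


# Hypothetical values: R3 held a valid pointer at the checkpoint, zero at the abend.
checkpoint = [0] * 16
checkpoint[3] = 0x00012000
checkpoint[12] = 0x00010000
at_abend = list(checkpoint)
at_abend[3] = 0
```

Here changed_registers(checkpoint, at_abend) reports only R3.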

How many times should a recovery routine try to re-execute the same
failing instruction?

I guess I'm looking for a strategy for correcting the failing instruction.

Are there any Tech Notes, documentation, methodology, procedures, or
suggestions on how to accurately re-execute the failing instruction in
an ESTAE or ARR?

paul dangelo

