I do not know whether LLA keeps a pointer to the first text record (though it might), but it would certainly need the preceding associated control and ESD records to be cached as well if the first read done is for a text record. I would expect that, since the ESD and control records encode their own length, they are read with the SLI bit on in the CCW, so that incorrect length does not cause any sort of I/O error, logical or otherwise. The same goes for RLD records, and it might also apply to others.
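
For anyone who does not have the CCW flag byte memorized, here is the format-0 CCW layout with the flags mentioned in this note. (C is used purely as a notation of my own; the real code is assembler.)

    /* Format-0 CCW layout, z/Architecture.  C is used here only as
     * illustration; the field and flag names are mine.
     */
    #include <stdint.h>

    struct ccw0 {
        uint8_t  cmd;        /* command code, e.g. Read Data            */
        uint8_t  addr[3];    /* 24-bit data address                     */
        uint8_t  flags;      /* flag byte -- bits defined below         */
        uint8_t  unused;     /* ignored                                 */
        uint16_t count;      /* byte count to transfer                  */
    };

    #define CCW_CD   0x80    /* chain data                              */
    #define CCW_CC   0x40    /* chain command                           */
    #define CCW_SLI  0x20    /* suppress incorrect-length indication    */
    #define CCW_SKIP 0x10    /* skip (suppress data transfer)           */
    #define CCW_PCI  0x08    /* program-controlled interruption         */
    #define CCW_IDA  0x04    /* indirect data addressing                */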

Based on some research I did a long time ago, here is how I believe things work:

The control record contains a CCW fragment to be used in constructing the Read CCW for the next text record, unless it's the last. PCI processing is used to chain onto the channel program so that the entire module is read in one shot, unless the system is so busy that the PCI cannot be serviced in time to extend the chain and the I/O operation terminates. In that case, I believe the operation is restarted from where it left off.

The read CCW for the text record should be constructed using the specific length stored in the control record, and I would not expect the SLI bit to be used for that CCW. On that basis, I would agree that if the first "text" record you read does not have the expected length, the unexpected status returned from the device would likely result in a "logical I/O error." However, it's possible that SLI is used for the read (I have not read the code), and that would make other causes (empty track, no record at that location on a track, additional extents, etc.) more likely culprits for the ABEND106-F RC40. For performance reasons, though, I would expect that SLI is not set. This code was originally written before control unit cache existed and was designed to be really good at avoiding unnecessary disk latency. And, of course, we might change details in the code at any time (though why we would ever want to is a good question!).
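
If it helps to see the shape of the thing, here is roughly the kind of channel program I have in mind for a text record, again in C purely as notation (the names are mine, the real code is assembler, and I have not read it, so treat this strictly as a sketch). The point is the Read Data CCW at the end: its count is the exact length from the control record, and SLI is left off, so a block of any other length at that spot ends the channel program with incorrect-length status -- the "logical I/O error" above.

    /* Sketch only -- NOT the actual fetch code.  A Seek to the right
     * cylinder/head precedes this in the real chain; this piece just
     * positions to the record and reads it.  struct ccw0 and the flag
     * bits are as in the earlier snippet.
     */
    #include <stdint.h>
    #include <string.h>

    struct ccw0 { uint8_t cmd, addr[3], flags, unused; uint16_t count; };

    #define CMD_SEARCH_ID_EQ 0x31   /* Search ID Equal (5-byte CCHHR)   */
    #define CMD_TIC          0x08   /* Transfer In Channel              */
    #define CMD_READ_DATA    0x06   /* Read Data                        */
    #define CCW_CC           0x40   /* chain command                    */
    #define CCW_PCI          0x08   /* program-controlled interruption  */
    /* no SLI flag here on purpose -- a length mismatch should surface  */

    static void set_addr(struct ccw0 *c, uint32_t a)  /* 24-bit address */
    {
        c->addr[0] = (a >> 16) & 0xFF;
        c->addr[1] = (a >>  8) & 0xFF;
        c->addr[2] =  a        & 0xFF;
    }

    /* chain_addr is the 24-bit address of chain[0] itself; the TIC
     * loops back to the Search until it matches, and when it matches
     * the channel skips the TIC and falls through to the Read.
     * text_len comes from the control record -- or, for the first
     * block, from what LLA cached -- and buf_addr is the load point
     * plus the block's assigned origin.
     */
    static void read_text_record(struct ccw0 chain[3], uint32_t chain_addr,
                                 uint32_t cchhr_addr, uint32_t buf_addr,
                                 uint16_t text_len, int more_to_come)
    {
        memset(chain, 0, 3 * sizeof chain[0]);

        chain[0].cmd   = CMD_SEARCH_ID_EQ;
        set_addr(&chain[0], cchhr_addr);      /* record ID to look for  */
        chain[0].count = 5;
        chain[0].flags = CCW_CC;

        chain[1].cmd   = CMD_TIC;
        set_addr(&chain[1], chain_addr);      /* back to the Search     */

        chain[2].cmd   = CMD_READ_DATA;
        set_addr(&chain[2], buf_addr);
        chain[2].count = text_len;            /* exact length, no SLI   */
        if (more_to_come)
            chain[2].flags = CCW_CC | CCW_PCI;  /* chain on; ask for PCI */
    }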

The text records themselves are of variable length. They have a minimum length of 1024 bytes and a maximum length of the track length or the block size, whichever is smaller. The Binder (and COPYMOD) try to write the minimum possible number of text records within those limits. They issue TRKBAL to find out how much space is left on the track, and write records on following tracks as needed to finish writing a load module. (This is why 32760 is the best block size for load libraries.)
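
If it helps, the sizing decision I mean looks roughly like this (names and code mine, not the Binder's; the real logic is certainly more involved):

    /* Rough sketch of the text record sizing decision -- my own
     * rendering, not the Binder's code.
     */
    #define MIN_TEXT_LEN 1024u

    /* Returns the length to write for the next text record, or 0
     * meaning "not enough room left on this track -- use the next one."
     */
    static unsigned next_text_len(unsigned bytes_left_in_module,
                                  unsigned block_size,     /* DCB BLKSIZE */
                                  unsigned track_balance)  /* from TRKBAL */
    {
        unsigned len = bytes_left_in_module;

        if (len > block_size)
            len = block_size;               /* never exceed the block size  */

        if (track_balance < len && track_balance < MIN_TEXT_LEN)
            return 0;                       /* below the 1024-byte minimum  */

        if (len > track_balance)
            len = track_balance;            /* fill out what's left here    */

        return len;
    }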

Because the directory pointers to PDS members are TTR pointers, load modules do not generally start on track boundaries. This means that large block sizes rarely, if ever, result in uniform text record lengths. They do result in fewer text records when module lengths exceed what a smaller block size would hold, though.
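
(For reference, those pointers are three bytes: two of relative track number and one of record number on that track, which is why nothing forces a member to begin at the start of a track. In C, purely as illustration and with names of my own:)

    /* A TTR: TT = track relative to the start of the data set,
     * R = record (block) number on that track.  Illustration only.
     */
    #include <stdint.h>

    struct ttr {
        uint16_t relative_track;   /* TT */
        uint8_t  record;           /* R  */
    };

    static struct ttr unpack_ttr(const uint8_t raw[3])
    {
        struct ttr t;
        t.relative_track = ((uint16_t)raw[0] << 8) | raw[1];
        t.record         = raw[2];
        return t;
    }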

All the above applies to load modules. I have no idea how this works under the covers for program objects, but Program Management Advanced Facilities documents load module records.

Just some random additional info to reinforce the "except under narrow circumstances, with sufficient advance reflection, and malice--er, risk acceptance-aforethought, don't update running systems' data sets" advice others have already expressed.

Michael Stein wrote:
<snip>

It's been a while, but from what I remember about program fetch,
here's a guess.

Looking up S106 RC 0F reason code 40:

    either an uncorrectable I/O error occurred or an error in the
    load module caused the channel program to fail.

Well, let's assume the hardware is working, so this isn't a "real" I/O
error caused by some hardware problem.  And there are no dataset
extent changes, only the overwriting of the dataset to empty it
out and then copying in the new modules.

Well, the EOF pointer for the dataset got moved toward the front, to
just after the directory.  This caused the new modules to be written
starting at the new EOF, over the old modules.

And LLA still has the directory entries for the old modules, not the new
ones.  These now point into the new modules.  LLA's information includes
specific information on the first block of text of each old module:

   - the TTR of the first block of text
   - the length of the first block of text
   - the linkage editor assigned origin of first block of text

This allows program fetch to start by reading the first text block,
rather than having to start at the beginning of the module.  Fetch can
build a CCW to directly read the first block since it knows the TTR of
the block and its length and also the storage address (storage area +
block origin).

Since the old modules were overwritten, it's certain that the block at
the old location isn't the expected one.  There might not be a block
there, giving no record found; there might be an EOF; or there might be
a different-length block, causing fetch's channel program to end with
incorrect length.

This would explain the S106 RC 0F reason code 40.

This isn't that bad.  The length of the wrong block/module might
have matched.  I wonder if program fetch could successfully load the
wrong module.

Now with a block size of 32760, possibly each module will fit in one
block, and the modules likely have different sizes, so this wrong-module
case might be unlikely.  Or something else might prevent loading the
wrong module (what?).  Or it may be possible to have a successful
program fetch of the wrong module.  And then attempt to execute it with
the parameters and environment of the old module.

What would that cause?  Program checks?  Mangled data?

--
John Eells
IBM Poughkeepsie
ee...@us.ibm.com

