I do not know whether LLA keeps a pointer to the first text record (though it might), but it would certainly need the preceding associated control and ESD records to be cached as well if the first read done is for a text record. I would expect that, since the ESD and control records encode their own length, they are read with the SLI bit on in the CCW, so that incorrect length does not cause any sort of I/O error, logical or otherwise. The same goes for RLD records, and it might also apply to others.
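
For anyone who does not have the CCW flag byte memorized, here is the format-0 CCW layout with the flags mentioned in this note. (C is used purely as a notation of my own; the real code is assembler.)

    /* Format-0 CCW layout, z/Architecture.  C is used here only as
     * illustration; the field and flag names are mine.
     */
    #include <stdint.h>

    struct ccw0 {
        uint8_t  cmd;        /* command code, e.g. Read Data            */
        uint8_t  addr[3];    /* 24-bit data address                     */
        uint8_t  flags;      /* flag byte -- bits defined below         */
        uint8_t  unused;     /* ignored                                 */
        uint16_t count;      /* byte count to transfer                  */
    };

    #define CCW_CD   0x80    /* chain data                              */
    #define CCW_CC   0x40    /* chain command                           */
    #define CCW_SLI  0x20    /* suppress incorrect-length indication    */
    #define CCW_SKIP 0x10    /* skip (suppress data transfer)           */
    #define CCW_PCI  0x08    /* program-controlled interruption         */
    #define CCW_IDA  0x04    /* indirect data addressing                */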

Based on some research I did a long time ago, here is how I believe things work:

The control record contains a CCW fragment to be used in constructing the Read CCW for the next text record, unless it's the last. PCI processing is used to chain onto the channel program so that the entire module is read in one shot, unless the system is so busy that the PCI cannot be serviced in time to extend the chain and the I/O operation terminates. In that case, I believe the operation is restarted from where it left off.

The read CCW for the text record should be constructed using the specific length stored in the control record, and I would not expect the SLI bit to be used for that CCW. On that basis, I would agree that if the first "text" record you read does not have the expected length, the unexpected status returned from the device would likely result in a "logical I/O error." However, it's possible that SLI is used for the read (I have not read the code), and that would make other causes (empty track, no record at that location on a track, additional extents, etc.) more likely culprits for the ABEND106-F RC40. For performance reasons, though, I would expect that SLI is not set. This code was originally written before control unit cache existed and was designed to be really good at avoiding unnecessary disk latency. And, of course, we might change details in the code at any time (though why we would ever want to is a good question!).
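
If it helps to see the shape of the thing, here is roughly the kind of channel program I have in mind for a text record, again in C purely as notation (the names are mine, the real code is assembler, and I have not read it, so treat this strictly as a sketch). The point is the Read Data CCW at the end: its count is the exact length from the control record, and SLI is left off, so a block of any other length at that spot ends the channel program with incorrect-length status -- the "logical I/O error" above.

    /* Sketch only -- NOT the actual fetch code.  A Seek to the right
     * cylinder/head precedes this in the real chain; this piece just
     * positions to the record and reads it.  struct ccw0 and the flag
     * bits are as in the earlier snippet.
     */
    #include <stdint.h>
    #include <string.h>

    struct ccw0 { uint8_t cmd, addr[3], flags, unused; uint16_t count; };

    #define CMD_SEARCH_ID_EQ 0x31   /* Search ID Equal (5-byte CCHHR)   */
    #define CMD_TIC          0x08   /* Transfer In Channel              */
    #define CMD_READ_DATA    0x06   /* Read Data                        */
    #define CCW_CC           0x40   /* chain command                    */
    #define CCW_PCI          0x08   /* program-controlled interruption  */
    /* no SLI flag here on purpose -- a length mismatch should surface  */

    static void set_addr(struct ccw0 *c, uint32_t a)  /* 24-bit address */
    {
        c->addr[0] = (a >> 16) & 0xFF;
        c->addr[1] = (a >>  8) & 0xFF;
        c->addr[2] =  a        & 0xFF;
    }

    /* chain_addr is the 24-bit address of chain[0] itself; the TIC
     * loops back to the Search until it matches, and when it matches
     * the channel skips the TIC and falls through to the Read.
     * text_len comes from the control record -- or, for the first
     * block, from what LLA cached -- and buf_addr is the load point
     * plus the block's assigned origin.
     */
    static void read_text_record(struct ccw0 chain[3], uint32_t chain_addr,
                                 uint32_t cchhr_addr, uint32_t buf_addr,
                                 uint16_t text_len, int more_to_come)
    {
        memset(chain, 0, 3 * sizeof chain[0]);

        chain[0].cmd   = CMD_SEARCH_ID_EQ;
        set_addr(&chain[0], cchhr_addr);      /* record ID to look for  */
        chain[0].count = 5;
        chain[0].flags = CCW_CC;

        chain[1].cmd   = CMD_TIC;
        set_addr(&chain[1], chain_addr);      /* back to the Search     */

        chain[2].cmd   = CMD_READ_DATA;
        set_addr(&chain[2], buf_addr);
        chain[2].count = text_len;            /* exact length, no SLI   */
        if (more_to_come)
            chain[2].flags = CCW_CC | CCW_PCI;  /* chain on; ask for PCI */
    }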

The text records themselves are of variable length. They have a minimum length of 1024 bytes and a maximum length of the track length or the block size, whichever is smaller. The Binder (and COPYMOD) try to write the minimum possible number of text records within those limits. They issue TRKBAL to find out how much space is left on the track, and write records on following tracks as needed to finish writing a load module. (This is why 32760 is the best block size for load libraries.)
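
If it helps, the sizing decision I mean looks roughly like this (names and code mine, not the Binder's; the real logic is certainly more involved):

    /* Rough sketch of the text record sizing decision -- my own
     * rendering, not the Binder's code.
     */
    #define MIN_TEXT_LEN 1024u

    /* Returns the length to write for the next text record, or 0
     * meaning "not enough room left on this track -- use the next one."
     */
    static unsigned next_text_len(unsigned bytes_left_in_module,
                                  unsigned block_size,     /* DCB BLKSIZE */
                                  unsigned track_balance)  /* from TRKBAL */
    {
        unsigned len = bytes_left_in_module;

        if (len > block_size)
            len = block_size;               /* never exceed the block size  */

        if (track_balance < len && track_balance < MIN_TEXT_LEN)
            return 0;                       /* below the 1024-byte minimum  */

        if (len > track_balance)
            len = track_balance;            /* fill out what's left here    */

        return len;
    }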

Because the directory pointers to PDS members are TTR pointers, load modules do not generally start on track boundaries. This means that large block sizes rarely, if ever, result in uniform text record lengths. They do result in fewer text records when module lengths exceed what a smaller block size would hold, though.
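
(For reference, those pointers are three bytes: two of relative track number and one of record number on that track, which is why nothing forces a member to begin at the start of a track. In C, purely as illustration and with names of my own:)

    /* A TTR: TT = track relative to the start of the data set,
     * R = record (block) number on that track.  Illustration only.
     */
    #include <stdint.h>

    struct ttr {
        uint16_t relative_track;   /* TT */
        uint8_t  record;           /* R  */
    };

    static struct ttr unpack_ttr(const uint8_t raw[3])
    {
        struct ttr t;
        t.relative_track = ((uint16_t)raw[0] << 8) | raw[1];
        t.record         = raw[2];
        return t;
    }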

All the above applies to load modules. I have no idea how this works under the covers for program objects, but Program Management Advanced Facilities documents load module records.

Just some random additional info to reinforce the "except under narrow circumstances, with sufficient advance reflection, and malice--er, risk acceptance-aforethought, don't update running systems' data sets" advice others have already expressed.

Michael Stein wrote:
<snip>

It's been a while, but from what I remember about program fetch,
here's a guess.

Looking up S106 RC 0F reason code 40:

    either an uncorrectable I/O error occurred or an error in the
    load module caused the channel program to fail.

Well, let's assume the hardware is working, so this isn't a "real" I/O
error caused by some hardware problem.  And there are no dataset
extent changes, only the overwriting of the dataset to empty it
out and then copying in the new modules.

Well, the EOF pointer for the dataset got moved toward the front, to
just after the directory.  This caused the new modules to be written
starting at the new EOF, over the old modules.

And LLA still has the directory entries for the old modules, not the new
ones.  These now point into the new modules.  LLA's information includes
specific information on the first block of text of each old module:

   - the TTR of the first block of text
   - the length of the first block of text
   - the linkage editor assigned origin of first block of text

This allows program fetch to start by reading the first text block,
rather than having to start at the beginning of the module.  Fetch can
build a CCW to directly read the first block since it knows the TTR of
the block and its length and also the storage address (storage area +
block origin).

Since the old modules were overwritten, it's certain that the block at
the old location isn't the expected one.  There might not be a block
there, giving no record found; there might be an EOF; or there might be
a different-length block, causing fetch's channel program to end with
incorrect length.

This would explain the S106 RC 0F reason code 40.

This isn't that bad.  The length of the wrong block/module might
have matched.  I wonder if program fetch could successfully load the
wrong module.

Now with a block size of 32760, possibly each module will fit in one
block, and the modules likely have different sizes, so this wrong-module
case might be unlikely.  Or something else might prevent loading the
wrong module (what?).  Or it may be possible to have a successful
program fetch of the wrong module.  And then attempt to execute it with
the parameters and environment of the old module.

What would that cause?  Program checks?  Mangled data?

--
John Eells
IBM Poughkeepsie
ee...@us.ibm.com

