Re: Fw: Dataspace versus common area above the bar
I have never found comparing instruction speeds to be a fair gauge of performance. It's not the choice of instructions (unless the original choices were very poor) that affects performance but the algorithms. As has been pointed out, I have never seen any evidence that converting an algorithm using data spaces and ALETs to one using 64-bit instructions and shared memory objects would result in any measurable (2+% as an example) difference in performance. However, if the change afforded a way to significantly reduce the working set size or a way to search less frequently, this can often yield significant reductions in overhead.

Some things are very difficult to quantify. For example, there is significant argument over the advantages of transactional memory versus locks. On the surface, locking is more efficient but at a cost to throughput. Transactional memory can use more cycles but improve throughput. So how do you quantify this?

Almost 30 years ago, I developed a non-traditional storage manager that does not use chains. As a result, it does not experience storage fragmentation. Its path length varies only slightly from the first to the millionth call. As a result, it outperforms chained storage managers that require locks by many factors. And as the number of calls grows, the performance factor increases.

Again, I have never seen significant gains from using the same algorithms and simply changing the instructions. Whereas I have seen x-fold performance reductions by improving algorithms.

Kenneth

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jim Mulder
Sent: Tuesday, January 21, 2014 7:25 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Fw: Dataspace versus common area above the bar

> AMODE does not affect performance. Can you explain which instructions
> you think are faster than some functional equivalent, and why you
> think they are faster?

> and it may be that what we have here is a misunderstanding of my
> language.
> Let me begin with a little history. On System/360 models above the
> model 30, L was faster than LH because they had [at least] four-byte
> fetch widths and had to 'throw away' half of what they fetched for LH.
>
> In my experience, and I have made many measurements, the same
> principle continues to apply mutatis mutandis today.
>
> I, for example, have a pair of assembly-language glb-seeking binary
> search routines that search the same table of quadword elements. One
> of these routines is AMODE(31) and one AMODE(64). The table (the same
> assembled table is always used) contains 63 elements. The usual 127
> searches are performed, each 256 times. In the upshot the AMODE(64)
> routine is measurably, 2.1201%, faster.
>
> I have performed similar tests using searches of ordered lists of
> 10 to 200 elements, in steps of 10. They are more addressing-intensive,
> and the superiority of the AMODE(64) routine increases almost linearly
> with table size, from 2.0897% for a list of 10 elements to 2.3311% for
> a list of 200 elements.
>
> Now it may be that what you mean by "AMODE does not affect
> performance" is different from what I mean. If so, I should be pleased
> to have you clarify the ways in which our uses of this word are
> different.

From a hardware design engineer:

All hardware instructions perform at the same speed in 64-bit mode or 31-bit mode. I assume the AMODE(31) and AMODE(64) he is referring to only affect the addressing mode, but the exact same instruction sequences are used in both cases. If different code sequences are being used, then all bets are off. My first statement applies to the exact same code sequence in 64-bit addressing mode versus 31-bit addressing mode. A few millicoded instructions do have slightly different path lengths depending on addressing mode, but even that is not common. If you can send me the listings of the exact code that you are measuring, I might be able to analyze the difference that you are measuring.
There certainly have been cases over the years where some processors required extra cycles to perform operand extension, especially when it involves sign-bit propagation. For specific instructions on a specific processor, I can ask the engineers if that is the case (as long as it is a recent enough processor that the engineers are still here).

Jim Mulder z/OS System Test IBM Corp. Poughkeepsie, NY

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
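The glb-seeking (greatest-lower-bound) binary search discussed above can be sketched in C. This is an illustrative reconstruction, not the poster's assembler routine: the function name is invented, 8-byte keys stand in for the quadword elements described, and the AMODE(31)/AMODE(64) distinction at the heart of the measurement exists only at the assembler level.

```c
#include <stddef.h>
#include <stdint.h>

/* Greatest-lower-bound binary search: returns the index of the largest
 * element <= key, or -1 if every element is greater than key.
 * Functionally analogous to the AMODE(31)/AMODE(64) assembler routines
 * discussed in the thread; the addressing mode itself cannot be shown
 * in portable C. */
ptrdiff_t glb_search(const uint64_t *table, size_t n, uint64_t key)
{
    ptrdiff_t lo = 0, hi = (ptrdiff_t)n - 1, glb = -1;

    while (lo <= hi) {
        ptrdiff_t mid = lo + (hi - lo) / 2;
        if (table[mid] <= key) {
            glb = mid;          /* candidate lower bound */
            lo = mid + 1;       /* look for a larger candidate */
        } else {
            hi = mid - 1;
        }
    }
    return glb;
}
```

The loop body is the same regardless of table size, which is why the measured difference in the thread tracks addressing intensity rather than instruction choice.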
Re: Dataspace versus common area above the bar
Because I've used memory objects for so long, I have not had a reason for IARVSERV. I read both the description in the macro reference and in the authorized assembler guide, and there seems to be a ton of restrictions and quirks (such as TPROT). The most notable restriction is the sharing limit of 16 pages for an unauthorized address space. However, this limit can be changed. But because of the ESQA considerations created because the page tables can map different virtual addresses for the shared pages, I'm not sure what would be a practical limit. It does appear to address guard and, to some extent, page protection. It also offers the ability to share 31-bit storage with 24-bit applications (a key point).

Shared and common memory objects do not have any of IARVSERV's restrictions and do not change my conclusion that performance is NOT the reason to convert to a memory object. It's the advanced functionality. One reason I use a common memory object is so I can avoid using CSA and SQA, particularly for code. With the 16-page restriction it would be impractical to share code with IARVSERV. And common data spaces cannot execute code.

There are no limits to the flexibility offered by memory objects. I can share any number of pages. With shared memory objects I can determine which address spaces have access and which do not. With common memory I can create my own CSA and even SQA, with some restrictions. As Jim affirmed, there is probably little if any performance difference between data spaces and memory objects. Choose the one best suited to your architecture.

Kenneth

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jim Mulder
Sent: Monday, January 20, 2014 4:13 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Fw: Dataspace versus common area above the bar

>> Memory objects are much more flexible than data spaces. Data spaces
>> are limited to 2GB. Memory objects are only limited by the auxiliary storage.
>> Memory objects can be guarded and can also be page protected. Data spaces
>> cannot. Code can execute in a memory object but not in a data space. I
>> started using memory objects 10 years ago and have nearly forgotten how
>> to use a data space.

> Guard pages and protected pages can be created in data spaces using
> IARV64 with TARGET_VIEW=HIDDEN and TARGET_VIEW=READONLY

I meant IARVSERV, not IARV64.

Jim Mulder z/OS System Test IBM Corp. Poughkeepsie, NY
Re: Dataspace versus common area above the bar
Almost 10 years ago, I converted an application using 7 data spaces into one using a single shared memory object. As Gord has pointed out, the CPU advantage was negligible, though I feel it is very difficult to benchmark the effect. The real advantage was a reduction in error rates because of the management of the 7 ALETs and the inadvertent use of an ALET where it was not needed.

As far as isolation is concerned, memory objects are just as isolated and much more manageable than data spaces. Basing isolation of a common data space on the ALET value is no more isolation than basing the isolation of a common memory object on the AMODE.

Memory objects are much more flexible than data spaces. Data spaces are limited to 2GB. Memory objects are only limited by the auxiliary storage. Memory objects can be guarded and can also be page protected. Data spaces cannot. Code can execute in a memory object but not in a data space. I started using memory objects 10 years ago and have nearly forgotten how to use a data space.

So the question is not a question of CPU performance but a question of: do you have an application that is architected or can be architected to take advantage of the advanced features offered by memory objects? In my current application, I use local, shared and common memory objects. I place most (about 70%) of the code in one of these common memory objects and page protect them. I can't think of any instance where I would choose a data space over a memory object.

Kenneth

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Gord Tomlin
Sent: Monday, January 20, 2014 10:23 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Dataspace versus common area above the bar

On 2014-01-20 04:38, John Blythe Reid wrote:
> I just wanted to sound people out about converting a dataspace to a
> common area above the bar. The main interest is the effect it would
> have on CPU usage.
> To put it into context, the dataspace is used for a set of tables
> which are used by the application programs. There are around eight
> thousand tables currently occupying about a gigabyte of virtual
> storage. This is a large installation with in excess of 700 million
> transactions per month plus a heavy batch load. The application
> programs make extensive use of these tables.
>
> Whenever an application program needs an element of one of the tables
> it calls a standard assembler module which uses access register mode
> to search the table in the dataspace and then returns the requested
> element to the application program.
>
> If the set of tables were placed above the bar then access register
> mode would not be needed as the tables would be directly addressable
> in 64 bit addressing mode.
>
> It all seems much simpler so, at first sight, it would be expected to
> use less CPU. A reduction in CPU would be the main justification for
> doing the conversion.
>
> I would be very interested on anyone's opinion on this subject.
>
> Regards,
> John.

I did some tests of a very similar scenario, expecting to see a significant performance gain. The actual results showed a reduction in CPU usage of about 1-2%. We decided that the gain was small enough that we were better off continuing to enjoy the data isolation provided by the data space.

--
Regards, Gord Tomlin
Action Software International (a division of Mazda Computer Corporation)
Tel: (905) 470-7113, Fax: (905) 470-6507
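Once the tables are above the bar, the application-side access John describes reduces to an ordinary direct lookup: no AR mode, no service-routine call to cross an address-space boundary. A hedged C sketch of that direct path (the entry layout, names, and key type are all invented for illustration; the real tables are searched by an assembler module):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a key plus a small payload the caller wants. */
typedef struct {
    uint32_t key;
    char     payload[12];
} entry_t;

typedef struct {
    size_t   count;
    entry_t *entries;   /* sorted by key; directly addressable above the bar */
} table_t;

/* Direct binary search of a table that is mapped into the caller's own
 * 64-bit address space -- the step that previously required an AR-mode
 * service routine against the data space. */
const entry_t *find_entry(const table_t *t, uint32_t key)
{
    size_t lo = 0, hi = t->count;      /* half-open interval [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->entries[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < t->count && t->entries[lo].key == key)
        return &t->entries[lo];
    return NULL;                       /* key not present */
}
```

As Gord's measurements suggest, the search itself dominates either way; removing the AR-mode plumbing simplifies the code far more than it reduces CPU.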
Re: APF authorization and JOBLIB DD card
> The short answer is that any module loaded by an authorized program
> must come from an authorized library.

I've been reading this post with interest since I've had to do a lot to deal with authorized services loading programs from unauthorized libraries. I have a utility that copies the joblib/steplib information and the load module information, including its APF authorization, from one address space and transmits the information via SRB to another, which can load a copy of an unauthorized program (via IRB) from an unauthorized library into another address space for special testing. It uses LOAD with ADRNAPF, which now also has an ADRNAPF64 parameter. Of course, this requires that the utility dynalloc the joblib/steplib in the IRB, open it, load, close it and unalloc it. It's a lot of code just to make a copy of a common program in another address space.

The point being that an authorized program can load from an unauthorized library provided it has the code to manage it. It doesn't need to modify the APF setting for a library. Of course, the unauthorized program is still set up to be called unauthorized. This is done for special debugging functions used to isolate a common piece of code from other callers in other address spaces.

Kenneth

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Gerhard Postpischil
Sent: Thursday, December 19, 2013 12:57 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: APF authorization and JOBLIB DD card

On 12/18/2013 7:58 PM, Blaicher, Christopher Y. wrote:
> The short answer is that any module loaded by an authorized program
> must come from an authorized library. Loaded modules don't have to be
> authorized (AC=1), they just have to come from an authorized library.
> Now it gets more complicated.

I solved this problem a long time ago.
First on OS/360 by having a special step account code, and on later (test) systems by having a utility program that authorizes the tasklib, then loads the needed program(s). RACF can keep it out of unwanted hands. It saves time and effort testing programs that need authorization, and it also has a ZAP function for testing. It's heavily modified code from Don Higgins that I found on the CBT tape, but I don't remember what he called it; his version only has the ZAP capability. The added code is:

         SPACE 1
APFSET   ICM   R7,15,TCBJLB        TEST STEPLIB PRESENCE
         BZ    APFQUIT             NO STEPLIB
         USING IHADCB,R7           DECLARE IT
         L     R7,DCBDEBAD         LOAD DEB FOR STEPLIB
         N     R7,=X'00FFFFFF'     FIX HIGH BYTE
         USING DEBBASIC,R7
         OI    DEBFLGS1,DEBAPFIN   TURN ON APF LIBRARY BIT

Gerhard Postpischil
Bradford, Vermont
Re: Intercept USS calls
Modifying the CVT to perform intercepts is definitely very easy but also extremely risky. Modifying the CVT affects the entire system. All it takes is the mishandling of a single caller, particularly one critical to an address space, and all hell breaks loose. I tried it once. I modified the PC number in the SVT for a key system PC. A simple programming error caused system-wide havoc. I'll never do anything that has global system effects again. Any intercept must be designed to provide isolation, at least for testing.

On the other hand, PCs are managed at the address space level by z/Architecture. So provided you have the capabilities to create the necessary PC data structures required by the hardware in real, fixed storage, you can intercept PC calls. It takes a lot of code and is definitely not recommended for the faint of heart. Once a PC intercept is created, it's simple to pass the call to the original PC routine by simply branch entering the original code with the state set by the PC call. You already have the stacked entry. If you require both a front and back end intercept, this can easily be accomplished by creating "bypass" PC definitions that mimic the original PC definition. But from experience, unless you're willing to write and debug a lot of code, I'd get what I need from SMF.

Kenneth

From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Tony Harminc
Sent: Tuesday, December 17, 2013 2:25 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Intercept USS calls

On 17 December 2013 14:38, Don Poitras wrote:
> I don't see why someone couldn't install their own table in place of
> the one pointed to by the CVT. See
> http://pic.dhe.ibm.com/infocenter/zos/v2r1/index.jsp?topic=%2Fcom.ibm.
> zos.v2r1.bpxb100%2Fbpx2cr_Example.htm

Sure - I agree that that's not hard. But, as with SVC screening, you have to eventually pass control on to the real routine (or conceivably fail the call or implement a different version yourself).
If all you want to do is log the calls, it's probably not too hard, though you might have to be aware of the caller's environment. If you want to do all this without introducing security or integrity exposures, you may have to analyze each call you want to capture.

It may also be the case that some software "just knows" the PC numbers for certain routines, and doesn't go through the CSR table at all. Not a good practice, but I'd be surprised if it doesn't exist. And who knows what recovery and repair there may be in the UNIX kernel, or whether those tables are dynamically updated as a matter of routine.

This would be fun to experiment with on your own private LPAR or zPDT, and I'm not saying it can't or even shouldn't be done, but is anyone really going to install such a change into their production systems? That's why I said it falls into the "not for the faint of heart" category.

Tony H.
Re: Serialization without Enque
I'm with you on patents. I came across an IBM patent yesterday that was dated 2009 describing a lock-free storage manager using cell pools to manage variable length storage. I invented my first one 30 years ago using CAS. It was writing a more sophisticated version 10 years ago that led to my research on PLO. I'm now on my fourth version of this storage manager, this one I wrote for me, and it's much more sophisticated than the patented algorithm with numerous more capabilities. Like music, every piece of software is based on what we've seen before. So my real question with software patents is how do you prove it's original? It's the chicken and the egg. Which came first? Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Thursday, November 14, 2013 6:40 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque On 14/11/2013 12:23 AM, Kenneth Wilkerson wrote: > If I read the article you sent correctly, this algorithm is using a > spin lock. It has provision for implementing a lock-free algorithm but > none of those are detailed. Most of the shared_ptr implementations in the wild, including z/OS, use a lock-free algorithm invented by Alexander Terekhov (IBM). An atomic reference counted smart pointer is a nice tool to have in your bag. > PLO has no restrictions other than the limits of my imagination. Or patents! I notice IBM have quite a few wrt PLO. This one for example http://www.google.com.mx/patents/US6636883. The US patent office seems to like patents related to multi-processing. Maybe they think it's novel. Software patents have a funky smell! > > However, in the last sentence of the performance section, the authors > state "Note that we make no claim that the atomic access approach > unconditionally outperforms the others, merely that it may be best for > certain scenarios that can be encountered in practice.". 
> I think this is the key to lock-free implementations. I have
> generalized primitive functions to perform most of my PLO operations.
> But I design and tune each to the specific use. I understand the
> desire to provide generalized functionality for high level language
> use. However, I do not accept the premise that all lists are the same.
> And I would certainly use different algorithms for lists that I
> expected to get "large" and lists that have more than a small
> percentage of update operations.

While it may not be true on z/OS, most software developers these days use high level languages for multi-threaded applications and prefer abstractions because they are easy to use and reason about. Of course, that doesn't mean that they shouldn't understand what's happening under the covers. The trick is keeping the optimizer out of the mix, which is where inline assembler comes in handy.

There are many high-quality (free) libraries for lock-free data structures that are relatively easy to port to z/OS. Using advanced language features it's quite simple to configure a highly granular concurrent queue by using policies. The difficult part is testing the bloody things!

    typedef mpmc_queue< fixed_array, lock_free, smart_ptr > multi_queue;
    typedef spsc_queue< priority_list, mutex, unique_ptr > blocking_queue;
    typedef thread_pool< multi_queue, max_threads<8> > task_pool;
    socket_server server;

> Kenneth
>
> -Original Message-
> From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU]
> On Behalf Of David Crayford
> Sent: Tuesday, November 12, 2013 11:45 PM
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: Re: Serialization without Enque
>
> On 13/11/2013 12:34 PM, Kenneth Wilkerson wrote:
>> Actually, the algorithm performs well for read-often, write-rarely
>> list because the active chain count does not change and therefore
>> there are relatively infrequent re-drives. The active chain count
>> only changes on an add or delete.
>> So if there are infrequent adds and deletes, there will be infrequent
>> re-drives. And you are wrong, readers will not contend unless two or
>> more tasks are referencing the exact same element simultaneously. And
>> even then, the re-drive only involves the update to the use count.
>
> Thanks, I get it now. Maybe IBM should have used PLO for the z/OS C++
> shared_ptr SMR algorithm which has both weak/strong reference counts +
> the pointer. They use a lock-free algorithm using CS
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2674.htm.
> shared_ptr implements proxy based garbage collection.
>
>> There are a lot of optimizations that I do not describe in this
>> algorithm for simplicity.
Re: Serialization without Enque
If I read the article you sent correctly, this algorithm is using a spin lock. It has provision for implementing a lock-free algorithm but none of those are detailed. There are certainly cases where spin locks work very effectively, particularly if the thread finds a lock is unavailable and voluntarily relinquishes control. I try to avoid spin locks because if they are performed in a high priority application they can cause detrimental system effects. The applications I write are system level and can be executed from anywhere, meaning that applications that use them may be in a locked state, in an SRB, at a high priority, etc., so it behooves me to only use hardware-provided methods for serialization. This is the primary reason I use PLO. PLO has no restrictions other than the limits of my imagination.

However, in the last sentence of the performance section, the authors state "Note that we make no claim that the atomic access approach unconditionally outperforms the others, merely that it may be best for certain scenarios that can be encountered in practice.". I think this is the key to lock-free implementations. I have generalized primitive functions to perform most of my PLO operations. But I design and tune each to the specific use. I understand the desire to provide generalized functionality for high level language use. However, I do not accept the premise that all lists are the same. And I would certainly use different algorithms for lists that I expected to get "large" and lists that have more than a small percentage of update operations.
Kenneth

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford
Sent: Tuesday, November 12, 2013 11:45 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

On 13/11/2013 12:34 PM, Kenneth Wilkerson wrote:
> Actually, the algorithm performs well for read-often, write-rarely
> list because the active chain count does not change and therefore
> there are relatively infrequent re-drives. The active chain count
> only changes on an add or delete. So if there are infrequent adds and
> deletes, there will be infrequent re-drives. And you are wrong,
> readers will not contend unless two or more tasks are referencing the
> exact same element simultaneously. And even then, the re-drive only
> involves the update to the use count.

Thanks, I get it now. Maybe IBM should have used PLO for the z/OS C++ shared_ptr SMR algorithm which has both weak/strong reference counts + the pointer. They use a lock-free algorithm using CS http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2674.htm. shared_ptr implements proxy based garbage collection.

> There are a lot of optimizations that I do not describe in this
> algorithm for simplicity. For example, when you do a PLO DCAS, the
> condition code is set to indicate which compare failed. This can be
> used to optimize the re-drive. There are many other optimizations
> with re-drives to make this more efficient. And if you get away from
> traditional lists, there are even more optimizations.

Like what? What do you replace linked lists with, arrays with offsets instead of pointers?

> Honestly, I provided this algorithm after reading the paper on hazard
> pointers. The paper was written in 2002 and claimed there was no
> atomic DCAS when PLO DCAS became available in 2001.
So I took a much > simpler algorithm that I had and modified it to use a use count to > accommodate traditional storage managers to prove that a PLO could be > used to manage a conventional list using a traditional storage manager > and provide a SMR algorithm without the need for DU level management > structures. I don't use many conventional lists and I have a proprietary storage manager that does not use chains. > Most of my PLO operations are much simpler. > I would love to test this algorithm against any other SMR algorithm. > My point has been and remains, that PLO can be efficiently used to > serialize any list in a lock-free manner and even if it does take more > CPU this will be offset by increased throughput. > > And just because UNIX has issues with PLO doesn't mean the issue is > with PLO... UNIX doesn't have an issue with PLO. It clearly states that popping/pushing elements at the beginning of the queue is a good performer. Surely your algorithm would have the same problem if multiple producers/consumers were inserting/removing elements from the middle of a long list. > Kenneth > > -Original Message- > From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] > On Behalf Of David Crayford > Sent: Tuesday, November 12, 2013 8:39 PM > To: IBM-MAIN@LISTSERV.UA.EDU > Subject: Re:
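The "attempt, then re-drive on interference" pattern that runs through this exchange can be shown with ordinary single-word compare-and-swap, which C11 exposes portably. This is an analogy only, not Kenneth's PLO algorithm (PLO has no C-level equivalent), and the names are invented; a classic lock-free LIFO push:

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int          value;
} node_t;

/* Lock-free LIFO push.  The compare-exchange loop is the same shape as
 * the PLO re-drive discussed above: build the update optimistically,
 * attempt it atomically, and retry if another task got there first. */
void push(_Atomic(node_t *) *head, node_t *n)
{
    node_t *old = atomic_load(head);
    do {
        n->next = old;   /* link to the head we observed */
    } while (!atomic_compare_exchange_weak(head, &old, n));
    /* on failure, 'old' is refreshed with the current head and we retry */
}
```

As the UNIX message-queue documentation cited in the thread notes, this works well at the ends of a queue; serializing updates in the middle of a long list is where richer primitives like PLO (or a lock) earn their keep.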
Re: Serialization without Enque
The active count is incremented for every add and delete. It is never decremented, so any update would result in a change. In my actual algorithms, all my lists are in shared or common memory objects, so all the pointers are 64-bit and I use the +2 variations on the PLO. In this case, I use a counter with 2 fullwords on a doubleword boundary. The first fullword is the change count, and it is always incremented. The second fullword is the element count, and it is incremented for each add and decremented for each delete. I load the counter with an LG and then use ALG and AL or SL to manipulate the high or low word.

Kenneth

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Shmuel Metz (Seymour J.)
Sent: Wednesday, November 13, 2013 7:29 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

In <001801cee029$af0d2bc0$0d278340$@austin.rr.com>, on 11/12/2013 at 10:34 PM, Kenneth Wilkerson said:

>Actually, the algorithm performs well for read-often, write-rarely list
>because the active chain count does not change and therefore there are
>relatively infrequent re-drives.

What happens if there is an intervening add and also an intervening remove, leaving no net change in the active chain count even though the chain itself has changed?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see <http://patriot.net/~shmuel/resume/brief.html>
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)
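The two-fullword counter Kenneth describes (change count in the high word, element count in the low word, on a doubleword boundary) can be sketched with a single 64-bit atomic in C. This is a hedged approximation: the original manipulates the words with LG/ALG/AL/SL under PLO serialization, whereas here one atomic add produces the same net effect for these two operations, and the names are invented.

```c
#include <stdatomic.h>
#include <stdint.h>

/* High 32 bits: change count (bumped on every add AND delete, so any
 * update is observable even when adds and deletes cancel out -- the
 * case Shmuel raises).  Low 32 bits: current element count. */

static void count_add(_Atomic uint64_t *ctr)
{
    /* one atomic add: +1 change count, +1 element count */
    atomic_fetch_add(ctr, (UINT64_C(1) << 32) + 1);
}

static void count_del(_Atomic uint64_t *ctr)
{
    /* adding 0xFFFFFFFF to the low word subtracts 1 from it and, via
     * the carry, adds 1 to the high word.  Valid only while the
     * element count is >= 1, which a delete presupposes. */
    atomic_fetch_add(ctr, (UINT64_C(1) << 32) - 1);
}
```

Because the change count moves on every mutation, an add followed by a delete leaves the element count unchanged but still alters the counter, so a PLO-style compare against the saved counter correctly fails and re-drives.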
Re: Serialization without Enque
Actually, the algorithm performs well for a read-often, write-rarely list because the active chain count does not change and therefore there are relatively infrequent re-drives. The active chain count only changes on an add or delete. So if there are infrequent adds and deletes, there will be infrequent re-drives. And you are wrong, readers will not contend unless two or more tasks are referencing the exact same element simultaneously. And even then, the re-drive only involves the update to the use count.

There are a lot of optimizations that I do not describe in this algorithm for simplicity. For example, when you do a PLO DCAS, the condition code is set to indicate which compare failed. This can be used to optimize the re-drive. There are many other optimizations with re-drives to make this more efficient. And if you get away from traditional lists, there are even more optimizations.

Honestly, I provided this algorithm after reading the paper on hazard pointers. The paper was written in 2002 and claimed there was no atomic DCAS, when PLO DCAS became available in 2001. So I took a much simpler algorithm that I had and modified it to use a use count to accommodate traditional storage managers, to prove that a PLO could be used to manage a conventional list using a traditional storage manager and provide an SMR algorithm without the need for DU-level management structures. I don't use many conventional lists, and I have a proprietary storage manager that does not use chains. Most of my PLO operations are much simpler.

I would love to test this algorithm against any other SMR algorithm. My point has been and remains that PLO can be efficiently used to serialize any list in a lock-free manner, and even if it does take more CPU this will be offset by increased throughput.

And just because UNIX has issues with PLO doesn't mean the issue is with PLO...
Kenneth

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford
Sent: Tuesday, November 12, 2013 8:39 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

Thanks for sharing your design Ken. It seems to me that PLO is best used for data structures like double-ended queues where elements can be inserted/removed from both ends of the queue atomically. In the case of a read-often-write-rarely list with multiple readers that traverse the list it doesn't seem optimal. Correct me if I'm wrong but readers will contend with each other.

FYI, z/OS UNIX message queues can be configured to use PLO (with fallback to a latch). The documentation states that inserting/removing from the middle of a long list using PLO is a poor performer http://pic.dhe.ibm.com/infocenter/zos/v1r12/topic/com.ibm.zos.r12.bpxb100/qct.htm#qct. There isn't a one-size-fits-all solution. It depends on the usage patterns. Hopefully HTM will solve that. Depending on how Intel's Haswell TSX implementation performs in the wild, we could see HTM in our cell phones as early as next year.

On 12/11/2013 11:18 PM, Kenneth Wilkerson wrote:
> I use cell pools. I also use a proprietary storage manager that
> doesn't use chains. These methodologies offer me capabilities well
> beyond those found in traditional methods. Much of what I do is based
> on these capabilities, but the algorithms could easily be adapted to
> use a conventional storage manager that uses chains.
>
> Here is an algorithm I use that I've adapted for traditional storage
> management. This algorithm will work for any list: LIFO, FIFO,
> ordered, or other, and for deletion from the head, middle or tail.
>
> Setup: Allocate a word in each element. Bit 0 is one for all active
> elements. Bit 1 is one for all elements pending deletion. Bit 2 is
> reserved for an extension to this algorithm (such as garbage
> cleanup).
The remaining bits are a use count allowing for many more DUs than are supported by MVS. > > Note: When PLO Compare and Swap (CS) or Double Compare and Swap > (DCAS) is referenced, the PLO uses the use count address as the lock > word. This will serialize all updates to the use counter for that > element. For the PLO Compare and Loads or PLO update, the lock word is the active chain counter. > > To search: > Step 1: Use the PLO Compare and Load on the active chain counter to > search the chain as before. If the PLO fails, re-drive the search. > > Step 2: Before examining the element, increment the use count with a > PLO Double Compare and Swap (DCAS). Load the first register pair with > the current chain counter. The swap value will also be the current > chain counter. Essentially, we're using the active chain count to > serialize increments to the use count to avoid accessing an area that > may have been released. The second register pair will contain the > current use count with
Re: Serialization without Enque
I use cell pools. I also use a proprietary storage manager that doesn't use chains. These methodologies offer me capabilities well beyond those found in traditional methods. Much of what I do is based on these capabilities, but the algorithms could easily be adapted to use a conventional storage manager that uses chains. Here is an algorithm I use that I've adapted for traditional storage management. This algorithm will work for any list: LIFO, FIFO, ordered or other, and for deletion from the head, middle or tail. Setup: Allocate a word in each element. Bit 0 is one for all active elements. Bit 1 is one for all elements pending deletion. Bit 2 is reserved for an extension to this algorithm (such as garbage cleanup). The remaining bits are a use count allowing for many more DUs than are supported by MVS. Note: When PLO Compare and Swap (CS) or Double Compare and Swap (DCAS) is referenced, the PLO uses the use count address as the lock word. This will serialize all updates to the use counter for that element. For the PLO Compare and Loads or PLO update, the lock word is the active chain counter. To search: Step 1: Use the PLO Compare and Load on the active chain counter to search the chain as before. If the PLO fails, re-drive the search. Step 2: Before examining the element, increment the use count with a PLO Double Compare and Swap (DCAS). Load the first register pair with the current chain counter. The swap value will also be the current chain counter. Essentially, we're using the active chain count to serialize increments to the use count to avoid accessing an area that may have been released. The second register pair will contain the current use count with a swap value incremented by 1 using an AL to avoid resetting the high bit. If the PLO DCAS fails, the previous PLO Compare and Load (Step 1) should be re-driven. Step 3: Use the PLO Compare and Load for the next element. Save the PLO status and decrement the use count with a SL using PLO CS. 
We don't need a DCAS because the use count is not 0 and this element can't be deleted. If this PLO CS fails, re-drive it. If the PLO Compare and Load status is re-drive, then before re-driving the search, check the use count (in the register used to update it). If Bit 1 (pending delete) is set and the use count is 0, this task can release it. To delete: (this assumes the deleting task has already updated the use count in the process of finding the element to delete) Step 1: Use a PLO update function to remove the element from the active chain to avoid any future references. Step 2: If the PLO update to remove the element fails, decrement the use count but do NOT set bit 1 (pending delete) using a PLO CS. If the PLO CS fails, re-drive it. Step 3: If the PLO update to remove the element succeeds, decrement the use count, SET bit 1 (pending delete) and RESET Bit 0 (active bit) using a PLO CS. If the PLO CS fails, re-drive it. Step 4: Whether the PLO update succeeded or failed, check the use count in the register used to update it: If bit 1 (pending delete) is set and the use count is not 0, then this task should exit. If bit 1 (pending delete) is set and the use count is 0, then this task can release it. Otherwise, this task must re-drive the search to find the element to be deleted. You can work out the various scenarios yourself. But because the count is incremented/decremented after a PLO Compare and Load or update, the status of the PLO provides a decision point on whether an element may have been deleted. Using the use count address as the lock word ensures that all use count updates for a specific use count occur serially. There are numerous extensions to this algorithm that are more than I want to describe. Things like adding pending deletes to a delete chain or having an asynchronous, periodic garbage collection task handle the deletes. 
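A compressed, single-threaded C sketch of the flag-word bookkeeping described above (active bit, pending-delete bit, use count). All names are invented, and PLO's lock-word serialization plus the chain-counter compares are deliberately left out, so this only illustrates who frees the storage and when:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define F_ACTIVE   0x80000000u  /* bit 0: element is on the active chain   */
#define F_PENDDEL  0x40000000u  /* bit 1: delete done, readers outstanding */
#define COUNT_MASK 0x3FFFFFFFu  /* remaining bits: the use count           */

struct elem {
    struct elem *next;
    uint32_t     word;      /* flags + use count, one word per element */
    int          key;
};

static struct elem *head;
static uint32_t active_changes;   /* stands in for the active chain counter */
static int freed;                 /* elements actually released */

/* Search step 2: caller saw a consistent chain counter, so bump the count. */
static void use_acquire(struct elem *e) { e->word += 1; }

/* Drop a use count; the last holder of a pending-delete element frees it. */
static void use_release(struct elem *e)
{
    e->word -= 1;
    if ((e->word & F_PENDDEL) && (e->word & COUNT_MASK) == 0) {
        free(e);
        freed++;
    }
}

/* Delete steps 1 and 3: unchain, flip active to pending-delete, then drop
   the deleter's own use count (acquired while searching for the element). */
static void elem_delete(struct elem *e)
{
    struct elem **pp = &head;
    while (*pp != e) pp = &(*pp)->next;
    *pp = e->next;
    active_changes++;             /* in-flight searches will re-drive */
    e->word = (e->word & ~F_ACTIVE) | F_PENDDEL;
    use_release(e);
}
```

The property being illustrated: an element leaves the chain immediately, but its storage is only released once the last outstanding use count is dropped, which is the safe-memory-reclamation guarantee the post claims.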
Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Monday, November 11, 2013 9:38 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque Could you provide insight into a design that would handle the situation where an SSI program with a non-trivial workload uses a very large list? This list is normally semi-static but there can be periods of time where entries are heavily deleted and added. Not freeing storage is not an option because it could be a significant amount of storage. Thanks, Jon Perryman. > > From: Tony Harminc >To: IBM-MAIN@LISTSERV.UA.EDU >Sent: Monday, November 11, 2013 7:07 PM >Subject: Re: Serialization without Enque > > >On 11 November 2013 20:15, Jon Perryman wrote: >> L R2,QUEUE >> L R3,NEXT_ENTRY >> CS R2,R3,QUEUE      New queue head >>While this seems bullet proof, it's not. If there is a long delay between
Re: Serialization without Enque
>In PLO, the hardware locking occurs according to the lock word. The POM is not specific about the lock word other than that a transformation occurs to generate a PLT logical address used to acquire a lock. However, this does not affect its application. The key point is that two or more processors executing PLO simultaneously can access or alter the values using a PLO instruction and the operation will occur as if one PLO followed the other. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Monday, November 11, 2013 7:09 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque >suggests that Hardware Transaction Memory may not be the panacea we all >expect it to be, and in some cases may actually increase CPU Of course it's true that if a transaction experiences too much contention and resorts to its fallback path, you have used more CPU than if you had gone directly to the fallback path. That is specifically why every use of non-constrained transactions ought to do analysis to determine if it is even theoretically beneficial. What I don't see mentioned in the article is zEC12's constrained transactions. By their very definition they need no fallback path. That is a huge benefit both in terms of complexity and development/test cost. >In PLO, the hardware locking occurs according to the lock word. You seem to be assuming that the PLO implementation actually is truly locked according to the individual lock word. Maybe it is now. It definitely did not used to be. The machine would decide how to map the individually-specified lock word to (limited) hardware resources that were the true serialization mechanism. It was not necessarily one to one. >Transaction Memory sounds exciting but it's complex. IBM should put a >layer of abstraction on top with simple semantics. 
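Peter's point (the machine maps each lock word to a limited set of real serialization resources, not necessarily one to one) can be pictured with a toy transform. The hash below is invented; the POM leaves the actual PLT transformation model-dependent:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of mapping a lock word's address to one of a small, fixed
   number of hardware lock slots.  The transform is hypothetical. */
#define PLT_SLOTS 16u

static unsigned plt_slot(const void *lockword)
{
    uintptr_t a = (uintptr_t)lockword;
    return (unsigned)((a >> 3) % PLT_SLOTS);   /* invented hash */
}
```

Two unrelated lock words can land on the same slot, which is why lock-word choice can affect contention even between logically independent chains.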
Maybe it's me, but I don't really find TBEGIN...TEND complex compared to other serializing techniques, even when you factor in PPI while counting the number of attempts before taking the fallback path. The instructions within a transaction are typically less complex than the instructions you would need without a transaction, if you could even accomplish what you're trying to do outside of a transaction. For example, there is no need for CS, PLO. Just more straightforward "compare", "store", etc. Peter Relson z/OS Core Technology Design -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
I read the article. It's still based on a CAS, which I don't necessarily consider simpler than PLO. In this article, it states "Each hazard pointer can be written only by its associated thread, but can be read by all threads." This is exactly the provision I state in my example. It's a key element of any serialization algorithm that allows concurrent lock-free operations. Hazard pointers seem like an elegant general solution. I'll have to explore it, but I've not had issues with any of the solutions I've written using other methods. I write a solution appropriate to the usage of the serialized resource. The key is a rigid protocol that has to always be followed. In my e-mail last night I meant transactional execution facility, not transactional event facility. I've thoroughly read the POM on this facility but have not to date had access to a processor to experiment with it. From my reading, the transactional execution facility (TM) appears to be a way to implement almost all requirements for concurrent read/write access to a serialized resource. Essentially, from my understanding of the POM, it's a CAS on as many operands as necessary within a hardware-defined limit. In my very complex application, there are a couple of scenarios where I cannot use PLO to serialize access. I have no problem using multi-step PLO operations for serialization as long as the integrity of the resource is guaranteed after each step. For example, a delete that consists of a PLO to remove from the active chain and a separate PLO to add the removed element to a free chain. However, some operations are too complex for a PLO CAS and triple store even if the operands are organized in storage such that you can modify 128 bits at a time. In these cases, I use a gate word and a spin lock. When available, the gate is 0. When in use, the gate contains identification information for the gate owner. I very rarely have to use these. 
And if I had a TM-like transactional execution facility, I would replace this spin lock with that facility. Normally, there are only a handful of instructions within the gate, so this has never caused me any problems. In a sense, all methods (LL/CS, TM, etc.) are spin locks. If they don't succeed, try, try again. In all methods that I know of, the hardware must perform a memory serialization function. I use PLO instead of CS not only because of the increased functionality, such as modifying noncontiguous areas and being able to modify up to four 128-bit areas, but also because I believe the PLO lock word is an advantage. In all hardware serialization methods that I know of, a memory serialization function is required during the LL/CS or TM. These serializing functions can be expensive. CS is not granular and the serialization proceeds without regard to accesses by other CPUs, meaning the overhead occurs whether the function succeeds or fails. The PLO lock word "gates" access by all processors using the same lock word, thus reducing the total number of serializations performed by "stopping" a processor performing a PLO using the same lock word. This requires careful selection of lock words and use of the same lock word by all processes that read/write to the same resource. This advantage can be negated in a queue that has a substantial percentage of write operations compared to read operations, because write operations will necessarily result in a PLO failure. I believe this is the disadvantage referred to in your first paper on TM by Paul McKenney. I suspect that IBM is using TM to replace a lock and that in most cases the lock was used to serialize storage alterations. In this case, CPU would increase but so would throughput. I believe this is a classic example of trading the "less expensive" CPU resource for the "more expensive" throughput. 
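The gate word described above (0 when available, owner identification when held) is essentially a tiny spin lock. A minimal C11 sketch with invented names; a real implementation would also spin with backoff and record richer owner identification:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t gate;   /* 0 = available, else owner id */

static void gate_enter(uint32_t owner_id)
{
    uint32_t expect = 0;
    /* spin until we atomically swap our id into a free gate */
    while (!atomic_compare_exchange_weak(&gate, &expect, owner_id))
        expect = 0;             /* CAS updated expect; reset and retry */
}

static void gate_exit(void)
{
    atomic_store(&gate, 0);     /* only the owner should open the gate */
}
```

Storing an owner id instead of a bare 1 costs nothing and makes a hung gate diagnosable, which matches the rationale given in the post.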
Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Sunday, November 10, 2013 8:56 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque On 11/11/2013 10:36 AM, Kenneth Wilkerson wrote: > I read the article. This article is about transactional event facility > introduced in z/EC-12 and not PLO which is an LL/CS. I wish I had access to a > z/EC-12 with the transactional event facility to play with it and compare it > to PLO. The transactional event facility is much more comprehensive and not > as granular as a PLO. In PLO, the hardware locking occurs according to the > lock word. Transaction Memory sounds exciting but it's complex. IBM should put a layer of abstraction on top with simple semantics. > I've done a lot of testing with PLO. It can increase CPU, particularly in a > situation where updates are much higher percentage of the operations. But in > all applications that I'
Re: Serialization without Enque
I read the article. This article is about the transactional event facility introduced in the zEC12 and not PLO, which is an LL/CS. I wish I had access to a zEC12 with the transactional event facility to play with it and compare it to PLO. The transactional event facility is much more comprehensive and not as granular as a PLO. In PLO, the hardware locking occurs according to the lock word. I've done a lot of testing with PLO. It can increase CPU, particularly in a situation where updates are a much higher percentage of the operations. But in all applications that I've tested, its CPU overhead is offset by higher throughput. In a traditional locking method, tasks end up serializing on the lock. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Sunday, November 10, 2013 6:50 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque On 11/11/2013 5:19 AM, Mark Zelden wrote: > On Sat, 9 Nov 2013 19:47:35 GMT, esst...@juno.com wrote: > >> I have been reading and following this thread since PLO is not an instruction >> I use every day. >> It would be nice if someone would actually post some working code using a >> PLO instruction, to illustrate how one would add an element to a queue and >> remove an element from a queue. >> >> Paul D'Angelo > I've not been paying that close of attention, but I'm more curious > about what people did for these situations prior to PLO. They used smart algorithms using the atomic instructions they had, like RCU http://en.wikipedia.org/wiki/Read-copy-update. It's interesting that I have never seen any use of the PLO instruction in the zLinux kernel code. Paul McKenney, IBM's expert on these things, wrote a good article that suggests that Hardware Transaction Memory may not be the panacea we all expect it to be, and in some cases may actually increase CPU http://paulmck.livejournal.com/31285.html. 
> Mark > -- > Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS > mailto:m...@mzelden.com ITIL v3 Foundation Certified Mark's MVS > Utilities: http://www.mzelden.com/mvsutil.html > Systems Programming expert at > http://search390.techtarget.com/ateExperts/
Re: Serialization without Enque
For anyone interested. THIS IS AN EXAMPLE OF THE COMPLETE PROCESS USED TO ADD A NEW DISPATCHABLE UNIT (DU) ENTRY TO A QUEUE OF ACTIVE DUS. FOR SIMPLICITY, A DU IS EQUIVALENT TO A TASK. THIS QUEUE CAN BE SIMULTANEOUSLY SEARCHED, NEW DUS ADDED AND TERMINATING DUS DELETED. ADDS AND DELETES ARE DONE VERY INFREQUENTLY (COMPARED TO SEARCHES) AND ONLY BY THE DU ITSELF. DU ENTRIES FOR OTHER DUS ARE NEVER ADDED OR DELETED FROM THE QUEUE BY ANOTHER DU. THE ADD AND DELETE RULE IS VERY IMPORTANT BECAUSE ONLY THE DU CAN OWN ITS ENTRY, THUS ENSURING THAT THE OWNER (CODE THAT REFERENCES THE ENTRY) IS THE ONLY CODE THAT CAN ADD OR DELETE THAT ENTRY. REGARDLESS OF WHAT TYPE OF QUEUE YOU ARE SERIALIZING WITH PLO, COMMON SENSE MUST APPLY. IF DELETES WERE ALLOWED FOR AN ENTRY THAT COULD STILL BE REFERENCED, THERE WILL BE UNDESIRABLE RESULTS. IN THOSE CASES, SUCH AS A WORK QUEUE, THE ENTRY SHOULD BE DELETED FROM THE ACTIVE CHAIN OR A MECHANISM PROVIDED TO DEFINE OWNERSHIP, THUS PREVENTING DELETES OF "OWNED" ENTRIES. THE QUEUE POINTERS ARE KEPT IN COMMON STORAGE BUT NOT IN CSA. THIS CODE USES EITHER SHARED OR COMMON MEMORY OBJECTS. WHEN THIS CODE EXECUTES, IT IS GUARANTEED THAT THE MEMORY OBJECT IS AVAILABLE. THE FOLLOWING STRUCTURE IS ALWAYS USED FOR ALL QUEUES LIKE THIS ONE:

QUEUE            ...
QUEUE_START      DS D    START OF CELLS USED FOR QUEUE
* WHEN THE QUEUE IS INITIALIZED, THE FREE CHAIN IS 0 TO AVOID
* ADDING FREE ELEMENTS TO THE WORKING SET OF THIS CODE. INSTEAD, THE
* HWM=START AND AVAILQ IS 0. THE CODE SHOWS HOW THIS IS MANAGED
*
QUEUE_HWM        DS D    CURRENT HIGH WATER MARK OF CELLS
QUEUE_END        DS D    END OF CELLS USED FOR THIS QUEUE
*
QUEUE_HEAD       DS D    CURRENT ACTIVE HEAD
QUEUE_TAIL       DS D    CURRENT ACTIVE TAIL
QUEUE_COUNTERS   DS 0D   LOCK WORD AND COUNTERS
QUEUE_CHANGES    DS A    HIGH WORD IS CHANGES
QUEUE_ENTRIES    DS A    LOW WORD IS ENTRIES IN CHAIN
*
QUEUE_AVAILQ     DS D    FREE CHAIN IS SINGLY LINKED
QUEUE_AVCOUNTERS DS 0D   LOCK WORD AND COUNTERS FOR FREE
QUEUE_AVCHANGES  DS A    HIGH WORD IS CHANGES
QUEUE_AVENTRIES  DS A    LOW WORD IS ENTRIES IN CHAIN
                 ...

THE COUNTER IS A DOUBLEWORD CONSISTING OF 2 FULLWORD COUNTERS. THE COUNTER IS ALWAYS USED AS THE PLO LOCK WORD AND IS ALWAYS LOADED WITH AN LG (LOAD GRANDE). THE CHANGE COUNT IS ALWAYS THE HIGH WORD AND IS INCREMENTED WITH THE INCCHANGES CONSTANT IN THE PROGRAM CONSTANTS. THE ENTRY COUNT IS ALWAYS THE LOW WORD AND IS INCREMENTED WITH AHI, WHICH ONLY AFFECTS THE LOW WORD. I DON'T USE ANY OF THE SPECIAL HIGH-WORD OR IMMEDIATE INSTRUCTIONS BECAUSE THEY ARE A FACILITY, SO I KEEP THE CODE AT A FAIRLY OLD FACILITY LEVEL.

* CONSTANT IN PROGRAM
                 DS 0D
INCCHANGES       DC F'1',F'0'

THIS IS A SAMPLE ELEMENT LAYOUT. ESSENTIALLY, I ALWAYS KEEP THE PREV AND NEXT POINTERS AS THE FIRST DOUBLEWORDS IN AN ELEMENT. THE REMAINING AREAS OF THE ELEMENT HAVE NO BEARING ON THIS PROCESS.

ELEMENT          ...
ELEMENT_PTRS     DS 0XL16
ELEMENT_PREV     DS D
ELEMENT_NEXT     DS D
ELEMENT_DATA     DS ...
                 ...
ELEMENT_SIZE     EQU *-ELEMENT_PREV

ALL THE CODE IS REENTRANT AND THE FOLLOWING WORK AREAS ARE REFERENCED:

* IN WORKING STORAGE
PLOWORK          DS XL144   NEEDED FOR COMPARE AND SWAP AND STORES
SEARCH_PTR       DS D       INITIALIZED TO 0 FOR START OF SEARCH
*                           SET TO 0 FOR ADD OR ENTRY TO BE DELETED
SEARCH_COUNTER   DS D       SET BY CALLER TO CURRENT COUNTER VALUE

SO HERE IS THE SAMPLE EXTRACTED FROM A WORKING ALGORITHM. I CAN'T PUT THE ORIGINAL CODE IN THIS EXAMPLE FOR MANY REASONS. BUT I CAREFULLY EXTRACTED FROM A WORKING ALGORITHM AND MADE IT INTO THIS EXAMPLE. THIS EXAMPLE SHOWS COMPARE AND LOAD, COMPARE DOUBLE AND SWAP AND SWAP AND DOUBLE STORE. IF YOU'RE NOT FAMILIAR WITH PLO TYPE SERIALIZATIONS, IT MAY TAKE A WHILE TO DIGEST ALL OF THIS. I DON'T USE THE PLO MNEMONICS. I'VE BEEN CODING IT A VERY LONG TIME AND I JUST CODE IT.

* START OF A SERVICE CALL FROM A DU, SEE IF IT'S ALREADY BEEN ADDED
* KEY IS ASCB ADDRESS AND DU (TCB). THE ADD ONLY RUNS IN THE DU BEING
* ADDED
* PRIME THE SEARCH ARGUMENTS. SEARCH_COUNTER HAS THE QUEUE_COUNTERS AT
* START OF THE SEARCH AND THEY ARE NOT CHANGED UNLESS THE PROCESS HAS
* TO BE REDRIVEN
SEARCH_REDRIVE DS 0H
         XC    SEARCH_PTR,SEARCH_PTR          PRIME TO START AT HEAD
         MVC   SEARCH_COUNTER,QUEUE_COUNTERS  GET STARTING COUNTER
*
* SEARCH QUEUE FOR REQUIRED ELEMENT
SEARCH_LOOP DS 0H
         LA    R1,QUEUE_COUNTERS              LOCK WORD
         LG    R14,SEARCH_COUNTER             LOAD CURRENT COUNTER
***
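The SEARCH_REDRIVE / SEARCH_LOOP shape is an optimistic read loop: snapshot the counter, walk the chain, and re-drive if the counter moved. A single-threaded C analogue with invented names (real code would need atomics and memory ordering, which this sketch omits):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node { struct node *next; int key, val; };

static struct node *qhead;
static uint32_t     qchanges;   /* the "change count" half of the counter */

/* Walk the chain under a snapshot of the change counter; if the counter
   moved, an add/delete intervened and the walk is re-driven, just as a
   failed PLO Compare and Load re-drives the search. */
static int search(int key, int *out)
{
    for (;;) {
        uint32_t snap = qchanges;
        int found = 0, val = 0;
        for (struct node *n = qhead; n != NULL; n = n->next)
            if (n->key == key) { found = 1; val = n->val; break; }
        if (qchanges == snap) {      /* counter unchanged: consistent walk */
            if (found) *out = val;
            return found;
        }
        /* counter moved: re-drive the search */
    }
}
```

Readers never write the counter, which is why this pattern leaves read-mostly traffic contention-free.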
Re: Serialization without Enque
Yes, I was talking about all references using PLO. I was also assuming this was a "work" queue where "deletion" from the chain was the methodology for claiming ownership of the element. However, any serialization method that performs deletion must also have a method for claiming ownership before an element can be deleted. It doesn't matter whether deletion is an actual release of storage or the placement of the element into a free chain. If any process other than the owning process maintains a reference to an element without claiming it, the problem exists whether you use locks, PLO, CS, whatever. If the reference is nothing more than searching the chain, then PLO Compare and Load can solve that. If the reference is more than that, the problem is not storage overlays or 0c4s. The problem is a missing method for ownership. If this is true for this application, the chain is only serializable with a lock and the lock must be held throughout the period where the element is referenced before the element can be safely deleted. -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Friday, November 08, 2013 1:11 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque The storage overlay does not pertain to the PLO. It pertains to the entire element not being immediately removed from any type of use. Just because you removed the element from the chain does not mean it's not in use somewhere. You can't even say how long the element may be in use (e.g. task does not get any CPU because of CPU load or swapped out address space in multi-address space serialization). Jon Perryman. >________ > From: Kenneth Wilkerson > > > >A storage overlay cannot occur in a properly implemented PLO with a >counter as long as the counter is properly maintained with every >process incrementing it by 1. Even in in a free chain implementation, >an improper PLO sequence can result in a circular or broken chain. 
> >-Original Message- >Behalf Of Jon Perryman > >This specific paragraph from Peter is about "FREE QUEUE PROTOCOL". > --
Re: Serialization without Enque
A storage overlay cannot occur in a properly implemented PLO with a counter as long as the counter is properly maintained with every process incrementing it by 1. Even in a free chain implementation, an improper PLO sequence can result in a circular or broken chain. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Friday, November 08, 2013 11:44 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque This specific paragraph from Peter is about "FREE QUEUE PROTOCOL". This is where elements on your chain are no longer needed. Peter recommends not freeing the element. Instead, you should use a queue of free elements that you reuse when you need a new element. Peter's concern is not the chaining. It's with the data each element represents and making sure it remains consistent in a multi-processing environment. A program using the element is not guaranteed the element hasn't been freed and possibly re-allocated. PLO only ensures serialization of the chain. To ensure the element is valid, header validation is not enough. You should maintain a validation field that is occasionally verified. This greatly reduces the exposure but does not completely eliminate it. Peter's concern is valid and justified. S0C4 abends are visible so they can be dealt with. The real problem is storage overlays that are not immediately apparent. Even worse is when they affect unrelated programs. Jon Perryman. > > From: Donald Likens > > >Thank you for your help (all of you) but Peter's statement below does not make sense to me (maybe because I don't understand something). > >The reason that the free queue protocol needs a sequence number is >because even if the header "matches", the values that you need to put >into your new element for the "next" and/or "previous" may no longer be >correct due to a set of additional asynchronous operations. > >I use PLO to add an element to a chain. 
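The claim above (a properly maintained counter prevents the overlay case) is the classic ABA problem, and it can be demonstrated with a toy comparison. Names are invented; the second function mimics a CDS/PLO on the head pointer paired with a change counter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void    *g_head;     /* chain head */
static uint32_t g_changes;  /* bumped by 1 on every add/delete */

/* Compare only the head: cannot see that the chain was popped and the
   same storage re-chained in between (the ABA bug). */
static int cas_head_only(void *expect, void *repl)
{
    if (g_head != expect) return 0;
    g_head = repl;
    return 1;
}

/* Compare head AND counter together, as a CDS or PLO with a counter
   would: any intervening add/delete makes the compare fail. */
static int cds_head_and_counter(void *ehead, uint32_t ecount, void *repl)
{
    if (g_head != ehead || g_changes != ecount) return 0;
    g_head = repl;
    g_changes++;
    return 1;
}
```

The bare-pointer compare "succeeds" on a stale snapshot, which is exactly the broken-chain/overlay scenario; the paired counter rejects it.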
The chain only has forward pointers. I always add to the end of the chain. I use storage (not CPOOL) to get the element. It turns out that I haven't gotten the PLO instruction to work properly yet but in theory in my scenario it seem to me that if the pointer to the last element is pointing to a element (Not 0) I should be able to store the next pointer and the last pointer in one PLO CSDST. Here is the actual (not working code... It is not updating the chain properly): > >CSAMSEGL <== Last element on the chain >CSAMSEGF <== First element on the chain >R8 Address of element to add. >MSEGNEXT <== Pointer to next element in last control block MSEGCB <== >element name > >DOX078 DS 0H *C IF CSAMSEGL EQ 0 THEN (no elements on the >chain) > XR R4,R4 > XR R5,R5 > LR R2,R8 (R8: ADDRESS OF MSEGCB) > LR R3,R8 *C SET CSAMSEGL = CSAMSEGF = MSEGCB JUST >BUILT > CDS R4,R2,CSAMSEGF IF CSAMSEGF & CSAMSEGL = 0, STM >2,3,CSAMSEGF > BC 4,ELIFX076 *C ELSE > B EIFX076 >ELIFX076 DS 0H *C IF CSAMSEGL = CSAMSEGL (R2) *C SET CSAMSEGL = >POINTER_TO_NEW_MSEG (R8) *C SET MSEGNEXT = POINTER_TO_NEW_MSEG (R8) >CSDST EQU 16 > L R2,CSAMSEGL > LA R0,CSDST > LA R1,PLT > LR R3,R8 > LA R4,CSAMSEGL CSAMSEGL IS IN CSA > ST R4,OPERAND6 > LA R4,MSEGNEXT MSEGNEXT IS IN CSA > ST R4,OPERAND4 > ST R8,OPERAND3 > ST R8,OPERAND5 > PLO R2,CSAMSEGL,0,PL >* THE FIRST-OPERAND COMPARISON VALUE AND THE SECOND OPERAND ARE >* COMPARED. IF THEY ARE EQUAL, THE FIRST-OPERAND REPLACEMENT VALUE >* IS STORED AT THE SECOND-OPERAND LOCATION, AND THE THIRD OPERAND IS >* STORED AT THE FOURTHOPERAND LOCATION. THEN, THE FIFTH OPERAND IS >* STORED AT THE SIXTH-OPERAND LOCATION. 
> BNZ DOX078
>EIFX076  DS 0H
>
>PLT      DS D     PLO LOCK TOKEN
>PL       DS 0F    PARAMETER LIST
>         ORG PL+60
>OPERAND3 DS A     NEW MSEG ADDRESS
>         ORG PL+76
>OPERAND4 DS A     ADDRESS OF CSAMSEGL
>         ORG PL+92
>OPERAND5 DS A     NEW MSEG ADDRESS
>         ORG PL+108
>OPERAND6 DS A     ADDRESS OF MSEGNEXT
Re: Serialization without Enque
First, I'm not sure why you have chosen PLT as your lock word. It's very important that the lock word resolves to the same REAL address no matter where the PLO executes. Since you are talking about multiple operations against the same chain, unless all the processes exist in the same shared program you can't be sure the same lock word is always used. The lock word contents are not altered, just used to create a lock value (PLT) for serialization. From your code, I would choose CSAMSEGF. Second, you can't mix the CDS with PLO and expect consistent results. It would be very easy to convert the CDS to a PLO Compare Double and Swap (compare each fullword and swap). Just be sure to use the same lock word, which I still suggest as CSAMSEGF. Third, you're going to have to use a count. And since you are acquiring and releasing this storage, you REALLY have to use a counter. Just add a fullword counter in CSA. In this case, I would choose the counter as the lock word. As long as you have a fullword counter and recovery to treat a 0c4 as a re-drive (see my prior comments), this should work. I don't know if you can, but the choice of another methodology where storage is not acquired for each add and released for each delete would be recommended. However, you can make this work, but not without a counter. Last, Peter's comment is very valid. Read the notes on CDS in the POM. I don't know why PLO doesn't reference them since they are just as applicable to PLO. I have not closely examined your logic so I don't know what the logic problem is in the code. I'm just commenting on the methodology. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Donald Likens Sent: Friday, November 08, 2013 10:22 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque Thank you for your help (all of you) but Peter's statement below does not make sense to me (maybe because I don't understand something). 
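The advice above (one lock word for every operation on the chain, a fullword counter, re-drive on mismatch) can be sketched as a tail insert in C. The `csdst_append` function is an invented, single-threaded stand-in for a PLO compare-and-swap-and-double-store, not working PLO code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mseg { struct mseg *next; };

static struct mseg *first, *last;
static uint32_t     changes;    /* the recommended counter / lock word */

/* Emulated compare-and-swap-and-double-store: compare the counter, and
   only if it is unchanged store both the old tail's next pointer and the
   new tail, bumping the counter so concurrent snapshots go stale. */
static int csdst_append(uint32_t seen, struct mseg *e)
{
    if (changes != seen) return 0;   /* counter moved: caller re-drives */
    e->next = NULL;
    if (last) last->next = e; else first = e;
    last = e;
    changes++;
    return 1;
}

static void append(struct mseg *e)
{
    while (!csdst_append(changes, e))   /* re-drive until the swap lands */
        ;
}
```

Because every add (and, in a full version, every delete) bumps the one counter, a stale snapshot can never store into the chain, which is the integrity property the counter buys.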
The reason that the free queue protocol needs a sequence number is because even if the header "matches", the values that you need to put into your new element for the "next" and/or "previous" may no longer be correct due to a set of additional asynchronous operations. I use PLO to add an element to a chain. The chain only has forward pointers. I always add to the end of the chain. I use storage (not CPOOL) to get the element. It turns out that I haven't gotten the PLO instruction to work properly yet, but in theory in my scenario it seems to me that if the pointer to the last element is pointing to an element (not 0) I should be able to store the next pointer and the last pointer in one PLO CSDST. Here is the actual code (not working... it is not updating the chain properly):

CSAMSEGL <== Last element on the chain
CSAMSEGF <== First element on the chain
R8       <== Address of element to add
MSEGNEXT <== Pointer to next element in last control block
MSEGCB   <== element name

DOX078   DS 0H
*C IF CSAMSEGL EQ 0 THEN   (no elements on the chain)
         XR  R4,R4
         XR  R5,R5
         LR  R2,R8          (R8: ADDRESS OF MSEGCB)
         LR  R3,R8
*C SET CSAMSEGL = CSAMSEGF = MSEGCB JUST BUILT
         CDS R4,R2,CSAMSEGF  IF CSAMSEGF & CSAMSEGL = 0, STM 2,3,CSAMSEGF
         BC  4,ELIFX076
*C ELSE
         B   EIFX076
ELIFX076 DS 0H
*C IF CSAMSEGL = CSAMSEGL (R2)
*C SET CSAMSEGL = POINTER_TO_NEW_MSEG (R8)
*C SET MSEGNEXT = POINTER_TO_NEW_MSEG (R8)
CSDST    EQU 16
         L   R2,CSAMSEGL
         LA  R0,CSDST
         LA  R1,PLT
         LR  R3,R8
         LA  R4,CSAMSEGL     CSAMSEGL IS IN CSA
         ST  R4,OPERAND6
         LA  R4,MSEGNEXT     MSEGNEXT IS IN CSA
         ST  R4,OPERAND4
         ST  R8,OPERAND3
         ST  R8,OPERAND5
         PLO R2,CSAMSEGL,0,PL
* THE FIRST-OPERAND COMPARISON VALUE AND THE SECOND OPERAND ARE
* COMPARED. IF THEY ARE EQUAL, THE FIRST-OPERAND REPLACEMENT VALUE
* IS STORED AT THE SECOND-OPERAND LOCATION, AND THE THIRD OPERAND IS
* STORED AT THE FOURTH-OPERAND LOCATION. THEN, THE FIFTH OPERAND IS
* STORED AT THE SIXTH-OPERAND LOCATION.
Re: Serialization without Enque
-Original Message- From: Kenneth Wilkerson [mailto:redb...@austin.rr.com] Sent: Friday, November 08, 2013 8:46 AM To: 'IBM Mainframe Discussion List' Subject: RE: Serialization without Enque >I really don't see the big deal with an 0c4 in this scenario (should happen rarely) You misunderstood my point. You could use PLO to serialize a chain even if the areas are released as they are deleted, provided you always use PLO Compare and Load to load the pointers and recovery sets a retry to re-drive the process. As long as the count is updated each time you do a delete (and release), if the delete occurs while some other management function is being performed, the PLO Compare and Load will force the re-drive either by CC or 0c4. If the storage had been reallocated but was still in the same key, the PLO would fail because the count has changed. PLO may fetch operands before the lock and memory serialization, so an 0c4 can occur on the PLO Compare and Load. Either way, the PLO does not store, so there is never an overlay. I would never design an application to use PLO in this fashion. However, if I had an existing application, I find this methodology more desirable than getting a CMS lock everywhere I access the chain. I stand by my statement that you can serialize 99+% of all scenarios using PLO and that PLO serialization is much more desirable than locks. If this were not the case, why bother creating the transactional execution facility? Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Friday, November 08, 2013 8:03 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque >applicable to 99%+ of all serialization scenarios you encounter To be frank, you might not have very complex serialization requirements. Also, using PLO when CS, CSG, CDS, or CDSG would do is a significant waste of cycles. 
For the cases I have seen within our code, uses of PLO (in the cases where it is not better to use something simpler) are a tiny percentage of our serialization needs.

>When the updating process wakes up S0C4!

Using PLO to update a free queue, as is the case with CPOOL and its CDS-based free-queue protocol, requires that the queue elements *never* be freed (unless you like potentially blowing up or, worse, overlaying something you didn't intend to write into). Perhaps this is not well understood.

>I really don't see the big deal with an 0c4 in this scenario (should happen rarely)

That's a scary statement. If you get an 0C4 you could probably deal with it. The real risk is the case where you don't get an 0C4 because the storage was re-allocated and used for something else, and now it does not program check but overlays something.

>I think I figured out a solution:

There are a lot of details missing in what was shown, but if you want my honest suspicion, it's that if this is a "free queue" type of protocol, it will not work. The reason that the free queue protocol needs a sequence number is because even if the header "matches", the values that you need to put into your new element for the "next" and/or "previous" may no longer be correct due to a set of additional asynchronous operations.

Peter Relson z/OS Core Technology Design

-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
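Relson's sequence-number point is the classic ABA hazard, and it can be simulated deterministically in a few lines of Python (all names invented; this is a single-threaded simulation of the interleaving, not real concurrent code): a compare on the head pointer alone still "matches" after the head was popped, freed, and pushed back, so the stale captured next pointer gets installed; a (sequence, head) pair detects the intervening operations and forces a re-drive.

```python
def pop_plain(queue, observed_head, captured_next):
    # CAS on the head pointer alone: "matches" if head is still the same object
    if queue['head'] is observed_head:
        queue['head'] = captured_next
        return True
    return False

def pop_seq(queue, observed_seq, observed_head, captured_next):
    # CAS on the (sequence, head) pair: any intervening pop or push bumped seq
    if queue['seq'] == observed_seq and queue['head'] is observed_head:
        queue['head'] = captured_next
        queue['seq'] += 1
        return True
    return False

# Free queue A -> B -> C
A = {'name': 'A', 'next': None}
B = {'name': 'B', 'next': None}
C = {'name': 'C', 'next': None}
A['next'] = B
B['next'] = C

# "Thread 1" reads the header and the next pointer, then stalls.
seen_seq, seen_head, captured_next = 0, A, A['next']   # captured_next is B

# Meanwhile "thread 2" pops A, pops B (B is freed and reused elsewhere),
# then pushes A back.  The head is A again, but B is gone:
A['next'] = C
state_plain = {'head': A, 'seq': 3}   # same post-interleaving state,
state_seq   = {'head': A, 'seq': 3}   # copied so both pops can be tried
```

When thread 1 resumes, the plain compare succeeds and quietly installs the freed element B as the new head; the sequence-number compare fails and re-drives.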
Re: Serialization without Enque
If your application is not designed to use PLO for serialization, it'll definitely not work for you. I use PLO for serialization because of issues with locks that you are describing (system effects) and many others. All my code can run as SRBs, but unlike what you describe I almost never acquire locks. I always use PLO except when interfacing with MVS services that require locks, which I do rarely. Besides, you can't mix CS with PLO, meaning you would have to convert every CS affecting that chain. Sounds to me like you're stuck with the CMS lock. And if it's a process done frequently, it will have a system impact; a significantly greater impact than a PLO-serialized one.

The thing I don't understand is your statement: "My problem is that a process comes in and removes the control block chain while another process is suspended and attempting to update the chain. When the updating process wakes up S0C4!" If you mean that you're releasing the storage for the chain, the process doing the release could use a PLO to set the chain pointers to 0 and in that process update the swap word. The second process will then get a CC forcing re-drive, and it'll discover the chain is now 0. In that case, that would probably mean that you would need to use PLO Compare and Loads for every reference to those chain pointers, or (my preference) you would have a retry point detect the 0C4 and realize the chain has been released and just continue. I really don't see the big deal with an 0C4 in this scenario (should happen rarely). The PLO process does not update anything until the PLO executes, so no harm, no foul.

Again, if the application is not designed to use PLO, it won't work. And you're not slow. When I first started using PLO almost 10 years ago, I spent days writing test cases with special traces so I could see what was happening. The whole time I had the POM open to the PLO instruction, reading it over and over. Now, I consistently code PLO without much thought.
If you choose to spend time learning the PLO, I think you'll find it vastly superior to locks (no restrictions) and applicable to 99%+ of all serialization scenarios you encounter, provided you design the application to use PLO from the start.

Kenneth

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Donald Likens Sent: Thursday, November 07, 2013 8:33 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque

It has taken me this long to mostly understand PLO... I must be slow. Now that I understand it (mostly) I am pretty sure it will not work for me. My problem is that a process comes in and removes the control block chain while another process is suspended and attempting to update the chain. When the updating process wakes up S0C4! That is why I was looking at using locks. If the process updating the chain holds a lock and the process removing the chain needs that lock to update the pointers, this would not happen.

So back to my original question: My code must be able to run in SRB mode and with locks held. I have a situation where I need to serialize processing and cannot use CDS because the two addresses being updated cannot be next to each other (because I use CDS with these two addresses with other addresses). I have attempted to use a combination of CS instructions to resolve this problem but it does not work. I know this will work if I use a CMS lock but I am concerned about affecting the whole system. Any advice?
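Donald's constraint (two words that cannot sit side by side for CDS) is exactly the case the counter-based PLO protocol discussed later in this thread addresses. Here is a hedged Python model of the idea (invented names; the lock emulates the interlocked counter update, and the odd/even counter plays the role of the lock-word change count): writers bump a change counter around the two stores, and readers reload until they see a stable, even counter, the moral equivalent of a PLO Compare and Load re-drive.

```python
import threading

class DisjointPair:
    """Two words too far apart for CDS, kept consistent via a change counter."""
    def __init__(self):
        self._cas = threading.Lock()   # emulates the interlocked counter update
        self.count = 0                 # even: stable; odd: update in progress
        self.word1 = 0
        self.word2 = 0

    def update(self, w1, w2):
        with self._cas:                # one updater at a time, like the lock word
            self.count += 1            # odd: readers must re-drive
            self.word1 = w1
            self.word2 = w2
            self.count += 1            # even again: the pair is consistent

    def snapshot(self):
        while True:                    # the "Compare and Load" re-drive loop
            before = self.count
            if before % 2 == 0:
                w1, w2 = self.word1, self.word2
                if self.count == before:
                    return w1, w2

pair = DisjointPair()
pair.update(7, 14)
```

A reader never takes a lock: it just re-drives until the counter proves the two loads were not split across an update.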
Re: Serialization without Enque
Thank you for mentioning the issue with CS/CDS. I have always understood that if you use PLO anywhere to serialize access to an area, you must use it everywhere to serialize access to that area. It's nice to know that the transactional facility serializes it against CS as well. I wish I had access to a processor to play with it.

I always understood that the reason PLO did a memory serialization at the end of a PLO was to ensure that any processors that were referencing the swap word(s) in a compare and swap would get consistent results: either the swap value (32, 64, or 128 bits) before the swap or after the swap. Since the swap value is always stored last, and provided you always loaded the swap value first, you would either get the correct swap and store values and the subsequent PLO would succeed, or you would get an outdated swap value with inconsistent store values and the subsequent PLO would return a CC to re-drive. I have written software traces for PLO and my observations have supported this understanding of the POM's description of PLO and memory serialization. Is this your understanding?

I often use 128-bit operations with consecutive words and double words to perform some sophisticated operations. I've understood about double word consistency. I've always assumed that PLO required quad-word alignment for 128-bit operations for the same reason. I always load the swap value (in any flavor of compare and swap and store) first. I always use a LMG (even if it's consecutive words) and I always ensure that the primary counter is in the first double word. I've read the POM many times looking for any references to quad-word consistency. It doesn't really matter because of the way I design PLO compare and swaps, but I was wondering if there was quad-word consistency as well?
-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Wednesday, November 06, 2013 6:37 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque

One of the shortcomings of PLO (unlike TBEGIN(C)) is that PLO in general serializes only against other uses of PLO. It does not serialize against CS on the same storage, for example.

However, cache considerations and doubleword consistency still come into play. A LM of 2 words of a doubleword is done with what is referred to as "doubleword consistency". That matters. If you need to load two consecutive words and you can arrange that those two words are in the same doubleword, it can be to your advantage. It's why in a doubleword serialized by CDS you do not typically (if ever) want to load the two individual words with separate instructions (as you might get results that come half from one CDS and half from another CDS); you want to use LM.

Peter Relson z/OS Core Technology Design
Re: Serialization without Enque
This is a complicated question that is very dependent on the design of your application. So if the design needs to use a PLO Compare and Load, by all means do so. I try to design these applications to avoid as many PLO Compare and Loads as possible. The memory serializations (once at the start and once at the end) are expensive, but much less so than software locks. This means that I design the update operations so that loading the pointers during a PLO operation will, at worst, simply result in outdated information. As I explained in my first example, this is possible regardless of whether you use PLO or a lock. It's just a consequence of concurrent activity and the order in which these activities occur.

In reality, I don't normally use PLO Compare and Swap and Store (any flavor) for chains. They work fine for singly linked lists. Usually I use it (just wrote something today) to dynamically add an entry to a pre-allocated slot in a cell. In this case, the cell has pre-allocated slots with a high water mark pointer and a count, consecutive words in a double word. The high water mark and count are the 2nd operand and the lock word. Since the 2nd operand always updates last, any references would not even know of the new addition until the PLO completes.

I fall back to my original provision: the use of PLO is heavily dependent on designing the application to use PLO. Just yesterday, I debugged a problem in a task that was not doing a PLO Compare and Load to get a count. Certainly, all references to the counts in a Compare and Swap and Store should be updated with a PLO and probably require a PLO Compare and Load.

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Tuesday, November 05, 2013 10:50 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque

Sorry for the confusion but that's not the question that I was asking.
I agree with you on guaranteeing the consistency using the count. I'm talking about TCB1 using PLO CSDST to store 2 adjacent words (4th & 6th PLO operands) and TCB2 using LM or LG for those same 2 words. There is a very small window between the 2 stores where TCB2 will pick up inconsistent values. In other words, the first store has completed and the LM/LG occurs before the second store completes. This window is extremely small because PLO cannot be interrupted and the instruction was prepared before performing the stores. I think the window is so small that even under heavy usage, you would only see an error every couple of months, but it does exist. I think TCB2 must also use the PLO compare and load to avoid this situation.

Thanks for the great information, Jon Perryman.

From: Kenneth Wilkerson
>
>The order of stores is unpredictable except that according to the POM,
>operand 2 (in this case, the count) is always stored last.
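Kenneth's pre-allocated-slot scheme, with the high-water mark and count as the second operand that is "always stored last", can be modeled in Python (invented names; the lock stands in for the PLO interlock). Because the slot is filled before the count is swapped, a reader that loads the count first can safely read that many slots; for this particular structure, that is one answer to Jon's torn-read concern.

```python
import threading

class Cell:
    """Pre-allocated slots guarded by a count that is always stored last."""
    def __init__(self, size):
        self._plo = threading.Lock()   # stands in for the PLO interlock
        self.slots = [None] * size
        self.count = 0                 # the 2nd operand / lock word

    def add(self, value):
        while True:                    # re-drive loop
            seen = self.count
            with self._plo:            # compare, store slot, swap count (last)
                if self.count == seen:
                    self.slots[seen] = value   # slot store first...
                    self.count = seen + 1      # ...count swapped last
                    return
            # count moved underneath us: re-drive

    def used(self):
        n = self.count        # load the count first...
        return self.slots[:n] # ...so these slots are guaranteed complete
```

A reader that instead loaded individual slots before checking the count would be back in the torn-read window; loading the count first is what the store ordering buys you.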
Re: Serialization without Enque
The order of stores is unpredictable except that, according to the POM, operand 2 (in this case, the count) is always stored last:

"In those cases when a store is performed to the second-operand location and one or more of the fourth-, sixth-, and eighth-operand locations, the store to the second-operand location is always performed last, as observed by other CPUs and by channel programs." Page 7-290, right column, top half of page in SA22-7832-09; 7-281 in SA22-7832-08.

So it's impossible for the count to be updated before the stores. I've been using and relying on these techniques for years, with exhaustive testing under high workloads with re-drive statistics to help me decide the algorithm that I use. It can't hurt to do the PLO Compare and Load. It just adds overhead that is probably more efficiently handled by a re-drive. But it's up to you. I suggest you add a re-drive counter to your test case and see for yourself. I would be extremely surprised if the re-drive percentage were ever higher than a small fraction of 1%, no matter how hard you drove the chain.

Kenneth

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Monday, November 04, 2013 3:31 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque

As you say, PLO only locks CPUs using the same PLO lock word. For other CPUs not using the lock word, it is considered multiple unique instructions. So in the case of the 64 bit address, PLO CSDST, it is considered compare, store value1, store value2, store swap value. Although it's unlikely, it is possible for the LG instruction to occur after store value1 but before store value2. Or are the stores considered a single-occurrence instruction by the other CPUs?

Thanks, Jon Perryman.

>____
> From: Kenneth Wilkerson
>To: IBM-MAIN@LISTSERV.UA.EDU
>Sent: Monday, November 4, 2013 1:06 PM
>Subject: Re: Serialization without Enque
>
>
>This is not correct.
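Kenneth's measurement suggestion, a CS-updated re-drive counter compared against the operation count, sketches out like this in Python (threads stand in for CPUs, the lock emulates the interlocked compare-and-swap, and all names are invented):

```python
import threading

class Counter:
    def __init__(self):
        self._cas = threading.Lock()   # emulates the interlocked CS
        self.value = 0
        self.redrives = 0

    def compare_and_swap(self, old, new):
        with self._cas:
            if self.value == old:
                self.value = new
                return True
            return False

    def add_one(self):
        while True:                    # serialized update with re-drive
            seen = self.value
            if self.compare_and_swap(seen, seen + 1):
                return
            with self._cas:            # the CS-updated re-drive counter
                self.redrives += 1

OPS, THREADS = 5000, 4
counter = Counter()

def worker():
    for _ in range(OPS):
        counter.add_one()

workers = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(f"{counter.redrives} re-drives in {counter.value} operations "
      f"({100.0 * counter.redrives / counter.value:.3f}%)")
```

The final value is exact regardless of contention; only the re-drive count varies from run to run, which is precisely the statistic Kenneth suggests collecting.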
The choice to PLO compare and load is not required >since the count is always guaranteed to be swapped after the stores (my >last email). I only use PLO Compare and load for complex chain >manipulations. But do it if you want. The serialization performed by a >PLO forces serialization on the lock word for all processors. I try to >Avoid it for situations where a re-drive is less costly > >-Original Message- >From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] >On Behalf Of Jon Perryman >Sent: Monday, November 04, 2013 2:42 PM >To: IBM-MAIN@LISTSERV.UA.EDU >Subject: Re: Serialization without Enque > >Thanks for pointing out that it's required to do the PLO COMPARE >against the counter and FETCH of the value otherwise there is no >guarantee that value1 is consistent with the counter. > >I'm also hearing you say that programs that reference more than a >single word, must use PLO COMPARE and FETCH. In Kenneth's example where >he uses PLO to save 64 bit addresses (which is 2 words), he can't use >LG to reference the 64 bit address otherwise he risks using high and >low register values that do not match. Is that correct? > >Jon Perryman. > > > >>____ >> From: Binyamin Dissen >> >> >>That won't help if you fetch the new count and the old value1. >> >>On Mon, 4 Nov 2013 11:38:38 -0600 Kenneth Wilkerson >> >>wrote: >> >>:>Yes, it is possible that the updates are not performed in any order. >>:>However, it is guaranteed that the updates are only performed if the >>swap :>can be done. Therefore, I use a simple rule. If the number of >>instructions :>needed to compute the new chain pointers are small (as >>is the case in my :>example). I don't incur the overhead of doing the >>extra 2 PLO (Compare and >>:>Load) operations. I simply re-drive the operation as shown in >>Binyamin's :>example. Even with the PLO Compare and Load, there is no >>guarantee the swap :>will succeed. It just lessens the likelihood. 
So >>the decision point is :>whether the overhead of 2 additional PLO >>instructions is less than the :>overhead of a re-drive. This can only >>be determined with testing. You can :>determine this by using a CS to >>update a counter for every re-drive. You :>already have an operation >>count, so you can then easily determine the :>percentage of re-drives. >>In my experience, even in very active chains, the :>PLO serialization >>process will incur a very small number of re-drives (much :>less than >>1 >percent). But only testing can reveal that.
Re: Serialization without Enque
This is not correct. The choice to PLO Compare and Load is not required, since the count is always guaranteed to be swapped after the stores (my last email). I only use PLO Compare and Load for complex chain manipulations. But do it if you want. The serialization performed by a PLO forces serialization on the lock word for all processors. I try to avoid it for situations where a re-drive is less costly.

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Monday, November 04, 2013 2:42 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque

Thanks for pointing out that it's required to do the PLO COMPARE against the counter and FETCH of the value, otherwise there is no guarantee that value1 is consistent with the counter.

I'm also hearing you say that programs that reference more than a single word must use PLO COMPARE and FETCH. In Kenneth's example where he uses PLO to save 64 bit addresses (which is 2 words), he can't use LG to reference the 64 bit address, otherwise he risks using high and low register values that do not match. Is that correct?

Jon Perryman.

>____
> From: Binyamin Dissen
>
>That won't help if you fetch the new count and the old value1.
>
>On Mon, 4 Nov 2013 11:38:38 -0600 Kenneth Wilkerson wrote:
>
>:>Yes, it is possible that the updates are not performed in any order. :>However, it is guaranteed that the updates are only performed if the swap :>can be done. Therefore, I use a simple rule. If the number of instructions :>needed to compute the new chain pointers is small (as is the case in my :>example), I don't incur the overhead of doing the extra 2 PLO (Compare and :>Load) operations. I simply re-drive the operation as shown in Binyamin's :>example. Even with the PLO Compare and Load, there is no guarantee the swap :>will succeed. It just lessens the likelihood.
So >the decision point is :>whether the overhead of 2 additional PLO >instructions is less than the :>overhead of a re-drive. This can only >be determined with testing. You can :>determine this by using a CS to >update a counter for every re-drive. You :>already have an operation >count, so you can then easily determine the :>percentage of re-drives. >In my experience, even in very active chains, the :>PLO serialization >process will incur a very small number of re-drives (much :>less than 1 percent). But only testing can reveal that. >:> >:>-Original Message- >:>From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] >On :>Behalf Of Binyamin Dissen >:>Sent: Monday, November 04, 2013 11:15 AM >:>To: IBM-MAIN@LISTSERV.UA.EDU >:>Subject: Re: Serialization without Enque :> :>My understanding is >with multi-threading it is possible that the updates to :>the fields >may be out of order and thus it is possible to fetch the updated >:>counter with the unupdated value1. PLO serializes it. >:> >:>On Mon, 4 Nov 2013 07:46:51 -0800 Jon Perryman wrote: >:> >:>:>Thanks Binyamin. Also a great example but it brings me to another >:>question. What is the advantage of using PLO compare and fetch? Is it >just :>saving CPU time in the case where the counter has changed? Is >there another :>advantage that I'm not thinking about? >:>:> >:>:>Jon Perryman. >:>:> >:>:> >:>:> >:>:>> >:>:>> From: Binyamin Dissen :>> :>> :>> >:>>If you :>truly need a triple compare and swap then PLO will not help >you. But if :>:>>you need a disjoint double compare and swap, you use >the compare-and-swap :>:>>field as a counter and then you con do a compare swap and double store. 
>:>:>>Example:
>:>:>>
>:>:>>  Fetch counter
>:>:>>A PLO compare-and-fetch value1
>:>:>>  CC>0, go to A
>:>:>>  PLO compare-and-fetch value2
>:>:>>  CC>0, go to A
>:>:>>  calculate new value1 and value2
>:>:>>  Add one to fetched counter
>:>:>>  PLO CSDST fetched-counter new-fetched-counter, new-value1, new-value2
>:>:>>  CC>0, go to A
>
>--
>Binyamin Dissen
>http://www.dissensoftware.com
>
>Director, Dissen
Re: Serialization without Enque
I'm glad you brought that up, because I knew what I have been doing for years was correct but I hadn't taken the time to read the manual on PLO in some time. The order of stores is unpredictable except that, according to the POM, operand 2 (in this case, the count) is always stored last:

"In those cases when a store is performed to the second-operand location and one or more of the fourth-, sixth-, and eighth-operand locations, the store to the second-operand location is always performed last, as observed by other CPUs and by channel programs." Page 7-290, right column, top half of page in SA22-7832-09; 7-281 in SA22-7832-08.

Kenneth

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Binyamin Dissen Sent: Monday, November 04, 2013 2:02 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque

That won't help if you fetch the new count and the old value1.

On Mon, 4 Nov 2013 11:38:38 -0600 Kenneth Wilkerson wrote:

:>Yes, it is possible that the updates are not performed in any order. :>However, it is guaranteed that the updates are only performed if the swap :>can be done. Therefore, I use a simple rule. If the number of instructions :>needed to compute the new chain pointers is small (as is the case in my :>example), I don't incur the overhead of doing the extra 2 PLO (Compare and :>Load) operations. I simply re-drive the operation as shown in Binyamin's :>example. Even with the PLO Compare and Load, there is no guarantee the swap :>will succeed. It just lessens the likelihood. So the decision point is :>whether the overhead of 2 additional PLO instructions is less than the :>overhead of a re-drive. This can only be determined with testing. You can :>determine this by using a CS to update a counter for every re-drive. You :>already have an operation count, so you can then easily determine the :>percentage of re-drives.
In my experience, even in very active chains, the :>PLO serialization process will incur a very small number of re-drives (much :>less than 1 percent). But only testing can reveal that. :> :>-Original Message- :>From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On :>Behalf Of Binyamin Dissen :>Sent: Monday, November 04, 2013 11:15 AM :>To: IBM-MAIN@LISTSERV.UA.EDU :>Subject: Re: Serialization without Enque :> :>My understanding is with multi-threading it is possible that the updates to :>the fields may be out of order and thus it is possible to fetch the updated :>counter with the unupdated value1. PLO serializes it. :> :>On Mon, 4 Nov 2013 07:46:51 -0800 Jon Perryman wrote: :> :>:>Thanks Binyamin. Also a great example but it brings me to another :>question. What is the advantage of using PLO compare and fetch? Is it just :>saving CPU time in the case where the counter has changed? Is there another :>advantage that I'm not thinking about? :>:> :>:>Jon Perryman. :>:> :>:> :>:> :>:>> :>:>> From: Binyamin Dissen :>> :>> :>> :>>If you :>truly need a triple compare and swap then PLO will not help you. But if :>:>>you need a disjoint double compare and swap, you use the compare-and-swap :>:>>field as a counter and then you con do a compare swap and double store. 
:>:>>Example:
:>:>>
:>:>>  Fetch counter
:>:>>A PLO compare-and-fetch value1
:>:>>  CC>0, go to A
:>:>>  PLO compare-and-fetch value2
:>:>>  CC>0, go to A
:>:>>  calculate new value1 and value2
:>:>>  Add one to fetched counter
:>:>>  PLO CSDST fetched-counter new-fetched-counter, new-value1, new-value2
:>:>>  CC>0, go to A

--
Binyamin Dissen
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel

Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies.
Re: Serialization without Enque
Yes, it is possible that the updates are not performed in any order. However, it is guaranteed that the updates are only performed if the swap can be done. Therefore, I use a simple rule: if the number of instructions needed to compute the new chain pointers is small (as is the case in my example), I don't incur the overhead of doing the extra 2 PLO (Compare and Load) operations. I simply re-drive the operation as shown in Binyamin's example. Even with the PLO Compare and Load, there is no guarantee the swap will succeed. It just lessens the likelihood. So the decision point is whether the overhead of 2 additional PLO instructions is less than the overhead of a re-drive. This can only be determined with testing. You can determine this by using a CS to update a counter for every re-drive. You already have an operation count, so you can then easily determine the percentage of re-drives. In my experience, even in very active chains, the PLO serialization process will incur a very small number of re-drives (much less than 1 percent). But only testing can reveal that.

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Binyamin Dissen Sent: Monday, November 04, 2013 11:15 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque

My understanding is with multi-threading it is possible that the updates to the fields may be out of order, and thus it is possible to fetch the updated counter with the unupdated value1. PLO serializes it.

On Mon, 4 Nov 2013 07:46:51 -0800 Jon Perryman wrote:

:>Thanks Binyamin. Also a great example, but it brings me to another question. What is the advantage of using PLO compare and fetch? Is it just saving CPU time in the case where the counter has changed? Is there another advantage that I'm not thinking about?
:>
:>Jon Perryman.

:>> From: Binyamin Dissen
:>>
:>>If you truly need a triple compare and swap then PLO will not help you.
:>>But if you need a disjoint double compare and swap, you use the compare-and-swap field as a counter and then you can do a compare swap and double store.
:>>
:>>Example:
:>>
:>>  Fetch counter
:>>A PLO compare-and-fetch value1
:>>  CC>0, go to A
:>>  PLO compare-and-fetch value2
:>>  CC>0, go to A
:>>  calculate new value1 and value2
:>>  Add one to fetched counter
:>>  PLO CSDST fetched-counter new-fetched-counter, new-value1, new-value2
:>>  CC>0, go to A

--
Binyamin Dissen
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel

Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies.
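Binyamin's recipe transcribes almost line for line into a retry loop. A Python rendering (invented names; the lock models serialization on the shared PLO lock word, and each helper models one PLO function):

```python
import threading

plo_lock = threading.Lock()            # serialization on the shared PLO lock word
state = {'counter': 0, 'value1': 0, 'value2': 0}

def compare_and_fetch(expected, field):
    # models PLO Compare and Load: returns (cc-is-zero, fetched value)
    with plo_lock:
        if state['counter'] == expected:
            return True, state[field]
        return False, None

def csdst(expected, new_counter, v1, v2):
    # models PLO Compare and Swap and Double Store on the counter
    with plo_lock:
        if state['counter'] == expected:
            state['counter'] = new_counter  # the 2nd operand, stored last on hardware
            state['value1'] = v1
            state['value2'] = v2
            return True
        return False

def disjoint_double_update(f1, f2):
    while True:                                     # "CC>0, go to A"
        seen = state['counter']                     # Fetch counter
        ok, v1 = compare_and_fetch(seen, 'value1')  # compare-and-fetch value1
        if not ok:
            continue
        ok, v2 = compare_and_fetch(seen, 'value2')  # compare-and-fetch value2
        if not ok:
            continue
        if csdst(seen, seen + 1, f1(v1), f2(v2)):   # CSDST counter + both values
            return

disjoint_double_update(lambda v: v + 10, lambda v: v + 20)
```

The two compare-and-fetch steps guarantee the computation starts from a consistent (value1, value2) pair; the final CSDST guarantees nothing changed in between.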
Re: Security exposure of zXXP was Re: zIIP simulation
Since an SRB can do a SCHEDIRB, it can do whatever it likes. SRBs were designed for authorized code to overcome restrictions. If you're authorized, the gates open.

Kenneth

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Binyamin Dissen Sent: Monday, November 04, 2013 7:01 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Security exposure of zXXP was Re: zIIP simulation

On Sun, 3 Nov 2013 16:15:56 -0800 Jon Perryman wrote:

:>I think Itschak is saying that SRB's can't do I/O, therefore they can't write files to embed a virus or read confidential data. I think he's under the impression that SRB's can't get access to everything they desire.

SRB's certainly can do I/O - they just need to do it at the metal level.

--
Binyamin Dissen
http://www.dissensoftware.com

Director, Dissen Software, Bar & Grill - Israel

Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies.
Re: Serialization without Enque
I have used PLO almost exclusively for serialization in multi-address space, multi-du code for almost 10 years. I use all 6 operations. Since everything I write is 64 bit mode, I generally use the +2 variant (64 bit length) but I like using the +3 variant (128 bit length) for some really cool stuff. As Rob pointed out, " The key to their use is to have a lock word counter that the caller increments and then prepares the new values in other regs.". But I add that the real key is designing the application to use PLO for serialization which is much more than just writing macros to do the guts of the processes (though I almost always use macros). Consider this example. I have a singly linked chain of 64 bit cell addresses on quad word boundaries. The chain pointers are a quad word at the start of the cell. The first double word is the head of the active chain. The second double word is the head of the free chain. The first quad word of each cell is a double word pointer to the next active entry followed by a double word pointer to the next free entry. The chain has a quad word counter on a quad word boundary. The first double word of the counter is a change count. The second double word is an active element count. The algorithm always adds new cells to the head of the list. I can add a new cell by using a LMG to load the counters, increment each counter by 1, compute the new head, compute the old head's new previous and then use a Compare and swap and double store 128 bit to add the new entry. Since every update increments the first double word counter by 1, the process only completes if no other process updated the counter. If the counter has changed, it needs to re-drive. By adding entries to the head, I can also have code simultaneously searching the chain while entries are added. Of course, if the new head is added before the search starts, it won't be found. But that's no different than using a lock. 
If the search acquires the lock before the add, it won't be found either. I can even add an element that requires a search for one that has already been added. In this case, I load the counters before the search. I search the chain. If not found, I increment the counter and perform the add. If the add fails, I have to re-drive the search. I can also delete entries from the chain. When I find the entry to be deleted, I save its previous entry. I can adjust the counts, re-compute the chain pointers and do a Compare and Swap and triple store to delete the entry and add it to the free chain. I can still search the chain but I'll probably need to do a Compare and load to do so. I can avoid the PLO compare and load by not actually deleting the cell but using the low half byte of the active next pointer as a deleted flag. But that has disadvantages as well. This also adds a little more logic to the add, since I now need to add using the free chain if one exists or an add acquiring a new cell. There are a lot of details not given here for brevity. This example also uses an unordered singly linked list to simplify the example. But properly designed PLO operations can be performed on ordered doubly linked lists as well. When I read the Principles of Operation on the zEC12 transactional execution facility, I think strongly of a PLO on steroids. The point is that PLO can almost be used exclusively for serialization. As far as overhead, I have done a lot of testing and the key is the proper choice of the lock word and the algorithm. In my research, the throughput advantages of PLO far outweigh its overhead. I would love some time with the transactional execution facility. From my reading, it eliminates the need for any serialization other than PLO or transactional execution. Though I understand that IBM has chosen a redrive limit as the determining factor as to whether to fall back to a lock. I believe the only limit to using PLO for serialization is the imagination. 
Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Rob Scott Sent: Monday, November 04, 2013 2:49 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque PLO CSDST and CSTST are *extremely* useful for queue and linked list manipulation in multi-ASID multi-TCB environments. The key to their use is to have a lock word counter that the caller increments and then prepares the new values in other regs. When it comes time to actually atomically update the lock word, you can redrive the structure manipulation logic if the CC indicates that the lock word value has changed, otherwise the other fields are updated atomically. For actual practical uses, it is well worth putting all this inside some sort of macro or small stub service as you do not want to have to code the guts of it each time. I also think the uptake of PLO would be greater if there were some decent example code in the manuals - for instance a client adding a request to the tail of the queue whilst a
Re: Clarification of SAC7 Abend
Since you're using ALESERV EXTRACTH, I'm assuming you want to schedule an SRB into the home address space. IEAMSCHD is expecting the address of the STOKEN. So if you were to do this:

         LA    R2,SRBSTOKEN
         IEAMSCHD EPADDR=SRBRTN@,
               PRIORITY=LOCAL,
               PARM=SRBPARM@,
               ENV=STOKEN,
               TARGETSTOKEN=(R2),      change this line
               KEYVALUE=INVOKERKEY,
               FEATURE=NONE,
               SYNCH=YES,
               LLOCK=NO,
               RMTRADDR=RTMRTN@,
               FRRADDR=FRRRTN@

I'm also assuming the parms marked with an @ are the address of the actual parm. If you want to schedule an SRB into any address space, provided you know the ASCB address, you can acquire the STOKEN as follows:

* R1 has ASCB address of target address space for SRB
         L     R1,ASCBASSB-ASCB(,R1)
         MVC   SRBSTOKEN,ASSBSTKN-ASSB(R1)

Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of esst...@juno.com Sent: Sunday, October 20, 2013 1:40 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Clarification of SAC7 Abend Hi, I need a better understanding of a SAC7 Abend when using IEAMSCHD to schedule an SRB to another program that resides in another Address Space. I issue this macro from my scheduling address space:

         IEAMSCHD EPADDR=SRBRTN@,
               PRIORITY=LOCAL,
               PARM=SRBPARM@,
               ENV=STOKEN,
               TARGETSTOKEN=SRBSTOKEN,
               KEYVALUE=INVOKERKEY,
               FEATURE=NONE,
               SYNCH=YES,
               LLOCK=NO,
               RMTRADDR=RTMRTN@,
               FRRADDR=FRRRTN@

         DS    0D       Alignment
SRBWORK  EQU   *        .SRB Routine Work Areas
SRBRTN@  DS    A        .Address Of Target SRB Routine
RTMRTN@  DS    A        .SRB Recovery Termination Routine
FRRRTN@  DS    A        .SRB Functional Recovery Routine
SRBPARM@ DS    A        .Parameters For SRB Routine
SRBSTOKEN DS   XL8      .Target/Home Address Space Token
SRBRETCODE DS  A        .Return Code From IEAMSCHED
         DS    A        .Reserved
*

I would like to have the above IEAMSCHD schedule an SRB to another Program in another Address Space (the Target Address Space). The Target Address Space previously made available the SRBWORK structure (above) to the Scheduling Address Space: The Address Of SRB Routine SRBRTN@ was Previously loaded by the Target Address Space. 
TARGETSTOKEN SRBSTOKEN obtained by ALESERV EXTRACTH,STOKEN=SRBSTOKEN in the Target Address Space. The Recovery Routines (FMTADDR and FRRADDR) were previously loaded by the Target Address Space and contain the Address Of the Recovery Routines. When I issue the IEAMSCHD macro, an Abend SAC7 occurs without any dump. I looked up ABEND=SAC7 with REASON=00080001 in messages and codes. It refers to an inappropriate Space Token. So did I incorrectly obtain the Space Token Of The Target Address Space? I used ALESERV EXTRACTH,STOKEN=SRBSTOKEN; I thought that was correct. Can anyone point me in the right direction to resolve this. Paul D'Angelo *
Re: FRR Recovery Routine Environment
>Not a big fan of EUT FRRs You're right. I prefer to examine my environment and choose to use an ESTAEX instead of an FRR unless required. The FRR stack is limited to 2 uses. So when forced to use an FRR, I extend the FRR stack by replacing the prior FRR and restoring it upon exit. Using an EUT=YES FRR creates constraints for code that doesn't need them. If you need the FRR, you have the constraints anyway. But if you don't need it and use an EUT=YES FRR, you've just imposed a whole bunch of restrictions. And I never use SVCs in any PC code. >I know that R14 points to the EXIT service call You're right again. This is an oversimplification that I didn't bother to elaborate on. >It’s a real pain when you want to retry into 64 bit addresses However, I support releases prior to 1.13 so I choose a method that works to the lowest common denominator. I retry to a 31 bit stub that loads the 64 bit address and does a BSM. Your point is well taken. I should refrain from expressing opinions in this forum. -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Thursday, October 03, 2013 7:14 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: FRR Recovery Routine Environment >Not a big fan of EUT FRRs. That's a curious statement. As with most things, EUT FRRs solve a problem. If you have the problem, you need them. If you don't have the problem, you don't need them. It seems that it is really not a question of being a fan or not. Maybe you're not a fan of using FRRs in environments where you could use an ESTAEX for such reasons as you want to be able to issue an SVC from your recovery routine or you do not want your recovery to get control with locks obtained by the mainline in such cases. >I know that R14 points to the EXIT service call That is not true for FRRs. 
>It’s a real pain when you want to retry into 64 bit addresses Given that you have gone to the real pain of using RMODE 64 apparently for many cases that are not supported, I'm surprised to see this statement. You can set your 64-bit address with bit 63 on into the register 15 retry slot of the SDWA (SDWAG6415), use SETRP with RETRY=64,RETRY15=YES, and identify a retry address of CVTBSM0F. I'd call that a relatively minor inconvenience; perhaps that's a real pain to you. This has been available since z/OS 1.13 which is the first release where RMODE 64 for enabled code was tolerated. Peter Relson z/OS Core Technology Design
Re: FRR Recovery Routine Environment
I use FRRs a lot because just about every PC routine I write can be called from an SRB. The PC routine defines an ESTAEX when called in task mode and examines the FRR stack and adds or replaces an FRR when called in an SRB routine or when a lock is set. I rarely use EUT=YES FRRs. Not a big fan of EUT FRRs. I know that the documentation states that R2 points to the 24 byte parm area but I've always retrieved it from the SDWAPARM field. In your dump R2 seems to point to the FRR extension. And yes, the parm address is in the PSA (BE looks right for the first entry in the stack). The FRR stack has to be specially managed by MVS and saved and restored each time a DU is dispatched. There are a lot of details missing from your email so I'm just going to give you the working linkage to a well tested FRR routine. First of all, I don't save the registers. I know that R14 points to the EXIT service call so I don't worry about the registers. Second, I always set the first 8 bytes of the 24 byte parm area to an eyecatcher. I always do that last in case something goes wrong before the FRR is fully set up. Believe me, things can go wrong. Lastly, my FRR entry point is just a stub. After dealing with the FRR dependencies, it just jumps into the ESTAEX recovery routine with an entered-from-FRR flag set. The way I have everything set up, the processing of an abend doesn't matter once the starting linkage is addressed. One other thing, most of my routines are RMODE64. Therefore, the FRR exit is loaded into 31 bit storage. So most of my code uses the 64 bit instruction set and a 31 bit SDWA. It’s a real pain when you want to retry into 64 bit addresses but it's well tested.

* FRR ENTRY TO HANDLE RELEASING LOCK.  IT TRANSFERS CONTROL TO ABEND
* RECOVERY ABOVE TO RECORD ERROR
TDFPCLNK_FRR DS 0H
         LARL  R12,TDFPCLNK_ESTAEX     USE ESTAEX ADDRESSABILITY
         USING TDFPCLNK_ESTAEX,R12
         USING SDWA,R1
         LLGT  R8,SDWAPARM             FRR PARM AREA ADDRESS
         LLGTR R7,R0                   SAVE FOR JUMP INTO ABEND RECOVERY
         LLGFR R5,R14                  SAVE FOR JUMP INTO ABEND RECOVERY
         LLGTR R6,R1                   SAVE FOR JUMP INTO ABEND RECOVERY
         CLC   0(8,R8),=CL8'eyecatch'  GOT MY EYECATCHER?
         JE    TDFRECV_GOTPARMS        YES - LETS RECORD
**
* WITHOUT PARM AREA, WE CAN STILL SET ABEND CODES FOR PERCOLATION
         LLGTR R1,R6                   RESET TO RETURN W/FAILURE
         LARL  R2,retry_address_if_no_parms
         SETRP RC=4,REMREC=YES,RETADDR=(R2)
         LLGTR R14,R5                  RESET TO RETURN W/FAILURE
         BR    R14
**
* RESET MY PARM REGISTERS
TDFRECV_GOTPARMS DS 0H
*        Load parms from R8 and jump into ESTAEX RTM exit with
*        called-from-FRR flag set

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Donald Likens Sent: Wednesday, October 02, 2013 3:47 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: FRR Recovery Routine Environment I do not understand what I am doing wrong. I set up the following recovery environment:

         SETFRR A,FRRAD=FRRA,EUT=YES,MODE=FULLXM,WRKREGS=(R1,R2),
               PARMAD=(R3)
         ST    R13,0(R3)

I then caused my program to abend. When my recovery routine is entered I ABEND with an S0C4 because R2 is not as I expected. The following is the start of my FRR recovery routine:

*C START FRR RECOVERY ROUTINE
FRR      DS    0H
* ON ENTRY
*   R15 ADDRESS OF THIS ROUTINE
*   R14 RETURN ADDRESS
*   R1  ADDRESS OF SDWA
*   R2  ADDRESS OF PARAMETERS
         STM   R0,R15,0(R13)
         USING FRR,R15
         L     R3,0(R2)
         USING WKGSTG,R3
         LR    R12,R15
         DROP  R15
         USING FRR,R12

I thought perhaps R2 would be the address of the parameter list but the
Re: Memory For MSTJCL00 - Whose Is It?
If I were diagnosing this problem, I would take a console dump of ASID 1. If a resource manager is the culprit and it uses an eye catcher, I would expect to see a bunch of storage with that eye catcher. I was simply suggesting that an ASCB resource manager is a good possibility. Resource managers can be dynamically defined by the RESMGR service or statically defined at IPL. Chapter 18 in the MVS Programming: Authorized Assembler Services Guide has a section on resource managers including those statically defined at IPL time. I would look at the statically defined resource managers first, particularly any that are defined to execute after the termination of all address spaces. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Martin Packer Sent: Monday, September 16, 2013 8:46 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Memory For MSTJCL00 - Whose Is It? Thank you. Care to name a few - not to point the finger, but so I have some idea what they are? Cheers, Martin Martin Packer, zChampion, Principal Systems Investigator, Worldwide Banking Center of Excellence, IBM +44-7802-245-584 email: martin_pac...@uk.ibm.com Twitter / Facebook IDs: MartinPacker Blog: https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker From: Kenneth Wilkerson To: IBM-MAIN@listserv.ua.edu, Date: 09/16/2013 02:38 PM Subject:Re: Memory For MSTJCL00 - Whose Is It? Sent by:IBM Mainframe Discussion List Address space resource managers execute in asid 1, *MASTER*. Unless they issue a message, you would never know they executed. If an ASCB RESMGR were not cleaning up after itself, it would account for accumulations. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Staller, Allan Sent: Monday, September 16, 2013 8:01 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Memory For MSTJCL00 - Whose Is It? General thoughts with no hard data behind them. I.E. 
SWAGs 1) MSTJCL00 (i.e. *MASTER*) has been flagged by WLM as Storage Critical. Check w/WLM development. 2) Turn on the VSM* parameters in SYS1.PARMLIB(DIAG*) for data gathering. View w/IPCS and/or RMF. Not sure what (if any) RSM* parameters are available. I believe your theory about MSTJCL00 being used as an anchor is reasonable, however, 2 GB of anchors seems excessive, even in a very large system. I do not believe this is a backing for anything that does not belong to a "system address space". FWIW, Almost a year ago in https://www.ibm.com/developerworks/community/blogs/MartinPacker/entry/bad_da ta_and_the_subjunctive_mood?lang=en I talked about seeing large numbers for memory in MSTJCL00. At the time I got no takers as to what it could be. So I'm trying again here... ... Is MSTJCL00 the anchor for something? Common Large Memory Objects perhaps? And are YOU seeing large memory numbers? Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Re: Memory For MSTJCL00 - Whose Is It?
Address space resource managers execute in asid 1, *MASTER*. Unless they issue a message, you would never know they executed. If an ASCB RESMGR were not cleaning up after itself, it would account for accumulations. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Staller, Allan Sent: Monday, September 16, 2013 8:01 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Memory For MSTJCL00 - Whose Is It? General thoughts with no hard data behind them. I.E. SWAGs 1) MSTJCL00 (i.e. *MASTER*) has been flagged by WLM as Storage Critical. Check w/WLM development. 2) Turn on the VSM* parameters in SYS1.PARMLIB(DIAG*) for data gathering. View w/IPCS and/or RMF. Not sure what (if any) RSM* parameters are available. I believe your theory about MSTJCL00 being used as an anchor is reasonable, however, 2 GB of anchors seems excessive, even in a very large system. I do not believe this is a backing for anything that does not belong to a "system address space". FWIW, Almost a year ago in https://www.ibm.com/developerworks/community/blogs/MartinPacker/entry/bad_da ta_and_the_subjunctive_mood?lang=en I talked about seeing large numbers for memory in MSTJCL00. At the time I got no takers as to what it could be. So I'm trying again here... ... Is MSTJCL00 the anchor for something? Common Large Memory Objects perhaps? And are YOU seeing large memory numbers?
Re: "NSA foils much internet encryption"
-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John Gilmore Sent: Thursday, September 05, 2013 2:43 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: "NSA foils much internet encryption" More Snowden documents have been reviewed by the New York Times, which this afternoon concluded that The agency has circumvented or cracked much of the encryption, or digital scrambling, that guards global commerce and banking systems, protects sensitive data like trade secrets and medical records, and automatically secures the e-mails, Web searches, Internet chats and phone calls of Americans and others around the world, the documents show. This is not very different from the standard informed conjectures about what the NSA and its counterparts elsewhere can do. It is important that the readers of airline magazines disabuse themselves of the notion that they can keep secrets from these agencies using off-the-shelf technology. John Gilmore, Ashland, MA 01721 - USA
Re: Questions about ESTAE(X)
I rarely use TERM=YES. I use RTM exits almost exclusively for error reporting and setting a retry. About the only time TERM=YES is used is in the primary driver task for a cross memory server so that its RTM exit can reset the PC-services-available flag to minimize D6 abends. But I don't even rely on that. I just code the calling interfaces to treat and report D6 abends as unexpected server terminations. I rely on the RTM to clean up address space and task level resources. The real issue is common resources that are shared system wide or between 2 or more address spaces. For this, I prefer resource managers, particularly address space resource managers. And I know it's authorized and probably only necessary for cross memory servers which by definition must be authorized. This may seem off topic but the topic is about cleaning up after a termination event (TERM=YES) such as a CANCEL. Address space resource managers are guaranteed to execute, even if an address space is forced. Consider an address space that has terminated because it has exhausted its memory. If you have anything but the simplest tasks to perform, there may not be storage to do cleanup. My experience is that in serious error conditions, the RTM exit may not be the best way to clean up common resources. The only advice I have is that if you define an address space resource manager, be sure you have a timer exit to time it out should it go into a loop. This probably will never get used, but a loop in an address space resource manager, which runs in the master address space, is a non-trivial problem. Do with this information as you wish. But if you are considering TERM=YES for anything but the simplest resource cleanup, consider a resource manager. I rarely use TERM=YES. I prefer resource managers. And I only use resource managers to clean up common resources. Most of the stuff I write uses neither. 
Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Wednesday, August 28, 2013 8:21 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Questions about ESTAE(X) A recovery routine cannot change SDWACLUP (or most of the fields in the SDWA) and have such a change be useful to anything. If you are intended to change it, usually SETRP will let you do so, or it's a field relevant to retry or it's one of the "communication" fields. SDWACLUP is on not only for TERM=YES but also for all other retryable abends. >if I don't issue any ESTAE(X), then >*something* gets control on "normal" ABENDs That's called RTM, regardless of the type of abend. >Am I correct that you apparently can't issue an ABEND macro >(effectively) in >a recovery routine? I would say "no" but it depends what you mean by effectively. Once termination begins (think cancel) an ESTAE(X) without TERM=YES will not get control, but an ESTAE(X) with TERM=YES will. I think that applies to nested recovery too (a nested recovery routine is a recovery routine set within the ESTAE(X) routine itself). For TERM=YES, normal rules of nested recovery, percolation, and even retry apply (a nested recovery routine can retry back to the recovery routine that created it; it cannot retry back to the mainline). >if an ESTAE(X) TERM=YES is chained after an ESTAE(X) TERM=NO, is there >any way to get the chained recovery routine to percolate a >TERM=YES-type ABEND? I must not be understanding. The "chained recovery routine" in the sentence above appears to be the TERM=YES routine. It can of course "percolate" a "TERM=YES-type ABEND" (in fact it has no choice but to percolate). But when it percolates, the TERM=NO routine will not get control, specifically because it is TERM=NO and this was a "TERM=YES-type ABEND". So overall, I really don't know what is confusing. 
The basic point is "if you have nothing to clean up if the job is going to terminate due to the error, then you usually do not need TERM=YES". For example, if you might ordinarily freemain something, but if the system will do so upon job termination (as it will, in effect, do for region subpools) then you might choose not to worry about getting control in recovery for that termination case to do the freemain. Peter Relson z/OS Core Technology Design
Re: Hints needed on abend 0D6-027
Program call (PC) is a z/Architecture feature that has a z/OS server, PCAUTH (ASID 2), to administer it. The ETCON, ETDES, ETDIS, and ETDEF macros are the primary interfaces into that server. The first PC numbers defined in the system during IPL are in the range of 0 to x. Chapter 5, Program Execution, Subchapter PC Number Translation in Principles of Operation has an excellent description of how PC numbers are defined. And you are right, PCAUTH maintains these system level control blocks (actually hardware defined) in the PCAUTH 31 bit LSQA (they have to be in fixed real storage). The hardware provides the ability to disable any lookup in these tables by setting the high order bit to 1 in any of these tables. When an address space terminates, or it issues an ETDIS or ETDES for the LX, the PCAUTH server is called and it disables any references to the LX. This results in an LSX-translation exception, X'0027', which is translated to a 0D6-027. This is a very simplified explanation of what is in the POM. So why are you getting a 0D6-027? Because the PC-defining address space has terminated or disconnected the LX. When the PC-defining address space is terminating, or just before issuing its LX disconnect, it could indicate so by setting some bit in a commonly addressable control block. This bit could then be turned off when it reinitializes and reconnects the LX. This greatly reduces the likelihood of the 0D6-027 but it is impossible to eliminate it entirely (though the probability is astronomically small). No matter how close to the call that you place the test, the caller could get suspended and the address space could terminate, still resulting in a 0D6-027. So the code should also recognize a 0D6-027 as an indication that the server has terminated. 
-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Robin Atwood Sent: Friday, August 23, 2013 6:47 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Hints needed on abend 0D6-027 Our application very occasionally (once every few months) abends 0D6-027, which means a PC instruction has caused a "Linkage second index translation exception". I am wondering exactly what this is telling me since the auxiliary ASID being PCed to had been active for some time and had processed several requests, ie, the PC had been working perfectly well up until the abend. So what has gone bad? My understanding is that the PC linkage information is kept in system control blocks so that should be OK. The PC number we are calling is X'00017F00' which implies an ET index of X'F00'; is that sane? Any hints gratefully received! Robin
Re: ECTG usage
The management of the CPU timer is completely in the realm of the dispatcher/scheduler. Therefore, using ECTG when you're not in a disabled state during the entire timing process will not produce the results you want. I have always used TIMEUSED to get CPU time. It's been many years since I've had a need for TIMEUSED and it has certainly changed. It appears ECTG was written to improve its performance. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Richard Verville Sent: Thursday, July 25, 2013 8:47 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: ECTG usage I'm trying to benchmark CPU time (under CICS) with pieces of code I'm changing, ECTG before and ECTG after. I'm zeroing out operand 1 before the ECTG, thus I get a negative value in GPR0 (because ECTG subtracts operand 1 from the timer value); afterwards I'm doing a LCR of GPR0 to get the positive timer value. If the cputimer went negative during the test (timer interrupt), the second ECTG is higher than the 1st one and since I don't know the "refeed" value of the CPU timer, I can't tell how much CPU time was spent. I know I could use CICS internal values (or statistics) but since they made ECTG non-privileged I figured I'd give it a try. So... I'm missing something in the concept (the refeed value and how many times the interrupt occurred?) Richard
Re: Is there a "reverse bits" hardware instruction?
The macro at the end of my reply will generate a reverse translate table. I tested it enough to see that it looked right but it has nothing to do with my original point. I've found this discussion interesting and it has given me reason to play with something other than the complex process I'm working on now. My original point, as was taken by Charles, was that I prefer automatic processes to generate anything but the simplest translate tables. I prefer to do it in a program and copy it from a dump into a program because the table is static. I don't need to generate it every time I assemble. Regarding TROO, the original problem was stated as translating a 64 bit register. A STG, TR for 8 bytes, followed by a LRVG will more than suffice. Though I find your argument about the use of a TRTRE to simulate a FROGR interesting. It brings the whole bit reversal into perspective. I'm not a big fan of translate instructions (I use them often enough), particularly those that require facilities and facility enhancements, unless you're translating "long" strings and are willing to test the facility bits. I'm sure that the translation and parsing facilities exist on most customers' boxes by now but I emphasize "most". When I awakened this morning, I wrote an algorithm to do a load reverse bytes and bits using FLOGR to drive the process. I'm going to give the idea of a FROGR simulation more thought and continue this exercise later. 
         MACRO
&LABEL   REVTABLE ,            Construct reverse bits translate table
         LCLA  &I,&J,&K,&L,&M,&N,&O
         LCLC  &X
&LABEL   DS    0D              LIKE 'EM DOUBLE WORD ALIGNED
&I       SETA  0               STARTING VALUE
.TABLOOP ANOP                  LOOP UNTIL TABLE IS DONE
&K       SETA  1               NEED SIXTEEN ENTRIES PER LINE
&X       SETC  'AL1('
         AGO   .X16LP
.X16NXT  ANOP
&X       SETC  '&X'.'&J'.','
.X16LP   ANOP                  16 ENTRY LOOP
&J       SETA  0               STARTING RESULT
&L       SETA  1               STARTING ADDEND
&M       SETA  1               8 BITS PER BYTE
&N       SETA  128             STARTING COMPARAND X'80'
&O       SETA  &I              COPY CURRENT BYTE TO REVERSE
.BYTELP  ANOP
         AIF   (&O LT &N).BYTEFT  LESS THAN CURRENT - 0
&O       SETA  &O-&N
&J       SETA  &J+&L
.BYTEFT  ANOP
&L       SETA  &L*2            NEXT ADDEND
&N       SETA  &N/2            NEXT COMPARAND
&M       SETA  &M+1            NEED EIGHT BITS
         AIF   (&M LE 8).BYTELP
&I       SETA  &I+1
&K       SETA  &K+1
         AIF   (&K LE 16).X16NXT
&X       SETC  '&X'.'&J'.')'
         DC    &X
         AIF   (&I LT 256).TABLOOP
         MEND

-Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John Gilmore Sent: Wednesday, July 24, 2013 10:29 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a "reverse bits" hardware instruction? The construction of arbitrary translation tables can be error-prone, and when it is it is better done procedurally. I use the HLASM macro language, which is entirely adequate to such tasks; but to each his own. Here, however, we have for a TROO only the 256 permutations taken two at a time of the sixteen hexadecimal digits, viz., 0==>0, 1==>8, 2==>4, 3==>c, 4==>2, 5==>a, 6==>6, 7==>e, 8==>1, 9==>9, a==>5, b==>d, c==>3, d==>b, e==>7, f==>f and they can be enumerated readily, by program or manually (and certainly without resort to cut-and-paste from a dump). Symmetries can also be exploited, and the whole thing can be arithmetized, but to do either would put mathematics dropouts at a disadvantage. The problem CAN be addressed with left circular shifts/left rotations, but they must be nested (and iterated for long bit strings). The TROO turns out to be faster, particularly for those long bit strings. 
The problem of bit-string reversal has its own interest, but if its purpose is in effect to simulate a FROGR using a FLOGR, then other approaches are possible. Specifically, a TRTRE, Translate and Test Reverse Extended, can be used. It proceeds from right to left, high to low storage address, in a byte string only until it finds a non-zero value in its table that corresponds to the current byte's rank. Permutations taken two at a time of the hexadecimal digit-codes:

0000==>0000, 0
0001==>0001, 1
0010==>0010, 2
0011==>0001, 1
0100==>0011, 3
0101==>0001, 1
0110==>0010, 2
0111==>0001, 1
1000==>0100, 4
1001==>0001, 1
1010==>0010, 2
1011==>0001, 1
1100==>0011, 3
1101==>0001, 1
1110==>0010, 2
1111==>0001, 1

in which zero indicates the absence of a one bit and a non-zero value indicates both the presence of a one bit and its one-origin offset from the rightmost bit position. The permutations/code points x'N0', N = 1, 2, . . . , f need 'special' treatment: 8 must be added to the values shown above for them. Unsurprisingly, this turns out to be faster than reversal followed by FLOGR. John Gilmore, Ashland, MA 01721 - USA -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INF
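The values in that enumeration are the one-origin offset of the rightmost one bit of each 4-bit pattern; a short Python sketch (my function name, not anything from the post) regenerates the table:

```python
def rightmost_one(bits):
    """One-origin offset of the rightmost one bit; 0 means no one bit."""
    # bits & -bits isolates the lowest set bit; its bit length is the offset.
    return (bits & -bits).bit_length()

# Regenerate the sixteen 4-bit entries enumerated above.
expected = [0, 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1]
assert [rightmost_one(n) for n in range(16)] == expected
```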
Re: Is there a "reverse bits" hardware instruction?
I can't imagine any instruction sequence in any language performing a "Load Reversed with Mirrored Bytes" more efficiently in the Z/Architecture than a STG, a TR for eight bytes and an LRVG. Even though the TR is probably micro-coded (I don't know about the LRVG), I can't see how any loop that shifts and manipulates the data and repeats up to 63 times (assuming a very dense register) could outperform this. I wrote an algorithm using a FLOGR, but except in the best cases (all 0s or many leading 0s), I can't imagine it running faster. And with negative numbers (-1 being the worst case), you would probably want to exclusive-or with foxes before and after the operation to make the value more sparse. However, in your initial post you talked about the above sequence involving the TR being complex. I assume you're talking about the translate table itself. When I need translate tables that are not "simple" and are particularly error prone, I write a program to create them. I would quadword-align the origin and result tables, do the tests and sets (in this case X'80' to X'01', ... X'01' to X'80'), load the address of the result table in a register, and DC H'0' to get an 0C1. I would set a SLIP and run the job. I could then format the dump and cut and paste (with a little manipulation) the table into an assembler source. In this case, if the first and last 16 bytes of the table are correct, then it's probably 100% correct. I find the half hour I spend doing this for "error prone" translate tables can save me hours of debugging later. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Charles Mills Sent: Wednesday, July 24, 2013 7:31 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a "reverse bits" hardware instruction? Thanks all. You're right, "just how fast DOES this code need to be?" And the answer is I should know, but I don't. I don't want to waste the customer's cycles. 
I am smart enough to know that I am too dumb to know how fast it needs to be. The right answer lies in profiling, and some other task has always been just a little higher priority than profiling. Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I hated it. I am very weak on the higher math relevant to programming. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley Sent: Wednesday, July 24, 2013 8:17 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a "reverse bits" hardware instruction? How fast does this code need to be? David's ffs64 looked pretty good to my inexpert eye; I think you would have to be running it very frequently for something to be measurably faster. There are some similar discussions here, including some branchless techniques that probably would be faster (not necessarily detectably): http://stackoverflow.com/questions/757059/position-of-least-significant-bit-that-is-set One answer also talks about clearing the lowest set bit. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
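For reference, the branchless De Bruijn technique mentioned above looks like this in Python; this is a sketch of the widely published 32-bit version (constants from the standard De Bruijn lookup), not David's ffs64:

```python
DEBRUIJN = 0x077CB531  # a 32-bit De Bruijn sequence
INDEX = [0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
         31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9]

def lsb_position(v):
    """Zero-origin position of the least significant set bit (v != 0)."""
    isolated = v & -v                       # clear all but the lowest one bit
    # Multiplying a power of two by the De Bruijn constant shifts a unique
    # 5-bit pattern into the top bits; the table decodes it to a position.
    return INDEX[((isolated * DEBRUIJN) & 0xFFFFFFFF) >> 27]

assert [lsb_position(1 << k) for k in range(32)] == list(range(32))
```

No branches, no loop: one AND, one multiply, one shift and one table lookup, which is why it shows up in those Stack Overflow answers.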
Re: Dynamic LPA Services
I relocate all non-PC code into 31 bit storage. To the code being called, it appears as if it's RMODE31. I do call PC routines above the bar, but it would be trivial to relocate them as well if it became necessary. I trust the Z/Architecture to handle the PC linkage. -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Walt Farrell Sent: Thursday, July 11, 2013 9:06 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Dynamic LPA Services On Thu, 11 Jul 2013 09:03:33 -0400, John Gilmore wrote: >As I read Kenneth Wilkerson's post he is arranging things so that an >RMODE(64) routine that needs system services unavailable to it as such >arranges to have a different, associated RMODE(31) routine request them >and make their results available to it. As I read it, that's not what Kenneth said, John. He said that he has an RMODE(31) stub that he uses as the address -of- the PC routine, and the stub then invokes the actual PC code that is RMODE(64). He specifically talked about using the system services from his RMODE(64) code, and if that's true then it's unsupported, as Peter mentioned. > >This scheme [or the alternative one in which an RMODE(31) routine hands >off functions to, or accesses data in, an RMODE(64) one] is entirely >viable and much used in IBM code. Are you perhaps confusing AMODE and RMODE, John? As far as I know, IBM does not make much use of RMODE(64) code. I believe the capability of RMODE(64) code was provided for DB2's use, and I suspect only DB2 is using it for much (though I have no real way of knowing, any more.) It is true that RMODE(31) routines are used often to handle AMODE(64) callers, of course. >In my own programming I >now often use AMODE(64) code in RMODE(31) routines to facilitate just >such exchanges. Right: AMODE(64) in RMODE(31) is just fine. But RMODE(64) code is rare, I believe, and has the restrictions that Peter mentioned. 
RMODE(64) support for code is documented to be for code that does not call system services. While system services may be documented to allow AMODE(64) callers, that does not mean that they will properly handle RMODE(64) callers. I presume IBM knows (or suspects) that some issues exist if RMODE(64) code were to call system services, or they would not have made that restriction. But I suppose it is also possible that they are simply being cautious and avoiding a heavy testing and warranty expense by stating that restriction. -- Walt -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Dynamic LPA Services
I'm basing everything I'm doing on the Principles of Operations manual (POM). The Z/Architecture is the final authority for any program, even MVS. The linkage for PC instructions is handled by the linkage stack, which fully supports the 128-bit PSW and 64-bit registers. If it didn't, nothing that I'm doing would work. I've had STORAGE abends (B78, etc.) and they get communicated to my RTM exit, and the exit handles the 64-bit retry by retrying into a 31-bit stub that BSMs to the actual retry. It's important that I mention that even in PC calls, I ensure that all parameters are below the bar. And if a problem ever did occur, I would simply start relocating PC calls below the bar as well. This methodology works for BLDL, so I can't see why it wouldn't work for STORAGE. I've been running RMODE64 since shortly after 1.13 became available without any incident related to RMODE64 other than program bugs. All of this probably works because the RMODE64 programs define a 31-bit RTM exit and 31-bit retries. I understand your objection to LOAD with ADDR=. The fact that the code is not identified can be problematic. But the nucleus solves this problem by using load tables (CVT, SVT, SFT, etc.) and provides a service, NUCLKUP. I have simply adopted that methodology: a load table with a RESOLVE command that is also PC intelligent. It's not difficult to extract the LTOR and resolve the entry tables. The POM describes the architecture very thoroughly. I actually don't use ADDR64=. During server initialization, I have to load the code to verify whether or not the code has changed. Since all of the code is self-relocating (no ESDs or RLDs), if it has changed, I simply MVCL it into the common memory object. At the end of initialization, I also DAT-protect the code area so that essentially, except for the fact that it can't be identified, it's just like an LPA above the bar. If there were a way to identify RMODE64 code, I would use it. 
Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Thursday, July 11, 2013 6:28 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Dynamic LPA Services RMODE64: Perhaps you are not aware that z/OS provides support for RMODE 64 routines only when they call no system services. If that was your case, then great. But apparently it isn't, since you mentioned STORAGE and ESTAEX and CSVQUERY, which certainly do not document that they support RMODE 64 invocation. You are taking a risk. Is it worth it? Presumably you are using LOAD with ADDR64 rather than LOAD with ADDR. Perhaps I misread your original post, but I thought it said LOAD with ADDR. I still fully stand by the statement that LOAD with ADDR= to common storage should not be used for programs any longer. LOAD with ADDR64, it is true, has no dynamic LPA equivalent, so to the extent you have a routine that properly qualifies, there can be benefit. Peter Relson z/OS Core Technology Design -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Dynamic LPA Services
>since most of the stuff I write is RMODE64. >Really? Perhaps you meant AMODE 64. But I'm not sure what that has to do with PC routines. And it has a lot to do with PC routines, since the LPA is 24/31-bit storage. If you want to exploit RMODE64, you can't currently do that in the LPA. PC routines can be called from any AMODE, any RMODE and any environment (the new transaction environment excluded) except SACF 256 and 768. That is why PC routines are important. More than two-thirds of the server (approximately 500K of code) executes in a common memory object. The code that does not is mostly involved in SVC screening, runs as IRBs, runs as tasks in the server, or consists of specialized services that make a lot of non-PC MVS service calls. Since most of the server, including the API, is designed to run in cross-memory mode, there are no SVC calls. Since PC calls such as STORAGE, ESTAEX and CSVQUERY (which are all the services normally used by the server) are RMODE64 capable, this presents no problem. Some of the API services branch-call MVS services and even do I/O. Since all the code is ARCHLVL=2 and is self-relocating, I copy the guts (macro expansions) of these calls into a 31-bit work area enclosed in a BSM back to the RMODE64 code. My RTM exit recognizes abends in these relocated copies and reports them accordingly. The two big issues were RTM exits and getting PCAUTH to accept my 64-bit addresses. I got around these issues through a concept I call surrogation. I create a 31-bit stub program that contains the entry points to all the PC calls and their RTM exit (they all share the same RTM ESTAEX or FRR exits). This code handles the redirection into the memory object. I could not find methods provided by MVS to do these things (I did not spend a whole lot of time searching). So I designed and wrote my own methods. 
I designed the server to run in RMODE64 from day 1 so when 1.13 was released, I was able (through a few macros and the surrogate program) to get many of the server programs above the bar in a single day. With time, I've moved most of the server including much of the UI above the bar. I even execute ISPF calls above the bar by replacing the CALL macro with a self-written macro. Again, my point is that I don't believe in designing servers to the lowest common denominator provided you are willing to write the code to fill in the gaps. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Tuesday, July 09, 2013 6:42 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Dynamic LPA Services >You know who owns it because its defined as a PC and therefore has an >entry table assigned to it. I suspect that every diagnostician in the world disagrees with you about using LOAD with ADDR=. It is true of course that you could navigate from the PC number to the entry table to the target address for the PC. But then you want to know what module is at that target address. Having a "name" that has been provided by the module owner (presumably one that follows the module owner's naming conventions) makes that easiest. The same is true if you blow up at address "x" and want to find out in what module "x" is. Using dynamic LPA for things in common makes that easier. And has no significant downside. >since most of the stuff I write is RMODE64. Really? Perhaps you meant AMODE 64. But I'm not sure what that has to do with PC routines. >MVS is going to treat it as authorized simply because it's in the LPA. That means that it is accepted as the target of a LINK, LOAD, (etc) from an authorized requestor. It does not mean that it will get control in an authorized state from EXEC PGM=. That requires AC=1. >To say that you can't ever free a PC routine is untrue. 
Almost any >space switching PC will terminate as soon as the server that defined it >terminates. I carelessly omitted, but the thread had already established, that we were talking about non-space-switch PC's. >Certainly, any PC routine that is defined as non-space switching >system PC routine that can be called without any provided interface >probably cannot be freed. The only such "interface" that I can think of is one that increments a counter (or sets a flag if that suffices) before issuing the PC and freeing of the area is not allowed if the counter is non-0. Such counters/flags are notoriously problematic due to memterm considerations. >However, a new copy can be loaded and >redefined which is why I like reusable LXs. Everyone should like reusable LX's. But you still do have to get rid of the old one first so there's a window when neither is available. >In my book, PC routines are the only way to fly. I don't think anyone is disagreeing with you. I was only pointing out that LOAD with ADDR= is not the way to go. Peter Relson z/OS Core Technology Design -- For IBM-MAIN subscribe / signoff / archive acc
Re: Dynamic LPA Services
A D6-22 is a linkage exception, meaning the LX is not connected to the address space issuing the PC. For a system LX, this means the LX has not been connected by an ETCON, the LX has been disconnected by an ETDIS or ETDES, or the address space that connected the LX has terminated. For a non-system LX, it could mean the address space issuing the PC has not issued the ETCON to connect the non-system LX. If you do a SLIP COMP=0D6, you can use the IPCS Status display to list all the linkage tables in ascending LX order. Then you can verify visually whether the LX is connected or not. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Binyamin Dissen Sent: Tuesday, July 09, 2013 3:29 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Dynamic LPA Services You should be aware that ETDEF does not set a return code. It does inline instructions to build a single entry. The ETCRE/ETCON are the ones that make something happen. On Mon, 8 Jul 2013 22:28:59 GMT "esst...@juno.com" wrote: 
:>As the original Poster, I thank everyone for their input. The various information provided has been excellent. Thank You.
:>I am still getting the 0D6-022 Abend, and I am not understanding why.
:>So let me level set everyone. I am on a z/OS 1.4 system.
:>I do not have LX Reuse on this system; I don't think that has anything to do with this issue.
:>I use the CVT to determine LX Reuse:
:>         USING CVTMAP,R15          .INFORM ASSEMBLER
:>         L     R15,X'10'           .ADDRESS OF CVT
:>         TM    CVTFLAG2,CVTALR     .LX Reuse Available    01404522
:>         BNO   NO_LX_REUSE         .NO EXISTENCE          01404622
:>I would like to understand the use of CR0 to determine this, if someone would post the code.
:>I am aware of obtaining storage in common and loading a routine into key 0 SP 241 or similar; I'm trying to gain a new skill by using LPA Dynamic Services.
:>In a separate job I dynamically add a module to LPA using CSVDYLPA. 
:>Then I start an address space and use CSVQUERY to obtain the entry point address of the module I added to LPA.
:>The entry point address returned from CSVQUERY is then used in an ETDEF SET macro that describes a non-space-switching PC routine:
:>SET_ETD1 DS    0H                                            03340004
:>         ETDEF TYPE=SET,ETEADR=ETD1,ROUTINE=(2),RAMODE=31,  X03350004
:>               STATE=SUPERVISOR,PC=STACKING,SSWITCH=NO,     X03360004
:>               SASN=OLD,ASCMODE=PRIMARY,                    X03370004
:>               EK=8,PKM=OR,                                 X03380004
:>               AKM=(8,9),EKM=(8)                             03390004
:>*                                                            0344
:>         ST    R15,XMSRESP        Save Response Code         03410004
:>         BRAS  R14,CHKRESP        Go Check Response Code In Reg-15
:>The address space has not terminated.
:>Now I want to submit a job which invokes this routine via a PC instruction. The PC number is D601, where D6 is the LX and 01 is the second entry in the entry table. However, when I issue the PC instruction I get an 0D6 Abend...
:>Thank You Again for all Your comments.
:>-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
-- Binyamin Dissen http://www.dissensoftware.com Director, Dissen Software, Bar & Grill - Israel Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Dynamic LPA Services
>The point is that SLIP LPAMOD= and the IPCS WHERE subcommand >will not be able to identify your module by name. So when someone needs to refer to your module on a SLIP command, they will need to manually determine the address of your module in order to use ADDRESS= (and the >address could change every time your product starts, and is likely to be different on each member of a sysplex). >As a z/OS diagnosis expert, I view that as a serviceability issue. Since a server address space is required to define the PCs, the server provides an operator command such as RESOLVE. The customer issues RESOLVE,pgmname+disp and the system returns the address and the instruction at that address for verification. The address can then be cut and pasted into a SLIP command. I have my own WHERE facility that is PC intelligent. My point is that you don't have to design servers to the lowest common denominator as long as you are willing to provide services to fill in the gap. The diagnostic capabilities that the server I write provides are much greater than what currently exists. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jim Mulder Sent: Tuesday, July 09, 2013 12:16 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Dynamic LPA Services > You know who owns it because it's defined as a PC and therefore has an entry > table assigned to it. Looking in the entry tables for a program is > just as > common a practice as looking for "identified" programs. So finding PC > routines just requires different methods. Besides, if this is a > stacking PC > which is the only type I use, the linkage stack has everything needed > to associate the call with the PC routine including the PC number. The point is that SLIP LPAMOD= and the IPCS WHERE subcommand will not be able to identify your module by name. 
So when someone needs to refer to your module on a SLIP command, they will need to manually determine the address of your module in order to use ADDRESS= (and the address could change every time your product starts, and is likely to be different on each member of a sysplex). As a z/OS diagnosis expert, I view that as a serviceability issue. > To say that you can't ever free a PC routine is untrue. Almost any > space switching PC will terminate as soon as the server that defined > it terminates. So these can be released and refreshed as needed every > time the > server recycles. Certainly, there are many PC routines that can't be freed. > But if a PC routine is designed to be called as part of an API or UI, then > API/UI recovery can easily recover the error and report it as the > server terminating. Certainly, any PC routine that is defined as > non-space switching system PC routine that can be called without any > provided interface probably cannot be freed. However, a new copy can > be loaded and > redefined which is why I like reusable LXs. Consider the case where the storage formerly occupied by the freed PC routine has been reassigned by VSM, and now contains data that happens to look like a valid instruction stream. So now your "PC routine" is executing unintended code, with the authority of the user and your PC routine. What will cause your API/UI recovery to get control, and if it does get control, how will it "easily recover the error"? How will it detect and repair any damage which occurred due to the execution of the unintended instructions? Jim Mulder z/OS System Test IBM Corp. Poughkeepsie, NY -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Dynamic LPA Services
Control Register 0 bit 44 contains the system setting for LXRES, as defined in the POM in the chapter on control. I'm a Z/Architecture guy and I usually go to the architecture for settings instead of z/OS. I'm also pretty sure LX Reuse did not exist in 1.4, though I may be wrong. It was added sometime in the 1.4 to 1.6 time frame. In your description, I don't see any reference to LXRES, ETCRE or ETCON. ETDEF only creates the entry table needed to define PCs. LXRES reserves an LX (you probably need a system one) and returns a token. I assume that the D6 LX was acquired via an LXRES and it's a system LX. ETCRE creates a working copy of your entry tables in the PCAUTH address space and also returns a token. ETCON connects your entries, via the LXRES and ETCRE tokens, to the address spaces that are allowed access to your PC routines. For system LXs, that's every address space. The PC numbers start at LX00 and go to LX##, where 00 is assigned to the first entry in your entry table and ## is the last entry, up to 255. If you define space-switch PCs, more setup is required. The Extended Addressability manual goes into all this. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of esst...@juno.com Sent: Monday, July 08, 2013 5:29 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Dynamic LPA Services As the original Poster, I thank everyone for their input. The various information provided has been excellent. Thank You. I am still getting the 0D6-022 Abend, and I am not understanding why. So let me level set everyone. I am on a z/OS 1.4 system. I do not have LX Reuse on this system; I don't think that has anything to do with this issue. I use the CVT to determine LX Reuse:
         USING CVTMAP,R15          .INFORM ASSEMBLER
         L     R15,X'10'           .ADDRESS OF CVT
         TM    CVTFLAG2,CVTALR     .LX Reuse Available    01404522
         BNO   NO_LX_REUSE         .NO EXISTENCE          01404622
I would like to understand the use of CR0 to determine this, if someone would post the code. 
I am aware of obtaining storage in common and loading a routine into key 0 SP 241 or similar; I'm trying to gain a new skill by using LPA Dynamic Services. In a separate job I dynamically add a module to LPA using CSVDYLPA. Then I start an address space and use CSVQUERY to obtain the entry point address of the module I added to LPA. The entry point address returned from CSVQUERY is then used in an ETDEF SET macro that describes a non-space-switching PC routine:
SET_ETD1 DS    0H                                            03340004
         ETDEF TYPE=SET,ETEADR=ETD1,ROUTINE=(2),RAMODE=31,  X03350004
               STATE=SUPERVISOR,PC=STACKING,SSWITCH=NO,     X03360004
               SASN=OLD,ASCMODE=PRIMARY,                    X03370004
               EK=8,PKM=OR,                                 X03380004
               AKM=(8,9),EKM=(8)                             03390004
*                                                            0344
         ST    R15,XMSRESP        Save Response Code         03410004
         BRAS  R14,CHKRESP        Go Check Response Code In Reg-15
The address space has not terminated. Now I want to submit a job which invokes this routine via a PC instruction. The PC number is D601, where D6 is the LX and 01 is the second entry in the entry table. However, when I issue the PC instruction I get an 0D6 Abend... Thank You Again for all Your comments. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
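Regarding the CR0 question above: z/Architecture numbers bits from 0 at the most significant end, so testing "CR0 bit 44" of a 64-bit control register means testing bit 63-44 in the more familiar LSB-0 terms (in assembler you would first store CR0 into a doubleword with the privileged STCTG instruction). A small Python sketch of just the bit arithmetic, with my own function name:

```python
def cr_bit(cr_value, bit, width=64):
    """Extract a control-register bit given its MSB-0 bit number."""
    return (cr_value >> (width - 1 - bit)) & 1

# A CR0 image with only bit 44 on:
cr0 = 1 << (63 - 44)
assert cr_bit(cr0, 44) == 1 and cr_bit(cr0, 43) == 0
```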
Re: Dynamic LPA Services
Thank you Walt. I was remembering an issue incorrectly. I certainly am guilty of confusing how content supervision handles some aspects of authorization, which is one reason I stick to PC routines. -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Walt Farrell Sent: Monday, July 08, 2013 2:17 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Dynamic LPA Services On Mon, 8 Jul 2013 07:55:46 -0500, Kenneth Wilkerson wrote: >And it doesn't >matter what the AC= is for an LPA program. MVS is going to treat it as >authorized simply because it's in the LPA. I discovered that the hard way. > That's not true, Kenneth. MVS will certainly consider any LPA-resident module to have been loaded from (i.e., resident in) an APF-authorized library, but that is not related to the AC=0/1 setting. Being resident in an APF-authorized library simply means that the system will allow another program that is already running authorized (APF, system key, or supervisor state) to load the module, and this is true for both AC=0 and AC=1 modules. If the module is not in an APF-authorized library and an authorized program tries to load it in the normal way, the load will fail. If the module does have AC=1, and it's resident in an APF-authorized library, then if the module is invoked as the jobstep program by the initiator (or in a small handful of other ways) the new jobstep will gain APF authorization and run APF-authorized. If you have an LPA-resident module that is AC=0 and you run it via EXEC PGM=, it will NOT run APF-authorized. It needs AC=1 for that. Many people (including some IBMers, and some writers of documentation) seem confused by the distinctions between APF-authorized libraries, AC=1, and running APF-authorized. 
-- Walt -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Dynamic LPA Services
You know who owns it because it's defined as a PC and therefore has an entry table assigned to it. Looking in the entry tables for a program is just as common a practice as looking for "identified" programs. So finding PC routines just requires different methods. Besides, if this is a stacking PC, which is the only type I use, the linkage stack has everything needed to associate the call with the PC routine, including the PC number. I rarely use anything but PC routines anymore, since most of the stuff I write is RMODE64. The IPCS STATUS display shows the entry tables for that reason. And it doesn't matter what the AC= is for an LPA program. MVS is going to treat it as authorized simply because it's in the LPA. I discovered that the hard way. To say that you can't ever free a PC routine is untrue. Almost any space-switching PC will terminate as soon as the server that defined it terminates. So these can be released and refreshed as needed every time the server recycles. Certainly, there are many PC routines that can't be freed. But if a PC routine is designed to be called as part of an API or UI, then API/UI recovery can easily recover the error and report it as the server terminating. Certainly, any PC routine that is defined as a non-space-switching system PC routine that can be called without any provided interface probably cannot be freed. However, a new copy can be loaded and redefined, which is why I like reusable LXs. In my book, PC routines are the only way to fly. They can be called in any AMODE, any RMODE and any environment other than SACF 256 and 768, which are very rarely used. And there are as many ways to define and use them as there are people that define and use them. I was just making a suggestion. Peter is making another. I imagine Peter has trusted methods that work. I know that I do as well. If you're going to define a PC, I suggest you don't allow the old dogma to get in your way. Find the method that works for you. 
PC routines are where MVS has been headed for a long time. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Monday, July 08, 2013 6:45 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Dynamic LPA Services 0D6-22: A linkage index (LX) translation exception occurred; the program interruption code is X'22'. This cannot have anything to do with the location of the target routine. Things added to dynamic LPA are part of LPA. They are built out of (E)CSA. What they are not part of are PLPA, MLPA, FLPA, which are not built out of (E)CSA. The approach of using "directed load" is frowned upon. It does not buy anything and has detrimental RAS effects, since the storage area being used as the PC target is now not known by name and thus is harder for any diagnostician to determine who owns it. There is just about no reason any more to do LOAD with ADDR to CSA for code. P.S., do not use LOAD with GLOBAL=YES if your address space could ever terminate without wait-stating the system, as the system frees that storage upon such termination. It is true that someone "could" LINK to the name since there is a name, but that is never of concern to a properly written program. The LPA routine should not be marked as AC=1. By the way, there are extremely few cases where a PC routine can *ever* be freed without introducing a system integrity problem. Do you truly know (as opposed to just hope) that no one is within the routine at the time you want to free it? Peter Relson z/OS Core Technology Design -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Dynamic LPA Services
I don't know what you're trying to do, but I would never define a PC in the LPA, for a lot of reasons. The most basic of these is that LPA routines are callable by the EP= or EPLOC= parameter on the LOAD, LINK, XCTL and ATTACH services. When called through these services, the traditional linkage is significantly different from PC linkage. Of course, you might want to be callable by LOAD, LINK, XCTL and ATTACH, which means you would have a separate entry point defined for the PC routine. I use a jump table at the start of a program to define multiple entry points.

To define a system-level PC routine, I normally acquire a CSA control block that can be found by a system-level name/token. Another approach is to use a word in the ECVT that has been assigned to you. I don't remember the procedure, but a vendor can get one word in the ECVT assigned by IBM to that vendor. You may use another method. But regardless of the method used to anchor the control block, the small CSA control block would contain the EPA/length and PC assignments. I always define reusable LXs, so I keep the LX number in there as well. I think reusable LXs are simpler, but you have to check control register 0 to make sure the feature is available.

I then acquire key 0 SP=241 (CSA, non-fetch-protected) storage and do a LOAD ADDR= into that CSA storage. I can now define the PC. Each time you need to refresh the PC routine, you'll need to release the old storage, load a fresh copy and redefine the PC. If you use a reusable PC number, the PC number (low 32 bits) will remain the same (unless you change it), but the sequence number (high 32 bits, architecturally passed in R15) will be incremented by one. For that reason, I always use R15 as the PC register, and I save the sequence number and PC number in a doubleword so I can load it into R15 and do a PC 0(R15).

Since you're getting a 0D6-22 and you're sure the PC is defined, I suspect that the defining address space has terminated.
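Condensed into HLASM, that recipe might look roughly like the sketch below. This is a hypothetical outline, not working code: the labels are invented, several required operands are omitted, and the exact LXRES/ETDEF/ETCRE/ETCON keywords should be verified against the Authorized Assembler Services references before use.

```hlasm
* Hypothetical sketch of the reusable-LX recipe - illustrative only.
*
* Static entry table description (assembled data):
ETDESC   ETDEF TYPE=INITIAL
ENT0     ETDEF TYPE=ENTRY,ROUTINE=0,AKM=(0:15),EKM=0
         ETDEF TYPE=FINAL
*
* Run-time definition:
         LXRES ELXLIST=ELXL,SYSTEM=YES,REUSABLE=YES  reusable LX
         STORAGE OBTAIN,LENGTH=MODLEN,SP=241,KEY=0   key 0 CSA
         LR    R2,R1                      R2 -> CSA storage
         LOAD  EP=MYPCMOD,ADDR=(R2)       fresh copy into CSA
         ETDEF TYPE=SET,ETEADR=ENT0,ROUTINE=(R2)  plug in the EPA
         ETCRE ENTRIES=ETDESC             create the entry table
         ST    R0,ETTOKEN                 ETCRE returns a token
         ETCON TKLIST=ETTOKEN,ELXLIST=ELXL  connect LX to table
*
* Calling it: keep the full 8-byte PC value (sequence number in
* the high half) in a doubleword and load it into R15.
         LG    R15,PCVALUE
         PC    0(R15)
```

To refresh the routine, the same sketch would be repeated with a new CSA copy after disconnecting and freeing the old one, which is the point of using a reusable LX.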
MVS has to have an address space to own a resource. When you acquire an LX and define a range of PC routines, tables are created in real storage and are assigned to the defining address space. The PCAUTH server defines your PC tables in the private SQA of the PCAUTH address space. They are disconnected and released from real storage whenever the defining address space terminates. If you want PC routines to persist for the duration of an IPL, you need to schedule an SRB into a system address space to define the required PCs. The choice of system address space is yours. A non-space-switch PC won't execute in the system address space; it will execute under the DU control blocks (SRB or TCB) in the address space of the caller.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Chuck Arney
Sent: Sunday, July 07, 2013 3:31 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Dynamic LPA Services

Did your dynamic LPA replace a module that was already established as the PC routine? The book says you can't do that, as the PC linkage tables are not updated by the dynamic LPA service. If you are replacing a module defined as a PC, you would have to remove the PC and redefine it with the new module address.

Chuck Arney
Arney Computer Systems

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of esst...@juno.com
Sent: Sunday, July 07, 2013 3:08 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Dynamic LPA Services

Chuck Arney: "It should work just fine Paul." Well, I tend to agree, but I seem to get the old 0D6-22 abend when I try to PC to the routine. I first thought the PC number was incorrect; however, I listed my PC numbers and the respective PC number is correct. That's why I posted this question. Thanks for the response, I will recheck the code.
Paul D'Angelo

-----Original Message-----
From: Chuck Arney
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Dynamic LPA Services
Date: Sun, 7 Jul 2013 14:12:34 -0500

It should work just fine Paul.

Chuck Arney
Arney Computer Systems

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of esst...@juno.com
Sent: Sunday, July 07, 2013 12:31 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Dynamic LPA Services

I have been working with the dynamic LPA services of z/OS (CSVDYLPA). I'm able to add, delete, and invoke modules that were dynamically added to "LPA" using CSVDYLPA and CSVQUERY. However, after re-reading the description of CSVDYLPA, it's not really LPA, it's more common storage. So my question is: should I be able to invoke a dynamically added module as a non-space-switching PC routine?
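As a footnote to the ownership point above: one hypothetical way to make the PC definitions survive for the life of the IPL is to run the defining code under an SRB in a long-lived system address space. A fragment follows - the labels are invented and the operand list is from memory and incomplete, so confirm everything against the IEAMSCHD description:

```hlasm
* Hypothetical: schedule an SRB into a system address space so
* the PC tables are owned by a space that lasts the IPL.
* DEFPCEP (SRB entry point) and SYSSTOKN (target STOKEN) are
* illustrative labels; other required operands are omitted.
         IEAMSCHD EPADDR=DEFPCEP,TARGETSTOKEN=SYSSTOKN,        X
               PRIORITY=LOCAL
```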
Re: Assembler
TM I2REC+ISTAT-IREC,SDLET

is equivalent to:

         LA    somereg,I2REC      somereg is any of R1-R15
         USING IREC,somereg
         TM    ISTAT,SDLET
         DROP  somereg

ISTAT-IREC is the assembly-time offset of the ISTAT field within the IREC mapping; adding that offset to I2REC tests the same field in the record at I2REC, without needing a USING.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Charles Mills
Sent: Monday, June 24, 2013 12:31 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Assembler

If you (1) post the 2 or 3 instructions following the TM and (2) post the "object code" that appears in the listing to the left of the instruction, then we can help you more.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Ron Thomas
Sent: Monday, June 24, 2013 8:24 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Assembler

Hello. Can someone please let me know what this assembler code does?

TM I2REC+ISTAT-IREC,SDLET

How does the above code work?
Re: New Software Tool for z/OS Developers Announced by Arney Computer Systems
TDF does not use traditional "intercept" technology. TDF never alters any user code other than user-specified breakpoints, and it never alters any MVS code in any way.

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Chuck@arneycomputer
Sent: Wednesday, April 10, 2013 7:54 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: New Software Tool for z/OS Developers Announced by Arney Computer Systems

I think everyone is aware of that, but of course if not, they should understand it. That said, everyone should also know that there is no method of achieving the end result that is supported by IBM. Therefore, there is no choice if you need the function. This processing is done using the standard SVC screening facility that is provided by IBM, but they do not sanction some of the things that can be done with it. Keep in mind that this is a system-level debugging product that should only be used in a development environment. We take extensive measures to ensure system integrity, but it is a very powerful tool that can be misused. It should never be used in a production environment. It serves no function for production work.

Chuck Arney

On Apr 10, 2013, at 8:05 AM, Peter Relson wrote:

>> wrap all content supervision (LOAD, LINK, XCTL and ATTACH), RTM exit
>> (ESTAE(X), STAE, (E)SPIE and SETFRR) and selected schedule (such as
>> IEAMSCHD) service calls.
>
> As all should understand, very little of this would be considered
> supported in any way, shape or form, and if anything in this realm
> caused a problem (or could conceivably have caused a problem), IBM
> service might take a hard line about helping.
> Peter Relson
> z/OS Core Technology Design
Re: Use of the TRAPx Instructions
Setting up the trap environment requires authorization and can easily be done by a non-space-switch PC set up by an authorized server. The execution of a TRAPx instruction is not authorized and executes under the state of the program being trapped. So, NO, there are no system integrity issues when debugging unauthorized programs.

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Tom Marchant
Sent: Wednesday, April 10, 2013 5:33 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Use of the TRAPx Instructions

On Tue, 9 Apr 2013 16:17:25 -0500, Kenneth Wilkerson wrote:

>You have to
>be able to acquire key 0 to even examine the DUCT let alone modify the
>DUCT to define the required trap control blocks. This means, of course,
>that the application creating the trap environment must be authorized.

Doesn't that mean that it is difficult at best to ensure system integrity when debugging non-authorized code?

--
Tom Marchant
Re: Use of the TRAPx Instructions
Thanks. I forgot about that one. It is covered in Chapter 22 on exits in the Authorized Assembler Services Guide. This would also be classified as a branch-entry intercept.

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Gerhard Postpischil
Sent: Tuesday, April 09, 2013 9:15 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Use of the TRAPx Instructions

On 4/9/2013 7:03 PM, Kenneth Wilkerson wrote:
> So now specifically to asynchronous exits. There are 3 ways to
> schedule asynchronous exits that I know of, by STIMER(M), by SCHEDIRB
> and by the old SCHEDXIT. If there are other ways, please let me know.

There is at least one other - see the CIRB macro. Before HASP, operators on an OS/360 system had to issue an explicit START RDR command to read a job stream. I had a parameterized facility (define command, alter, delete by unit) that caused an unsolicited interrupt to a defined device to trigger the appropriate command, thus obviating the need for the operator to start the reader. For MVS I have a version with more flexibility - I can set an ATI in designated UCBs, and issue any command in response to an interrupt. This is handy for activating CRT terminals on a different floor (not all are defined to VTAM). The MVS version does not use CIRB, nor does it use the Master Scheduler; instead it calls CVT0EF00 directly to schedule an IRB and IQE.

Gerhard Postpischil
Bradford, Vermont
Re: Use of the TRAPx Instructions
Peter, since you did such an excellent job of describing the problem, I've decided to describe to you how TDF is going to handle asynchronous exits. Before doing so, I need to clearly state that for the current release of the product, asynchronous exit debugging is a restriction. Months ago I looked into the problem and devised a solution, but I didn't implement it because I had many other things on my plate. Second, all of the technology I'm about to describe exists in TDF right now. It's just a new implementation of the same intercept technology already being used.

The TRAPx instruction alone is insufficient to accommodate full-fledged debugging. As you mentioned, it's anchored off the Dispatchable Unit Control Table (DUCT), so its scope is a dispatchable unit, a TCB or SRB. In a complex system involving multiple tasks and multiple address spaces, this scope is easily exceeded. So a mechanism must exist to automatically extend the scope of the trap environment as the application grows in scope. The mechanism used by TDF is called Dynamic Program Intercepts. Without going into great detail, suffice it to say that TDF intercepts content supervision (LOAD, XCTL, LINK and ATTACH), RTM exit and key schedule service calls. So when a new task is attached, through commands, you can define whether TDF automatically sets up the trap environment for the new task. You actually attach a TDF program that sets up the environment and transfers control to your program with a newly set up trap environment. When you create an RTM exit, a small program in TDF actually gets control and it wraps your RTM exits. TDF can also wrap key schedule services like IEAMSCHD so that when an SRB is scheduled, it initializes the DUCT just as if the SRB had a hook macro assembled into it. The bottom line is that the MVS implementation (or lack of it) of the TRAP instruction can be compensated for through intercept technology.

So now specifically to asynchronous exits.
There are 3 ways to schedule asynchronous exits that I know of: by STIMER(M), by SCHEDIRB and by the old SCHEDXIT. If there are other ways, please let me know. Since I haven't really started coding this, I haven't finished all my research. STIMER(M) is by SVC. The existing TDF SVC intercept can be easily extended to include STIMER(M). SCHEDIRB and SCHEDXIT are branch-entry calls. TDF provides a mechanism called branch-entry intercepts, which are very special TRAP breakpoints that redirect execution of branch-entry calls (like SETFRR, for example) to an intercept, similar to the way SVC and PC interception works. Regardless of the type of schedule service, TDF replaces your exit address with a program that wraps your exit. The front end would "stack" the prior trap environment and the back end would "unstack" it. This is a simple explanation to a complex problem.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Binyamin Dissen
Sent: Tuesday, April 09, 2013 4:42 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Use of the TRAPx Instructions

Looking at the DUCT, it also contains an indication of base address space or subspace. I would think that an ASYNC exit/IRB would not use the basespace indicator of the current RB.

On Tue, 9 Apr 2013 16:17:25 -0500 Kenneth Wilkerson wrote:

:>You are certainly correct about the z/OS implementation of the TRAPx
:>instruction. I often wondered why the hardware designers decided to
:>implement it such that it inherits the user state and not a predefined state
:>and why they didn't provide a service to "register" trap interfaces so they
:>could be shared or the asynchronous exit issue could be solved. You have to
:>be able to acquire key 0 to even examine the DUCT let alone modify the DUCT
:>to define the required trap control blocks. This means, of course, that the
:>application creating the trap environment must be authorized.
:>
:> But despite these z/OS limitations, in my mind, the asynchronous exit
:>issue is just a restriction. It's no bigger a restriction than the fact that
:>the TRAP can't be executed in HOME ASC (SAC 768) or Secondary ASC mode (SAC
:>256) or in transaction processing. Your scenario involves the asynchronous
:>exit also invoking a TRAPx instruction. So simply stated, asynchronous exits
:>cannot be debugged using the TRAPx instruction without a service to
:>determine if the TRAP control blocks are currently in use and perform a TRAP
:>stack function. But even this problem can be solved with a little extra code
:>(a PC routine) that stacks the current trap save area if it's in use.
:>
:>Certainly, the TRAPx instruction has fewer limitations than other available
:>methods. Every method is going to have restrictions. In my asynchronous
:>exits, I typically simply update a control block and post a task to proce
Re: Use of the TRAPx Instructions
You are certainly correct about the z/OS implementation of the TRAPx instruction. I often wondered why the hardware designers decided to implement it such that it inherits the user state and not a predefined state, and why they didn't provide a service to "register" trap interfaces so they could be shared or the asynchronous exit issue could be solved. You have to be able to acquire key 0 to even examine the DUCT, let alone modify the DUCT to define the required trap control blocks. This means, of course, that the application creating the trap environment must be authorized.

But despite these z/OS limitations, in my mind, the asynchronous exit issue is just a restriction. It's no bigger a restriction than the fact that the TRAP can't be executed in HOME ASC (SAC 768) or Secondary ASC mode (SAC 256) or in transaction processing. Your scenario involves the asynchronous exit also invoking a TRAPx instruction. So simply stated, asynchronous exits cannot be debugged using the TRAPx instruction without a service to determine if the TRAP control blocks are currently in use and perform a TRAP stack function. But even this problem can be solved with a little extra code (a PC routine) that stacks the current trap save area if it's in use.

Certainly, the TRAPx instruction has fewer limitations than other available methods. Every method is going to have restrictions. In my asynchronous exits, I typically simply update a control block and post a task to process whatever it is that I'm trying to process.

Kenneth, TDF Architect

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Morrison, Peter
Sent: Tuesday, April 09, 2013 3:37 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Use of the TRAPx Instructions

I have had extensive experience with the use of the TRAPx (TRAP2/TRAP4) instructions in a z/OS environment. z/OS offers no support for setting up to enable them.
Basically, you need to anchor things in the DUCT (Dispatchable Unit Control Table). There is one DUCT set up for each TCB and SRB. (Note that ONLY ONE DUCT is set up for a TCB - not one DUCT per RB level. THIS IS VERY IMPORTANT!) (Preserving a specifically formatted DUCT is important, but is not relevant to the discussion below. Just be aware that there are issues associated with it.)

Generally, you can regard TRAPx as a 'fancy branch'. The target is the routine whose address is set up in the control blocks. The hardware saves all state in the 'trap save area' first. BUT, there is a very significant problem when using TRAPx with z/OS! When your trap routine gets control, system state is as before the TRAPx instruction was executed. This includes the fact that interrupts are probably enabled. Why does this matter? Because, in z/OS, in TCB mode, if an interrupt occurs, processing of that interrupt could involve de-scheduling the TCB, deciding to request a new RB level, and later redispatching the TCB, so that the new RB level will get control. This can lead to the following scenario:

1: TCB-mode code executes a TRAPx instruction
2: The hardware saves all state (PSW/regs) in the trap save area
3: The registered trap handler routine is given control and starts executing...
4: An interrupt occurs
5: During processing of the interrupt, the current TCB has a new RB level stacked over it
6: The TCB is resumed. Execution now is for the new RB level
7: The new code executes a TRAPx instruction
8: The hardware saves all state in the trap save area. BAZINGA! The old information is overwritten!
9: The registered trap handler routine is given control and starts executing...

Because the trap save area has been overwritten, the lower-level handler, when it resumes execution, is not using correct information. There is not even any way to know that this has occurred.
While the situation CAN be circumvented by preventing asynchronous RB stacking (there is a bit in the TCB for this), this can play havoc with debugging as, for example, asynchronous exits to do with I/O won't execute...

For the above reason, use of TRAPx instructions as a way to implement breakpoints in code that executes on z/OS in TCB mode is not a good idea...

Peter Morrison
Principal Software Architect
CA
6 Eden Park Drive, North Ryde NSW 2113
Tel: 02 8898 2624 Fax: 02 8898 2600
peter.morri...@ca.com
Re: New Software Tool for z/OS Developers Announced by Arney Computer Systems
Howdy. My name is Kenneth and I'm the architect of TDF. I thought I would take a few minutes to clarify a little about TDF. This is my first time doing this.

First, TDF is designed to be much more than an interactive debug tool. It wasn't designed to compete with any existing products. Its primary purpose is to expand the realm of debugging tools outside of the development scope into testing, maintenance and customer problem determination. The first release is primarily about the interactive component because it's the part that is currently tested to our standards and we feel can be used reliably. Consider the example of locked code. TDF is carefully architected so you could interactively debug any locked code except disabled code and code holding a CPU lock. One day TDF might actually support this mode, but it would require more code than is currently justified. But TDF is designed to provide non-interactive data collection. Through panels to define 'traces', you can specify what states and data you want to collect, up to 2K per trace point. Each trace point can be tailored to the specific needs of that trace point. So TDF can be used to provide a dynamic trace of any code without any modification.

Now consider this. TDF has a scripting capability where all the commands needed to perform a trace or debug are recorded. A customer reports a problem. You design a set of traces to collect the needed data. You send the script to the customer. Since TDF does not require any code modifications, they start up a batch runtime component that executes the script against a test case. It collects the trace data, which can be shipped back to the product developer for analysis. A fix is prepared and shipped to the customer. The same script can now be run again to verify the fix. That is what TDF is designed to do. It's an entirely different debug paradigm that expands debugging tools into the realm of maintenance and problem determination.

Second, TDF has no boundaries.
TDF is dynamic and can operate across any number of address spaces, tasks, SRBs, PC routines and RTM exits. It does this by using the TRAP instruction. This instruction can execute in almost all environments. Without going into details, essentially it can execute wherever a PC instruction can execute. Essentially, TDF uses what we call Dynamic Program Intercepts to wrap all content supervision (LOAD, LINK, XCTL and ATTACH), RTM exit (ESTAE(X), STAE, (E)SPIE and SETFRR) and selected schedule (such as IEAMSCHD) service calls. This list will grow as demand dictates. It also uses this same technology to wrap user 'identified' PC routines and common routines. It does this by making a copy of the identified code, thus isolating it from any other callers. In fact, two or more sessions can debug the same PC concurrently. A future enhancement (still being tested) will allow the grouping of any number of tasks and address spaces into a debug group. This will become essential for problem determination in complex task or server/client scenarios as described for the runtime component in the second paragraph.

Third, IBM is a hardware and software vendor. It has the luxury of pairing the hardware architecture, z/Architecture, and the software architecture, z/OS, into one of the most powerful, if not the most powerful, operating systems. TDF is designed to exploit both architectures. The TRAP instructions are a simple example of that. The PC screening technology is another. In fact, TDF is architected more on z/Architecture than on z/OS. It requires z/OS to execute, but it is much more reliant on z/Architecture. TDF only uses 3 IBM services in the debugging of a dispatchable unit.

Anyone that has any specific technical questions about TDF need only ask.

Kenneth