Re: Serialization without Enque
On Mon, 18 Nov 2013 22:34:57 -0800, Ed Jaffe wrote:

> On 11/10/2013 1:19 PM, Mark Zelden wrote:
>> I've not been paying that close of attention, but I'm more curious about what people did for these situations prior to PLO.
>
> ENQ/DEQ or Latch

I can still vividly remember first reading PLO in the PoPs - and thinking bloody hell, umpteen pages 2-up to describe (nominally) just one instruction ...

Shane ...

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
It's not the number of instructions but what you do with them that counts. Or something like that...

Rob Scott
Lead Developer
Rocket Software
77 Fourth Avenue, Suite 100, Waltham, MA 02451-1468, USA
Tel: +1.781.684.2305
Email: rsc...@rs.com
Web: www.rocketsoftware.com

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Shane Ginnane
Sent: 19 November 2013 09:18
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

> I can still vividly remember first reading PLO in the PoPs - and thinking bloody hell, umpteen pages 2-up to describe (nominally) just one instruction ...
Re: Serialization without Enque
Long-winded and ugly but functionally adequate serialization machinery can be developed using TS alone. PLO is convenient and should be more perspicuous; but this thread and several related predecessors strongly suggest that, more often, it is caviar to the general.

John Gilmore, Ashland, MA 01721 - USA
Re: Serialization without Enque
jwgli...@gmail.com (John Gilmore) writes:
> Long-winded and ugly but functionally adequate serialization machinery can be developed using TS alone

that was the argument that the POK favorite son operating system people used when the attempt was made to add compare-and-swap to 370. charlie had invented compare-and-swap while doing fine-grain multiprocessing locking for cp67 at the cambridge science center. attempts to add it to 370 were met by opposition from the POK favorite son operating system (having used a single TS spin-lock for entering the os/360 360/65mp kernel). the owners of the 370 architecture then said that in order to justify compare-and-swap for 370, non-multiprocessor use cases would be needed. thus were born the non-multiprocessor-specific use cases that still appear in the principles of operation.

the TS use case was multiprocessor: set a lock for serialization, followed by some operation (critical section), followed by clearing the lock. compare-and-swap is a single serialized, non-interruptable, atomic operation. the compare-and-swap use cases include non-kernel, interruptable, multi-threaded (not necessarily multiprocessor) operation performing atomic serialized operations w/o the overhead of making a kernel call to perform the serialized operation.

misc. past posts mentioning multiprocessing and/or compare-and-swap
http://www.garlic.com/~lynn/subtopic.html#smp

compare-and-swap was fairly quickly adopted by large multi-threaded applications like high-throughput DBMS systems. compare-and-swap was so useful, numerous other hardware platforms adopted it also (or instructions with similar atomic semantics).

ibm's 801/risc architecture was originally developed in the period around Future System (failed new machine architecture that was going to completely replace 360/370) ... and i've frequently claimed that a lot of the 801 objectives were the extreme opposite of the FS complexity. one of those was not to have cache consistency ... to avoid the enormous performance penalty paid by the FS multiprocessor (and even the extreme throughput penalty paid for 370 strong-memory-model multiprocessor cache consistency).

misc. past posts mentioning future system
http://www.garlic.com/~lynn/submain.html#futuresys

no cache consistency pretty much ruled out multiprocessor operation ... and the philosophy that all instructions needed to complete in a single cycle ruled out a compare-and-swap instruction. however, the lack of a compare-and-swap instruction put the RS/6000 at a severe throughput disadvantage in open-system RDBMS benchmarks compared to other platforms (open-system RDBMS had a fall-back to kernel-call locking for the few hardware platforms that didn't have compare-and-swap semantics). fairly early, compare-and-swap instruction emulation was added to the rs/6000 AIX system-call FLIH ... within a couple of instructions of entry to the system-call FLIH there was a special case for compare-and-swap emulation that then immediately returned to the application. while it wasn't useful for real multiprocessor operation, it did achieve the objective of not being interruptable while the emulation processing was in progress.

misc. past posts mentioning 801/risc
http://www.garlic.com/~lynn/subtopic.html#801

--
virtualization experience starting Jan1968, online at home since Mar1970
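[The contrast drawn above between the TS use case and compare-and-swap can be sketched off-platform with C++ std::atomic standing in for the 370 instructions; the function and variable names here are illustrative, not from the thread:]

```cpp
#include <atomic>

// TS-style: a spin lock guards a critical section -- set the lock,
// update, clear the lock (the os/360 360/65mp kernel-entry pattern).
std::atomic_flag ts_lock = ATOMIC_FLAG_INIT;
long locked_total = 0;

void locked_add(long n) {
    while (ts_lock.test_and_set(std::memory_order_acquire)) { /* spin */ }
    locked_total += n;                          // critical section
    ts_lock.clear(std::memory_order_release);
}

// Compare-and-swap style: one serialized atomic update, no lock word,
// no kernel call. If another thread updates between the load and the
// swap, the compare fails and the update is simply re-driven.
std::atomic<long> cas_total{0};

void cas_add(long n) {
    long old = cas_total.load();
    while (!cas_total.compare_exchange_weak(old, old + n)) {
        // old has been reloaded with the current value; retry
    }
}
```

[Only the second pattern is lock-free: a thread preempted while holding the TS lock blocks every spinner, while a failed compare-exchange just retries with fresh values.]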
Re: Serialization without Enque
I have often thought about that but I don't quite understand why it is needed. Would you give an example of a problem that is fixed by this method?

Thanks!
Lindy

From: IBM Mainframe Discussion List [IBM-MAIN@LISTSERV.UA.EDU] on behalf of Anne Lynn Wheeler [l...@garlic.com]
Sent: 19 November 2013 16:49
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

> charlie had invented compare-and-swap while doing fine-grain multiprocessing locking for cp67 at the cambridge science center.
Re: Serialization without Enque
It is always hard to make oneself understood in emotion-laden situations, but I did try. I prefaced my comment about the logical adequacy of TS with the words "Long-winded and ugly . . ."

Perhaps an analogy will help. Boolean algebra is usually discussed using the binary operations disjunction (inclusive-or) and conjunction and the singulary operation negation. It is, however, possible to show that one binary operation, either NOR or NAND, suffices. This is intellectually interesting. It is not an argument for doing boolean algebra using only one imperspicuous operation. One instead introduces trivial definitions of disjunction and conjunction in terms of either NOR or NAND and then proceeds in the usual way.

Analogously, both flavors of Compare and Swap and, now, PLO are very useful; and the fact that they are logically dispensable is NOT an argument for dispensing with them. Multiplication is dispensable too. It can be replaced by successive additions. This is not, however, an argument for eliminating multiply instructions.

Small boys are likely to cut themselves when they start hacking about with Ockham's razor.

John Gilmore, Ashland, MA 01721 - USA
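[The NAND point is easy to make concrete. A sketch, with illustrative helper names: negation, conjunction and disjunction each defined from NAND alone -- logically sufficient, and exactly as imperspicuous as Gilmore says:]

```cpp
// All three usual Boolean operations built from NAND alone.
bool nand_(bool a, bool b) { return !(a && b); }

bool not_(bool a)         { return nand_(a, a); }                   // a NAND a
bool and_(bool a, bool b) { return nand_(nand_(a, b), nand_(a, b)); } // NOT(a NAND b)
bool or_(bool a, bool b)  { return nand_(nand_(a, a), nand_(b, b)); } // (NOT a) NAND (NOT b)
```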
Re: Serialization without Enque
In CAE1XxDGQkTVXZVP3fD-4=jv1ufzxijwt8_dxmrvw+gra6tn...@mail.gmail.com, on 11/19/2013 at 07:29 AM, John Gilmore <jwgli...@gmail.com> said:

> Long-winded and ugly but functionally adequate serialization machinery can be developed using TS alone

Turing Tar Pit: the situation where everything is possible but nothing is easy. I can't recall seeing any situation on a S/370 where CS and CDS weren't preferable to TS for serialization.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see http://patriot.net/~shmuel/resume/brief.html
We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003)
Re: Serialization without Enque
I've been having fun with this. Build an entire CPU, assembler, VM, operating system, etc. from nothing but a NAND gate. I've gotten as far as building all the basic gates plus an ALU, all in a simplified HDL. CPU, RAM and all that to follow. http://nand2tetris.org/

Lindy

From: IBM Mainframe Discussion List [IBM-MAIN@LISTSERV.UA.EDU] on behalf of John Gilmore [jwgli...@gmail.com]
Sent: 19 November 2013 19:11
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

> It is, however, possible to show that one binary operation, either NOR or NAND, suffices.
Re: Serialization without Enque
On 11/10/2013 1:19 PM, Mark Zelden wrote:
> I've not been paying that close of attention, but I'm more curious about what people did for these situations prior to PLO.

ENQ/DEQ or Latch

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/
Re: Serialization without Enque
On 14/11/2013 12:23 AM, Kenneth Wilkerson wrote:
> If I read the article you sent correctly, this algorithm is using a spin lock. It has provision for implementing a lock-free algorithm but none of those are detailed.

Most of the shared_ptr implementations in the wild, including z/OS, use a lock-free algorithm invented by Alexander Terekhov (IBM). An atomic reference-counted smart pointer is a nice tool to have in your bag.

> PLO has no restrictions other than the limits of my imagination.

Or patents! I notice IBM have quite a few wrt PLO. This one for example: http://www.google.com.mx/patents/US6636883. The US patent office seems to like patents related to multi-processing. Maybe they think it's novel. Software patents have a funky smell!

> However, in the last sentence of the performance section, the authors state "Note that we make no claim that the atomic access approach unconditionally outperforms the others, merely that it may be best for certain scenarios that can be encountered in practice." I think this is the key to lock-free implementations. I have generalized primitive functions to perform most of my PLO operations. But I design and tune each to the specific use. I understand the desire to provide generalized functionality for high level language use. However, I do not accept the premise that all lists are the same. And I would certainly use different algorithms for lists that I expected to get large and lists that have more than a small percentage of update operations.

While it may not be true on z/OS, most software developers these days use high-level languages for multi-threaded applications and prefer abstractions because they are easy to use and reason about. Of course, that doesn't mean that they shouldn't understand what's happening under the covers. The trick is keeping the optimizer out of the mix, which is where inline assembler comes in handy.

There are many high-quality (free) libraries for lock-free data structures that are relatively easy to port to z/OS. Using advanced language features it's quite simple to configure a highly granular concurrent queue by using policies. The difficult part is testing the bloody things!

typedef mpmc_queue<fixed_array, lock_free, smart_ptr> multi_queue;
typedef spsc_queue<priority_list, mutex, unique_ptr> blocking_queue;
typedef thread_pool<multi_queue, max_threads<8>> task_pool;
socket_server<task_pool> server;
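[The re-drive pattern at the center of this exchange -- snapshot a chain counter, walk the list, and retry if the counter moved -- can be approximated off-platform with std::atomic standing in for PLO Compare and Load. A single-writer sketch; the names are illustrative, not from the thread:]

```cpp
#include <atomic>
#include <vector>

// Chain counter: bumped on every add or delete and never decremented,
// so an intervening add + delete pair still changes it.
std::atomic<unsigned long> chain_count{0};
std::vector<int> chain;   // toy stand-in for the linked chain

// Reader: optimistic search with re-drive.
bool contains(int key) {
    for (;;) {
        unsigned long before = chain_count.load(std::memory_order_acquire);
        bool found = false;
        for (int v : chain) if (v == key) { found = true; break; }
        if (chain_count.load(std::memory_order_acquire) == before)
            return found;          // chain was stable for the whole walk
        // counter moved: an update raced with the walk -- re-drive
    }
}

// Writer (assume one at a time here): mutate, then publish the change.
void add(int key) {
    chain.push_back(key);
    chain_count.fetch_add(1, std::memory_order_release);
}
```

[Unlike PLO, this gives no atomic compare-and-load of the pointer itself, so a real implementation must also keep freed elements valid until readers drain -- which is exactly the storage-manager/use-count problem Kenneth's algorithm addresses.]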
Re: Serialization without Enque
I'm with you on patents. I came across an IBM patent yesterday that was dated 2009 describing a lock-free storage manager using cell pools to manage variable-length storage. I invented my first one 30 years ago using CAS. It was writing a more sophisticated version 10 years ago that led to my research on PLO. I'm now on my fourth version of this storage manager, this one I wrote for me, and it's much more sophisticated than the patented algorithm, with numerous more capabilities.

Like music, every piece of software is based on what we've seen before. So my real question with software patents is how do you prove it's original? It's the chicken and the egg. Which came first?

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford
Sent: Thursday, November 14, 2013 6:40 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

> Or patents! I notice IBM have quite a few wrt PLO. This one for example: http://www.google.com.mx/patents/US6636883. The US patent office seems to like patents related to multi-processing. Maybe they think it's novel. Software patents have a funky smell!
Re: Serialization without Enque
http://www.techdirt.com/articles/20130723/09543223903/joel-spolsky-stackexchange-thwarts-broad-microsoft-patent-app-using-microsofts-own-prior-art.shtml

Submit prior art against U.S. patent applications: http://patents.stackexchange.com/

On Thu, Nov 14, 2013 at 7:19 AM, Kenneth Wilkerson <redb...@austin.rr.com> wrote:
> So my real question with software patents is how do you prove it's original? It's the chicken and the egg. Which came first?

--
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?
Re: Serialization without Enque
In 5284c4bc.5000...@gmail.com, on 11/14/2013 at 08:40 PM, David Crayford <dcrayf...@gmail.com> said:

> Or patents! I notice IBM have quite a few wrt PLO.

Could they be defensive patents? The only really effective way to prevent someone else from patenting a technique that you're using is to file first. Unless and until the USPTO gets serious about excluding things that are obvious to a practitioner or are prior art, I expect to continue to see a flood of defensive patents, and not just from IBM. Note that even if the USPTO rejects a defensive patent as prior art, it will have served its purpose.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see http://patriot.net/~shmuel/resume/brief.html
Re: Serialization without Enque
On 14 November 2013 09:26, Shmuel Metz (Seymour J.) <shmuel+ibm-m...@patriot.net> wrote:

> Could they be defensive patents? The only really effective way to prevent someone else from patenting a technique that you're using is to file first.

If you don't want to patent, but also don't want others to patent your invention, you can publish in a place that gets reasonable exposure, such as an academic journal in the field. Or if you're big enough (like IBM), you can publish your own Technical Disclosure Journal just for this purpose. Of course if you want the additional benefit of having your competition not know of your publication, then you need to publish in the most obscure journal in the most obscure language that you can get away with while still claiming that it's prior art. IIRC there was a case in the 1980s or 90s where a US company published in Chinese in the Acta Chimica Sinica, which the USPTO then claimed wasn't likely enough to ever be seen by US researchers. Subsequently Microsoft and other software companies have done the same, but since then it's become more reasonable to expect that Chinese journals will be noticed. Meanwhile, IBM has discontinued its TDJ, perhaps for other reasons, but perhaps because they're being more aggressive with patents as a matter of policy.

> Unless and until the USPTO gets serious about excluding things that are obvious to a practitioner or are prior art, I expect to continue to see a flood of defensive patents, and not just from IBM.

IBM has patented a number of new instructions straight out of the Principles of Operation, with no attempt to make the claims general. Presumably this is just to make life harder for would-be suppliers of competitive hardware. For example, Move with Optional Specifications got US patent 20070277014, although the claims are just a particular application of the age-old notion of calling a subroutine with arguments and doing different things based on those arguments. I can't imagine it standing up to a challenge, but that costs a lot of money, and when you have hundreds of instructions to patent...

> Note that even if the USPTO rejects a defensive patent as prior art, it will have served its purpose.

Indeed, and it may perhaps be cheaper than other avenues.

Tony H.
Re: Serialization without Enque
In CAArMM9S8B0_nTv3XiVPv1JLtw2=HfMH5m7sNCFFo6Kd4=mAA=g...@mail.gmail.com, on 11/14/2013 at 03:26 PM, Tony Harminc <t...@harminc.net> said:

> If you don't want to patent, but also don't want others to patent your invention, you can publish in a place that gets reasonable exposure,

The point of a defensive patent is to avoid expensive litigation, not to win it. Unless the USPTO searches the journal as part of processing patent applications, it just doesn't have the same practical effect as filing for a patent.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see http://patriot.net/~shmuel/resume/brief.html
Re: Serialization without Enque
In 001801cee029$af0d2bc0$0d278340$@austin.rr.com, on 11/12/2013 at 10:34 PM, Kenneth Wilkerson <redb...@austin.rr.com> said:

> Actually, the algorithm performs well for read-often, write-rarely lists because the active chain count does not change and therefore there are relatively infrequent re-drives.

What happens if there is an intervening add and also an intervening remove, leaving no net change in the active chain count even though the chain itself has changed?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see http://patriot.net/~shmuel/resume/brief.html
Re: Serialization without Enque
The active count is incremented for every add and delete. It is never decremented, so any update would result in a change. In my actual algorithms, all my lists are in shared or common memory objects, so all the pointers are 64-bit and I use the +2 variations of PLO. In this case, I use a counter with two fullwords on a doubleword boundary. The first fullword is the change count, and it is always incremented. The second fullword is the element count, and it is incremented for each add and decremented for each delete. I load the counter with an LG and then use ALG and AL or SL to manipulate the high or low word.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Shmuel Metz (Seymour J.)
Sent: Wednesday, November 13, 2013 7:29 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

> What happens if there is an intervening add and also an intervening remove, leaving no net change in the active chain count even though the chain itself has changed?
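[Kenneth's doubleword counter can be sketched off-platform with one 64-bit atomic standing in for the PLO-serialized doubleword; the C++ rendering and function names are illustrative:]

```cpp
#include <atomic>
#include <cstdint>

// High 32 bits: change count, always incremented -- so an add + delete
// pair can never look like "no change" to a reader. Low 32 bits:
// element count, +1 on add, -1 on delete. A single compare-and-swap
// updates both halves atomically, like a PLO update on the doubleword.
std::atomic<uint64_t> counter{0};

void note_add() {
    uint64_t old = counter.load();
    while (!counter.compare_exchange_weak(old, old + (1ULL << 32) + 1)) {}
}

void note_delete() {                       // assumes element count > 0
    uint64_t old = counter.load();
    while (!counter.compare_exchange_weak(old, old + (1ULL << 32) - 1)) {}
}

uint32_t change_count()  { return uint32_t(counter.load() >> 32); }
uint32_t element_count() { return uint32_t(counter.load() & 0xFFFFFFFF); }
```

[This is what answers Shmuel's question: the element count alone could return to its old value after an add plus a delete, but the change count cannot.]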
Re: Serialization without Enque
If I read the article you sent correctly, this algorithm is using a spin lock. It has provision for implementing a lock-free algorithm, but none of those are detailed. There are certainly cases where spin locks work very effectively, particularly if the thread that finds a lock unavailable voluntarily relinquishes control. I try to avoid spin locks because, if they are performed in a high-priority application, they can cause detrimental system effects. The applications I write are system level and can be executed from anywhere, meaning that applications that use them may be in a locked state, in an SRB, at a high priority, etc. It therefore behooves me to use only hardware-provided methods for serialization. This is the primary reason I use PLO. PLO has no restrictions other than the limits of my imagination.

However, in the last sentence of the performance section, the authors state "Note that we make no claim that the atomic access approach unconditionally outperforms the others, merely that it may be best for certain scenarios that can be encountered in practice." I think this is the key to lock-free implementations. I have generalized primitive functions to perform most of my PLO operations. But I design and tune each to the specific use. I understand the desire to provide generalized functionality for high-level language use. However, I do not accept the premise that all lists are the same. And I would certainly use different algorithms for lists that I expected to get large and lists that have more than a small percentage of update operations.
Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford
Sent: Tuesday, November 12, 2013 11:45 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

On 13/11/2013 12:34 PM, Kenneth Wilkerson wrote:
> Actually, the algorithm performs well for read-often, write-rarely lists because the active chain count does not change and therefore there are relatively infrequent re-drives. The active chain count only changes on an add or delete. So if there are infrequent adds and deletes, there will be infrequent re-drives. And you are wrong, readers will not contend unless two or more tasks are referencing the exact same element simultaneously. And even then, the re-drive only involves the update to the use count.

Thanks, I get it now. Maybe IBM should have used PLO for the z/OS C++ shared_ptr SMR algorithm, which has both weak/strong reference counts + the pointer. They use a lock-free algorithm using CS: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2674.htm. shared_ptr implements proxy-based garbage collection.

> There are a lot of optimizations that I do not describe in this algorithm, for simplicity. For example, when you do a PLO DCAS, the condition code is set to indicate which compare failed. This can be used to optimize the re-drive. There are many other optimizations with re-drives to make this more efficient. And if you get away from traditional lists, there are even more optimizations.

Like what? What do you replace linked lists with, arrays with offsets instead of pointers?

> Honestly, I provided this algorithm after reading the paper on hazard pointers. The paper was written in 2002 and claimed there was no atomic DCAS, when PLO DCAS became available in 2001. So I took a much simpler algorithm that I had and modified it to use a use count to accommodate traditional storage managers, to prove that PLO could be used to manage a conventional list using a traditional storage manager and provide an SMR algorithm without the need for DU-level management structures. I don't use many conventional lists, and I have a proprietary storage manager that does not use chains. Most of my PLO operations are much simpler. I would love to test this algorithm against any other SMR algorithm. My point has been and remains that PLO can be efficiently used to serialize any list in a lock-free manner, and even if it does take more CPU this will be offset by increased throughput. And just because UNIX has issues with PLO doesn't mean the issue is with PLO...

UNIX doesn't have an issue with PLO. It clearly states that popping/pushing elements at the beginning of the queue is a good performer. Surely your algorithm would have the same problem if multiple producers/consumers were inserting/removing elements from the middle of a long list.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford
Sent: Tuesday, November 12, 2013 8:39 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque

Thanks for sharing your design Ken. It seems to me that PLO is best used for data structures like double-ended queues where elements can be inserted/removed
Re: Serialization without Enque
Let's begin with queue-handling 101. A queue is a FIFO list. One adds elements to the rear of a queue and removes them from the front. Tony Harminc's implicit point, that these two operations can be serialized and done unproblematically, is correct beyond argument. There is never a requirement to use fallible, probabilistic methods to do so. This said, it is all but certain that Jon Perryman is trying to address OTHER problems too. It is clear from what he has said that he must sometimes find and remove an element that is not necessarily at the front of his queue. How is such an element identified? By address? By key? By a sequence number or timestamp? What other kinds of processing does he need to do? Does he, for example, sometimes need to insert an element at some position other than the rear of this queue? [Moving elements in the queue is equivalent to deleting them from one position and reinserting them elsewhere.] Once his functional requirements are understood, control blocks and non-probabilistic code to meet them will be easy enough to devise. The appropriate use of pointers to pointers, for example, greatly simplifies many queue-management problems, making it unnecessary to distinguish removal operations positionally. One need not care whether one is removing an element at the front, rear, or in some intermediate position in a queue. Moreover, one should not. In addition to being inelegant/messy, special-casing is error-prone. We thus need more information if we are to be helpful. John Gilmore, Ashland, MA 01721 - USA -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
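[Editor's sketch] The pointers-to-pointers point above can be sketched concretely. In C one walks a pointer to the link field itself; in this hedged Python rendering a dummy header node plays the same role, so every real element has a predecessor and one splice handles removal at the front, middle, or rear with no positional special-casing.

```python
# Hedged sketch of removal without positional special cases.  The dummy
# header node stands in for C's "pointer to the link field": the removal
# code walks predecessors, so front, middle, and rear are one code path.
class Node:
    def __init__(self, key):
        self.key, self.next = key, None

def make_queue(keys):
    head = Node(None)                    # dummy header node
    tail = head
    for k in keys:
        tail.next = Node(k)
        tail = tail.next
    return head

def remove(head, key):
    pred = head                          # walk predecessors, not elements
    while pred.next is not None:
        if pred.next.key == key:
            pred.next = pred.next.next   # one splice, any position
            return True
        pred = pred.next
    return False

q = make_queue(["a", "b", "c"])
remove(q, "a")     # front
remove(q, "c")     # rear: exactly the same code path
print(q.next.key)  # b
```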
Re: Serialization without Enque
I use cell pools. I also use a proprietary storage manager that doesn't use chains. These methodologies offer me capabilities well beyond those found in traditional methods. Much of what I do is based on these capabilities, but the algorithms could easily be adapted to use a conventional storage manager that uses chains. Here is an algorithm I use that I've adapted for traditional storage management. This algorithm will work for any list: LIFO, FIFO, ordered, or other, and for deletion from the head, middle or tail.

Setup: Allocate a word in each element. Bit 0 is one for all active elements. Bit 1 is one for all elements pending deletion. Bit 2 is reserved for an extension to this algorithm (such as garbage cleanup). The remaining bits are a use count allowing for many more DUs than are supported by MVS.

Note: When PLO Compare and Swap (CS) or Double Compare and Swap (DCAS) is referenced, the PLO uses the use count address as the lock word. This will serialize all updates to the use counter for that element. For the PLO Compare and Loads or PLO update, the lock word is the active chain counter.

To search:

Step 1: Use the PLO Compare and Load on the active chain counter to search the chain as before. If the PLO fails, re-drive the search.

Step 2: Before examining the element, increment the use count with a PLO Double Compare and Swap (DCAS). Load the first register pair with the current chain counter. The swap value will also be the current chain counter. Essentially, we're using the active chain count to serialize increments to the use count to avoid accessing an area that may have been released. The second register pair will contain the current use count with a swap value incremented by 1 using an AL to avoid resetting the high bit. If the PLO DCAS fails, the previous PLO Compare and Load (Step 1) should be re-driven.

Step 3: Use the PLO Compare and Load for the next element. Save the PLO status and decrement the use count with a SL using PLO CS.
We don't need a DCAS because the use count is not 0 and this element can't be deleted. If this PLO CS fails, re-drive it. If the PLO Compare and Load status is re-drive, then before re-driving the search, check the use count (in the register used to update it). If Bit 1 (pending delete) is set and the use count is 0, this task can release it.

To delete: (this assumes the deleting task has already updated the use count in the process of finding the element to delete)

Step 1: Use a PLO update function to remove the element from the active chain to avoid any future references.

Step 2: If the PLO update to remove the element fails, decrement the use count but do NOT set bit 1 (pending delete), using a PLO CS. If the PLO CS fails, re-drive it.

Step 3: If the PLO update to remove the element succeeds, decrement the use count, SET bit 1 (pending delete) and RESET bit 0 (active bit), using a PLO CS. If the PLO CS fails, re-drive it.

Step 4: Whether the PLO update succeeded or failed, check the use count in the register used to update it: If bit 1 (pending delete) is set and the use count is not 0, then this task should exit. If bit 1 (pending delete) is set and the use count is 0, then this task can release it. Otherwise, this task must re-drive the search to find the element to be deleted.

You can work out the various scenarios yourself. But because the count is incremented/decremented after a PLO Compare and Load or update, the status of the PLO provides a decision point on whether an element may have been deleted. Using the use count address as the lock word ensures that all use count updates for a specific use count occur serially. There are numerous extensions to this algorithm that are more than I want to describe. Things like adding pending deletes to a delete chain or having an asynchronous, periodic garbage collection task handle the deletes.
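[Editor's sketch] The use-count word described above can be modelled in miniature. This is a hedged Python rendering, not the assembler: bit 0 (active) and bit 1 (pending delete) are the two high-order bits of a 32-bit word and the rest is the use count, as in the setup; a Python lock stands in for the atomicity PLO provides on the word, and `freed` stands in for returning the cell to the storage manager.

```python
import threading

# Hedged model of the element word: ACTIVE = bit 0, PENDING = bit 1,
# low-order bits = use count.  A lock simulates CS-style atomicity.
ACTIVE  = 1 << 31          # bit 0: element is on the active chain
PENDING = 1 << 30          # bit 1: element is pending deletion
COUNT   = PENDING - 1      # remaining bits: the use count

class Element:
    def __init__(self, key):
        self.key = key
        self.word = ACTIVE             # active, use count 0
        self._lock = threading.Lock()
        self.freed = False             # stands in for releasing the cell
    def cas(self, old, new):
        with self._lock:               # simulated CS on the word
            if self.word == old:
                self.word = new
                return True
            return False

def acquire(e):
    """Searcher found e on the chain: bump the use count, re-driving on failure."""
    while True:
        w = e.word
        if not (w & ACTIVE):
            return False               # being deleted: re-drive the search
        if e.cas(w, w + 1):
            return True

def release(e):
    """Done with e: drop the count; the last task out frees a pending delete."""
    while True:
        w = e.word
        if e.cas(w, w - 1):
            w -= 1
            if (w & PENDING) and (w & COUNT) == 0:
                e.freed = True
            return

def delete(e):
    """Deleter (already holding a use count): mark pending delete, then release."""
    while True:
        w = e.word
        if e.cas(w, (w & ~ACTIVE) | PENDING):
            break
    release(e)

e = Element("x")
acquire(e)        # a reader is examining e
acquire(e)        # the deleter's own find
delete(e)         # off the chain, but the reader still holds a count
print(e.freed)    # False
release(e)        # last task out releases the storage
print(e.freed)    # True
```

The key property the thread is after falls out directly: storage is never released while any task holds a use count, and whichever task drops the count to zero after pending-delete is set does the release.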
Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Monday, November 11, 2013 9:38 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque Could you provide an insight into a design that would handle the situation where an SSI program with non-trivial workload uses a very large list. This list is normally semi-static, but there can be periods of time where entries are heavily deleted and added. Not freeing storage is not an option because it could be a significant amount of storage. Thanks, Jon Perryman. From: Tony Harminc t...@harminc.net To: IBM-MAIN@LISTSERV.UA.EDU Sent: Monday, November 11, 2013 7:07 PM Subject: Re: Serialization without Enque On 11 November 2013 20:15, Jon Perryman jperr...@pacbell.net wrote:

L  R2,QUEUE
L  R3,NEXT_ENTRY
CS R2,R3,QUEUE     New queue head

While this seems bullet proof, it's not. If there is a long
Re: Serialization without Enque
In 1384218943.19861.yahoomail...@web181001.mail.ne1.yahoo.com, on 11/11/2013 at 05:15 PM, Jon Perryman jperr...@pacbell.net said: Take for example:

L  R2,QUEUE
L  R3,NEXT_ENTRY
CS R2,R3,QUEUE     New queue head

I'm not sure what you're trying to do. Did you mean this?

LOOP L  R2,QUEUE
     L  R3,NEXT_ENTRY
     ST R2,0(,R3)       Forward link
     CS R2,R3,QUEUE     New queue head
     BNE LOOP

While this seems bullet proof, it's not. If there is a long delay between the L instructions then next entry could have been freed causing an abend. It won't cause an ABEND in the code that you showed, and it is just as serious an error to free it *after* you add it to the queue. Or did you mean this?

LOOP L  R2,QUEUE
     USING ENTRY,R2
     L  R3,NEXT_ENTRY
     DROP R2
     CS R2,R3,QUEUE     New queue head
     BNE LOOP

As with adding an entry, freeing the storage inappropriately is an error when the entry is still on the queue, not just during the CS loop. You need to think about the small details, not just the serialization. -- Shmuel (Seymour J.) Metz, SysProg and JOAT ISO position; see http://patriot.net/~shmuel/resume/brief.html We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003)
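[Editor's sketch] The corrected CS push loop above translates to a retry loop in any language. This hedged Python rendering uses a lock to simulate CS's atomicity; the assembler mnemonics appear as comments to map each line back to the loop Shmuel sketches.

```python
import threading

# Hedged Python rendering of the CS push loop: reload the head, store it
# into the new entry's forward link, compare-and-swap the head, and branch
# back to the load on failure.  A lock simulates CS's atomicity.
class Queue:
    def __init__(self):
        self.head = None
        self._lock = threading.Lock()
    def cas_head(self, expected, new):
        with self._lock:
            if self.head is expected:
                self.head = new
                return True
            return False

class Entry:
    def __init__(self, key):
        self.key, self.next = key, None

def push(q, entry):
    while True:                          # LOOP
        old = q.head                     #   L  R2,QUEUE
        entry.next = old                 #   ST R2,0(,R3)   forward link
        if q.cas_head(old, entry):       #   CS R2,R3,QUEUE
            return                       #   BNE LOOP (on failure, retry)

q = Queue()
push(q, Entry("first"))
push(q, Entry("second"))
print(q.head.key)   # second
```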
Re: Serialization without Enque
Thanks Kenneth for the explanation. It's really good stuff. All the serialization that I've seen was prior to the PLO instruction. I now see that it has certainly improved our ability to serialize. Jon Perryman. From: Kenneth Wilkerson redb...@austin.rr.com I use cell pools. I also use a proprietary storage manager that doesn't use chains. These methodologies offer me capabilities well beyond those found in traditional methods. Much of what I do is based on these capabilities, but the algorithms could easily be adapted to use a conventional storage manager that uses chains.
Re: Serialization without Enque
The example was simply to show a point without using an involved sample. At a minimum, I should have included a USING for the queue entry. Sorry for the confusion. Jon Perryman. From: Shmuel Metz (Seymour J.) shmuel+ibm-m...@patriot.net In 1384218943.19861.yahoomail...@web181001.mail.ne1.yahoo.com, on 11/11/2013 at 05:15 PM, Jon Perryman jperr...@pacbell.net said: Take for example: L R2,QUEUE L R3,NEXT_ENTRY CS R2,R3,QUEUE New queue head I'm not sure what you're trying to do. Did you mean this?
Re: Serialization without Enque
[mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Monday, November 11, 2013 9:38 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque Could you provide an insight into a design that would handle the situation where an SSI program with non-trivial workload uses a very large list. This list is normally semi-static, but there can be periods of time where entries are heavily deleted and added. Not freeing storage is not an option because it could be a significant amount of storage. Thanks, Jon Perryman. From: Tony Harminc t...@harminc.net To: IBM-MAIN@LISTSERV.UA.EDU Sent: Monday, November 11, 2013 7:07 PM Subject: Re: Serialization without Enque On 11 November 2013 20:15, Jon Perryman jperr...@pacbell.net wrote:

L  R2,QUEUE
L  R3,NEXT_ENTRY
CS R2,R3,QUEUE     New queue head

While this seems bullet proof, it's not. If there is a long delay between the L instructions then next entry could have been freed, causing an abend. If your code is referencing an item that may have been freed, then your design is wrong. While the delay is very unlikely, it is still a possibility (e.g. swapout after the first L). This example is just removing the first entry. If you try to remove an entry in the middle of the chain, you run a larger risk. Some of the techniques we use to minimize risk:

1. Abend recovery.
2. Don't overload the entry - it only contains shared information.
3. Place the entry on a timeout queue before placing it on a free queue or freeing it.
4. Copy the entry to a work area - reduces reference time of the actual entry.
5. Eyecatchers that can be verified.
6. A unique identifier for each entry that can be verified.
7. Turn off interrupts.

Most of these are indicative of incorrect design rather than of careful attention to risk. If it's unclear whether a queue entry is eligible to be worked on or even to be looked at unless you do it in a hurry, then there is something quite wrong. By all means provide abend recovery, but don't make it part of the queue handling design.
Tony H.
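[Editor's sketch] The hazard this exchange is describing — a bare compare-and-swap pop succeeding against recycled storage — is the classic ABA failure, and it can be demonstrated deterministically. In this hedged Python illustration the two tasks' steps are interleaved by hand (no threads), so the race is repeatable; the "CS" compares only the head, just as in the assembler example.

```python
# Hedged illustration of the ABA failure of a bare compare-and-swap pop.
# Task 1 stalls between its two loads; task 2 pops A, pops B, and pushes
# A back; task 1's CS then wrongly succeeds.
class Node:
    def __init__(self, key):
        self.key, self.next = key, None

def make_stack(keys):
    head = None
    for k in reversed(keys):
        n = Node(k)
        n.next = head
        head = n
    return head

stack = {"head": make_stack(["A", "B", "C"])}

def cas_head(expected, new):
    """Compare-and-swap on the head -- compares only the head, like CS."""
    if stack["head"] is expected:
        stack["head"] = new
        return True
    return False

# Task 1 begins a pop: loads the head (A) and A's next (B), then stalls
# (the "long delay between the L instructions").
old_head = stack["head"]
old_next = old_head.next

# Meanwhile task 2 pops A, pops B (whose storage may now be freed), and
# pushes A back -- so the head is A again.
a = stack["head"]; stack["head"] = a.next
b = stack["head"]; stack["head"] = b.next
a.next = stack["head"]; stack["head"] = a

# Task 1 resumes.  Its CS sees A at the head and succeeds -- installing
# the (conceptually freed) node B as the new head.
assert cas_head(old_head, old_next)
print(stack["head"].key)   # B -- a dangling entry
```

This is exactly what the use counts, timeout queues, and hazard pointers discussed elsewhere in the thread are guarding against.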
Re: Serialization without Enque
Actually, the algorithm performs well for read-often, write-rarely lists because the active chain count does not change and therefore there are relatively infrequent re-drives. The active chain count only changes on an add or delete. So if there are infrequent adds and deletes, there will be infrequent re-drives. And you are wrong, readers will not contend unless two or more tasks are referencing the exact same element simultaneously. And even then, the re-drive only involves the update to the use count. There are a lot of optimizations that I do not describe in this algorithm, for simplicity. For example, when you do a PLO DCAS, the condition code is set to indicate which compare failed. This can be used to optimize the re-drive. There are many other optimizations with re-drives to make this more efficient. And if you get away from traditional lists, there are even more optimizations. Honestly, I provided this algorithm after reading the paper on hazard pointers. The paper was written in 2002 and claimed there was no atomic DCAS, when PLO DCAS became available in 2001. So I took a much simpler algorithm that I had and modified it to use a use count to accommodate traditional storage managers, to prove that PLO could be used to manage a conventional list using a traditional storage manager and provide an SMR algorithm without the need for DU-level management structures. I don't use many conventional lists and I have a proprietary storage manager that does not use chains. Most of my PLO operations are much simpler. I would love to test this algorithm against any other SMR algorithm. My point has been, and remains, that PLO can be efficiently used to serialize any list in a lock-free manner, and even if it does take more CPU this will be offset by increased throughput. And just because UNIX has issues with PLO doesn't mean the issue is with PLO...
Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Tuesday, November 12, 2013 8:39 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque Thanks for sharing your design Ken. It seems to me that PLO is best used for data structures like double-ended queues where elements can be inserted/removed from both ends of the queue atomically. In the case of a read-often-write-rarely list with multiple readers that traverse the list it doesn't seem optimal. Correct me if I'm wrong, but readers will contend with each other. FYI, z/OS UNIX message queues can be configured to use PLO (with fallback to a latch). The documentation states that inserting/removing from the middle of a long list using PLO is a poor performer http://pic.dhe.ibm.com/infocenter/zos/v1r12/topic/com.ibm.zos.r12.bpxb100/qct.htm#qct. There isn't a one-size-fits-all solution. It depends on the usage patterns. Hopefully HTM will solve that. Depending on how Intel's Haswell TSX implementation performs in the wild, we could see HTM in our cell phones as early as next year. On 12/11/2013 11:18 PM, Kenneth Wilkerson wrote: I use cell pools. I also use a proprietary storage manager that doesn't use chains. These methodologies offer me capabilities well beyond those found in traditional methods. Much of what I do is based on these capabilities, but the algorithms could easily be adapted to use a conventional storage manager that uses chains. Here is an algorithm I use that I've adapted for traditional storage management. This algorithm will work for any list: LIFO, FIFO, ordered, or other, and for deletion from the head, middle or tail. Setup: Allocate a word in each element. Bit 0 is one for all active elements. Bit 1 is one for all elements pending deletion. Bit 2 is reserved for an extension to this algorithm (such as garbage cleanup).
The remaining bits are a use count allowing for many more DUs than are supported by MVS. Note: When PLO Compare and Swap (CS) or Double Compare and Swap (DCAS) is referenced, the PLO uses the use count address as the lock word. This will serialize all updates to the use counter for that element. For the PLO Compare and Loads or PLO update, the lock word is the active chain counter. To search: Step 1: Use the PLO Compare and Load on the active chain counter to search the chain as before. If the PLO fails, re-drive the search. Step 2: Before examining the element, increment the use count with a PLO Double Compare and Swap (DCAS). Load the first register pair with the current chain counter. The swap value will also be the current chain counter. Essentially, we're using the active chain count to serialize increments to the use count to avoid accessing an area that may have been released. The second register pair will contain the current use count with a swap value incremented by 1 using an AL to avoid resetting the high bit. If the PLO DCAS fails, the previous PLO Compare and Load (Step 1) should be re-driven
Re: Serialization without Enque
On 13/11/2013 12:34 PM, Kenneth Wilkerson wrote: Actually, the algorithm performs well for read-often, write-rarely lists because the active chain count does not change and therefore there are relatively infrequent re-drives. The active chain count only changes on an add or delete. So if there are infrequent adds and deletes, there will be infrequent re-drives. And you are wrong, readers will not contend unless two or more tasks are referencing the exact same element simultaneously. And even then, the re-drive only involves the update to the use count. Thanks, I get it now. Maybe IBM should have used PLO for the z/OS C++ shared_ptr SMR algorithm, which has both weak/strong reference counts + the pointer. They use a lock-free algorithm using CS http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2674.htm. shared_ptr implements proxy-based garbage collection. There are a lot of optimizations that I do not describe in this algorithm, for simplicity. For example, when you do a PLO DCAS, the condition code is set to indicate which compare failed. This can be used to optimize the re-drive. There are many other optimizations with re-drives to make this more efficient. And if you get away from traditional lists, there are even more optimizations. Like what? What do you replace linked lists with, arrays with offsets instead of pointers? Honestly, I provided this algorithm after reading the paper on hazard pointers. The paper was written in 2002 and claimed there was no atomic DCAS, when PLO DCAS became available in 2001. So I took a much simpler algorithm that I had and modified it to use a use count to accommodate traditional storage managers, to prove that PLO could be used to manage a conventional list using a traditional storage manager and provide an SMR algorithm without the need for DU-level management structures. I don't use many conventional lists and I have a proprietary storage manager that does not use chains. Most of my PLO operations are much simpler.
I would love to test this algorithm against any other SMR algorithm. My point has been, and remains, that PLO can be efficiently used to serialize any list in a lock-free manner, and even if it does take more CPU this will be offset by increased throughput. And just because UNIX has issues with PLO doesn't mean the issue is with PLO... UNIX doesn't have an issue with PLO. It clearly states that popping/pushing elements at the beginning of the queue is a good performer. Surely your algorithm would have the same problem if multiple producers/consumers were inserting/removing elements from the middle of a long list. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Tuesday, November 12, 2013 8:39 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque Thanks for sharing your design Ken. It seems to me that PLO is best used for data structures like double-ended queues where elements can be inserted/removed from both ends of the queue atomically. In the case of a read-often-write-rarely list with multiple readers that traverse the list it doesn't seem optimal. Correct me if I'm wrong, but readers will contend with each other. FYI, z/OS UNIX message queues can be configured to use PLO (with fallback to a latch). The documentation states that inserting/removing from the middle of a long list using PLO is a poor performer http://pic.dhe.ibm.com/infocenter/zos/v1r12/topic/com.ibm.zos.r12.bpxb100/qct.htm#qct. There isn't a one-size-fits-all solution. It depends on the usage patterns. Hopefully HTM will solve that. Depending on how Intel's Haswell TSX implementation performs in the wild, we could see HTM in our cell phones as early as next year. On 12/11/2013 11:18 PM, Kenneth Wilkerson wrote: I use cell pools. I also use a proprietary storage manager that doesn't use chains. These methodologies offer me capabilities well beyond those found in traditional methods.
Much of what I do is based on these capabilities, but the algorithms could easily be adapted to use a conventional storage manager that uses chains. Here is an algorithm I use that I've adapted for traditional storage management. This algorithm will work for any list: LIFO, FIFO, ordered, or other, and for deletion from the head, middle or tail. Setup: Allocate a word in each element. Bit 0 is one for all active elements. Bit 1 is one for all elements pending deletion. Bit 2 is reserved for an extension to this algorithm (such as garbage cleanup). The remaining bits are a use count allowing for many more DUs than are supported by MVS. Note: When PLO Compare and Swap (CS) or Double Compare and Swap (DCAS) is referenced, the PLO uses the use count address as the lock word. This will serialize all updates to the use counter for that element. For the PLO Compare and Loads or PLO update, the lock word is the active chain counter. To search: Step 1: Use
Re: Serialization without Enque
What are bakery style ticketing algorithms? Take-a-number schemes? Googling bakery style landed me among recipes/receipts for chocolate-chip cookies and cupcakes with pink icing. John Gilmore, Ashland, MA 01721 - USA
Re: Serialization without Enque
My control blocks are actually a queue. So it is my understanding from reading these updates (Thank You!) that all I really need is atomic functions (CS or PLO) to update the queue.
Re: Serialization without Enque
suggests that Hardware Transaction Memory may not be the panacea we all expect it to be, and in some cases may actually increase CPU. Of course it's true that if a transaction experiences too much contention and resorts to its fallback path, you have used more CPU than if you went directly to the fallback path. That is specifically why every use of non-constrained transactions ought to do analysis to determine if it is even theoretically beneficial. What I don't see mentioned in the article is zEC12's constrained transactions. By their very definition they need no fallback path. That is a huge benefit both in terms of complexity and development/test cost. In PLO, the hardware locking occurs according to the lock word. You seem to be assuming that the PLO implementation actually is truly locked according to the individual lock word. Maybe it is now. It definitely did not used to be. The machine would decide how to map the individually-specified lock word to (limited) hardware resources that were the true serialization mechanism. It was not necessarily one to one. Transaction Memory sounds exciting but it's complex. IBM should put a layer of abstraction on top with simple semantics. Maybe it's me, but I don't really find TBEGIN...TEND complex compared to other serializing techniques, even when you factor in PPI while counting the number of attempts before taking the fallback path. The instructions within a transaction are typically less complex than the instructions you would need without a transaction, if you could even accomplish what you're trying to do outside of a transaction. For example, there is no need for CS or PLO. Just more straightforward compare, store, etc. Peter Relson z/OS Core Technology Design
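[Editor's sketch] The non-constrained pattern Peter describes — bound the transactional attempts, then take a conventional lock as the fallback path — can be sketched in miniature. This is a hedged Python simulation, not real TBEGIN...TEND: `abort_script` is an invented hook that says which attempts abort, so the control flow is deterministic. A constrained transaction, by contrast, would need no fallback path at all.

```python
import threading

# Hedged simulation of "try the transaction N times, then fall back to a
# lock".  The "transaction" is simulated; abort_script decides which
# attempts abort, making the retry/fallback control flow explicit.
fallback_lock = threading.Lock()

def run_update(update, abort_script, max_attempts=3):
    for attempt in range(max_attempts):   # TBEGIN ... TEND, in spirit
        if not abort_script(attempt):     # transaction ran to completion
            update()
            return "transaction", attempt + 1
    with fallback_lock:                   # too much contention: fall back
        update()
    return "fallback", max_attempts

counter = {"n": 0}
def bump():
    counter["n"] += 1

# First two attempts abort; the third commits.
path, tries = run_update(bump, abort_script=lambda a: a < 2)
print(path, tries, counter["n"])   # transaction 3 1
```

The CPU-cost point in the post is visible in the structure: when every attempt aborts, the work of all `max_attempts` tries is paid *in addition to* the fallback path, which is why the attempt bound deserves analysis.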
Re: Serialization without Enque
In 8451960728331713.wa.markmzelden@listserv.ua.edu, on 11/10/2013 at 03:19 PM, Mark Zelden m...@mzelden.com said: I've not been paying that close of attention, but I'm more curious about what people did for these situations prior to PLO. CS. In some cases PLO can be more natural. -- Shmuel (Seymour J.) Metz, SysProg and JOAT ISO position; see http://patriot.net/~shmuel/resume/brief.html We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003)
Re: Serialization without Enque
On 11 Nov 2013, at 8:22 pm, John Gilmore jwgli...@gmail.com wrote: What are bakery style ticketing algorithms? Take-a-number schemes? http://en.m.wikipedia.org/wiki/Lamport's_bakery_algorithm Googling bakery style landed me among recipes/receipts for chocolate-chip cookies and cupcakes with pink icing. John Gilmore, Ashland, MA 01721 - USA
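[Editor's sketch] The linked algorithm is, indeed, a take-a-number scheme, and it is short enough to sketch. This hedged Python rendering follows the classic formulation: each thread takes a ticket one higher than any it can see, then waits for every lower ticket, with ties broken by thread number. Notably it uses only plain loads and stores — no compare-and-swap at all; CPython's GIL supplies the sequentially consistent memory the textbook version assumes.

```python
import threading

# Hedged sketch of Lamport's bakery algorithm: mutual exclusion from plain
# loads and stores.  choosing[i] marks "thread i is picking a ticket";
# ticket[i] == 0 means "not interested".
N = 2
choosing = [False] * N
ticket = [0] * N

def lock(i):
    choosing[i] = True
    ticket[i] = 1 + max(ticket)            # take a number
    choosing[i] = False
    for j in range(N):
        if j == i:
            continue
        while choosing[j]:                 # wait while j picks its ticket
            pass
        while ticket[j] != 0 and (ticket[j], j) < (ticket[i], i):
            pass                           # defer to lower (ticket, id) pairs

def unlock(i):
    ticket[i] = 0

counter = {"n": 0}

def worker(i, loops=100):
    for _ in range(loops):
        lock(i)
        counter["n"] += 1                  # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["n"])   # 200 (2 threads x 100 increments, no lost updates)
```

On real hardware without sequential consistency the loads and stores would need fences, which is part of why CS/PLO-style primitives displaced pure load/store schemes in practice.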
Re: Serialization without Enque
Small problem with the url: the bold, blue, underlining does not extend to the end of the url. Copy the whole character string and paste it into your web browser instead of simply clicking on the bold, blue, underlined part of the url. Bill Fairchild Franklin, TN - Original Message - From: David Crayford dcrayf...@gmail.com To: IBM-MAIN@LISTSERV.UA.EDU Sent: Monday, November 11, 2013 7:22:38 AM Subject: Re: Serialization without Enque On 11 Nov 2013, at 8:22 pm, John Gilmore jwgli...@gmail.com wrote: What are bakery style ticketing algorithms? Take-a-number schemes? http://en.m.wikipedia.org/wiki/Lamport's_bakery_algorithm Googling bakery style landed me among recipes/receipts for chocolate-chip cookies and cupcakes with pink icing. John Gilmore, Ashland, MA 01721 - USA
Re: Serialization without Enque
Can you guys tell me what you consider the appropriate / safe way to enqueue/dequeue a resource being shared. Scott ford www.identityforge.com from my IPAD 'Infinite wisdom through infinite means' On Nov 11, 2013, at 8:48 AM, DASDBILL2 dasdbi...@comcast.net wrote: Small problem with the url: the bold, blue, underlining does not extend to the end of the url. Copy the whole character string and paste it into your web browser instead of simply clicking on the bold, blue, underlined part of the url. Bill Fairchild Franklin, TN - Original Message - From: David Crayford dcrayf...@gmail.com To: IBM-MAIN@LISTSERV.UA.EDU Sent: Monday, November 11, 2013 7:22:38 AM Subject: Re: Serialization without Enque On 11 Nov 2013, at 8:22 pm, John Gilmore jwgli...@gmail.com wrote: What are bakery style ticketing algorithms? Take-a-number schemes? http://en.m.wikipedia.org/wiki/Lamport's_bakery_algorithm Googling bakery style landed me among recipes/receipts for chocolate-chip cookies and cupcakes with pink icing. John Gilmore, Ashland, MA 01721 - USA
Re: Serialization without Enque
On 11/11/2013 9:48 PM, DASDBILL2 wrote: Small problem with the url: the bold, blue, underlining does not extend to the end of the url. Copy the whole character string and paste it into your web browser instead of simply clicking on the bold, blue, underlined part of the url. I copied that link on my iPhone. Here's the original. http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm Bill Fairchild Franklin, TN - Original Message - From: David Crayford dcrayf...@gmail.com To: IBM-MAIN@LISTSERV.UA.EDU Sent: Monday, November 11, 2013 7:22:38 AM Subject: Re: Serialization without Enque On 11 Nov 2013, at 8:22 pm, John Gilmore jwgli...@gmail.com wrote: What are bakery style ticketing algorithms? Take-a-number schemes? http://en.m.wikipedia.org/wiki/Lamport's_bakery_algorithm Googling bakery style landed me among recipes/receipts for chocolate-chip cookies and cupcakes with pink icing. John Gilmore, Ashland, MA 01721 - USA
Re: Serialization without Enque
I read the article. It's still based on a CAS, which I don't necessarily consider simpler than PLO. In this article, it states "Each hazard pointer can be written only by its associated thread, but can be read by all threads." This is exactly the provision I state in my example. It's a key element of any serialization algorithm that allows concurrent lock-free operations. Hazard pointers seem like an elegant general solution. I'll have to explore it, but I've not had issues with any of the solutions I've written using other methods. I write a solution appropriate to the usage of the serialized resource. The key is a rigid protocol that has to always be followed. In my e-mail last night I meant transactional execution facility, not transactional event facility. I've thoroughly read the POM on this facility but have not to date had access to a processor to experiment with it. From my reading, the transactional execution facility (TM) appears to be a way to implement almost all requirements for concurrent read/write access to a serialized resource. Essentially, from my understanding of the POM, it's a CAS on as many operands as necessary within a hardware-defined limit. In my very complex application, there are a couple of scenarios where I cannot use PLO to serialize access. I have no problem using multi-step PLO operations for serialization as long as the integrity of the resource is guaranteed after each step. For example, a delete that consists of a PLO to remove from the active chain and a separate PLO to add the removed element to a free chain. However, some operations are too complex for a PLO CAS and triple store, even if the operands are organized in storage such that you can modify 128 bits at a time. In these cases, I use a gate word and a spin lock. When available, the gate is 0. When in use, the gate contains identification information for the gate owner. I very rarely have to use these.
And if I had a TM facility like the transactional execution facility, I would replace this spin lock with it. Normally, there are only a handful of instructions within the gate, so this has never caused me any problems. In a sense, all methods (LL/CS, TM, etc.) are spin locks. If they don't succeed, try, try again. In all methods that I know of, the hardware must perform a memory serialization function. I use PLO instead of CS not only because of the increased functionality, such as modifying noncontiguous areas and being able to modify up to four 128-bit areas, but also because I believe the PLO lock word is an advantage. In all hardware serialization methods that I know of, a memory serialization function is required during the LL/CS or TM. These serializing functions can be expensive. CS is not granular, and the serialization proceeds without regard to accesses by other CPUs, meaning the overhead occurs whether the function succeeds or fails. The PLO lock word gates access by all processors using the same lock word, thus reducing the total number of serializations performed by stopping a processor performing a PLO using the same lock word. This requires careful selection of lock words and use of the same lock word by all processes that read/write to the same resource. This advantage can be negated in a queue that has a substantial percentage of write operations compared to read operations, because write operations will necessarily result in a PLO failure. I believe this is the disadvantage referred to in the first paper on TM by Paul McKenney that you posted. I suspect that IBM is using TM to replace locks that, in most cases, were used to serialize storage alterations. In this case, CPU would increase but so would throughput. I believe this is a classic example of trading the less expensive CPU resource for the more expensive throughput.
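The gate-word protocol described above can be sketched in C11 atomics (a rough analogue, not Kenneth's code; the names and types are mine). Zero means the gate is free; a non-zero value carries the owner's identification so a dump shows who holds it:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/* gate word: 0 = free; non-zero = identification of the current owner */
static _Atomic unsigned gate = 0;

static void gate_enter(unsigned owner_id)
{
    unsigned expected = 0;
    /* spin until we swap our id into a free (zero) gate */
    while (!atomic_compare_exchange_weak(&gate, &expected, owner_id))
        expected = 0;   /* a failed CAS overwrote 'expected'; reset it */
}

static void gate_leave(unsigned owner_id)
{
    unsigned expected = owner_id;
    /* only the owner may open the gate */
    bool ok = atomic_compare_exchange_strong(&gate, &expected, 0u);
    assert(ok);
    (void)ok;
}
```

Since only a handful of instructions normally sit inside the gate, the spin is short-lived in practice, which matches Kenneth's experience above.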
Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Sunday, November 10, 2013 8:56 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque On 11/11/2013 10:36 AM, Kenneth Wilkerson wrote: I read the article. This article is about transactional event facility introduced in z/EC-12 and not PLO which is an LL/CS. I wish I had access to a z/EC-12 with the transactional event facility to play with it and compare it to PLO. The transactional event facility is much more comprehensive and not as granular as a PLO. In PLO, the hardware locking occurs according to the lock word. Transaction Memory sounds exciting but it's complex. IBM should put a layer of abstraction on top with simple semantics. I've done a lot of testing with PLO. It can increase CPU, particularly in a situation where updates are much higher percentage of the operations. But in all applications that I've tested, it's CPU overhead is offset by higher throughput. In a traditional locking method, tasks end up serializing to the lock. There are lots
Re: Serialization without Enque
In PLO, the hardware locking occurs according to the lock word. The POM is not specific about the lock word other than a transformation occurs to generate a PLT logical address used to acquire a lock. However, this does not affect its application. The key point is that 2 or more processors executing PLO simultaneously can access or alter the values using a PLO instruction and the operation will occur as if one PLO followed the other. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Monday, November 11, 2013 7:09 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque suggests that Hardware Transaction Memory may not be the panacea we all expect it to be, and in some cases may actually increase CPU Of course it's true that if a transaction experiences too much contention and resorts to its fallback path, you have used more CPU than if you went directly to the fallback path. That is specifically why every use of non-constrained transactions ought to do analysis to determine if it is even theoretically beneficial. What I don't see mentioned in the article is zEC12's constrained transactions. By their very definition they need no fallback path. That is a huge benefit both in terms of complexity and development/test cost. In PLO, the hardware locking occurs according to the lock word. You seem to be assuming that PLO implementation actually is truly locked according to the individual lock word. Maybe it is now. It definitely did not used to be. The machine would decide how to map the individually-specified lock word to (limited) hardware resources that were the true serialization mechanism. It was not necessarily one to one. Transaction Memory sounds exciting but it's complex. IBM should put a layer of abstraction on top with simple semantics. 
Maybe it's me, but I don't really find TBEGIN...TEND complex compared to other serializing techniques even when you factor in PPI while counting the number of attempts before taking the fallback path. The instructions within a transaction are typically less complex than the instructions you would need without a transaction, if you could even accomplish what you're trying to do outside of a transaction. For example, there is no need for CS, PLO. Just more straightforward compare, store, etc. Peter Relson z/OS Core Technology Design -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
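Peter's point that the machine may fold many program-specified lock words onto limited hardware resources can be illustrated with a toy hash (purely hypothetical; not IBM's actual mapping): two unrelated lock words that hash alike land on the same internal lock and serialize against each other even though the program chose distinct lock words.

```c
#include <stdint.h>

#define NLOCKS 16u   /* imagined small pool of internal hardware locks */

/* Map a program-chosen lock-word address onto one of NLOCKS internal
 * locks. Distinct lock words can collide on one real lock, so the
 * serialization is not necessarily one-to-one with the lock word. */
static unsigned lock_index(const void *lockword)
{
    uintptr_t a = (uintptr_t)lockword;
    return (unsigned)((a >> 4) % NLOCKS);  /* drop low bits, then fold */
}
```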
Re: Serialization without Enque
Appropriate / safe is in the eye of the beholder. I personally see serialization as reducing the risks as much as possible. To eliminate the risk, you would have to orphan the entry. There were recommendations of moving entries to a free chain and reusing them. This has risks too because you simply don't know when everyone has finished with an entry. Only maintaining a use count would ensure it's no longer in use, but that has risks too. We do what we must to get the job done, so we make the exposure as small as we possibly can. Take for example:

L  R2,QUEUE
L  R3,NEXT_ENTRY
CS R2,R3,QUEUE      New queue head

While this seems bulletproof, it's not. If there is a long delay between the L instructions, then the next entry could have been freed, causing an abend. While the delay is very unlikely, it is still a possibility (e.g. swapout after the first L). This example is just removing the first entry. If you try to remove an entry in the middle of the chain, you run a larger risk. Some of the techniques we use to minimize risk:

1. Abend recovery.
2. Don't overload the entry - it only contains shared information.
3. Place the entry on a timeout queue before placing it on a free queue or freeing it.
4. Copy the entry to a workarea - reduces the reference time of the actual entry.
5. Eyecatchers that can be verified.
6. A unique identifier for each entry that can be verified.
7. Turn off interrupts.

This is not a complete list. It's only meant to give you an idea of how complex the process can be. You need to think about the small details. Even the support staff is considered a risk because they may not fully understand the serialization requirements you implement. Jon Perryman. From: Scott Ford scott_j_f...@yahoo.com Can you guys tell me what you consider the appropriate / safe way to enqueue / dequeue a resource being shared -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
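The window Jon describes can be made concrete with a C11 analogue of the L/L/CS pop above (a sketch; the struct and names are illustrative, not from the post). Between loading the head and loading head->next, another task may pop and free the element, so the second load can fault; worse, if the storage is recycled as a new head, the CAS can succeed on a stale next pointer (the classic ABA problem):

```c
#include <stdatomic.h>
#include <stddef.h>

struct node { struct node *next; };

static struct node *_Atomic queue;      /* head of the chain */

static struct node *pop(void)
{
    struct node *head = atomic_load(&queue);        /* L  R2,QUEUE      */
    while (head != NULL) {
        struct node *next = head->next;             /* L  R3,NEXT_ENTRY */
        /* danger: 'head' may already have been freed at this point     */
        if (atomic_compare_exchange_weak(&queue, &head, next))  /* CS   */
            break;
        /* a failed CAS reloaded 'head'; loop and retry                 */
    }
    return head;
}
```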
Re: Serialization without Enque
On 11/11/2013 9:09 PM, Peter Relson wrote: Maybe it's me, but I don't really find TBEGIN...TEND complex compared to other serializing techniques even when you factor in PPI while counting the number of attempts before taking the fallback path. The instructions within a transaction are typically less complex than the instructions you would need without a transaction, if you could even accomplish what you're trying to do outside of a transaction. For example, there is no need for CS, PLO. Just more straightforward compare, store, etc. I agree, it's certainly much easier but it's still not *simple*. It requires several pages in PoPs to describe it, there's a parameter list, a diagnostic block, etc. That's not rocket science but I would prefer a compiler or libraries to do the hard work. Maybe I'm asking for too much but there is already a draft specification to make transactions quite simple from high-level languages http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3341.pdf. Some compilers have supported experimental constructs for transactional memory for a while http://gcc.gnu.org/wiki/TransactionalMemory. The IBM C/C++ compiler already supports HTM instruction primitives as built-ins. I'm looking forward to when they implement the standard (Intel already has an experimental implementation of the draft standard for Haswell). -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
On 11 November 2013 20:15, Jon Perryman jperr...@pacbell.net wrote:

L  R2,QUEUE
L  R3,NEXT_ENTRY
CS R2,R3,QUEUE      New queue head

While this seems bulletproof, it's not. If there is a long delay between the L instructions then the next entry could have been freed, causing an abend.

If your code is referencing an item that may have been freed, then your design is wrong.

While the delay is very unlikely, it is still a possibility (e.g. swapout after first L). This example is just removing the first entry. If you try to remove an entry in the middle of the chain, you run a larger risk. Some of the techniques we use to minimize risk: 1. Abend recovery 2. don't overload the entry - only contains shared information. 3. Place entry on a timeout queue before placing on a free queue or freeing. 4. Copy the entry to workarea - reduces reference time of actual entry 5. Eyecatchers that can be verified. 6. Unique identifier for each entry that can be verified. 7. Turn off interrupts.

Most of these are indicative of incorrect design rather than of careful attention to risk. If it's unclear whether a queue entry is eligible to be worked on or even to be looked at unless you do it in a hurry, then there is something quite wrong. By all means provide abend recovery, but don't make it part of the queue handling design. Tony H. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
Could you provide insight into a design that would handle the situation where an SSI program with a non-trivial workload uses a very large list. This list is normally semi-static but there can be periods of time where entries are heavily deleted and added. Never freeing storage is not an option because it could be a significant amount of storage. Thanks, Jon Perryman. From: Tony Harminc t...@harminc.net To: IBM-MAIN@LISTSERV.UA.EDU Sent: Monday, November 11, 2013 7:07 PM Subject: Re: Serialization without Enque On 11 November 2013 20:15, Jon Perryman jperr...@pacbell.net wrote:

L  R2,QUEUE
L  R3,NEXT_ENTRY
CS R2,R3,QUEUE      New queue head

While this seems bulletproof, it's not. If there is a long delay between the L instructions then the next entry could have been freed, causing an abend. If your code is referencing an item that may have been freed, then your design is wrong. While the delay is very unlikely, it is still a possibility (e.g. swapout after first L). This example is just removing the first entry. If you try to remove an entry in the middle of the chain, you run a larger risk. Some of the techniques we use to minimize risk: 1. Abend recovery 2. don't overload the entry - only contains shared information. 3. Place entry on a timeout queue before placing on a free queue or freeing. 4. Copy the entry to workarea - reduces reference time of actual entry 5. Eyecatchers that can be verified. 6. Unique identifier for each entry that can be verified. 7. Turn off interrupts. Most of these are indicative of incorrect design rather than of careful attention to risk. If it's unclear whether a queue entry is eligible to be worked on or even to be looked at unless you do it in a hurry, then there is something quite wrong. By all means provide abend recovery, but don't make it part of the queue handling design. Tony H.
-- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
In caarmm9qm+goqco6p3nef5jsy1miorusy_c8ajjjoxhpeiry...@mail.gmail.com, on 11/07/2013 at 08:55 PM, Tony Harminc t...@harminc.net said: It serializes happily against all the CS variations, TS, and the newer interlocked-update instructions like ASI, LAA, and so on. It's a bit more complicated than that; the behavior of ASI depends on whether interlocked-access facility 1 is available. IAC, the point remains that the user of a synchronization facility needs to understand the rules and should not expect interoperability between unrelated facilities, e.g., latches and semaphores. -- Shmuel (Seymour J.) Metz, SysProg and JOAT ISO position; see http://patriot.net/~shmuel/resume/brief.html We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
From time to time Shmuel makes a seminal, crucial contribution; and begin extract the point remains that the user of a synchronization facility needs to understand the rules and should not expect interoperability between unrelated facilities, e.g., latches and semaphores. /end extract is one of them. This thread has been a sort of celebration of misconceived expectations. John Gilmore, Ashland, MA 01721 - USA -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
In 20131109.144735.898...@webmail11.dca.untd.com, on 11/09/2013 at 07:47 PM, esst...@juno.com esst...@juno.com said: It would be nice if someone would actually post some working code using a PLO instruction, It would be even nicer if IBM would add some PLO examples to Appendix A of PoOps. -- Shmuel (Seymour J.) Metz, SysProg and JOAT ISO position; see http://patriot.net/~shmuel/resume/brief.html We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
On Sat, 9 Nov 2013 19:47:35 GMT, esst...@juno.com esst...@juno.com wrote: I have been reading and following this thread since PLO is not an instruction I use every day. It would be nice if someone would actually post some working code using a PLO instruction, to illustrate how one would add an element to a queue and remove an element from a queue. Paul D'Angelo I've not been paying that close of attention, but I'm more curious about what people did for these situations prior to PLO. Mark -- Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS mailto:m...@mzelden.com ITIL v3 Foundation Certified Mark's MVS Utilities: http://www.mzelden.com/mvsutil.html Systems Programming expert at http://search390.techtarget.com/ateExperts/ -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
On 11/11/2013 5:19 AM, Mark Zelden wrote: On Sat, 9 Nov 2013 19:47:35 GMT, esst...@juno.com esst...@juno.com wrote: I have been reading and following this thread since PLO is not an instruction I use every day. It would be nice if someone would actually post some working code using a PLO instruction, to illustrate how one would add an element to a queue and remove an element from a queue. Paul D'Angelo I've not been paying that close of attention, but I'm more curious about what people did for these situations prior to PLO. They used smart algorithms using the atomic instructions they had, like RCU http://en.wikipedia.org/wiki/Read-copy-update. It's interesting that I have never seen any use of the PLO instruction in the zLinux kernel code. Paul McKenney, IBM's expert on these things, wrote a good article that suggests that Hardware Transaction Memory may not be the panacea we all expect it to be, and in some cases may actually increase CPU http://paulmck.livejournal.com/31285.html. Mark -- Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS mailto:m...@mzelden.com ITIL v3 Foundation Certified Mark's MVS Utilities: http://www.mzelden.com/mvsutil.html Systems Programming expert at http://search390.techtarget.com/ateExperts/ -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
I read the article. This article is about the transactional event facility introduced in the zEC12 and not PLO, which is an LL/CS. I wish I had access to a zEC12 with the transactional event facility to play with it and compare it to PLO. The transactional event facility is much more comprehensive and not as granular as a PLO. In PLO, the hardware locking occurs according to the lock word. I've done a lot of testing with PLO. It can increase CPU, particularly in a situation where updates are a much higher percentage of the operations. But in all applications that I've tested, its CPU overhead is offset by higher throughput. In a traditional locking method, tasks end up serializing on the lock. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Sunday, November 10, 2013 6:50 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque On 11/11/2013 5:19 AM, Mark Zelden wrote: On Sat, 9 Nov 2013 19:47:35 GMT, esst...@juno.com esst...@juno.com wrote: I have been reading and following this thread since PLO is not an instruction I use every day. It would be nice if someone would actually post some working code using a PLO instruction, to illustrate how one would add an element to a queue and remove an element from a queue. Paul D'Angelo I've not been paying that close of attention, but I'm more curious about what people did for these situations prior to PLO. They used smart algorithms using the atomic instructions they had, like RCU http://en.wikipedia.org/wiki/Read-copy-update. It's interesting that I have never seen any use of the PLO instruction in the zLinux kernel code. Paul McKenney, IBM's expert on these things, wrote a good article that suggests that Hardware Transaction Memory may not be the panacea we all expect it to be, and in some cases may actually increase CPU http://paulmck.livejournal.com/31285.html.
Mark -- Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS mailto:m...@mzelden.com ITIL v3 Foundation Certified Mark's MVS Utilities: http://www.mzelden.com/mvsutil.html Systems Programming expert at http://search390.techtarget.com/ateExperts/ -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
On 11/11/2013 10:36 AM, Kenneth Wilkerson wrote: I read the article. This article is about the transactional event facility introduced in the zEC12 and not PLO, which is an LL/CS. I wish I had access to a zEC12 with the transactional event facility to play with it and compare it to PLO. The transactional event facility is much more comprehensive and not as granular as a PLO. In PLO, the hardware locking occurs according to the lock word. Transaction Memory sounds exciting but it's complex. IBM should put a layer of abstraction on top with simple semantics. I've done a lot of testing with PLO. It can increase CPU, particularly in a situation where updates are a much higher percentage of the operations. But in all applications that I've tested, its CPU overhead is offset by higher throughput. In a traditional locking method, tasks end up serializing on the lock. There are lots of lock-free algorithms out there that do a very good job using just a simple CAS: RCU and hazard pointers, to name but a few. Hazard pointers are interesting in how they deal with ownership http://www.research.ibm.com/people/m/michael/podc-2002.pdf (IBM patent warning!). PLO is a fine instruction. It makes it easy to implement lock-free multi-producer multi-consumer stacks/queues. I'm interested: how would one use PLO to implement a fair reader/writer lock? I've seen some interesting bakery style ticketing algorithms. They're basically spinlocks on steroids. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Sunday, November 10, 2013 6:50 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque On 11/11/2013 5:19 AM, Mark Zelden wrote: On Sat, 9 Nov 2013 19:47:35 GMT, esst...@juno.com esst...@juno.com wrote: I have been reading and following this thread since PLO is not an instruction I use every day.
It would be nice if someone would actually post some working code using a PLO instruction, to illustrate how one would add an element to a queue and remove an element from a queue. Paul D'Angelo I've not been paying that close of attention, but I'm more curious about what people did for these situations prior to PLO. They used smart algorithms using the atomic instructions they had, like RCU http://en.wikipedia.org/wiki/Read-copy-update. It's interesting that I have never seen any use of the PLO instruction in the zLinux kernel code. Paul McKenney, IBM's expert on these things, wrote a good article that suggests that Hardware Transaction Memory may not be the panacea we all expect it to be, and in some cases may actually increase CPU http://paulmck.livejournal.com/31285.html. Mark -- Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS mailto:m...@mzelden.com ITIL v3 Foundation Certified Mark's MVS Utilities: http://www.mzelden.com/mvsutil.html Systems Programming expert at http://search390.techtarget.com/ateExperts/ -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
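One "bakery style" scheme of the kind David mentions is the ticket lock: take a number with an atomic add, then wait until the serving counter reaches it, which grants the lock in strict FIFO order. A minimal C11 sketch (my names, not from the thread); a fair reader/writer variant would add a separate counter for readers:

```c
#include <stdatomic.h>

typedef struct {
    _Atomic unsigned next_ticket;   /* next number to hand out       */
    _Atomic unsigned serving;       /* number currently being served */
} ticket_lock;

static void tl_lock(ticket_lock *l)
{
    unsigned my = atomic_fetch_add(&l->next_ticket, 1u); /* take a number */
    while (atomic_load(&l->serving) != my)
        ;                                  /* spin until our number is up */
}

static void tl_unlock(ticket_lock *l)
{
    atomic_fetch_add(&l->serving, 1u);     /* call the next number */
}
```

Unlike a plain CS-on-one-word spinlock, waiters are granted the lock in arrival order, which is what makes the scheme "fair".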
Re: Serialization without Enque
I see a flaw in my logic... I need to use PLO to serialize all this time. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Serialization without Enque
I have been reading and following this thread since PLO is not an instruction I use every day. It would be nice if someone would actually post some working code using a PLO instruction, to illustrate how one would add an element to a queue and remove an element from a queue. Paul D'Angelo -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
*********************************************************************
* SOMEONE ELSE HAS UPDATED THE CHAIN. WE NEED TO RETURN THE BUILT
* ELEMENT TO THE FREE CHAIN AND REDRIVE. REDRIVES ARE VERY INFREQUENT
*********************************************************************
ADDTOFREE DS    0H
          XC    ELEMENT_PTRS,ELEMENT_PTRS
          LG    R14,QUEUE_AVCOUNTERS   GET COUNTER FOR AVAILABLE QUEUE
          LGR   R15,R14                COPY COUNTER TO ADJUST
          LA    R1,QUEUE_AVCOUNTERS    COUNTER IS LOCK WORD
          LG    R2,QUEUE_AVAILQ        GET CURRENT FREE ELEMENT HEAD
          STG   R2,ELEMENT_NEXT        SET CURR HEAD AS NEXT
          ALG   R15,INCCHANGES         INCR CHANGES BY 1 HIGH WORD
          AHI   R15,1                  INCR ELEMENTS IN FREE BY 1
          LA    R0,10                  PLO DOUBLE COMPARE AND SWAP
          PLO   R14,0(R1),R2,QUEUE_AVAILQ  ADD TO FREE ELEMENTS
          JNZ   ADDTOFREE
          J     SEARCH_REDRIVE
*********************************************************************
* COME HERE IF ELEMENT FOUND - PROCESS IS DONE
*********************************************************************
FOUND     DS    0H
*         R3 HAS FOUND ELEMENT
          ...
*********************************************************************
* ERROR ROUTINE FOR OUT OF SPACE
*********************************************************************
WEREOUTOFSPACE DS 0H
          ...

Hopefully I didn't translate something incorrectly or mistype it, but this is a working algorithm that gets called the first time any DU requests any service from this server. It's used by termination to ensure all resources are returned when a DU ends. -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of esst...@juno.com Sent: Saturday, November 09, 2013 1:48 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Serialization without Enque I have been reading and following this thread since PLO is not an instruction I use every day. It would be nice if someone would actually post some working code using a PLO instruction, to illustrate how one would add an element to a queue and remove an element from a queue. Paul D'Angelo -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
applicable to 99%+ of all serialization scenarios you encounter To be frank, you might not have very complex serialization requirements. Also, using PLO when CS,CSG,CDS,CDSG would do is a significant waste of cycles. For the cases I have seen within our code, uses of PLO (in the cases where it is not better to use something simpler) are a tiny percentage of our serialization needs. When the updating process wakes up S0C4! Using PLO to update a free queue, as is the case with CPOOL and its CDS-based free-queue protocol, requires that the queue elements *never* be freed (unless you like potentially blowing up or, worse, overlaying something you didn't intend to write into). Perhaps this is not well understood. I really don't see the big deal with an 0c4 in this scenario (should happen rarely) That's a scary statement. If you get an 0C4 you could probably deal with it. The real risk is the case where you don't get an 0C4 because the storage was re-allocated and used for something else and now it does not program check but overlays something. I think I figured out a solution: There are a lot of details missing in what was shown, but if you want my honest suspicion, it's that if this is a free queue type of protocol, it will not work. The reason that the free queue protocol needs a sequence number is because even if the header matches, the values that you need to put into your new element for the next and/or previous may no longer be correct due to a set of additional asynchronous operations. Peter Relson z/OS Core Technology Design -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
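Peter's sequence-number point can be sketched in C11 by packing the free-chain head and a generation count into one 64-bit word covered by a single CAS (my simplification of the CDS-based free-queue protocol he describes; a real z implementation would use CDS/CDSG on a doubleword header). Even when the head index looks unchanged after a pop/push cycle by another task, the bumped sequence makes the compare fail, which is exactly what protects the protocol:

```c
#include <stdatomic.h>
#include <stdint.h>

#define NIL 0xffffffffu                 /* end-of-chain marker */

static uint32_t next_of[8];             /* next links, indexed by node number    */
static _Atomic uint64_t head;           /* low 32: head index; high 32: sequence */

static uint64_t make_head(uint32_t idx, uint32_t seq)
{
    return ((uint64_t)seq << 32) | idx;
}

static uint32_t pop_free(void)
{
    uint64_t old = atomic_load(&head);
    for (;;) {
        uint32_t idx = (uint32_t)old;
        if (idx == NIL)
            return NIL;                 /* free chain is empty */
        uint32_t seq = (uint32_t)(old >> 32);
        /* new head = popped node's successor, with the sequence bumped
         * so any concurrent pop/push cycle fails the compare (no ABA) */
        uint64_t newhead = make_head(next_of[idx], seq + 1u);
        if (atomic_compare_exchange_weak(&head, &old, newhead))
            return idx;
        /* a failed CAS refreshed 'old'; retry */
    }
}
```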
Re: Serialization without Enque
I really don't see the big deal with an 0c4 in this scenario (should happen rarely) You misunderstood my point. You could use PLO to serialize a chain even if the areas are released as they are deleted, provided you always use PLO Compare and Load to load the pointers and recovery sets a retry to re-drive the process. As long as the count is updated each time you do a delete (and release), if the delete occurs while some other management function is being performed, the PLO Compare and Load will force the redrive either by CC or 0C4. If the storage had been reallocated but is still in the same key, the PLO would fail because the count has changed. PLO may fetch operands before the lock and memory serialization, so an 0C4 can occur on the PLO Compare and Load. Either way, the PLO does not store, so there is never an overlay. I would never design an application to use PLO in this fashion. However, if I had an existing application, I would find this methodology more desirable than getting a CMS lock everywhere I access the chain. I stand by my statement that you can serialize 99+% of all scenarios using PLO and that PLO serialization is much more desirable than locks. If this were not the case, why bother creating the transactional execution facility? Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Friday, November 08, 2013 8:03 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque applicable to 99%+ of all serialization scenarios you encounter To be frank, you might not have very complex serialization requirements. Also, using PLO when CS,CSG,CDS,CDSG would do is a significant waste of cycles. For the cases I have seen within our code, uses of PLO (in the cases where it is not better to use something simpler) are a tiny percentage of our serialization needs. When the updating process wakes up S0C4!
Using PLO to update a free queue, as is the case with CPOOL and its CDS-based free-queue protocol, requires that the queue elements *never* be freed (unless you like potentially blowing up or, worse, overlaying something you didn't intend to write into). Perhaps this is not well understood. I really don't see the big deal with an 0c4 in this scenario (should happen rarely) That's a scary statement. If you get an 0C4 you could probably deal with it. The real risk is the case where you don't get an 0C4 because the storage was re-allocated and used for something else and now it does not program check but overlays something. I think I figured out a solution: There are a lot of details missing in what was shown, but if you want my honest suspicion, it's that if this is a free queue type of protocol, it will not work. The reason that the free queue protocol needs a sequence number is because even if the header matches, the values that you need to put into your new element for the next and/or previous may no longer be correct due to a set of additional asynchronous operations. Peter Relson z/OS Core Technology Design -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
-Original Message- From: Kenneth Wilkerson [mailto:redb...@austin.rr.com] Sent: Friday, November 08, 2013 8:46 AM To: 'IBM Mainframe Discussion List' Subject: RE: Serialization without Enque I really don't see the big deal with an 0c4 in this scenario (should happen rarely) You misunderstood my point. You could use PLO to serialize a chain even if the areas are released as they are deleted provided you always use PLO Compare and Load to load the pointers and recovery sets a retry to re-drive the process. As long as the count is updated each time you do a delete (and release), if the delete occurs while some other management function is being performed, the PLO Compare and Load will force the redrive either by CC or 0c4. If the storage had been reallocated but still in the same key, the PLO would fail because the count has changed. PLO may fetch operands before the lock and memory serialization so an 0c4 can occur on the PLO Compare and Load. Either way, the PLO does not store so there is never an overlay. I would never design an application to use PLO in this fashion. However, if I had an existing application, I find this methodology more desirable than getting a CMS lock everywhere I access the chain. I stand by my statement that you can serialize 99+% of all scenarios using PLO and that P:LO serialization is much more desirable than locks. If this were not the case, why bother creating the transactional execution facility? Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Peter Relson Sent: Friday, November 08, 2013 8:03 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque applicable to 99%+ of all serialization scenarios you encounter To be frank, you might not have very complex serialization requirements. Also, using PLO when CS,CSG,CDS,CDSG would do is a significant waste of cycles. 
For the cases I have seen within our code, uses of PLO (in the cases where it is not better to use something simpler) are a tiny percentage of our serialization needs.

Peter Relson
z/OS Core Technology Design
Re: Serialization without Enque
Thank you for your help (all of you), but Peter's statement below does not make sense to me (maybe because I don't understand something).

The reason that the free queue protocol needs a sequence number is because even if the header matches, the values that you need to put into your new element for the next and/or previous may no longer be correct due to a set of additional asynchronous operations.

I use PLO to add an element to a chain. The chain only has forward pointers. I always add to the end of the chain. I use storage (not CPOOL) to get the element. It turns out that I haven't gotten the PLO instruction to work properly yet, but in theory, in my scenario, it seems to me that if the pointer to the last element is pointing to an element (not 0) I should be able to store the next pointer and the last pointer in one PLO CSDST. Here is the actual (not working) code... It is not updating the chain properly:

CSAMSEGL == Last element on the chain
CSAMSEGF == First element on the chain
R8       == Address of element to add
MSEGNEXT == Pointer to next element in last control block
MSEGCB   == element name

DOX078   DS    0H
*C IF CSAMSEGL EQ 0 THEN (no elements on the chain)
         XR    R4,R4
         XR    R5,R5
         LR    R2,R8              R8: ADDRESS OF MSEGCB
         LR    R3,R8
*C SET CSAMSEGL = CSAMSEGF = MSEGCB JUST BUILT
         CDS   R4,R2,CSAMSEGF     IF CSAMSEGF CSAMSEGL = 0, STM 2,3,CSAMSEGF
         BC    4,ELIFX076
*C ELSE
         B     EIFX076
ELIFX076 DS    0H
*C IF CSAMSEGL = CSAMSEGL (R2)
*C SET CSAMSEGL = POINTER_TO_NEW_MSEG (R8)
*C SET MSEGNEXT = POINTER_TO_NEW_MSEG (R8)
CSDST    EQU   16
         L     R2,CSAMSEGL
         LA    R0,CSDST
         LA    R1,PLT
         LR    R3,R8
         LA    R4,CSAMSEGL        CSAMSEGL IS IN CSA
         ST    R4,OPERAND6
         LA    R4,MSEGNEXT        MSEGNEXT IS IN CSA
         ST    R4,OPERAND4
         ST    R8,OPERAND3
         ST    R8,OPERAND5
         PLO   R2,CSAMSEGL,0,PL
* THE FIRST-OPERAND COMPARISON VALUE AND THE SECOND OPERAND ARE
* COMPARED. IF THEY ARE EQUAL, THE FIRST-OPERAND REPLACEMENT VALUE
* IS STORED AT THE SECOND-OPERAND LOCATION, AND THE THIRD OPERAND IS
* STORED AT THE FOURTH-OPERAND LOCATION. THEN, THE FIFTH OPERAND IS
* STORED AT THE SIXTH-OPERAND LOCATION.
         BNZ   DOX078
EIFX076  DS    0H
PLT      DS    D                  PLO LOCK TOKEN
PL       DS    0F                 PARAMETER LIST
         ORG   PL+60
OPERAND3 DS    A                  NEW MSEG ADDRESS
         ORG   PL+76
OPERAND4 DS    A                  ADDRESS OF CSAMSEGL
         ORG   PL+92
OPERAND5 DS    A                  NEW MSEG ADDRESS
         ORG   PL+108
OPERAND6 DS    A                  ADDRESS OF MSEGNEXT
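For reference, the tail-append the fragment above is attempting with PLO compare-and-swap-and-double-store can be modeled like this (a Python sketch under my reading of the code, with illustrative names; the real instruction performs the compare and all the stores as one serialized unit, which a single-threaded model cannot show):

```python
class Mseg:
    """An element with only a forward pointer (MSEGNEXT)."""
    def __init__(self):
        self.next = None

class ChainHeader:
    """First/last element pointers, as in CSAMSEGF/CSAMSEGL."""
    def __init__(self):
        self.first = None
        self.last = None

def csdst_append(hdr, expected_last, new_elem):
    """If hdr.last still equals the expected tail, chain new_elem at the
    end: one (conceptually atomic) step stores the new address into both
    the old tail's forward pointer and the header's last-element field."""
    if hdr.last is not expected_last:
        return False               # compare failed: caller must re-drive
    expected_last.next = new_elem  # store into old tail's MSEGNEXT
    hdr.last = new_elem            # store into CSAMSEGL
    return True

def append(hdr, new_elem):
    while True:
        tail = hdr.last
        if tail is None:
            # Empty chain: set first = last = new element (the CDS path).
            hdr.first = hdr.last = new_elem
            return
        if csdst_append(hdr, tail, new_elem):
            return                 # else another process moved the tail; retry
```

This only models the intent; as the follow-ups in the thread point out, a real implementation that frees elements also needs a counter as the compared value, not the tail pointer alone.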
Re: Serialization without Enque
This specific paragraph from Peter is about the FREE QUEUE PROTOCOL. This is where elements on your chain are no longer needed. Peter recommends not freeing the element. Instead, you should use a queue of free elements that you reuse when you need a new element.

Peter's concern is not the chaining. It's with the data each element represents and making sure it remains consistent in a multi-processing environment. A program using the element is not guaranteed the element hasn't been freed and possibly re-allocated. PLO only ensures serialization of the chain. To ensure the element is valid, header validation is not enough. You should maintain a validation field that is occasionally verified. This greatly reduces the exposure but does not completely eliminate it.

Peter's concern is valid and justified. S0C4 abends are visible so they can be dealt with. The real problem is storage overlays that are not immediately apparent. Even worse is when they affect unrelated programs.

Jon Perryman.
Re: Serialization without Enque
First, I'm not sure why you have chosen PLT as your lock word. It's very important that the lock word resolve to the same REAL address no matter where the PLO executes. Since you are talking about multiple operations against the same chain, unless all the processes exist in the same shared program you can't be sure the same lock word is always used. The lock word contents are not altered; they are just used to create a lock value (PLT) for serialization. From your code, I would choose CSAMSEGF.

Second, you can't mix the CDS with PLO and expect consistent results. It would be very easy to convert the CDS to a PLO Compare Double and Swap (compare each fullword and swap). Just be sure to use the same lock word, which I still suggest as CSAMSEGF.

Third, you're going to have to use a count. And since you are acquiring and releasing this storage, you REALLY have to use a counter. Just add a fullword counter in CSA. In this case, I would choose the counter as the lock word. As long as you have a fullword counter and recovery to treat a 0C4 as a re-drive (see my prior comments), this should work. I don't know if you can, but the choice of another methodology where storage is not acquired for each add and released for each delete would be recommended. However, you can make this work, but not without a counter.

Last, Peter's comment is very valid. Read the notes for CDS in the POM. I don't know why PLO doesn't reference them, since they are just as applicable to PLO.

I have not closely examined your logic, so I don't know what the logic problem is in the code. I'm just commenting on the methodology.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Donald Likens
Sent: Friday, November 08, 2013 10:22 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque
Re: Serialization without Enque
A storage overlay cannot occur in a properly implemented PLO with a counter, as long as the counter is properly maintained with every process incrementing it by 1. Even in a free chain implementation, an improper PLO sequence can result in a circular or broken chain.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman
Sent: Friday, November 08, 2013 11:44 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque
Re: Serialization without Enque
The storage overlay does not pertain to the PLO. It pertains to the entire element not being immediately removed from any type of use. Just because you removed the element from the chain does not mean it's not in use somewhere. You can't even say how long the element may be in use (e.g. a task does not get any CPU because of CPU load, or a swapped-out address space in multi-address-space serialization).

Jon Perryman.
Re: Serialization without Enque
Jon Perryman wrote:

begin extract
This is where elements on your chain are no longer needed. Peter recommends not freeing the element. Instead, you should use a queue of free elements that you reuse when you need a new ...
/end extract

and I have two comments.

First, 'queue of free elements' need not be taken literally. These elements can and should be managed as a pool, implemented as a LIFO list, not because LIFO sequencing is itself important but because it is simpler than alternatives to it.

Second, such lists and their associated pools are better managed using a primary element-allocation quantity p and a secondary element-allocation quantity s, with s <= p. When processing is initialized, p elements are added to the pool. Thereafter an element is always removed from the pool when one is needed and returned to the pool when it is no longer needed. The secondary allocation s figures in processing in just one way: if the pool is found to be empty when an attempt is made to remove an element from it, s elements are added to the pool.

A further operation is usually appropriate. Testing to ensure that the current pool-element count c is less than s when an element is to be returned to the pool, and freeing its storage instead iff c >= s, ensures that surges of activity do not result in the accumulation of excessive numbers of elements in the pool.

This scheme has the merit that it is 0C4-proof: no freed storage is reused. It is also simple, fast and tunable.

John Gilmore, Ashland, MA 01721 - USA
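A concrete reading of the pool scheme just described (a Python sketch; the allocator stand-in and all names are mine): allocate p elements up front, extend by s only when the pool is found empty, and free rather than pool a returned element once the pool already holds s.

```python
class ElementPool:
    """LIFO pool with primary quantity p and secondary quantity s (s <= p)."""
    def __init__(self, p, s, alloc=dict):
        assert s <= p
        self.s = s
        self.alloc = alloc                       # stand-in for real storage allocation
        self.freed = 0                           # surplus elements given back, never pooled again
        self.pool = [alloc() for _ in range(p)]  # primary allocation of p elements

    def get(self):
        if not self.pool:                        # pool empty: secondary allocation of s
            self.pool.extend(self.alloc() for _ in range(self.s))
        return self.pool.pop()

    def put(self, elem):
        if len(self.pool) >= self.s:             # pool already holds s: free instead
            self.freed += 1
            return
        self.pool.append(elem)                   # otherwise return it to the pool
```

The tuning knobs are exactly p and s: p sizes for steady state, s bounds both the refill burst and the pool's retained high-water mark after a surge.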
Re: Serialization without Enque
Yes, I was talking about all references using PLO. I was also assuming this was a work queue where deletion from the chain was the methodology for claiming ownership of the element. However, any serialization method that performs deletion must also have a method for claiming ownership before an element can be deleted. It doesn't matter whether deletion is an actual release of storage or the placement of the element into a free chain. If any process other than the owning process maintains a reference to an element without claiming it, the problem exists whether you use locks, PLO, CS, whatever. If the reference is nothing more than searching the chain, then PLO Compare and Load can solve that. If the reference is more than that, the problem is not storage overlays or 0c4s. The problem is a missing method for ownership. If this is true for this application, the chain is only serializable with a lock and the lock must be held throughout the period where the element is referenced before the element can be safely deleted. -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jon Perryman Sent: Friday, November 08, 2013 1:11 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Serialization without Enque The storage overlay does not pertain to the PLO. It pertains to the entire element not being immediately removed from any type of use. Just because you removed the element from the chain does not mean it's not in use somewhere. You can't even say how long the element may be in use (e.g. task does not get any CPU because of CPU load or swapped out address space in multi-address space serialization). Jon Perryman. From: Kenneth Wilkerson redb...@austin.rr.com A storage overlay cannot occur in a properly implemented PLO with a counter as long as the counter is properly maintained with every process incrementing it by 1. Even in in a free chain implementation, an improper PLO sequence can result in a circular or broken chain. 
-Original Message- Behalf Of Jon Perryman This specific paragraph from Peter is about FREE QUEUE PROTOCOL. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
Who frees what is always important. In general a list manager should have no notion and make no assumptions about the structure or organization of the things it puts into and removes from the list. I tell my students that they should think of these things as SOTs, instances of Something Out There. What they should deal in is a pointer to a SOT [and sometimes a key too] and a chaining block (CB) specific to the list, which always contains 1) a pointer to the SOT and 2) the additional chaining pointers required by the list organization. The SOT belongs to a 'user'. Its storage is never freed. The CBs belong to the list, which allocates and frees their storage.

John Gilmore, Ashland, MA 01721 - USA
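A minimal sketch of that discipline, in Python rather than assembler (names are illustrative): the list allocates and frees only its chaining blocks, while the user's SOT is merely pointed at and never freed by the list.

```python
class CB:
    """Chaining block: owned by the list, never by the user."""
    def __init__(self, sot):
        self.sot = sot     # pointer to Something Out There (user-owned)
        self.next = None   # chaining pointer required by the list organization

class SotList:
    """A LIFO list manager that knows nothing about SOT internals."""
    def __init__(self):
        self.head = None

    def push(self, sot):
        cb = CB(sot)           # the list allocates the CB's storage...
        cb.next = self.head
        self.head = cb

    def pop(self):
        cb = self.head
        if cb is None:
            return None
        self.head = cb.next    # ...and frees it here; the SOT is untouched
        return cb.sot
```

Because the list only ever dereferences CBs it allocated itself, no walker of the list can touch storage the user has freed.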
Re: Serialization without Enque
I'm guessing that Peter was warning against the situation where you have an element table rather than a queue. Queues are straightforward because each entry will only be in use by one program at any given time (ownership is always the task using it). Element tables, on the other hand, may have multiple tasks referencing the same entry at the same time.

An example that is easily understood is dynamic LPA. These modules exist in common storage and you may dynamically replace them with a new version. Anyone still using the old version must not abend, so IBM leaves all old versions in storage until the next IPL. Since you don't know if anyone is using the old version, you can't free / reuse it. In the old days when CSA was constrained, it was a real problem to use such commands. Today, we don't think twice about it, but it still eats up some CSA.

As for handling a queue, I use CS with the entries LIFO on the queue. A common task removes the queued entries and takes ownership. At that time, it repairs the forward pointers and gets the first entry. I haven't heard anything here that would make me think that PLO would be less expensive, so I wouldn't bother changing its implementation.

Jon Perryman.
Re: Serialization without Enque
Looks like my "impossible" with respect to the free queue protocol needed to be qualified by "until PLO came along, as long as you have storage to work with for the PLO parameters". As was pointed out to me, PLO CL and PLO DCS can be used to implement the free queue protocol (and not need the sequence number).

I don't think that's the protocol that the OP was going after, however, but I've lost track.

Peter Relson
z/OS Core Technology Design
Re: Serialization without Enque
I was wondering if there was quad-word consistency as well? I'm sure there is. I always ensure that the primary counter is in the first doubleword.

This is not necessary, because of the quadword consistency.

Peter Relson
z/OS Core Technology Design
Re: Serialization without Enque
It has taken me this long to mostly understand PLO... I must be slow. Now that I understand it (mostly) I am pretty sure it will not work for me. My problem is that a process comes in and removes the control block chain while another process is suspended and attempting to update the chain. When the updating process wakes up: S0C4! That is why I was looking at using locks. If the process updating the chain holds a lock and the process removing the chain needs that lock to update the pointers, this would not happen.

So back to my original question: My code must be able to run in SRB mode and with locks held. I have a situation where I need to serialize processing and cannot use CDS because the two addresses being updated cannot be next to each other (because I use CDS with these two addresses with other addresses). I have attempted to use a combination of CS instructions to resolve this problem but it does not work. I know this will work if I use a CMS lock, but I am concerned about affecting the whole system. Any advice?
Re: Serialization without Enque
If your application is not designed to use PLO for serialization, it'll definitely not work for you. I use PLO for serialization because of the issues with locks that you are describing (system effects) and many others. All my code can run as SRBs, but unlike what you describe I almost never acquire locks. I always use PLO, except when interfacing with MVS services that require locks, which I do rarely. Besides, you can't mix CS with PLO, meaning you would have to convert every CS affecting that chain. Sounds to me like you're stuck with the CMS lock. And if it's a process done frequently, it will have a system impact; a significantly greater impact than a PLO-serialized one.

The thing I don't understand is your statement:

My problem is that a process comes in and removes the control block chain while another process is suspended and attempting to update the chain. When the updating process wakes up S0C4!

If you mean that you're releasing the storage for the chain, the process doing the release could use a PLO to set the chain pointers to 0 and in that process update the swap word. The second process will then get a CC forcing re-drive, and it'll discover the chain is now 0. In that case, that would probably mean that you would need to use PLO Compare and Load for every reference to those chain pointers or (my preference) you would have a retry point detect the 0C4, realize the chain has been released, and just continue. I really don't see the big deal with an 0C4 in this scenario (should happen rarely). The PLO process does not update anything until the PLO executes, so no harm no foul.

Again, if the application is not designed to use PLO, it won't work. And you're not slow. When I first started using PLO almost 10 years ago, I spent days writing test cases with special traces so I could see what was happening. The whole time I had the POM open to the PLO instruction, reading it over and over. Now, I consistently code PLO without much thought. If you choose to spend time learning PLO, I think you'll find it vastly superior to locks (no restrictions) and applicable to 99%+ of all serialization scenarios you encounter, provided you design the application to use PLO from the start.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Donald Likens
Sent: Thursday, November 07, 2013 8:33 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Serialization without Enque
Re: Serialization without Enque
On 7 November 2013 09:33, Donald Likens dlik...@infosecinc.com wrote:

Now that I understand it (mostly) I am pretty sure it will not work for me. My problem is that a process comes in and removes the control block chain while another process is suspended and attempting to update the chain. When the updating process wakes up S0C4! That is why I was looking at using locks. If the process updating the chain holds a lock and the process removing the chain needs that lock to update the pointers this would not happen.

Why don't you show us your control block layout, and explain how many work units can add items, and how; how many can read; etc. There are few of these things that truly require use of locks/ENQs and the like.

Tony H.
Re: Serialization without Enque
I think I figured out a solution:

DOX078   DS    0H
*C IF LASTCB EQ 0 THEN
*C SET LASTCB = FIRSTCB = MSEGCB JUST BUILT
         CDS   R4,R2,FIRSTCB      IF MSEGF MSEGL = 0, STM 2,3,FIRSTCB
*C ELSE
*C SET FIRSTCB = POINTER_TO_NEW_MSEG IF FIRSTCB = ZERO
         CS    R4,R8,FIRSTCB      IF FIRSTCB = 0, ST R8 FIRSTCB
*C IF LASTCB = LASTCB (R2)
*C SET LASTCB = POINTER_TO_NEW_MSEG (R1)
*C SET NEXTCB = POINTER_TO_NEW_MSEG (R1)
         LA    R0,CSDST
         PLO   R2,CSAMSEGL,0,PL
         BNZ   DOX078
Re: Serialization without Enque
If I understand your situation, then you are maintaining a queue. Today, you are probably queuing in the reverse order using CS (LIFO). You want to eliminate reversing the queue by using PLO to create the queue FIFO. PLO is designed to solve this problem. PLO doesn't care about the store fields and values. It only serializes the counter that has been described. In your case, the serialization represents any change to the queue, regardless of the location on the queue.

Fields: counter, address of first queued CB, and address of last queued CB.

To remove queued entries:
1. Load and increment the counter. This must be first, to ensure step 2 sees the correct CB address.
2. Save the first queued CB address.
3. PLO CSDST the counter and store 0 at the first and last CB addresses.
4. Repeat if the PLO failed.

To add to the queue:
1. Load and increment the counter.
2. Copy the last CB address to the backward pointer of the new CB.
3. PLO CSDST the counter, store the new CB address in the forward pointer of the last CB, and store the new CB address in the last CB address.
4. Repeat if the PLO fails.

You could serialize updates to any entry in the queue, just the same as the add, but the second step would be to locate the entry to be updated. If you are deleting entries, you might need to use PLO Compare and Load when scanning through the chain.

Is it more efficient to continue using CS / reverse chain, or to use PLO to build the chain in the correct sequence?

Jon Perryman.
Re: Serialization without Enque
In ofe01adb17.e9264150-on85257c1b.0044d053-85257c1b.00455...@us.ibm.com, on 11/06/2013 at 07:37 AM, Peter Relson rel...@us.ibm.com said: One of the shortcomings of PLO (unlike TBEGIN(C)) is that PLO in general serializes only against other uses of PLO. I'd hardly label that as a shortcoming of PLO. CS only serializes against CS, ENQ only serializes against ENQ, etc.; it's the nature of the beast. -- Shmuel (Seymour J.) Metz, SysProg and JOAT ISO position; see http://patriot.net/~shmuel/resume/brief.html We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003)
Re: Serialization without Enque
On 6 November 2013 16:30, Shmuel Metz (Seymour J.) shmuel+ibm-m...@patriot.net wrote: Peter Relson rel...@us.ibm.com said: One of the shortcomings of PLO (unlike TBEGIN(C)) is that PLO in general serializes only against other uses of PLO. I'd hardly label that as a shortcoming of PLO. CS only serializes against CS. It serializes happily against all the CS variations, TS, and the newer interlocked-update instructions like ASI, LAA, and so on. And there are cases where a simple ST or the like can interoperate usefully with CS. For instance, if you update a counter with CS, it is safe to zero it with ST. Tony H.
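The counter example above can be sketched with C11 atomics, which follow the same rule: all interlocked updates serialize against one another, and a plain store interoperates safely when the new value does not depend on the old one (as when zeroing a counter). The function names here are illustrative only.

```c
#include <stdatomic.h>

/* An event counter bumped with an interlocked add (the role a CS loop
 * or LAA plays) and reset with a simple store (the role ST plays):
 * the reset needs no compare because the new value, zero, does not
 * depend on the value being replaced. */
static atomic_long hits;

void hit(void)   { atomic_fetch_add(&hits, 1); }  /* interlocked update */
void reset(void) { atomic_store(&hits, 0); }      /* plain store is safe */
long peek(void)  { return atomic_load(&hits); }
```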
Re: Serialization without Enque
t...@harminc.net (Tony Harminc) writes: It serializes happily against all the CS variations, TS, and the newer interlocked-update instructions like ASI, LAA, and so on. And there are cases where a simple ST or the like can interoperate usefully with CS. For instance, if you update a counter with CS, it is safe to zero it with ST. compare-and-swap is an atomic instruction ... it does the compare and does the store if the compare matches ... it solves the problem of interrupts and other processes doing something while interrupted. normal process is if the compare doesn't match ... and the store isn't done ... loop back and restart the operation. charlie had invented compare-and-swap while doing fine-grain multiprocessing locking for cp67 at the cambridge science center. past posts mentioning science center http://www.garlic.com/~lynn/subtopic.html#545tech ... compare-and-swap was chosen for the name of the instruction because CAS are charlie's initials. past posts mentioning smp and/or compare-and-swap http://www.garlic.com/~lynn/subtopic.html#smp the initial attempt to include compare-and-swap in 370 was rejected because the pok favorite son operating system people said it wasn't needed, that TS was more than sufficient for multiprocessor locking (single kernel spin-lock). the 370 architecture owners said that to get compare-and-swap justified for 370 (over pok objections), we had to come up with purposes other than multiprocessor locking. thus was born the examples (still in the principles of operation) for interrupt-enabled, multi-threaded applications (like large dbms) ... whether or not running in a multiprocessor configuration. -- virtualization experience starting Jan1968, online at home since Mar1970
Re: Serialization without Enque
One of the shortcomings of PLO (unlike TBEGIN(C)) is that PLO in general serializes only against other uses of PLO. It does not serialize against CS on the same storage, for example. However, cache considerations and doubleword consistency still come into play. An LM of the two words of a doubleword is done with what is referred to as doubleword consistency. That matters. If you need to load two consecutive words and you can arrange that those two words are in the same doubleword, it can be to your advantage. It's why, in a doubleword serialized by CDS, you do not typically (if ever) want to load the two individual words with separate instructions (as you might get results that come half from one CDS and half from another CDS); you want to use LM. Peter Relson z/OS Core Technology Design
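The same idea can be modeled with C11 atomics: keep the two words in one aligned 64-bit location, update them as a unit (the role CDS plays), and read them back with a single 64-bit load (the role an LM with doubleword consistency plays). This is an analogy on commodity hardware, not z/Architecture semantics, and the names are invented.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Two 32-bit words packed into one atomic 64-bit "doubleword". */
static _Atomic uint64_t pair;

/* Replace both words as a unit -- the role CDS plays. */
void cds_set(uint32_t first, uint32_t second)
{
    atomic_store(&pair, ((uint64_t)first << 32) | second);
}

/* One 64-bit load: both halves are guaranteed to come from the same
 * update, so the reader can never see half of one update and half of
 * another -- the hazard of two separate word loads. */
uint64_t lm_pair(void)
{
    return atomic_load(&pair);
}
```

A reader extracts both halves from the single loaded value (`dw >> 32` and `(uint32_t)dw`) rather than issuing two loads.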
Re: Serialization without Enque
Thank you for mentioning the issue with CS/CDS. I have always understood that if you use PLO anywhere to serialize access to an area, you must use it everywhere to serialize access to that area. It's nice to know that the transactional facility serializes against CS as well. I wish I had access to a processor to play with it.

I always understood that the reason PLO did a memory serialization at the end was to ensure that any processors referencing the swap word(s) in a compare and swap would get consistent results: either the swap value (32, 64 or 128 bits) before the swap or after the swap. Since the swap value is always stored last, and provided you always loaded the swap value first, you would either get the correct swap and store values and the subsequent PLO would succeed, or you would get an outdated swap value with inconsistent store values and the subsequent PLO would return a CC to re-drive. I have written software traces for PLO and my observations have supported this understanding of the POM's description of PLO and memory serialization. Is this your understanding?

I often use 128-bit operations with consecutive words and doublewords to perform some sophisticated operations. I've understood about doubleword consistency, and I've always assumed that PLO required quadword alignment for 128-bit operations for the same reason. I always load the swap value (in any flavor of compare and swap and store) first, I always use an LMG (even if it's consecutive words), and I always ensure that the primary counter is in the first doubleword. I've read the POM many times looking for any references to quadword consistency. It doesn't really matter because of the way I design PLO compare and swaps, but I was wondering: is there quadword consistency as well?
Re: Serialization without Enque
Sorry for the confusion, but that's not the question I was asking. I agree with you on guaranteeing the consistency using the count. I'm talking about TCB1 using PLO CSDST to store 2 adjacent words (the 4th and 6th PLO operands) and TCB2 using LM or LG for those same 2 words. There is a very small window between the 2 stores where TCB2 will pick up inconsistent values. In other words, the first store has completed and the LM/LG occurs before the second store completes. This window is extremely small because PLO cannot be interrupted and the instruction was prepared before performing the stores. I think the window is so small that even under heavy usage you would only see an error every couple of months, but it does exist. I think TCB2 must also use the PLO compare and load to avoid this situation. Thanks for the great information, Jon Perryman. From: Kenneth Wilkerson redb...@austin.rr.com The order of stores is unpredictable except that, according to the POM, operand 2 (in this case, the count) is always stored last.
Re: Serialization without Enque
This is a complicated question that is very dependent on the design of your application. So if the design needs to use a PLO Compare and Load, by all means do so. I try to design these applications to avoid as many PLO Compare and Loads as possible. The memory serializations (once at the start and once at the end) are expensive, but much less so than software locks. This means that I design the update operations so that loading the pointers during a PLO operation will, at worst, simply result in outdated information. As I explained in my first example, this is possible regardless of whether you use PLO or a lock. It's just a consequence of concurrent activity and the order in which these activities occur.

In reality, I don't normally use PLO Compare and Swap and Store (any flavor) for chains. It works fine for singly linked lists. Usually I use it (I just wrote something today) to dynamically add an entry to a pre-allocated slot in a cell. In this case, the cell has pre-allocated slots with a high-water-mark pointer and a count, consecutive words in a doubleword, and the high water mark and count are the 2nd operand and the lock word. Since the 2nd operand is always updated last, any references would not even know of the new addition until the PLO completes.

I fall back to my original provision: the use of PLO is heavily dependent on designing the application to use PLO. Just yesterday, I debugged a problem in a task that was not doing a PLO Compare and Load to get a count. Certainly, all references to the counts in a Compare and Swap and Store should be updated with a PLO and probably require a PLO Compare and Load.
Re: Serialization without Enque
PLO CSDST and CSTST are *extremely* useful for queue and linked list manipulation in multi-ASID, multi-TCB environments. The key to their use is to have a lock word counter that the caller increments and then prepares the new values in other regs. When it comes time to actually atomically update the lock word, you can redrive the structure manipulation logic if the CC indicates that the lock word value has changed; otherwise the other fields are updated atomically. For actual practical uses, it is well worth putting all this inside some sort of macro or small stub service, as you do not want to have to code the guts of it each time. I also think the uptake of PLO would be greater if there were some decent example code in the manuals - for instance, a client adding a request to the tail of a queue whilst a server is removing from the head. Rob Scott Rocket Software On 4 Nov 2013, at 03:58, Jon Perryman jperr...@pacbell.net wrote: I sure missed that one with the locks. PLO CDS does exactly what is wanted. It does 2 CS's within the locked instruction. PLO CSDST on the other hand only does a single CS followed by 2 ST's. Since 3 separate load instructions (not under PLO control) are required when not in contiguous storage, there is not any method that will guarantee the 3 values are consistent with the others. A counter as suggested by Peter Relson won't help either, for this same reason. I can't think of a situation where PLO CSDST is useful. Can anyone describe a situation where it is useful? Jon Perryman. From: Rob Scott rsc...@rocketsoftware.com I think the OP stated that his code could hold locks - in which case the latch services cannot be used.
Re: Serialization without Enque
On 4/11/2013 4:49 PM, Rob Scott wrote: I also think the uptake of PLO would be greater if there were some decent example code in the manuals - for instance a client adding a request to the tail of the queue whilst a server is removing from the head. Maybe somebody with expertise should blog about it with example code.
Re: Serialization without Enque
If you truly need a triple compare and swap then PLO will not help you. But if you need a disjoint double compare and swap, you use the compare-and-swap field as a counter and then you can do a compare swap and double store. Example:

   Fetch counter
A  PLO compare-and-fetch value1
   CC!=0, go to A
   PLO compare-and-fetch value2
   CC!=0, go to A
   Calculate new value1 and value2
   Add one to fetched counter
   PLO CSDST fetched-counter, new-fetched-counter, new-value1, new-value2
   CC!=0, go to A

This requires that all processes that update value1 or value2 use PLO with the same lock word.

On Sun, 3 Nov 2013 19:58:12 -0800 Jon Perryman jperr...@pacbell.net wrote: :I sure missed that one with the locks. -- Binyamin Dissen bdis...@dissensoftware.com http://www.dissensoftware.com Director, Dissen Software, Bar & Grill - Israel Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies.
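This recipe can be sketched in C, with a mutex standing in for PLO's hardware atomicity (PLO exists only on z/Architecture). The helper names caf and csdst, and the two guarded values, are invented for the illustration; only the fetch/validate/commit/redrive shape matches the recipe above.

```c
#include <pthread.h>

/* A shared counter guards two disjoint words. compare-and-fetch (caf)
 * gives a read that is known-consistent with the counter; csdst commits
 * both stores only if the counter is unchanged, bumping it last. */
static pthread_mutex_t plo = PTHREAD_MUTEX_INITIALIZER;
static unsigned long counter;
static long value1, value2;

static int caf(unsigned long ctr, long *loc, long *out) /* compare-and-fetch */
{
    pthread_mutex_lock(&plo);
    int ok = (counter == ctr);
    if (ok) *out = *loc;
    pthread_mutex_unlock(&plo);
    return ok;
}

static int csdst(unsigned long ctr, long nv1, long nv2) /* CS + double store */
{
    pthread_mutex_lock(&plo);
    int ok = (counter == ctr);
    if (ok) { value1 = nv1; value2 = nv2; counter = ctr + 1; }
    pthread_mutex_unlock(&plo);
    return ok;
}

/* Disjoint double update: value1 += d1 and value2 += d2, atomically
 * with respect to every other updater that honors the same counter. */
void update_both(long d1, long d2)
{
    long v1, v2;
    for (;;) {                       /* the "CC!=0, go to A" redrive loop */
        unsigned long ctr = counter; /* fetch counter                     */
        if (!caf(ctr, &value1, &v1)) continue;
        if (!caf(ctr, &value2, &v2)) continue;
        if (csdst(ctr, v1 + d1, v2 + d2))
            return;
    }
}
```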
Re: Serialization without Enque
"PLO CDS does exactly what is wanted." I presume this was a typo and should have been PLO DCS (double compare and swap). PLO CSDST (and CSST and CSTST) have limitless potential exploitations. It all depends on what your requirements are. Suppose you have N (>2) separate fields, all of which need to be serialized together, and the happy circumstance is that you never need to update more than 2 of them together. A counter (the subject of the CS) and store(s) may be used to do the updates. "Since 3 separate load instructions" I wonder if a double compare and swap can guarantee the consistency of 3 load instructions. To the extent that a free queue protocol is analogous, you have to use a sequence number to get that right. "(not under PLO control)" Why is that a requirement? It is far from unheard of to use PLO Compare-and-load. Peter Relson z/OS Core Technology Design
Re: Serialization without Enque
I have used PLO almost exclusively for serialization in multi-address space, multi-DU code for almost 10 years. I use all 6 operations. Since everything I write is 64-bit mode, I generally use the +2 variant (64-bit operand length), but I like using the +3 variant (128-bit operand length) for some really cool stuff. As Rob pointed out, "The key to their use is to have a lock word counter that the caller increments and then prepares the new values in other regs." But I add that the real key is designing the application to use PLO for serialization, which is much more than just writing macros to do the guts of the processes (though I almost always use macros).

Consider this example. I have a singly linked chain of 64-bit cell addresses on quadword boundaries. The chain pointers are a quadword at the start of the cell. The first doubleword is the head of the active chain; the second doubleword is the head of the free chain. The first quadword of each cell is a doubleword pointer to the next active entry followed by a doubleword pointer to the next free entry. The chain has a quadword counter on a quadword boundary. The first doubleword of the counter is a change count; the second doubleword is an active element count.

The algorithm always adds new cells to the head of the list. I can add a new cell by using an LMG to load the counters, incrementing each counter by 1, computing the new head, computing the old head's new previous, and then using a Compare and swap and double store (128-bit) to add the new entry. Since every update increments the first doubleword counter by 1, the operation only completes if no other process updated the counter. If the counter has changed, it needs to re-drive. By adding entries to the head, I can also have code simultaneously searching the chain while entries are added. Of course, if the new head is added before the search starts, it won't be found. But that's no different than using a lock.
If the search acquires the lock before the add, it won't be found either. I can even add an element that requires a search for one that has already been added. In this case, I load the counters before the search. I search the chain. If not found, I increment the counter and perform the add. If the add fails, I have to re-drive the search.

I can also delete entries from the chain. When I find the entry to be deleted, I save its previous entry. I can adjust the counts, re-compute the chain pointers, and do a Compare and swap and triple store to delete the entry and add it to the free chain. I can still search the chain, but I'll probably need to do a Compare and load to do so. I can avoid the PLO compare and load by not actually deleting the cell but using the low half byte of the active next pointer as a deleted flag. But that has disadvantages as well. This also adds a little more logic to the add, since I now need to add using the free chain if one exists, or do an add acquiring a new cell. There are a lot of details not given here for brevity. This example also uses an unordered singly linked list to simplify things, but properly designed PLO operations can be performed on ordered doubly linked lists as well.

When I read the Principles of Operation on the zEC12 transactional execution facility, I think strongly of a PLO on steroids. The point is that PLO can almost be used exclusively for serialization. As far as overhead, I have done a lot of testing, and the key is the proper choice of the lock word and the algorithm. In my research, the throughput advantages of PLO far outweigh its overhead. I would love some time with the transactional execution facility. From my reading, it eliminates the need for any serialization other than PLO or transactional execution, though I understand that IBM has chosen a redrive limit as the determining factor as to whether to fall back to a lock. I believe the only limit to using PLO for serialization is the imagination.
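The head-push described above can be modeled in a small C sketch, with the change count and active count acting together as the lock word and a mutex standing in for the 128-bit CSDST. All names are invented, and the free chain is omitted for brevity.

```c
#include <pthread.h>
#include <stddef.h>

/* Model of the example's head-push: every successful update bumps the
 * change count, so a concurrent updater whose snapshot went stale is
 * forced to redrive. A mutex emulates what PLO does in hardware. */
typedef struct cell { struct cell *next; } Cell;

static pthread_mutex_t plo = PTHREAD_MUTEX_INITIALIZER;
static unsigned long change_count;  /* bumped on every chain update */
static unsigned long active_count;  /* current number of elements   */
static Cell *head;

static int csdst_push(unsigned long old_chg, Cell *new_head)
{
    pthread_mutex_lock(&plo);
    int ok = (change_count == old_chg);
    if (ok) {
        head = new_head;
        active_count += 1;
        change_count = old_chg + 1;
    }
    pthread_mutex_unlock(&plo);
    return ok;
}

/* Add a cell at the head; searchers can keep walking the chain because
 * a push only prepends and never disturbs existing links. */
void push(Cell *c)
{
    for (;;) {
        unsigned long chg = change_count;  /* like the LMG of counters */
        c->next = head;                    /* old head becomes next    */
        if (csdst_push(chg, c))
            return;                        /* linked in atomically     */
        /* change count moved under us: redrive                        */
    }
}
```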
Kenneth
Re: Serialization without Enque
Thanks Kenneth. Excellent example. I didn't consider that the load for the counter must be first to cause the serialization. Jon Perryman.
Re: Serialization without Enque
Thanks Binyamin. Also a great example, but it brings me to another question. What is the advantage of using PLO compare and fetch? Is it just saving CPU time in the case where the counter has changed? Is there another advantage that I'm not thinking about? Jon Perryman.
Re: Serialization without Enque
Yes, it is possible that the updates are not performed in any particular order. However, it is guaranteed that the updates are only performed if the swap can be done. Therefore, I use a simple rule: if the number of instructions needed to compute the new chain pointers is small (as is the case in my example), I don't incur the overhead of doing the extra 2 PLO Compare and Load operations; I simply re-drive the operation as shown in Binyamin's example. Even with the PLO Compare and Load, there is no guarantee the swap will succeed; it just lessens the likelihood. So the decision point is whether the overhead of 2 additional PLO instructions is less than the overhead of a re-drive. This can only be determined with testing. You can determine this by using a CS to update a counter for every re-drive. You already have an operation count, so you can then easily determine the percentage of re-drives. In my experience, even on very active chains, the PLO serialization process will incur a very small number of re-drives (much less than 1 percent). But only testing can reveal that.

-----Original Message----- From: Binyamin Dissen Sent: Monday, November 04, 2013 11:15 AM My understanding is with multi-threading it is possible that the updates to the fields may be out of order and thus it is possible to fetch the updated counter with the unupdated value1. PLO serializes it.
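The suggested measurement is easy to sketch: one interlocked counter per operation, one per redrive, and the ratio tells you whether the extra PLO Compare and Loads would pay off. C11 atomic adds stand in for the CS update; the names are illustrative only.

```c
#include <stdatomic.h>

/* Interlocked counters, bumped from the redrive loop itself: count_op
 * once per attempt sequence, count_redrive once per failed commit. */
static atomic_long ops, redrives;

void count_op(void)      { atomic_fetch_add(&ops, 1); }
void count_redrive(void) { atomic_fetch_add(&redrives, 1); }

/* Percentage of operations that needed at least one redrive. */
double redrive_pct(void)
{
    long o = atomic_load(&ops);
    return o ? 100.0 * atomic_load(&redrives) / o : 0.0;
}
```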
Re: Serialization without Enque
That won't help if you fetch the new count and the old value1. On Mon, 4 Nov 2013 11:38:38 -0600 Kenneth Wilkerson redb...@austin.rr.com wrote: :Yes, it is possible that the updates are not performed in any order.
Is there another :advantage that I'm not thinking about? :: ::Jon Perryman. :: :: :: :: :: From: Binyamin Dissen bdis...@dissensoftware.com : : : :If you :truly need a triple compare and swap then PLO will not help you. But if ::you need a disjoint double compare and swap, you use the compare-and-swap ::field as a counter and then you con do a compare swap and double store. :: ::Example: :: :: Fetch counter ::A PLO compare-and-fetch value1 :: CC0, go to A :: PLO compare-and-fetch value 2 :: CC0, go to A :: calculate new value1 and 2 :: Add one to fetched counter :: PLO CSDST fetched-counter new-fetched-counter, new value1, :new-value2 : CC0, go to A : : : : ::-- ::For IBM-MAIN subscribe / signoff / archive access instructions, :send :email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- Binyamin Dissen bdis...@dissensoftware.com http://www.dissensoftware.com Director, Dissen Software, Bar Grill - Israel Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
I'm glad you brought that up, because I knew what I have been doing for years was correct, but I hadn't taken the time to read the manual on PLO in some time. The order of stores is unpredictable, except that according to the POM, operand 2 (in this case, the count) is always stored last:

  "In those cases when a store is performed to the second-operand location
  and one or more of the fourth-, sixth-, and eighth-operand locations, the
  store to the second-operand location is always performed last, as observed
  by other CPUs and by channel programs."

Page 7-290, right column, top half of page, in SA22-7832-09; 7-281 in SA22-7832-08.

Kenneth

-----Original Message-----
From: Binyamin Dissen
Sent: Monday, November 04, 2013 2:02 PM
Subject: Re: Serialization without Enque

That won't help if you fetch the new count and the old value1.

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Serialization without Enque
Thanks for pointing out that it's required to do the PLO COMPARE against the counter and FETCH of the value; otherwise there is no guarantee that value1 is consistent with the counter.

I'm also hearing you say that programs that reference more than a single word must use PLO COMPARE AND FETCH. In Kenneth's example, where he uses PLO to save 64-bit addresses (which are two words), he can't use LG to reference the 64-bit address; otherwise he risks using high and low register values that do not match. Is that correct?

Jon Perryman.

From: Binyamin Dissen bdis...@dissensoftware.com

That won't help if you fetch the new count and the old value1.
Re: Serialization without Enque
This is not correct. The choice to use PLO Compare and Load is not required, since the count is always guaranteed to be swapped after the stores (my last email). I only use PLO Compare and Load for complex chain manipulations, but do it if you want. The serialization performed by a PLO forces serialization on the lock word for all processors; I try to avoid it in situations where a re-drive is less costly.

Kenneth

-----Original Message-----
From: Jon Perryman
Sent: Monday, November 04, 2013 2:42 PM
Subject: Re: Serialization without Enque

Thanks for pointing out that it's required to do the PLO COMPARE against the counter and FETCH of the value; otherwise there is no guarantee that value1 is consistent with the counter.
Re: Serialization without Enque
As you say, PLO only locks CPUs using the same PLO lock word. For other CPUs not using the lock word, it is considered multiple unique instructions. So in the case of the 64-bit address, PLO CSDST is considered a compare, store value1, store value2, store swap value. Although it's unlikely, it is possible for the LG instruction to occur after store value1 but before store value2. Or are the stores considered a single-occurrence instruction to the other CPUs?

Thanks,

Jon Perryman.

From: Kenneth Wilkerson redb...@austin.rr.com
Sent: Monday, November 4, 2013 1:06 PM
Subject: Re: Serialization without Enque

This is not correct. The choice to use PLO Compare and Load is not required since the count is always guaranteed to be swapped after the stores (my last email).
Re: Serialization without Enque
The order of stores is unpredictable, except that according to the POM, operand 2 (in this case, the count) is always stored last:

  "In those cases when a store is performed to the second-operand location
  and one or more of the fourth-, sixth-, and eighth-operand locations, the
  store to the second-operand location is always performed last, as observed
  by other CPUs and by channel programs."

Page 7-290, right column, top half of page, in SA22-7832-09; 7-281 in SA22-7832-08.

So it's impossible for the count to be updated before the stores. I've been using and relying on these techniques for years, with exhaustive testing under high workloads and re-drive statistics to help me decide the algorithm that I use. It can't hurt to do the PLO Compare and Load; it just adds overhead that is probably more efficiently handled by a re-drive. But it's up to you. I suggest you add a re-drive counter to your test case and see for yourself. I would be extremely surprised if the re-drive percentage were ever higher than a small fraction of 1%, no matter how hard you drove the chain.

Kenneth
Re: Serialization without Enque
I sure missed that one with the locks. PLO CDS does exactly what is wanted: it does two CSs within the locked instruction. PLO CSDST, on the other hand, only does a single CS followed by two STs. Since three separate load instructions (not under PLO control) are required when the fields are not in contiguous storage, there is no method that will guarantee the three values are consistent with one another. A counter, as suggested by Peter Relson, won't help either, for this same reason. I can't think of a situation where PLO CSDST is useful. Can anyone describe a situation where it is useful?

Jon Perryman.

From: Rob Scott rsc...@rocketsoftware.com

I think the OP stated that his code could hold locks - in which case the latch services cannot be used.
Re: Serialization without Enque
As a last resort, a GRS latch could be used.

Jon Perryman.

From: Tony Harminc t...@harminc.net

On 1 November 2013 16:22, Donald Likens dlik...@infosecinc.com wrote:

  I have a situation where I need to serialize processing and cannot use CDS because the two addresses being updated cannot be next to each other (because I use CDS with these two addresses with other addresses).

CDSG, perhaps?
Re: Serialization without Enque
I think the OP stated that his code could hold locks - in which case the latch services cannot be used.

Rob Scott
Lead Developer
Rocket Software
77 Fourth Avenue . Suite 100 . Waltham . MA 02451-1468 . USA
Tel: +1.781.684.2305
Email: rsc...@rs.com
Web: www.rocketsoftware.com

-----Original Message-----
From: Jon Perryman
Sent: 02 November 2013 19:30
Subject: Re: Serialization without Enque

As a last resort, a GRS latch could be used.
Re: Serialization without Enque
Have you considered whether any of the PLO functions would do the trick? More expensive than CDS, but perhaps they would help?

HTH

Peter

-----Original Message-----
From: Donald Likens
Sent: Friday, November 01, 2013 4:23 PM
Subject: Serialization without Enque

My code must be able to run in SRB mode and with locks held. I have a situation where I need to serialize processing and cannot use CDS because the two addresses being updated cannot be next to each other (because I use CDS with these two addresses with other addresses). I have attempted to use a combination of CS instructions to resolve this problem but it does not work. I know this will work if I use a CMS lock but I am concerned about affecting the whole system. Any advice?
Re: Serialization without Enque
PLO with the CSDST function (compare swap and double store).

Sent from my iPhone

On 1 Nov 2013, at 20:22, Donald Likens dlik...@infosecinc.com wrote:

  My code must be able to run in SRB mode and with locks held. I have a situation where I need to serialize processing and cannot use CDS because the two addresses being updated cannot be next to each other.
Re: Serialization without Enque
On 1 November 2013 16:22, Donald Likens dlik...@infosecinc.com wrote:

  I have a situation where I need to serialize processing and cannot use CDS because the two addresses being updated cannot be next to each other (because I use CDS with these two addresses with other addresses).

CDSG, perhaps?

Tony H.