Re: Is there a significant performance penalty for non-aligned operands?
On Mon, 2 Jun 2014 11:45:38 -0700 Ed Jaffe edja...@phoenixsoftware.com wrote:

:NIL and OIL are not required on z196 and higher machines. NI and OI now
:do the necessary serialization.

Do I understand correctly? NI/OI do the same serialization that CS does? That two OI's issued on two processors will guarantee the result of both bits set?

--
Binyamin Dissen bdis...@dissensoftware.com
http://www.dissensoftware.com
Director, Dissen Software, Bar Grill - Israel

Should you use the mailblocks package and expect a response from me, you should preauthorize the dissensoftware.com domain. I very rarely bother responding to challenge/response systems, especially those from irresponsible companies.

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a significant performance penalty for non-aligned operands?
Binyamin Dissen wrote:

On Mon, 2 Jun 2014 11:45:38 -0700 Ed Jaffe edja...@phoenixsoftware.com wrote:

:NIL and OIL are not required on z196 and higher machines. NI and OI now
:do the necessary serialization.

Do I understand correctly? NI/OI do the same serialization that CS does? That two OI's issued on two processors will guarantee the result of both bits set?

For OR (OI, OIY), the first operand is one byte in length, and only one byte is stored. When the interlocked-access facility 2 is installed, the update of the first operand appears to be an interlocked-update reference as observed by other CPUs and channel programs, and a specific-operand-serialization operation is performed.

Facility bit 52.

Bob
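For readers who think in C rather than in machine instructions, a rough C11 analogue of the interlocked update described above may help. This is only a sketch: `set_flag_bits` is a made-up name, and `atomic_fetch_or` is the portable C11 call, not a statement about which instruction any particular compiler emits for it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: OR `mask` into a shared flag byte as one
   indivisible read-modify-write. Returns the byte's previous value. */
unsigned char set_flag_bits(_Atomic unsigned char *flags, unsigned char mask)
{
    assert(flags != NULL);
    return atomic_fetch_or(flags, mask);
}
```

If two threads call this on the same byte, one with mask 0x80 and the other with 0x01, both bits end up set whatever the interleaving, which is the guarantee the question asks about.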
Re: Is there a significant performance penalty for non-aligned operands?
Early in my career I developed a habit when writing ASM code. In 'working storage', code DS/DC fields in this order unless some other structural sequence is required: double words first, then full words, half words, and finally character (C or X) fields of whatever length. Most macros generate their own alignment instructions. Always heed alignment warnings from the assembler.

J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com

From: Lloyd leful...@sbcglobal.net
To: IBM-MAIN@LISTSERV.UA.EDU
Date: 06/01/2014 06:12 PM
Subject: Re: Is there a significant performance penalty for non-aligned operands?
Sent by: IBM Mainframe Discussion List IBM-MAIN@LISTSERV.UA.EDU

On 6/1/2014 7:51 PM, Peter Relson wrote:

I believe the answer is no, all other things being equal. But if one of those other things is that the non-aligned operand spans cache lines whereas an aligned one would not, then the answer could be a huge yes. Things such as doubleword (or quadword) consistency could not be relied upon, depending on the degree of non-alignment.

Peter Relson
z/OS Core Technology Design

There are times, though, when things HAVE to be on a doubleword or quadword boundary for specific instructions, whether hardware instructions or supervisor calls (not necessarily SVC). What I usually do then is define a variable larger by the appropriate number of bytes and verify, by shifting the address, that I am aligned.

Lloyd
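The effect of the largest-first ordering habit can be seen in any language that aligns fields naturally. The following C sketch is an illustration only, not HLASM: the struct and field names are invented, and the exact sizes depend on the compiler's ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields in arbitrary order: the compiler inserts padding before any
   field whose alignment is stricter than the current offset. */
struct mixed_order {
    char     flag;   /* 1 byte  */
    uint64_t dword;  /* 8 bytes */
    uint16_t half;   /* 2 bytes */
    uint32_t full;   /* 4 bytes */
};

/* Same fields, strictest alignment first: each field already starts
   on its own boundary, so only tail padding can remain. */
struct largest_first {
    uint64_t dword;
    uint32_t full;
    uint16_t half;
    char     flag;
};

/* Padding = total structure size minus the bytes the fields occupy. */
size_t padding_bytes(size_t struct_size, size_t field_bytes)
{
    assert(struct_size >= field_bytes);
    return struct_size - field_bytes;
}
```

On a typical 64-bit ABI the first layout occupies 24 bytes and the second 16, for the same 15 bytes of data. `padding_bytes(sizeof(struct mixed_order), 15)` reports the difference on whatever platform the code is built for.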
Re: Is there a significant performance penalty for non-aligned operands?
Once upon a time IIRC there was a significant performance penalty for non-aligned operands (loading a fullword from an address not evenly divisible by four, etc.). Does that still exist for modern Z processors? (Once upon a time it didn't work at all, but that's AFH, to use an acronym I learned this week.)

I have a string that looks like halfword, char, char, ..., halfword, char, char, ... . It will be accessed millions of times per day in a system exit. Would it be better to pack the halfwords immediately following the chars, or halfword-align them with a slack byte where necessary? The access will be read-only if that makes a difference.

Response from hardware designer: It depends on the system. z10 and before, if an operand crossed a DW there was a penalty. On z196 and later, it is only if an operand crosses a cache line. Of course things could change again in the future.

Jim Mulder
z/OS System Test
IBM Corp. Poughkeepsie, NY
Re: Is there a significant performance penalty for non-aligned operands?
Pardon my ignorance: what exactly is a cache line?

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Jim Mulder
Sent: Monday, June 02, 2014 8:13 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a significant performance penalty for non-aligned operands?

Once upon a time IIRC there was a significant performance penalty for non-aligned operands (loading a fullword from an address not evenly divisible by four, etc.). Does that still exist for modern Z processors? (Once upon a time it didn't work at all, but that's AFH, to use an acronym I learned this week.)

I have a string that looks like halfword, char, char, ..., halfword, char, char, ... . It will be accessed millions of times per day in a system exit. Would it be better to pack the halfwords immediately following the chars, or halfword-align them with a slack byte where necessary? The access will be read-only if that makes a difference.

Response from hardware designer: It depends on the system. z10 and before, if an operand crossed a DW there was a penalty. On z196 and later, it is only if an operand crosses a cache line. Of course things could change again in the future.
Re: Is there a significant performance penalty for non-aligned operands?
On 6/2/2014 11:01 AM, Skip Robinson wrote:

Early in my career I developed a habit when writing ASM code. In 'working storage', code DS/DC fields in this order unless some other structural sequence is required: double words first, then full words, half words, and finally character (C or X) fields of whatever length. Most macros generate their own alignment instructions. Always heed alignment warnings from the assembler.

J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com

I do that also when I am allocating the storage area. Unfortunately, some of the things that I code need to run with Language Environment and need to use the LE stack. The LE stack is NOT guaranteed to be allocated on a doubleword boundary. It is allocated on a byte-boundary basis, and yes, I have seen it allocated at ...1 or ...7. That is the type of code where I need to do as I proposed.

Lloyd
Re: Is there a significant performance penalty for non-aligned operands?
Pardon my ignorance: what exactly is a cache line?

http://en.wikipedia.org/wiki/CPU_cache

For purposes of this alignment discussion, the cache line size on IBM mainframes has been 256 (x'100') bytes for at least all of the z/Architecture machines, and I think even for several generations before that.

Jim Mulder
z/OS System Test
IBM Corp. Poughkeepsie, NY
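Given a 256-byte line, testing whether an operand straddles two lines is simple arithmetic. A minimal sketch in C follows. The function name and the hard-coded constant are assumptions for illustration; a real program would not bake in the line size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Line size quoted in this thread; assumed here, not queried from the machine. */
#define CACHE_LINE_BYTES 256u

/* True if the `len` bytes starting at `addr` touch more than one
   cache line, i.e. the first and last byte fall in different lines. */
bool crosses_cache_line(uintptr_t addr, size_t len)
{
    assert(len > 0);
    return (addr / CACHE_LINE_BYTES) != ((addr + len - 1) / CACHE_LINE_BYTES);
}
```

A halfword at x'10FE' stays inside one line, while a halfword at x'10FF' spans two, which is the case the hardware designer's answer singles out.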
Re: Is there a significant performance penalty for non-aligned operands?
Charles Mills wrote:

Pardon my ignorance: what exactly is a cache line?

The caches are divided into 256-byte increments called cache lines in current processors.

--
John Eells
z/OS Technical Marketing
IBM Poughkeepsie
ee...@us.ibm.com
Re: Is there a significant performance penalty for non-aligned operands?
In ofe6b203a0.45aeb620-on85257cea.0082e0b5-85257cea.00831...@us.ibm.com, on 06/01/2014 at 07:51 PM, Peter Relson rel...@us.ibm.com said:

I believe the answer is no, all other things being equal.

What about the performance of Load versus ICM for nonaligned data?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT
ISO position; see http://patriot.net/~shmuel/resume/brief.html
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)
Re: Is there a significant performance penalty for non-aligned operands?
On Mon, 2 Jun 2014 11:47:17 -0400, John Eells wrote:

Charles Mills wrote:

Pardon my ignorance: what exactly is a cache line?

The caches are divided into 256-byte increments called cache lines in current processors.

(The exact number is probably subject to change.) Is this also the width of the data path to main memory? It would seem sensible to avoid spending cycles fetching data that would never be accessed.

Does HLASM and/or STORAGE provide for cache line alignment?

I've long wondered why the NIL and OIL macros are necessary to lock access to bits within a byte, whereas the hardware performs the equivalent function for bytes within a cache line.

-- gil
Re: Is there a significant performance penalty for non-aligned operands?
... and code CNOP 2,8 or 4,8 (not DS 0H or 0F) at the end of DSECTs

Skip Robinson wrote:

Early in my career I developed a habit when writing ASM code. In 'working storage', code DS/DC fields in this order unless some other structural sequence is required: double words first, then full words, half words, and finally character (C or X) fields of whatever length. Most macros generate their own alignment instructions. Always heed alignment warnings from the assembler.

J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler
SHARE MVS Program Co-Manager
626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com

From: Lloyd leful...@sbcglobal.net
To: IBM-MAIN@LISTSERV.UA.EDU
Date: 06/01/2014 06:12 PM
Subject: Re: Is there a significant performance penalty for non-aligned operands?
Sent by: IBM Mainframe Discussion List IBM-MAIN@LISTSERV.UA.EDU

On 6/1/2014 7:51 PM, Peter Relson wrote:

I believe the answer is no, all other things being equal. But if one of those other things is that the non-aligned operand spans cache lines whereas an aligned one would not, then the answer could be a huge yes. Things such as doubleword (or quadword) consistency could not be relied upon, depending on the degree of non-alignment.

Peter Relson
z/OS Core Technology Design

There are times, though, when things HAVE to be on a doubleword or quadword boundary for specific instructions, whether hardware instructions or supervisor calls (not necessarily SVC). What I usually do then is define a variable larger by the appropriate number of bytes and verify, by shifting the address, that I am aligned.

Lloyd
Re: Is there a significant performance penalty for non-aligned operands?
On 6/2/2014 8:01 AM, Skip Robinson wrote:

Early in my career I developed a habit when writing ASM code. In 'working storage', code DS/DC fields in this order unless some other structural sequence is required: double words first, then full words, half words, and finally character (C or X) fields of whatever length.

These days, fields at the top of a data area (i.e., lowest positive addresses for base register coverage) tend to be those that must be accessed via instructions that support only 12-bit displacements, e.g. MVC. Fields referenced by instructions with 20-bit displacements (e.g., L can be replaced with LY, LLGT, LLGF, LGF, etc.) can appear much, much further away or even in a negative direction (though I have yet to leverage that) from the origin of the data area.

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/
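The ranges behind that layout advice are easy to state: a 12-bit displacement is unsigned (0 through 4095), while a 20-bit long displacement is signed (-524288 through 524287). The C sketch below merely encodes those two ranges so a field offset can be classified. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* 12-bit displacement: unsigned, 0 through 4095. */
bool fits_disp12(long offset)
{
    return offset >= 0 && offset <= 4095;
}

/* 20-bit long displacement: signed, -524288 through 524287. */
bool fits_disp20(long offset)
{
    return offset >= -524288L && offset <= 524287L;
}

/* Smallest displacement width (12 or 20) that can reach `offset`
   from the base register, or 0 if neither can. */
int disp_bits_needed(long offset)
{
    if (fits_disp12(offset)) {
        assert(fits_disp20(offset)); /* the 12-bit range lies inside the 20-bit one */
        return 12;
    }
    return fits_disp20(offset) ? 20 : 0;
}
```

A field at offset 4096 or at a negative offset therefore needs one of the long-displacement instructions, which is why the fields restricted to 12-bit forms tend to be placed first.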
Re: Is there a significant performance penalty for non-aligned operands?
On 6/2/2014 9:06 AM, Paul Gilmartin wrote:

On Mon, 2 Jun 2014 11:47:17 -0400, John Eells wrote:

The caches are divided into 256-byte increments called cache lines in current processors.

(The exact number is probably subject to change.)

Modern processors implement instructions a program can use to learn about the cache configuration on the machine, though I don't expect the line size to change in my lifetime.

Does HLASM and/or STORAGE provide for cache line alignment?

For a long, long time now...

I've long wondered why the NIL and OIL macros are necessary to lock access to bits within a byte, whereas the hardware performs the equivalent function for bytes within a cache line.

NIL and OIL are not required on z196 and higher machines. NI and OI now do the necessary serialization.

--
Edward E Jaffe
Phoenix Software International, Inc
831 Parkview Drive North
El Segundo, CA 90245
http://www.phoenixsoftware.com/
Re: Is there a significant performance penalty for non-aligned operands?
Thanks, John. My question was indeed more "what is?" than "how big?"

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John Eells
Sent: Monday, June 02, 2014 8:47 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a significant performance penalty for non-aligned operands?

Charles Mills wrote:

Pardon my ignorance: what exactly is a cache line?

The caches are divided into 256-byte increments called cache lines in current processors.
Re: Is there a significant performance penalty for non-aligned operands?
There is a performance penalty. I have measured it for aligned and unaligned signed halfword, i.e., signed binary fixed(15,0), in compiled PL/I code and found that it is usually about 13%, which may be trivial or important depending upon context.

More important in multiple-CP environments, I think, is that there are contexts in which alignment|non-alignment determines whether an operation is performed as an interlocked|non-interlocked update. See the discussion of the ASI and AGSI instructions on page 7-25 of the current PrOp.

John Gilmore, Ashland, MA 01721 - USA
Re: Is there a significant performance penalty for non-aligned operands?
Thanks, John, exactly the sort of information I was looking for. Do you recall what the hardware was?

I am leaning toward aligning it. If I can gain a couple of milliseconds of CPU time per customer per day in return for coding + 1 & 0xfffe once, I think that's worth it.

OTOH, it will make the whole structure a little bit bigger. There is an absolute downside to using more ECSA, and it also carries a performance penalty of its own: a byte of padding decreases the likelihood that the byte you need is already in cache.

Re-emphasizing what I said, the references in question are read-only. The structure is effectively write once, read many.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John Gilmore
Sent: Sunday, June 01, 2014 6:19 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a significant performance penalty for non-aligned operands?

There is a performance penalty. I have measured it for aligned and unaligned signed halfword, i.e., signed binary fixed(15,0), in compiled PL/I code and found that it is usually about 13%, which may be trivial or important depending upon context.

More important in multiple-CP environments, I think, is that there are contexts in which alignment|non-alignment determines whether an operation is performed as an interlocked|non-interlocked update. See the discussion of the ASI and AGSI instructions on page 7-25 of the current PrOp.
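The trade-off weighed above (a slack byte per odd-length entry versus unaligned halfwords) can be put in numbers. The sketch below assumes the record shape described earlier in the thread: each entry is a 2-byte halfword followed by a variable count of characters. The function names are hypothetical, and the rounding step is presumably the add-one-then-AND-with-0xFFFE idiom mentioned above, written here for `size_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round an offset up to the next even value (add 1, clear the low bit). */
static size_t round_up_even(size_t offset)
{
    return (offset + 1) & ~(size_t)1;
}

/* Total bytes for n entries, each a 2-byte halfword followed by
   char_counts[i] characters. When `align` is true, a slack byte is
   inserted wherever the next halfword would start on an odd offset. */
size_t layout_size(const size_t *char_counts, size_t n, bool align)
{
    assert(char_counts != NULL || n == 0);
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        if (align)
            offset = round_up_even(offset);
        offset += 2 + char_counts[i];
    }
    return offset;
}
```

For entries carrying 3, 2 and 5 characters, the packed layout takes 16 bytes and the aligned one 17: a single slack byte, added after the first entry because it ends on an odd offset.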
Re: Is there a significant performance penalty for non-aligned operands?
I believe the answer is no, all other things being equal. But if one of those other things is that the non-aligned operand spans cache lines whereas an aligned one would not, then the answer could be a huge yes. Things such as doubleword (or quadword) consistency could not be relied upon, depending on the degree of non-alignment.

Peter Relson
z/OS Core Technology Design
Re: Is there a significant performance penalty for non-aligned operands?
On 6/1/2014 7:51 PM, Peter Relson wrote:

I believe the answer is no, all other things being equal. But if one of those other things is that the non-aligned operand spans cache lines whereas an aligned one would not, then the answer could be a huge yes. Things such as doubleword (or quadword) consistency could not be relied upon, depending on the degree of non-alignment.

Peter Relson
z/OS Core Technology Design

There are times, though, when things HAVE to be on a doubleword or quadword boundary for specific instructions, whether hardware instructions or supervisor calls (not necessarily SVC). What I usually do then is define a variable larger by the appropriate number of bytes and verify, by shifting the address, that I am aligned.

Lloyd
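The over-allocate-and-align technique described above might look like this in C. It is a sketch under assumptions: the helper names are invented, and the shift test (shift the low bits out and back in, then compare) is one reading of "verify by shifting the address", not necessarily the exact code Lloyd uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First address at or after `p` that is a multiple of `boundary`.
   `boundary` must be a power of two (8 = doubleword, 16 = quadword). */
void *align_up(void *p, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + boundary - 1) & ~(uintptr_t)(boundary - 1));
}

/* Shift check: drop the low `shift` bits and shift back. The address
   is unchanged only if those bits were already zero, i.e. aligned
   on a 2**shift boundary. */
int is_aligned_by_shift(const void *p, unsigned shift)
{
    uintptr_t addr = (uintptr_t)p;
    return ((addr >> shift) << shift) == addr;
}
```

To obtain an aligned doubleword from storage with no alignment guarantee, reserve 8 + 7 bytes, pass the start of that area to `align_up(area, 8)`, and use the returned pointer. The 7 spare bytes guarantee that the aligned doubleword still fits inside the reserved area.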