Re: Unexpected C code
From: "Paul Gilmartin" <0014e0e4a59b-dmarc-requ...@listserv.uga.edu>
Sent: Wednesday, April 20, 2022 10:32 AM

> C programmers don't give a damn about overflows. An unfortunate consequence,
> probably, of hardware architectures which, unlike 360, lack unsigned
> instructions, forcing compilers to generate signed instructions for
> unsigned operations.

I think you will find that machines that store negative values in two's complement form produce a correct sum (or difference) using "signed" instructions, since all 32 bits* participate in the addition (or subtraction). On the S/360, the AL and SL instructions set the condition code differently from A and S, but the 32-bit sum or difference is the same as for A and S.

__
* [or whatever the word size is]
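A small C sketch of the point about two's complement addition, for anyone who wants to see it outside assembler. The function names are made up for illustration. The signed variant widens to 64 bits first, because an overflowing signed add in C is undefined, and then keeps the low 32 bits.

```c
#include <stdint.h>

/* Unsigned 32-bit add: wraps modulo 2^32, like AL without looking at the CC. */
uint32_t add_as_unsigned(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Signed 32-bit add, done in 64 bits to stay clear of undefined behaviour,
   then truncated to the low 32 bits, like A without looking at the CC. */
uint32_t add_as_signed(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    return (uint32_t)wide;   /* conversion to unsigned is modular */
}
```

Both functions return the same 32-bit pattern for the same operand bits; only the interpretation (and, on the hardware, the condition code) differs.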
Re: Unexpected C code
On 2022-04-20 20:05, Thomas David Rivers wrote:

>> That's a great explanation Thomas. I'm curious though: how come both
>> compilers produce this same sequence of instructions? I'd have thought
>> it was a rather obscure combination. Is it perhaps more common than I'd
>> suspected, or do GCC and Dignus have some common heritage in the back end?
>
> No common heritage at all. The IBM compiler produces a similar sequence.
> Compiler writers are always looking for ways to do more using less. It's
> pretty well known, as are some other surprising sequences.
>
> Here's one that will surprise people... For this source:
>
>     int foo(int x) { return x / 5; }
>
> the Dignus compiler (generating code for 32-bit z/OS) generates:
>
> * ***   return x / 5;
>          L     15,0(0,1)       ; x
>          LR    3,15            ; .
>          SRL   15,31(0)        ; .
>          M     2,@lit_153_0    ; .
>          SRA   2,1(0)          ; .
>          ALR   2,15            ; .
>          LR    15,2
>          ...
>          DS    0D
> @lit_153_0 DC  F'1717986919'   0x66666667
>
> which is correct, and avoids the division instruction. It takes advantage
> of the fact that the M (Multiply) instruction uses two signed 32-bit
> operands and produces a signed 64-bit result.

H. S. Warren, "Hacker's Delight" (2015), gives specific code for division by 5 (p. 209), also multiplying by the same magic number as above.

Vowels, "Division by 10" (1992), shows division by 10 without using either multiplication or division: only 5 shifts and 5 adds. Division of an 8-bit integer by 10 is achieved in 2 shifts and 2 adds. Floating-point division by 10 is achieved with 5 shifts and 6 adds upon the mantissa.

> p.s. Even though we don't "know" the timing difference between division
> and multiplication, it's a sure bet that division takes a lot more time
> than multiplication on any hardware. So, best to avoid it if you can.

In the "good old days" [the 1950s], multiplication and division took about the same time. For the S/360-50, multiplication (M) took 28.75 microseconds, and division (D) took 33.25 microseconds. A further 0.5 microseconds had to be added to each of these times. Thus:

    M took 29.25 uS
    D took 33.75 uS
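For anyone who wants to experiment with the multiply-instead-of-divide sequence, here is a rough C model of the listing. It is only a sketch: the function name is invented, and it assumes that right-shifting a negative signed value is an arithmetic shift (implementation-defined in C, but what SRA does and what mainstream compilers do).

```c
#include <stdint.h>

/* Rough model of the generated code for x / 5 (name is illustrative only). */
int32_t div5_magic(int32_t x)
{
    /* SRL 15,31: isolate the sign bit of x (1 for negative x, else 0). */
    int32_t sign = (int32_t)((uint32_t)x >> 31);

    /* M 2,@lit: signed 32 x 32 -> 64-bit product with 1717986919 (0x66666667). */
    int64_t product = (int64_t)x * 1717986919LL;

    /* R2 holds the high-order 32 bits of the product
       (assumes arithmetic right shift of a negative value). */
    int32_t high = (int32_t)(product >> 32);

    /* SRA 2,1: arithmetic shift right by one. */
    high >>= 1;

    /* ALR 2,15: add the sign bit so negative quotients round toward zero. */
    return high + sign;
}
```

The final add of the sign bit is what turns the floor-style result of the shifts into the truncating result that C requires for negative dividends.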
Re: ASM500Ws after Applying Z16 PTF to HLASM
Thanks, John, Good information as always. I asked Frank Chu to open the support case. He did so about an hour ago. Dave At 4/20/2022 03:48 PM, Jonathan Scott wrote: Ref: Your note of Wed, 20 Apr 2022 15:41:44 -0400 Dave Cole writes: > Here is the snippet from an assembly listing... > 93657+ENDCMDS DS0D,F > 93658+DXDCMDS DXD (ENDCMDS+8-TFSCMDS)X > ** ASMA500W Requested alignment exceeds section alignment Any case where the DXD has to be deferred (because it cannot be resolved at the time it is processed during the first pass) will trigger the problem. If anything that might affect the location counter could not be resolved immediately, such as an earlier forward reference, then the address of ENDCMD would need to be resolved by "interlude" processing after the first pass completes, after which the DXD could be successfully resolved. As the alignment of a DXD section is automatically determined by its contents, I don't think that check should be able to fail. (Even LQ in a DXD has a valid representation in object code format, so we allow it). We already know how to fix it, so when we receive the support case we should be able to respond very rapidly. (We could obviously fix it without a support case, but it is easier to give it higher priority when there is a support case). Jonathan Scott, HLASM IBM Hursley, UK
Re: Quadword constant
On Wed, 20 Apr 2022 at 13:03, Charles Mills wrote: > > USING *,16 > I was wondering about R16. Would come in handy. Maybe on the z16...? [There was an old PL/I Optimizer APAR (1980ish?) complaining that a new compiler release could not generate code for certain large source modules that had worked previously. The IBM answer was that there weren't enough registers, and that if that were to be addressed in a future hardware version, then they'd be able to generate larger code blocks. Of course in a sense the compiler folks eventually got what they wanted in the High-Word Facility on zArch.] Tony H.
Re: ASM500Ws after Applying Z16 PTF to HLASM
Ref: Your note of Wed, 20 Apr 2022 15:41:44 -0400

Dave Cole writes:
> Here is the snippet from an assembly listing...
>   93657+ENDCMDS  DS    0D,F
>   93658+DXDCMDS  DXD   (ENDCMDS+8-TFSCMDS)X
> ** ASMA500W Requested alignment exceeds section alignment

Any case where the DXD has to be deferred (because it cannot be resolved at the time it is processed during the first pass) will trigger the problem.

If anything that might affect the location counter could not be resolved immediately, such as an earlier forward reference, then the address of ENDCMDS would need to be resolved by "interlude" processing after the first pass completes, after which the DXD could be successfully resolved.

As the alignment of a DXD section is automatically determined by its contents, I don't think that check should be able to fail. (Even LQ in a DXD has a valid representation in object code format, so we allow it).

We already know how to fix it, so when we receive the support case we should be able to respond very rapidly. (We could obviously fix it without a support case, but it is easier to give it higher priority when there is a support case).

Jonathan Scott, HLASM
IBM Hursley, UK
Re: ASM500Ws after Applying Z16 PTF to HLASM
Hi John,

Thanks for your comments. They're helpful.

My actual case is similar to your example, but is not quite the same. Below is a snippet from the listing. I am using DXDs to record the length of a csect. It occurs at the end of the assembly, and it assigns a duplication factor to a DXD X that is equal to the length of the csect. All my assemblies end with a similar DXD, but their names are all different. Then later, the Binder will accumulate all the DXDs into an external dummy whose length will match the load module's length. Then the Binder will stash that length into a CXD.

Here is the snippet from an assembly listing...

  00003CB0   93657+ENDCMDS  DS    0D,F
             93658+DXDCMDS  DXD   (ENDCMDS+8-TFSCMDS)X
  ** ASMA500W Requested alignment exceeds section alignment
  ** ASMA435I Record 87 in CSW.Z22.PUBLISHD(@CSECTZ) on volume: CSW00M
  00003CB4   93659+         DC    Q(DXDCMDS)          REQUIRED REFERENCE
             93661          END   ,                   SRCCMDS

You will note that while the DXD does not reference any forward defined variable, its dup factor nonetheless resolves to a value derived from a forward address.

The peculiar thing is, while every single one of my 200+ assemblies ends with similar logic, only fourteen report the ASMA500W warning.

I hope this sheds additional light on this bug of yours. I will raise a support case.

Dave Cole
President, ColeSoft
dbc...@gmail.com (personal)
dbc...@colesoft.com (business)
540-456-6518 (cell)

At 4/20/2022 05:59 AM, you wrote:
> Dave Cole, please raise a support case for this.
>
> APAR PH40885 exposed a bug when the duplication factor on a DXD involves
> a forward reference, for example:
>
> DXD1     DXD   (A)X
> A        EQU   3
>
> The problem is that if a DXD definition has to be deferred because it
> cannot be resolved immediately, the field which would normally point to
> the owning section (itself) is used instead to point to some information
> about the deferred definition, to be resolved after the end of the first
> pass.
>
> In the second pass, when the alignment is supposed to be checked against
> the alignment of the owning section, it ends up being checked against the
> deferred definition instead, in which the field corresponding to the
> alignment is unused, equal to zero, causing the warning.
>
> The good news is that this doesn't affect the object code output.
>
> Jonathan Scott, HLASM
> IBM Hursley, UK
Re: Quadword constant
I was wondering about R16. Would come in handy. Charles -Original Message- From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On Behalf Of Steve Smith Sent: Wednesday, April 20, 2022 9:31 AM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Quadword constant Sheesh. Sorry, I meant ORG *,16. On Wed, Apr 20, 2022 at 12:30 PM Steve Smith wrote: > That's the old-fashioned way. This is the new way: > > USING *,16
Re: Quadword constant
Sheesh. Sorry, I meant ORG *,16. On Wed, Apr 20, 2022 at 12:30 PM Steve Smith wrote: > That's the old-fashioned way. This is the new way: > > USING *,16 > > There are some caveats. For CSECTs, HLASM will complain if SECALGN is > insufficient. For DSECTs, it's your responsibility to ensure the alignment > matches (if it's real important). Fortunately, STORAGE has a corresponding > alignment specification. > > sas > > On Wed, Apr 20, 2022 at 10:50 AM Bob Raicer wrote: > >> Ed; >> >> Of course, what you said about the LQ type of DC is true, and I too >> have used LQ data types in some of my code too. However, the >> SECTALGN requirement is a bit of an issue when assembling code with >> 2**3 (double word) section alignment and which also contains DSECTs >> which map quad word aligned storage areas. I've had to resort to >> schemes like what is shown below (I hope the list server doesn't >> mangle the sample listing too badly). >> >> The reason(s) for still having double word aligned sections is (are) >> a bit lost in antiquity -- inertia is a powerful thing :) >> >> : D-Loc Object Code Stmt Source Statement >> :1 SAMPLE DSECT , >> :2 PRINT ON,DATA >> : 0010 3 REF DCA(QUADITEM) >> :0004 00 4 BYTE DCAL1(0) >> :5 * >> - >> :0005 6 DC >> (*-SAMPLE)+15)/16)*16)-(*-SAMPLE))AL1(0) >> :000D 00 >> :7 * Round up to a Quad Word >> :8 * boundary. >> :9 * >> :0010 10 QUADITEM DCXL16'00' >> :0018 >> : 11 END , >> >
Re: Quadword constant
That's the old-fashioned way. This is the new way:

         USING *,16

There are some caveats. For CSECTs, HLASM will complain if SECTALGN is insufficient. For DSECTs, it's your responsibility to ensure the alignment matches (if it's real important). Fortunately, STORAGE has a corresponding alignment specification.

sas

On Wed, Apr 20, 2022 at 10:50 AM Bob Raicer wrote:
> Ed;
>
> Of course, what you said about the LQ type of DC is true, and I
> have used LQ data types in some of my code too. However, the
> SECTALGN requirement is a bit of an issue when assembling code with
> 2**3 (double word) section alignment and which also contains DSECTs
> which map quad word aligned storage areas. I've had to resort to
> schemes like what is shown below (I hope the list server doesn't
> mangle the sample listing too badly).
>
> The reason(s) for still having double word aligned sections is (are)
> a bit lost in antiquity -- inertia is a powerful thing :)
>
> :   D-Loc    Object Code        Stmt  Source Statement
> :                                  1  SAMPLE   DSECT ,
> :                                  2           PRINT ON,DATA
> :  00000000  00000010              3  REF      DC    A(QUADITEM)
> :  00000004  00                    4  BYTE     DC    AL1(0)
> :                                  5  * -
> :  00000005  0000000000000000      6           DC    (((((*-SAMPLE)+15)/16)*16)-(*-SAMPLE))AL1(0)
> :  0000000D  000000
> :                                  7  * Round up to a Quad Word
> :                                  8  * boundary.
> :                                  9  *
> :  00000010  0000000000000000     10  QUADITEM DC    XL16'00'
> :  00000018  0000000000000000
> :                                 11           END   ,
Re: Detection of integer overflow
>doesn't current z model have a specific Unsigned Multiply instruction?< ML, MLR, MLG, and MLGR called "Multiply Logical" are unsigned with no cc change since overflow does not occur with 64 and 128 bit results. Don Higgins d...@higgins.net www.don-higgins.net -Original Message- From: IBM Mainframe Assembler List On Behalf Of Paul Gilmartin Sent: Wednesday, April 20, 2022 10:55 AM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Re: Detection of integer overflow On Apr 20, 2022, at 07:46:49, Ian Worthington wrote: > > Whilst looking at reliable techniques to detect signed and unsigned overflow > in integer multiplication I was checking out the late John Erhman's > "Assembler Language Programming for IBM System z™ Servers" in which I > discovered he presented this problem and solution: > 18.2.13.(2)+ A programmer wanted to test whether the product of two > positive 32-bit binary integers was too large to fit in a 32-bit > register. > I have relied on: Multiply SLDA 32 Test CC for overflow. Works for signed integers in any quadrant. Does "positive" mean both sign bits are zero? Harder for unsigned. For that reason, doesn't current z model have a specific Unsigned Multiply instruction? -- gil
Re: Detection of integer overflow
On Apr 20, 2022, at 07:46:49, Ian Worthington wrote:
>
> Whilst looking at reliable techniques to detect signed and unsigned overflow
> in integer multiplication I was checking out the late John Ehrman's
> "Assembler Language Programming for IBM System z™ Servers" in which I
> discovered he presented this problem and solution:
> 18.2.13.(2)+ A programmer wanted to test whether the product of two positive
> 32-bit binary integers was too large to fit in a 32-bit register.
>
I have relied on:

    Multiply
    SLDA 32
    Test CC for overflow.

Works for signed integers in any quadrant. Does "positive" mean both sign bits are zero?

Harder for unsigned. For that reason, doesn't current z model have a specific Unsigned Multiply instruction?

-- gil
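On the unsigned case, the idea can be sketched in C with a 64-bit intermediate, which is roughly what a 32 x 32 -> 64-bit unsigned multiply gives you in a register pair. The function name and the out-parameter are made up for this illustration; it is not taken from anyone's code in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiply two unsigned 32-bit values; report whether the product needs
   more than 32 bits. The low-order 32 bits are always stored in *low. */
bool umul32_overflows(uint32_t a, uint32_t b, uint32_t *low)
{
    uint64_t product = (uint64_t)a * (uint64_t)b;  /* full 64-bit product, cannot overflow */
    *low = (uint32_t)product;                      /* low-order word */
    return (product >> 32) != 0;                   /* any bits in the high-order word? */
}
```

With the operands from the book example, 75141 and 56789, this reports no unsigned overflow and a low word of X'FE5808A9', which is exactly the value that looks negative when read as signed.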
Re: Quadword constant
Ed;

Of course, what you said about the LQ type of DC is true, and I have used LQ data types in some of my code too. However, the SECTALGN requirement is a bit of an issue when assembling code with 2**3 (double word) section alignment and which also contains DSECTs which map quad word aligned storage areas. I've had to resort to schemes like what is shown below (I hope the list server doesn't mangle the sample listing too badly).

The reason(s) for still having double word aligned sections is (are) a bit lost in antiquity -- inertia is a powerful thing :)

:   D-Loc    Object Code        Stmt  Source Statement
:                                  1  SAMPLE   DSECT ,
:                                  2           PRINT ON,DATA
:  00000000  00000010              3  REF      DC    A(QUADITEM)
:  00000004  00                    4  BYTE     DC    AL1(0)
:                                  5  * -
:  00000005  0000000000000000      6           DC    (((((*-SAMPLE)+15)/16)*16)-(*-SAMPLE))AL1(0)
:  0000000D  000000
:                                  7  * Round up to a Quad Word
:                                  8  * boundary.
:                                  9  *
:  00000010  0000000000000000     10  QUADITEM DC    XL16'00'
:  00000018  0000000000000000
:                                 11           END   ,
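The duplication factor in statement 6 is just "round the current offset up to a multiple of 16, then subtract the current offset". The same arithmetic in C, as a sketch only (the function name is invented, and it assumes the boundary is non-zero):

```c
#include <stddef.h>

/* Number of pad bytes needed to move 'offset' up to the next multiple of
   'boundary'. Mirrors ((((offset)+15)/16)*16)-(offset) for boundary = 16. */
size_t pad_to_boundary(size_t offset, size_t boundary)
{
    return ((offset + boundary - 1) / boundary) * boundary - offset;
}
```

For the listing above, the offset at statement 6 is 5, so the pad is 11 bytes and QUADITEM lands at X'10'. An offset that is already aligned yields a pad of zero.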
Re: Detection of integer overflow
On 2022-04-21 00:19, Seymour J Metz wrote:

> That has at least two bugs: the first test will incorrectly treat 1*-1

The task is to form the product of two POSITIVE integers.

> as having an overflow and the second test is testing all of R0,

The second test must test R1, as shown, not R0.

> not just the high bit.
>
> From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf of Ian Worthington [0c9b78d54aea-dmarc-requ...@listserv.uga.edu]
> Sent: Wednesday, April 20, 2022 9:46 AM
> To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
> Subject: Detection of integer overflow
>
> Whilst looking at reliable techniques to detect signed and unsigned
> overflow in integer multiplication I was checking out the late John
> Ehrman's "Assembler Language Programming for IBM System z™ Servers" in
> which I discovered he presented this problem and solution:
>
> 18.2.13.(2)+ A programmer wanted to test whether the product of two
> positive 32-bit binary integers was too large to fit in a 32-bit register.
> Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32
> bits long but appears to be negative, −27785047. An additional test is
> needed:
>
>          L     1,X          Load first operand
>          M     0,Y          Multiply by second operand
>          LTR   0,0          Check high-order 32 bits
>          BNZ   NotOK        If not zero, product is too big
>          LTR   1,1          Check high-order bit of GR1
>          BZ    ProdOK       Branch if high-order 33 bits are 0s
>          - - -              Not OK
> X        DC    F'75141'
> Y        DC    F'56789'
>
> One hesitates to suggest it, but surely this cannot be correct? This
> checks that r0 and r1 are both zero. Surely John meant BNM ProdOk as the
> final instruction (at least for signed 32 bit integers: no further test
> is required for unsigned integers, I think.)?
>
> Google finds no errata for John's book. It is, of course, much more
> likely I've misinterpreted something than that John made an error!
Re: Detection of integer overflow
The first case is ruled "out of scope" by the wording of the question wherein both inputs are deemed to be positive. (Though I think that makes it a bit of a hokey example). Best wishes / Mejores deseos / Meilleurs vœux Ian ... On Wednesday, April 20, 2022, 04:19:32 PM GMT+2, Seymour J Metz wrote: That has at least two bugs: the first test will incorrectly treat 1*-1 as having an overflow and the second test is testing all of R0, not just the high bit. -- Shmuel (Seymour J.) Metz http://mason.gmu.edu/~smetz3 From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf of Ian Worthington [0c9b78d54aea-dmarc-requ...@listserv.uga.edu] Sent: Wednesday, April 20, 2022 9:46 AM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Detection of integer overflow Whilst looking at reliable techniques to detect signed and unsigned overflow in integer multiplication I was checking out the late John Erhman's "Assembler Language Programming for IBM System z™ Servers" in which I discovered he presented this problem and solution: 18.2.13.(2)+ A programmer wanted to test whether the product of two positive 32-bit binary integers was too large to fit in a 32-bit register. Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32 bits long but appears to be negative, −27785047. An additional test is needed: L 1,X Load first operand M 0,Y Multiply by second operand LTR 0,0 Check high-order 32 bits BNZ NotOK If not zero, product is too big LTR 1,1 Check high-order bit of GR1 BZ ProdOK Branch if high-order 33 bits are 0s - - - Not OK X DC F'75141' Y DC F'56789' One hesitates to suggest it, but surely this cannot be correct? This checks that r0 and r1 are both zero. Surely John meant BNM ProdOk as the final instruction (at least for signed 32 bit integers: no further test is required for unsigned integers, I think.)? Google finds no errata for John's book. It is, of course, much more likely I've misinterpreted something that John made an error! 
Best wishes / Mejores deseos / Meilleurs vœux Ian ...
Re: Detection of integer overflow
On 2022-04-20 23:46, Ian Worthington wrote:

> Whilst looking at reliable techniques to detect signed and unsigned
> overflow in integer multiplication I was checking out the late John
> Ehrman's "Assembler Language Programming for IBM System z™ Servers" in
> which I discovered he presented this problem and solution:
>
> 18.2.13.(2)+ A programmer wanted to test whether the product of two
> positive 32-bit binary integers was too large to fit in a 32-bit register.
> Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32
> bits long but appears to be negative, −27785047. An additional test is
> needed:
>
>          L     1,X          Load first operand
>          M     0,Y          Multiply by second operand
>          LTR   0,0          Check high-order 32 bits
>          BNZ   NotOK        If not zero, product is too big
>          LTR   1,1          Check high-order bit of GR1
>          BZ    ProdOK       Branch if high-order 33 bits are 0s

This should be BNM PRODOK

>          - - -              Not OK
> X        DC    F'75141'
> Y        DC    F'56789'
>
> One hesitates to suggest it, but surely this cannot be correct? This
> checks that r0 and r1 are both zero. Surely John meant BNM ProdOk as the
> final instruction (at least for signed 32 bit integers:

Correct

> no further test is required for unsigned integers, I think.)?

Correct.

> Google finds no errata for John's book. It is, of course, much more
> likely I've misinterpreted something than that John made an error!
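For reference, the corrected test (high-order word zero, and the sign bit of the low-order word clear) can be modelled in C roughly as follows. This is a sketch under the problem's own assumption that both operands are non-negative; the function name is invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the product of two non-negative 32-bit integers fits in a
   signed 32-bit register, i.e. the high-order 33 bits of the 64-bit
   product are all zero. */
bool product_fits_int32(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;             /* M 0,Y: 64-bit product */
    uint32_t high = (uint32_t)((uint64_t)product >> 32);   /* contents of GR0 */
    uint32_t low = (uint32_t)product;                      /* contents of GR1 */

    if (high != 0) {                    /* LTR 0,0 / BNZ NotOK */
        return false;
    }
    return (low & 0x80000000u) == 0;    /* LTR 1,1 / BNM ProdOK */
}
```

With 75141 and 56789 the high word is zero but the sign bit of the low word is set, so the product is rejected, which is the case the book's extra test was meant to catch.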
Re: Detection of integer overflow
That has at least two bugs: the first test will incorrectly treat 1*-1 as having an overflow and the second test is testing all of R0, not just the high bit. -- Shmuel (Seymour J.) Metz http://mason.gmu.edu/~smetz3 From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf of Ian Worthington [0c9b78d54aea-dmarc-requ...@listserv.uga.edu] Sent: Wednesday, April 20, 2022 9:46 AM To: ASSEMBLER-LIST@LISTSERV.UGA.EDU Subject: Detection of integer overflow Whilst looking at reliable techniques to detect signed and unsigned overflow in integer multiplication I was checking out the late John Erhman's "Assembler Language Programming for IBM System z™ Servers" in which I discovered he presented this problem and solution: 18.2.13.(2)+ A programmer wanted to test whether the product of two positive 32-bit binary integers was too large to fit in a 32-bit register. Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32 bits long but appears to be negative, −27785047. An additional test is needed: L 1,X Load first operand M 0,Y Multiply by second operand LTR 0,0 Check high-order 32 bits BNZ NotOK If not zero, product is too big LTR 1,1 Check high-order bit of GR1 BZ ProdOK Branch if high-order 33 bits are 0s - - - Not OK X DC F'75141' Y DC F'56789' One hesitates to suggest it, but surely this cannot be correct? This checks that r0 and r1 are both zero. Surely John meant BNM ProdOk as the final instruction (at least for signed 32 bit integers: no further test is required for unsigned integers, I think.)? Google finds no errata for John's book. It is, of course, much more likely I've misinterpreted something that John made an error! Best wishes / Mejores deseos / Meilleurs vœux Ian ...
Detection of integer overflow
Whilst looking at reliable techniques to detect signed and unsigned overflow in integer multiplication I was checking out the late John Ehrman's "Assembler Language Programming for IBM System z™ Servers" in which I discovered he presented this problem and solution:

18.2.13.(2)+ A programmer wanted to test whether the product of two positive 32-bit binary integers was too large to fit in a 32-bit register. Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32 bits long but appears to be negative, −27785047. An additional test is needed:

         L     1,X          Load first operand
         M     0,Y          Multiply by second operand
         LTR   0,0          Check high-order 32 bits
         BNZ   NotOK        If not zero, product is too big
         LTR   1,1          Check high-order bit of GR1
         BZ    ProdOK       Branch if high-order 33 bits are 0s
         - - -              Not OK
X        DC    F'75141'
Y        DC    F'56789'

One hesitates to suggest it, but surely this cannot be correct? This checks that r0 and r1 are both zero. Surely John meant BNM ProdOk as the final instruction (at least for signed 32 bit integers: no further test is required for unsigned integers, I think.)?

Google finds no errata for John's book. It is, of course, much more likely I've misinterpreted something than that John made an error!

Best wishes / Mejores deseos / Meilleurs vœux
Ian ...
Re: Quadword constant
Yes, but I know of no way to define a quadword binary fixed point integer constant in the current HLASM.

-- Shmuel (Seymour J.) Metz http://mason.gmu.edu/~smetz3

From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf of Ed Jaffe [edja...@phoenixsoftware.com]
Sent: Tuesday, April 19, 2022 10:22 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Quadword constant

On 4/19/2022 7:13 PM, Bob Raicer wrote:
>
> Having the ability to assemble quadword aligned 128-bit items for
> use with these instructions would be helpful.

We define quadword-aligned storage areas all the time. For example:

Field1   DC    LQ'0'
Field2   DC    LQ'0'

Of course, you need to specify the SECTALGN option. We align all of our sections on cache-line boundaries, but technically you don't need more than quadword alignment to make LQ work.

-- Phoenix Software International
Edward E. Jaffe
831 Parkview Drive North
El Segundo, CA 90245
https://www.phoenixsoftware.com/
Re: ASM500Ws after Applying Z16 PTF to HLASM
Dave Cole, please raise a support case for this.

APAR PH40885 exposed a bug when the duplication factor on a DXD involves a forward reference, for example:

DXD1     DXD   (A)X
A        EQU   3

The problem is that if a DXD definition has to be deferred because it cannot be resolved immediately, the field which would normally point to the owning section (itself) is used instead to point to some information about the deferred definition, to be resolved after the end of the first pass.

In the second pass, when the alignment is supposed to be checked against the alignment of the owning section, it ends up being checked against the deferred definition instead, in which the field corresponding to the alignment is unused, equal to zero, causing the warning.

The good news is that this doesn't affect the object code output.

Jonathan Scott, HLASM
IBM Hursley, UK
Re: Unexpected C code
> > That's a great explanation Thomas.
> I'm curious though: how come both compilers produce this
> same sequence of instructions? I'd have thought it was a
> rather obscure combination. Is it perhaps more common
> than I'd suspected, or do GCC and Dignus have some common heritage
> in the back end?

No common heritage at all. The IBM compiler produces a similar sequence.

Compiler writers are always looking for ways to do more using less. It's pretty well known, as are some other surprising sequences.

Here's one that will surprise people... For this source:

    int foo(int x) { return x / 5; }

the Dignus compiler (generating code for 32-bit z/OS) generates:

* ***   return x / 5;
         L     15,0(0,1)       ; x
         LR    3,15            ; .
         SRL   15,31(0)        ; .
         M     2,@lit_153_0    ; .
         SRA   2,1(0)          ; .
         ALR   2,15            ; .
         LR    15,2
         ...
         DS    0D
@lit_153_0 DC  F'1717986919'   0x66666667

which is correct, and avoids the division instruction. It takes advantage of the fact that the M (Multiply) instruction uses two signed 32-bit operands and produces a signed 64-bit result.

- Dave Rivers -

p.s. Even though we don't "know" the timing difference between division and multiplication, it's a sure bet that division takes a lot more time than multiplication on any hardware. So, best to avoid it if you can.

--
riv...@dignus.com Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com
Re: Unexpected C code
Many thanks, comments below ...

On 20.04.2022 at 04:17, Thomas David Rivers wrote:

> The "secret" is in the operation of the LPR and LCR instructions for the
> 2's complement maximum negative value (X'80000000'):
>
> These notes in the Principles of Operation give a hint:
>
> LPR: An overflow condition occurs when the maximum negative number is
> complemented; the number remains unchanged.

I did not see this hint in the PoOp; if I had, I would have understood much better. So this special situation is the only case where LPR leaves a negative sign bit in the target register? Which makes the name of the instruction (LPR) a bit strange in this case ...

> LCR: Zero and the maximum negative number remain unchanged. An overflow
> condition occurs when the maximum negative number is complemented.

Same here, LCR does not "load the complement" for the maximum negative number. When I first learned about this instruction, I thought that it would do something completely different; I later learned that to get a real complement, you need XR with all ones.

> So, as it happens, the LPR of the most negative number (X'80000000')
> produces X'80000000' as its result (and sets overflow, which is ignored.)
> And the same thing happens for the LCR instruction.

There is something new to learn every single day ...
Re: Unexpected C code
That's a great explanation Thomas. I'm curious though: how come both compilers produce this same sequence of instructions? I'd have thought it was a rather obscure combination. Is it perhaps more common than I'd suspected, or do GCC and Dignus have some common heritage in the back end? Best wishes / Mejores deseos / Meilleurs vœux Ian ... On Wednesday, April 20, 2022, 04:17:32 AM GMT+2, Thomas David Rivers wrote: I thought I'd bring an explanation to what's going on here... Let's consider the following short C example (just to have something to compile): foo() { unsigned char ovfl; int ccpm, carrybit; ccpm = bar(); carrybit=bar2(); ovfl = (ccpm & carrybit) != 0; blah(ovfl); } The functions bar(), bar2() and blah() are simply external to this source (compilation unit in C terms) and are there so the optimizer doesn't have a clue about the possible values of the variables. When I compile this for z/OS (31-bit mode) with the Dignus compiler, I get this code: * *** ccpm = bar(); carrybit=bar2(); L 15,@lit_153_0 ; bar @@gen_label0 DS 0H BALR 14,15 @@gen_label1 DS 0H L 1,@lit_153_1 ; bar2 LR 2,15 LR 15,1 @@gen_label2 DS 0H BALR 14,15 @@gen_label3 DS 0H * *** * *** ovfl = (ccpm & carrybit) != 0; NR 2,15 LPR 2,2 LCR 2,2 SRL 2,31(0) * *** * *** blah(ovfl); STC 2,80(0,13) ; ovfl which is similar to what's going on with GCC. (The values happen to be in registers though.) Now, how does this work? 1) The two values are AND'd together (this is just a bit-wise/logical AND operation). 2) The absolute value is taken (making the 2's complement sign bit a zero) with the LPR instruction. So we now have either a zero or non-zero (positive) value (or a special case which we'll see below.) 3) The 2's complement of that is taken. If the value is zero, the result is zero - otherwise the result is a negative value (and the sign-bit will be set) (or - another special case, which we'll see below.) 
4) The sign-bit is shifted right 31 times to result in either a X'' or X'0001' in the the final result. So, what's going on in step #2 and why does that work? Especially if we consider that the result of the AND sets the sign-bit? Note that the only value from the AND that is interesting is the situation where the AND results in the sign bit being set, which presumably is cleared after the LPR. Hence the confusion. The "secret" is in the operation of the LPR and LCR instructions for the 2's complement maximum negative value (X'8000'): These notes in the Principles of Operation give a hint: LPR: An overflow condition occurs when the maximum negative number is complemented; the number remains unchanged. LCR: Zero and the maximum negative number remain unchanged. An overflow condition occurs when the maximum negative number is complemented. So, as it happens, the LPR of the most negative number (X'8000') produces X'8000' as its result (and sets overflow, which is ignored.) And the same thing happens for the LCR instruction. Going through the steps, when the result of the AND is X'800', we get these values: LPR ==> X'8000' LCR ==> X'8000' SRL ==> X'0001' And, for the X'000' value we have: LPR ==> X'' LCR ==> X'' SRL ==> X'' For any other situation where the AND operation produces a negative value (the sign bit is set) you'll have a value which isn't the most negative. Thus some of the lower-order bits (the non-sign-bit) will be set. If we have, for example, X'8xxx' then LPR ==> X'0xxx' LCR ==> X'8...' (whatever the 2's complement of 0xxx is) SRL ==> X'0001' Then we only need to consider the situation where the result of the AND is non-zero but positive, which is just an innocuous execution of the LPR instruction, which does "nothing" and proceeds as above. It's a clever sequence of instructions to produce a zero or non-zero value based on an input without a branch. 
- Dave Rivers - -- riv...@dignus.com Work: (919) 676-0847 Get your mainframe programming tools at http://www.dignus.com
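For anyone who wants to check the cases from the message above without a mainframe to hand, here is a minimal C sketch of the LPR / LCR / SRL sequence. The helper names (lpr, lcr, srl31, nonzero_flag, check_nonzero_flag) are made up for this illustration, and the instructions are modelled with unsigned 32-bit arithmetic on the assumption that the overflow indication is ignored, as it is in the generated code. It is not what either compiler emits, only a model of the register values.

```c
#include <assert.h>
#include <stdint.h>

/* Model of LPR (load positive): negate if the sign bit is set.
   Unsigned arithmetic wraps, so X'80000000' comes back unchanged,
   which matches the Principles of Operation note (overflow ignored). */
static uint32_t lpr(uint32_t r)
{
    return (r & 0x80000000u) ? (uint32_t)(0u - r) : r;
}

/* Model of LCR (load complement): two's complement negation.
   Zero and X'80000000' both map to themselves. */
static uint32_t lcr(uint32_t r)
{
    return (uint32_t)(0u - r);
}

/* Model of SRL r,31: logical shift right, leaving only the old sign bit. */
static uint32_t srl31(uint32_t r)
{
    return r >> 31;
}

/* The whole branch-free sequence: 0 for a zero input, 1 for anything else. */
static uint32_t nonzero_flag(uint32_t x)
{
    return srl31(lcr(lpr(x)));
}

/* Walks through the cases discussed in the message. */
static void check_nonzero_flag(void)
{
    assert(nonzero_flag(0x00000000u) == 0u); /* zero stays zero            */
    assert(nonzero_flag(0x80000000u) == 1u); /* maximum negative value     */
    assert(nonzero_flag(0x80000001u) == 1u); /* some other negative value  */
    assert(nonzero_flag(0x20000000u) == 1u); /* positive, non-zero         */
}
```

Calling check_nonzero_flag() from a test program exercises the four cases: zero, the maximum negative value, another negative value, and a positive value.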
Re: Unexpected C code
> C programmers don't give a damn about overflows. An unfortunate consequence,
> probably, of hardware architectures which, unlike 360, lack unsigned
> instructions, forcing compilers to generate signed instructions for
> unsigned operations.

I've spent more of the last week finding out more about integer overflow (non-) handling in C than I would have wished to, and certainly enough to last a lifetime.

In a nutshell the story appears to be that the C standards ("The nice thing about standards is that you have so many to choose from" - Tanenbaum) simply codified the variety of existing practice. Where there was no consensus, behavior was left "undefined". Thus we have:

    C11 6.5/5
    If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

with the exception clause:

    C11 6.2.5/9
    The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

Thus unsigned integers wrap without warning, signed integers can do whatever their fancy takes (qv "nasal demons").

Come back Ada, all is forgiven,

Best wishes / Mejores deseos / Meilleurs vœux

Ian ...

On Wednesday, April 20, 2022, 02:32:44 AM GMT+2, Paul Gilmartin <0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:

On Apr 19, 2022, at 17:57:23, Bernd Oppolzer wrote:
>
> LPR: if the register contains 0x80000000, IMO the result will be zero (and
> overflow),
>
I'd expect 0x80000000, with overflow.

> so you're right ... this will lead to a zero result. IMO, the overflow will
> be ignored.
>
C programmers don't give a damn about overflows. An unfortunate consequence, probably, of hardware architectures which, unlike 360, lack unsigned instructions, forcing compilers to generate signed instructions for unsigned operations.

> N result zero: LPR ... LCR ... SRL puts x'00' in R1
> N result X'80000000': LPR overflow (zero) ... LCR ... SRL puts x'00' in R1
> N result otherwise non-zero: LPR non-zero positive ... LCR negative ... SRL
> puts x'01' in R1
>
Oops! I forgot that one non-negate value with a complement.

> Is bitwise AND defined for signed ints, which are negative?
>
AND doesn't care -- the sign is just one of 32 bits.

> IMO, this is difficult; the result depends on the number format (2s or 1s
> complement).
> So, maybe, this is not defined in the C language; bit operations are for
> unsigned ints, normally.
>
Does the C standard require that shifts are equivalent to multiply/divide by powers of 2? That pretty much implies 2's complement. Hmmm. Shift right truncates toward -∞, not toward zero.

-- gil
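The wrap-versus-undefined distinction and the shift question raised above can be put into a few lines of C. This is only a sketch: the function names (wrap_add, add_would_overflow, floor_div2, check_overflow_rules) are invented for the illustration. On the shift question, C11 6.5.7/5 leaves the result of right-shifting a negative signed value implementation-defined, so the sketch computes the "truncate toward -∞" result from / and % instead of relying on >>.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned addition is defined to wrap modulo UINT_MAX + 1 (C11 6.2.5/9). */
static unsigned wrap_add(unsigned a, unsigned b)
{
    return a + b;
}

/* Signed addition: test the range first, because actually performing
   an overflowing a + b would be undefined behavior (C11 6.5/5). */
static bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Division by 2 rounding toward minus infinity, which is what an
   arithmetic right shift by one produces on a two's complement machine.
   Built from / and %, which truncate toward zero since C99. */
static int floor_div2(int x)
{
    return x / 2 - (x % 2 < 0);
}

/* A few cases showing the rules side by side. */
static void check_overflow_rules(void)
{
    assert(wrap_add(UINT_MAX, 1u) == 0u);     /* wraps, no warning      */
    assert(add_would_overflow(INT_MAX, 1));   /* would be undefined     */
    assert(add_would_overflow(INT_MIN, -1));  /* would be undefined     */
    assert(!add_would_overflow(1, 2));        /* in range               */
    assert(-7 / 2 == -3);                     /* / truncates toward 0   */
    assert(floor_div2(-7) == -4);             /* shift-style rounding   */
}
```

The last two assertions show why a right shift and a division by 2 are not interchangeable for negative values: the division gives -3 where shift-style rounding gives -4.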
Re: Unexpected C code
> ccpm and carrybit are probably ints or unsigned ints,
> because of the L and N instructions, which read them.

> The final STC moves the rightmost 8 bits to the bool variable;
> bool (no C standard type) is probably a typedef which means char.
> I hope, I understood the coding correctly.

Indeed. The full code, which I omitted to keep the focus, is:

    __uint32_t ccpm;
    const __uint32_t carrybit = 0x20000000;
    __uint32_t sum = u32a;
    asm("alr %[r1],%[r2] \n\t"
        "ipm %[r3]"
        : [r1] "+r" (sum),    // output
          [r3] "=r" (ccpm)
        : [r2] "r" (u32b)     // input
        : "cc"                // clobbers (alr sets the condition code)
        );
    bool overflow = (ccpm & carrybit) != 0;   // check if carry bit set

bool is defined in stdbool.h as an alias for _Bool (a standard type, I believe, since C99).

I had to check up how LCR behaved with x80...0, and the answer (viz, leave it alone and set the overflow bit) somewhat surprised me.

It's certainly a mighty clever piece of code, one that I've not seen before. It brings up in my mind the question: what exactly is the cost of a branch? How many instructions can I use before they become more expensive than the branch they replace?

Best wishes / Mejores deseos / Meilleurs vœux

Ian ...

On Wednesday, April 20, 2022, 01:00:03 AM GMT+2, Bernd Oppolzer wrote:

ccpm and carrybit are probably ints or unsigned ints, because of the L and N instructions, which read them.

so, the & (bitwise AND) operation yields a nonzero result, if there is a one bit in the same bit position in both operands. This nonzero result must be transferred to a one byte value X'01', using some clever register operations.

And: yes, IMO the coding here tries to avoid branches and compares.

The solution LPR ... LCR ... SRL looks OK for me. LPR keeps a nonzero result, but with a positive sign, LCR does the same, but enforces a negative sign, and SRL moves the sign to the rightmost bit position.
In contrast, a zero result of the N operation would stay zero throughout the LPR / LCR sequence, and the SRL would move a zero bit in the rightmost bit position.

The final STC moves the rightmost 8 bits to the bool variable; bool (no C standard type) is probably a typedef which means char.

I hope, I understood the coding correctly.

Kind regards

Bernd

On 19.04.2022 at 15:06, Ian Worthington wrote:
> Noticed today that the GCC C compiler generated an unexpected sequence of
> instructions for an AND and TEST:
>
> bool overflow = (ccpm & carrybit) != 0; // check if carry bit set
>  109                 .loc 1 189 0
>  110 0078 5810B25C   l    %r1,604(%r11)   # D.7949, ccpm
>  111 007c 5410B26C   n    %r1,620(%r11)   # D.7949, carrybit
>  112 0080 1011       lpr  %r1,%r1         # tmp54, D.7949
>  113 0082 1311       lcr  %r1,%r1         # tmp55, tmp54
>  114 0084 8810001F   srl  %r1,31          # tmp56,
>  115 0088 4210B25B   stc  %r1,603(%r11)   # tmp56, overflow
>
> I can only guess this is to avoid the cost of a branch? Or is there some
> other advantage in this?
>
>
> Best wishes / Mejores deseos / Meilleurs vœux
>
> Ian ...
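As a side note to the ALR / IPM code quoted in this thread: because unsigned arithmetic is defined to wrap, the carry out of a 32-bit add can also be recovered in plain C without inline assembler. The sketch below is one common way to do it, not what the poster's code does. The names add_u32_carry and check_add_u32_carry are invented for the illustration, and whether a given compiler turns this into ALR plus a condition-code test is not something this sketch can promise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Adds two unsigned 32-bit values, stores the wrapped sum in *sum and
   returns true when the addition carried out of bit 0 (the top bit).
   A carry happened exactly when the wrapped sum is smaller than an operand. */
static bool add_u32_carry(uint32_t a, uint32_t b, uint32_t *sum)
{
    *sum = (uint32_t)(a + b);   /* reduced modulo 2^32 */
    return *sum < a;
}

/* A few cases, including the boundary ones. */
static void check_add_u32_carry(void)
{
    uint32_t s;

    assert(!add_u32_carry(1u, 2u, &s) && s == 3u);
    assert(add_u32_carry(0xFFFFFFFFu, 1u, &s) && s == 0u);
    assert(add_u32_carry(0x80000000u, 0x80000000u, &s) && s == 0u);
    assert(!add_u32_carry(0x7FFFFFFFu, 0x80000000u, &s) && s == 0xFFFFFFFFu);
}
```

The comparison is itself free of branches at the C level, so it fits the same "flag without a branch" theme as the LPR / LCR / SRL sequence.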