Re: Unexpected C code

2022-04-20 Thread Robin Vowels

From: "Paul Gilmartin" <0014e0e4a59b-dmarc-requ...@listserv.uga.edu>
Sent: Wednesday, April 20, 2022 10:32 AM


C programmers don't give a damn about overflows.  An unfortunate consequence,
probably, of hardware architectures which, unlike 360, lack unsigned
instructions, forcing compilers to generate signed instructions for
unsigned operations.


I think you will find that machines which store negative values
in two's complement form produce a correct sum (or difference)
using "signed" instructions, since all 32 bits* participate in the
addition (or subtraction).

On the S/360, the AL and SL instructions set the condition code
differently from A and S, but the 32-bit sum or difference is the same
as A and S.
__

* [or whatever the word size is]
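
The point about signed and unsigned addition yielding the same bits can be sketched in C. This is only an illustration, not code from the thread: the function names are made up, and the conversion back to int32_t assumes a two's complement implementation (the result of that cast is implementation-defined in older C standards).

```c
#include <stdint.h>

/* Add two 32-bit words as unsigned values; the sum wraps modulo 2^32. */
uint32_t add_logical(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Add two signed words by reusing the unsigned adder and reinterpreting
   the bits. Assumes two's complement; overflow simply wraps here. */
int32_t add_signed_wrapping(int32_t a, int32_t b)
{
    return (int32_t)add_logical((uint32_t)a, (uint32_t)b);
}
```

For example, add_signed_wrapping(-5, 3) gives -2, and the same bit pattern (X'FFFFFFFE') comes out of add_logical(0xFFFFFFFBu, 3u). Only the interpretation of the result (and, on S/360, the condition code) differs.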



Re: Unexpected C code

2022-04-20 Thread Robin Vowels

On 2022-04-20 20:05, Thomas David Rivers wrote:


That's a great explanation Thomas.
I'm curious though:  how come both compilers produce this
same sequence of instructions?  I'd have thought it was a
rather obscure combination.  Is it perhaps more common
than I'd suspected, or do GCC and Dignus have some common heritage
in the back end?


 No common heritage at all.   The IBM compiler produces a similar
 sequence.

 Compiler writers are always looking for ways to do more using
 less.

 It's pretty well known, as are some other surprising sequences.

 Here's one that will surprise people...

 For this source:

   int
   foo(int x)
   {
       return x / 5;
   }

 the Dignus compiler (generating code for 32-bit z/OS)
 generates:

 * *** return x / 5;
          L     15,0(0,1)       ; x
          LR    3,15            ;   .
          SRL   15,31(0)        ;   .
          M     2,@lit_153_0    ;   .
          SRA   2,1(0)          ;   .
          ALR   2,15            ;   .
          LR    15,2
          ...
          DS    0D
 @lit_153_0 DC  F'1717986919' 0x66666667

 which is correct, and avoids the division instruction.
 It takes advantage of the fact that the M(ultiply) instruction
 uses two signed 32-bit operands and produces a signed 64-bit
 result.


H. S. Warren, "Hacker's Delight" (2015), gives specific code for
division by 5 (p. 209), also multiplying by the same magic number
as above.

Vowels, "Division by 10" (1992), shows division by 10 without using
either multiplication or division: only 5 shifts and 5 adds.

Division of an 8-bit integer by 10 is achieved in 2 shifts and 2 adds.

Floating-point division by 10 is achieved with 5 shifts and 6 adds
upon the mantissa.
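
As a rough illustration of dividing by 10 with shifts and adds, here is a C sketch of the unsigned sequence usually attributed to Warren's book. It is not the method from the 1992 paper cited above, which is not reproduced here, and the function name is only a label chosen for this example. It also uses one multiply to form the remainder for the final correction, which could itself be replaced by shifts and adds.

```c
#include <stdint.h>

/* Unsigned 32-bit division by 10 using shifts and adds.
   q is first built as an approximation of n * 0.8, then shifted right
   by 3 to approximate n / 10; the estimate is never too large and is
   at most one too small, so one correction step finishes the job. */
uint32_t divu10(uint32_t n)
{
    uint32_t q, r;

    q = (n >> 1) + (n >> 2);   /* q = n * 0.75             */
    q = q + (q >> 4);          /* refine towards n * 0.8   */
    q = q + (q >> 8);
    q = q + (q >> 16);
    q = q >> 3;                /* q is now roughly n / 10  */
    r = n - q * 10;            /* remainder of the estimate */
    return q + (r > 9);        /* correct by one if needed */
}
```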


p.s. Even though we don't "know" the timing difference between
 division and multiplication, it's a sure bet that division
 takes a lot more time than multiplication on any hardware.
 So, best to avoid it if you can.


In the "good old days" [the 1950s], multiplication and division
took about the same time.

For the S/360-50, multiplication (M) took 28.75 microseconds,
and division (D) took 33.25 microseconds.
A further 0.5 microseconds had to be added to each of these times.
Thus, M took 29.25 µs
  D took 33.75 µs


Re: ASM500Ws after Applying Z16 PTF to HLASM

2022-04-20 Thread David Cole

Thanks, John, Good information as always.

I asked Frank Chu to open the support case. He did so about an hour ago.

Dave


At 4/20/2022 03:48 PM, Jonathan Scott wrote:

Ref:  Your note of Wed, 20 Apr 2022 15:41:44 -0400

Dave Cole writes:
> Here is the snippet from an assembly listing...
> 93657+ENDCMDS  DS    0D,F
> 93658+DXDCMDS  DXD   (ENDCMDS+8-TFSCMDS)X
>   ** ASMA500W Requested alignment exceeds section alignment

Any case where the DXD has to be deferred (because it cannot be
resolved at the time it is processed during the first pass) will
trigger the problem.  If anything that might affect the location
counter could not be resolved immediately, such as an earlier
forward reference, then the address of ENDCMDS would need to be
resolved by "interlude" processing after the first pass
completes, after which the DXD could be successfully resolved.

As the alignment of a DXD section is automatically determined by
its contents, I don't think that check should be able to fail.
(Even LQ in a DXD has a valid representation in object code
format, so we allow it).

We already know how to fix it, so when we receive the support
case we should be able to respond very rapidly.  (We could
obviously fix it without a support case, but it is easier to
give it higher priority when there is a support case).

Jonathan Scott, HLASM
IBM Hursley, UK


Re: Quadword constant

2022-04-20 Thread Tony Harminc
On Wed, 20 Apr 2022 at 13:03, Charles Mills  wrote:

> > USING *,16

> I was wondering about R16. Would come in handy.

Maybe on the z16...?

[There was an old PL/I Optimizer APAR (1980ish?) complaining that a
new compiler release could not generate code for certain large source
modules that had worked previously. The IBM answer was that there
weren't enough registers, and that if that were to be addressed in a
future hardware version, then they'd be able to generate larger code
blocks. Of course in a sense the compiler folks eventually got what
they wanted in the High-Word Facility on zArch.]

Tony H.


Re: ASM500Ws after Applying Z16 PTF to HLASM

2022-04-20 Thread Jonathan Scott
Ref:  Your note of Wed, 20 Apr 2022 15:41:44 -0400

Dave Cole writes:
> Here is the snippet from an assembly listing...
> 93657+ENDCMDS  DS    0D,F
> 93658+DXDCMDS  DXD   (ENDCMDS+8-TFSCMDS)X
>   ** ASMA500W Requested alignment exceeds section alignment

Any case where the DXD has to be deferred (because it cannot be
resolved at the time it is processed during the first pass) will
trigger the problem.  If anything that might affect the location
counter could not be resolved immediately, such as an earlier
forward reference, then the address of ENDCMDS would need to be
resolved by "interlude" processing after the first pass
completes, after which the DXD could be successfully resolved.

As the alignment of a DXD section is automatically determined by
its contents, I don't think that check should be able to fail.
(Even LQ in a DXD has a valid representation in object code
format, so we allow it).

We already know how to fix it, so when we receive the support
case we should be able to respond very rapidly.  (We could
obviously fix it without a support case, but it is easier to
give it higher priority when there is a support case).

Jonathan Scott, HLASM
IBM Hursley, UK


Re: ASM500Ws after Applying Z16 PTF to HLASM

2022-04-20 Thread David Cole

Hi John,

Thanks for your comments. They're helpful.



My actual case is similar to your example, but is not quite the same. 
Below is a snippet from the listing.


I am using DXDs to record the length of a csect. It occurs at the end 
of the assembly, and it assigns a duplication factor to a DXD X that 
is equal to the length of the csect.


All my assemblies end with a similar DXD, but their names are all different.

Then later, the Binder will accumulate all the DXDs into an external 
dummy whose length will match the load module's length. Then the 
Binder will stash that length into a CXD.





Here is the snippet from an assembly listing...

 3CB0   93657+ENDCMDS  DS    0D,F
        93658+DXDCMDS  DXD   (ENDCMDS+8-TFSCMDS)X
 ** ASMA500W Requested alignment exceeds section alignment
 ** ASMA435I Record 87 in CSW.Z22.PUBLISHD(@CSECTZ) on volume: CSW00M
 3CB4   93659+         DC    Q(DXDCMDS)          REQUIRED REFERENCE
        93661          END   , SRCCMDS



You will note that while the DXD does not reference any forward 
defined variable, its dup factor nonetheless resolves to a value 
derived from a forward address.


The peculiar thing is, while every single one of my 200+ assemblies 
ends with similar logic, only fourteen report the ASMA500W warning.


I hope this sheds additional light on this bug of yours.



I will raise a support case.
Dave Cole
President, ColeSoft
dbc...@gmail.com (personal)
dbc...@colesoft.com (business)
540-456-6518 (cell)





At 4/20/2022 05:59 AM, you wrote:

Dave Cole, please raise a support case for this.

APAR PH40885 exposed a bug when the duplication factor on a DXD
involves a forward reference, for example:

DXD1     DXD   (A)X
A        EQU   3

The problem is that if a DXD definition has to be deferred
because it cannot be resolved immediately, the field which would
normally point to the owning section (itself) is used instead to
point to some information about the deferred definition, to be
resolved after the end of the first pass.

In the second pass, when the alignment is supposed to be checked
against the alignment of the owning section, it ends up being
checked against the deferred definition instead, in which the
field corresponding to the alignment is unused, equal to zero,
causing the warning.

The good news is that this doesn't affect the object code output.

Jonathan Scott, HLASM
IBM Hursley, UK


Re: Quadword constant

2022-04-20 Thread Charles Mills
I was wondering about R16. Would come in handy.

Charles


-Original Message-
From: IBM Mainframe Assembler List [mailto:ASSEMBLER-LIST@LISTSERV.UGA.EDU] On 
Behalf Of Steve Smith
Sent: Wednesday, April 20, 2022 9:31 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Quadword constant

Sheesh.  Sorry, I meant ORG *,16.

On Wed, Apr 20, 2022 at 12:30 PM Steve Smith  wrote:

> That's the old-fashioned way.  This is the new way:
>
> USING *,16


Re: Quadword constant

2022-04-20 Thread Steve Smith
Sheesh.  Sorry, I meant ORG *,16.

On Wed, Apr 20, 2022 at 12:30 PM Steve Smith  wrote:

> That's the old-fashioned way.  This is the new way:
>
> USING *,16
>
> There are some caveats.  For CSECTs, HLASM will complain if SECTALGN is
> insufficient.  For DSECTs, it's your responsibility to ensure the alignment
> matches (if it's real important).  Fortunately, STORAGE has a corresponding
> alignment specification.
>
> sas
>
> On Wed, Apr 20, 2022 at 10:50 AM Bob Raicer  wrote:
>
>> Ed;
>>
>> Of course, what you said about the LQ type of DC is true, and I too
>> have used LQ data types in some of my code too.  However, the
>> SECTALGN requirement is a bit of an issue when assembling code with
>> 2**3 (double word) section alignment and which also contains DSECTs
>> which map quad word aligned storage areas.  I've had to resort to
>> schemes like what is shown below (I hope the list server doesn't
>> mangle the sample listing too badly).
>>
>> The reason(s) for still having double word aligned sections is (are)
>> a bit lost in antiquity -- inertia is a powerful thing :)
>>
>> : D-Loc   Object Code      Stmt   Source Statement
>> :                             1 SAMPLE   DSECT ,
>> :                             2          PRINT ON,DATA
>> :0000 00000010                3 REF      DC    A(QUADITEM)
>> :0004 00                      4 BYTE     DC    AL1(0)
>> :                             5 * -----
>> :0005 0000000000000000        6          DC    (((((*-SAMPLE)+15)/16)*16)-(*-SAMPLE))AL1(0)
>> :000D 000000
>> :                             7 * Round up to a Quad Word
>> :                             8 * boundary.
>> :                             9 *
>> :0010 0000000000000000       10 QUADITEM DC    XL16'00'
>> :0018 0000000000000000
>> :                            11          END   ,
>


Re: Quadword constant

2022-04-20 Thread Steve Smith
That's the old-fashioned way.  This is the new way:

USING *,16

There are some caveats.  For CSECTs, HLASM will complain if SECTALGN is
insufficient.  For DSECTs, it's your responsibility to ensure the alignment
matches (if it's real important).  Fortunately, STORAGE has a corresponding
alignment specification.

sas

On Wed, Apr 20, 2022 at 10:50 AM Bob Raicer  wrote:

> Ed;
>
> Of course, what you said about the LQ type of DC is true, and I too
> have used LQ data types in some of my code too.  However, the
> SECTALGN requirement is a bit of an issue when assembling code with
> 2**3 (double word) section alignment and which also contains DSECTs
> which map quad word aligned storage areas.  I've had to resort to
> schemes like what is shown below (I hope the list server doesn't
> mangle the sample listing too badly).
>
> The reason(s) for still having double word aligned sections is (are)
> a bit lost in antiquity -- inertia is a powerful thing :)
>
> : D-Loc   Object Code      Stmt   Source Statement
> :                             1 SAMPLE   DSECT ,
> :                             2          PRINT ON,DATA
> :0000 00000010                3 REF      DC    A(QUADITEM)
> :0004 00                      4 BYTE     DC    AL1(0)
> :                             5 * -----
> :0005 0000000000000000        6          DC    (((((*-SAMPLE)+15)/16)*16)-(*-SAMPLE))AL1(0)
> :000D 000000
> :                             7 * Round up to a Quad Word
> :                             8 * boundary.
> :                             9 *
> :0010 0000000000000000       10 QUADITEM DC    XL16'00'
> :0018 0000000000000000
> :                            11          END   ,
>


Re: Detection of integer overflow

2022-04-20 Thread Don Higgins
>doesn't current z model have a specific Unsigned Multiply instruction?<

ML, MLR, MLG, and MLGR ("Multiply Logical") are unsigned. They do not change
the condition code, since overflow cannot occur with the 64- and 128-bit results.
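
A C sketch of how an unsigned multiply with a double-width result makes the overflow test simple: the 32-bit product fits exactly when the high half is zero. This is an illustration only; the function name is hypothetical and it models the 32-bit forms (ML, MLR), not the 64-bit ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiply two unsigned 32-bit values into a 64-bit product, as
   Multiply Logical does into an even/odd register pair, and report
   whether the product fits in 32 bits (high half all zeros). */
bool unsigned_product_fits(uint32_t x, uint32_t y)
{
    uint64_t product = (uint64_t)x * y;
    uint32_t high = (uint32_t)(product >> 32);   /* even register */
    return high == 0;
}
```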

Don Higgins
d...@higgins.net
www.don-higgins.net

-Original Message-
From: IBM Mainframe Assembler List  On Behalf 
Of Paul Gilmartin
Sent: Wednesday, April 20, 2022 10:55 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Detection of integer overflow

On Apr 20, 2022, at 07:46:49, Ian Worthington wrote:
> 
> Whilst looking at reliable techniques to detect signed and unsigned overflow 
> in integer multiplication I was checking out the late John Erhman's 
> "Assembler Language Programming for IBM System z™ Servers" in which I 
> discovered he presented this problem and solution:
> 18.2.13.(2)+ A programmer wanted to test whether the product of two 
> positive 32-bit binary integers was too large to fit in a 32-bit 
> register. 
> 
I have relied on:
Multiply
SLDA 32
Test CC for overflow.

Works for signed integers in any quadrant.

Does "positive" mean both sign bits are zero?

Harder for unsigned.  For that reason, doesn't current z model have a specific 
Unsigned Multiply instruction?

--
gil


Re: Detection of integer overflow

2022-04-20 Thread Paul Gilmartin
On Apr 20, 2022, at 07:46:49, Ian Worthington wrote:
> 
> Whilst looking at reliable techniques to detect signed and unsigned overflow 
> in integer multiplication I was checking out the late John Erhman's 
> "Assembler Language Programming for IBM System z™ Servers" in which I 
> discovered he presented this problem and solution:
> 18.2.13.(2)+ A programmer wanted to test whether the product of two positive 
> 32-bit binary
> integers was too large to fit in a 32-bit register. 
> 
I have relied on:
Multiply
SLDA 32
Test CC for overflow.

Works for signed integers in any quadrant.
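
In C terms, the multiply-then-SLDA test amounts to asking whether the 64-bit signed product is unchanged when narrowed to 32 bits, that is, whether every bit shifted out by SLDA equals the sign bit. A minimal sketch, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Form the full 64-bit signed product (as M does in an even/odd pair)
   and check that it lies within the 32-bit signed range. This is the
   condition under which SLDA 32 would not signal overflow. */
bool signed_product_fits(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    return product >= INT32_MIN && product <= INT32_MAX;
}
```

Note that -65536 * 32768 (exactly -2**31) fits, while 65536 * 32768 does not.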

Does "positive" mean both sign bits are zero?

Harder for unsigned.  For that reason, doesn't current z
model have a specific Unsigned Multiply instruction?

-- 
gil


Re: Quadword constant

2022-04-20 Thread Bob Raicer

Ed;

Of course, what you said about the LQ type of DC is true, and I too
have used LQ data types in some of my code too.  However, the
SECTALGN requirement is a bit of an issue when assembling code with
2**3 (double word) section alignment and which also contains DSECTs
which map quad word aligned storage areas.  I've had to resort to
schemes like what is shown below (I hope the list server doesn't
mangle the sample listing too badly).

The reason(s) for still having double word aligned sections is (are)
a bit lost in antiquity -- inertia is a powerful thing :)

: D-Loc   Object Code      Stmt   Source Statement
:                             1 SAMPLE   DSECT ,
:                             2          PRINT ON,DATA
:0000 00000010                3 REF      DC    A(QUADITEM)
:0004 00                      4 BYTE     DC    AL1(0)
:                             5 * -----
:0005 0000000000000000        6          DC    (((((*-SAMPLE)+15)/16)*16)-(*-SAMPLE))AL1(0)
:000D 000000
:                             7 * Round up to a Quad Word
:                             8 * boundary.
:                             9 *
:0010 0000000000000000       10 QUADITEM DC    XL16'00'
:0018 0000000000000000
:                            11          END   ,
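
The duplication factor in statement 6 is ordinary round-up arithmetic. A small C sketch of the same calculation, where the function name is invented for the example and offset stands for *-SAMPLE:

```c
#include <stddef.h>

/* Number of pad bytes needed to advance an offset to the next
   16-byte (quadword) boundary; 0 if it is already aligned. */
size_t quad_pad(size_t offset)
{
    return ((offset + 15) / 16) * 16 - offset;
}
```

With the offset of 5 from the listing this gives 11 pad bytes, which places QUADITEM at offset X'10'.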


Re: Detection of integer overflow

2022-04-20 Thread Robin Vowels

On 2022-04-21 00:19, Seymour J Metz wrote:

That has at least two bugs: the first test will incorrectly treat 1*-1


The task is to form the product of two POSITIVE integers.


as having an overflow and the second test is testing all of R0,


The second test must test R1, as shown, not R0.


not just the high bit.





From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU]
on behalf of Ian Worthington
[0c9b78d54aea-dmarc-requ...@listserv.uga.edu]
Sent: Wednesday, April 20, 2022 9:46 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Detection of integer overflow

Whilst looking at reliable techniques to detect signed and unsigned
overflow in integer multiplication I was checking out the late John
Erhman's "Assembler Language Programming for IBM System z™ Servers" in
which I discovered he presented this problem and solution:
18.2.13.(2)+ A programmer wanted to test whether the product of two
positive 32-bit binary
integers was too large to fit in a 32-bit register. 
Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32
bits long but appears to be negative, −27785047. An additional test is 
needed:

L 1,X Load first operand
M 0,Y Multiply by second operand
LTR 0,0 Check high-order 32 bits
BNZ NotOK If not zero, product is too big
LTR 1,1 Check high-order bit of GR1
BZ ProdOK Branch if high-order 33 bits are 0s
- - - Not OK
X DC F'75141'
Y DC F'56789'
One hesitates to suggest it, but surely this cannot be correct?  This
checks that r0 and r1 are both zero.  Surely John meant BNM ProdOk as
the final instruction (at least for signed 32 bit integers:  no
further test is required for unsigned integers, I think.)?
Google finds no errata for John's book.  It is, of course, much more
likely I've misinterpreted something that John made an error!


Re: Detection of integer overflow

2022-04-20 Thread Ian Worthington
The first case is ruled "out of scope" by the wording of the question wherein 
both inputs are deemed to be positive.  (Though I think that makes it a bit of 
a hokey example).


Best wishes / Mejores deseos /  Meilleurs vœux

Ian ... 

On Wednesday, April 20, 2022, 04:19:32 PM GMT+2, Seymour J Metz 
 wrote:  
 
 That has at least two bugs: the first test will incorrectly treat 1*-1 as 
having an overflow and the second test is testing all of R0, not just the high 
bit.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf 
of Ian Worthington [0c9b78d54aea-dmarc-requ...@listserv.uga.edu]
Sent: Wednesday, April 20, 2022 9:46 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Detection of integer overflow

Whilst looking at reliable techniques to detect signed and unsigned overflow in 
integer multiplication I was checking out the late John Erhman's "Assembler 
Language Programming for IBM System z™ Servers" in which I discovered he 
presented this problem and solution:
18.2.13.(2)+ A programmer wanted to test whether the product of two positive 
32-bit binary
integers was too large to fit in a 32-bit register. 
Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32
bits long but appears to be negative, −27785047. An additional test is needed:
L 1,X Load first operand
M 0,Y Multiply by second operand
LTR 0,0 Check high-order 32 bits
BNZ NotOK If not zero, product is too big
LTR 1,1 Check high-order bit of GR1
BZ ProdOK Branch if high-order 33 bits are 0s
- - - Not OK
X DC F'75141'
Y DC F'56789'
One hesitates to suggest it, but surely this cannot be correct?  This checks 
that r0 and r1 are both zero.  Surely John meant BNM ProdOk as the final 
instruction (at least for signed 32 bit integers:  no further test is required 
for unsigned integers, I think.)?
Google finds no errata for John's book.  It is, of course, much more likely 
I've misinterpreted something that John made an error!


Best wishes / Mejores deseos /  Meilleurs vœux

Ian ...
  


Re: Detection of integer overflow

2022-04-20 Thread Robin Vowels

On 2022-04-20 23:46, Ian Worthington wrote:

Whilst looking at reliable techniques to detect signed and unsigned
overflow in integer multiplication I was checking out the late John
Erhman's "Assembler Language Programming for IBM System z™ Servers" in
which I discovered he presented this problem and solution:
18.2.13.(2)+ A programmer wanted to test whether the product of two
positive 32-bit binary
integers was too large to fit in a 32-bit register. 
Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32
bits long but appears to be negative, −27785047. An additional test is 
needed:

L 1,X Load first operand
M 0,Y Multiply by second operand
LTR 0,0 Check high-order 32 bits
BNZ NotOK If not zero, product is too big
LTR 1,1 Check high-order bit of GR1
BZ ProdOK Branch if high-order 33 bits are 0s


This should be BNM PRODOK


- - - Not OK
X DC F'75141'
Y DC F'56789'
One hesitates to suggest it, but surely this cannot be correct?  This
checks that r0 and r1 are both zero.  Surely John meant BNM ProdOk as
the final instruction (at least for signed 32 bit integers:


Correct


no further test is required for unsigned integers, I think.)?


Correct.


Google finds no errata for John's book.  It is, of course, much more
likely I've misinterpreted something that John made an error!


Re: Detection of integer overflow

2022-04-20 Thread Seymour J Metz
That has at least two bugs: the first test will incorrectly treat 1*-1 as 
having an overflow and the second test is testing all of R0, not just the high 
bit.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf 
of Ian Worthington [0c9b78d54aea-dmarc-requ...@listserv.uga.edu]
Sent: Wednesday, April 20, 2022 9:46 AM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Detection of integer overflow

Whilst looking at reliable techniques to detect signed and unsigned overflow in 
integer multiplication I was checking out the late John Erhman's "Assembler 
Language Programming for IBM System z™ Servers" in which I discovered he 
presented this problem and solution:
18.2.13.(2)+ A programmer wanted to test whether the product of two positive 
32-bit binary
integers was too large to fit in a 32-bit register. 
Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32
bits long but appears to be negative, −27785047. An additional test is needed:
L 1,X Load first operand
M 0,Y Multiply by second operand
LTR 0,0 Check high-order 32 bits
BNZ NotOK If not zero, product is too big
LTR 1,1 Check high-order bit of GR1
BZ ProdOK Branch if high-order 33 bits are 0s
- - - Not OK
X DC F'75141'
Y DC F'56789'
One hesitates to suggest it, but surely this cannot be correct?  This checks 
that r0 and r1 are both zero.  Surely John meant BNM ProdOk as the final 
instruction (at least for signed 32 bit integers:  no further test is required 
for unsigned integers, I think.)?
Google finds no errata for John's book.  It is, of course, much more likely 
I've misinterpreted something that John made an error!


Best wishes / Mejores deseos /  Meilleurs vœux

Ian ...


Detection of integer overflow

2022-04-20 Thread Ian Worthington
Whilst looking at reliable techniques to detect signed and unsigned overflow in 
integer multiplication I was checking out the late John Ehrman's "Assembler 
Language Programming for IBM System z™ Servers", in which I discovered he 
presented this problem and solution:

18.2.13.(2)+ A programmer wanted to test whether the product of two positive 
32-bit binary integers was too large to fit in a 32-bit register. 
Consider multiplying 75141×56789: the product X'FE5808A9' is indeed 32
bits long but appears to be negative, −27785047. An additional test is needed:

         L     1,X           Load first operand
         M     0,Y           Multiply by second operand
         LTR   0,0           Check high-order 32 bits
         BNZ   NotOK         If not zero, product is too big
         LTR   1,1           Check high-order bit of GR1
         BZ    ProdOK        Branch if high-order 33 bits are 0s
         - - -               Not OK
X        DC    F'75141'
Y        DC    F'56789'

One hesitates to suggest it, but surely this cannot be correct?  This checks 
that r0 and r1 are both zero.  Surely John meant BNM ProdOK as the final 
instruction (at least for signed 32-bit integers; no further test is required 
for unsigned integers, I think)?

Google finds no errata for John's book.  It is, of course, much more likely 
that I've misinterpreted something than that John made an error!
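
To make the intended test concrete, here is a C sketch of the check with the final branch read as BNM rather than BZ. It is an illustration under the exercise's assumption that both operands are positive; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* For two positive 32-bit operands, the product fits in a signed
   32-bit register when the high-order 33 bits of the 64-bit product
   are all zero: the high word (GR0) is zero and the sign bit of the
   low word (GR1) is off. */
bool positive_product_fits(int32_t x, int32_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;     /* M   0,Y        */
    uint32_t high = (uint32_t)(product >> 32);     /* GR0            */
    uint32_t low = (uint32_t)product;              /* GR1            */

    if (high != 0)                                 /* LTR 0,0 / BNZ  */
        return false;
    return (low & 0x80000000u) == 0;               /* LTR 1,1 / BNM  */
}
```

With BZ in place of BNM, the test would accept only a product of zero, so even 1 × 1 would be rejected.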


Best wishes / Mejores deseos /  Meilleurs vœux

Ian ...


Re: Quadword constant

2022-04-20 Thread Seymour J Metz
Yes, but I know of no way to define a quadword binary fixed point integer 
constant in the current HLASM.


--
Shmuel (Seymour J.) Metz
http://mason.gmu.edu/~smetz3


From: IBM Mainframe Assembler List [ASSEMBLER-LIST@LISTSERV.UGA.EDU] on behalf 
of Ed Jaffe [edja...@phoenixsoftware.com]
Sent: Tuesday, April 19, 2022 10:22 PM
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Quadword constant

On 4/19/2022 7:13 PM, Bob Raicer wrote:
>
> Having the ability to assemble quadword aligned 128-bit items for
> use with these instructions would be helpful.

We define quadword-aligned storage areas all the time. For example:

Field1  DC LQ'0'
Field2  DC LQ'0'

Of course, you need to specify the SECTALGN option.

We align all of our sections on cache-line boundaries, but technically
you don't need more than quadword alignment to make LQ work.

--
Phoenix Software International
Edward E. Jaffe
831 Parkview Drive North
El Segundo, CA 90245
https://www.phoenixsoftware.com/





Re: ASM500Ws after Applying Z16 PTF to HLASM

2022-04-20 Thread Jonathan Scott
Dave Cole, please raise a support case for this.

APAR PH40885 exposed a bug when the duplication factor on a DXD
involves a forward reference, for example:

DXD1     DXD   (A)X
A        EQU   3

The problem is that if a DXD definition has to be deferred
because it cannot be resolved immediately, the field which would
normally point to the owning section (itself) is used instead to
point to some information about the deferred definition, to be
resolved after the end of the first pass.

In the second pass, when the alignment is supposed to be checked
against the alignment of the owning section, it ends up being
checked against the deferred definition instead, in which the
field corresponding to the alignment is unused, equal to zero,
causing the warning.

The good news is that this doesn't affect the object code output.

Jonathan Scott, HLASM
IBM Hursley, UK


Re: Unexpected C code

2022-04-20 Thread Thomas David Rivers
> 
> That's a great explanation Thomas.
> I'm curious though:  how come both compilers produce this 
> same sequence of instructions?  I'd have thought it was a
> rather obscure combination.  Is it perhaps more common
> than I'd suspected, or do GCC and Dignus have some common heritage
> in the back end?

 No common heritage at all.   The IBM compiler produces a similar
 sequence.

 Compiler writers are always looking for ways to do more using
 less.

 It's pretty well known, as are some other surprising sequences.

 Here's one that will surprise people...

 For this source:

   int
   foo(int x)
   {
       return x / 5;
   }

 the Dignus compiler (generating code for 32-bit z/OS)
 generates:

 * *** return x / 5;
          L     15,0(0,1)       ; x
          LR    3,15            ;   .
          SRL   15,31(0)        ;   .
          M     2,@lit_153_0    ;   .
          SRA   2,1(0)          ;   .
          ALR   2,15            ;   .
          LR    15,2
          ...
          DS    0D
 @lit_153_0 DC  F'1717986919' 0x66666667

 which is correct, and avoids the division instruction.
 It takes advantage of the fact that the M(ultiply) instruction
 uses two signed 32-bit operands and produces a signed 64-bit
 result.
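
 The generated sequence can be mirrored step by step in C. This is a sketch for illustration rather than anything the compiler emits: the function name is made up, and it assumes that right-shifting a negative signed value is an arithmetic shift, which is implementation-defined in C but usual in practice.

```c
#include <stdint.h>

/* Signed division by 5 without a divide: multiply by the magic
   number 1717986919 (0x66666667), keep the high 32 bits of the
   64-bit product, shift right once more, then add 1 for negative
   inputs so that the result truncates towards zero. */
int32_t div5(int32_t x)
{
    uint32_t sign = (uint32_t)x >> 31;             /* SRL 15,31      */
    int64_t product = (int64_t)x * 1717986919;     /* M   2,literal  */
    int32_t high = (int32_t)(product >> 32);       /* GR2, high word */

    high >>= 1;                                    /* SRA 2,1        */
    return high + (int32_t)sign;                   /* ALR 2,15       */
}
```

 For instance, div5(-6) yields -1 and div5(2147483647) yields 429496729, matching x / 5 in C.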

- Dave Rivers -

p.s. Even though we don't "know" the timing difference between
 division and multiplication, it's a sure bet that division
 takes a lot more time than multiplication on any hardware.
 So, best to avoid it if you can.

--
riv...@dignus.comWork: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com


Re: Unexpected C code

2022-04-20 Thread Bernd Oppolzer

Many thanks, comments below ...


Am 20.04.2022 um 04:17 schrieb Thomas David Rivers:

The "secret" is in the operation of the LPR and LCR
instructions for the 2's complement maximum negative
value (X'80000000'): These notes in the Principles of
Operation give a hint:

  LPR:
An overflow condition occurs when the maximum negative
number is complemented; the number remains unchanged.


I did not see this hint in the PoOp; if I did, I would have understood 
much better;
So this special situation is the only case where LPR leaves a negative 
sign bit
in the target register? Which turns the name of the instruction (LPR) a 
bit strange

in this case ...


  LCR:
Zero and the maximum negative number remain unchanged.
An overflow condition occurs when the maximum negative number
is complemented.


Same here, LCR does not "load the complement" for the maximum
negative number. When I first learned about this instruction, I thought
that it would do something completely different; I later learned that
to get a real complement, you need XR with all ones.


So, as it happens, the LPR of the most negative number
(X'80000000') produces X'80000000' as its result (and
sets overflow, which is ignored.)  And the same thing
happens for the LCR instruction.


There is something new to learn every single day ...


Re: Unexpected C code

2022-04-20 Thread Ian Worthington
That's a great explanation Thomas.
I'm curious though:  how come both compilers produce this same sequence of 
instructions?  I'd have thought it was a rather obscure combination.  Is it 
perhaps more common than I'd suspected, or do GCC and Dignus have some common 
heritage in the back end?


Best wishes / Mejores deseos /  Meilleurs vœux

Ian ... 

On Wednesday, April 20, 2022, 04:17:32 AM GMT+2, Thomas David Rivers 
 wrote:  
 
 I thought I'd bring an explanation
to what's going on here... 

Let's consider the following short
C example (just to have something
to compile):

foo()
{
  unsigned char ovfl;
  int ccpm, carrybit;

  ccpm = bar(); carrybit=bar2();

  ovfl = (ccpm & carrybit) != 0;

  blah(ovfl);

}

The functions bar(), bar2() and blah() are simply
external to this source (compilation unit in C terms)
and are there so the optimizer doesn't have a clue
about the possible values of the variables.

When I compile this for z/OS (31-bit mode) with 
the Dignus compiler, I get this code:

* ***      ccpm = bar(); carrybit=bar2();
        L    15,@lit_153_0 ; bar
@@gen_label0 DS    0H 
        BALR  14,15
@@gen_label1 DS    0H 
        L    1,@lit_153_1 ; bar2
        LR    2,15
        LR    15,1
@@gen_label2 DS    0H 
        BALR  14,15
@@gen_label3 DS    0H 
* ***  
* ***      ovfl = (ccpm & carrybit) != 0;
        NR    2,15
        LPR  2,2
        LCR  2,2
        SRL  2,31(0)
* ***  
* ***      blah(ovfl);
        STC  2,80(0,13)  ; ovfl

which is similar to what's going on with GCC.
(The values happen to be in registers though.)

Now, how does this work?

  1) The two values are AND'd together
    (this is just a bit-wise/logical AND operation).

  2) The absolute value is taken (making the 2's
    complement sign bit a zero) with the LPR instruction.
    So we now have either a zero or non-zero (positive) value
    (or a special case which we'll see below.)

  3) The 2's complement of that is taken.  If the value
    is zero, the result is zero - otherwise the result
    is a negative value (and the sign-bit will be set)
    (or - another special case, which we'll see below.)

  4) The sign-bit is shifted right 31 positions to result
    in either a X'00000000' or X'00000001' in
    the final result.
    
So, what's going on in step #2 and why does that work? 
Especially if we consider that the result of the AND
sets the sign-bit?

Note that the only value from the AND that is interesting
is the situation where the AND results in the sign
bit being set, which presumably is cleared after the LPR.
Hence the confusion.

The "secret" is in the operation of the LPR and LCR
instructions for the 2's complement maximum negative 
value (X'80000000'): These notes in the Principles of
Operation give a hint:

 LPR:
  An overflow condition occurs when the maximum negative
  number is complemented; the number remains unchanged.

 LCR:
  Zero and the maximum negative number remain unchanged.
  An overflow condition occurs when the maximum negative number
  is complemented.

So, as it happens, the LPR of the most negative number
(X'80000000') produces X'80000000' as its result (and
sets overflow, which is ignored.)  And the same thing
happens for the LCR instruction.

Going through the steps, when the result of the AND is
X'80000000', we get these values:

    LPR  ==>  X'80000000'
    LCR  ==>  X'80000000'
    SRL  ==>  X'00000001'

And, for the X'00000000' value we have:

    LPR  ==>  X'00000000'
    LCR  ==>  X'00000000'
    SRL  ==>  X'00000000'

For any other situation where the AND operation
produces a negative value (the sign bit is set)
you'll have a value which isn't the most negative.
Thus some of the lower-order bits (the non-sign bits)
will be set.  If we have, for example, X'8xxxxxxx' then

    LPR  ==>  X'0xxxxxxx'
    LCR  ==>  X'8.......'  (whatever the 2's complement of 0xxxxxxx is)
    SRL  ==>  X'00000001'


Then we only need to consider the situation where
the result of the AND is non-zero but positive,
which is just an innocuous execution of the LPR
instruction, which does "nothing" and proceeds
as above.

It's a clever sequence of instructions to produce
a zero or non-zero value based on an input without
a branch.
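
For anyone who wants to check the case analysis above without a mainframe, here is a minimal C sketch that emulates the three instructions on 32-bit values. The function names (lpr, lcr, srl31, nonzero_flag) are invented for this illustration and do not come from either compiler. Unsigned arithmetic is used throughout so that the wrap-around at X'80000000' is well defined in C; the condition code and the overflow indication are not modelled.

```c
#include <stdint.h>

/* LPR: load positive (absolute value).
   X'80000000' has no positive counterpart and comes back unchanged. */
uint32_t lpr(uint32_t r)
{
    return (r & 0x80000000u) ? (uint32_t)(0u - r) : r;
}

/* LCR: load complement (two's complement negate).
   Zero and X'80000000' come back unchanged. */
uint32_t lcr(uint32_t r)
{
    return (uint32_t)(0u - r);
}

/* SRL by 31: logical shift that leaves the old sign bit in the low bit. */
uint32_t srl31(uint32_t r)
{
    return r >> 31;
}

/* Branch-free (x != 0), following the LPR / LCR / SRL sequence. */
uint32_t nonzero_flag(uint32_t x)
{
    return srl31(lcr(lpr(x)));
}
```

With these definitions, nonzero_flag(0) gives 0, while nonzero_flag(0x80000000u), nonzero_flag(0x80000001u) and nonzero_flag(0x20000000u) all give 1, which matches the tables above.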

    - Dave Rivers -

--
riv...@dignus.com                        Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com
  


Re: Unexpected C code

2022-04-20 Thread Ian Worthington
>  C programmers don't give a damn about overflows.  An unfortunate consequence,
> probably, of hardware architectures which, unlike 360, lack unsigned
> instructions, forcing compilers to generate signed instructions for
> unsigned operations.
I've spent more of the last week finding out more about integer overflow (non-) 
handling in C than I would have wished to, and certainly enough to last a 
lifetime.

In a nutshell the story appears to be that the C standards ("The nice thing 
about standards is that you have so many to choose from" - Tanenbaum) simply 
codified the variety of existing practice.  Where there was no consensus, 
behavior was left "undefined". Thus we have:

C11 6.5/5

  If an exceptional condition occurs during the evaluation of an expression
  (that is, if the result is not mathematically defined or not in the range
  of representable values for its type), the behavior is undefined.

with the exception clause:

C11 6.2.5/9

  The range of nonnegative values of a signed integer type is a subrange of
  the corresponding unsigned integer type, and the representation of the
  same value in each type is the same. A computation involving unsigned
  operands can never overflow, because a result that cannot be represented
  by the resulting unsigned integer type is reduced modulo the number that
  is one greater than the largest value that can be represented by the
  resulting type.

Thus unsigned integers wrap without warning, signed integers can do whatever 
their fancy takes (qv "nasal demons").
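
As a small illustration of what the wrap-around rule allows, here is a sketch in standard C that recovers the carry out of a 32-bit unsigned addition with no inline assembler. The name add_with_carry is made up for this example; the only thing it relies on is the modulo behaviour quoted from 6.2.5/9.

```c
#include <stdint.h>

/* Adds a and b modulo 2^32 and reports the carry out of the top bit.
   Unsigned addition cannot overflow in C; it wraps, so the sum is
   smaller than an operand exactly when a carry occurred. */
uint32_t add_with_carry(uint32_t a, uint32_t b, int *carry)
{
    uint32_t sum = (uint32_t)(a + b);
    *carry = sum < a;
    return sum;
}
```

For example, adding 0xFFFFFFFF and 1 returns 0 with the carry set, while adding 1 and 2 returns 3 with the carry clear. The same trick is not available for signed operands, where the overflowing addition itself is already undefined.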
Come back Ada, all is forgiven,


Best wishes / Mejores deseos /  Meilleurs vœux

Ian ... 

On Wednesday, April 20, 2022, 02:32:44 AM GMT+2, Paul Gilmartin 
<0014e0e4a59b-dmarc-requ...@listserv.uga.edu> wrote:  
 
 On Apr 19, 2022, at 17:57:23, Bernd Oppolzer wrote:
> 
> LPR: if the register contains 0x80000000, IMO the result will be zero (and 
> overflow),
>  
I'd expect 0x80000000, with overflow.

> so you're right ... this will lead to a zero result. IMO, the overflow will 
> be ignored.
>  
C programmers don't give a damn about overflows.  An unfortunate consequence,
probably, of hardware architectures which, unlike 360, lack unsigned
instructions, forcing compilers to generate signed instructions for
unsigned operations.

> N result zero: LPR ... LCR ... SRL puts x'00' in R1
> N result X'80000000': LPR overflow (zero) ... LCR ... SRL puts x'00' in R1
> N result otherwise non-zero: LPR non-zero positive ... LCR negative ... SRL 
> puts x'01' in R1
>  
Oops!  I forgot that one non-negate value with a complement.

> Is bitwise AND defined for signed ints, which are negative?
>  
AND doesn't care -- the sign is just one of 32 bits.

> IMO, this is difficult; the result depends on the number format (2s or 1s 
> complement).
> So, maybe, this is not defined in the C language; bit operations are for 
> unsigned ints, normally.
>  
Does the C standard require that shifts are equivalent to
multiply/divide by powers of 2? That pretty much implies
2's complement.  Hmmm.  Shift right truncates toward -∞,
not toward zero.
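
To make the rounding difference concrete, here is a short C sketch. As far as I can tell from C11 6.5.7, right-shifting a negative signed value is implementation-defined, so the sketch avoids ">>" on negative operands and instead spells out floor division, which is what an arithmetic right shift by one position computes. The function names are invented for the example.

```c
/* C integer division truncates toward zero (guaranteed since C99). */
int div2_trunc(int x)
{
    return x / 2;
}

/* Floor division by 2, i.e. rounding toward minus infinity.
   x % 2 is -1 only for negative odd x, the one case where the
   truncated quotient is one too large. */
int div2_floor(int x)
{
    return x / 2 - (x % 2 < 0);
}
```

For x = -7, div2_trunc gives -3 and div2_floor gives -4; the two agree for even values and for all non-negative values.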

-- 
gil
  


Re: Unexpected C code

2022-04-20 Thread Ian Worthington
> ccpm and carrybit are probably ints or unsigned ints,
> because of the L and N instructions, which read them.
> The final STC moves the rightmost 8 bits to the bool variable;
> bool (no C standard type) is probably a typedef which means char.

> I hope, I understood the coding correctly.
Indeed.  The full code, which I omitted to keep the focus, is:

    __uint32_t ccpm;
    const __uint32_t carrybit = 0x20000000;
    __uint32_t sum = u32a;
    asm("alr %[r1],%[r2] \n\t"
        "ipm %[r3]"
        : [r1] "+r" (sum),    // output
          [r3] "=r" (ccpm)
        : [r2] "r"  (u32b)    // input
        :                    // clobbers
       );
    bool overflow = (ccpm & carrybit) != 0;    // check if carry bit set

bool is defined in stdbool.h as an alias for _Bool (a standard type, I believe, 
since C99).  
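
For reference, here is a minimal sketch of how the value left by IPM can be decoded in plain C. It assumes the usual layout, with the condition code in bits 2-3 of the 32-bit register and the program mask in bits 4-7, and the ALR convention that condition codes 2 and 3 indicate a carry. The helper names ipm_cc and alr_carry are hypothetical and appear nowhere in the code above.

```c
#include <stdint.h>

/* Condition code (0..3) from an IPM result: bits 2-3 of the 32-bit value. */
unsigned ipm_cc(uint32_t ipm)
{
    return (unsigned)((ipm >> 28) & 3u);
}

/* After ALR, CC 2 or 3 means a carry occurred, so testing the
   high-order CC bit (0x20000000) is enough. */
int alr_carry(uint32_t ipm)
{
    return (ipm & 0x20000000u) != 0;
}
```

So alr_carry(ccpm) is the same test as (ccpm & carrybit) != 0 in the fragment above, and ipm_cc(ccpm) additionally distinguishes a zero sum from a non-zero one.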


I had to check up how LCR behaved with x80...0, and the answer (viz, leave it 
alone and set the overflow bit) somewhat surprised me.
It's certainly a mighty clever piece of code, one that I've not seen before.  
It brings up in my mind the question what exactly is the cost of a branch?  How 
many instructions can I use before they become more expensive than the branch 
they replace?

Best wishes / Mejores deseos /  Meilleurs vœux

Ian ... 

On Wednesday, April 20, 2022, 01:00:03 AM GMT+2, Bernd Oppolzer 
 wrote:  
 
 ccpm and carrybit are probably ints or unsigned ints,
because of the L and N instructions, which read them.

so, the & (bitwise AND) operation yields a nonzero result, if there is a 
one bit
in the same bit position in both operands. This nonzero result must be 
transferred
to a one byte value X'01', using some clever register operations.

And: yes, IMO the coding here tries to avoid branches and compares.

The solution LPR ... LCR ... SRL looks OK for me.
LPR keeps a nonzero result, but with a positive sign,
LCR does the same, but enforces a negative sign,
and SRL moves the sign to the rightmost bit position.

In contrast, a zero result of the N operation would stay zero
throughout the LPR / LCR sequence, and the SRL would move
a zero bit in the rightmost bit position.

The final STC moves the rightmost 8 bits to the bool variable;
bool (no C standard type) is probably a typedef which means char.

I hope, I understood the coding correctly.

Kind regards

Bernd



Am 19.04.2022 um 15:06 schrieb Ian Worthington:
> Noticed today that the GCC C compiler generated an unexpected sequence of 
> instructions for an AND and TEST:
>
>      bool overflow = (ccpm & carrybit) != 0;    // check if carry bit set
>   109          .loc 1 189 0
>   110 0078 5810B25C         l    %r1,604(%r11)    # D.7949, ccpm
>   111 007c 5410B26C         n    %r1,620(%r11)    # D.7949, carrybit
>   112 0080 1011             lpr    %r1,%r1    # tmp54, D.7949
>   113 0082 1311             lcr    %r1,%r1    # tmp55, tmp54
>   114 0084 8810001F         srl    %r1,31    # tmp56,
>   115 0088 4210B25B         stc    %r1,603(%r11)    # tmp56, overflow
>
> I can only guess this is to avoid the cost of a branch?  Or is there some 
> other advantage in this?
>
>
> Best wishes / Mejores deseos /  Meilleurs vœux
>
> Ian ...