I had compacted:

         LLC    R04,=X'040403020100'(R02) Number of leading zeros
         LTR    R04,R04               Q. Any leading zeros?
         JZ     L0642                 N. Edited value is good
         AHI    R04,-1                Y. Machine length

down to:

         LLC    R04,=X'0303020100FF'(R02) # of leading zeros (MINUS 1)
         CLIJE  R04,X'FF',L0642       Skip if no leading zeros
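
A quick way to convince yourself the two sequences are equivalent is a small Python model. This is only a sketch: the function names are made up for illustration, and it assumes R02 holds an index of 0 through 5 into the literal. Each function returns None for the branch to L0642, otherwise the machine length left in R04.

```python
OLD_TABLE = bytes.fromhex("040403020100")   # number of leading zeros
NEW_TABLE = bytes.fromhex("0303020100FF")   # same values minus 1; X'FF' = none


def old_sequence(index):
    """Model of LLC / LTR / JZ / AHI -1."""
    r04 = OLD_TABLE[index]      # LLC: load one byte, zero-extended
    if r04 == 0:                # LTR + JZ: no leading zeros
        return None             # branch to L0642
    return r04 - 1              # AHI R04,-1: machine length


def new_sequence(index):
    """Model of LLC / CLIJE."""
    r04 = NEW_TABLE[index]      # LLC: load one byte, zero-extended
    if r04 == 0xFF:             # CLIJE R04,X'FF': no leading zeros
        return None             # branch to L0642
    return r04                  # already the machine length


for i in range(len(OLD_TABLE)):
    print(i, old_sequence(i), new_sequence(i))
```

Every index gives the same result in both versions, so the LTR and AHI really are redundant once the minus 1 is folded into the table.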

And there's no need to force an F sign if you're going to use ED on the value.

ED works left to right, so it can be a PITA if you want an even number of 
digits and want to use the edit pattern at the final target field, if you can 
use the fill byte as a blank separator to a prior field. Using VCVD, you can 
force an F sign on the result. For editing fullwords into a 10-digit edit 
pattern, I use VSRP to shift the packed value 1 position to the left so it is 
properly aligned without having to resort to an 11-digit pattern.

Robert
-----Original Message-----
From: IBM Mainframe Assembler List <ASSEMBLER-LIST@LISTSERV.UGA.EDU> On Behalf 
Of Schmitt, Michael
Sent: Wednesday, August 20, 2025 14:17
To: ASSEMBLER-LIST@LISTSERV.UGA.EDU
Subject: Re: Execute-Type Instructions

[part 4]
Huh?
This looks like the intent here is to avoid the ED instruction, and instead do 
what the ED would have done in code. So what does ED do?

It basically turns a number like this: PIC 9(6)V99 PACKED-DECIMAL into PIC 
ZZZ,ZZ9.99. However, our case was simpler. The mask was just ZZ9, so all we 
really need to do is replace the leading zeros with spaces. And there are only 
3 possibilities:

*       There are no leading zeros, so do nothing
*       There's 1 leading zero, so move 1 space
*       There are 2 leading zeros, so move 2 spaces

So I think what this code is doing, somehow, is figuring out how many spaces 
need to be moved, and then setting up the display field as one space, followed 
by the number. Then it does an overlapping move (which we call byte 
progression) of that space, to the right for a length of x, where x is the 
number of zeros to overlay.
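
Here is a rough Python model of that idea, as I understand it. It is a sketch, not what the compiler actually generates: the names mvc and blank_leading_zeros are made up, the zero counting is done with a plain loop, and Python's cp037 codec stands in for EBCDIC. The point is the MVC behaviour: because it copies one byte at a time from left to right, moving from offset 0 to offset 1 propagates the blank.

```python
def mvc(storage, dest, src, length):
    """Model MVC: copy byte by byte, left to right, so an overlap propagates."""
    for i in range(length):
        storage[dest + i] = storage[src + i]


def blank_leading_zeros(value):
    """Edit a 0..999 value as ZZ9 using the overlapping-move trick."""
    if not 0 <= value <= 999:
        raise ValueError("value must fit in 3 digits")
    # One blank (X'40') followed by three zoned digits (X'F0'..X'F9').
    field = bytearray(" ".encode("cp037") + f"{value:03d}".encode("cp037"))
    # Count leading zeros, but never blank the last digit (the 9 in ZZ9).
    zeros = 0
    while zeros < 2 and field[1 + zeros] == 0xF0:
        zeros += 1
    if zeros:
        # Overlapping move: the blank at offset 0 ripples right for 'zeros' bytes.
        mvc(field, 1, 0, zeros)
    return bytes(field[1:]).decode("cp037")


for n in (123, 42, 7, 0):
    print(repr(blank_leading_zeros(n)))
```

In assembler, the variable length is what would make this an execute-type instruction: the MVC length comes from a register at run time.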

But there's a lot I don't understand here.

It may be that what we're seeing is where it generates code for the more 
complicated situation (where you have a more complex mask) but then because we 
only have 3 digits and no special mask characters, it optimizes away what it 
doesn't need. So we see some remnants of the full edit masks even though 
they're never used.

This also shows that there's a trade-off here. It is possible that this really 
is the absolute most efficient way to do this on a z13 machine. But it isn't 
what I'd write if I were coding it in assembler, because it is less 
understandable.

It's just like how we know that the optimizer may inline code repeatedly 
instead of calling a common section, or do loop unrolling, or other things that 
are more efficient but would be less maintainable if you coded them that way 
in assembler.

Anyway, it looks like the z14 version (arch level 12) is identical to z13, 
except that:

CVD     R2,994(,R13)
OI      1001(,R13),X'0F'

is replaced by:

VCVD    VRF16,R2,0x3,2
VSTRL   VRF16,1033(,R13),0x1

The z14 has the Vector Packed Decimal instructions, which work on the vector 
registers. VRF16 is one of the vector registers (16-31) that do not overlap 
the floating point registers used by the Decimal Floating Point instructions.

VCVD is converting the binary R2 to packed decimal, VSTRL is Vector Store 
Rightmost with Length.
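
To make the z13 pair concrete, here is a small Python model of CVD followed by the OI. It is only a sketch with made-up function names. CVD produces 15 digits plus a sign nibble (C for plus, D for minus) in 8 bytes, and the OI with X'0F' turns that sign nibble into F.

```python
def cvd(value):
    """Model CVD: 8-byte packed decimal, 15 digits plus a sign nibble."""
    sign = 0xC if value >= 0 else 0xD
    nibbles = [int(d) for d in f"{abs(value):015d}"] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, 16, 2))


def oi_force_f_sign(packed):
    """Model OI lastbyte,X'0F': set all four bits of the sign nibble."""
    out = bytearray(packed)
    out[-1] |= 0x0F
    return bytes(out)


packed = cvd(123)
print(packed.hex())                         # after CVD
print(oi_force_f_sign(packed).hex())        # after OI
print(oi_force_f_sign(packed)[-2:].hex())   # the two rightmost bytes
```

If I am reading the operands right, the VCVD/VSTRL pair ends up with just those two rightmost bytes (3 digits and an F sign) in storage, but treat that as my assumption.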

I don't see the point here. It is starting with a register and ending in 
storage in both cases. Only thing I can think of is that this avoids the OI 
instruction that is acting on storage.
And, why is it still using the decimal floating point instructions to count 
significant digits? There's a Vector Count Leading Zero Digits instruction!
