I originally coded the obvious VUPKZ/MVI/TR sequence, along with the 
UNPK/MVI/TR and UNPKA/MVI/TR equivalents, but found that the performance of 
all three variations was within 10% of each other.  It looks like most of the 
time is spent in the TR instruction, so the TR has to go.  I came up with two 
ways to do hexadecimal display conversion using the vector facility without TR:

******** VECTOR UNPACK LOGICAL
         VL    VR03,WorkXL16        Load quadword source
         VL    VR06,=C'0123456789ABCDEF' "Translate" table
         VREPIB VR05,X'000F'        Initialize AND mask
*- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -*
         VUPLLB VR07,VR03           Pad each byte in LOW doubleword    X
                                    ... with a null byte on the left
         VSLD  VR02,VR07,VR07,4     Shift out high order nybble
         VN    VR04,VR02,VR05       Mask out L.O. source nybbles in    X
                                    ... H.O. vector register nybbles
         VO    VR08,VR04,VR07       Merge high/low nybble bytes
         VPERM VR01,VR06,VR06,VR08  Translate to hexadecimal digits
*- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -*
         VUPLHB VR07,VR03           Pad each byte in HIGH doubleword   X
                                    ... with a null byte on the left
         VSLD  VR02,VR07,VR07,4     Shift out high order nybble
         VN    VR04,VR02,VR05       Mask out L.O. source nybbles in    X
                                    ... H.O. vector register nybbles
         VO    VR08,VR04,VR07       Merge high/low nybble bytes
         VPERM VR00,VR06,VR06,VR08  Translate to hexadecimal digits
*- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -*
         VSTM  VR00,VR01,WorkCL32                   Store 32 hex digits
******** VECTOR MERGE
         VL    VR12,WorkXL16                       Load quadword source
         VL    VR15,=C'0123456789ABCDEF'              "Translate" table
         VERLLB VR14,VR12,4          Swap high/low nybbles within bytes
         VPERM VR13,VR15,VR15,VR12  Translate LOW nybbles to hex digits
         VPERM VR11,VR15,VR15,VR14 Translate HIGH nybbles to hex digits
         VMRHB VR09,VR11,VR13          Merge H.O. 16 hexadecimal digits
         VMRLB VR10,VR11,VR13          Merge L.O. 16 hexadecimal digits
         VSTEG VR09,WorkCL8a,0       Spread out hex digits, 8 per store
         VSTEG VR09,WorkCL8b,1                   ...
         VSTEG VR10,WorkCL8c,0                   ...
         VSTEG VR10,WorkCL8d,1                   ...

Both versions of the vector code run 5-20 times faster than the classic 
UNPK/MVI/TR (5-20 nSec vs. 90-100 nSec on a z15 T02).
The UNPACK LOGICAL code does 8 bytes at a time, so you can shorten it if 
you're converting no more than 8 bytes.
The MERGE code always does all 16 bytes, but the second (low) merge, VMRLB, is 
not required if you're converting no more than 8 bytes.
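
As a minimal sketch of the shortened MERGE form for an 8-byte source (not 
assembled or timed here), the source can be loaded into the high doubleword 
with VLLEZG, which also zeroes the low doubleword, so that only the VMRHB 
result has to be stored.  WorkXL8 and WorkCL16 are assumed field names that 
do not appear in the code above:

******** VECTOR MERGE, 8-BYTE SOURCE (SKETCH)
         VLLEZG VR12,WorkXL8           Load 8-byte source, zero the rest
         VL    VR15,=C'0123456789ABCDEF'              "Translate" table
         VERLLB VR14,VR12,4          Swap high/low nybbles within bytes
         VPERM VR13,VR15,VR15,VR12  Translate LOW nybbles to hex digits
         VPERM VR11,VR15,VR15,VR14 Translate HIGH nybbles to hex digits
         VMRHB VR09,VR11,VR13             Merge the 16 hexadecimal digits
         VST   VR09,WorkCL16                        Store 16 hex digits

The low doubleword of VR11 and VR13 just translates the zero padding to 
C'0' and is never stored.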

In the MERGE example, I'm using multiple VSTEGs instead of a VSTM to spread the 
output across 4 separate fields for legibility.
This code will convert 16 bytes at a time, unlike UNPKA and VUPKZ, which are 
restricted to a maximum of 15 bytes.  And there's no "extra" byte to clean up on 
the target side, or whose accessibility you have to worry about on the source 
side.

A big thanks to Dan Greiner for his extremely helpful PowerPoints explaining 
how the vector instructions work.  But I still have no idea how we'd use half 
of those instructions in general coding. The VPERM instruction seems to be an 
(all-register) TRANSLATE against a 32-byte table (in two vector registers), 
looking at only the L.O. 5 bits of each byte. I only needed to look at the 
bottom 4 bits, and achieved that by using the same register for both halves of 
the translate table.
It's probably too complicated to hand-code, but useful if you have a 
hexadecimal display macro that can generate it.
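
To make the 5-bit indexing concrete, here is a hand trace of one source 
byte, X'B7', through the UNPACK LOGICAL sequence above.  It is a paper 
exercise worked from the instruction definitions, not output captured from a 
run, and the comments show a single halfword element of each register:

******** HAND TRACE FOR SOURCE BYTE X'B7' (VR05 holds X'0F' per byte)
         VUPLLB VR07,VR03           Halfword becomes X'00B7'
         VSLD  VR02,VR07,VR07,4     Halfword becomes X'0B70'
         VN    VR04,VR02,VR05       X'0B70' AND X'0F0F' = X'0B00'
         VO    VR08,VR04,VR07       X'0B00' OR  X'00B7' = X'0BB7'
         VPERM VR01,VR06,VR06,VR08  Index bytes X'0B',X'B7' give C'B7'
*        X'0B' is index 11: byte 11 of the first table copy, C'B'.
*        X'B7' keeps only its low 5 bits, X'17', which is index 23:
*        byte 7 of the second table copy, C'7'.

The second index only lands on the right digit because the second table 
register is a copy of the first, so the leftover bit from the high nybble 
does no harm.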

Robert Ngan
DXC Luxoft
