I originally just coded the obvious VUPKZ/MVI/TR sequence, by analogy with a UNPK/MVI/TR or UNPKA/MVI/TR instruction sequence, but found that the performance of all three variations was within 10% of each other. It looks like most of the time is spent in the TR instruction. So we need to eliminate the TR, and I came up with two ways to do hexadecimal display conversion using the vector facility without TR:
******** VECTOR UNPACK LOGICAL
         VL    VR03,WorkXL16       Load quadword source
         VL    VR06,=C'0123456789ABCDEF' "Translate" table
         VREPIB VR05,X'000F'       Initialize AND mask
*- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -*
         VUPLLB VR07,VR03          Pad each byte in LOW doubleword     X
                                   ... with a null byte on the left
         VSLD  VR02,VR07,VR07,4    Shift out high order nybble
         VN    VR04,VR02,VR05      Mask out L.O. source nybbles in     X
                                   ... H.O. vector register nybbles
         VO    VR08,VR04,VR07      Merge high/low nybble bytes
         VPERM VR01,VR06,VR06,VR08 Translate to hexadecimal digits
*- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -*
         VUPLHB VR07,VR03          Pad each byte in HIGH doubleword    X
                                   ... with a null byte on the left
         VSLD  VR02,VR07,VR07,4    Shift out high order nybble
         VN    VR04,VR02,VR05      Mask out L.O. source nybbles in     X
                                   ... H.O. vector register nybbles
         VO    VR08,VR04,VR07      Merge high/low nybble bytes
         VPERM VR00,VR06,VR06,VR08 Translate to hexadecimal digits
*- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -*
         VSTM  VR00,VR01,WorkCL32  Store 32 hex digits

******** VECTOR MERGE
         VL    VR12,WorkXL16       Load quadword source
         VL    VR15,=C'0123456789ABCDEF' "Translate" table
         VERLLB VR14,VR12,4        Swap high/low nybbles within bytes
         VPERM VR13,VR15,VR15,VR12 Translate LOW nybbles to hex digits
         VPERM VR11,VR15,VR15,VR14 Translate HIGH nybbles to hex digits
         VMRHB VR09,VR11,VR13      Merge H.O. 16 hexadecimal digits
         VMRLB VR10,VR11,VR13      Merge L.O. 16 hexadecimal digits
         VSTEG VR09,WorkCL8a,0     Spread out hex digits, 8 per store
         VSTEG VR09,WorkCL8b,1     ...
         VSTEG VR10,WorkCL8c,0     ...
         VSTEG VR10,WorkCL8d,1     ...

Both versions of the vector code run 5-20 times faster than your classical UNPK/MVI/TR (5-20 nSec vs. 90-100 nSec on a z15 T02). The UNPACK LOGICAL version does 8 bytes at a time, so you can shorten the code if you're converting no more than 8 bytes. The MERGE code always does all 16 bytes, but the second (low) merge is not required if you're converting no more than 8 bytes.

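For anyone who wants to check the logic without a z15 handy, here is a rough scalar model of the two instruction sequences above in Python. It is only a sketch of what the instructions do to the bytes, not of how fast they run. The names vperm, hex_via_unpack, hex_via_merge, TABLE and MASK128 are made up for illustration, and the VSLD step assumes it behaves as a shift-left-double by a bit count, which is how the listing uses it.

```python
TABLE = b"0123456789ABCDEF"        # the 16-byte "translate" table literal
MASK128 = (1 << 128) - 1           # one vector register is 128 bits wide


def vperm(v2: bytes, v3: bytes, v4: bytes) -> bytes:
    """Model of VPERM: the low 5 bits of each index byte in v4 select a
    byte from the 32-byte concatenation v2 || v3."""
    table = v2 + v3
    return bytes(table[i & 0x1F] for i in v4)


def hex_via_unpack(half: bytes) -> bytes:
    """Model of the UNPACK LOGICAL sequence for one 8-byte doubleword."""
    assert len(half) == 8
    # VUPLLB / VUPLHB: zero-extend each byte to a halfword (00xy).
    widened = int.from_bytes(b"".join(bytes([0, b]) for b in half), "big")
    # VSLD ...,4 with the same register as both inputs: shift left 4 bits,
    # filling from the (zero) top nybble of the same value (0xy0).
    shifted = ((((widened << 128) | widened) << 4) >> 128) & MASK128
    # VN with X'0F' in every byte leaves only the left byte's nybble (0x00).
    masked = shifted & int.from_bytes(b"\x0f" * 16, "big")
    # VO merges that with the widened value, giving index bytes 0x and xy.
    index = (masked | widened).to_bytes(16, "big")
    # Both table halves are identical, so only the low 4 index bits matter.
    return vperm(TABLE, TABLE, index)


def hex_via_merge(src: bytes) -> bytes:
    """Model of the MERGE sequence for a full 16-byte quadword."""
    assert len(src) == 16
    # VERLLB ...,4: rotate each byte left 4 bits, swapping its nybbles.
    swapped = bytes(((b << 4) | (b >> 4)) & 0xFF for b in src)
    low = vperm(TABLE, TABLE, src)        # digits for the low nybbles
    high = vperm(TABLE, TABLE, swapped)   # digits for the high nybbles
    # VMRHB then VMRLB: interleave high-nybble and low-nybble digits.
    return bytes(b for pair in zip(high, low) for b in pair)
```

Calling hex_via_merge on a 16-byte value, or hex_via_unpack on each 8-byte half in turn, should give the same 32 characters as Python's own bytes.hex().upper() for that value.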
In the MERGE example, I'm using multiple VSTEGs instead of a VSTM to spread the output across 4 separate fields for legibility. This code will convert 16 bytes at a time, unlike UNPKA and VUPKZ, which are restricted to a maximum of 15 bytes. And there's no "extra" byte to clean up on the target side, or to worry about being accessible on the source side.

A big thanks to Dan Greiner for his extremely helpful PowerPoints explaining how the vector instructions work. But I still have no idea how we'd use half of those instructions in general coding.

The VPERM instruction seems to be an (all-register) TRANSLATE against a 32-byte table (in two vector registers), looking at only the L.O. 5 bits of each byte. I only needed to look at the bottom 4 bits, and achieved that by using the same register for both halves of the translate table.

It's probably too complicated to hand-code, but useful if you have a hexadecimal display macro you can generate it from.

Robert Ngan
DXC Luxoft