I can't imagine any instruction sequence in any language performing a "Load Reversed with Mirrored Bytes" more efficiently on z/Architecture than an STG, a TR of the eight bytes, and an LRVG. Even though the TR is probably microcoded (I don't know about the LRVG), I can't see how any loop that shifts and manipulates the data, repeating up to 63 times (assuming a very dense register), could outperform this. I wrote an algorithm using FLOGR, but except in the best cases (all 0s or many leading 0s), I can't imagine it running faster. And with negative numbers (-1 being the worst case), you would probably want to exclusive-OR with foxes (X'FF's) before and after the operation to make the value more sparse.
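For readers who don't live in assembler, here is a rough C sketch of what the STG / TR / LRVG sequence accomplishes: each of the eight bytes is bit-mirrored through a 256-entry table (the TR step), and the byte order is then reversed (the LRVG step). This is only an illustration under my own assumptions, not a claim about how the hardware does it; the names mirror, build_mirror_table and reverse_bits64 are made up for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 256-byte translate table used by TR:
   mirror[b] is b with its eight bits in reverse order,
   so X'80' maps to X'01', ..., X'01' maps to X'80'. */
uint8_t mirror[256];

/* Build the table once, before the first call to reverse_bits64(). */
void build_mirror_table(void)
{
    for (unsigned b = 0; b < 256; b++) {
        unsigned r = 0;
        for (unsigned bit = 0; bit < 8; bit++) {
            if (b & (1u << bit))
                r |= 0x80u >> bit;      /* bit 0 goes to bit 7, and so on */
        }
        mirror[b] = (uint8_t)r;
    }
}

/* Reverse all 64 bits: mirror each byte through the table (the TR step),
   then place it at the opposite end of the doubleword (the LRVG step). */
uint64_t reverse_bits64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(x >> (8 * i));        /* byte i from the low end */
        r |= (uint64_t)mirror[byte] << (8 * (7 - i));  /* lands at byte 7 - i */
    }
    return r;
}
```

After build_mirror_table() has run, reverse_bits64(1) returns 0x8000000000000000, and applying the function twice gives back the original value. The loop here always runs exactly eight times, regardless of how dense the register is.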
However, in your initial post you described the above sequence involving the TR as complex. I assume you're talking about the translate table itself. When I need translate tables that are not "simple" and are particularly error prone, I write a program to create them. I would quadword-align the origin and result tables, do the tests and sets (in this case X'80' to X'01', ... X'01' to X'80'), load the address of the result table into a register, and code a DC H'0' to get an 0C1. I would set a SLIP and run the job. I could then format the dump and cut and paste (with a little manipulation) the table into an assembler source. In this case, if the first and last 16 bytes of the table are correct, then it's probably 100% correct. I find the half hour I spend doing this for "error prone" translate tables can save me hours of debugging later.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Charles Mills
Sent: Wednesday, July 24, 2013 7:31 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a "reverse bits" hardware instruction?

Thanks all. You're right, "just how fast DOES this code need to be?" And the answer is I should know, but I don't. I don't want to waste the customer's cycles. I am smart enough to know that I am too dumb to know how fast it needs to be. The right answer lies in profiling, and some other task has always been just a little higher priority than profiling.

Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I hated it. I am very weak on the higher math relevant to programming.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley
Sent: Wednesday, July 24, 2013 8:17 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a "reverse bits" hardware instruction?

How fast does this code need to be?
David's ffs64 looked pretty good to my inexpert eye; I think you would have to be running it very frequently for something else to be measurably faster.

There are some similar discussions here, including some branchless techniques that probably would be faster (not necessarily detectably):
http://stackoverflow.com/questions/757059/position-of-least-significant-bit-that-is-set

One answer also talks about clearing the lowest set bit.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN