That sounds good. I wouldn't mind seeing an implementation in assembler with hard-coded translate tables, just for my education. I'm not interested in signed integers, as I can't think of a use case for them.
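For illustration only, here is a minimal Python sketch (not the assembler asked for above) of the table-driven approach described in the quoted message below: build the 256-byte mirror table with a small program, then mirror each byte and reverse the byte order. The names `build_mirror_table`, `MIRROR`, `reverse_bits64` and `table_rows` are made up for this sketch, and the comments mapping steps to STG, TR and LRVG are only an analogy, not a claim about how the hardware does it.

```python
def build_mirror_table() -> bytes:
    """Return a 256-byte table where entry b is b with its 8 bits reversed.

    For example X'80' maps to X'01' and X'01' maps to X'80'.
    """
    table = bytearray(256)
    for b in range(256):
        r = 0
        for bit in range(8):
            if b & (1 << bit):
                r |= 0x80 >> bit
        table[b] = r
    return bytes(table)


MIRROR = build_mirror_table()


def reverse_bits64(x: int) -> int:
    """Reverse all 64 bits of x (0 <= x < 2**64)."""
    stored = x.to_bytes(8, "big")                # analogous to STG: 8 bytes in storage
    translated = stored.translate(MIRROR)        # analogous to TR: one lookup per byte
    return int.from_bytes(translated, "little")  # analogous to LRVG: load byte-reversed


def table_rows() -> list[str]:
    """Format the table as 16 rows of 32 hex digits, ready to paste and check."""
    return [MIRROR[i:i + 16].hex().upper() for i in range(0, 256, 16)]


if __name__ == "__main__":
    for row in table_rows():
        print(row)
    print(f"{reverse_bits64(1):016X}")
```

The first and last rows printed are the ones worth eyeballing; each row could be wrapped in a `DC X'...'` by hand to hard-code the table in an assembler source.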
On 24/07/2013, at 9:42 PM, Kenneth Wilkerson <redb...@austin.rr.com> wrote:

> I can't imagine any instruction sequence in any language performing a "Load
> Reversed with Mirrored Bytes" more efficiently in the z/Architecture than a
> STG, TR for eight bytes and LRVG. Even though the TR is probably
> micro-coded (I don't know about the LRVG), I can't see any loop that shifts
> and manipulates the data and repeats up to 63 times (assuming a very dense
> register) could outperform this. I wrote an algorithm using a FLOGR but
> except in the best cases (all 0s or many leading 0s), I can't imagine this
> running faster. And with negative numbers (-1 being the worst case), you
> would probably want to exclusive or with foxes before and after the
> operation to make the value more sparse.
>
> However, in your initial post you talked about the above sequence involving
> the TR being complex. I assume you're talking about the translate table
> itself. When I need translate tables that are not "simple" and particularly
> error prone, I write a program to create it. I would quadword align the
> origin and result tables, do the tests and sets (in this case X'80' to
> X'01', ... X'01' to X'80'), load the address of the result table in a
> register, and code a DC H'0' to get an 0C1. I would set a slip and run the
> job. I could then format the dump and cut and paste (with a little
> manipulation) the table into an assembler source. In this case, if the first
> and last 16 bytes of the table are correct, then it's probably 100% correct.
> I find the half hour I use doing this for "error prone" translate tables can
> save me hours debugging later.
>
> Kenneth
>
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
> Behalf Of Charles Mills
> Sent: Wednesday, July 24, 2013 7:31 AM
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: Re: Is there a "reverse bits" hardware instruction?
>
> Thanks all.
>
> You're right, "just how fast DOES this code need to be?" And the answer is I
> should know, but I don't. I don't want to waste the customer's cycles. I am
> smart enough to know that I am too dumb to know how fast it needs to be. The
> right answer lies in profiling, and some other task has always been just a
> little higher priority than profiling.
>
> Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I
> hated it. I am very weak on the higher math relevant to programming.
>
> Charles
>
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
> Behalf Of Andrew Rowley
> Sent: Wednesday, July 24, 2013 8:17 AM
> To: IBM-MAIN@LISTSERV.UA.EDU
> Subject: Re: Is there a "reverse bits" hardware instruction?
>
> How fast does this code need to be? David's ffs64 looked pretty good to my
> inexpert eye, I think you would have to be running it very frequently for
> something to be measurably faster.
>
> There are some similar discussions here, including some branchless
> techniques that probably would be faster (not necessarily detectably):
> http://stackoverflow.com/questions/757059/position-of-least-significant-bit-that-is-set
>
> One answer also talks about clearing the lowest set bit.
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions, send email
> to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
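As a footnote to the remarks above about finding and clearing the lowest set bit, here is a small Python sketch of the usual two's complement identities. It is an assumption on my part that these are the kind of techniques meant; the linked page may well use others, and this is not David's ffs64. The function names are made up for the example, and Python only demonstrates the arithmetic, not the performance of a branchless machine-code sequence.

```python
def lowest_set_bit(x: int) -> int:
    """Isolate the least significant set bit of x (returns 0 when x == 0)."""
    return x & -x


def clear_lowest_set_bit(x: int) -> int:
    """Return x with its least significant set bit turned off."""
    return x & (x - 1)


def lowest_set_bit_index(x: int) -> int:
    """0-based position of the least significant set bit, or -1 when x == 0."""
    return (x & -x).bit_length() - 1


if __name__ == "__main__":
    value = 0b101000
    print(bin(lowest_set_bit(value)))
    print(bin(clear_lowest_set_bit(value)))
    print(lowest_set_bit_index(value))
```

Repeatedly applying `clear_lowest_set_bit` until the value reaches zero visits each set bit once, so the number of iterations depends on how many bits are set rather than on the register width.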