Re: Is there a "reverse bits" hardware instruction?

Kenneth Wilkerson Wed, 24 Jul 2013 06:43:50 -0700

I can't imagine any instruction sequence in any language performing a "Load
Reversed with Mirrored Bytes" more efficiently in the z/Architecture than an
STG, a TR for eight bytes, and an LRVG. Even though the TR is probably
micro-coded (I don't know about the LRVG), I can't see how any loop that
shifts and manipulates the data, repeating up to 63 times (assuming a very
dense register), could outperform this. I wrote an algorithm using a FLOGR,
but except in the best cases (all 0s or many leading 0s), I can't imagine
this running faster. And with negative numbers (-1 being the worst case),
you would probably want to exclusive-or with foxes (X'FF's) before and after
the operation to make the value more sparse.
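For readers who think in C rather than assembler, here is a rough sketch of
the two approaches compared above. It is an illustration only, not the code
from this thread: the function names are made up, the table lookup stands in
for the TR, the reversed byte placement stands in for the LRVG, and the shift
loop is a plain right-shift loop rather than the FLOGR algorithm mentioned
above. It says nothing about actual timings on any machine.

```c
#include <stdint.h>

/* Mirror the bits of one byte: X'80' <-> X'01', X'40' <-> X'02', ... */
static uint8_t mirror_byte(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

/* Table approach: translate each byte through a 256-byte mirror table
 * (the TR step) and emit the bytes in the opposite order (the LRVG step). */
uint64_t reverse_bits64_table(uint64_t x)
{
    static uint8_t table[256];
    static int ready = 0;
    if (!ready) {
        for (int i = 0; i < 256; i++)
            table[i] = mirror_byte((uint8_t)i);
        ready = 1;
    }
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | table[x & 0xFF];  /* low byte of x becomes high byte of r */
        x >>= 8;
    }
    return r;
}

/* Shift loop: stops as soon as no one bits remain, so sparse values
 * finish early and dense values take up to 64 passes. */
uint64_t reverse_bits64_loop(uint64_t x)
{
    uint64_t r = 0;
    for (int pos = 63; x != 0; pos--, x >>= 1)
        r |= (x & 1u) << pos;
    return r;
}

/* "Exclusive-or with foxes" before and after: reversing the complement
 * and complementing the result gives the same answer, and turns a value
 * with the sign bit on (often dense) into a sparser one for the loop. */
uint64_t reverse_bits64_loop_dense(uint64_t x)
{
    if (x >> 63)
        return ~reverse_bits64_loop(~x);
    return reverse_bits64_loop(x);
}
```

With -1 as input, reverse_bits64_loop_dense complements it to zero, so the
loop body never runs, which is the point of the complement trick.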


However, in your initial post you talked about the above sequence involving
the TR being complex. I assume you're talking about the translate table
itself. When I need translate tables that are not "simple" and are
particularly error prone, I write a program to create them. I would quadword
align the origin and result tables, do the tests and sets (in this case X'80'
to X'01', ... X'01' to X'80'), load the address of the result table in a
register, and DC H'0' to get an 0C1. I would set a slip and run the job. I
could then format the dump and cut and paste (with a little manipulation) the
table into an assembler source. In this case, if the first and last 16 bytes
of the table are correct, then it's probably 100% correct. I find the half
hour I use doing this for "error prone" translate tables can save me hours
debugging later.
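The same "write a program to create the table" idea can be sketched off the
mainframe as well. The C below is only an illustration of that idea, not the
dump-based procedure described above: it builds the 256-byte bit-mirror table
with the same tests and sets (X'80' to X'01', ... X'01' to X'80') and prints
it as assembler DC statements. The label MIRROR and both function names are
made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Build the translate table: entry i holds byte i with its bits mirrored. */
void build_mirror_table(uint8_t table[256])
{
    for (int i = 0; i < 256; i++) {
        uint8_t r = 0;
        for (int bit = 0; bit < 8; bit++) {
            if (i & (0x80 >> bit))           /* test X'80', X'40', ... X'01' */
                r |= (uint8_t)(0x01 << bit); /* set  X'01', X'02', ... X'80' */
        }
        table[i] = r;
    }
}

/* Print the table as 16 DC statements of 16 bytes each, with the label
 * in column 1, DC in column 10 and the operand in column 16. */
void print_mirror_table(FILE *out)
{
    uint8_t table[256];
    build_mirror_table(table);
    for (int row = 0; row < 16; row++) {
        fprintf(out, "%-8s DC    X'", row == 0 ? "MIRROR" : "");
        for (int col = 0; col < 16; col++)
            fprintf(out, "%02X", table[row * 16 + col]);
        fprintf(out, "'\n");
    }
}
```

Calling print_mirror_table(stdout) from a small main gives text that can be
pasted into an assembler source. The first row should begin 00, 80, 40, C0
and the last row should end 3F, BF, 7F, FF, which is a quick way to apply
the first-and-last-16-bytes check.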

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Charles Mills
Sent: Wednesday, July 24, 2013 7:31 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a "reverse bits" hardware instruction?

Thanks all.

You're right, "just how fast DOES this code need to be?" And the answer is I
should know, but I don't. I don't want to waste the customer's cycles. I am
smart enough to know that I am too dumb to know how fast it needs to be. The
right answer lies in profiling, and some other task has always been just a
little higher priority than profiling.

Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I
hated it. I am very weak on the higher math relevant to programming.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Andrew Rowley
Sent: Wednesday, July 24, 2013 8:17 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a "reverse bits" hardware instruction?

How fast does this code need to be? David's ffs64 looked pretty good to my
inexpert eye; I think you would have to be running it very frequently for
something to be measurably faster.

There are some similar discussions here, including some branchless
techniques that probably would be faster (not necessarily detectably):
http://stackoverflow.com/questions/757059/position-of-least-significant-bit-that-is-set

One answer also talks about clearing the lowest set bit.
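As a rough idea of what these techniques look like, here is a minimal C
sketch. It is not taken from the linked page or from David's ffs64; the
function names are made up, and the multiplier is one commonly published
64-bit De Bruijn constant. The lookup table is built at run time from the
constant itself instead of being typed in by hand.

```c
#include <stdint.h>

/* One commonly published 64-bit De Bruijn constant (an assumption of this
 * sketch): every 6-bit window of its bit pattern is distinct. */
#define DEBRUIJN64 UINT64_C(0x03f79d71b4cb0a89)

/* Keep only the lowest one bit, e.g. X'18' -> X'08'. */
uint64_t isolate_lowest_set_bit(uint64_t x)
{
    return x & (~x + 1);
}

/* Turn off the lowest one bit, e.g. X'18' -> X'10'. */
uint64_t clear_lowest_set_bit(uint64_t x)
{
    return x & (x - 1);
}

/* Position of the lowest one bit, counting from 0 at the least
 * significant end; returns -1 when x is zero. Apart from the zero test
 * and the one-time table setup, the lookup itself has no loop. */
int lowest_set_bit_index(uint64_t x)
{
    static uint8_t index_of[64];
    static int ready = 0;
    if (!ready) {
        /* Multiplying by (1 << i) is a left shift by i, so the top six
         * bits of the shifted constant identify i uniquely. */
        for (int i = 0; i < 64; i++)
            index_of[(DEBRUIJN64 << i) >> 58] = (uint8_t)i;
        ready = 1;
    }
    if (x == 0)
        return -1;
    return index_of[(isolate_lowest_set_bit(x) * DEBRUIJN64) >> 58];
}
```

Repeating lowest_set_bit_index followed by clear_lowest_set_bit visits each
one bit of a value in turn, which takes only as many passes as there are one
bits.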

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email
to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
