Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Victor Gil
Here's a single DC instruction I've posted May 2003 [search for what's your 
most difficult assembler concept]

BYTE_FLIP DS 0CL256
TDC 256AL1*-T)/001)-((*-T)/002)*2)*X'80'
   (((*-T)/002)-((*-T)/004)*2)*X'40'
   (((*-T)/004)-((*-T)/008)*2)*X'20'
   (((*-T)/008)-((*-T)/016)*2)*X'10'
   (((*-T)/016)-((*-T)/032)*2)*X'08'
   (((*-T)/032)-((*-T)/064)*2)*X'04'
   (((*-T)/064)-((*-T)/128)*2)*X'02'
   (((*-T)/128)-((*-T)/256)*2)*X'01')  

HTH,
-Victor-

---
The macro at the end of my reply will generate a reverse translate table. I
tested it enough to see that it looked right but it has nothing to do with
my original point. I've found this discussion interesting and it has given
me reason to play with something other than the complex process I'm working
on now. My original point, as was taken by Charles, was that I prefer
automatic processes to generate anything but the simplest translate tables.
I prefer to do it in a program and copy it from a dump into a program
because the table is static. I don't need to generate it every time I
assemble.

In regards to TROO, the original problem was stated as translating a 64 bit
register. A STG, TR for 8 bytes, followed by a LRVG will more than suffice.
Though, I find your argument about the use of a TRTRE to simulate a FROGR
interesting.  It brings the whole bit reversal into perspective. I'm not a
big fan of translate instructions (I use them often enough), particularly
those that require facilities and facility enhancements,  unless you're
translating long strings and are willing to test the facility bits. I'm
sure that the translation and parsing facilities exist on most customer's
boxes by now but I emphasize most.  When I awakened this morning, I wrote
an algorithm to do a load reverse bytes and bits using FLOGR to drive the
process. I'm going to give the idea of a FROGR simulation more thought and
continue this exercise later.

MACRO
LABEL  REVTABLE ,
* Construct reverse bits translate table
LCLA I,J,K,L,M,N,O
LCLC X
LABEL  DS   0DLIKE EM DOUBLE WORD ALIGNED
I  SETA 0 STARTING VALUE
.TABLOOP ANOP  LOOP UNTIL TABLE IS DONE
K  SETA 1 NEED SIXTEEN ENTRIES PER LINE
X  SETC   'AL1('
AGO.X16LP
.X16NXT ANOP
X  SETC   'X'.'J'.','
.X16LP  ANOP   16 ENTRY LOOP
J  SETA 0 STARTING RESULT
L  SETA 1 STARTING ADDEND
M  SETA 1 8 BITS PER BYTE
N  SETA 128   STARTING COMPARAND X'80'
O  SETA ICOPY CURRENT BYTE TO REVERSE
.BYTELP ANOP
AIF  (O LT N).BYTEFT LESS THAN CURRENT - 0
O  SETA O-N
J  SETA J+L
.BYTEFT ANOP
L  SETA L*2   NEXT ADDEND
N  SETA N/2   NEXT COMPARAND
M  SETA M+1   NEED EIGHT BITS
AIF  (M LE 8).BYTELP
I  SETA I+1
K  SETA K+1
AIF  (K LE 16).X16NXT
X  SETC 'X'.'J'.')'
DC   X
AIF  (I LT 256).TABLOOP
MEND

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Joel C. Ewing
The overhead of a macro generated table versus static DC-generated table
would only be an issue if the code is reassembled on a very frequent
basis.  Even if that is the case, that still doesn't rule out using an
Assembler macro to generate the table for the first assembly, then using
cut  paste to put the generated DC's into the code and comment out the
macro invocation.  That way you can even keep the generation macro in
the code as documentation and not have to track or document a separate
program.

I had to consult the Assembler Reference to be sure I knew what I was
doing, but using the Arithmetic AND one can create a simpler version of
the generation macro that eliminates the innermost bit loop.  I tested
the version below with z390:

   MACRO
.*   GENERATE TRT TABLE, 16 BYTES PER DC, FOR REVERSING BITS IN BYTE
LABEL REVTBL
   LCLA N,R,SMAX
   LCLC X,SEP
LABEL DS  0D
N SETA 0  BYTE OFFSET IN TABLE
.TLOOP ANOP
SMAX   SETA N+16 LIMIT FOR NEXT DC STATEMENT GEN
X SETC '' CUMULATIVE AL1 CONSTANTS NXT DC
SEP   SETC '' NO COMMA NEEDED FOR 1ST AL1 IN DC
.SLOOP ANOP
.*  COMPUTE REVERSED BITS FOR N
R SETA ((N AND 128)/128)+((N AND 64)/32)+((N AND 32)/8)
R SETA R+((N AND 16)/2)+((N AND 8)*2)+((N AND 4)*8)
R SETA R+((N AND 2)*32)+((N AND 1)*128)
X SETC 'X.SEP.R'  ADD NEXT AL1 CONSTANT
SEP   SETC ','   NEXT TIME WILL NEED COMMA
N SETA N+1  ADVANCE NXT TABLE BYTE
   AIF (N LT SMAX).SLOOP  IF MORE FOR CURRENT DC
  DC AL1(X.)
   AIF (N LT 256).TLOOPIF MORE DC'S LEFT
   MEND

The first and last 16 generated table bytes from the above macro are:
   DC AL1(0,128,64,192,32,160,96,224,16,144,80,208,48,176,112,240)
...
   DC AL1(15,143,79,207,47,175,111,239,31,159,95,223,63,191,127,255)
which are consistent with the original macro.
Joel C. Ewing

On 07/24/2013 12:54 PM, Kenneth Wilkerson wrote:
 The macro at the end of my reply will generate a reverse translate table. I
 tested it enough to see that it looked right but it has nothing to do with
 my original point. I've found this discussion interesting and it has given
 me reason to play with something other than the complex process I'm working
 on now. My original point, as was taken by Charles, was that I prefer
 automatic processes to generate anything but the simplest translate tables.
 I prefer to do it in a program and copy it from a dump into a program
 because the table is static. I don't need to generate it every time I
 assemble.
 
 In regards to TROO, the original problem was stated as translating a 64 bit
 register. A STG, TR for 8 bytes, followed by a LRVG will more than suffice.
 Though, I find your argument about the use of a TRTRE to simulate a FROGR
 interesting.  It brings the whole bit reversal into perspective. I'm not a
 big fan of translate instructions (I use them often enough), particularly
 those that require facilities and facility enhancements,  unless you're
 translating long strings and are willing to test the facility bits. I'm
 sure that the translation and parsing facilities exist on most customer's
 boxes by now but I emphasize most.  When I awakened this morning, I wrote
 an algorithm to do a load reverse bytes and bits using FLOGR to drive the
 process. I'm going to give the idea of a FROGR simulation more thought and
 continue this exercise later.
 
 MACRO
 LABEL  REVTABLE ,
 * Construct reverse bits translate table
 LCLA I,J,K,L,M,N,O
 LCLC X
 LABEL  DS   0DLIKE EM DOUBLE WORD ALIGNED
 I  SETA 0 STARTING VALUE
 .TABLOOP ANOP  LOOP UNTIL TABLE IS DONE
 K  SETA 1 NEED SIXTEEN ENTRIES PER LINE
 X  SETC   'AL1('
 AGO.X16LP
 .X16NXT ANOP
 X  SETC   'X'.'J'.','
 .X16LP  ANOP   16 ENTRY LOOP
 J  SETA 0 STARTING RESULT
 L  SETA 1 STARTING ADDEND
 M  SETA 1 8 BITS PER BYTE
 N  SETA 128   STARTING COMPARAND X'80'
 O  SETA ICOPY CURRENT BYTE TO REVERSE
 .BYTELP ANOP
 AIF  (O LT N).BYTEFT LESS THAN CURRENT - 0
 O  SETA O-N
 J  SETA J+L
 .BYTEFT ANOP
 L  SETA L*2   NEXT ADDEND
 N  SETA N/2   NEXT COMPARAND
 M  SETA M+1   NEED EIGHT BITS
 AIF  (M LE 8).BYTELP
 I  SETA I+1
 K  SETA K+1
 AIF  (K LE 16).X16NXT
 X  SETC 'X'.'J'.')'
 DC   X
 AIF  (I LT 256).TABLOOP
 MEND
 
 
 -Original Message-
 From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
 Behalf Of John Gilmore
 Sent: Wednesday, July 24, 2013 10:29 AM
 To: IBM-MAIN@LISTSERV.UA.EDU
 Subject: Re: Is there a reverse bits hardware instruction?
 
 The construction of arbitrary translation tables can be error-prone, and
 when it is it is better done procedurally.  I use the HLASM macro language,
 which is entirely adequate to such tasks; mais à chacun son goût.
 
 Here, however, we have for a TROO only the 256 permutations taken two at a
 time

Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Shmuel Metz (Seymour J.)
In
CAE1XxDGgBNuZr4gv30WtG27jj4jqVLjBM0-qmCXOf=rz8x6...@mail.gmail.com,
on 07/23/2013
   at 07:11 PM, John Gilmore jwgli...@gmail.com said:

Look at the Rotate instructions in the PrOp.  RLL will do come 
close to doing what you want to do.

FSVO close. Rotate has noting to do with reversing the bit order; it's
just a circular shift.

-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 Atid/2http://patriot.net/~shmuel
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Shmuel Metz (Seymour J.)
In 000b01ce87ee$711f3580$535da080$@mcn.org, on 07/23/2013
   at 05:49 PM, Charles Mills charl...@mcn.org said:

I guess you could do it with TR (to reverse the bits) and then LRVG
(to reverse the bytes) but that is overly complex and probably slow
(IMHO).

It's faster than move inverse, translate and load (-;

-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 Atid/2http://patriot.net/~shmuel
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Shmuel Metz (Seymour J.)
In 1482096876342980.wa.paulgboulderaim@listserv.ua.edu, on
07/24/2013
   at 09:25 AM, Paul Gilmartin paulgboul...@aim.com said:

One TR to reverse the bytes,

Why, Isn't LRVG a lor faster and more concise?

-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 Atid/2http://patriot.net/~shmuel
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Shmuel Metz (Seymour J.)
In 51ef18aa.2060...@gmail.com, on 07/24/2013
   at 07:58 AM, David Crayford dcrayf...@gmail.com said:

There's a RLLG instruction to rotate the bits in a 64-bit integer.

It doesn't *reverse* the bits, which is what the OP needs.

-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 Atid/2http://patriot.net/~shmuel
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Shmuel Metz (Seymour J.)
In 023901ce8876$0099c900$01cd5b00$@mcn.org, on 07/24/2013
   at 09:59 AM, Charles Mills charl...@mcn.org said:

Agreed in principle. I would probably use Rexx to create it.

I would create the TR table in a macro, or the inline equivalent.

-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 Atid/2http://patriot.net/~shmuel
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Shmuel Metz (Seymour J.)
In
CAE1XxDHWvp88UB4cB69Z0JbDaFE4mtv4eB+ViH=s3xxcttf...@mail.gmail.com,
on 07/24/2013
   at 11:29 AM, John Gilmore jwgli...@gmail.com said:

Here, however, we have for a TROO

Why TROO rather than TR?

-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 Atid/2http://patriot.net/~shmuel
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Shmuel Metz (Seymour J.)
In
cae1xxdhosowjbjwsbdfaqjtt2namcqjsbg2y78y7heomz3a...@mail.gmail.com,
on 07/23/2013
   at 08:07 PM, John Gilmore jwgli...@gmail.com said:

I am happy to have David join in my recommendation, and I can imagine
a circumstance in which RLLG would be more useful than RLL, but RLL
itself does 64-bit rotations.

For ROTATE LEFT SINGLE LOGICAL (RLL), bits 0-31 of general registers
R1 and R3 remain unchanged.

-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 Atid/2http://patriot.net/~shmuel
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-25 Thread Joel C. Ewing
So, no need for any macro at all to generate the byte bit reversal TRT
table!  I must have skipped over or forgotten the 2003 posting.

This is just totally awesome, especially the redundant unary +'s to
force decent alignment for readability.  Had to actually run it with
PRINT DATA to convince myself it really worked, then study it more
closely.  Once you understand why it is syntactically correct and what
it is doing it seems obvious, but at first sight it is certainly
intimidating. Obviously usage in a production application should have
some explanatory comments so future maintainers don't panic at their
first exposure!
Joel C Ewing

On 07/25/2013 09:13 AM, Victor Gil wrote:
 Here's a single DC instruction I've posted May 2003 [search for what's your 
 most difficult assembler concept]
 
 BYTE_FLIP DS 0CL256
 TDC 256AL1*-T)/001)-((*-T)/002)*2)*X'80'
(((*-T)/002)-((*-T)/004)*2)*X'40'
(((*-T)/004)-((*-T)/008)*2)*X'20'
(((*-T)/008)-((*-T)/016)*2)*X'10'
(((*-T)/016)-((*-T)/032)*2)*X'08'
(((*-T)/032)-((*-T)/064)*2)*X'04'
(((*-T)/064)-((*-T)/128)*2)*X'02'
(((*-T)/128)-((*-T)/256)*2)*X'01')  
 
 HTH,
 -Victor-
 
 ---
 The macro at the end of my reply will generate a reverse translate table. I
 tested it enough to see that it looked right but it has nothing to do with
 my original point. I've found this discussion interesting and it has given
 me reason to play with something other than the complex process I'm working
 on now. My original point, as was taken by Charles, was that I prefer
 automatic processes to generate anything but the simplest translate tables.
 I prefer to do it in a program and copy it from a dump into a program
 because the table is static. I don't need to generate it every time I
 assemble.
 
 In regards to TROO, the original problem was stated as translating a 64 bit
 register. A STG, TR for 8 bytes, followed by a LRVG will more than suffice.
 Though, I find your argument about the use of a TRTRE to simulate a FROGR
 interesting.  It brings the whole bit reversal into perspective. I'm not a
 big fan of translate instructions (I use them often enough), particularly
 those that require facilities and facility enhancements,  unless you're
 translating long strings and are willing to test the facility bits. I'm
 sure that the translation and parsing facilities exist on most customer's
 boxes by now but I emphasize most.  When I awakened this morning, I wrote
 an algorithm to do a load reverse bytes and bits using FLOGR to drive the
 process. I'm going to give the idea of a FROGR simulation more thought and
 continue this exercise later.
 
 MACRO
 LABEL  REVTABLE ,
 * Construct reverse bits translate table
 LCLA I,J,K,L,M,N,O
 LCLC X
 LABEL  DS   0DLIKE EM DOUBLE WORD ALIGNED
 I  SETA 0 STARTING VALUE
 .TABLOOP ANOP  LOOP UNTIL TABLE IS DONE
 K  SETA 1 NEED SIXTEEN ENTRIES PER LINE
 X  SETC   'AL1('
 AGO.X16LP
 .X16NXT ANOP
 X  SETC   'X'.'J'.','
 .X16LP  ANOP   16 ENTRY LOOP
 J  SETA 0 STARTING RESULT
 L  SETA 1 STARTING ADDEND
 M  SETA 1 8 BITS PER BYTE
 N  SETA 128   STARTING COMPARAND X'80'
 O  SETA ICOPY CURRENT BYTE TO REVERSE
 .BYTELP ANOP
 AIF  (O LT N).BYTEFT LESS THAN CURRENT - 0
 O  SETA O-N
 J  SETA J+L
 .BYTEFT ANOP
 L  SETA L*2   NEXT ADDEND
 N  SETA N/2   NEXT COMPARAND
 M  SETA M+1   NEED EIGHT BITS
 AIF  (M LE 8).BYTELP
 I  SETA I+1
 K  SETA K+1
 AIF  (K LE 16).X16NXT
 X  SETC 'X'.'J'.')'
 DC   X
 AIF  (I LT 256).TABLOOP
 MEND
 
 --
 For IBM-MAIN subscribe / signoff / archive access instructions,
 send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
 


-- 
Joel C. Ewing,Bentonville, AR   jcew...@acm.org 

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread David L. Craig
On 13Jul24:1022+0800, David Crayford wrote:

 It's easy to do on X86 or ARM architectures which have instructions
 for scanning, counting and reversing bits
 https://github.com/facebook/hiphop-php/blob/master/hphp/util/bitops.h.
 Now I'm interested in what is the most efficient way to do this on
 zArch. I thought zArch was king of the CISC. ARM can do this in 4
 instructions and that's a RISC architecture.

A TR instruction that uses a 256-byte table of the reversed
byte values will handle up to 256 bytes, IIRC.
-- 
not cent from sell
May the LORD God bless you exceedingly abundantly!

Dave_Craig__
So the universe is not quite as you thought it was.
 You'd better rearrange your beliefs, then.
 Because you certainly can't rearrange the universe.
__--from_Nightfall_by_Asimov/Silverberg_

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Andrew Rowley
How fast does this code need to be? David's ffs64 looked pretty good to 
my inexpert eye, I think you would have to be running it very frequently 
for something to be measurably faster.


There are some similar discussions here, including some branchless 
techniques that probably would be faster (not necessarily detectably):

http://stackoverflow.com/questions/757059/position-of-least-significant-bit-that-is-set

One answer also talks about clearing the lowest set bit.

Andrew Rowley


On 24/07/2013 07:49, Charles Mills wrote:

Is there a quick way to reverse MSB to LSB the bits of a 64-bit register? If
I have a register that contains 01234567_89ABCDEF to convert it to
F7B3D591_E6A2C480? (I think I got that at least mostly right.) A bit-wise
Load Reversed?

Yes, I am familiar with the Pops. Hard to find an instruction when you don't
know its name or even if it actually exists.

Why? Those of you following another thread I started know I am looking to
implement a 64-bit version of the UNIX library function ffs(), which returns
the bit number of the least significant one bit of a word. z architecture
provides the FLOGR instruction but it works MSB to LSB. I could potentially
live with that but it would introduce some new complications, one of them
being that counting from the LSB is much more compatible with how C promotes
integer types. If I could flip the bits of a word in one or two hardware
instructions I would have a solution. FLOGR provides an additional benefit
for me because I ultimately want to do the other chore that FLOGR does,
resetting the found bit.

I guess you could do it with TR (to reverse the bits) and then LRVG (to
reverse the bytes) but that is overly complex and probably slow (IMHO).

Thanks in advance,
Charles

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN



--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Charles Mills
Thanks all.

You're right, just how fast DOES this code need to be? And the answer is I
should know, but I don't. I don't want to waste the customer's cycles. I am
smart enough to know that I am too dumb to know how fast it needs to be. The
right answer lies in profiling, and some other task has always been just a
little higher priority than profiling.

Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I
hated it. I am very weak on the higher math relevant to programming.

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Andrew Rowley
Sent: Wednesday, July 24, 2013 8:17 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

How fast does this code need to be? David's ffs64 looked pretty good to my
inexpert eye, I think you would have to be running it very frequently for
something to be measurably faster.

There are some similar discussions here, including some branchless
techniques that probably would be faster (not necessarily detectably):
http://stackoverflow.com/questions/757059/position-of-least-significant-bit-
that-is-set

One answer also talks about clearing the lowest set bit.

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread David Crayford
Just for fun there's a good 64-bit reverse hack in Hank Warrens 
excellent Hackers Delight book based on Knuth's algorithm. Of course, 
native instructions should be faster.


Charles, I've sent you a couple of off list mails. Looks like I'm not 
getting past your spam filter.


/* How would you extend Knuth's algorithm to rotating a 64-bit quantity
on a 64-bit machine?
   A simple method is to first swap the two halves of the 64-bit word,
which takes one instruction if you have the rotate shift. Then, rotate
left 15 positions the two halves of the 64-bit word. This takes five
operations:

   x = (x  0x00010001)  15 | (x  0xFFFEFFFE)  17;

Then do the 10-swap, 4-swap, and 2-swap as in Knuth's code but with each
mask m doubled to the form mm. Thus this method adds five operations to
Knuth's code, making it 24 operations for the exclusive or version
(rev14a), if you have the rotate shift, or swap register halves.
   Another way is to develop code similar to Knuth's that first rotates
left 31 positions, and then does a 20-swap, an 8-swap, a 4-swap, and a
2-swap. This adds one stage of swapping, which takes seven operations as
in rev14, or 6 operations as in rev14a. Thus the simple method seems to
be best, in terms of operation count (but not if parallelism in rev14
can be used to advantage).
   First, the simple method. 24 operations if the swap (rotate 32)
counts as 1. */

unsigned long long rev15(unsigned long long x) {
   unsigned long long t;

   x = (x  32) | (x  32);   // Swap register halves.
   x = (x  0x00010001LL)  15 | // Rotate left
   (x  0xFFFEFFFELL)  17;  // 15.
   t = (x ^ (x  10))  0x003F801F003F801FLL;
   x = (t | (t  10)) ^ x;
   t = (x ^ (x  4))  0x0E0384210E038421LL;
   x = (t | (t  4)) ^ x;
   t = (x ^ (x  2))  0x2248884222488842LL;
   x = (t | (t  2)) ^ x;
   return x;
}

On 24/07/2013 8:30 PM, Charles Mills wrote:

Thanks all.

You're right, just how fast DOES this code need to be? And the answer is I
should know, but I don't. I don't want to waste the customer's cycles. I am
smart enough to know that I am too dumb to know how fast it needs to be. The
right answer lies in profiling, and some other task has always been just a
little higher priority than profiling.

Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I
hated it. I am very weak on the higher math relevant to programming.

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Andrew Rowley
Sent: Wednesday, July 24, 2013 8:17 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

How fast does this code need to be? David's ffs64 looked pretty good to my
inexpert eye, I think you would have to be running it very frequently for
something to be measurably faster.

There are some similar discussions here, including some branchless
techniques that probably would be faster (not necessarily detectably):
http://stackoverflow.com/questions/757059/position-of-least-significant-bit-
that-is-set

One answer also talks about clearing the lowest set bit.

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Kenneth Wilkerson
I can't imagine any instruction sequence in any language performing a Load
Reversed with Mirrored Bytes more efficiently in the Z/Architecture than a
STG, TR for eight bytes and LRVG. Even though, the TR is probably
micro-coded (I don't know about the LRVG), I can't see any loop that shifts
and manipulates the data and repeats up to 63 times (assuming a very dense
register) could outperform this. I wrote an algorithm using a FLOGR but
except in the best cases (all 0s or many leading 0s), I can't imagine this
running faster.  And with negative numbers (-1 being the worst case),  you
would probably want to exclusive or with foxes before and after the
operation to make the value  more sparse. 

However, in your initial post you talked about the above sequence involving
the TR being complex. I assume you're talking about the translate table
itself. When I need translate tables that are not simple and particularly
error prone, I write a program to create it. I would quadword align the
origin and result tables, do the tests and sets (in this case X'80' to
'X01', ... X'01' to X'80'), load the address of the result table in a
register, DC H'0' to get an 0c1. I would set a slip and run the job. I could
then format the dump and cut and paste (with a little manipulation) the
table into an assembler source. In this case, if the first and last 16 bytes
of the table are correct, the its probably 100% correct.  I find the half
hour I use doing this for error prone translate tables can save me hours
debugging later. 

Kenneth

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Charles Mills
Sent: Wednesday, July 24, 2013 7:31 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

Thanks all.

You're right, just how fast DOES this code need to be? And the answer is I
should know, but I don't. I don't want to waste the customer's cycles. I am
smart enough to know that I am too dumb to know how fast it needs to be. The
right answer lies in profiling, and some other task has always been just a
little higher priority than profiling.

Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I
hated it. I am very weak on the higher math relevant to programming.

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Andrew Rowley
Sent: Wednesday, July 24, 2013 8:17 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

How fast does this code need to be? David's ffs64 looked pretty good to my
inexpert eye, I think you would have to be running it very frequently for
something to be measurably faster.

There are some similar discussions here, including some branchless
techniques that probably would be faster (not necessarily detectably):
http://stackoverflow.com/questions/757059/position-of-least-significant-bit-
that-is-set

One answer also talks about clearing the lowest set bit.

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email
to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Charles Mills
Agreed in principle. I would probably use Rexx to create it.

The complexity I referred to was the sum of a whole bunch of things that
would be necessary:

- linking out to assembler
- a 256-byte xlate table and TR
- LRVG
- FLOGR
- dealing somewhere with the zero case

Not the most complex thing ever, but a bunch of stuff. As I write mainframe
product code these days I always think one of these days this code may have
to be maintained by someone who is a lot more comfortable in C than with
assembler.

 hours debugging later

Or years of wrong results that might have a severe negative business impact
that was never detected, or was detected only after years. 

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Kenneth Wilkerson
Sent: Wednesday, July 24, 2013 9:43 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

I can't imagine any instruction sequence in any language performing a Load
Reversed with Mirrored Bytes more efficiently in the Z/Architecture than a
STG, TR for eight bytes and LRVG. Even though, the TR is probably
micro-coded (I don't know about the LRVG), I can't see any loop that shifts
and manipulates the data and repeats up to 63 times (assuming a very dense
register) could outperform this. I wrote an algorithm using a FLOGR but
except in the best cases (all 0s or many leading 0s), I can't imagine this
running faster.  And with negative numbers (-1 being the worst case),  you
would probably want to exclusive or with foxes before and after the
operation to make the value  more sparse. 

However, in your initial post you talked about the above sequence involving
the TR being complex. I assume you're talking about the translate table
itself. When I need translate tables that are not simple and particularly
error prone, I write a program to create it. I would quadword align the
origin and result tables, do the tests and sets (in this case X'80' to
'X01', ... X'01' to X'80'), load the address of the result table in a
register, DC H'0' to get an 0c1. I would set a slip and run the job. I could
then format the dump and cut and paste (with a little manipulation) the
table into an assembler source. In this case, if the first and last 16 bytes
of the table are correct, the its probably 100% correct.  I find the half
hour I use doing this for error prone translate tables can save me hours
debugging later. 

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread retired mainframer
Could you please expand a little on how the position of the least
significant one bit affects integer promotion.  If the low order five bits
of a short are zero, won't the low order five bits of the corresponding int
be zero?  For that matter, even the position of the most significant one bit
shouldn't affect promotion since the promoted type is guaranteed to be at
least as large as the original type.

:: -Original Message-
:: From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
:: Behalf Of Charles Mills
:: Sent: Tuesday, July 23, 2013 2:49 PM
:: To: IBM-MAIN@LISTSERV.UA.EDU
:: Subject: Is there a reverse bits hardware instruction?

snip

:: Why? Those of you following another thread I started know I am looking
:: to
:: implement a 64-bit version of the UNIX library function ffs(), which
:: returns
:: the bit number of the least significant one bit of a word. z
:: architecture
:: provides the FLOGR instruction but it works MSB to LSB. I could
:: potentially
:: live with that but it would introduce some new complications, one of
:: them
:: being that counting from the LSB is much more compatible with how C
:: promotes
:: integer types. If I could flip the bits of a word in one or two hardware

snip

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread David Crayford
That sounds good. I wouldn't mind seeing an implementation in assembler with 
hard coded translate tables just for my education.  Not interested in signed 
integers as I can't think of a use case for that. 

On 24/07/2013, at 9:42 PM, Kenneth Wilkerson redb...@austin.rr.com wrote:

 I can't imagine any instruction sequence in any language performing a Load
 Reversed with Mirrored Bytes more efficiently in the Z/Architecture than a
 STG, TR for eight bytes and LRVG. Even though, the TR is probably
 micro-coded (I don't know about the LRVG), I can't see any loop that shifts
 and manipulates the data and repeats up to 63 times (assuming a very dense
 register) could outperform this. I wrote an algorithm using a FLOGR but
 except in the best cases (all 0s or many leading 0s), I can't imagine this
 running faster.  And with negative numbers (-1 being the worst case),  you
 would probably want to exclusive or with foxes before and after the
 operation to make the value  more sparse. 
 
 However, in your initial post you talked about the above sequence involving
 the TR being complex. I assume you're talking about the translate table
 itself. When I need translate tables that are not simple and particularly
 error prone, I write a program to create it. I would quadword align the
 origin and result tables, do the tests and sets (in this case X'80' to
 'X01', ... X'01' to X'80'), load the address of the result table in a
 register, DC H'0' to get an 0c1. I would set a slip and run the job. I could
 then format the dump and cut and paste (with a little manipulation) the
 table into an assembler source. In this case, if the first and last 16 bytes
 of the table are correct, the its probably 100% correct.  I find the half
 hour I use doing this for error prone translate tables can save me hours
 debugging later. 
 
 Kenneth
 
 -Original Message-
 From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
 Behalf Of Charles Mills
 Sent: Wednesday, July 24, 2013 7:31 AM
 To: IBM-MAIN@LISTSERV.UA.EDU
 Subject: Re: Is there a reverse bits hardware instruction?
 
 Thanks all.
 
 You're right, just how fast DOES this code need to be? And the answer is I
 should know, but I don't. I don't want to waste the customer's cycles. I am
 smart enough to know that I am too dumb to know how fast it needs to be. The
 right answer lies in profiling, and some other task has always been just a
 little higher priority than profiling.
 
 Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I
 hated it. I am very weak on the higher math relevant to programming.
 
 Charles
 
 -Original Message-
 From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
 Behalf Of Andrew Rowley
 Sent: Wednesday, July 24, 2013 8:17 AM
 To: IBM-MAIN@LISTSERV.UA.EDU
 Subject: Re: Is there a reverse bits hardware instruction?
 
 How fast does this code need to be? David's ffs64 looked pretty good to my
 inexpert eye, I think you would have to be running it very frequently for
 something to be measurably faster.
 
 There are some similar discussions here, including some branchless
 techniques that probably would be faster (not necessarily detectably):
 http://stackoverflow.com/questions/757059/position-of-least-significant-bit-
 that-is-set
 
 One answer also talks about clearing the lowest set bit.
 
 --
 For IBM-MAIN subscribe / signoff / archive access instructions, send email
 to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
 
 --
 For IBM-MAIN subscribe / signoff / archive access instructions,
 send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Paul Gilmartin
On Wed, 24 Jul 2013 04:27:16 -0400, David L. Craig wrote:

A TR instruction that uses a 256-byte table of the reversed
byte values will handle up to 256 bytes, IIRC.
 
One TR to reverse the bytes, then a second TR to reverse
the bits, plus various MVC for setup.

-- gil

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Paul Gilmartin
On Wed, 24 Jul 2013 08:42:55 -0500, Kenneth Wilkerson wrote:

However, in your initial post you talked about the above sequence involving
the TR being complex. I assume you're talking about the translate table
itself. When I need translate tables that are not simple and particularly
error prone, I write a program to create it. I would quadword align the
origin and result tables, do the tests and sets (in this case X'80' to
'X01', ... X'01' to X'80'), load the address of the result table in a
register, DC H'0' to get an 0c1. I would set a slip and run the job. I could
then format the dump and cut and paste (with a little manipulation) the
table into an assembler source. In this case, if the first and last 16 bytes
of the table are correct, the its probably 100% correct.  I find the half
hour I use doing this for error prone translate tables can save me hours
debugging later.
 
Another contributor to this list would probably achieve the result with a
HLASM macro.

Not every programmer is authorized to set SLIPs, I believe.

-- gil

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread John McKown
MVCIN will reverse the bytes.

On Wed, Jul 24, 2013 at 9:25 AM, Paul Gilmartin paulgboul...@aim.comwrote:

 On Wed, 24 Jul 2013 04:27:16 -0400, David L. Craig wrote:
 
 A TR instruction that uses a 256-byte table of the reversed
 byte values will handle up to 256 bytes, IIRC.
 
 One TR to reverse the bytes, then a second TR to reverse
 the bits, plus various MVC for setup.

 -- gil--


This is a test of the Emergency Broadcast System. If this had been an
actual emergency, do you really think we'd stick around to tell you?

Maranatha! 
John McKown

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread John Gilmore
The construction of arbitrary translation tables can be error-prone,
and when it is it is better done procedurally.  I use the HLASM macro
language, which is entirely adequate to such tasks; mais à chacun son
goût.

Here, however, we have for a TROO only the 256 permutations taken two
at a time of the sixteen hexadecimal digits, viz.,

0==0, 1==8, 2==4, 3==c,
4==2, 5==a, 6==6, 7==e,
8==1, 9==9, a==5, b==d,
c==3, d==b, e==7, f==f

and they can be enumerated readily, by program or manually (and
certainly without resort to cut-and-paste from a dump).

Symmetries can also be exploited, and the whole thing can be
arithmetized, but to do either would put mathematics dropouts at a
disadvantage.

The problem CAN be addressed with left circular shifts/left rotations,
but they must be nested (and iterated for long bit strings).  The TROO
turns out to be faster, particularly for those long bit strings.

The problem of bit-string reversal has its own interest, but if its
purpose is in effect to simulate a FROGR using a FLOGR, then other
approaches are possible.

Specifically, a TRTRE, Translate and Test Reverse Extended, can be
used.  It proceeds from right to left, high to low storage address, in
a byte string only until it finds a non-zero value in its table that
corresponds to the current byte's rank.

Permutations taken two at a time of the hexadecimal digit-codes

==, 0
0001==0001, 1
0010==0010, 2
0011==0001, 1
0100==0011, 3
0101==0001, 1
0110==0010, 2
0111==0001, 1
1000==0100, 4
1001==0001, 1
1010==0010, 2
1011==0001, 1
1100==0011, 3
1101==0001, 1
1110==0010, 2
==0001, 1

in which zero indicates the absence of a one bit and a non-zero value
indicates both the presence of a one bit and its one-origin offset
from the rightmost bit position.  The permutations/code points

x'N0',  N = 1, 2, . . . , f

need 'special' treatment, 8 must be added to the values shown above for them.

Unsurprisingly, this turns out to be faster than reversal followed by FLOG.

John Gilmore, Ashland, MA 01721 - USA

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Charles Mills
Sure. Basically a C function prototyped as ffs64(uint64_t val) cannot tell
whether val started out as a char or a doubleword. (It could with overloads
but let's not go there.)

The fact that an MSB-based ffs64() would return 57 for 

unsigned char foo = 0x40;
int bar = ffs64(foo) 

is a little counterintuitive.

Charles
-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of retired mainframer
Sent: Wednesday, July 24, 2013 10:09 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

Could you please expand a little on how the position of the least
significant one bit affects integer promotion.  If the low order five bits
of a short are zero, won't the low order five bits of the corresponding int
be zero?  For that matter, even the position of the most significant one bit
shouldn't affect promotion since the promoted type is guaranteed to be at
least as large as the original type.

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Is there a reverse bits hardware instruction?

2013-07-24 Thread Lynd, Eugene C.
Of course there is.  XI (EXCLUSIVE OR IMMEDIATE) and XC (EXCLUSIVE OR
CHARACTER).  

Where's  Steve Comstock when you need him?

 

Gene Lynd  


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Charles Mills
He's reading the op?

Charles
Composed on a mobile: please excuse my brevity 

Lynd, Eugene C. eugene.c.l...@saic.com wrote:

Of course there is.  XI (EXCLUSIVE OR IMMEDIATE) and XC (EXCLUSIVE OR
CHARACTER).  

Where's  Steve Comstock when you need him?

 

Gene Lynd  


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Steve Comstock

On 7/24/2013 11:59 AM, Charles Mills wrote:

He's reading the op?

Charles
Composed on a mobile: please excuse my brevity

Lynd, Eugene C. eugene.c.l...@saic.com wrote:


Of course there is.  XI (EXCLUSIVE OR IMMEDIATE) and XC (EXCLUSIVE OR
CHARACTER).

Where's  Steve Comstock when you need him?



Gene Lynd



He's still here. :-)


I came late to the thread and it was not really clear
to me what the OP wanted, so I decided to remain silent.

And, I've been extremely busy. Not training, unfortunately,
but I am now the chair of the Denver Takayama Sister City
Committee, a volunteer task that takes an incredible amount
of time.

--

Kind regards,

-Steve Comstock
The Trainer's Friend, Inc.

303-355-2752
http://www.trainersfriend.com

* To get a good Return on your Investment, first make an investment!
  + Training your people is an excellent investment

* Try our tool for calculating your Return On Investment
for training dollars at
  http://www.trainersfriend.com/ROI/roi.html

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-24 Thread Bill Godfrey
On Wed, 24 Jul 2013 12:54:06 -0500, Kenneth Wilkerson wrote:

The macro at the end of my reply will generate a reverse translate table.
(snip)

MACRO
LABEL  REVTABLE ,
* Construct reverse bits translate table
(snip)

That's a beauty.

Bill

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Is there a reverse bits hardware instruction?

2013-07-23 Thread Charles Mills
Is there a quick way to reverse MSB to LSB the bits of a 64-bit register? If
I have a register that contains 01234567_89ABCDEF to convert it to
F7B3D591_E6A2C480? (I think I got that at least mostly right.) A bit-wise
Load Reversed?

Yes, I am familiar with the Pops. Hard to find an instruction when you don't
know its name or even if it actually exists.

Why? Those of you following another thread I started know I am looking to
implement a 64-bit version of the UNIX library function ffs(), which returns
the bit number of the least significant one bit of a word. z architecture
provides the FLOGR instruction but it works MSB to LSB. I could potentially
live with that but it would introduce some new complications, one of them
being that counting from the LSB is much more compatible with how C promotes
integer types. If I could flip the bits of a word in one or two hardware
instructions I would have a solution. FLOGR provides an additional benefit
for me because I ultimately want to do the other chore that FLOGR does,
resetting the found bit.

I guess you could do it with TR (to reverse the bits) and then LRVG (to
reverse the bytes) but that is overly complex and probably slow (IMHO).

Thanks in advance,
Charles 

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread John Gilmore
Look at the Rotate instructions in the PrOp.  RLL will do come close
to doing what you want  to do.

John Gilmore, Ashland, MA 01721 - USA

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread Paul Gilmartin
On Tue, 23 Jul 2013 17:49:00 -0400, Charles Mills wrote:

Why? Those of you following another thread I started know I am looking to
implement a 64-bit version of the UNIX library function ffs(), which returns
the bit number of the least significant one bit of a word. z architecture
provides the FLOGR instruction but it works MSB to LSB. 
 
Subtract 1.  XOR with the original value.  The leftmost 1 bit of the result
now occupies the position of the rightmost 1 bit of the original value
AND with the original value yields the mask to reset that bit.

Zero must be treated as a special case.

--gil

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread John P. Baker
Gil,

The maximum negative value must also be treated as a special case.

John P. Baker

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Paul Gilmartin
Sent: Tuesday, July 23, 2013 7:13 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

On Tue, 23 Jul 2013 17:49:00 -0400, Charles Mills wrote:

Why? Those of you following another thread I started know I am looking 
to implement a 64-bit version of the UNIX library function ffs(), which 
returns the bit number of the least significant one bit of a word. z 
architecture provides the FLOGR instruction but it works MSB to LSB.
 
Subtract 1.  XOR with the original value.  The leftmost 1 bit of the result now 
occupies the position of the rightmost 1 bit of the original value AND with the 
original value yields the mask to reset that bit.

Zero must be treated as a special case.

--gil

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to 
lists...@listserv.ua.edu with the message: INFO IBM-MAIN

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread Paul Gilmartin
On Tue, 23 Jul 2013 19:22:49 -0400, John P. Baker wrote:

The maximum negative value must also be treated as a special case.
 
I concluded that in a mental simulation after I pressed SEND.  This
is why boundary conditions should be tested.

I also should have said Subtract Logical rather than Subtract, if
overflow matters.  This is why environmental variations should
be tested.

-- gil

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread David Crayford
There's a RLLG instruction to rotate the bits in a 64-bit integer. I 
know that you want to call such a routine from C/C++. If you need to 
write ffs64() in assembler
you may find the linkage overhead of calling the routine is far greater 
than a slower implementation that has been inlined. It's about time IBM 
enhanced C/C++

to support __asm for non-Metal/C compiles.

On 24/07/2013 5:49 AM, Charles Mills wrote:

Is there a quick way to reverse MSB to LSB the bits of a 64-bit register? If
I have a register that contains 01234567_89ABCDEF to convert it to
F7B3D591_E6A2C480? (I think I got that at least mostly right.) A bit-wise
Load Reversed?

Yes, I am familiar with the Pops. Hard to find an instruction when you don't
know its name or even if it actually exists.

Why? Those of you following another thread I started know I am looking to
implement a 64-bit version of the UNIX library function ffs(), which returns
the bit number of the least significant one bit of a word. z architecture
provides the FLOGR instruction but it works MSB to LSB. I could potentially
live with that but it would introduce some new complications, one of them
being that counting from the LSB is much more compatible with how C promotes
integer types. If I could flip the bits of a word in one or two hardware
instructions I would have a solution. FLOGR provides an additional benefit
for me because I ultimately want to do the other chore that FLOGR does,
resetting the found bit.

I guess you could do it with TR (to reverse the bits) and then LRVG (to
reverse the bytes) but that is overly complex and probably slow (IMHO).

Thanks in advance,
Charles

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread John Gilmore
I am happy to have David join in my recommendation, and I can imagine
a circumstance in which RLLG would be more useful than RLL, but RLL
itself does 64-bit rotations.

John Gilmore, Ashland, MA 01721 - USA

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread Charles Mills
_asm and or one of those pseudo-assembler/library routines like __tr() for
more machine instructions.

Yes, overhead is definitely a negative for calls, especially if (like some
of us) you have not exactly gotten around to embracing XPLINK.

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of David Crayford
Sent: Tuesday, July 23, 2013 7:59 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

There's a RLLG instruction to rotate the bits in a 64-bit integer. I know
that you want to call such a routine from C/C++. If you need to write
ffs64() in assembler you may find the linkage overhead of calling the
routine is far greater than a slower implementation that has been inlined.
It's about time IBM enhanced C/C++ to support __asm for non-Metal/C
compiles.

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread Charles Mills
Hmmm. Have to think that one through. Thanks.

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Paul Gilmartin
Sent: Tuesday, July 23, 2013 7:13 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

On Tue, 23 Jul 2013 17:49:00 -0400, Charles Mills wrote:

Why? Those of you following another thread I started know I am looking 
to implement a 64-bit version of the UNIX library function ffs(), which 
returns the bit number of the least significant one bit of a word. z 
architecture provides the FLOGR instruction but it works MSB to LSB.
 
Subtract 1.  XOR with the original value.  The leftmost 1 bit of the result now 
occupies the position of the rightmost 1 bit of the original value AND with the 
original value yields the mask to reset that bit.

Zero must be treated as a special case.

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread David Crayford

On 24/07/2013 8:07 AM, John Gilmore wrote:

I am happy to have David join in my recommendation, and I can imagine
a circumstance in which RLLG would be more useful than RLL, but RLL
itself does 64-bit rotations.


Sorry John, I didn't see your recommendation. Of course you are right 
RLL can rotate up to 64 bits. In C/C++
I compile in both LP32 and LP64 so one would need to implement both 
variants with #ifdef macros.




John Gilmore, Ashland, MA 01721 - USA

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread David Crayford

On 24/07/2013 8:19 AM, Charles Mills wrote:

_asm and or one of those pseudo-assembler/library routines like __tr() for
more machine instructions.


There's is no built-in for RLL. I've got a requirement with IBM for 
__asm() support. I'm hoping it's going to make it in z/OS 2.1.



Yes, overhead is definitely a negative for calls, especially if (like some
of us) you have not exactly gotten around to embracing XPLINK.


Even XPLINK is significantly slower than inline. While it saves time by 
not moving as much data when saving registers branching to the function
usually results in a cache miss. It comes down to trade-offs and if you 
are micro optimizing code linkage overhead is significant.



Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of David Crayford
Sent: Tuesday, July 23, 2013 7:59 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

There's a RLLG instruction to rotate the bits in a 64-bit integer. I know
that you want to call such a routine from C/C++. If you need to write
ffs64() in assembler you may find the linkage overhead of calling the
routine is far greater than a slower implementation that has been inlined.
It's about time IBM enhanced C/C++ to support __asm for non-Metal/C
compiles.

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread Charles Mills
Not sure how rotation would accomplish reversal. 

(I had looked at the rotate and shift instructions. Thought perhaps there
was a rotate until you shift out a bit and tell me how far you shifted
instruction.)

Charles

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of John Gilmore
Sent: Tuesday, July 23, 2013 7:11 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Is there a reverse bits hardware instruction?

Look at the Rotate instructions in the PrOp.  RLL will do come close to
doing what you want  to do.

John Gilmore, Ashland, MA 01721 - USA

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email
to lists...@listserv.ua.edu with the message: INFO IBM-MAIN

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: Is there a reverse bits hardware instruction?

2013-07-23 Thread David Crayford
It's easy to do on X86 or ARM architectures which have instructions for 
scanning, counting and reversing bits 
https://github.com/facebook/hiphop-php/blob/master/hphp/util/bitops.h.
Now I'm interested in what is the most efficient way to do this on 
zArch. I thought zArch was king of the CISC. ARM can do this in 4 
instructions and that's a RISC architecture.


On 24/07/2013 5:49 AM, Charles Mills wrote:

Is there a quick way to reverse MSB to LSB the bits of a 64-bit register? If
I have a register that contains 01234567_89ABCDEF to convert it to
F7B3D591_E6A2C480? (I think I got that at least mostly right.) A bit-wise
Load Reversed?

Yes, I am familiar with the Pops. Hard to find an instruction when you don't
know its name or even if it actually exists.

Why? Those of you following another thread I started know I am looking to
implement a 64-bit version of the UNIX library function ffs(), which returns
the bit number of the least significant one bit of a word. z architecture
provides the FLOGR instruction but it works MSB to LSB. I could potentially
live with that but it would introduce some new complications, one of them
being that counting from the LSB is much more compatible with how C promotes
integer types. If I could flip the bits of a word in one or two hardware
instructions I would have a solution. FLOGR provides an additional benefit
for me because I ultimately want to do the other chore that FLOGR does,
resetting the found bit.

I guess you could do it with TR (to reverse the bits) and then LRVG (to
reverse the bytes) but that is overly complex and probably slow (IMHO).

Thanks in advance,
Charles

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN