Re: Is there a reverse bits hardware instruction?
Here's a single DC instruction I've posted May 2003 [search for what's your most difficult assembler concept] BYTE_FLIP DS 0CL256 TDC 256AL1*-T)/001)-((*-T)/002)*2)*X'80' (((*-T)/002)-((*-T)/004)*2)*X'40' (((*-T)/004)-((*-T)/008)*2)*X'20' (((*-T)/008)-((*-T)/016)*2)*X'10' (((*-T)/016)-((*-T)/032)*2)*X'08' (((*-T)/032)-((*-T)/064)*2)*X'04' (((*-T)/064)-((*-T)/128)*2)*X'02' (((*-T)/128)-((*-T)/256)*2)*X'01') HTH, -Victor- --- The macro at the end of my reply will generate a reverse translate table. I tested it enough to see that it looked right but it has nothing to do with my original point. I've found this discussion interesting and it has given me reason to play with something other than the complex process I'm working on now. My original point, as was taken by Charles, was that I prefer automatic processes to generate anything but the simplest translate tables. I prefer to do it in a program and copy it from a dump into a program because the table is static. I don't need to generate it every time I assemble. In regards to TROO, the original problem was stated as translating a 64 bit register. A STG, TR for 8 bytes, followed by a LRVG will more than suffice. Though, I find your argument about the use of a TRTRE to simulate a FROGR interesting. It brings the whole bit reversal into perspective. I'm not a big fan of translate instructions (I use them often enough), particularly those that require facilities and facility enhancements, unless you're translating long strings and are willing to test the facility bits. I'm sure that the translation and parsing facilities exist on most customer's boxes by now but I emphasize most. When I awakened this morning, I wrote an algorithm to do a load reverse bytes and bits using FLOGR to drive the process. I'm going to give the idea of a FROGR simulation more thought and continue this exercise later. MACRO LABEL REVTABLE , * Construct reverse bits translate table LCLA I,J,K,L,M,N,O LCLC X LABEL DS 0DLIKE EM DOUBLE WORD ALIGNED I SETA 0 STARTING VALUE .TABLOOP ANOP LOOP UNTIL TABLE IS DONE K SETA 1 NEED SIXTEEN ENTRIES PER LINE X SETC 'AL1(' AGO.X16LP .X16NXT ANOP X SETC 'X'.'J'.',' .X16LP ANOP 16 ENTRY LOOP J SETA 0 STARTING RESULT L SETA 1 STARTING ADDEND M SETA 1 8 BITS PER BYTE N SETA 128 STARTING COMPARAND X'80' O SETA ICOPY CURRENT BYTE TO REVERSE .BYTELP ANOP AIF (O LT N).BYTEFT LESS THAN CURRENT - 0 O SETA O-N J SETA J+L .BYTEFT ANOP L SETA L*2 NEXT ADDEND N SETA N/2 NEXT COMPARAND M SETA M+1 NEED EIGHT BITS AIF (M LE 8).BYTELP I SETA I+1 K SETA K+1 AIF (K LE 16).X16NXT X SETC 'X'.'J'.')' DC X AIF (I LT 256).TABLOOP MEND -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
The overhead of a macro generated table versus static DC-generated table would only be an issue if the code is reassembled on a very frequent basis. Even if that is the case, that still doesn't rule out using an Assembler macro to generate the table for the first assembly, then using cut paste to put the generated DC's into the code and comment out the macro invocation. That way you can even keep the generation macro in the code as documentation and not have to track or document a separate program. I had to consult the Assembler Reference to be sure I knew what I was doing, but using the Arithmetic AND one can create a simpler version of the generation macro that eliminates the innermost bit loop. I tested the version below with z390: MACRO .* GENERATE TRT TABLE, 16 BYTES PER DC, FOR REVERSING BITS IN BYTE LABEL REVTBL LCLA N,R,SMAX LCLC X,SEP LABEL DS 0D N SETA 0 BYTE OFFSET IN TABLE .TLOOP ANOP SMAX SETA N+16 LIMIT FOR NEXT DC STATEMENT GEN X SETC '' CUMULATIVE AL1 CONSTANTS NXT DC SEP SETC '' NO COMMA NEEDED FOR 1ST AL1 IN DC .SLOOP ANOP .* COMPUTE REVERSED BITS FOR N R SETA ((N AND 128)/128)+((N AND 64)/32)+((N AND 32)/8) R SETA R+((N AND 16)/2)+((N AND 8)*2)+((N AND 4)*8) R SETA R+((N AND 2)*32)+((N AND 1)*128) X SETC 'X.SEP.R' ADD NEXT AL1 CONSTANT SEP SETC ',' NEXT TIME WILL NEED COMMA N SETA N+1 ADVANCE NXT TABLE BYTE AIF (N LT SMAX).SLOOP IF MORE FOR CURRENT DC DC AL1(X.) AIF (N LT 256).TLOOPIF MORE DC'S LEFT MEND The first and last 16 generated table bytes from the above macro are: DC AL1(0,128,64,192,32,160,96,224,16,144,80,208,48,176,112,240) ... DC AL1(15,143,79,207,47,175,111,239,31,159,95,223,63,191,127,255) which are consistent with the original macro. Joel C. Ewing On 07/24/2013 12:54 PM, Kenneth Wilkerson wrote: The macro at the end of my reply will generate a reverse translate table. I tested it enough to see that it looked right but it has nothing to do with my original point. I've found this discussion interesting and it has given me reason to play with something other than the complex process I'm working on now. My original point, as was taken by Charles, was that I prefer automatic processes to generate anything but the simplest translate tables. I prefer to do it in a program and copy it from a dump into a program because the table is static. I don't need to generate it every time I assemble. In regards to TROO, the original problem was stated as translating a 64 bit register. A STG, TR for 8 bytes, followed by a LRVG will more than suffice. Though, I find your argument about the use of a TRTRE to simulate a FROGR interesting. It brings the whole bit reversal into perspective. I'm not a big fan of translate instructions (I use them often enough), particularly those that require facilities and facility enhancements, unless you're translating long strings and are willing to test the facility bits. I'm sure that the translation and parsing facilities exist on most customer's boxes by now but I emphasize most. When I awakened this morning, I wrote an algorithm to do a load reverse bytes and bits using FLOGR to drive the process. I'm going to give the idea of a FROGR simulation more thought and continue this exercise later. MACRO LABEL REVTABLE , * Construct reverse bits translate table LCLA I,J,K,L,M,N,O LCLC X LABEL DS 0DLIKE EM DOUBLE WORD ALIGNED I SETA 0 STARTING VALUE .TABLOOP ANOP LOOP UNTIL TABLE IS DONE K SETA 1 NEED SIXTEEN ENTRIES PER LINE X SETC 'AL1(' AGO.X16LP .X16NXT ANOP X SETC 'X'.'J'.',' .X16LP ANOP 16 ENTRY LOOP J SETA 0 STARTING RESULT L SETA 1 STARTING ADDEND M SETA 1 8 BITS PER BYTE N SETA 128 STARTING COMPARAND X'80' O SETA ICOPY CURRENT BYTE TO REVERSE .BYTELP ANOP AIF (O LT N).BYTEFT LESS THAN CURRENT - 0 O SETA O-N J SETA J+L .BYTEFT ANOP L SETA L*2 NEXT ADDEND N SETA N/2 NEXT COMPARAND M SETA M+1 NEED EIGHT BITS AIF (M LE 8).BYTELP I SETA I+1 K SETA K+1 AIF (K LE 16).X16NXT X SETC 'X'.'J'.')' DC X AIF (I LT 256).TABLOOP MEND -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John Gilmore Sent: Wednesday, July 24, 2013 10:29 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? The construction of arbitrary translation tables can be error-prone, and when it is it is better done procedurally. I use the HLASM macro language, which is entirely adequate to such tasks; mais à chacun son goût. Here, however, we have for a TROO only the 256 permutations taken two at a time
Re: Is there a reverse bits hardware instruction?
In CAE1XxDGgBNuZr4gv30WtG27jj4jqVLjBM0-qmCXOf=rz8x6...@mail.gmail.com, on 07/23/2013 at 07:11 PM, John Gilmore jwgli...@gmail.com said: Look at the Rotate instructions in the PrOp. RLL will do come close to doing what you want to do. FSVO close. Rotate has noting to do with reversing the bit order; it's just a circular shift. -- Shmuel (Seymour J.) Metz, SysProg and JOAT Atid/2http://patriot.net/~shmuel We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
In 000b01ce87ee$711f3580$535da080$@mcn.org, on 07/23/2013 at 05:49 PM, Charles Mills charl...@mcn.org said: I guess you could do it with TR (to reverse the bits) and then LRVG (to reverse the bytes) but that is overly complex and probably slow (IMHO). It's faster than move inverse, translate and load (-; -- Shmuel (Seymour J.) Metz, SysProg and JOAT Atid/2http://patriot.net/~shmuel We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
In 1482096876342980.wa.paulgboulderaim@listserv.ua.edu, on 07/24/2013 at 09:25 AM, Paul Gilmartin paulgboul...@aim.com said: One TR to reverse the bytes, Why, Isn't LRVG a lor faster and more concise? -- Shmuel (Seymour J.) Metz, SysProg and JOAT Atid/2http://patriot.net/~shmuel We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
In 51ef18aa.2060...@gmail.com, on 07/24/2013 at 07:58 AM, David Crayford dcrayf...@gmail.com said: There's a RLLG instruction to rotate the bits in a 64-bit integer. It doesn't *reverse* the bits, which is what the OP needs. -- Shmuel (Seymour J.) Metz, SysProg and JOAT Atid/2http://patriot.net/~shmuel We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
In 023901ce8876$0099c900$01cd5b00$@mcn.org, on 07/24/2013 at 09:59 AM, Charles Mills charl...@mcn.org said: Agreed in principle. I would probably use Rexx to create it. I would create the TR table in a macro, or the inline equivalent. -- Shmuel (Seymour J.) Metz, SysProg and JOAT Atid/2http://patriot.net/~shmuel We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
In CAE1XxDHWvp88UB4cB69Z0JbDaFE4mtv4eB+ViH=s3xxcttf...@mail.gmail.com, on 07/24/2013 at 11:29 AM, John Gilmore jwgli...@gmail.com said: Here, however, we have for a TROO Why TROO rather than TR? -- Shmuel (Seymour J.) Metz, SysProg and JOAT Atid/2http://patriot.net/~shmuel We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
In cae1xxdhosowjbjwsbdfaqjtt2namcqjsbg2y78y7heomz3a...@mail.gmail.com, on 07/23/2013 at 08:07 PM, John Gilmore jwgli...@gmail.com said: I am happy to have David join in my recommendation, and I can imagine a circumstance in which RLLG would be more useful than RLL, but RLL itself does 64-bit rotations. For ROTATE LEFT SINGLE LOGICAL (RLL), bits 0-31 of general registers R1 and R3 remain unchanged. -- Shmuel (Seymour J.) Metz, SysProg and JOAT Atid/2http://patriot.net/~shmuel We don't care. We don't have to care, we're Congress. (S877: The Shut up and Eat Your spam act of 2003) -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
So, no need for any macro at all to generate the byte bit reversal TRT table! I must have skipped over or forgotten the 2003 posting. This is just totally awesome, especially the redundant unary +'s to force decent alignment for readability. Had to actually run it with PRINT DATA to convince myself it really worked, then study it more closely. Once you understand why it is syntactically correct and what it is doing it seems obvious, but at first sight it is certainly intimidating. Obviously usage in a production application should have some explanatory comments so future maintainers don't panic at their first exposure! Joel C Ewing On 07/25/2013 09:13 AM, Victor Gil wrote: Here's a single DC instruction I've posted May 2003 [search for what's your most difficult assembler concept] BYTE_FLIP DS 0CL256 TDC 256AL1*-T)/001)-((*-T)/002)*2)*X'80' (((*-T)/002)-((*-T)/004)*2)*X'40' (((*-T)/004)-((*-T)/008)*2)*X'20' (((*-T)/008)-((*-T)/016)*2)*X'10' (((*-T)/016)-((*-T)/032)*2)*X'08' (((*-T)/032)-((*-T)/064)*2)*X'04' (((*-T)/064)-((*-T)/128)*2)*X'02' (((*-T)/128)-((*-T)/256)*2)*X'01') HTH, -Victor- --- The macro at the end of my reply will generate a reverse translate table. I tested it enough to see that it looked right but it has nothing to do with my original point. I've found this discussion interesting and it has given me reason to play with something other than the complex process I'm working on now. My original point, as was taken by Charles, was that I prefer automatic processes to generate anything but the simplest translate tables. I prefer to do it in a program and copy it from a dump into a program because the table is static. I don't need to generate it every time I assemble. In regards to TROO, the original problem was stated as translating a 64 bit register. A STG, TR for 8 bytes, followed by a LRVG will more than suffice. Though, I find your argument about the use of a TRTRE to simulate a FROGR interesting. It brings the whole bit reversal into perspective. I'm not a big fan of translate instructions (I use them often enough), particularly those that require facilities and facility enhancements, unless you're translating long strings and are willing to test the facility bits. I'm sure that the translation and parsing facilities exist on most customer's boxes by now but I emphasize most. When I awakened this morning, I wrote an algorithm to do a load reverse bytes and bits using FLOGR to drive the process. I'm going to give the idea of a FROGR simulation more thought and continue this exercise later. MACRO LABEL REVTABLE , * Construct reverse bits translate table LCLA I,J,K,L,M,N,O LCLC X LABEL DS 0DLIKE EM DOUBLE WORD ALIGNED I SETA 0 STARTING VALUE .TABLOOP ANOP LOOP UNTIL TABLE IS DONE K SETA 1 NEED SIXTEEN ENTRIES PER LINE X SETC 'AL1(' AGO.X16LP .X16NXT ANOP X SETC 'X'.'J'.',' .X16LP ANOP 16 ENTRY LOOP J SETA 0 STARTING RESULT L SETA 1 STARTING ADDEND M SETA 1 8 BITS PER BYTE N SETA 128 STARTING COMPARAND X'80' O SETA ICOPY CURRENT BYTE TO REVERSE .BYTELP ANOP AIF (O LT N).BYTEFT LESS THAN CURRENT - 0 O SETA O-N J SETA J+L .BYTEFT ANOP L SETA L*2 NEXT ADDEND N SETA N/2 NEXT COMPARAND M SETA M+1 NEED EIGHT BITS AIF (M LE 8).BYTELP I SETA I+1 K SETA K+1 AIF (K LE 16).X16NXT X SETC 'X'.'J'.')' DC X AIF (I LT 256).TABLOOP MEND -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- Joel C. Ewing,Bentonville, AR jcew...@acm.org -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On 13Jul24:1022+0800, David Crayford wrote: It's easy to do on X86 or ARM architectures which have instructions for scanning, counting and reversing bits https://github.com/facebook/hiphop-php/blob/master/hphp/util/bitops.h. Now I'm interested in what is the most efficient way to do this on zArch. I thought zArch was king of the CISC. ARM can do this in 4 instructions and that's a RISC architecture. A TR instruction that uses a 256-byte table of the reversed byte values will handle up to 256 bytes, IIRC. -- not cent from sell May the LORD God bless you exceedingly abundantly! Dave_Craig__ So the universe is not quite as you thought it was. You'd better rearrange your beliefs, then. Because you certainly can't rearrange the universe. __--from_Nightfall_by_Asimov/Silverberg_ -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
How fast does this code need to be? David's ffs64 looked pretty good to my inexpert eye, I think you would have to be running it very frequently for something to be measurably faster. There are some similar discussions here, including some branchless techniques that probably would be faster (not necessarily detectably): http://stackoverflow.com/questions/757059/position-of-least-significant-bit-that-is-set One answer also talks about clearing the lowest set bit. Andrew Rowley On 24/07/2013 07:49, Charles Mills wrote: Is there a quick way to reverse MSB to LSB the bits of a 64-bit register? If I have a register that contains 01234567_89ABCDEF to convert it to F7B3D591_E6A2C480? (I think I got that at least mostly right.) A bit-wise Load Reversed? Yes, I am familiar with the Pops. Hard to find an instruction when you don't know its name or even if it actually exists. Why? Those of you following another thread I started know I am looking to implement a 64-bit version of the UNIX library function ffs(), which returns the bit number of the least significant one bit of a word. z architecture provides the FLOGR instruction but it works MSB to LSB. I could potentially live with that but it would introduce some new complications, one of them being that counting from the LSB is much more compatible with how C promotes integer types. If I could flip the bits of a word in one or two hardware instructions I would have a solution. FLOGR provides an additional benefit for me because I ultimately want to do the other chore that FLOGR does, resetting the found bit. I guess you could do it with TR (to reverse the bits) and then LRVG (to reverse the bytes) but that is overly complex and probably slow (IMHO). Thanks in advance, Charles -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Thanks all. You're right, just how fast DOES this code need to be? And the answer is I should know, but I don't. I don't want to waste the customer's cycles. I am smart enough to know that I am too dumb to know how fast it needs to be. The right answer lies in profiling, and some other task has always been just a little higher priority than profiling. Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I hated it. I am very weak on the higher math relevant to programming. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley Sent: Wednesday, July 24, 2013 8:17 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? How fast does this code need to be? David's ffs64 looked pretty good to my inexpert eye, I think you would have to be running it very frequently for something to be measurably faster. There are some similar discussions here, including some branchless techniques that probably would be faster (not necessarily detectably): http://stackoverflow.com/questions/757059/position-of-least-significant-bit- that-is-set One answer also talks about clearing the lowest set bit. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Just for fun there's a good 64-bit reverse hack in Hank Warrens excellent Hackers Delight book based on Knuth's algorithm. Of course, native instructions should be faster. Charles, I've sent you a couple of off list mails. Looks like I'm not getting past your spam filter. /* How would you extend Knuth's algorithm to rotating a 64-bit quantity on a 64-bit machine? A simple method is to first swap the two halves of the 64-bit word, which takes one instruction if you have the rotate shift. Then, rotate left 15 positions the two halves of the 64-bit word. This takes five operations: x = (x 0x00010001) 15 | (x 0xFFFEFFFE) 17; Then do the 10-swap, 4-swap, and 2-swap as in Knuth's code but with each mask m doubled to the form mm. Thus this method adds five operations to Knuth's code, making it 24 operations for the exclusive or version (rev14a), if you have the rotate shift, or swap register halves. Another way is to develop code similar to Knuth's that first rotates left 31 positions, and then does a 20-swap, an 8-swap, a 4-swap, and a 2-swap. This adds one stage of swapping, which takes seven operations as in rev14, or 6 operations as in rev14a. Thus the simple method seems to be best, in terms of operation count (but not if parallelism in rev14 can be used to advantage). First, the simple method. 24 operations if the swap (rotate 32) counts as 1. */ unsigned long long rev15(unsigned long long x) { unsigned long long t; x = (x 32) | (x 32); // Swap register halves. x = (x 0x00010001LL) 15 | // Rotate left (x 0xFFFEFFFELL) 17; // 15. t = (x ^ (x 10)) 0x003F801F003F801FLL; x = (t | (t 10)) ^ x; t = (x ^ (x 4)) 0x0E0384210E038421LL; x = (t | (t 4)) ^ x; t = (x ^ (x 2)) 0x2248884222488842LL; x = (t | (t 2)) ^ x; return x; } On 24/07/2013 8:30 PM, Charles Mills wrote: Thanks all. You're right, just how fast DOES this code need to be? And the answer is I should know, but I don't. I don't want to waste the customer's cycles. I am smart enough to know that I am too dumb to know how fast it needs to be. The right answer lies in profiling, and some other task has always been just a little higher priority than profiling. Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I hated it. I am very weak on the higher math relevant to programming. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley Sent: Wednesday, July 24, 2013 8:17 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? How fast does this code need to be? David's ffs64 looked pretty good to my inexpert eye, I think you would have to be running it very frequently for something to be measurably faster. There are some similar discussions here, including some branchless techniques that probably would be faster (not necessarily detectably): http://stackoverflow.com/questions/757059/position-of-least-significant-bit- that-is-set One answer also talks about clearing the lowest set bit. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
I can't imagine any instruction sequence in any language performing a Load Reversed with Mirrored Bytes more efficiently in the Z/Architecture than a STG, TR for eight bytes and LRVG. Even though, the TR is probably micro-coded (I don't know about the LRVG), I can't see any loop that shifts and manipulates the data and repeats up to 63 times (assuming a very dense register) could outperform this. I wrote an algorithm using a FLOGR but except in the best cases (all 0s or many leading 0s), I can't imagine this running faster. And with negative numbers (-1 being the worst case), you would probably want to exclusive or with foxes before and after the operation to make the value more sparse. However, in your initial post you talked about the above sequence involving the TR being complex. I assume you're talking about the translate table itself. When I need translate tables that are not simple and particularly error prone, I write a program to create it. I would quadword align the origin and result tables, do the tests and sets (in this case X'80' to 'X01', ... X'01' to X'80'), load the address of the result table in a register, DC H'0' to get an 0c1. I would set a slip and run the job. I could then format the dump and cut and paste (with a little manipulation) the table into an assembler source. In this case, if the first and last 16 bytes of the table are correct, the its probably 100% correct. I find the half hour I use doing this for error prone translate tables can save me hours debugging later. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Charles Mills Sent: Wednesday, July 24, 2013 7:31 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? Thanks all. You're right, just how fast DOES this code need to be? And the answer is I should know, but I don't. I don't want to waste the customer's cycles. I am smart enough to know that I am too dumb to know how fast it needs to be. The right answer lies in profiling, and some other task has always been just a little higher priority than profiling. Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I hated it. I am very weak on the higher math relevant to programming. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley Sent: Wednesday, July 24, 2013 8:17 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? How fast does this code need to be? David's ffs64 looked pretty good to my inexpert eye, I think you would have to be running it very frequently for something to be measurably faster. There are some similar discussions here, including some branchless techniques that probably would be faster (not necessarily detectably): http://stackoverflow.com/questions/757059/position-of-least-significant-bit- that-is-set One answer also talks about clearing the lowest set bit. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Agreed in principle. I would probably use Rexx to create it. The complexity I referred to was the sum of a whole bunch of things that would be necessary: - linking out to assembler - a 256-byte xlate table and TR - LRVG - FLOGR - dealing somewhere with the zero case Not the most complex thing ever, but a bunch of stuff. As I write mainframe product code these days I always think one of these days this code may have to be maintained by someone who is a lot more comfortable in C than with assembler. hours debugging later Or years of wrong results that might have a severe negative business impact that was never detected, or was detected only after years. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Kenneth Wilkerson Sent: Wednesday, July 24, 2013 9:43 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? I can't imagine any instruction sequence in any language performing a Load Reversed with Mirrored Bytes more efficiently in the Z/Architecture than a STG, TR for eight bytes and LRVG. Even though, the TR is probably micro-coded (I don't know about the LRVG), I can't see any loop that shifts and manipulates the data and repeats up to 63 times (assuming a very dense register) could outperform this. I wrote an algorithm using a FLOGR but except in the best cases (all 0s or many leading 0s), I can't imagine this running faster. And with negative numbers (-1 being the worst case), you would probably want to exclusive or with foxes before and after the operation to make the value more sparse. However, in your initial post you talked about the above sequence involving the TR being complex. I assume you're talking about the translate table itself. When I need translate tables that are not simple and particularly error prone, I write a program to create it. I would quadword align the origin and result tables, do the tests and sets (in this case X'80' to 'X01', ... X'01' to X'80'), load the address of the result table in a register, DC H'0' to get an 0c1. I would set a slip and run the job. I could then format the dump and cut and paste (with a little manipulation) the table into an assembler source. In this case, if the first and last 16 bytes of the table are correct, the its probably 100% correct. I find the half hour I use doing this for error prone translate tables can save me hours debugging later. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Could you please expand a little on how the position of the least significant one bit affects integer promotion. If the low order five bits of a short are zero, won't the low order five bits of the corresponding int be zero? For that matter, even the position of the most significant one bit shouldn't affect promotion since the promoted type is guaranteed to be at least as large as the original type. :: -Original Message- :: From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On :: Behalf Of Charles Mills :: Sent: Tuesday, July 23, 2013 2:49 PM :: To: IBM-MAIN@LISTSERV.UA.EDU :: Subject: Is there a reverse bits hardware instruction? snip :: Why? Those of you following another thread I started know I am looking :: to :: implement a 64-bit version of the UNIX library function ffs(), which :: returns :: the bit number of the least significant one bit of a word. z :: architecture :: provides the FLOGR instruction but it works MSB to LSB. I could :: potentially :: live with that but it would introduce some new complications, one of :: them :: being that counting from the LSB is much more compatible with how C :: promotes :: integer types. If I could flip the bits of a word in one or two hardware snip -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
That sounds good. I wouldn't mind seeing an implementation in assembler with hard coded translate tables just for my education. Not interested in signed integers as I can't think of a use case for that. On 24/07/2013, at 9:42 PM, Kenneth Wilkerson redb...@austin.rr.com wrote: I can't imagine any instruction sequence in any language performing a Load Reversed with Mirrored Bytes more efficiently in the Z/Architecture than a STG, TR for eight bytes and LRVG. Even though, the TR is probably micro-coded (I don't know about the LRVG), I can't see any loop that shifts and manipulates the data and repeats up to 63 times (assuming a very dense register) could outperform this. I wrote an algorithm using a FLOGR but except in the best cases (all 0s or many leading 0s), I can't imagine this running faster. And with negative numbers (-1 being the worst case), you would probably want to exclusive or with foxes before and after the operation to make the value more sparse. However, in your initial post you talked about the above sequence involving the TR being complex. I assume you're talking about the translate table itself. When I need translate tables that are not simple and particularly error prone, I write a program to create it. I would quadword align the origin and result tables, do the tests and sets (in this case X'80' to 'X01', ... X'01' to X'80'), load the address of the result table in a register, DC H'0' to get an 0c1. I would set a slip and run the job. I could then format the dump and cut and paste (with a little manipulation) the table into an assembler source. In this case, if the first and last 16 bytes of the table are correct, the its probably 100% correct. I find the half hour I use doing this for error prone translate tables can save me hours debugging later. Kenneth -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Charles Mills Sent: Wednesday, July 24, 2013 7:31 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? Thanks all. You're right, just how fast DOES this code need to be? And the answer is I should know, but I don't. I don't want to waste the customer's cycles. I am smart enough to know that I am too dumb to know how fast it needs to be. The right answer lies in profiling, and some other task has always been just a little higher priority than profiling. Thanks! Great link! The De Bruijn thing is amazing. I was a math minor but I hated it. I am very weak on the higher math relevant to programming. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Andrew Rowley Sent: Wednesday, July 24, 2013 8:17 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? How fast does this code need to be? David's ffs64 looked pretty good to my inexpert eye, I think you would have to be running it very frequently for something to be measurably faster. There are some similar discussions here, including some branchless techniques that probably would be faster (not necessarily detectably): http://stackoverflow.com/questions/757059/position-of-least-significant-bit- that-is-set One answer also talks about clearing the lowest set bit. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On Wed, 24 Jul 2013 04:27:16 -0400, David L. Craig wrote: A TR instruction that uses a 256-byte table of the reversed byte values will handle up to 256 bytes, IIRC. One TR to reverse the bytes, then a second TR to reverse the bits, plus various MVC for setup. -- gil -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On Wed, 24 Jul 2013 08:42:55 -0500, Kenneth Wilkerson wrote: However, in your initial post you talked about the above sequence involving the TR being complex. I assume you're talking about the translate table itself. When I need translate tables that are not simple and particularly error prone, I write a program to create it. I would quadword align the origin and result tables, do the tests and sets (in this case X'80' to 'X01', ... X'01' to X'80'), load the address of the result table in a register, DC H'0' to get an 0c1. I would set a slip and run the job. I could then format the dump and cut and paste (with a little manipulation) the table into an assembler source. In this case, if the first and last 16 bytes of the table are correct, the its probably 100% correct. I find the half hour I use doing this for error prone translate tables can save me hours debugging later. Another contributor to this list would probably achieve the result with a HLASM macro. Not every programmer is authorized to set SLIPs, I believe. -- gil -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
MVCIN will reverse the bytes. On Wed, Jul 24, 2013 at 9:25 AM, Paul Gilmartin paulgboul...@aim.comwrote: On Wed, 24 Jul 2013 04:27:16 -0400, David L. Craig wrote: A TR instruction that uses a 256-byte table of the reversed byte values will handle up to 256 bytes, IIRC. One TR to reverse the bytes, then a second TR to reverse the bits, plus various MVC for setup. -- gil-- This is a test of the Emergency Broadcast System. If this had been an actual emergency, do you really think we'd stick around to tell you? Maranatha! John McKown -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
The construction of arbitrary translation tables can be error-prone, and when it is it is better done procedurally. I use the HLASM macro language, which is entirely adequate to such tasks; mais à chacun son goût. Here, however, we have for a TROO only the 256 permutations taken two at a time of the sixteen hexadecimal digits, viz., 0==0, 1==8, 2==4, 3==c, 4==2, 5==a, 6==6, 7==e, 8==1, 9==9, a==5, b==d, c==3, d==b, e==7, f==f and they can be enumerated readily, by program or manually (and certainly without resort to cut-and-paste from a dump). Symmetries can also be exploited, and the whole thing can be arithmetized, but to do either would put mathematics dropouts at a disadvantage. The problem CAN be addressed with left circular shifts/left rotations, but they must be nested (and iterated for long bit strings). The TROO turns out to be faster, particularly for those long bit strings. The problem of bit-string reversal has its own interest, but if its purpose is in effect to simulate a FROGR using a FLOGR, then other approaches are possible. Specifically, a TRTRE, Translate and Test Reverse Extended, can be used. It proceeds from right to left, high to low storage address, in a byte string only until it finds a non-zero value in its table that corresponds to the current byte's rank. Permutations taken two at a time of the hexadecimal digit-codes ==, 0 0001==0001, 1 0010==0010, 2 0011==0001, 1 0100==0011, 3 0101==0001, 1 0110==0010, 2 0111==0001, 1 1000==0100, 4 1001==0001, 1 1010==0010, 2 1011==0001, 1 1100==0011, 3 1101==0001, 1 1110==0010, 2 ==0001, 1 in which zero indicates the absence of a one bit and a non-zero value indicates both the presence of a one bit and its one-origin offset from the rightmost bit position. The permutations/code points x'N0', N = 1, 2, . . . , f need 'special' treatment, 8 must be added to the values shown above for them. Unsurprisingly, this turns out to be faster than reversal followed by FLOG. John Gilmore, Ashland, MA 01721 - USA -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Sure. Basically a C function prototyped as ffs64(uint64_t val) cannot tell whether val started out as a char or a doubleword. (It could with overloads but let's not go there.) The fact that an MSB-based ffs64() would return 57 for unsigned char foo = 0x40; int bar = ffs64(foo) is a little counterintuitive. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of retired mainframer Sent: Wednesday, July 24, 2013 10:09 AM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? Could you please expand a little on how the position of the least significant one bit affects integer promotion. If the low order five bits of a short are zero, won't the low order five bits of the corresponding int be zero? For that matter, even the position of the most significant one bit shouldn't affect promotion since the promoted type is guaranteed to be at least as large as the original type. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Is there a reverse bits hardware instruction?
Of course there is. XI (EXCLUSIVE OR IMMEDIATE) and XC (EXCLUSIVE OR CHARACTER). Where's Steve Comstock when you need him? Gene Lynd -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
He's reading the op? Charles Composed on a mobile: please excuse my brevity Lynd, Eugene C. eugene.c.l...@saic.com wrote: Of course there is. XI (EXCLUSIVE OR IMMEDIATE) and XC (EXCLUSIVE OR CHARACTER). Where's Steve Comstock when you need him? Gene Lynd -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On 7/24/2013 11:59 AM, Charles Mills wrote: He's reading the op? Charles Composed on a mobile: please excuse my brevity Lynd, Eugene C. eugene.c.l...@saic.com wrote: Of course there is. XI (EXCLUSIVE OR IMMEDIATE) and XC (EXCLUSIVE OR CHARACTER). Where's Steve Comstock when you need him? Gene Lynd He's still here. :-) I came late to the thread and it was not really clear to me what the OP wanted, so I decided to remain silent. And, I've been extremely busy. Not training, unfortunately, but I am now the chair of the Denver Takayama Sister City Committee, a volunteer task that takes an incredible amount of time. -- Kind regards, -Steve Comstock The Trainer's Friend, Inc. 303-355-2752 http://www.trainersfriend.com * To get a good Return on your Investment, first make an investment! + Training your people is an excellent investment * Try our tool for calculating your Return On Investment for training dollars at http://www.trainersfriend.com/ROI/roi.html -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On Wed, 24 Jul 2013 12:54:06 -0500, Kenneth Wilkerson wrote: The macro at the end of my reply will generate a reverse translate table. (snip) MACRO LABEL REVTABLE , * Construct reverse bits translate table (snip) That's a beauty. Bill -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Is there a reverse bits hardware instruction?
Is there a quick way to reverse MSB to LSB the bits of a 64-bit register? If I have a register that contains 01234567_89ABCDEF to convert it to F7B3D591_E6A2C480? (I think I got that at least mostly right.) A bit-wise Load Reversed? Yes, I am familiar with the Pops. Hard to find an instruction when you don't know its name or even if it actually exists. Why? Those of you following another thread I started know I am looking to implement a 64-bit version of the UNIX library function ffs(), which returns the bit number of the least significant one bit of a word. z architecture provides the FLOGR instruction but it works MSB to LSB. I could potentially live with that but it would introduce some new complications, one of them being that counting from the LSB is much more compatible with how C promotes integer types. If I could flip the bits of a word in one or two hardware instructions I would have a solution. FLOGR provides an additional benefit for me because I ultimately want to do the other chore that FLOGR does, resetting the found bit. I guess you could do it with TR (to reverse the bits) and then LRVG (to reverse the bytes) but that is overly complex and probably slow (IMHO). Thanks in advance, Charles -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Look at the Rotate instructions in the PrOp. RLL will do come close to doing what you want to do. John Gilmore, Ashland, MA 01721 - USA -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On Tue, 23 Jul 2013 17:49:00 -0400, Charles Mills wrote: Why? Those of you following another thread I started know I am looking to implement a 64-bit version of the UNIX library function ffs(), which returns the bit number of the least significant one bit of a word. z architecture provides the FLOGR instruction but it works MSB to LSB. Subtract 1. XOR with the original value. The leftmost 1 bit of the result now occupies the position of the rightmost 1 bit of the original value AND with the original value yields the mask to reset that bit. Zero must be treated as a special case. --gil -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Gil, The maximum negative value must also be treated as a special case. John P. Baker -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Paul Gilmartin Sent: Tuesday, July 23, 2013 7:13 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? On Tue, 23 Jul 2013 17:49:00 -0400, Charles Mills wrote: Why? Those of you following another thread I started know I am looking to implement a 64-bit version of the UNIX library function ffs(), which returns the bit number of the least significant one bit of a word. z architecture provides the FLOGR instruction but it works MSB to LSB. Subtract 1. XOR with the original value. The leftmost 1 bit of the result now occupies the position of the rightmost 1 bit of the original value AND with the original value yields the mask to reset that bit. Zero must be treated as a special case. --gil -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On Tue, 23 Jul 2013 19:22:49 -0400, John P. Baker wrote: The maximum negative value must also be treated as a special case. I concluded that in a mental simulation after I pressed SEND. This is why boundary conditions should be tested. I also should have said Subtract Logical rather than Subtract, if overflow matters. This is why environmental variations should be tested. -- gil -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
There's a RLLG instruction to rotate the bits in a 64-bit integer. I know that you want to call such a routine from C/C++. If you need to write ffs64() in assembler you may find the linkage overhead of calling the routine is far greater than a slower implementation that has been inlined. It's about time IBM enhanced C/C++ to support __asm for non-Metal/C compiles. On 24/07/2013 5:49 AM, Charles Mills wrote: Is there a quick way to reverse MSB to LSB the bits of a 64-bit register? If I have a register that contains 01234567_89ABCDEF to convert it to F7B3D591_E6A2C480? (I think I got that at least mostly right.) A bit-wise Load Reversed? Yes, I am familiar with the Pops. Hard to find an instruction when you don't know its name or even if it actually exists. Why? Those of you following another thread I started know I am looking to implement a 64-bit version of the UNIX library function ffs(), which returns the bit number of the least significant one bit of a word. z architecture provides the FLOGR instruction but it works MSB to LSB. I could potentially live with that but it would introduce some new complications, one of them being that counting from the LSB is much more compatible with how C promotes integer types. If I could flip the bits of a word in one or two hardware instructions I would have a solution. FLOGR provides an additional benefit for me because I ultimately want to do the other chore that FLOGR does, resetting the found bit. I guess you could do it with TR (to reverse the bits) and then LRVG (to reverse the bytes) but that is overly complex and probably slow (IMHO). Thanks in advance, Charles -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
I am happy to have David join in my recommendation, and I can imagine a circumstance in which RLLG would be more useful than RLL, but RLL itself does 64-bit rotations. John Gilmore, Ashland, MA 01721 - USA -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
_asm and or one of those pseudo-assembler/library routines like __tr() for more machine instructions. Yes, overhead is definitely a negative for calls, especially if (like some of us) you have not exactly gotten around to embracing XPLINK. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Tuesday, July 23, 2013 7:59 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? There's a RLLG instruction to rotate the bits in a 64-bit integer. I know that you want to call such a routine from C/C++. If you need to write ffs64() in assembler you may find the linkage overhead of calling the routine is far greater than a slower implementation that has been inlined. It's about time IBM enhanced C/C++ to support __asm for non-Metal/C compiles. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Hmmm. Have to think that one through. Thanks. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Paul Gilmartin Sent: Tuesday, July 23, 2013 7:13 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? On Tue, 23 Jul 2013 17:49:00 -0400, Charles Mills wrote: Why? Those of you following another thread I started know I am looking to implement a 64-bit version of the UNIX library function ffs(), which returns the bit number of the least significant one bit of a word. z architecture provides the FLOGR instruction but it works MSB to LSB. Subtract 1. XOR with the original value. The leftmost 1 bit of the result now occupies the position of the rightmost 1 bit of the original value AND with the original value yields the mask to reset that bit. Zero must be treated as a special case. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On 24/07/2013 8:07 AM, John Gilmore wrote: I am happy to have David join in my recommendation, and I can imagine a circumstance in which RLLG would be more useful than RLL, but RLL itself does 64-bit rotations. Sorry John, I didn't see your recommendation. Of course you are right RLL can rotate up to 64 bits. In C/C++ I compile in both LP32 and LP64 so one would need to implement both variants with #ifdef macros. John Gilmore, Ashland, MA 01721 - USA -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
On 24/07/2013 8:19 AM, Charles Mills wrote: _asm and or one of those pseudo-assembler/library routines like __tr() for more machine instructions. There's is no built-in for RLL. I've got a requirement with IBM for __asm() support. I'm hoping it's going to make it in z/OS 2.1. Yes, overhead is definitely a negative for calls, especially if (like some of us) you have not exactly gotten around to embracing XPLINK. Even XPLINK is significantly slower than inline. While it saves time by not moving as much data when saving registers branching to the function usually results in a cache miss. It comes down to trade-offs and if you are micro optimizing code linkage overhead is significant. Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of David Crayford Sent: Tuesday, July 23, 2013 7:59 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? There's a RLLG instruction to rotate the bits in a 64-bit integer. I know that you want to call such a routine from C/C++. If you need to write ffs64() in assembler you may find the linkage overhead of calling the routine is far greater than a slower implementation that has been inlined. It's about time IBM enhanced C/C++ to support __asm for non-Metal/C compiles. -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
Not sure how rotation would accomplish reversal. (I had looked at the rotate and shift instructions. Thought perhaps there was a rotate until you shift out a bit and tell me how far you shifted instruction.) Charles -Original Message- From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of John Gilmore Sent: Tuesday, July 23, 2013 7:11 PM To: IBM-MAIN@LISTSERV.UA.EDU Subject: Re: Is there a reverse bits hardware instruction? Look at the Rotate instructions in the PrOp. RLL will do come close to doing what you want to do. John Gilmore, Ashland, MA 01721 - USA -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
Re: Is there a reverse bits hardware instruction?
It's easy to do on X86 or ARM architectures which have instructions for scanning, counting and reversing bits https://github.com/facebook/hiphop-php/blob/master/hphp/util/bitops.h. Now I'm interested in what is the most efficient way to do this on zArch. I thought zArch was king of the CISC. ARM can do this in 4 instructions and that's a RISC architecture. On 24/07/2013 5:49 AM, Charles Mills wrote: Is there a quick way to reverse MSB to LSB the bits of a 64-bit register? If I have a register that contains 01234567_89ABCDEF to convert it to F7B3D591_E6A2C480? (I think I got that at least mostly right.) A bit-wise Load Reversed? Yes, I am familiar with the Pops. Hard to find an instruction when you don't know its name or even if it actually exists. Why? Those of you following another thread I started know I am looking to implement a 64-bit version of the UNIX library function ffs(), which returns the bit number of the least significant one bit of a word. z architecture provides the FLOGR instruction but it works MSB to LSB. I could potentially live with that but it would introduce some new complications, one of them being that counting from the LSB is much more compatible with how C promotes integer types. If I could flip the bits of a word in one or two hardware instructions I would have a solution. FLOGR provides an additional benefit for me because I ultimately want to do the other chore that FLOGR does, resetting the found bit. I guess you could do it with TR (to reverse the bits) and then LRVG (to reverse the bytes) but that is overly complex and probably slow (IMHO). Thanks in advance, Charles -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN -- For IBM-MAIN subscribe / signoff / archive access instructions, send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN