[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized

via GitHub Mon, 03 Jul 2023 05:58:09 -0700


tang-hi commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618213152


   I believe that the Lucene 5.0 format may not be appropriate because it rounds 
bitsPerValue up. For example, it uses 8 bits to store a 3-bit value, which 
results in larger index files. However, I have already implemented a vectorized 
version of the various compression formats that keeps the encoded size 
unchanged. The results look promising:
   
   | Benchmark | Type | Iterations | Mean ops/us | Error |
   | --- | --- | --- | --- | --- |
   | Encode1 | thrpt | 15 | 37.023 | 5.060 |
   | VectorizedEncode1 | thrpt | 15 | 53.261 | 0.850 |
   | Encode2 | thrpt | 15 | 45.017 | 0.887 |
   | VectorizedEncode2 | thrpt | 15 | 56.111 | 2.234 |
   | Encode3 | thrpt | 15 | 38.750 | 0.521 |
   | VectorizedEncode3 | thrpt | 15 | 49.589 | 2.518 |
   | Encode4 | thrpt | 15 | 48.074 | 1.390 |
   | VectorizedEncode4 | thrpt | 15 | 55.654 | 2.580 |
   | Encode5 | thrpt | 15 | 35.517 | 0.815 |
   | VectorizedEncode5 | thrpt | 15 | 50.508 | 0.887 |
   | Encode6 | thrpt | 15 | 38.698 | 0.301 |
   | VectorizedEncode6 | thrpt | 15 | 48.876 | 0.926 |
   | Encode7 | thrpt | 15 | 36.697 | 0.942 |
   | VectorizedEncode7 | thrpt | 15 | 50.016 | 2.425 |
   | Encode8 | thrpt | 15 | 59.180 | 2.208 |
   | VectorizedEncode8 | thrpt | 15 | 54.895 | 0.497 |
   | Encode9 | thrpt | 15 | 24.215 | 0.670 |
   | VectorizedEncode9 | thrpt | 15 | 49.292 | 0.616 |
   | Encode10 | thrpt | 15 | 25.509 | 0.274 |
   | VectorizedEncode10 | thrpt | 15 | 46.777 | 0.762 |
   | Encode11 | thrpt | 15 | 25.165 | 0.635 |
   | VectorizedEncode11 | thrpt | 15 | 46.798 | 2.554 |
   | Encode12 | thrpt | 15 | 29.170 | 0.671 |
   | VectorizedEncode12 | thrpt | 15 | 47.331 | 0.994 |
   | Encode13 | thrpt | 15 | 23.749 | 1.126 |
   | VectorizedEncode13 | thrpt | 15 | 46.587 | 2.468 |
   | Encode14 | thrpt | 15 | 27.283 | 0.235 |
   | VectorizedEncode14 | thrpt | 15 | 44.704 | 0.805 |
   | Encode15 | thrpt | 15 | 27.459 | 1.035 |
   | VectorizedEncode15 | thrpt | 15 | 45.335 | 3.178 |
   | Encode16 | thrpt | 15 | 58.192 | 0.557 |
   | VectorizedEncode16 | thrpt | 15 | 52.698 | 0.918 |
   | Encode17 | thrpt | 15 | 16.265 | 0.168 |
   | VectorizedEncode17 | thrpt | 15 | 45.757 | 2.126 |
   | Encode18 | thrpt | 15 | 15.261 | 0.167 |
   | VectorizedEncode18 | thrpt | 15 | 44.386 | 0.807 |
   | Encode19 | thrpt | 15 | 12.531 | 0.138 |
   | VectorizedEncode19 | thrpt | 15 | 45.403 | 0.854 |
   | Encode20 | thrpt | 15 | 15.863 | 0.351 |
   | VectorizedEncode20 | thrpt | 15 | 42.607 | 3.698 |
   | Encode21 | thrpt | 15 | 15.772 | 0.154 |
   | VectorizedEncode21 | thrpt | 15 | 45.122 | 0.777 |
   | Encode22 | thrpt | 15 | 15.863 | 0.210 |
   | VectorizedEncode22 | thrpt | 15 | 42.802 | 1.240 |
   | Encode23 | thrpt | 15 | 15.638 | 0.095 |
   | VectorizedEncode23 | thrpt | 15 | 44.411 | 0.536 |
   | Encode24 | thrpt | 15 | 17.091 | 0.713 |
   | VectorizedEncode24 | thrpt | 15 | 42.151 | 2.151 |
   | Encode25 | thrpt | 15 | 15.206 | 0.163 |
   | VectorizedEncode25 | thrpt | 15 | 43.440 | 2.078 |
   | Encode26 | thrpt | 15 | 15.110 | 0.188 |
   | VectorizedEncode26 | thrpt | 15 | 40.758 | 0.416 |
   | Encode27 | thrpt | 15 | 14.794 | 0.192 |
   | VectorizedEncode27 | thrpt | 15 | 43.261 | 0.494 |
   | Encode28 | thrpt | 15 | 17.531 | 0.393 |
   | VectorizedEncode28 | thrpt | 15 | 41.578 | 0.838 |
   | Encode29 | thrpt | 15 | 14.423 | 0.173 |
   | VectorizedEncode29 | thrpt | 15 | 36.044 | 10.191 |
   | Encode30 | thrpt | 15 | 17.426 | 0.297 |
   | VectorizedEncode30 | thrpt | 15 | 40.087 | 0.791 |
   | Encode31 | thrpt | 15 | 18.489 | 0.180 |
   | VectorizedEncode31 | thrpt | 15 | 42.166 | 0.625 |
   | Encode32 | thrpt | 15 | 47.742 | 4.446 |
   | VectorizedEncode32 | thrpt | 15 | 54.260 | 1.806 |
   
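To put numbers on the size argument made before the table: assuming blocks of 128 values (the count that the encode and decode loops in this comment iterate over), values stored at an exact width of 3 bits take 48 bytes per block, while the same values stored at 8 bits take 128 bytes. The class and method names in this sketch are hypothetical and exist only for illustration.

```java
final class PackedSizeSketch {
    /** Values per block; matches the 128 used by the loops in this comment. */
    static final int BLOCK_SIZE = 128;

    /** Bytes needed for one block when every value takes bitsPerValue bits. */
    static int bytesPerBlock(int bitsPerValue) {
        return BLOCK_SIZE * bitsPerValue / 8;
    }

    /** Longs needed for one block; for 128 values this equals 2 * bitsPerValue. */
    static int longsPerBlock(int bitsPerValue) {
        return BLOCK_SIZE * bitsPerValue / 64;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerBlock(3)); // 48
        System.out.println(bytesPerBlock(8)); // 128
        System.out.println(longsPerBlock(3)); // 6
    }
}
```
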
   The code for the vectorized encode and decode is straightforward, as shown below:
   ```Java
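   import jdk.incubator.vector.LongVector;
   import jdk.incubator.vector.VectorOperators;

   // LANE4_SPECIES and numEncodeLength are defined elsewhere in the class and are
   // not shown here. The index arithmetic below only works out if LANE4_SPECIES
   // has 4 long lanes and numEncodeLength == 4.
   public void encode(long[] values, int bitsPerValue, long[] output) {
       // Keep the mask a long: an int mask would sign-extend to all ones for
       // bitsPerValue == 32.
       long mask = (1L << bitsPerValue) - 1;

       int bitsRemaining = 64; // free bits left in each lane of the accumulator
       int upto = 0;           // write position in output
       int totalCompressedLine = 2 * bitsPerValue; // 128 values * bitsPerValue / 64
       int next = 0;           // read position in values

       LongVector input = LongVector.zero(LANE4_SPECIES);
       while (next < 128) {
           LongVector valueVector = LongVector.fromArray(LANE4_SPECIES, values, next).and(mask);
           if (bitsRemaining >= bitsPerValue) {
               // The value fits: place it just below the bits already used.
               input = input.or(valueVector.lanewise(VectorOperators.LSHL, bitsRemaining - bitsPerValue));
               bitsRemaining -= bitsPerValue;
           } else {
               // The value straddles two longs: store its high bits, flush the
               // accumulator, then start the next one with its low bits.
               input = input.or(valueVector.lanewise(VectorOperators.LSHR, bitsPerValue - bitsRemaining));
               input.intoArray(output, upto);
               upto += numEncodeLength;
               input = valueVector.lanewise(VectorOperators.LSHL, 64 - bitsPerValue + bitsRemaining);
               bitsRemaining += 64 - bitsPerValue;
           }

           if (bitsRemaining == 0) {
               // The accumulator is exactly full: flush it and start a fresh one.
               input.intoArray(output, upto);
               upto += numEncodeLength;
               input = LongVector.zero(LANE4_SPECIES);
               bitsRemaining = 64;
           }
           next += 4;
       }

       if (totalCompressedLine % 4 != 0) {
           // Odd bitsPerValue: the last vector uses only the high 32 bits of each
           // lane. Write it out, then fold lanes 2 and 3 into the low 32 bits of
           // lanes 0 and 1. This needs two spare longs at the end of output.
           input.intoArray(output, upto);
           output[totalCompressedLine - 2] |= (output[totalCompressedLine] >>> 32);
           output[totalCompressedLine - 1] |= (output[totalCompressedLine + 1] >>> 32);
       }
   }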
   
   
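   // LOW and HIGH are also defined elsewhere (not shown); the tail handling below
   // implies LOW = 0x00000000FFFFFFFFL and HIGH = 0xFFFFFFFF00000000L.
   // For odd bitsPerValue this method writes to input[totalCompressedLine] and
   // input[totalCompressedLine + 1], so input needs two spare longs at the end.
   public void decode(int bitsPerValue, long[] input, long[] output) {
       // Keep the mask a long: an int mask would sign-extend to all ones for
       // bitsPerValue == 32.
       long mask = (1L << bitsPerValue) - 1;

       int upto = 0; // write position in output
       int next = 0; // read position in input
       int totalCompressedLine = 2 * bitsPerValue;
       int bitsRemaining = 64; // unread bits left in each lane of inputVector

       if (totalCompressedLine % 4 != 0) {
           // Undo the tail folding done by encode: move the low 32 bits of the
           // last two longs back into the high halves of two extra lanes.
           input[totalCompressedLine] = ((input[totalCompressedLine - 2] & LOW) << 32);
           input[totalCompressedLine + 1] = ((input[totalCompressedLine - 1] & LOW) << 32);
           input[totalCompressedLine - 2] &= HIGH;
           input[totalCompressedLine - 1] &= HIGH;
       }

       // Load the first vector only after the tail has been unfolded: for
       // bitsPerValue == 1 the unfolded tail is input[0..3] itself.
       LongVector inputVector = LongVector.fromArray(LANE4_SPECIES, input, next);
       next += 4;

       while (upto < 128) {
           if (bitsRemaining >= bitsPerValue) {
               // The value lies entirely inside the current long.
               int shift = bitsRemaining - bitsPerValue;
               LongVector res = inputVector.and(mask << shift).lanewise(VectorOperators.LSHR, shift);
               bitsRemaining -= bitsPerValue;
               res.intoArray(output, upto);
               upto += 4;
           } else {
               // The value straddles two longs: take its high bits from the
               // current vector and its low bits from the next one.
               int bitDiff = bitsPerValue - bitsRemaining;
               LongVector res = inputVector.and(mask >>> bitDiff).lanewise(VectorOperators.LSHL, bitDiff);

               inputVector = LongVector.fromArray(LANE4_SPECIES, input, next);
               next += 4;
               LongVector temp = inputVector.and((mask >>> bitsRemaining) << (64 - bitDiff));
               res = res.or(temp.lanewise(VectorOperators.LSHR, 64 - bitDiff));
               res.intoArray(output, upto);
               upto += 4;
               bitsRemaining += 64 - bitsPerValue;
           }

           // Skip the reload after the last value so we never read past the data.
           if (bitsRemaining == 0 && upto < 128) {
               inputVector = LongVector.fromArray(LANE4_SPECIES, input, next);
               next += 4;
               bitsRemaining = 64;
           }
       }
   }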
   
   ```
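The `totalCompressedLine % 4 != 0` branches in the code above cover odd `bitsPerValue`, where the last 4-lane vector uses only the high 32 bits of each lane. The sketch below isolates that fold and its inverse in scalar code. The comment does not show the values of `HIGH` and `LOW`; they are assumed here to be the upper and lower 32-bit masks, and `fold` and `unfold` are hypothetical helper names.

```java
final class TailFoldSketch {
    // Assumed values; the original comment does not show these constants.
    static final long HIGH = 0xFFFFFFFF00000000L;
    static final long LOW = 0x00000000FFFFFFFFL;

    /**
     * Folds a 4-lane tail whose lanes use only their high 32 bits into 2 longs:
     * lanes 2 and 3 move into the low 32 bits of lanes 0 and 1.
     */
    static long[] fold(long[] tail) {
        return new long[] {
            tail[0] | (tail[2] >>> 32),
            tail[1] | (tail[3] >>> 32)
        };
    }

    /** Inverse of fold: rebuilds the 4 lanes from the 2 folded longs. */
    static long[] unfold(long[] folded) {
        return new long[] {
            folded[0] & HIGH,
            folded[1] & HIGH,
            (folded[0] & LOW) << 32,
            (folded[1] & LOW) << 32
        };
    }
}
```

Folding saves two longs per block for odd widths, which is what keeps the encoded size at exactly `2 * bitsPerValue` longs.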
   
   Next, I will test the scalar version and the decoding path. Once everything 
is ready, I will submit a pull request this week.
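For this kind of round-trip testing, a scalar reference packer can serve as a baseline. The sketch below is one possible version, not the scalar code from the planned pull request: it uses a simple contiguous layout that starts at the lowest bit of each long, so its output is not bit-compatible with the 4-lane interleaved layout of the vectorized code. It only checks that an exact-width encoding of 128 values occupies `2 * bitsPerValue` longs and decodes back to the input. All names are hypothetical.

```java
import java.util.Arrays;

final class ScalarPackSketch {
    /** Packs values at exactly bitsPerValue bits each (bitsPerValue in 1..32). */
    static long[] pack(long[] values, int bitsPerValue) {
        long mask = (1L << bitsPerValue) - 1;
        long[] packed = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            int bitPos = i * bitsPerValue;
            int word = bitPos >>> 6;   // index of the long holding the first bit
            int offset = bitPos & 63;  // bit offset inside that long
            long v = values[i] & mask;
            packed[word] |= v << offset;
            if (offset + bitsPerValue > 64) {
                // The value straddles two longs: spill its high bits into the next one.
                packed[word + 1] |= v >>> (64 - offset);
            }
        }
        return packed;
    }

    /** Inverse of pack: reads count values of bitsPerValue bits each. */
    static long[] unpack(long[] packed, int bitsPerValue, int count) {
        long mask = (1L << bitsPerValue) - 1;
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            int bitPos = i * bitsPerValue;
            int word = bitPos >>> 6;
            int offset = bitPos & 63;
            long v = packed[word] >>> offset;
            if (offset + bitsPerValue > 64) {
                v |= packed[word + 1] << (64 - offset);
            }
            values[i] = v & mask;
        }
        return values;
    }

    public static void main(String[] args) {
        long[] values = new long[128];
        for (int i = 0; i < values.length; i++) {
            values[i] = i % 8; // every value fits in 3 bits
        }
        long[] packed = pack(values, 3);
        System.out.println(packed.length); // 6
        System.out.println(Arrays.equals(values, unpack(packed, 3, values.length))); // true
    }
}
```
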

