tang-hi commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618213152
I don't think the Lucene 5.0 format is a good fit, because it rounds
bitsPerValue up. For example, it stores a 3-bit value in 8 bits, which makes
the index files larger. However, I have already implemented vectorized
encoders for every bitsPerValue from 1 to 32 that keep the encoded size
unchanged. The results look promising (throughput in ops/us, higher is
better):
| Benchmark | Type | Iterations | Mean ops/us | Error |
| --- | --- | --- | --- | --- |
| Encode1 | thrpt | 15 | 37.023 | 5.060 |
| VectorizedEncode1 | thrpt | 15 | 53.261 | 0.850 |
| Encode2 | thrpt | 15 | 45.017 | 0.887 |
| VectorizedEncode2 | thrpt | 15 | 56.111 | 2.234 |
| Encode3 | thrpt | 15 | 38.750 | 0.521 |
| VectorizedEncode3 | thrpt | 15 | 49.589 | 2.518 |
| Encode4 | thrpt | 15 | 48.074 | 1.390 |
| VectorizedEncode4 | thrpt | 15 | 55.654 | 2.580 |
| Encode5 | thrpt | 15 | 35.517 | 0.815 |
| VectorizedEncode5 | thrpt | 15 | 50.508 | 0.887 |
| Encode6 | thrpt | 15 | 38.698 | 0.301 |
| VectorizedEncode6 | thrpt | 15 | 48.876 | 0.926 |
| Encode7 | thrpt | 15 | 36.697 | 0.942 |
| VectorizedEncode7 | thrpt | 15 | 50.016 | 2.425 |
| Encode8 | thrpt | 15 | 59.180 | 2.208 |
| VectorizedEncode8 | thrpt | 15 | 54.895 | 0.497 |
| Encode9 | thrpt | 15 | 24.215 | 0.670 |
| VectorizedEncode9 | thrpt | 15 | 49.292 | 0.616 |
| Encode10 | thrpt | 15 | 25.509 | 0.274 |
| VectorizedEncode10 | thrpt | 15 | 46.777 | 0.762 |
| Encode11 | thrpt | 15 | 25.165 | 0.635 |
| VectorizedEncode11 | thrpt | 15 | 46.798 | 2.554 |
| Encode12 | thrpt | 15 | 29.170 | 0.671 |
| VectorizedEncode12 | thrpt | 15 | 47.331 | 0.994 |
| Encode13 | thrpt | 15 | 23.749 | 1.126 |
| VectorizedEncode13 | thrpt | 15 | 46.587 | 2.468 |
| Encode14 | thrpt | 15 | 27.283 | 0.235 |
| VectorizedEncode14 | thrpt | 15 | 44.704 | 0.805 |
| Encode15 | thrpt | 15 | 27.459 | 1.035 |
| VectorizedEncode15 | thrpt | 15 | 45.335 | 3.178 |
| Encode16 | thrpt | 15 | 58.192 | 0.557 |
| VectorizedEncode16 | thrpt | 15 | 52.698 | 0.918 |
| Encode17 | thrpt | 15 | 16.265 | 0.168 |
| VectorizedEncode17 | thrpt | 15 | 45.757 | 2.126 |
| Encode18 | thrpt | 15 | 15.261 | 0.167 |
| VectorizedEncode18 | thrpt | 15 | 44.386 | 0.807 |
| Encode19 | thrpt | 15 | 12.531 | 0.138 |
| VectorizedEncode19 | thrpt | 15 | 45.403 | 0.854 |
| Encode20 | thrpt | 15 | 15.863 | 0.351 |
| VectorizedEncode20 | thrpt | 15 | 42.607 | 3.698 |
| Encode21 | thrpt | 15 | 15.772 | 0.154 |
| VectorizedEncode21 | thrpt | 15 | 45.122 | 0.777 |
| Encode22 | thrpt | 15 | 15.863 | 0.210 |
| VectorizedEncode22 | thrpt | 15 | 42.802 | 1.240 |
| Encode23 | thrpt | 15 | 15.638 | 0.095 |
| VectorizedEncode23 | thrpt | 15 | 44.411 | 0.536 |
| Encode24 | thrpt | 15 | 17.091 | 0.713 |
| VectorizedEncode24 | thrpt | 15 | 42.151 | 2.151 |
| Encode25 | thrpt | 15 | 15.206 | 0.163 |
| VectorizedEncode25 | thrpt | 15 | 43.440 | 2.078 |
| Encode26 | thrpt | 15 | 15.110 | 0.188 |
| VectorizedEncode26 | thrpt | 15 | 40.758 | 0.416 |
| Encode27 | thrpt | 15 | 14.794 | 0.192 |
| VectorizedEncode27 | thrpt | 15 | 43.261 | 0.494 |
| Encode28 | thrpt | 15 | 17.531 | 0.393 |
| VectorizedEncode28 | thrpt | 15 | 41.578 | 0.838 |
| Encode29 | thrpt | 15 | 14.423 | 0.173 |
| VectorizedEncode29 | thrpt | 15 | 36.044 | 10.191 |
| Encode30 | thrpt | 15 | 17.426 | 0.297 |
| VectorizedEncode30 | thrpt | 15 | 40.087 | 0.791 |
| Encode31 | thrpt | 15 | 18.489 | 0.180 |
| VectorizedEncode31 | thrpt | 15 | 42.166 | 0.625 |
| Encode32 | thrpt | 15 | 47.742 | 4.446 |
| VectorizedEncode32 | thrpt | 15 | 54.260 | 1.806 |
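
To put the size concern in numbers, here is a small sketch. The class and method names are made up for illustration, and the block size of 128 values is taken from the loop bounds in the encoder, so treat it as an assumption about the real format. It only counts how many longs a block needs at a given width.

```java
final class PackedSize {
    /** Longs needed to store `count` values at `bitsPerValue` bits each (rounded up to a whole long). */
    static int packedLongs(int count, int bitsPerValue) {
        return (count * bitsPerValue + 63) / 64;
    }

    public static void main(String[] args) {
        int blockSize = 128; // assumed block size, matching `while (next < 128)`
        System.out.println(packedLongs(blockSize, 3)); // exact 3-bit packing: prints 6
        System.out.println(packedLongs(blockSize, 8)); // 3-bit values stored in 8 bits: prints 16
    }
}
```

For a 128-value block this is the same quantity as `2 * bitsPerValue`, so in this example rounding 3 bits up to 8 costs 16 longs instead of 6.
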
The code is straightforward, as shown below:
```Java
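import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorOperators;

// LANE4_SPECIES and numEncodeLength are defined elsewhere in the class (not shown).
public void encode(long[] values, int bitsPerValue, long[] output) {
  // Use a long mask; an int cast would overflow to -1 when bitsPerValue == 32.
  long MASK = (1L << bitsPerValue) - 1;
  int bitsRemaining = 64; // free bits left in each lane of `input`
  int upto = 0; // write position in `output`
  int totalCompressedLine = 2 * bitsPerValue; // 128 values * bitsPerValue / 64 bits per long
  int next = 0; // read position in `values`
  LongVector input = LongVector.zero(LANE4_SPECIES);
  while (next < 128) {
    if (bitsRemaining >= bitsPerValue) {
      // The next four values fit entirely into the current lanes.
      input = input.or(
          LongVector.fromArray(LANE4_SPECIES, values, next)
              .and(MASK)
              .lanewise(VectorOperators.LSHL, bitsRemaining - bitsPerValue));
      bitsRemaining -= bitsPerValue;
    } else {
      // The values straddle two longs: store the high part, carry the low part over.
      LongVector valueVector = LongVector.fromArray(LANE4_SPECIES, values, next).and(MASK);
      input = input.or(valueVector.lanewise(VectorOperators.LSHR, bitsPerValue - bitsRemaining));
      input.intoArray(output, upto);
      upto += numEncodeLength;
      input = valueVector.lanewise(VectorOperators.LSHL, 64 - bitsPerValue + bitsRemaining);
      bitsRemaining -= bitsPerValue;
      bitsRemaining += 64;
    }
    if (bitsRemaining == 0) {
      // The lanes are exactly full: flush them and start with empty lanes.
      input.intoArray(output, upto);
      upto += numEncodeLength;
      input = LongVector.zero(LANE4_SPECIES);
      bitsRemaining = 64;
    }
    next += 4;
  }
  if (totalCompressedLine % 4 != 0) {
    // Odd bitsPerValue: the last vector is only half full (32 bits per lane).
    // Write it out, then fold lanes 2 and 3 into the low halves of lanes 0 and 1.
    input.intoArray(output, upto);
    output[totalCompressedLine - 2] |= (output[totalCompressedLine] >>> 32);
    output[totalCompressedLine - 1] |= (output[totalCompressedLine + 1] >>> 32);
  }
}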
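// LANE4_SPECIES, LOW and HIGH are defined elsewhere in the class (not shown);
// LOW and HIGH appear to select the low and high 32 bits of a long.
public void decode(int bitsPerValue, long[] input, long[] output) {
  // Use a long mask; an int cast would overflow to -1 when bitsPerValue == 32.
  long MASK = (1L << bitsPerValue) - 1;
  int upto = 0; // write position in `output`
  int next = 0; // read position in `input`
  int totalCompressedLine = 2 * bitsPerValue;
  int bitsRemaining = 64; // unread bits left in each lane of `inputVector`
  if (totalCompressedLine % 4 != 0) {
    // Odd bitsPerValue: unfold the half-full last vector in place. This has to
    // happen before the first load, which reads these longs when bitsPerValue == 1.
    // Note that this writes to `input`.
    input[totalCompressedLine] = ((input[totalCompressedLine - 2] & LOW) << 32);
    input[totalCompressedLine + 1] = ((input[totalCompressedLine - 1] & LOW) << 32);
    input[totalCompressedLine - 2] &= HIGH;
    input[totalCompressedLine - 1] &= HIGH;
  }
  LongVector inputVector = LongVector.fromArray(LANE4_SPECIES, input, next);
  next += 4;
  while (upto < 128) {
    if (bitsRemaining >= bitsPerValue) {
      // The next four values sit entirely inside the current lanes.
      LongVector res = inputVector
          .and(MASK << (bitsRemaining - bitsPerValue))
          .lanewise(VectorOperators.LSHR, bitsRemaining - bitsPerValue);
      bitsRemaining -= bitsPerValue;
      res.intoArray(output, upto);
      upto += 4;
    } else {
      // The values straddle two longs: take the high part from the current
      // lanes and the remaining bitDiff bits from the top of the next ones.
      int bitDiff = bitsPerValue - bitsRemaining;
      LongVector res = inputVector
          .and(MASK >> bitDiff)
          .lanewise(VectorOperators.LSHL, bitDiff);
      inputVector = LongVector.fromArray(LANE4_SPECIES, input, next);
      next += 4;
      var temp = inputVector.and((MASK >> bitsRemaining) << (64 - bitDiff));
      res = res.or(temp.lanewise(VectorOperators.LSHR, 64 - bitDiff));
      res.intoArray(output, upto);
      upto += 4;
      bitsRemaining -= bitsPerValue;
      bitsRemaining += 64;
    }
    if (bitsRemaining == 0) {
      // The current lanes are used up: load the next four longs.
      inputVector = LongVector.fromArray(LANE4_SPECIES, input, next);
      next += 4;
      bitsRemaining = 64;
    }
  }
}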
```
Next, I will test the scalar version and the decoder. Once everything is
ready, I will submit a pull request this week.
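
As a rough illustration of what a scalar counterpart could look like, here is a minimal round-trip sketch of plain bit packing. It is not the code from the planned pull request: the class name is hypothetical, it packs values contiguously rather than interleaving four lanes as the vectorized encoder does, and it assumes 1 <= bitsPerValue <= 63 with every value fitting in that many bits.

```java
import java.util.Arrays;

final class ScalarPack {
    /** Packs each value into bitsPerValue bits, most significant bits first. */
    static long[] encode(long[] values, int bitsPerValue) {
        long mask = (1L << bitsPerValue) - 1;
        long[] out = new long[(values.length * bitsPerValue + 63) / 64];
        int bitsRemaining = 64; // free bits left in out[upto]
        int upto = 0;
        for (long raw : values) {
            long v = raw & mask;
            if (bitsRemaining >= bitsPerValue) {
                // The value fits into the current long.
                bitsRemaining -= bitsPerValue;
                out[upto] |= v << bitsRemaining;
            } else {
                // The value straddles two longs: high part here, low part in the next.
                int spill = bitsPerValue - bitsRemaining;
                out[upto++] |= v >>> spill;
                bitsRemaining = 64 - spill;
                out[upto] |= v << bitsRemaining;
            }
            if (bitsRemaining == 0) {
                upto++;
                bitsRemaining = 64;
            }
        }
        return out;
    }

    /** Reverses encode() for `count` values. */
    static long[] decode(long[] packed, int bitsPerValue, int count) {
        long mask = (1L << bitsPerValue) - 1;
        long[] out = new long[count];
        int bitsRemaining = 64; // unread bits left in packed[upto]
        int upto = 0;
        for (int i = 0; i < count; i++) {
            if (bitsRemaining >= bitsPerValue) {
                bitsRemaining -= bitsPerValue;
                out[i] = (packed[upto] >>> bitsRemaining) & mask;
            } else {
                int spill = bitsPerValue - bitsRemaining;
                // Low bitsRemaining bits of the current long form the high part.
                long high = packed[upto++] & ((1L << bitsRemaining) - 1);
                bitsRemaining = 64 - spill;
                out[i] = (high << spill) | (packed[upto] >>> bitsRemaining);
            }
            if (bitsRemaining == 0) {
                upto++;
                bitsRemaining = 64;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = new long[128];
        for (int i = 0; i < values.length; i++) {
            values[i] = i % 8; // every value fits in 3 bits
        }
        long[] packed = encode(values, 3);
        System.out.println(packed.length); // prints 6
        System.out.println(Arrays.equals(values, decode(packed, 3, values.length))); // prints true
    }
}
```

A round trip like this over every bitsPerValue is one way to check an encoder and decoder pair against each other. Because the layout differs, the packed longs themselves are not expected to match the vectorized output.
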