costin commented on pull request #453: URL: https://github.com/apache/lucene/pull/453#issuecomment-973999178
I have tightened the implementation a bit: I removed an extra field, added some constants, and followed the style of Packed64 more closely with regard to the conditionals. In addition, I updated the benchmark to differentiate between consecutive get/set and sparse get/set, where different parts of memory are read. This has a big impact on VarHandleLongByte (VHLB) performance, where sparse access reaches almost double the throughput of consecutive access, while the other implementations show much less difference between the two patterns, making them more consistent. For example, at 23 bits per value:
```
Benchmark                                                (bpv)  (size)   Mode  Cnt      Score       Error  Units
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     23   10240  thrpt    3  16121.142 ±   836.602  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          23   10240  thrpt    3  28436.567 ±   771.609  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     23   10240  thrpt    3  42751.522 ±  2367.785  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          23   10240  thrpt    3  40369.377 ±  1737.285  ops/s
Packed64Benchmark.packed64_Consecutive                      23   10240  thrpt    3  52004.882 ±  2006.942  ops/s
Packed64Benchmark.packed64_Sparse                           23   10240  thrpt    3  44671.486 ±  1567.467  ops/s
```
The sparse benchmark may not be entirely adequate: the operations happen from the outside in, which should give a slight advantage towards the middle due to data locality.
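To make the consecutive vs. sparse distinction concrete, here is a minimal, hypothetical Java sketch of a packed-ints structure backed by a `byte[]` and read through a long view `VarHandle` (`MethodHandles.byteArrayViewVarHandle`). It assumes that "VarHandleLongByte" means 64-bit words read out of a `byte[]`, and that "outside in" means visiting indices 0, n-1, 1, n-2, and so on. The class `PackedBytesSketch` and its helpers are illustrative names only, not code from this PR or from Lucene's `Packed64`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Random;

/**
 * Hypothetical sketch (not the PR's code): fixed-width packed values stored in a byte[]
 * and accessed through a VarHandle that views the bytes as little-endian longs.
 */
public class PackedBytesSketch {
  // Plain get/set on a byte-array view VarHandle are allowed at unaligned offsets.
  private static final VarHandle LONG_VIEW =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  private final byte[] blocks;
  private final int bitsPerValue;
  private final long mask;

  public PackedBytesSketch(int valueCount, int bitsPerValue) {
    this.bitsPerValue = bitsPerValue;
    this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
    long totalBits = (long) valueCount * bitsPerValue;
    // 8 padding bytes so an 8-byte read (plus one spill byte) never runs past the end.
    this.blocks = new byte[(int) ((totalBits + 7) >>> 3) + 8];
  }

  public long get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int byteIndex = (int) (bitPos >>> 3);
    int shift = (int) (bitPos & 7);
    long value = ((long) LONG_VIEW.get(blocks, byteIndex)) >>> shift;
    if (shift + bitsPerValue > 64) {
      // The top (shift + bpv - 64) bits live in the byte right after the 8-byte word.
      value |= (blocks[byteIndex + 8] & 0xFFL) << (64 - shift);
    }
    return value & mask;
  }

  public void set(int index, long value) {
    value &= mask;
    long bitPos = (long) index * bitsPerValue;
    int byteIndex = (int) (bitPos >>> 3);
    int shift = (int) (bitPos & 7);
    // Read-modify-write the 8-byte word, preserving bits that belong to neighbours.
    long word = (long) LONG_VIEW.get(blocks, byteIndex);
    word = (word & ~(mask << shift)) | (value << shift);
    LONG_VIEW.set(blocks, byteIndex, word);
    if (shift + bitsPerValue > 64) {
      int spillMask = (1 << (shift + bitsPerValue - 64)) - 1;
      int old = blocks[byteIndex + 8] & 0xFF;
      int high = (int) (value >>> (64 - shift)) & spillMask;
      blocks[byteIndex + 8] = (byte) ((old & ~spillMask) | high);
    }
  }

  /** Consecutive order: 0, 1, 2, ..., n - 1. */
  public static int[] consecutiveOrder(int n) {
    int[] order = new int[n];
    for (int i = 0; i < n; i++) {
      order[i] = i;
    }
    return order;
  }

  /** Assumed "outside in" order: 0, n - 1, 1, n - 2, ..., ending in the middle. */
  public static int[] outsideInOrder(int n) {
    int[] order = new int[n];
    int lo = 0;
    int hi = n - 1;
    for (int i = 0; i < n; i++) {
      order[i] = (i % 2 == 0) ? lo++ : hi--;
    }
    return order;
  }

  public static void main(String[] args) {
    int n = 1000;
    Random random = new Random(42);
    for (int bpv : new int[] {1, 4, 5, 8, 11, 16, 23, 25, 31, 32, 47, 59, 61, 64}) {
      PackedBytesSketch packed = new PackedBytesSketch(n, bpv);
      long[] expected = new long[n];
      for (int i : outsideInOrder(n)) { // write in the assumed sparse order
        expected[i] = random.nextLong() & packed.mask;
        packed.set(i, expected[i]);
      }
      for (int i : consecutiveOrder(n)) { // read back consecutively
        if (packed.get(i) != expected[i]) {
          throw new AssertionError("bpv=" + bpv + " index=" + i);
        }
      }
    }
    System.out.println("ok");
  }
}
```

The `main` method writes values in the assumed outside-in order, reads them back consecutively, and prints `ok` when every round trip matches. With 58 or more bits per value, a value starting late in a byte can spill into a ninth byte, which is why the sketch handles an extra spill byte; the actual implementation may resolve this differently.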
Below are the full benchmark results:
```
Benchmark                                                (bpv)  (size)   Mode  Cnt      Score       Error  Units
Packed64Benchmark.packed64VarHandleLongByte_Consecutive      1   10240  thrpt    3  38301.741 ±   655.443  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive      4   10240  thrpt    3  25817.622 ± 16839.449  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive      5   10240  thrpt    3  21086.108 ±  1738.288  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive      8   10240  thrpt    3  16114.364 ±  1042.073  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     11   10240  thrpt    3  16056.271 ±  2397.599  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     16   10240  thrpt    3  17125.401 ±  1574.413  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     23   10240  thrpt    3  16063.316 ±   453.476  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     25   10240  thrpt    3  16046.605 ±   315.952  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     31   10240  thrpt    3  16017.969 ±   894.789  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     32   10240  thrpt    3  19653.331 ±  1231.635  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     47   10240  thrpt    3  15992.127 ±  1909.132  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     59   10240  thrpt    3  17114.009 ±  6263.882  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     61   10240  thrpt    3  15822.847 ±  8200.733  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Consecutive     64   10240  thrpt    3  40686.026 ±  2245.477  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse           1   10240  thrpt    3  49795.721 ±  1016.448  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse           4   10240  thrpt    3  37455.051 ±  1059.990  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse           5   10240  thrpt    3  34629.635 ±   730.416  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse           8   10240  thrpt    3  28438.560 ±   364.251  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          11   10240  thrpt    3  28240.196 ±   752.703  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          16   10240  thrpt    3  29998.199 ±  1275.620  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          23   10240  thrpt    3  28481.596 ±  1537.821  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          25   10240  thrpt    3  28585.727 ±   948.670  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          31   10240  thrpt    3  28002.335 ±  1701.436  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          32   10240  thrpt    3  34116.362 ±   421.667  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          47   10240  thrpt    3  28258.341 ±  1065.642  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          59   10240  thrpt    3  29776.379 ±   469.763  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          61   10240  thrpt    3  28820.101 ±  3651.373  ops/s
Packed64Benchmark.packed64VarHandleLongByte_Sparse          64   10240  thrpt    3  57477.947 ±  3698.974  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive      1   10240  thrpt    3  32689.162 ±  1387.629  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive      4   10240  thrpt    3  35393.931 ±  1447.491  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive      5   10240  thrpt    3  40258.860 ±  2152.352  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive      8   10240  thrpt    3  35385.111 ±   347.894  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     11   10240  thrpt    3  45596.088 ±   990.686  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     16   10240  thrpt    3  35012.112 ±  9437.142  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     23   10240  thrpt    3  47095.570 ±   905.039  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     25   10240  thrpt    3  31985.949 ±   590.707  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     31   10240  thrpt    3  42513.815 ±  1896.300  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     32   10240  thrpt    3  35779.268 ±   956.749  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     47   10240  thrpt    3  30137.376 ±   516.086  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     59   10240  thrpt    3  25869.023 ±  1372.035  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     61   10240  thrpt    3  24951.079 ±   345.293  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Consecutive     64   10240  thrpt    3  35496.562 ±  2744.344  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse           1   10240  thrpt    3  53103.799 ±   564.971  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse           4   10240  thrpt    3  53134.834 ±   763.798  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse           5   10240  thrpt    3  42840.502 ±   547.945  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse           8   10240  thrpt    3  53661.284 ±  2760.469  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          11   10240  thrpt    3  42976.618 ±  4369.925  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          16   10240  thrpt    3  53319.723 ±  8308.326  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          23   10240  thrpt    3  40894.483 ±   772.369  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          25   10240  thrpt    3  40646.463 ±  1482.981  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          31   10240  thrpt    3  40995.711 ±  1436.172  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          32   10240  thrpt    3  53884.020 ±  2006.964  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          47   10240  thrpt    3  29708.915 ±   657.222  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          59   10240  thrpt    3  35063.658 ±  1556.863  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          61   10240  thrpt    3  28871.592 ±   366.623  ops/s
Packed64Benchmark.packed64VarHandleLongLong_Sparse          64   10240  thrpt    3  53333.751 ± 15511.237  ops/s
Packed64Benchmark.packed64_Consecutive                       1   10240  thrpt    3  36595.316 ±  1213.323  ops/s
Packed64Benchmark.packed64_Consecutive                       4   10240  thrpt    3  39777.028 ±   953.027  ops/s
Packed64Benchmark.packed64_Consecutive                       5   10240  thrpt    3  46119.465 ±   211.767  ops/s
Packed64Benchmark.packed64_Consecutive                       8   10240  thrpt    3  39886.892 ±  1574.036  ops/s
Packed64Benchmark.packed64_Consecutive                      11   10240  thrpt    3  53222.462 ±  1847.024  ops/s
Packed64Benchmark.packed64_Consecutive                      16   10240  thrpt    3  39897.330 ±  1012.499  ops/s
Packed64Benchmark.packed64_Consecutive                      23   10240  thrpt    3  50924.631 ±  2607.771  ops/s
Packed64Benchmark.packed64_Consecutive                      25   10240  thrpt    3  51179.396 ±  4118.732  ops/s
Packed64Benchmark.packed64_Consecutive                      31   10240  thrpt    3  49350.911 ±  2652.142  ops/s
Packed64Benchmark.packed64_Consecutive                      32   10240  thrpt    3  40245.046 ±  1506.183  ops/s
Packed64Benchmark.packed64_Consecutive                      47   10240  thrpt    3  43794.173 ±  4789.862  ops/s
Packed64Benchmark.packed64_Consecutive                      59   10240  thrpt    3  41248.048 ±  2533.885  ops/s
Packed64Benchmark.packed64_Consecutive                      61   10240  thrpt    3  42538.675 ±  3462.201  ops/s
Packed64Benchmark.packed64_Consecutive                      64   10240  thrpt    3  40091.319 ±   962.752  ops/s
Packed64Benchmark.packed64_Sparse                            1   10240  thrpt    3  60621.693 ±  6289.038  ops/s
Packed64Benchmark.packed64_Sparse                            4   10240  thrpt    3  63121.896 ±  5265.890  ops/s
Packed64Benchmark.packed64_Sparse                            5   10240  thrpt    3  49445.705 ±  1348.639  ops/s
Packed64Benchmark.packed64_Sparse                            8   10240  thrpt    3  63533.078 ±  5166.244  ops/s
Packed64Benchmark.packed64_Sparse                           11   10240  thrpt    3  47624.192 ±  5238.701  ops/s
Packed64Benchmark.packed64_Sparse                           16   10240  thrpt    3  64148.964 ±   820.543  ops/s
Packed64Benchmark.packed64_Sparse                           23   10240  thrpt    3  44013.707 ±  3850.643  ops/s
Packed64Benchmark.packed64_Sparse                           25   10240  thrpt    3  44837.906 ±  2231.909  ops/s
Packed64Benchmark.packed64_Sparse                           31   10240  thrpt    3  44638.184 ±  1887.982  ops/s
Packed64Benchmark.packed64_Sparse                           32   10240  thrpt    3  63831.488 ±  1325.146  ops/s
Packed64Benchmark.packed64_Sparse                           47   10240  thrpt    3  41508.192 ±  3339.770  ops/s
Packed64Benchmark.packed64_Sparse                           59   10240  thrpt    3  40849.655 ±  2445.001  ops/s
Packed64Benchmark.packed64_Sparse                           61   10240  thrpt    3  36888.602 ±  1342.537  ops/s
Packed64Benchmark.packed64_Sparse                           64   10240  thrpt    3  64386.395 ±  1694.784  ops/s
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org