On Sun, Jul 30, 2023 at 11:17 AM Bruno Roustant <bruno.roust...@gmail.com> wrote:
Interesting coincidence, I'm currently working on a learned index on sorted > keys that can advantageously replace binary search. > It is very compact (additional space of 2% of the sorted key array, e.g. > 40KB for 200MB of keys), and it is between 2x to 3x faster than binary > search for the rank/indexOf methods. By design there are nearly no > branches: the index of a key is approximated by using hierarchical linear > segment computation. > The PGM-Index paper is there: https://pgm.di.unipi.it/ > And my implementation is here, just submitted in HPPC: > https://github.com/carrotsearch/hppc/pull/39 > Wow, this looks very relevant to Lucene! Could this index be used for faster implementation of our skip lists? Even though they are static (computed once at segment-write time) vs dynamic/online that these learned indices are also able to handle, it looks like learned indices are still better than simple binary search, and quite a bit more compact, for static cases. Our "best linear fit" approximation to compress monotonic longs (DirectMonoticWriter/Reader) looks like a simple example of these learned indices too. Mike McCandless http://blog.mikemccandless.com