[ https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382404#comment-15382404 ]
Adrien Grand commented on LUCENE-7371:
--------------------------------------

The benchmarks are reporting interesting changes. Some seem to perform slightly faster now, like IntNRQ (http://people.apache.org/~mikemccand/lucenebench/IntNRQ.html) or the geo3d distance filter (http://people.apache.org/~mikemccand/geobench.html#search-distance), but others seem to perform a bit slower, like the 10-gon filter (http://people.apache.org/~mikemccand/geobench.html#search-poly_10) or the 10 nearest points (http://people.apache.org/~mikemccand/geobench.html#search-nearest_10). I think the fact that it is not consistently slower or faster is due to the distribution of points in the blocks that need to be read (the more unique leading bytes, the more expensive the read).

Given that the slowdown is not general to all benchmarks and that the size reduction is significant, I don't think this should be reverted, but let me know if you think otherwise.

(For the record, many benchmarks look slower on July 17th, but I don't think this is related to this change. For instance, even phrases got slower: http://people.apache.org/~mikemccand/lucenebench/Phrase.html)

> BKDReader could compress values better
> --------------------------------------
>
>                 Key: LUCENE-7371
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7371
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: master (7.0), 6.2
>
>         Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block.
> We could probably easily do better. For instance there are only 256 possible
> values for the first byte of the dimension that the values are sorted by, yet
> we use a block size of 1024. So by using something simple like run-length
> compression we could save 6 bits per value on average.
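
To make the run-length idea in the issue description concrete, here is a minimal sketch, not the code from the attached patches. It assumes the leading bytes of a block are already sorted and encodes them as (value, run length) pairs. The class name `FirstByteRle`, the two-byte pair layout, and the cap of 255 values per run are all hypothetical choices for illustration and are not taken from Lucene's on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's BKD implementation.
public class FirstByteRle {

    // Encodes the leading bytes of a block as (value, runLength) pairs.
    // Runs are capped at 255 so the length always fits in one byte (assumption).
    static byte[] encode(byte[] firstBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < firstBytes.length) {
            byte value = firstBytes[i];
            int run = 1;
            while (i + run < firstBytes.length
                    && firstBytes[i + run] == value
                    && run < 255) {
                run++;
            }
            out.write(value); // the shared leading byte
            out.write(run);   // how many consecutive values share it
            i += run;
        }
        return out.toByteArray();
    }

    // Expands (value, runLength) pairs back into count leading bytes.
    static byte[] decode(byte[] encoded, int count) {
        byte[] out = new byte[count];
        int pos = 0;
        for (int i = 0; i < encoded.length; i += 2) {
            int run = encoded[i + 1] & 0xFF; // read the length as unsigned
            Arrays.fill(out, pos, pos + run, encoded[i]);
            pos += run;
        }
        return out;
    }

    public static void main(String[] args) {
        // A block of 1024 values whose leading bytes cover all 256 possibilities,
        // four values each: the case with the most runs for this block size.
        byte[] block = new byte[1024];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) (i / 4);
        }
        byte[] encoded = encode(block);
        System.out.println(block.length + " raw bytes -> " + encoded.length + " encoded bytes");
    }
}
```

Because this sketch spends a full byte on every run length, its savings will not match the 6 bits per value estimated in the description. It also shows why read cost can grow with the number of unique leading bytes, as the comment above suggests: each distinct leading byte adds at least one run to decode.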