[ https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459689#comment-17459689 ]
Feng Guo commented on LUCENE-10315:
-----------------------------------

The optimization can only be triggered when {{count == BKDConfig#DEFAULT_MAX_POINTS_IN_LEAF_NODE}}. This is fragile because users can customize {{maxPointsInLeaf}} in the Codec, in which case the optimization never applies. Here are some ways I can think of to address this:
1. Drop support for customizing {{maxPointsInLeaf}} entirely, as we do for postings.
2. Generate a series of ForUtils, such as {{ForUtil128}}, {{ForUtil256}}, {{ForUtil512}}, {{ForUtil1024}}, ..., and add notes hinting users to choose {{maxPointsInLeaf}} from these values.

> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> -------------------------------------------------------
>
>                 Key: LUCENE-10315
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10315
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue tries to use a 512-int {{ForUtil}} to encode the doc ids of BKD leaf blocks. I benchmarked the optimization by indexing random {{LongPoint}} values and querying them with {{PointInSetQuery}}.
>
> *Benchmark Result*
>
> ||doc count||field cardinality||query points||baseline QPS||candidate QPS||diff percentage||
> |100000000|32|1|51.44|148.26|188.22%|
> |100000000|32|2|26.8|101.88|280.15%|
> |100000000|32|4|14.04|53.52|281.20%|
> |100000000|32|8|7.04|28.54|305.40%|
> |100000000|32|16|3.54|14.61|312.71%|
> |100000000|128|1|110.56|350.26|216.81%|
> |100000000|128|8|16.6|89.81|441.02%|
> |100000000|128|16|8.45|48.07|468.88%|
> |100000000|128|32|4.2|25.35|503.57%|
> |100000000|128|64|2.13|13.02|511.27%|
> |100000000|1024|1|536.19|843.88|57.38%|
> |100000000|1024|8|109.71|251.89|129.60%|
> |100000000|1024|32|33.24|104.11|213.21%|
> |100000000|1024|128|8.87|30.47|243.52%|
> |100000000|1024|512|2.24|8.3|270.54%|
> |100000000|8192|1|3333.33|5000|50.00%|
> |100000000|8192|32|139.47|214.59|53.86%|
> |100000000|8192|128|54.59|109.23|100.09%|
> |100000000|8192|512|15.61|36.15|131.58%|
> |100000000|8192|2048|4.11|11.14|171.05%|
> |100000000|1048576|1|2597.4|3030.3|16.67%|
> |100000000|1048576|32|314.96|371.75|18.03%|
> |100000000|1048576|128|99.7|116.28|16.63%|
> |100000000|1048576|512|30.5|37.15|21.80%|
> |100000000|1048576|2048|10.38|12.3|18.50%|
> |100000000|8388608|1|2564.1|3174.6|23.81%|
> |100000000|8388608|32|196.27|238.95|21.75%|
> |100000000|8388608|128|55.36|68.03|22.89%|
> |100000000|8388608|512|15.58|19.24|23.49%|
> |100000000|8388608|2048|4.56|5.71|25.22%|
>
> Index size is reduced for low-cardinality fields (e.g. from 193M to 174M at cardinality 1024; the cardinality-32 index is essentially unchanged) and roughly flat for high-cardinality fields:
> {noformat}
> 113M index_100000000_doc_32_cardinality_baseline
> 114M index_100000000_doc_32_cardinality_candidate
> 140M index_100000000_doc_128_cardinality_baseline
> 133M index_100000000_doc_128_cardinality_candidate
> 193M index_100000000_doc_1024_cardinality_baseline
> 174M index_100000000_doc_1024_cardinality_candidate
> 241M index_100000000_doc_8192_cardinality_baseline
> 233M index_100000000_doc_8192_cardinality_candidate
> 314M index_100000000_doc_1048576_cardinality_baseline
> 315M index_100000000_doc_1048576_cardinality_candidate
> 392M index_100000000_doc_8388608_cardinality_baseline
> 391M index_100000000_doc_8388608_cardinality_candidate
> {noformat}
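The sketch below makes the fragility raised in the comment above concrete. It is a hypothetical Java example of how a leaf's doc-id count might be routed to a fixed-size encoder. It assumes the current fast path exists only for 512, the block size in the issue title, and it models option 2 as a small family of sizes. `LeafIdsEncoderChooser`, `choose`, `SINGLE_SIZE` and `SIZE_FAMILY` are illustrative names, not Lucene APIs.

```java
/**
 * Hypothetical sketch (not Lucene code): a leaf's doc ids only take a fixed-size,
 * bit-packed fast path when the leaf holds exactly one of the supported block sizes.
 */
final class LeafIdsEncoderChooser {
    // Assumed current situation: a single fast-path size (512, per the issue title).
    static final int[] SINGLE_SIZE = {512};
    // Option 2 from the comment: a family of fixed-size encoders.
    static final int[] SIZE_FAMILY = {128, 256, 512, 1024};

    /** Returns the name of the encoder a leaf holding {@code count} doc ids would use. */
    static String choose(int count, int[] supportedSizes) {
        for (int size : supportedSizes) {
            if (count == size) {
                return "ForUtil" + size; // fixed-size fast path
            }
        }
        return "fallback"; // every other count keeps the existing encoding
    }

    public static void main(String[] args) {
        int customMaxPointsInLeaf = 256; // a user-customized leaf size
        System.out.println(choose(customMaxPointsInLeaf, SINGLE_SIZE)); // fallback
        System.out.println(choose(customMaxPointsInLeaf, SIZE_FAMILY)); // ForUtil256
    }
}
```

With a single supported size, a customized `maxPointsInLeaf` such as 256 always falls through to the fallback. A family of sizes restores the fast path only for the listed values, which is why option 2 still needs documentation that steers users toward those values.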
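The issue description relies on a fixed-size {{ForUtil}}: a block of exactly N ints is bit-packed with one shared bit width, so the block length is a constant the code can specialize on. The Java sketch below is a simplified scalar illustration of that idea for N = 512. It is not Lucene's actual ForUtil, which lays data out so that its loops can be auto-vectorized. `FixedBlockPacker` and its methods are hypothetical names.

```java
import java.util.Arrays;

/**
 * Simplified scalar illustration of fixed-block bit packing (hypothetical, not Lucene's
 * ForUtil): exactly BLOCK_SIZE ints are packed into longs using one shared bit width.
 */
final class FixedBlockPacker {
    static final int BLOCK_SIZE = 512; // block size from the issue title

    /** Smallest bit width (at least 1) that can hold every value in the block. */
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Packs exactly BLOCK_SIZE ints, bitsPerValue bits each, into a long[]. */
    static long[] pack(int[] values, int bitsPerValue) {
        if (values.length != BLOCK_SIZE) {
            throw new IllegalArgumentException("expected " + BLOCK_SIZE + " values");
        }
        long[] packed = new long[(BLOCK_SIZE * bitsPerValue + 63) / 64];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = values[i] & 0xFFFFFFFFL;
            packed[word] |= v << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two longs: spill its high bits into the next word.
                packed[word + 1] |= v >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Reverses pack(): reads BLOCK_SIZE values of bitsPerValue bits each. */
    static int[] unpack(long[] packed, int bitsPerValue) {
        int[] values = new int[BLOCK_SIZE];
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bitsPerValue > 64) {
                v |= packed[word + 1] << (64 - shift); // pull in the spilled high bits
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] ids = new int[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE; i++) {
            ids[i] = (i * 7919) % 100_000; // arbitrary sample doc ids
        }
        int bpv = bitsRequired(ids);
        long[] packed = pack(ids, bpv);
        System.out.println(bpv + " bits/value, " + packed.length + " longs, round trip ok: "
            + Arrays.equals(ids, unpack(packed, bpv)));
    }
}
```

Because the block length is fixed, the packed size is simply `BLOCK_SIZE * bitsPerValue / 64` longs. A leaf whose size differs from `BLOCK_SIZE` cannot use this routine as written, which is the restriction the comment is concerned about.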