[ https://issues.apache.org/jira/browse/LUCENE-10033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401203#comment-17401203 ]
Greg Miller commented on LUCENE-10033:
--------------------------------------

I also tried an iteration that lazily applies the GCD and offset when fetching values. There's no intelligence for choosing between bulk and lazy application: it is always lazy when there's no delta compression. When delta compression is used, decoding a value needs GCD/offset applied to every value that precedes it, so to keep things simple, I just apply GCD/offset/delta in bulk in that case.

I pushed this iteration to a remote branch [here|https://github.com/gsmiller/lucene/tree/dv_blocks_working] if you're interested (note that the change is pretty quick-and-dirty and {{TestDocValuesEncoder}} is failing with this tweak since I didn't update the tests).

Here are the results on our same internal benchmark. Not much of an improvement, but slightly better.
# red-line QPS regressed by ~10%
# latency overall increased on average by 15.8% (12% p50 / 13% p99.9)
# our facet counting phase increased in latency on average by 29% (13% p50 / 36% p99.9)

I'm also wondering if delta decompression could be done a bit more efficiently by keeping two values packed per long and getting some slight parallelization in delta decoding like {{PForDelta}} does. Might look into this a little later.

> Encode doc values in smaller blocks of values, like postings
> ------------------------------------------------------------
>
>                 Key: LUCENE-10033
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10033
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a follow-up to the discussion on this thread:
> https://lists.apache.org/thread.html/r7b757074d5f02874ce3a295b0007dff486bc10d08fb0b5e5a4ba72c5%40%3Cdev.lucene.apache.org%3E.
> Our current approach for doc values uses large blocks of 16k values where
> values can be decompressed independently, using DirectWriter/DirectReader.
> This is a bit inefficient in some cases, e.g. a single outlier can grow the
> number of bits per value for the entire block, we can't easily use run-length
> compression, etc. Plus, it encourages using a different sub-class for every
> compression technique, which puts pressure on the JVM.
> We'd like to move to an approach that would be more similar to postings with
> smaller blocks (e.g. 128 values) whose values get all decompressed at once
> (using SIMD instructions), with skip data within blocks in order to
> efficiently skip to arbitrary doc IDs (or maybe still use jump tables as
> today's doc values, and as discussed here for postings:
> https://lists.apache.org/thread.html/r7c3cb7ab143fd4ecbc05c04064d10ef9fb50c5b4d6479b0f35732677%40%3Cdev.lucene.apache.org%3E).
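
For readers following the comment, here is a rough Java sketch of the lazy-versus-bulk trade-off it describes. It is not code from the linked branch. The {{Block}} class, its fields, and the decode order (multiply by the GCD, add the offset, then take the prefix sum) are assumptions made purely for illustration. Without delta compression each value can be reconstructed on its own at read time. With delta compression a value depends on every value before it, so the whole block is decoded once on first access.

```java
// Illustrative only: Block and its layout are hypothetical, not Lucene classes.
class LazyGcdOffsetSketch {

    static final class Block {
        private final long[] stored;   // values as unpacked from the block
        private final long gcd;        // common divisor removed at encode time
        private final long offset;     // minimum removed at encode time
        private final boolean delta;   // true if stored values are deltas
        private boolean bulkDecoded;   // set once the delta block is decoded

        Block(long[] stored, long gcd, long offset, boolean delta) {
            this.stored = stored.clone(); // bulk decoding mutates the array
            this.gcd = gcd;
            this.offset = offset;
            this.delta = delta;
        }

        long get(int index) {
            if (!delta) {
                // Lazy path: each value is independent, so only the one
                // being read pays for the multiply and add.
                return stored[index] * gcd + offset;
            }
            if (!bulkDecoded) {
                decodeAll();
            }
            return stored[index];
        }

        // Bulk path (assumed order): undo GCD/offset on each delta, then
        // accumulate a running sum to recover the original values.
        private void decodeAll() {
            long running = 0;
            for (int i = 0; i < stored.length; i++) {
                running += stored[i] * gcd + offset;
                stored[i] = running;
            }
            bulkDecoded = true;
        }
    }

    public static void main(String[] args) {
        // Originals 100, 130, 160, 220 with offset 100 and GCD 30.
        Block plain = new Block(new long[] {0, 1, 2, 4}, 30, 100, false);
        System.out.println(plain.get(3)); // 220

        // Originals 1000, 1010, 1030, 1040; deltas 1000, 10, 20, 10,
        // minus offset 10 and divided by GCD 10.
        Block deltas = new Block(new long[] {99, 0, 1, 0}, 10, 10, true);
        System.out.println(deltas.get(2)); // 1030
    }
}
```

In this sketch the lazy path only pays off when few values per block are read. A consumer that touches every value in the block does the same arithmetic either way, minus whatever a tight bulk loop would gain from vectorization.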