[ https://issues.apache.org/jira/browse/LUCENE-10556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536285#comment-17536285 ]
Michael McCandless commented on LUCENE-10556:
---------------------------------------------

{quote}I'm not sure if we should change the MP in the benchmark though, since so many users do use TieredMP (the default).
{quote}
Or, maybe we need to improve TMP's defaults!!! If the floor segment MB size is causing too much O(N^2) behavior we should fix that default ...

> Relax the maximum dirtiness for stored fields and term vectors?
> ---------------------------------------------------------------
>
>                 Key: LUCENE-10556
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10556
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Stored fields and term vectors compress data and have merge-time
> optimizations to copy compressed data directly instead of decompressing and
> recompressing over and over again. However, sometimes incomplete blocks get
> carried over (typically the last block of a flushed segment) and so these
> file formats keep track of how "dirty" their current blocks are to know
> whether stored fields / term vectors for a segment should be re-compressed.
> Currently the logic is to recompress if more than 1% of the blocks are
> incomplete, or if the total number of missing documents across incomplete
> blocks is more than the configured maximum number of documents per block.
> I'd be interested in evaluating what the compression ratio would be if we
> relaxed these conditions a bit, e.g. by allowing up to 5% dirtiness. My gut
> feeling is that the compression ratio could be barely worse while index-time
> CPU usage could be significantly improved.
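
For readers who want the recompression rule from the issue description in concrete form, here is a minimal sketch in Java. It follows the wording of the description only: the class name, method name, parameter names and the sample numbers (1000 blocks, 128 documents per block) are all hypothetical, and the actual Lucene implementation may track or combine these conditions differently. The dirtiness threshold is a parameter so the current 1% and the proposed 5% can be compared side by side.

```java
// Illustrative sketch only: names and sample values are assumptions,
// not Lucene's real API or defaults.
final class DirtinessSketch {

    private DirtinessSketch() {}

    /**
     * Returns true if a segment should be recompressed on merge, per the rule
     * worded in the issue description: more than maxDirtyPercent of the blocks
     * are incomplete, OR the documents missing from incomplete blocks add up
     * to more than one full block.
     */
    static boolean shouldRecompress(
            long numBlocks,
            long numIncompleteBlocks,
            long numMissingDocs,
            int maxDocsPerBlock,
            int maxDirtyPercent) {
        // Integer arithmetic avoids floating-point rounding at the threshold.
        boolean tooManyIncompleteBlocks =
                numIncompleteBlocks * 100 > (long) maxDirtyPercent * numBlocks;
        boolean tooManyMissingDocs = numMissingDocs > maxDocsPerBlock;
        return tooManyIncompleteBlocks || tooManyMissingDocs;
    }

    public static void main(String[] args) {
        // Hypothetical segment: 1000 blocks, 20 incomplete (2%), 50 docs missing.
        System.out.println(shouldRecompress(1000, 20, 50, 128, 1));
        System.out.println(shouldRecompress(1000, 20, 50, 128, 5));
    }
}
```

With these made-up numbers, the segment is 2% dirty, so it exceeds a 1% threshold but not a 5% one. The missing-documents condition still applies independently of the percentage: a segment whose incomplete blocks are short by more than one block's worth of documents would be recompressed under either threshold.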