[ 
https://issues.apache.org/jira/browse/LUCENE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197465#comment-17197465
 ] 

Adrien Grand commented on LUCENE-9529:
--------------------------------------

I played with this idea. On a synthetic benchmark that indexes 1M docs, each 
with a single stored field of about 2kB, and maxBufferedDocs=1000, indexing 
time dropped from 13.5 seconds to 12 seconds.

> Larger stored fields block sizes mean we're more likely to disable optimized 
> bulk merging
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9529
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9529
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Whenever possible when merging stored fields, Lucene tries to copy the 
> compressed data instead of decompressing it from the source segment and then 
> re-compressing it in the destination segment. A problem with this approach is 
> that if some blocks are incomplete (typically the last block of a segment) 
> then they remain incomplete in the destination segment too, and if we do this 
> for too long we end up with a bad compression ratio. So Lucene keeps track of 
> these incomplete blocks, and makes sure to keep the ratio of incomplete 
> blocks below 1%.
> But as we increased the block size, it has become more likely to have a high 
> ratio of incomplete blocks. E.g. if you have a segment with 1MB of stored 
> fields, with 16kB blocks like before, you have 63 complete blocks and 1 
> incomplete block, i.e. 1.6%. But now with ~512kB blocks, you have 1 complete 
> block and 1 incomplete block, i.e. 50%.
> I'm not sure how to fix it or even whether it should be fixed but wanted to 
> open an issue to track this.
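
The arithmetic in the description can be reproduced with a small sketch. It 
is not Lucene code: the function names are made up for illustration, and it 
assumes, as the description does, that exactly one block per segment (the 
last one) is incomplete. The 1% limit is the figure quoted above.

```python
def incomplete_block_ratio(stored_bytes: int, block_bytes: int) -> float:
    """Fraction of blocks in one segment that are incomplete.

    Simplified model taken from the issue description: the segment is
    split into ceil(stored_bytes / block_bytes) blocks, and only the
    last block is counted as incomplete.
    """
    total_blocks = max(1, -(-stored_bytes // block_bytes))  # ceiling division
    return 1 / total_blocks


def within_incomplete_limit(ratio: float, limit: float = 0.01) -> bool:
    """Hypothetical check: is the ratio below the 1% limit from the description?"""
    return ratio < limit


KB = 1024
segment_bytes = 1024 * KB  # 1MB of stored fields

for block_kb in (16, 512):
    ratio = incomplete_block_ratio(segment_bytes, block_kb * KB)
    print(f"{block_kb}kB blocks: {ratio:.1%} incomplete, "
          f"within limit: {within_incomplete_limit(ratio)}")
```

Under this model a 1MB segment is over the 1% limit with either block size, 
but the larger block size pushes the ratio from about 1.6% up to 50%.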
