[jira] [Commented] (LUCENE-9529) Larger stored fields block sizes mean we're more likely to disable optimized bulk merging

ASF subversion and git services (Jira) Thu, 17 Sep 2020 10:12:26 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197844#comment-17197844 ]


ASF subversion and git services commented on LUCENE-9529:
---------------------------------------------------------

Commit 830bd186a8d72ce6cc96f2856c269ef02e98d3c5 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=830bd18 ]

LUCENE-9529: Track dirtiness of stored fields via a number of docs, not chunks. 
(#1882)

The problem with tracking dirtiness via the number of chunks is that larger
chunks make stored fields readers more likely to be considered dirty, so
I'm trying to work around it by tracking the number of docs instead.
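
To illustrate the difference between the two ways of counting, here is a
minimal sketch in Python. It is not the code from the commit: the class, the
field names, the helper functions and the sample numbers are all hypothetical.
It assumes that "dirty docs" means docs stored in incomplete chunks, and that
the 1% threshold mentioned in the issue description is applied to whichever
counter is used.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Hypothetical per-segment counters (names are illustrative only)."""
    num_chunks: int        # total stored fields chunks in the segment
    num_dirty_chunks: int  # chunks that were flushed while incomplete
    num_docs: int          # total docs in the segment
    num_dirty_docs: int    # assumption: docs that live in incomplete chunks


def too_dirty_by_chunks(stats: SegmentStats, max_ratio: float = 0.01) -> bool:
    # Chunk-based heuristic: with large chunks there are few chunks overall,
    # so a single incomplete chunk already exceeds the threshold.
    return stats.num_dirty_chunks > max_ratio * stats.num_chunks


def too_dirty_by_docs(stats: SegmentStats, max_ratio: float = 0.01) -> bool:
    # Doc-based heuristic: the result depends on how many docs sit in
    # incomplete chunks, not on how many chunks the segment has.
    return stats.num_dirty_docs > max_ratio * stats.num_docs


# Made-up numbers: 10 large chunks, 1 of them incomplete and holding only
# 40 of the segment's 10,000 docs.
example = SegmentStats(num_chunks=10, num_dirty_chunks=1,
                       num_docs=10_000, num_dirty_docs=40)

print(too_dirty_by_chunks(example))
print(too_dirty_by_docs(example))
```

With these made-up numbers the chunk-based check flags the segment as dirty
(1 of 10 chunks is 10%), while the doc-based check does not (40 of 10,000
docs is 0.4%).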


> Larger stored fields block sizes mean we're more likely to disable optimized 
> bulk merging
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9529
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9529
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Whenever possible when merging stored fields, Lucene tries to copy the 
> compressed data instead of decompressing it from the source segment and then 
> re-compressing it in the destination segment. A problem with this approach is 
> that if some blocks are incomplete (typically the last block of a segment) 
> then they remain incomplete in the destination segment too, and if we do this 
> for too long we end up with a bad compression ratio. So Lucene keeps track of 
> these incomplete blocks, and makes sure to keep the ratio of incomplete blocks 
> below 1%.
> But as we increased the block size, it has become more likely to have a high 
> ratio of incomplete blocks. E.g. if you have a segment with 1MB of stored 
> fields, with 16kB blocks like before, you have 63 complete blocks and 1 
> incomplete block, or 1.6%. But now with ~512kB blocks, you have 1 complete 
> block and 1 incomplete block, i.e. 50%.
> I'm not sure how to fix it or even whether it should be fixed, but I wanted to 
> open an issue to track this.
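
The arithmetic in the description above can be checked with a short Python
sketch. The function name is hypothetical, and the sketch assumes that only
the last block of a segment can be incomplete and that the segment is just
under 1MB (1023kB here), which is what yields 63 complete blocks plus 1
incomplete block at 16kB.

```python
def incomplete_block_ratio(segment_bytes: int, block_bytes: int) -> float:
    """Fraction of blocks that are incomplete.

    Assumption: only the trailing block can be incomplete, i.e. the segment
    is a run of full blocks followed by at most one partial block.
    """
    complete, remainder = divmod(segment_bytes, block_bytes)
    incomplete = 1 if remainder else 0
    total = complete + incomplete
    return incomplete / total if total else 0.0


KB = 1024
segment = 1023 * KB  # assumed size, just under 1MB

small_blocks = incomplete_block_ratio(segment, 16 * KB)   # 63 complete + 1 incomplete
large_blocks = incomplete_block_ratio(segment, 512 * KB)  # 1 complete + 1 incomplete

print(f"16kB blocks:  {small_blocks:.1%}")
print(f"512kB blocks: {large_blocks:.1%}")
```

Under these assumptions the ratio is 1/64 (about 1.6%) with 16kB blocks and
1/2 (50%) with 512kB blocks, matching the figures quoted in the description.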



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
