[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?

Simon Willnauer (Jira) Mon, 22 Jun 2020 01:21:25 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141811#comment-17141811 ]


Simon Willnauer commented on LUCENE-8962:
-----------------------------------------

{quote}
I'm surprised to see that 
IndexWriterConfig.DEFAULT_MAX_COMMIT_MERGE_WAIT_SECONDS is 0. After all, the 
default is not to merge on commit so if someone configured a merge policy to do 
this, shouldn't we wait? I think by default we should wait indefinitely. It's 
up to the developer/configure-er of the merge policy to find cheap merges (if 
any). Keeping it at 0 creates a gotcha for yet another setting that's required 
to merge on commit.
{quote}

I don't think we should. It's such an expert setting, and unless we implement it 
ourselves I think we should keep it that way. If we made it non-zero by default, 
I'd assume that our merge policies implement it and that I can use it right 
away. If we decide to do this in the future, we can keep the default and 
implement that method in our merge policies without enabling the feature. If we 
set it to non-zero, that option goes away.
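A toy Java model of the trade-off above, under stated assumptions: segments are represented only by their sizes, and every name here (`ToyMergePolicy`, `findCommitMerges`, `SmallSegmentPolicy`, `CommitMergeDefaults`, `DEFAULT_MAX_WAIT_MILLIS`) is hypothetical, not Lucene's actual API. It shows that with a default wait of 0, a policy can start proposing commit-time merges without switching the feature on for anyone who has not also configured a non-zero wait.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Lucene's API: a "segment" is just its size in bytes.
interface ToyMergePolicy {
    // Hypothetical commit-time hook. The default proposes no merges,
    // so existing policies keep their current behavior.
    default List<Long> findCommitMerges(List<Long> segmentSizes) {
        return List.of();
    }
}

// Hypothetical policy that opts in by proposing every segment below a size threshold.
final class SmallSegmentPolicy implements ToyMergePolicy {
    private final long maxSegmentBytes;

    SmallSegmentPolicy(long maxSegmentBytes) {
        this.maxSegmentBytes = maxSegmentBytes;
    }

    @Override
    public List<Long> findCommitMerges(List<Long> segmentSizes) {
        List<Long> small = new ArrayList<>();
        for (long size : segmentSizes) {
            if (size < maxSegmentBytes) {
                small.add(size);
            }
        }
        // Merging a single segment with nothing else is pointless.
        return small.size() >= 2 ? small : List.of();
    }
}

final class CommitMergeDefaults {
    // Hypothetical setting mirroring the discussion: 0 means "do not wait at all".
    static final long DEFAULT_MAX_WAIT_MILLIS = 0;

    // Commit-time merging only takes effect when the user opted in with a
    // non-zero wait AND the policy actually proposes a merge.
    static boolean mergesOnCommit(ToyMergePolicy policy, List<Long> sizes, long maxWaitMillis) {
        return maxWaitMillis > 0 && !policy.findCommitMerges(sizes).isEmpty();
    }
}
```

In this model the two decisions stay independent: a built-in policy can gain a `findCommitMerges` implementation while the default wait keeps the feature off, whereas a non-zero default would turn it on as soon as any policy implemented the hook.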

> Can we merge small segments during refresh, for faster searching?
> -----------------------------------------------------------------
>
>                 Key: LUCENE-8962
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8962
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Priority: Major
>             Fix For: 8.6
>
>         Attachments: LUCENE-8962_demo.png, failed-tests.patch
>
>          Time Spent: 18.5h
>  Remaining Estimate: 0h
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory 
> segments to disk and open an {{IndexReader}} to search them, and this is 
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} 
> will write many small segments during {{refresh}}, and this then 
> adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if 
> given a little time ... so, could we somehow improve {{IndexWriter}}'s 
> refresh to optionally kick off the merge policy to merge segments below some 
> threshold before opening the near-real-time reader?  It'd be a bit tricky 
> because while we are waiting for merges, indexing may continue, and new 
> segments may be flushed, but those new segments shouldn't be included in the 
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, 
> and some hackity logic to have the merge policy target small segments just 
> written by refresh, but it's tricky to then open a near-real-time reader, 
> excluding newly flushed but including newly merged segments since the refresh 
> originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for 
> discussion!
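A minimal toy sketch of the point-in-time constraint described above, assuming segments are modeled as name-to-size entries and ignoring the actual merging of data, deletions, and locking. `PointInTimeMergeSketch`, `pointInTimeView`, and the segment names are hypothetical. The merge is planned only against a copy of the segment list taken when refresh starts, so a segment flushed by concurrent indexing afterward never appears in the returned view, while the newly merged segment does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the idea in the issue description, not Lucene code.
final class PointInTimeMergeSketch {
    // snapshot: segment name -> size in bytes, captured when refresh began.
    // Returns the reader's view: segments below thresholdBytes are replaced
    // by a single merged segment; all other snapshot segments are kept.
    static Map<String, Long> pointInTimeView(Map<String, Long> snapshot, long thresholdBytes, String mergedName) {
        List<String> small = new ArrayList<>();
        for (Map.Entry<String, Long> e : snapshot.entrySet()) {
            if (e.getValue() < thresholdBytes) {
                small.add(e.getKey());
            }
        }
        // Work on a copy so the caller's snapshot stays untouched.
        Map<String, Long> view = new LinkedHashMap<>(snapshot);
        if (small.size() < 2) {
            return view; // nothing worth merging
        }
        long mergedBytes = 0;
        for (String name : small) {
            mergedBytes += view.remove(name);
        }
        view.put(mergedName, mergedBytes);
        return view;
    }
}
```

The caller is responsible for taking the snapshot (for example, copying the live segment map) before concurrent indexing can flush new segments; that copy is what keeps newly flushed segments out of the view.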



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
