[ https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028222#comment-17028222 ]

Michael McCandless commented on LUCENE-8962:
--------------------------------------------

{quote}Yes, it's not "realistic", but my objective in this code snippet was 
merely to demonstrate that the _combination_ of a merge policy and a merge 
scheduler has the ability to affect the searchable segments on commit when 
using the NRT Reader/Searcher.
{quote}
OK, indeed you are right: that particular combination will in fact make the 
"merged after commit" segments visible to subsequent searchers.
{quote}Apparently it doesn't work if a normal (non-NRT) Reader/Searcher is 
opened; I can see that.  Maybe this is a shortcoming of IndexWriter; why 
shouldn't IW be consistent on this matter?
{quote}
I don't think this is a shortcoming of IW.  Rather, this is the salient 
difference between non-NRT and NRT readers: the latter see the "latest" 
in-memory segment changes in {{IndexWriter}}, while the former see only 
precisely what was last committed.
{quote}It's not apparent to me that we need a new method on the MergePolicy 
when the MergeTrigger parameter is able to differentiate the types of merges so 
that an MP is able to behave differently depending on the circumstance.  Am I 
unclear on this?
{quote}
Yeah, I think you are right!  That would be a nice simplification.  Probably 
this can just be folded into the existing {{MergePolicy}} API as a different 
{{MergeTrigger}}.  Though then I wonder why, e.g., {{forceMerge}} or 
{{expungeDeletes}} are not also simply different triggers ... [~msfroh] what do 
you think?
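To make the "just another trigger" idea concrete, here is a minimal, hypothetical sketch in plain Java. It is NOT Lucene's {{MergePolicy}} API: {{TriggerAwarePolicy}}, {{Trigger}}, {{Segment}} and {{findMerge}} are invented names, and the {{COMMIT}} value only stands in for "merge small segments on commit/refresh". It shows a single entry point that branches on the trigger instead of needing a separate method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of trigger-based merge selection.
// Not Lucene code: all names here are illustrative assumptions.
class TriggerAwarePolicy {

    // Hypothetical triggers; COMMIT stands in for "merge small segments on commit/refresh".
    enum Trigger { SEGMENT_FLUSH, COMMIT, EXPLICIT }

    static final class Segment {
        final String name;
        final long sizeBytes;

        Segment(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    // Segments strictly smaller than this are considered "small".
    private final long smallSegmentBytes;

    TriggerAwarePolicy(long smallSegmentBytes) {
        this.smallSegmentBytes = smallSegmentBytes;
    }

    /** Returns the names of segments to merge into one, or an empty list for "no merge". */
    List<String> findMerge(Trigger trigger, List<Segment> segments) {
        if (trigger != Trigger.COMMIT) {
            // A real policy would run its normal selection here; this sketch selects nothing.
            return Collections.<String>emptyList();
        }
        List<String> small = new ArrayList<>();
        for (Segment s : segments) {
            if (s.sizeBytes < smallSegmentBytes) {
                small.add(s.name);
            }
        }
        // Merging a single segment on its own gains nothing, so require at least two.
        return small.size() >= 2 ? small : Collections.<String>emptyList();
    }
}
```

With a 1 MB threshold and segments of 500 MB, 2 KB and 3 KB, the {{COMMIT}} trigger selects only the two tiny segments, while {{SEGMENT_FLUSH}} selects nothing.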

> Can we merge small segments during refresh, for faster searching?
> -----------------------------------------------------------------
>
>                 Key: LUCENE-8962
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8962
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Priority: Major
>         Attachments: LUCENE-8962_demo.png
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> With near-real-time search, we ask {{IndexWriter}} to write all in-memory 
> segments to disk and open an {{IndexReader}} to search them, and this is 
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} 
> will write many small segments during {{refresh}}, and this then 
> adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if 
> given a little time ... so, could we somehow improve {{IndexWriter}}'s 
> refresh to optionally ask the merge policy to merge segments below some 
> size threshold before opening the near-real-time reader?  It'd be a bit tricky 
> because while we are waiting for merges, indexing may continue, and new 
> segments may be flushed, but those new segments shouldn't be included in the 
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy 
> and some hackity logic to have the merge policy target the small segments just 
> written by refresh, but it's then tricky to open a near-real-time reader that 
> excludes segments newly flushed since the refresh originally finished while 
> including the newly merged ones ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for 
> discussion!
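The tricky part described above, exposing newly merged segments but not segments flushed after the refresh, can be modeled as a small set computation. This is a hypothetical sketch, not how {{IndexWriter}} implements it: {{RefreshMergeModel}}, {{Merge}} and {{pointInTimeView}} are invented names, and segments are reduced to their names. A completed merge is applied only if every one of its source segments belongs to the refresh's snapshot; a merge that consumed a later-flushed segment would leak newer documents, so the sketch ignores it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of "merge small segments during refresh" (not Lucene code).
class RefreshMergeModel {

    /** A completed merge: these source segments were replaced by one merged segment. */
    static final class Merge {
        final List<String> sources;
        final String merged;

        Merge(List<String> sources, String merged) {
            this.sources = sources;
            this.merged = merged;
        }
    }

    /**
     * Computes the point-in-time segment list a refresh should expose.
     * snapshot: segments written by the refresh's own flush.
     * completedMerges: merges that finished while the refresh waited.
     * Segments flushed after the snapshot are never added, so they stay invisible.
     */
    static List<String> pointInTimeView(List<String> snapshot, List<Merge> completedMerges) {
        Set<String> view = new LinkedHashSet<>(snapshot);
        for (Merge m : completedMerges) {
            if (view.containsAll(m.sources)) {
                // Swap the merged-away sources for the single merged segment.
                view.removeAll(m.sources);
                view.add(m.merged);
            }
            // Otherwise the merge consumed a segment outside the snapshot
            // (e.g. one flushed later), so exposing it would leak newer documents.
        }
        return new ArrayList<>(view);
    }
}
```

For a snapshot of {{_0.._3}}, a merge of {{_1, _2, _3}} into {{_5}} is exposed, while a merge of {{_0}} with a later-flushed {{_4}} is not, giving the view {{[_0, _5]}}.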



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
