[ 
https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500073#comment-13500073
 ] 

Uwe Schindler edited comment on LUCENE-4560 at 11/19/12 7:43 AM:
-----------------------------------------------------------------

We had something similar in the past (called PayloadProcessor), which was 
removed completely in 4.0 (without "replacement"). The reason was that the 
same processing can be implemented inside a FilterAtomicReader and used with 
IW#addIndexes(IndexReader...). I agree with Shai that this should be enough 
for most cases, especially as gradually merging segments through a filter can 
corrupt your index if the filter has an error.
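To make the FilterAtomicReader + addIndexes idea concrete, here is a toy, stdlib-only Java model of the decorator pattern involved. `SegmentView`, `MemorySegment`, `FilterSegmentView`, `DropFieldsView` and `ToyWriter` are hypothetical stand-ins invented for illustration, not Lucene classes; a real implementation would instead subclass FilterAtomicReader, override the accessors it needs, and pass the wrapped readers to IndexWriter.addIndexes(IndexReader...).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types are Lucene classes.
// A "segment view" exposes the stored fields of each document by doc id.
interface SegmentView {
    int maxDoc();
    Map<String, String> document(int docId);
}

// Plain in-memory segment acting as the underlying data.
final class MemorySegment implements SegmentView {
    private final List<Map<String, String>> docs;
    MemorySegment(List<Map<String, String>> docs) { this.docs = docs; }
    public int maxDoc() { return docs.size(); }
    public Map<String, String> document(int docId) { return docs.get(docId); }
}

// Decorator in the spirit of FilterAtomicReader: forwards every call to the
// wrapped view; subclasses override only what they want to change.
class FilterSegmentView implements SegmentView {
    protected final SegmentView in;
    FilterSegmentView(SegmentView in) { this.in = in; }
    public int maxDoc() { return in.maxDoc(); }
    public Map<String, String> document(int docId) { return in.document(docId); }
}

// Example filter: hides a set of fields (e.g. fields no longer used).
final class DropFieldsView extends FilterSegmentView {
    private final Set<String> dropped;
    DropFieldsView(SegmentView in, Set<String> dropped) {
        super(in);
        this.dropped = dropped;
    }
    @Override
    public Map<String, String> document(int docId) {
        // Copy first so the wrapped (source) data is never modified.
        Map<String, String> copy = new LinkedHashMap<>(in.document(docId));
        copy.keySet().removeAll(dropped);
        return copy;
    }
}

// Stand-in for IW#addIndexes(IndexReader...): copies whatever the
// (possibly filtered) views expose into the target "index".
final class ToyWriter {
    final List<Map<String, String>> docs = new ArrayList<>();
    void addIndexes(SegmentView... views) {
        for (SegmentView v : views) {
            for (int i = 0; i < v.maxDoc(); i++) {
                docs.add(v.document(i));
            }
        }
    }
}
```

Because the filter sits between the source data and the writer, the source segments are never touched; only the copy written by addIndexes differs.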

If you really want to merge in-place:
Your patch has nice ideas from my perspective, but the "wrapping" should be 
done in the MergePolicy (MP), not at the IndexWriter level (IndexWriterConfig 
already has too many settings). So the main things that need to be done here are:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper MergePolicy, like 
UpgradeIndexMergePolicy, that wraps the AtomicReaders when creating the 
MergePolicy.OneMerge instances.
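The proposed split of responsibilities (the delegate policy picks which segments merge, the wrapper only swaps in filtered readers) can be modeled in a few lines. This is a hedged, self-contained toy: `Segment`, `OneMergeModel`, `MergePolicyModel`, `MergeEverything` and `FilteringMergePolicy` are hypothetical names, and a merge object that carries its input readers reflects the change being proposed here, not the existing Lucene 4.0 API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model only, not Lucene API: a "segment" is a list of stored-field maps.
final class Segment {
    final List<Map<String, String>> docs;
    Segment(List<Map<String, String>> docs) { this.docs = docs; }
}

// Hypothetical analogue of MergePolicy.OneMerge that carries its input
// readers, as proposed in the comment.
final class OneMergeModel {
    final List<Segment> readers;
    OneMergeModel(List<Segment> readers) { this.readers = readers; }

    // "Executes" the merge by concatenating whatever the readers expose.
    Segment run() {
        List<Map<String, String>> out = new ArrayList<>();
        for (Segment s : readers) out.addAll(s.docs);
        return new Segment(out);
    }
}

interface MergePolicyModel {
    List<OneMergeModel> findMerges(List<Segment> segments);
}

// Simplest possible delegate policy: merge all segments into one.
final class MergeEverything implements MergePolicyModel {
    public List<OneMergeModel> findMerges(List<Segment> segments) {
        return List.of(new OneMergeModel(new ArrayList<>(segments)));
    }
}

// Wrapper policy in the spirit of UpgradeIndexMergePolicy: the delegate
// decides WHICH segments merge; the wrapper only replaces each input
// reader with a filtered view when building the merge.
final class FilteringMergePolicy implements MergePolicyModel {
    private final MergePolicyModel delegate;
    private final UnaryOperator<Segment> wrap;

    FilteringMergePolicy(MergePolicyModel delegate, UnaryOperator<Segment> wrap) {
        this.delegate = delegate;
        this.wrap = wrap;
    }

    public List<OneMergeModel> findMerges(List<Segment> segments) {
        List<OneMergeModel> result = new ArrayList<>();
        for (OneMergeModel m : delegate.findMerges(segments)) {
            List<Segment> wrapped = new ArrayList<>();
            for (Segment s : m.readers) wrapped.add(wrap.apply(s));
            result.add(new OneMergeModel(wrapped));
        }
        return result;
    }

    // Example filter: returns a copy of the segment without one field.
    static UnaryOperator<Segment> dropField(String field) {
        return s -> {
            List<Map<String, String>> docs = new ArrayList<>();
            for (Map<String, String> d : s.docs) {
                Map<String, String> copy = new LinkedHashMap<>(d);
                copy.remove(field);
                docs.add(copy);
            }
            return new Segment(docs);
        };
    }
}
```

Keeping the filter in a wrapper policy lets the segment-selection logic of any existing MergePolicy be reused unchanged, without adding another IndexWriterConfig setting.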

Another possible approach *without modification in Lucene core* is:
- open an IndexWriter
- get an NRT reader and wrap it with one or more FilterAtomicReaders
- delete the old segments manually (e.g. by deleting all documents)
- call addIndexes() with the filtered segments
- run a final maybeMerge()
- commit
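The core-only recipe can be simulated with a toy writer to check the ordering. `ToyIndexWriter` and its methods are hypothetical stand-ins named after the IndexWriter calls in the list (in Lucene the snapshot would be an NRT DirectoryReader). The key assumption, matching point-in-time reader semantics, is that a reader opened before the delete still sees the old documents, so they can be filtered and re-added afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model of the recipe. ToyIndexWriter is NOT Lucene's IndexWriter;
// a "segment" is just an immutable list of stored-field maps.
final class ToyIndexWriter {
    private List<List<Map<String, String>>> segments = new ArrayList<>();
    private List<List<Map<String, String>>> committed = List.of();

    void addSegment(List<Map<String, String>> docs) {
        segments.add(List.copyOf(docs));
    }

    // Like an NRT reader: a point-in-time snapshot of the current segments.
    List<List<Map<String, String>>> getReader() {
        return List.copyOf(segments);
    }

    void deleteAll() {
        segments = new ArrayList<>();
    }

    void addIndexes(List<List<Map<String, String>>> readers) {
        for (List<Map<String, String>> r : readers) segments.add(List.copyOf(r));
    }

    // Toy merge: collapse every segment into a single one.
    void maybeMerge() {
        List<Map<String, String>> merged = new ArrayList<>();
        for (List<Map<String, String>> s : segments) merged.addAll(s);
        segments = new ArrayList<>();
        segments.add(List.copyOf(merged));
    }

    void commit() {
        committed = List.copyOf(segments);
    }

    List<List<Map<String, String>>> committed() {
        return committed;
    }

    // Hypothetical filter standing in for a FilterAtomicReader: drops a field.
    static List<Map<String, String>> dropField(List<Map<String, String>> seg, String field) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : seg) {
            Map<String, String> copy = new LinkedHashMap<>(doc);
            copy.remove(field);
            out.add(copy);
        }
        return out;
    }

    // Steps 2-6 of the recipe (step 1, "open IndexWriter", is constructing w).
    static void rewrite(ToyIndexWriter w, UnaryOperator<List<Map<String, String>>> filter) {
        List<List<Map<String, String>>> snapshot = w.getReader(); // 2. NRT-style reader
        w.deleteAll();                                            // 3. drop old segments
        List<List<Map<String, String>>> filtered = new ArrayList<>();
        for (List<Map<String, String>> seg : snapshot) {
            filtered.add(filter.apply(seg)); // snapshot still holds the pre-delete docs
        }
        w.addIndexes(filtered);                                   // 4. re-add filtered data
        w.maybeMerge();                                           // 5. final merge
        w.commit();                                               // 6. publish the result
    }
}
```

Note the ordering: deleting all documents after addIndexes would also remove the filtered copies that were just added.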

Uwe
                
      was (Author: thetaphi):
    We had something similar in the past (called PayloadProcessor), which was 
removed completely in 4.0 (without "replacement"). The reason was that the 
stuff can be implemented inside a FilterAtomicReader and used with 
IW#addIndexes(IndexReader...). I agree with Shai that this should be enough 
for most cases, especially as gradually merging segments can corrupt your 
index if you have an error.

If you really want to merge in-place:
Your patch has nice ideas from my perspective, but the "wrapping" should be 
done in the MP and not at the IndexWriter level (the number of settings in IWConfig 
is already too big). So the main things that need to be done here are:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper MergePolicy, like 
UpgradeIndexMergePolicy, that wraps the AtomicReaders when creating the 
MergePolicy.OneMerge instances.

Another possible approach *without modification in Lucene core* is:
- open an IndexWriter
- get an NRT reader and wrap it with one or more FilterAtomicReaders
- call addIndexes() with the filtered segments
- delete the old segments manually (e.g. by deleting all documents)
- run a final maybeMerge()
- commit

Uwe
                  
> Support Filtering Segments During Merge
> ---------------------------------------
>
>                 Key: LUCENE-4560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4560
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
>
>
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merges.
> In most cases, a full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields 
> have different options (the most restrictive option is forced during the merge).
> Being able to filter segments during merges would allow gradually migrating 
> indexed data to new index settings, and gradually pruning/enhancing existing 
> data.
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually remove indexed fields that are no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> A patch will be forthcoming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
