[jira] [Comment Edited] (LUCENE-4560) Support Filtering Segments During Merge

2012-11-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500073#comment-13500073
 ] 

Uwe Schindler edited comment on LUCENE-4560 at 11/19/12 7:43 AM:
-

We had something similar in the past (called PayloadProcessor), which was 
removed completely in 4.0 (without replacement). The reason was that the same 
thing can be implemented inside a FilterAtomicReader and used with 
IW#addIndexes(IndexReader...). I agree with Shai that this should be enough 
for most cases, especially as gradually merging segments can corrupt your 
index if you have an error.

If you really want to merge in-place:
Your patch has nice ideas from my perspective, only the wrapping should be 
done in the MP and not on IndexWriter level (the number of settings in IWConfig 
is already too big). So the main thing that needs to be done here is:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper-MergePolicy like 
UpgradeIndexMergePolicy, that wraps the AtomicReaders when creating the 
MergePolicy.OneMerge instances.

Another possible approach *without modification in Lucene core* is:
- open IndexWriter
- get NRT Reader and wrap with one or more FilterAtomicReader
- delete the old segments manually (e.g. by deleting all documents)
- addIndexes the filtered segments
- start final maybeMerge()
- commit

Uwe

  was (Author: thetaphi):
We had something similar in the past (called PayloadProcessor), which was 
removed completely in 4.0 (without replacement). The reason was that the same 
thing can be implemented inside a FilterAtomicReader and used with 
IW#addIndexes(IndexReader...). I agree with Shai that this should be enough 
for most cases, especially as gradually merging segments can corrupt your 
index if you have an error.

If you really want to merge in-place:
Your patch has nice ideas from my perspective, only the wrapping should be 
done in the MP and not on IndexWriter level (the number of settings in IWConfig 
is already too big). So the main thing that needs to be done here is:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper-MergePolicy like 
UpgradeIndexMergePolicy, that wraps the AtomicReaders when creating the 
MergePolicy.OneMerge instances.

Another possible approach *without modification in Lucene core* is:
- open IndexWriter
- get NRT Reader and wrap with one or more FilterAtomicReader
- addIndexes the filtered segments
- delete the old segments manually (e.g. by deleting all documents)
- start final maybeMerge()
- commit

Uwe
  
 Support Filtering Segments During Merge
 ---

 Key: LUCENE-4560
 URL: https://issues.apache.org/jira/browse/LUCENE-4560
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Tim Smith
 Attachments: LUCENE-4560.patch


 Spun off from LUCENE-4557
 It is desirable to be able to filter segments during merge.
 Most often, full reindex of content is not possible.
 Merging segments can sometimes have negative consequences when fields 
 have different options (the most restrictive option is forced during merge).
 Being able to filter segments during merges will allow gradually migrating 
 indexed data to new index settings, and will support gradually 
 pruning/enhancing existing data.
 Use Cases:
 * Migrate IndexOptions for fields (See LUCENE-4557)
 * Gradually Remove index fields no longer used
 * Migrate indexed sort fields to DocValues
 * Support converting data types for indexed data
 * and so on
 patch will be forthcoming

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4560) Support Filtering Segments During Merge

2012-11-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500073#comment-13500073
 ] 

Uwe Schindler edited comment on LUCENE-4560 at 11/19/12 7:48 AM:
-

We had something similar in the past (called PayloadProcessor), which was 
removed completely in 4.0 (without replacement). The reason was that the same 
thing can be implemented inside a FilterAtomicReader and used with 
IW#addIndexes(IndexReader...). I agree with Shai that this should be enough 
for most cases, especially as gradually merging segments can corrupt your 
index if you have an error.

If you really want to merge in-place:
Your patch has nice ideas from my perspective, only the wrapping should be 
done in the MP and not on IndexWriter level (the number of settings in IWConfig 
is already too big). So the main thing that needs to be done here is:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper-MergePolicy like 
UpgradeIndexMergePolicy, that wraps the AtomicReaders when creating the 
MergePolicy.OneMerge instances.
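
To make the wrapper-MergePolicy idea concrete, here is a minimal sketch of the 
delegation pattern. It does not use the Lucene API: SegmentReaderModel, 
OneMergeModel, MergePolicyModel and FilteringMergePolicyModel are hypothetical 
stand-ins, and the sketch assumes the merge object already carries the readers 
to be merged (the first bullet above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a per-segment reader; NOT Lucene's AtomicReader.
interface SegmentReaderModel {
    List<String> fieldNames();
}

// Hypothetical stand-in for MergePolicy.OneMerge once it holds its readers.
final class OneMergeModel {
    final List<SegmentReaderModel> readers;

    OneMergeModel(List<SegmentReaderModel> readers) {
        this.readers = readers;
    }
}

// Hypothetical stand-in for MergePolicy: selects which merges to run.
interface MergePolicyModel {
    List<OneMergeModel> findMerges(List<SegmentReaderModel> segments);
}

// Wrapper policy: delegates merge selection to another policy, then wraps
// every reader of every selected merge before handing the merges back.
final class FilteringMergePolicyModel implements MergePolicyModel {
    private final MergePolicyModel delegate;
    private final UnaryOperator<SegmentReaderModel> wrapper;

    FilteringMergePolicyModel(MergePolicyModel delegate,
                              UnaryOperator<SegmentReaderModel> wrapper) {
        this.delegate = delegate;
        this.wrapper = wrapper;
    }

    @Override
    public List<OneMergeModel> findMerges(List<SegmentReaderModel> segments) {
        List<OneMergeModel> wrapped = new ArrayList<>();
        for (OneMergeModel merge : delegate.findMerges(segments)) {
            List<SegmentReaderModel> readers = new ArrayList<>();
            for (SegmentReaderModel reader : merge.readers) {
                readers.add(wrapper.apply(reader));
            }
            wrapped.add(new OneMergeModel(readers));
        }
        return wrapped;
    }
}
```

The point of the design is that the filtering lives entirely in the policy 
wrapper, so no extra setting is needed on the writer configuration.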

Another possible approach *without modification in Lucene core* is:
- open IndexWriter
- get NRT Reader and wrap with one or more FilterAtomicReader, or leave as-is, 
if no upgrade is needed.
- delete the old segments manually (e.g. by deleting all documents)
- addIndexes the filtered segments (optionally one-by-one, so it will not merge 
all atomic readers into one new segment)
- commit
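
The order of these steps can be illustrated with a small in-memory model. This 
is only a sketch under stated assumptions: IndexModel and FilteredReindex are 
hypothetical stand-ins and not Lucene classes, a segment is modelled as a list 
of field-to-value maps, and the "filter" simply drops one field. The model 
relies on the snapshot staying readable after the delete, which mirrors the 
point-in-time view of an open reader.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index and its writer.
final class IndexModel {
    private final List<List<Map<String, String>>> live = new ArrayList<>();
    private List<List<Map<String, String>>> committed = new ArrayList<>();

    // Point-in-time view of the current segments (models the NRT reader).
    List<List<Map<String, String>>> openSnapshot() {
        return new ArrayList<>(live);
    }

    void deleteAll() {
        live.clear();
    }

    void addSegment(List<Map<String, String>> segment) {
        live.add(segment);
    }

    void commit() {
        committed = new ArrayList<>(live);
    }

    List<List<Map<String, String>>> committedSegments() {
        return committed;
    }
}

final class FilteredReindex {
    // Models the filter reader: a copy of the segment without the given field.
    static List<Map<String, String>> dropField(List<Map<String, String>> segment,
                                               String field) {
        List<Map<String, String>> filtered = new ArrayList<>();
        for (Map<String, String> doc : segment) {
            Map<String, String> copy = new LinkedHashMap<>(doc);
            copy.remove(field);
            filtered.add(copy);
        }
        return filtered;
    }

    static void run(IndexModel index, String obsoleteField) {
        // Take the snapshot before deleting, so the old data stays readable.
        List<List<Map<String, String>>> snapshot = index.openSnapshot();
        // Drop the old segments.
        index.deleteAll();
        // Re-add the filtered segments one by one, keeping segment boundaries.
        for (List<Map<String, String>> segment : snapshot) {
            index.addSegment(dropField(segment, obsoleteField));
        }
        // Nothing becomes visible in the committed state until this point.
        index.commit();
    }
}
```

Because the delete and the re-add are only published by the final commit, a 
failure part-way through this model leaves the previously committed segments 
untouched.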

Uwe

  was (Author: thetaphi):
We had something similar in the past (called PayloadProcessor), which was 
removed completely in 4.0 (without replacement). The reason was that the same 
thing can be implemented inside a FilterAtomicReader and used with 
IW#addIndexes(IndexReader...). I agree with Shai that this should be enough 
for most cases, especially as gradually merging segments can corrupt your 
index if you have an error.

If you really want to merge in-place:
Your patch has nice ideas from my perspective, only the wrapping should be 
done in the MP and not on IndexWriter level (the number of settings in IWConfig 
is already too big). So the main thing that needs to be done here is:
- Move the AtomicReader instances into MergePolicy.OneMerge
- As a result, you need to implement a custom wrapper-MergePolicy like 
UpgradeIndexMergePolicy, that wraps the AtomicReaders when creating the 
MergePolicy.OneMerge instances.

Another possible approach *without modification in Lucene core* is:
- open IndexWriter
- get NRT Reader and wrap with one or more FilterAtomicReader
- delete the old segments manually (e.g. by deleting all documents)
- addIndexes the filtered segments
- start final maybeMerge()
- commit

Uwe
  



[jira] [Comment Edited] (LUCENE-4560) Support Filtering Segments During Merge

2012-11-15 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498087#comment-13498087
 ] 

Tim Smith edited comment on LUCENE-4560 at 11/15/12 4:03 PM:
-

Attaching patch

patch adds MergedSegmentFilter base class and adds config setter akin to 
IndexReaderWarmer on IndexWriterConfig (by all means, suggest better names)

SegmentMerger will use this (if specified) to filter any segments being merged

Test case included that uses filter to remove an indexed field during merge.
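
For readers without the patch at hand, the general shape of such a hook might 
look like the sketch below. Everything in it is an assumption made for 
illustration: SegmentFilterSketch, WriterConfigSketch and SegmentMergerSketch 
are hypothetical names, a segment is reduced to its list of field names, and 
the real MergedSegmentFilter signature in LUCENE-4560.patch may well differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of the filter hook; not the class from the patch.
abstract class SegmentFilterSketch {
    // Receives the field names of one segment about to be merged and
    // returns the field names that should survive the merge.
    abstract List<String> filter(List<String> segmentFields);
}

// Hypothetical stand-in for the writer configuration carrying the hook.
final class WriterConfigSketch {
    private SegmentFilterSketch mergedSegmentFilter; // null means no filtering

    WriterConfigSketch setMergedSegmentFilter(SegmentFilterSketch filter) {
        this.mergedSegmentFilter = filter;
        return this;
    }

    SegmentFilterSketch getMergedSegmentFilter() {
        return mergedSegmentFilter;
    }
}

// Hypothetical stand-in for the merger: filters each segment only if a
// filter was configured, then unions the remaining field names in order.
final class SegmentMergerSketch {
    static List<String> merge(WriterConfigSketch config, List<List<String>> segments) {
        SegmentFilterSketch filter = config.getMergedSegmentFilter();
        List<String> mergedFields = new ArrayList<>();
        for (List<String> segment : segments) {
            List<String> fields = (filter == null) ? segment : filter.filter(segment);
            for (String field : fields) {
                if (!mergedFields.contains(field)) {
                    mergedFields.add(field);
                }
            }
        }
        return mergedFields;
    }
}
```

A filter that returns its input minus one field name corresponds to the 
included test case, which removes an indexed field during merge.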





  was (Author: tsmith):
Attaching patch

patch adds MergeSegmentFilter base class and adds config setter akin to 
IndexReaderWarmer on IndexWriterConfig (by all means, suggest better names)

SegmentMerger will use this (if specified) to filter any segments being merged

Test case included that uses filter to remove an indexed field during merge.




  