[jira] [Commented] (LUCENE-4560) Support Filtering Segments During Merge

Shai Erera (JIRA) Sun, 18 Nov 2012 11:02:59 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499876#comment-13499876
 ]


Shai Erera commented on LUCENE-4560:
------------------------------------

Thinking about this some more, I really don't think it's a 'gradual' thing that 
you do to the index:

* Depending on the state of the index, this migration may never happen for 
some segments, typically very large segments that are no longer picked for 
merge. So you'll end up with code in your app that is never invoked after some 
time ... not a good sign to me.

* I won't want to have code in my app that lives there forever. Rather, I'd 
like to make a decision to remove field 'foo', run the process which removes it 
once, and be done with it, moving the code to some "tools" area that is never 
run again.
** With your approach, RemoveFieldReader will not go away unless you can 
guarantee it ran on all segments, which effectively means forcing 
forceMerge(1) to run (note: it may not do what you want, depending on the 
MergePolicy settings!), and that is really just like addIndexes.
** Worse, today it's RemoveFieldReader, and tomorrow it will turn into 
RemoveFieldAndMigrateIndexOptionsReader, because as I wrote above, you cannot 
stop running that code if you cannot ensure that all segments have been 
migrated.
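
To make the first bullet concrete, here is a toy model in plain Java. It is 
not Lucene code: Segment, mergeOnce, rewriteAll and the 1000 MB cutoff are all 
hypothetical stand-ins, and the "merge policy" is deliberately simplistic (it 
merges every segment below the cutoff whenever there are at least two of 
them). It only illustrates that a segment which is too large to be selected 
never passes through the filtering reader, while a one-time pass over all 
segments touches every one of them.

```java
import java.util.ArrayList;
import java.util.List;

public class GradualMigrationSketch {

    // Hypothetical stand-in for a segment: its size and whether the
    // field rewrite (e.g. removing field 'foo') has been applied to it.
    static final class Segment {
        final long sizeMb;
        final boolean migrated;

        Segment(long sizeMb, boolean migrated) {
            this.sizeMb = sizeMb;
            this.migrated = migrated;
        }
    }

    // Toy merge step (an assumption, not a real MergePolicy): segments below
    // maxMergeMb are merged into one segment, and only merged segments pass
    // through the filter. Segments at or above the cutoff are never selected.
    static List<Segment> mergeOnce(List<Segment> segments, long maxMergeMb) {
        List<Segment> kept = new ArrayList<>();
        List<Segment> candidates = new ArrayList<>();
        for (Segment s : segments) {
            if (s.sizeMb < maxMergeMb) {
                candidates.add(s);
            } else {
                kept.add(s);
            }
        }
        if (candidates.size() < 2) {
            return new ArrayList<>(segments); // nothing worth merging
        }
        long total = 0;
        for (Segment c : candidates) {
            total += c.sizeMb;
        }
        kept.add(new Segment(total, true)); // merged output was filtered
        return kept;
    }

    // One-time, addIndexes-style pass: every segment is rewritten once.
    static List<Segment> rewriteAll(List<Segment> segments) {
        List<Segment> out = new ArrayList<>();
        for (Segment s : segments) {
            out.add(new Segment(s.sizeMb, true));
        }
        return out;
    }

    static long countUnmigrated(List<Segment> segments) {
        long n = 0;
        for (Segment s : segments) {
            if (!s.migrated) {
                n++;
            }
        }
        return n;
    }

    // Runs ten rounds of "flush a new segment, then merge" on a sample index.
    static List<Segment> runGradual(List<Segment> index, long maxMergeMb) {
        List<Segment> current = new ArrayList<>(index);
        for (int round = 0; round < 10; round++) {
            current = mergeOnce(current, maxMergeMb);
            // Newly flushed segments are written with the new settings.
            current.add(new Segment(10, true));
        }
        return current;
    }

    static List<Segment> sampleIndex() {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment(5000, false)); // too large to be picked again
        index.add(new Segment(40, false));
        index.add(new Segment(30, false));
        index.add(new Segment(10, false));
        return index;
    }

    public static void main(String[] args) {
        List<Segment> index = sampleIndex();
        System.out.println("unmigrated after gradual merges: "
                + countUnmigrated(runGradual(index, 1000)));
        System.out.println("unmigrated after one-time rewrite: "
                + countUnmigrated(rewriteAll(index)));
    }
}
```

In this model the 5000 MB segment stays unmigrated no matter how many merge 
rounds run, so the filtering code can never be retired. The one-time rewrite 
leaves nothing unmigrated.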

So I'm beginning to think that this process should not be an 
incremental/gradual/online thing, but rather an addIndexes type of process 
that you run once and know that you're done with, until the next time you 
need to rewrite the index without actually re-indexing the content.

BTW, did you take a look at LUCENE-2632? It is about adding a FilteringCodec 
which filters the data that it writes/reads. Could it help you here? If so, I 
think it has a better chance of getting committed than the approach in this 
issue (Codecs are already an extension point...).
                
> Support Filtering Segments During Merge
> ---------------------------------------
>
>                 Key: LUCENE-4560
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4560
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Smith
>         Attachments: LUCENE-4560.patch
>
>
> Spun off from LUCENE-4557
> It is desirable to be able to filter segments during merge.
> Most often, full reindex of content is not possible.
> Merging segments can sometimes have negative consequences when fields 
> have different options (most restrictive option is forced during merge)
> Being able to filter segments during merges will allow gradually migrating 
> indexed data to new index settings, support pruning/enhancing existing data 
> gradually
> Use Cases:
> * Migrate IndexOptions for fields (See LUCENE-4557)
> * Gradually Remove index fields no longer used
> * Migrate indexed sort fields to DocValues
> * Support converting data types for indexed data
> * and so on
> patch will be forthcoming

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
