[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430322#comment-13430322 ]
Sivan Yogev commented on LUCENE-4258:
-------------------------------------

Working on the details, it seems that we need to add a new layer of information for stacked segments. For each field that was added with REPLACE_FIELDS, we need to record the documents in which a replacement took place, together with the number of the latest generation that performed the replacement. Call this list the "generation vector". With it, the TermDocs that StackedSegmentReader provides for a given term is a special merge of that term's TermDocs across all stacked segments. The "special" part is that we ignore occurrences from documents in which the term's field was replaced in a later generation.

An example. Assume we have doc 1 with title "I love bananas" and doc 2 with title "I love oranges", and the segment is flushed. We get the following base segment (ignoring positions):

  bananas: doc 1
  I: doc 1, doc 2
  love: doc 1, doc 2
  oranges: doc 2

Now we add to doc 1 an additional title field "I hate apples", replace the title of doc 2 with "I love lemons", and flush. We get the following segment for generation 1:

  apples: doc 1
  hate: doc 1
  I: doc 1, doc 2
  lemons: doc 2
  love: doc 2

  generation vector for field "title": (doc 2, generation 1)

TermDocs for a few terms:
* title:bananas : {1}. Uses the TermDocs of the base segment; not affected by the "title" generation vector.
* title:oranges : {}. Uses the TermDocs of the base segment, but doc 2's title is superseded for generations < 1, and this occurrence is in generation 0.
* title:lemons : {2}. Uses the TermDocs of generation 1. Doc 2's title is superseded for generations < 1, but the term appears in generation 1.
* title:love : {1,2}. Uses the TermDocs of both segments. Doc 1 comes from the base segment; doc 2's base occurrence is dropped because its title is superseded for generations < 1, but the term appears again for doc 2 in generation 1.

I propose to initially use PackedInts for the generation vector, since we know how many generations the current segment has upon flushing (so the number of bits per entry is known). Later we might consider special treatment for sparse vectors.
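To make the merge rule concrete, here is a minimal, Lucene-independent Java sketch that replays the bananas/oranges example. Everything in it is hypothetical: the class StackedTermDocsSketch, its methods, and the representation of each generation as a map from "field:term" to sorted doc ids are illustrative stand-ins, not the StackedSegmentReader or TermDocs API. The generation vector is a plain HashMap here rather than PackedInts; the helper bitsPerGeneration only shows how wide each packed entry would need to be.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical model of stacked-segment postings; not Lucene API. */
public class StackedTermDocsSketch {
    // postings.get(g) maps "field:term" to the sorted doc ids in generation g (0 = base segment)
    private final List<Map<String, int[]>> postings = new ArrayList<>();
    // generation vector per field: doc id -> latest generation that replaced that field
    private final Map<String, Map<Integer, Integer>> generationVectors = new HashMap<>();

    /** Appends a new stacked generation and returns its number. */
    public int addGeneration(Map<String, int[]> termDocs) {
        postings.add(termDocs);
        return postings.size() - 1;
    }

    /** Records that doc's field was replaced in generation gen, keeping the latest one. */
    public void recordReplace(String field, int doc, int gen) {
        generationVectors.computeIfAbsent(field, f -> new HashMap<>()).merge(doc, gen, Integer::max);
    }

    /** Merges field:term over all generations, skipping superseded occurrences. */
    public int[] termDocs(String field, String term) {
        Map<Integer, Integer> genVector = generationVectors.getOrDefault(field, Map.of());
        TreeSet<Integer> merged = new TreeSet<>(); // keeps doc ids sorted and unique
        for (int g = 0; g < postings.size(); g++) {
            int[] docs = postings.get(g).getOrDefault(field + ":" + term, new int[0]);
            for (int doc : docs) {
                // Docs absent from the vector default to generation 0, so they always pass.
                if (genVector.getOrDefault(doc, 0) <= g) {
                    merged.add(doc);
                }
            }
        }
        return merged.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Bits per entry for a packed generation vector holding values 0..maxGen. */
    public static int bitsPerGeneration(int maxGen) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxGen));
    }

    public static void main(String[] args) {
        StackedTermDocsSketch s = new StackedTermDocsSketch();
        // Base segment: doc 1 "I love bananas", doc 2 "I love oranges"
        s.addGeneration(Map.of(
            "title:bananas", new int[] {1},
            "title:I", new int[] {1, 2},
            "title:love", new int[] {1, 2},
            "title:oranges", new int[] {2}));
        // Generation 1: doc 1 adds "I hate apples", doc 2 title replaced by "I love lemons"
        int gen1 = s.addGeneration(Map.of(
            "title:apples", new int[] {1},
            "title:hate", new int[] {1},
            "title:I", new int[] {1, 2},
            "title:lemons", new int[] {2},
            "title:love", new int[] {2}));
        s.recordReplace("title", 2, gen1);
        for (String t : new String[] {"bananas", "oranges", "lemons", "love"}) {
            System.out.println("title:" + t + " -> " + Arrays.toString(s.termDocs("title", t)));
        }
    }
}
```

Running main prints [1], [], [2] and [1, 2] for bananas, oranges, lemons and love, matching the TermDocs listed above. The TreeSet is only for brevity; a real reader would presumably merge per-generation iterators lazily (e.g. with a priority queue) instead of materializing the union.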
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).