[ 
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430322#comment-13430322
 ] 

Sivan Yogev commented on LUCENE-4258:
-------------------------------------

Working on the details, it seems that we need to add a new layer of information 
for stacked segments. For each field that was added with REPLACE_FIELDS, we 
need to record the documents in which a replacement took place, together with 
the number of the latest generation that contained the replacement. Call this 
list the "generation vector". That way, the TermDocs provided by 
StackedSegmentReader for a certain term is a special merge of that term's 
TermDocs from all stacked segments. The "special" part is that we ignore 
occurrences from documents in which the term's field was replaced in a later 
generation.

An example. Assume we have doc 1 with title "I love bananas" and doc 2 with 
title "I love oranges", and the segment is flushed. We will have the following 
base segment (ignoring positions):

bananas: doc 1
I: doc 1, doc 2
love: doc 1, doc 2
oranges: doc 2

Now we add an additional title field "I hate apples" to doc 1, replace the 
title of doc 2 with "I love lemons", and flush. We will have the following 
segment for generation 1:

apples: doc 1
hate: doc 1
I: doc 1, doc 2
lemons: doc 2
love: doc 2
generation vector for field "title": (doc 2, generation 1)

TermDocs for a few terms: 
* title:bananas : {1}. Uses the TermDocs of the base segment. Doc 1 has no 
entry in the generation vector for field "title", so it is not affected.
* title:oranges : {}. Uses the TermDocs of the base segment (generation 0). 
The generation vector says the title of doc 2 was replaced in generation 1, so 
its occurrences in generations < 1 are ignored.
* title:lemons : {2}. Uses the TermDocs of generation 1. The title of doc 2 is 
ignored for generations < 1, but the term appears in generation 1.
* title:love : {1,2}. Uses the TermDocs of both segments. Doc 1 comes from the 
base segment, where its title was never replaced. The base occurrence in doc 2 
is ignored, but the term appears again for doc 2 in generation 1.
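
The merge rule in the example above can be sketched in a few lines. This is 
only an illustration in Python, not Lucene code: the function name 
stacked_term_docs, the dict-based postings and the dict-based generation 
vector are all assumptions made for the sketch, and a real 
StackedSegmentReader would merge sorted iterators rather than build a set.

```python
from typing import Dict, List

# Hypothetical in-memory postings for one field: term -> doc ids.
Postings = Dict[str, List[int]]


def stacked_term_docs(term: str,
                      generations: List[Postings],
                      generation_vector: Dict[int, int]) -> List[int]:
    """Merge a term's doc ids over all stacked segments of one field.

    generations[0] is the base segment, generations[g] is generation g.
    generation_vector maps a doc id to the latest generation in which the
    field was replaced for that doc; docs without an entry were never replaced.
    """
    result = set()
    for gen, postings in enumerate(generations):
        for doc in postings.get(term, []):
            # Keep the occurrence unless the field of this doc was replaced
            # in a generation later than the one the occurrence comes from.
            if generation_vector.get(doc, 0) <= gen:
                result.add(doc)
    return sorted(result)


# The data from the example: base segment and generation 1 of field "title".
base = {"bananas": [1], "I": [1, 2], "love": [1, 2], "oranges": [2]}
gen1 = {"apples": [1], "hate": [1], "I": [1, 2], "lemons": [2], "love": [2]}
title_generation_vector = {2: 1}  # (doc 2, generation 1)
segments = [base, gen1]

for term in ("bananas", "oranges", "lemons", "love"):
    print(term, stacked_term_docs(term, segments, title_generation_vector))
```

Note that the added (not replaced) title of doc 1 needs no entry in the 
generation vector, which is why its base-segment terms stay visible.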

I propose to initially use PackedInts for the generation vector, since we know 
how many generations the current segment has upon flushing, and therefore how 
many bits each entry needs. Later we might consider special treatment for 
sparse vectors.
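
To illustrate why knowing the number of generations helps, here is a rough 
Python sketch of fixed-width packing. It does not use or reproduce the 
PackedInts API; the helper names bits_required, pack and unpack, the 0-based 
doc index and the use of 0 for "never replaced" are assumptions of the sketch.

```python
from typing import List


def bits_required(max_generation: int) -> int:
    """Bits needed to store any value in 0..max_generation (at least 1)."""
    return max(1, max_generation.bit_length())


def pack(values: List[int], bits: int) -> int:
    """Pack one fixed-width entry per document into a single integer."""
    packed = 0
    for index, value in enumerate(values):
        if value < 0 or value >> bits:
            raise ValueError("value does not fit in the chosen width")
        packed |= value << (index * bits)
    return packed


def unpack(packed: int, index: int, bits: int) -> int:
    """Read the entry stored for the document at the given index."""
    return (packed >> (index * bits)) & ((1 << bits) - 1)


# A dense generation vector for five docs in a segment with 3 generations.
# Entry 0 means the field was never replaced for that doc.
generation_per_doc = [0, 3, 1, 0, 2]
width = bits_required(3)  # 2 bits per document
packed_vector = pack(generation_per_doc, width)
restored = [unpack(packed_vector, i, width) for i in range(5)]
```

With mostly-zero vectors, nearly every entry here carries no information, 
which is the case the sparse treatment mentioned above would address.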

                
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field 
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).


        
