[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430322#comment-13430322 ]
Sivan Yogev commented on LUCENE-4258:
-------------------------------------
Working on the details, it seems that we need to add a new layer of information
for stacked segments. For each field that was added with REPLACE_FIELDS, we
need to hold the documents in which a replace took place, with the number of
the latest generation that had the replacement. Name this list the "generation
vector". That way, TermDocs provided by StackedSegmentReader for a certain term
is a special merge of that term's TermDocs for all stacked segments. The
"special" part about it is that we ignore occurrences from documents in which
the term's field was replaced in a later generation.
An example. Assume we have doc 1 with title "I love bananas" and doc 2 with
title "I love oranges", and the segment is flushed. We will have the following
base segment (ignoring positions):
bananas: doc 1
I: doc 1, doc 2
love: doc 1, doc 2
oranges: doc 2
Now we add to doc 1 an additional title field "I hate apples", replace the
title of doc 2 with "I love lemons", and flush. We will have the following
segment for generation 1:
apples: doc 1
hate: doc 1
I: doc 1, doc 2
lemons: doc 2
love: doc 2
generation vector for field "title": (doc 2, generation 1)
TermDocs for a few terms:
* title:bananas : {1}, uses the TermDocs of the base segment and is not
affected by the generation vector of field "title", since doc 1 has no entry
in it.
* title:oranges : {}, uses the TermDocs of the base segment. The title of doc 2
was replaced in generation 1, so its occurrences in generations < 1 are
ignored, and the base segment is generation 0.
* title:lemons : {2}, uses the TermDocs of generation 1. Occurrences in the
title of doc 2 are ignored for generations < 1, but the term appears in
generation 1.
* title:love : {1,2}, uses the TermDocs of both segments. Doc 1 comes from the
base segment, since its title was never replaced. Doc 2 comes from generation
1: its base occurrence is ignored, but the term appears again in generation 1.
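The merge rule in the example above can be sketched in plain Java. This is
only an illustration under assumptions: the class and method names
(StackedTermDocsSketch, termDocs, example) are hypothetical, postings are
plain maps instead of Lucene's TermDocs, and the generation vector is a
simple map from doc ID to the latest replacing generation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch of the proposed merge; not Lucene's StackedSegmentReader. */
final class StackedTermDocsSketch {

    /**
     * postingsByGeneration.get(g) maps a term to the docs containing it in
     * generation g (generation 0 is the base segment). generationVector maps
     * a doc to the latest generation that replaced the field for that doc;
     * docs without an entry were never replaced.
     */
    static List<Integer> termDocs(String term,
                                  List<Map<String, int[]>> postingsByGeneration,
                                  Map<Integer, Integer> generationVector) {
        TreeSet<Integer> result = new TreeSet<>();
        for (int gen = 0; gen < postingsByGeneration.size(); gen++) {
            int[] docs = postingsByGeneration.get(gen).getOrDefault(term, new int[0]);
            for (int doc : docs) {
                int replacedIn = generationVector.getOrDefault(doc, 0);
                // Ignore occurrences older than the latest replacement.
                if (gen >= replacedIn) {
                    result.add(doc);
                }
            }
        }
        return new ArrayList<>(result);
    }

    /** Runs the merge on the two-document "title" example from this comment. */
    static List<Integer> example(String term) {
        Map<String, int[]> base = new HashMap<>();
        base.put("bananas", new int[] {1});
        base.put("I", new int[] {1, 2});
        base.put("love", new int[] {1, 2});
        base.put("oranges", new int[] {2});

        Map<String, int[]> gen1 = new HashMap<>();
        gen1.put("apples", new int[] {1});
        gen1.put("hate", new int[] {1});
        gen1.put("I", new int[] {1, 2});
        gen1.put("lemons", new int[] {2});
        gen1.put("love", new int[] {2});

        // Generation vector for field "title": (doc 2, generation 1).
        Map<Integer, Integer> generationVector = new HashMap<>();
        generationVector.put(2, 1);

        return termDocs(term, Arrays.asList(base, gen1), generationVector);
    }

    public static void main(String[] args) {
        for (String term : new String[] {"bananas", "oranges", "lemons", "love"}) {
            System.out.println("title:" + term + " : " + example(term));
        }
    }
}
```

A real implementation would presumably merge the sorted per-generation
TermDocs lazily instead of collecting doc IDs into a set; the set is used here
only to keep the sketch short.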
I propose to initially use PackedInts for the generation vector, since we know
how many generations the current segment has upon flushing. Later we might
consider special treatment for sparse vectors.
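To illustrate why knowing the number of generations at flush time helps: the
maximum generation fixes the number of bits needed per document, so the vector
can be stored in fixed-width slots. The class below is a hypothetical
stand-in written for this comment, not Lucene's PackedInts API; it pads each
64-bit block so that no value straddles two blocks.

```java
/** Hypothetical fixed-width packed vector; a stand-in, not Lucene's PackedInts. */
final class PackedGenerationVectorSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final int valuesPerBlock;
    private final long mask;

    PackedGenerationVectorSketch(int docCount, int maxGeneration) {
        // Bits needed for values in [0, maxGeneration], with a minimum of 1.
        this.bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxGeneration));
        this.valuesPerBlock = 64 / bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(docCount + valuesPerBlock - 1) / valuesPerBlock];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    /** Records that the field of doc was last replaced in the given generation. */
    void set(int doc, int generation) {
        int block = doc / valuesPerBlock;
        int shift = (doc % valuesPerBlock) * bitsPerValue;
        blocks[block] = (blocks[block] & ~(mask << shift))
                | ((generation & mask) << shift);
    }

    /** Returns the latest replacing generation, or 0 if doc was never replaced. */
    int get(int doc) {
        int block = doc / valuesPerBlock;
        int shift = (doc % valuesPerBlock) * bitsPerValue;
        return (int) ((blocks[block] >>> shift) & mask);
    }
}
```

With a single stacked generation, as in the example above, each document
needs only one bit. A dense layout like this still spends space on documents
that were never replaced, which is where a sparse representation might pay
off.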
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
> Key: LUCENE-4258
> URL: https://issues.apache.org/jira/browse/LUCENE-4258
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Sivan Yogev
> Original Estimate: 2,520h
> Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).