Shai Erera created LUCENE-5248:
----------------------------------

             Summary: Improve the data structure used in ReaderAndLiveDocs to hold the updates
                 Key: LUCENE-5248
                 URL: https://issues.apache.org/jira/browse/LUCENE-5248
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/index
            Reporter: Shai Erera
            Assignee: Shai Erera


Currently ReaderAndLiveDocs holds the updates in two structures:

+Map<String,Map<Integer,Long>>+
Holds a mapping from each field to all the docs that were updated and their 
updated values. This structure is updated when applyDeletes is called, and it 
needs to satisfy several requirements:

# Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
in that order, and termA affects doc=100 and termB doc=2, then the updates are 
applied in that order, meaning we cannot rely on updates arriving in doc order.
# Multiple updates: the same document may be updated multiple times, either by 
the same term (e.g. several calls to IW.updateNDV) or by different terms. The 
last update wins.
# Sequential read: when writing the updates to the Directory (fieldsConsumer), 
we iterate over the docs in order and, for each one, check whether it was 
updated; if not, we pull its value from the current DV.
# Memory efficiency: a single update may affect several million documents, so 
the structure needs to be efficient w.r.t. memory consumption.
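
To make the first three requirements concrete, here is a minimal sketch of the field-to-docs mapping. It is not the actual ReaderAndLiveDocs code: the class name FieldUpdatesSketch and the methods update and resolve are hypothetical, and a plain long[] stands in for the current DV values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Lucene's ReaderAndLiveDocs implementation.
class FieldUpdatesSketch {
    // field -> (docID -> most recent updated value)
    private final Map<String, Map<Integer, Long>> updates = new HashMap<>();

    // Writes may arrive in any doc order. A later put for the same
    // (field, doc) pair overwrites the earlier one, so the last update wins.
    void update(String field, int doc, long value) {
        updates.computeIfAbsent(field, f -> new HashMap<>()).put(doc, value);
    }

    // Sequential read: walk the docs in order and, for each one, take the
    // updated value if present, otherwise fall back to the current value.
    // currentValues is an assumed stand-in for the existing DV of the field.
    long[] resolve(String field, long[] currentValues) {
        Map<Integer, Long> fieldUpdates =
            updates.getOrDefault(field, Collections.emptyMap());
        long[] out = new long[currentValues.length];
        for (int doc = 0; doc < currentValues.length; doc++) {
            Long updated = fieldUpdates.get(doc);
            out[doc] = (updated != null) ? updated : currentValues[doc];
        }
        return out;
    }
}
```

Note that every entry in this sketch stores a boxed Integer and a boxed Long, which costs considerably more than two primitives. That overhead is what the memory requirement is concerned with when one update touches millions of documents.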

+Map<Integer,Map<String,Long>>+
Holds a mapping from each document to all the fields in which it was updated 
and the updated value for each field. This is used by IW.commitMergedDeletes to 
apply the updates that came in while the segment was merging. The requirements 
this structure needs to satisfy are:

# Access in doc order: this is how commitMergedDeletes works.
# One-pass: we (currently) visit each document once, so it is best to have all 
the fields in which it was updated available at that point. The updates are 
applied to the merged ReaderAndLiveDocs (where they are stored in the first 
structure mentioned above).
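
The two requirements above can be sketched as follows. This is an illustration, not the actual commitMergedDeletes code: the class MergedUpdatesSketch and its methods are hypothetical, and the use of a TreeMap to get doc-order iteration is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical illustration only; not Lucene's IndexWriter implementation.
class MergedUpdatesSketch {
    // docID -> (field -> updated value). TreeMap keeps the keys sorted,
    // which is an assumed way to satisfy the doc-order access requirement.
    private final TreeMap<Integer, Map<String, Long>> byDoc = new TreeMap<>();

    // Record an update that arrived while the segment was merging.
    void update(int doc, String field, long value) {
        byDoc.computeIfAbsent(doc, d -> new HashMap<>()).put(field, value);
    }

    // One pass in increasing doc order: each updated document is visited
    // once and handed all of its field updates together.
    void forEachInDocOrder(BiConsumer<Integer, Map<String, Long>> visitor) {
        for (Map.Entry<Integer, Map<String, Long>> e : byDoc.entrySet()) {
            visitor.accept(e.getKey(), e.getValue());
        }
    }
}
```

In this sketch the visitor callback stands in for the step that applies each document's updates to the merged ReaderAndLiveDocs.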

Comments with proposals will follow next.



