[ https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793224#comment-13793224 ]
Shai Erera commented on LUCENE-5248:
------------------------------------

bq. I added a unit test which reproduces and the fix. Will commit on LUCENE-5189.

Sorry, it's a bug introduced in this patch so I'll fix here.

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-5248
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5248
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch,
> LUCENE-5248.patch
>
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map<String,Map<Integer,Long>>+
> Holds a mapping from each field, to all docs that were updated and their
> values. This structure is updated when applyDeletes is called, and needs to
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB,
> in that order, and termA affects doc=100 and termB doc=2, then the updates
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g.
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory
> (fieldsConsumer), we iterate on the docs in-order and for each one check if
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, therefore need to be
> efficient w.r.t. memory consumption.
> +Map<Integer,Map<String,Long>>+
> Holds a mapping from a document, to all the fields that it was updated in and
> the updated value for each field. This is used by IW.commitMergedDeletes to
> apply the updates that came in while the segment was merging.
> The requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's
> better if we know all the fields in which it was updated. The updates are
> applied to the merged ReaderAndLiveDocs (where they are stored in the first
> structure mentioned above).
> Comments with proposals will follow next.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
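
For readers who want to see the shape of the two structures described in the issue, here is a minimal Java sketch. It is not Lucene code: the class NumericUpdatesSketch and its methods update, resolve and byDoc are hypothetical names made up for illustration, the current DV is modeled as a plain long[] array, and the second structure is derived from the first only to keep the example short (in the issue it is filled from updates that arrive while a segment is merging). The boxed maps used here make no attempt at the memory requirement (a single update touching millions of documents).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names exist in Lucene.
class NumericUpdatesSketch {

  // Structure 1: field -> (docID -> updated value).
  final Map<String, Map<Integer, Long>> byField = new HashMap<>();

  // Records an update. Docs may arrive in any order, and a later update to
  // the same (field, doc) pair overwrites the earlier one (last update wins).
  void update(String field, int doc, long value) {
    byField.computeIfAbsent(field, f -> new HashMap<>()).put(doc, value);
  }

  // Sequential read for one field: visit docs in order, taking the updated
  // value if there is one and otherwise the current value (the array stands
  // in for the current DV; this is an assumption of the sketch).
  long[] resolve(String field, long[] currentValues) {
    Map<Integer, Long> updates =
        byField.getOrDefault(field, Collections.<Integer, Long>emptyMap());
    long[] out = new long[currentValues.length];
    for (int doc = 0; doc < currentValues.length; doc++) {
      Long updated = updates.get(doc);
      out[doc] = updated != null ? updated : currentValues[doc];
    }
    return out;
  }

  // Structure 2: docID -> (field -> updated value). A TreeMap keeps the docs
  // sorted, so a caller can walk them in doc order and see every updated
  // field of a document in a single visit.
  TreeMap<Integer, Map<String, Long>> byDoc() {
    TreeMap<Integer, Map<String, Long>> result = new TreeMap<>();
    for (Map.Entry<String, Map<Integer, Long>> fieldEntry : byField.entrySet()) {
      for (Map.Entry<Integer, Long> docEntry : fieldEntry.getValue().entrySet()) {
        result.computeIfAbsent(docEntry.getKey(), d -> new HashMap<>())
            .put(fieldEntry.getKey(), docEntry.getValue());
      }
    }
    return result;
  }
}
```

With the example from the description, calling update("f", 100, ...) and then update("f", 2, ...) stores both entries regardless of arrival order, and resolve("f", ...) reads them back in doc order.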