[
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shai Erera updated LUCENE-5189:
-------------------------------
Attachment: LUCENE-5189-updates-order.patch
While debugging LUCENE-5248, I hit a bug that occurs when the same term
updates the same doc more than once. E.g. if the updates are sent keyed by the
term sequence t1, t2, t1, the resulting value was that of 't2' and not 't1'.
The cause is that LinkedHashMap iterates in insertion order, and re-putting an
existing key does not change its position; so when we encounter the second
reference to 't1', we should remove it and re-add it to the map. But the fix
isn't that simple, because BufDeletes currently holds a
Map<Term,Map<String,NumericUpdate>> (for each Term, all the fields that it
affects). We cannot remove and re-add a Term entry in the outer map, because
that would move all of its affected fields to the end of the iteration, which
is wrong.
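To illustrate the ordering behavior described above, here is a minimal
standalone sketch. It is not code from the patch: the class and method names
are hypothetical, and plain Strings and Integers stand in for Term and
NumericUpdate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration only; not Lucene code. Names are hypothetical.
class ReinsertOrderDemo {

    // Plain put(): a key that is already present keeps its original position.
    static List<String> putOnly(String... terms) {
        Map<String, Integer> map = new LinkedHashMap<>();
        int value = 0;
        for (String t : terms) {
            map.put(t, value++);
        }
        return new ArrayList<>(map.keySet());
    }

    // remove() then put(): a repeated key moves to the end of the iteration.
    static List<String> removeThenPut(String... terms) {
        Map<String, Integer> map = new LinkedHashMap<>();
        int value = 0;
        for (String t : terms) {
            map.remove(t);
            map.put(t, value++);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(putOnly("t1", "t2", "t1"));       // [t1, t2]
        System.out.println(removeThenPut("t1", "t2", "t1")); // [t2, t1]
    }
}
```

With put() alone, 't2' is visited last and so its value wins; with
remove-then-put, 't1' is visited last, which matches the order in which the
updates were sent.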
I changed the map to a Map<String,LinkedHashMap<Term,NumericUpdate>>
(per field, all the terms that update it, in order) and wrote a simple
testcase which reproduces this. TestNumericDVUpdates passed for a few hundred
iterations.
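A rough sketch of the per-field layout follows, under simplifying
assumptions: Strings stand in for Term and a Long for NumericUpdate, and the
class and method names are hypothetical rather than taken from the patch. It
only shows why re-adding a term inside one field's map leaves the other
fields' order untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model, not the patch itself: String stands in for Term,
// Long for NumericUpdate. All names here are hypothetical.
class PerFieldUpdates {
    // field -> (term -> value), each inner map ordered by latest update.
    private final Map<String, LinkedHashMap<String, Long>> updates =
        new LinkedHashMap<>();

    void addUpdate(String term, String field, long value) {
        LinkedHashMap<String, Long> fieldUpdates =
            updates.computeIfAbsent(field, f -> new LinkedHashMap<>());
        // Remove first so a repeated term moves to the end for this field only.
        fieldUpdates.remove(term);
        fieldUpdates.put(term, value);
    }

    // Terms that update the given field, in the order they will be applied.
    List<String> termOrder(String field) {
        LinkedHashMap<String, Long> fieldUpdates = updates.get(field);
        return fieldUpdates == null
            ? new ArrayList<>()
            : new ArrayList<>(fieldUpdates.keySet());
    }

    public static void main(String[] args) {
        PerFieldUpdates buf = new PerFieldUpdates();
        buf.addUpdate("t1", "f1", 1L);
        buf.addUpdate("t1", "f2", 10L);
        buf.addUpdate("t2", "f1", 2L);
        buf.addUpdate("t2", "f2", 20L);
        buf.addUpdate("t1", "f1", 3L); // t1 updates only f1 a second time
        System.out.println(buf.termOrder("f1")); // [t2, t1]
        System.out.println(buf.termOrder("f2")); // [t1, t2]
    }
}
```

Here the second 't1' update moves 't1' to the end for field f1 only; for f2
the terms keep their original order, which is what the per-Term outer map
could not express.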
> Numeric DocValues Updates
> -------------------------
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch,
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch,
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch,
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch,
> LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the
> amount of changes is immense and hard to follow/consume. The reason is that
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the
> values of all the documents in a segment for the updated field (similar to
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to
> update, yet requires many changes to core code which will also be useful for
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the
> changes.