[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5189:
-------------------------------

    Attachment: LUCENE-5189-updates-order.patch

While debugging LUCENE-5248, I hit a bug when the same terms update the same
doc multiple times. For example, if the updates are sent keyed by the term
sequence t1, t2, t1, the document ends up with the value of 't2' and not 't1'.
This happens because LinkedHashMap iterates in insertion order, and re-inserting
an existing key does not change its position. So when we encounter the second
reference to 't1', we should remove it and re-add it to the map. But the fix
isn't that simple, because BufDeletes currently holds a
Map<Term,Map<String,NumericUpdate>> (for each Term, all the fields that it
affects). We cannot remove and re-add a Term entry in the outer map, because
that would move all of its affected fields to the end of the iteration order,
which is wrong.
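A minimal sketch of the ordering problem in plain Java, using no Lucene classes. Strings stand in for Term and a Long stands in for NumericUpdate. The hypothetical applyInIterationOrder() assumes a single document matched by every term, so whichever update is iterated last wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateOrderDemo {
  // Hypothetical helper: one document matched by every term, so each
  // update overwrites the previous one and the last entry iterated wins.
  static long applyInIterationOrder(Map<String, Long> updates) {
    long value = 0;
    for (Map.Entry<String, Long> e : updates.entrySet()) {
      value = e.getValue();
    }
    return value;
  }

  public static void main(String[] args) {
    // Buggy: re-putting an existing key replaces its value but keeps its position.
    Map<String, Long> buggy = new LinkedHashMap<>();
    buggy.put("t1", 1L);
    buggy.put("t2", 2L);
    buggy.put("t1", 3L); // t1 stays first, so t2 is applied last
    System.out.println("buggy: " + buggy.keySet() + " -> " + applyInIterationOrder(buggy));

    // Fixed: remove then re-add, so the latest t1 update moves to the end.
    Map<String, Long> fixed = new LinkedHashMap<>();
    fixed.put("t1", 1L);
    fixed.put("t2", 2L);
    fixed.remove("t1");
    fixed.put("t1", 3L);
    System.out.println("fixed: " + fixed.keySet() + " -> " + applyInIterationOrder(fixed));
  }
}
```

Running main() prints `buggy: [t1, t2] -> 2` and then `fixed: [t2, t1] -> 3`.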

I changed the map to Map<String,LinkedHashMap<Term,NumericUpdate>> (per field,
all the terms that update it, in order) and wrote a simple testcase which
reproduces this. TestNumericDVUpdates passed for a few hundred iterations.
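The per-field layout can be sketched the same way. PerFieldUpdates, addUpdate() and termOrder() are hypothetical names, not the classes from the patch, and strings again stand in for Term. The point is that remove-then-put on one field's LinkedHashMap moves a repeated term to the end for that field only, leaving every other field's order untouched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerFieldUpdates {
  // field -> (term -> new value), terms ordered by when they were last sent
  private final Map<String, LinkedHashMap<String, Long>> updates = new HashMap<>();

  void addUpdate(String field, String term, long value) {
    LinkedHashMap<String, Long> termUpdates =
        updates.computeIfAbsent(field, f -> new LinkedHashMap<>());
    // Remove first so a repeated term moves to the end of this field's order only.
    termUpdates.remove(term);
    termUpdates.put(term, value);
  }

  // Returns the terms for a field in the order their updates would be applied.
  List<String> termOrder(String field) {
    LinkedHashMap<String, Long> termUpdates = updates.get(field);
    return termUpdates == null ? new ArrayList<>() : new ArrayList<>(termUpdates.keySet());
  }

  public static void main(String[] args) {
    PerFieldUpdates u = new PerFieldUpdates();
    u.addUpdate("price", "t1", 10L);
    u.addUpdate("price", "t2", 20L);
    u.addUpdate("rank", "t1", 1L);
    u.addUpdate("price", "t1", 30L); // t1 re-sent for "price" only
    System.out.println(u.termOrder("price")); // [t2, t1]
    System.out.println(u.termOrder("rank"));  // [t1]
  }
}
```

Keying the outer map by field rather than by term is what makes the reorder safe: a term that touches several fields is re-sequenced independently in each one.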

> Numeric DocValues Updates
> -------------------------
>
>                 Key: LUCENE-5189
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5189
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
> LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)