[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

Shai Erera (JIRA) Wed, 20 Nov 2013 22:17:34 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828524#comment-13828524
 ]


Shai Erera commented on LUCENE-5189:
------------------------------------

You're right, Simon. The updates are buffered in memory in their raw form until 
a flush is needed (e.g. commit() or an NRT open). At that point they are 
resolved and written to the Directory. This is where updates differ from 
deletes: deletes are small enough to keep in resolved form in memory, but 
updates aren't. A single update can affect millions of documents, and each 
affected document takes a long (the updated value). Future work could perhaps 
distinguish between small and large updates and keep the small ones in memory. 
But I believe that would affect a lot more code: e.g. SegmentReader would need 
to be aware of both in-memory and on-disk NDV, and do a kind of merge between 
them when an NDV is requested for such a field. I imagine it's not going to be 
pretty-looking code.
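
To make the buffer-then-resolve flow concrete, here is a toy sketch in plain 
Java. It is not the patch and it uses no Lucene classes: BufferedUpdatesSketch, 
RawUpdate, Segment and flush() are hypothetical names made up for illustration, 
and a String key per document stands in for a Term match. It only shows why the 
raw form is cheap (one small entry per update) while the resolved form costs 
one long per affected document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these names are Lucene classes. */
public class BufferedUpdatesSketch {

    /** A raw, unresolved update: "set value on every doc whose key equals term". */
    static final class RawUpdate {
        final String term;
        final long value;
        RawUpdate(String term, long value) { this.term = term; this.value = value; }
    }

    /** One segment: a key per document plus a dense column of numeric values. */
    static final class Segment {
        final String[] docTerms;
        long[] values;    // current generation of the numeric field
        long generation;  // bumped each time the column is rewritten
        Segment(String[] docTerms, long[] values) {
            this.docTerms = docTerms;
            this.values = values;
        }
    }

    private final List<RawUpdate> buffer = new ArrayList<>();

    /** Buffering is cheap: one small entry, however many docs match. */
    void update(String term, long value) { buffer.add(new RawUpdate(term, value)); }

    int bufferedCount() { return buffer.size(); }

    /**
     * Resolve the buffered updates against the segment and rewrite the whole
     * column as a new generation. Returns the number of per-document writes.
     */
    int flush(Segment seg) {
        if (buffer.isEmpty()) return 0;
        long[] next = Arrays.copyOf(seg.values, seg.values.length);
        int writes = 0;
        for (RawUpdate u : buffer) {  // arrival order, so the latest update wins
            for (int doc = 0; doc < seg.docTerms.length; doc++) {
                if (seg.docTerms[doc].equals(u.term)) {
                    next[doc] = u.value;
                    writes++;
                }
            }
        }
        seg.values = next;  // stands in for "written to the Directory"
        seg.generation++;
        buffer.clear();
        return writes;
    }

    public static void main(String[] args) {
        Segment seg = new Segment(new String[] {"a", "b", "a", "c"},
                                  new long[] {1L, 2L, 3L, 4L});
        BufferedUpdatesSketch writer = new BufferedUpdatesSketch();
        writer.update("a", 10L);
        writer.update("c", 7L);
        writer.update("a", 11L);
        // Nothing is visible in the segment until the flush resolves the buffer.
        System.out.println("before flush: " + Arrays.toString(seg.values));
        int writes = writer.flush(seg);
        System.out.println("after flush:  " + Arrays.toString(seg.values)
                + " generation=" + seg.generation + " writes=" + writes);
    }
}
```

In this model three buffered entries turn into five per-document writes at 
flush time. With a term matching millions of documents, the resolved form 
would need millions of longs, which is the reason given above for not keeping 
it in memory.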

> Numeric DocValues Updates
> -------------------------
>
>                 Key: LUCENE-5189
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5189
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 4.6, 5.0
>
>         Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
> LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, 
> LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, 
> LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
> LUCENE-5189_process_events.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet requires many changes to core code which will also be useful for 
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.



