[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828524#comment-13828524 ]
Shai Erera commented on LUCENE-5189:
------------------------------------
You're right Simon. The updates are buffered in their raw form in memory until
a flush is needed (e.g. commit() or an NRT open). At that point they are
resolved and written to the Directory. This is where they differ from deletes:
while deletes are small enough to keep in resolved form in memory, updates
aren't. A single update can affect millions of documents, and each affected
document takes a long (the updated value). Perhaps future work could
distinguish between small and large updates and keep the small ones in memory.
But I believe that would affect a lot more code, e.g. SegmentReader would need
to be aware of both in-memory and on-disk NDV and do a kind of merge between
them when an NDV is requested for such a field ... it's not going to be
pretty-looking code, I imagine.
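
To illustrate the buffering described above, here is a minimal toy sketch, not
Lucene's actual implementation. All names in it (BufferedUpdatesSketch,
RawUpdate, resolvedBytes) are hypothetical, and a plain array stands in for the
values written to the Directory. It shows why the raw form is cheap to hold
(one small entry per update) while the resolved form grows with the number of
matching documents (one long each).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are Lucene classes or methods.
class BufferedUpdatesSketch {

    // A raw update: "set the field to value for every document matching term".
    static final class RawUpdate {
        final String term;
        final long value;

        RawUpdate(String term, long value) {
            this.term = term;
            this.value = value;
        }
    }

    private final String[] docTerms; // assumption: one term per docID
    private final long[] values;     // stands in for the values written at flush
    private final List<RawUpdate> buffer = new ArrayList<>();

    BufferedUpdatesSketch(String[] docTerms) {
        this.docTerms = docTerms.clone();
        this.values = new long[docTerms.length];
    }

    // Buffering costs one small entry, no matter how many documents match.
    void update(String term, long value) {
        buffer.add(new RawUpdate(term, value));
    }

    int bufferedCount() {
        return buffer.size();
    }

    // Resolve each raw update to docIDs in arrival order, so later updates win.
    // Returns the number of per-document writes performed.
    int flush() {
        int writes = 0;
        for (RawUpdate u : buffer) {
            for (int doc = 0; doc < docTerms.length; doc++) {
                if (docTerms[doc].equals(u.term)) {
                    values[doc] = u.value;
                    writes++;
                }
            }
        }
        buffer.clear();
        return writes;
    }

    long valueOf(int docID) {
        return values[docID];
    }

    // Size of the resolved form of one update: one long per matching document.
    static long resolvedBytes(long matchingDocs) {
        return matchingDocs * Long.BYTES;
    }
}
```

Under these assumptions, a single update matching one million documents would
need resolvedBytes(1_000_000L) = 8,000,000 bytes in resolved form, while its
raw form stays a single buffered entry.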
> Numeric DocValues Updates
> -------------------------
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch,
> LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch,
> LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch,
> LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch,
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch,
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch,
> LUCENE-5189.patch, LUCENE-5189_process_events.patch,
> LUCENE-5189_process_events.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the
> amount of changes is immense and hard to follow/consume. The reason is that
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the
> values of all the documents in a segment for the updated field (similar to
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to
> update, yet requires many changes to core code which will also be useful for
> updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the
> changes.
--
This message was sent by Atlassian JIRA
(v6.1#6144)