[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5513:
-------------------------------

    Attachment: LUCENE-5513.patch

Patch:

* Adds IW.updateBinaryDocValue
* Makes the necessary changes to DW, BufferedUpdates(Stream), ReaderAndUpdates
* Adds new BinaryUpdate and BinaryFieldUpdates
* Copies TestNumericDocValuesUpdates and changes it to add BDV fields:
** I still add numbers as it makes asserting easy, but I encode them as VLongs, 
so we get byte[] values of different lengths
** Some tests are still disabled, see below
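
For illustration, the variable-length encoding used in the tests can be 
approximated as below. This is a standalone sketch of a VLong-style encoding 
(7 payload bits per byte, high bit set when another byte follows), not the code 
from the patch. The class and method names are hypothetical, and it relies only 
on the JDK rather than on Lucene's own utilities.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, for illustration only; not part of the patch.
final class VLongSketch {

    // Encodes a non-negative long using 7 bits per byte, low-order group
    // first. The high bit of a byte is set when more bytes follow.
    static byte[] encodeVLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch only handles non-negative values");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reverses encodeVLong: reassembles the 7-bit groups into a long.
    static long decodeVLong(byte[] bytes) {
        long value = 0L;
        int shift = 0;
        for (byte b : bytes) {
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 127L, 128L, 16384L, Long.MAX_VALUE};
        for (long v : samples) {
            byte[] encoded = encodeVLong(v);
            System.out.println(v + " -> " + encoded.length
                + " byte(s), decoded " + decodeVLong(encoded));
        }
    }
}
```

Small numbers fit in one byte and larger ones need more, which is what gives 
the binary fields in the tests their varying lengths while keeping the values 
easy to assert on after decoding.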

The patch still doesn't handle updates that arrive while a merge is in flight. 
The reason is that the code in IW.commitMergedDeletes is hairy, and adding 
BinaryDV updates would make it even worse. So I want to refactor how updates 
are represented internally, so that a single class, DocValuesUpdates, captures 
all DV updates. Since DV fields cannot overlap (a DV field cannot be both 
numeric and binary), I think this will allow us to use a single 
UpdatesIterator in IW.commitMergedDeletes. I'll look at that next and 
re-enable the tests once it is resolved.
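
To make the refactoring idea concrete, here is a rough, self-contained sketch 
of what a single container for both kinds of updates might look like. It is 
only an assumption about the general shape: every name in it 
(DocValuesUpdatesSketch, Update, addNumeric, addBinary) is hypothetical, and it 
does not reflect the actual classes in the patch or in Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical container: one class holding numeric and binary DV updates,
// exposed through a single iterator.
final class DocValuesUpdatesSketch implements Iterable<DocValuesUpdatesSketch.Update> {

    enum Type { NUMERIC, BINARY }

    // One buffered update: a Long for NUMERIC fields, a byte[] for BINARY ones.
    static final class Update {
        final String field;
        final Type type;
        final int doc;
        final Object value;

        Update(String field, Type type, int doc, Object value) {
            this.field = field;
            this.type = type;
            this.doc = doc;
            this.value = value;
        }
    }

    private final Map<String, Type> fieldTypes = new HashMap<>();
    private final List<Update> updates = new ArrayList<>();

    void addNumeric(String field, int doc, long value) {
        add(field, Type.NUMERIC, doc, value);
    }

    void addBinary(String field, int doc, byte[] value) {
        add(field, Type.BINARY, doc, value);
    }

    // A field is pinned to the first type it is seen with, so a field can
    // never be both numeric and binary.
    private void add(String field, Type type, int doc, Object value) {
        Type previous = fieldTypes.putIfAbsent(field, type);
        if (previous != null && previous != type) {
            throw new IllegalArgumentException(
                "field " + field + " is already " + previous + ", cannot add " + type);
        }
        updates.add(new Update(field, type, doc, value));
    }

    @Override
    public Iterator<Update> iterator() {
        return updates.iterator();
    }

    public static void main(String[] args) {
        DocValuesUpdatesSketch all = new DocValuesUpdatesSketch();
        all.addNumeric("price", 0, 42L);
        all.addBinary("payload", 1, new byte[] {1, 2, 3});
        for (Update u : all) {
            System.out.println(u.field + " " + u.type + " doc=" + u.doc);
        }
    }
}
```

Because each field has exactly one type, a caller can walk all updates with one 
loop and branch on the type per update, instead of keeping separate numeric and 
binary iteration paths.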

There's one thing to note: binary DV updates are more expensive to apply than 
NDV updates, depending on the length of the BDV. I.e. when we rewrite the DV 
file, for NDV we know we write at most 8 bytes per document (compressed), 
but for BDV we may write a large number of bytes per document. I prefer to 
leave that as an optimization for later. One idea I have (which applies to NDV 
as well) is to do that with a sparse DV, or to add "stacked" DV files. Either 
will make the producing code more complex, so I prefer to explore that 
later.

> Binary DocValues Updates
> ------------------------
>
>                 Key: LUCENE-5513
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5513
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/index
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>         Attachments: LUCENE-5513.patch
>
>
> LUCENE-5189 was a great step forward, and I would like to continue it. The 
> reason for this feature is to have a "join-index": to write children docnums 
> into the parent's binaryDV. I can try to proceed with the implementation, but 
> I'm not very experienced with such deep Lucene internals. [~shaie], any hint 
> on where to begin is much appreciated. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

