Hello, I'm still interested in an answer to the following question:

In a 1-segment read-only index (built offline once and then frozen), is it possible to remap the docIds?



I may have a (working, but not optimal) answer to my original problem: use a MultiReader over 3 indexes (books, links and sentences) to get the following composite index:

docId   : document
-------------------------
1       : bookA
2       : bookB
...
M       : linkA
M+1     : linkB
...
N+1     : sentenceA
N+2     : sentenceB
...
300000  : sentenceZZZ


This solution is probably slower than building a single index whose docIds are equal to the order in which I added the documents.
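
For reference, a composite reader such as MultiReader numbers documents by concatenating its sub-readers in the order they are passed in: the documents of sub-index i are shifted by a docBase equal to the total maxDoc() of the sub-indexes before it. The sketch below only mimics that numbering in plain Java (0-based, as Lucene docIds are), so M and N fall out of the three sub-index sizes. The class name and the sizes in main are hypothetical, and no Lucene classes are used.

```java
/**
 * Hypothetical helper mimicking how a composite reader (e.g. Lucene's
 * MultiReader) numbers documents: the docs of sub-index i get global
 * docIds starting at docBase[i] = sum of maxDoc of sub-indexes 0..i-1.
 * DocIds are 0-based here, as in Lucene. No Lucene classes are used.
 */
final class CompositeDocIds {
    // docBase[i] = first global docId of sub-index i; last entry = total docs
    private final int[] docBase;

    CompositeDocIds(int... maxDocs) {
        docBase = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            docBase[i + 1] = docBase[i] + maxDocs[i];
        }
    }

    int maxDoc() {
        return docBase[docBase.length - 1];
    }

    /** Which sub-index holds globalDoc (in main: 0 = books, 1 = links, 2 = sentences). */
    int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc()) {
            throw new IllegalArgumentException("docId out of range: " + globalDoc);
        }
        // Scan from the last sub-index down; the first base <= globalDoc wins,
        // which also skips empty sub-indexes correctly.
        for (int i = docBase.length - 2; i > 0; i--) {
            if (globalDoc >= docBase[i]) {
                return i;
            }
        }
        return 0;
    }

    /** DocId of globalDoc inside its own sub-index. */
    int localDoc(int globalDoc) {
        return globalDoc - docBase[subIndex(globalDoc)];
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 2 books, 3 links, 4 sentences.
        CompositeDocIds ids = new CompositeDocIds(2, 3, 4);
        for (int doc = 0; doc < ids.maxDoc(); doc++) {
            System.out.println(doc + " -> sub-index " + ids.subIndex(doc)
                    + ", local doc " + ids.localDoc(doc));
        }
    }
}
```

With these sizes the links start at global docId 2 and the sentences at 5, which is exactly the M/N layout of the table above, only shifted to start at 0.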










On 05/12/2014 06:01 PM, Olivier Binda wrote:
In a 1-segment (parallel) read-only index that is built offline once (and then frozen), is it possible to remap the docIds as the last step (i.e. to have exactly the same index, except that the docIds are all equal to the order in which the docs were added to the index)?

Say I have the read-only index:

docId   : document
-------------------------
1       : bookB
2       : sentenceB
3       : linkA
4       : linkC
5       : sentenceC
6       : sentenceA
7       : bookA
...
300000  : linkD

I would like to have this read-only index instead:

docId   : document
-------------------------
1       : bookA
2       : bookB
...
M       : linkA
M+1     : linkB
...
N+1     : sentenceA
N+2     : sentenceB
...
300000  : sentenceZZZ
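
The remapping asked for here amounts to a stable sort of the documents by type. Below is a minimal sketch of computing the old-to-new docId permutation with a counting sort, using the message's 1-based numbering. The type codes (0 = book, 1 = link, 2 = sentence) and the class name are hypothetical, and the code only computes the mapping; it does not rewrite a Lucene index. Within a type, documents keep their current relative order, which is enough for the M/N boundary trick, since that only needs the types to be grouped.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: compute newDocId[oldDocId] so that all documents of
 * type 0 (books) come first, then type 1 (links), then type 2 (sentences).
 * Counting sort, stable: within a type the current relative order is kept.
 * DocIds are 1-based as in the message; index 0 of the arrays is unused.
 */
final class DocIdRemap {
    static int[] remapByType(int[] typeOf, int numTypes) {
        int maxDoc = typeOf.length - 1;
        // count[t] = number of documents of type t
        int[] count = new int[numTypes];
        for (int doc = 1; doc <= maxDoc; doc++) {
            count[typeOf[doc]]++;
        }
        // next[t] = next free new docId in the block reserved for type t
        int[] next = new int[numTypes];
        int start = 1;
        for (int t = 0; t < numTypes; t++) {
            next[t] = start;
            start += count[t];
        }
        int[] newDocId = new int[maxDoc + 1];
        for (int doc = 1; doc <= maxDoc; doc++) {
            newDocId[doc] = next[typeOf[doc]]++;
        }
        return newDocId;
    }

    public static void main(String[] args) {
        // First seven docs of the example:
        // bookB, sentenceB, linkA, linkC, sentenceC, sentenceA, bookA
        int[] typeOf = {-1, 0, 2, 1, 1, 2, 2, 0}; // index 0 unused
        // prints [0, 1, 5, 3, 4, 6, 7, 2] (index 0 unused)
        System.out.println(Arrays.toString(remapByType(typeOf, 3)));
    }
}
```

On these seven documents the books land on 1..2, the links on 3..4 and the sentences on 5..7, i.e. M = 3 and N = 4.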

This would reduce the amount of RAM needed to cache the type of each document:

-> without remapping, I need at least ceil(log2(types)) * documents bits,
here 2 * 300000 = 600000 bits (3 types, so 2 bits per document)

-> with remapping, I only need to remember the two ints M and N
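
To make the comparison concrete, here is a hypothetical plain-Java sketch of both options, using the message's layout (books are docIds < M, links M..N, sentences > N). The packed variant stores ceil(log2 3) = 2 bits per document in a long[] (32 documents per long); the range variant keeps only M and N. The type codes and class name are assumptions for illustration.

```java
/**
 * Hypothetical sketch comparing two ways to answer "what type is docId?".
 * Types: 0 = book, 1 = link, 2 = sentence (codes are an assumption).
 */
final class DocTypeCache {

    /** Without remapping: 2 bits per document packed into a long[] (32 docs per long). */
    static final class Packed {
        private final long[] bits;

        Packed(int maxDoc) {
            bits = new long[(maxDoc + 32) / 32]; // room for docIds 0..maxDoc
        }

        void set(int docId, int type) {
            int word = docId >>> 5, shift = (docId & 31) << 1;
            // clear the 2-bit slot, then write the type code into it
            bits[word] = (bits[word] & ~(3L << shift)) | ((long) type << shift);
        }

        int get(int docId) {
            return (int) ((bits[docId >>> 5] >>> ((docId & 31) << 1)) & 3L);
        }

        long ramBytes() {
            return bits.length * 8L;
        }
    }

    /** With remapping: books are docIds < m, links are m..n, sentences are > n. */
    static int typeByRange(int docId, int m, int n) {
        if (docId < m) return 0;
        if (docId <= n) return 1;
        return 2;
    }

    public static void main(String[] args) {
        Packed packed = new Packed(300000);
        packed.set(300000, 1);
        // prints "1 75008 bytes": 2 bits for each of docIds 0..300000
        System.out.println(packed.get(300000) + " " + packed.ramBytes() + " bytes");
        // hypothetical boundaries M = 1000, N = 5000; prints 0 (a book)
        System.out.println(typeByRange(7, 1000, 5000));
    }
}
```

The packed array costs about 75 KB for 300000 documents, while the range lookup needs only two ints and no per-document memory at all.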

Also, if I need to cache 1 byte of metadata for each book:

-> without remapping, I would need 1 byte * documents
here 300000 bytes

-> with remapping, I would only need 1 byte * books
here M - 1 bytes
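
A minimal sketch of the remapped case, assuming the message's 1-based layout where books occupy docIds 1..M-1: the per-book bytes fit in an array of exactly M-1 entries indexed by docId - 1, whereas without remapping the array would need one slot per document. The class name and the value M = 1001 in main are hypothetical.

```java
/**
 * Hypothetical sketch: after remapping, books occupy docIds 1..M-1 (1-based,
 * as in the message), so per-book metadata fits in a byte[] of exactly M-1
 * entries indexed by docId - 1.
 */
final class BookMetadata {
    private final byte[] meta; // meta[docId - 1] holds the byte for book docId

    BookMetadata(int m) {
        meta = new byte[m - 1]; // M - 1 books
    }

    boolean isBook(int docId) {
        return docId >= 1 && docId <= meta.length;
    }

    void set(int docId, byte value) {
        if (!isBook(docId)) {
            throw new IllegalArgumentException("not a book docId: " + docId);
        }
        meta[docId - 1] = value;
    }

    byte get(int docId) {
        if (!isBook(docId)) {
            throw new IllegalArgumentException("not a book docId: " + docId);
        }
        return meta[docId - 1];
    }

    int ramBytes() {
        return meta.length;
    }

    public static void main(String[] args) {
        BookMetadata books = new BookMetadata(1001); // hypothetical M = 1001, i.e. 1000 books
        books.set(1, (byte) 42);
        // prints "42 1000 bytes"
        System.out.println(books.get(1) + " " + books.ramBytes() + " bytes");
    }
}
```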


I tried building such an index with LogMergePolicy, with NoMergePolicy and with a larger RAM buffer, but (maybe I did something wrong) the docIds were always reshuffled (maybe because my index was big and went over some threshold).



Best regards,
Olivier

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Reply via email to