In a single-segment (parallel) read-only index that is built offline once and then frozen, is it possible to remap the docIds as the last step (i.e. to end up with exactly the same index, except that each docId equals the ordinal at which the doc was added to the index)?

Say I have this read-only index:

docId   : document
1 : bookB
2 : sentenceB
3 : linkA
4 : linkC
5 : sentenceC
6 : sentenceA
7 : bookA
...
300000 : linkD

Instead, I would like to have this read-only index:

docId   : document
1 : bookA
2 : bookB
...
M : linkA
M+1 : linkB
...
N+1 : sentenceA
N+2 : sentenceB
...
300000 : sentenceZZZ

This would allow me to reduce the amount of RAM needed to cache the type of each document:

-> without remapping, I need at least ceil(log2(types)) * documents bits
here 2 * 300000 bits

-> with remapping, I only need to remember the two ints M and N
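
To make the second case concrete, here is a minimal sketch of the lookup I have in mind. It assumes the layout of the example above (1-based docIds, books in [1, M-1], links in [M, N], sentences from N+1 on); Lucene's own docIds start at 0, so the bounds would shift by one. The class and method names (DocTypeRanges, typeOf) are hypothetical and not part of Lucene.

```java
// Sketch only: assumes 1-based docIds grouped as in the example above:
// books in [1, M-1], links in [M, N], sentences in [N+1, maxDoc].
final class DocTypeRanges {
    enum DocType { BOOK, LINK, SENTENCE }

    private final int firstLinkDocId; // "M" in the example
    private final int lastLinkDocId;  // "N" in the example

    DocTypeRanges(int firstLinkDocId, int lastLinkDocId) {
        // N == M - 1 is allowed and means "no links at all".
        if (firstLinkDocId < 1 || lastLinkDocId < firstLinkDocId - 1) {
            throw new IllegalArgumentException("expected M >= 1 and N >= M - 1");
        }
        this.firstLinkDocId = firstLinkDocId;
        this.lastLinkDocId = lastLinkDocId;
    }

    // Two int comparisons replace a per-document type table.
    DocType typeOf(int docId) {
        if (docId < firstLinkDocId) {
            return DocType.BOOK;
        }
        if (docId <= lastLinkDocId) {
            return DocType.LINK;
        }
        return DocType.SENTENCE;
    }
}
```

The whole cache is then two ints, whatever the number of documents.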

Also, if I need to cache 1 byte of metadata for each book:

-> without remapping, I would need 1 byte * documents
here 300000 bytes

-> with remapping, I would only need 1 byte * books
here M - 1 bytes
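
Again as a rough sketch, under the same assumptions (1-based docIds, books occupying docIds 1 to M-1), the per-book cache could be a plain byte array indexed by docId - 1. BookMetadataCache and its methods are hypothetical names used only for illustration.

```java
// Sketch only: assumes books occupy the 1-based docIds 1 .. M-1,
// so the metadata array needs exactly M - 1 slots.
final class BookMetadataCache {
    private final byte[] metadataByBook; // slot i holds the byte for docId i + 1

    BookMetadataCache(int firstLinkDocId) { // "M" in the example
        if (firstLinkDocId < 1) {
            throw new IllegalArgumentException("expected M >= 1");
        }
        this.metadataByBook = new byte[firstLinkDocId - 1];
    }

    boolean isBook(int docId) {
        return docId >= 1 && docId <= metadataByBook.length;
    }

    void put(int docId, byte value) {
        checkBook(docId);
        metadataByBook[docId - 1] = value;
    }

    byte get(int docId) {
        checkBook(docId);
        return metadataByBook[docId - 1];
    }

    // Memory used by the payload itself, ignoring the array header.
    int sizeInBytes() {
        return metadataByBook.length;
    }

    private void checkBook(int docId) {
        if (!isBook(docId)) {
            throw new IllegalArgumentException("docId " + docId + " is not a book");
        }
    }
}
```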


I tried building such an index with LogMergePolicy, with NoMergePolicy, and by enlarging the RAM buffer, but (maybe I did something wrong) the docIds were always reshuffled (maybe because my index was big and I went over a threshold).



Best regards,
Olivier
