The index sorting APIs (in lucene/misc) can do this. E.g., you could wrap your reader in a SortingAtomicReader with your sort criteria, then use IndexWriter.addIndexes(IndexReader...) to add it to a new index. The resulting index would have 1 segment, and the docIDs would be in your order.
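To picture what the sorted order buys you, here is a minimal plain-Java sketch. It deliberately uses no Lucene classes; every name in it (DocIdRemapSketch, Type, Doc, sampleDocs, typeBoundaries, typeOf) is hypothetical and only stands in for what a sort by type and then by name would produce. It uses 0-based docIDs, as Lucene does, whereas the example in the question counts from 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only: no Lucene classes are used, and every name
// below (Type, Doc, sampleDocs, ...) is hypothetical.
class DocIdRemapSketch {

    // Document types, declared in the order the sorted index should store them.
    enum Type { BOOK, LINK, SENTENCE }

    // Stand-in for an indexed document.
    static final class Doc {
        final Type type;
        final String name;
        Doc(Type type, String name) { this.type = type; this.name = name; }
    }

    // The first seven documents of the question's example, in original order.
    static List<Doc> sampleDocs() {
        return Arrays.asList(
            new Doc(Type.BOOK, "bookB"),
            new Doc(Type.SENTENCE, "sentenceB"),
            new Doc(Type.LINK, "linkA"),
            new Doc(Type.LINK, "linkC"),
            new Doc(Type.SENTENCE, "sentenceC"),
            new Doc(Type.SENTENCE, "sentenceA"),
            new Doc(Type.BOOK, "bookA"));
    }

    // Orders documents by type, then by name; list position is the new docID.
    static List<Doc> sortByTypeThenName(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((Doc d) -> d.type).thenComparing(d -> d.name));
        return sorted;
    }

    // start[t] is the first docID of type t; the last entry is the doc count.
    static int[] typeBoundaries(List<Doc> sorted) {
        int[] start = new int[Type.values().length + 1];
        for (Doc d : sorted) {
            start[d.type.ordinal() + 1]++;      // count docs per type
        }
        for (int i = 1; i < start.length; i++) {
            start[i] += start[i - 1];           // prefix sums give the boundaries
        }
        return start;
    }

    // Recovers a document's type from its docID using only the boundaries.
    static Type typeOf(int docId, int[] start) {
        Type[] types = Type.values();
        for (int t = 0; t < types.length; t++) {
            if (docId >= start[t] && docId < start[t + 1]) {
                return types[t];
            }
        }
        throw new IllegalArgumentException("docId out of range: " + docId);
    }

    public static void main(String[] args) {
        List<Doc> sorted = sortByTypeThenName(sampleDocs());
        int[] start = typeBoundaries(sorted);
        for (int docId = 0; docId < sorted.size(); docId++) {
            System.out.println(docId + " : " + sorted.get(docId).name
                + " (" + typeOf(docId, start) + ")");
        }
    }
}
```

Once the docs are grouped like this, the boundary array plays the role of your M and N: the type of any doc follows from its docID alone, and a per-book metadata array only needs as many entries as there are books.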

Mike McCandless

http://blog.mikemccandless.com

On Mon, May 12, 2014 at 12:01 PM, Olivier Binda <olivier.bi...@wanadoo.fr> wrote:
> In a 1-segment (parallel) read-only index that is built offline once (and
> then frozen),
> is it possible to remap the docIds as the last step (i.e. to have the
> exact same index, except that the docIds are all equal to the ord the docs
> were added to the index)?
>
> Say I have the read-only index
>
> docId : document
> 1 : bookB
> 2 : sentenceB
> 3 : linkA
> 4 : linkC
> 5 : sentenceC
> 6 : sentenceA
> 7 : bookA
> ...
> 300000 : linkD
>
> I would like to have instead the read-only index
>
> docId : document
> 1 : bookA
> 2 : bookB
> ...
>
> M : linkA
> M+1 : linkB
> ...
> N+1 : sentenceA
> N+2 : sentenceB
> ...
> 300000 : sentenceZZZ
>
> This would allow me to reduce the amount of RAM needed to cache the type of
> each document
>
> -> without remapping, I need at least log2(types) * documents bits
> here 2 * 300000 bits
>
> -> with remapping, I only need to remember ints M and N
>
> Also, if I need to cache 1 byte of metadata for each book
>
> -> without remapping, I would need 1 byte * documents
> here 300000 bytes
>
> -> with remapping, I would only need 1 byte * books
> here M - 1 bytes
>
>
> I tried building such an index with LogMergePolicy/NoMergePolicy/extending
> the RAM buffer, but (maybe I did something wrong)
> the docIds were always reshuffled (maybe because my index was big and I was
> over a threshold)
>
>
>
> Best regards,
> Olivier
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org