On Tuesday 23 September 2008 20:26:18, Michael McCandless wrote:
> Paul Elschot wrote:
> > So, adding a document offset from the documents/frequencies
> > into the positions/payloads for each document would allow:
> > - bulk copying of the position/payloads during merging, and
> > - a more efficient implementation of TermPositions.skipTo()
> >   in that decoding the positions from the last available skip
> >   document to the target of skipTo() could be avoided.
> > Is that correct?
>
> Yes, though this would also add cost of computing/writing/reading
> that new offset, and would increase the index size.
>
> > That would indeed be invasive.
>
> Yes. I think our time would likely be better spent working on using
> PForDelta for freq/prox.

To change the prox data to PForDelta, it would be nice to have bulk
copies on prox working first. That would make it easy to change the
total size of the prox data. But it appears to be easier to start with
the doc/freq data, add more prox pointers there, and then change the
prox data.

PForDelta is fundamentally different from the existing index data
because an encoded number cannot be accessed on a byte boundary.
I don't know yet how to deal with that in the index data structures.

Regards,
Paul Elschot
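
To make the byte boundary point concrete, here is a minimal sketch in Java. It is not Lucene code and it is not full PForDelta: it only shows the fixed-width bit packing that PForDelta builds on, leaving out the exception (patch) mechanism, next to a byte-aligned encoding in the style of VInt. The class name BitPackingSketch and all method names are hypothetical, chosen for illustration only.

```java
import java.io.ByteArrayOutputStream;

/** Illustration only: contrasts byte-aligned VInt-style encoding with fixed-width bit packing. */
public class BitPackingSketch {

    /** Byte-aligned encoding: 7 data bits per byte, high bit set on every byte except the last. */
    static byte[] vIntEncode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    /** Packs each value into exactly bitsPerValue bits, back to back, lowest bit first. */
    static byte[] pack(int[] values, int bitsPerValue) {
        byte[] out = new byte[(values.length * bitsPerValue + 7) / 8];
        int bitPos = 0;
        for (int v : values) {
            for (int b = 0; b < bitsPerValue; b++, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (1 << (bitPos & 7));
                }
            }
        }
        return out;
    }

    /** Reads the value at the given index; it may start in the middle of a byte. */
    static int unpack(byte[] packed, int bitsPerValue, int index) {
        int bitPos = index * bitsPerValue;
        int value = 0;
        for (int b = 0; b < bitsPerValue; b++, bitPos++) {
            int bit = (packed[bitPos >>> 3] >>> (bitPos & 7)) & 1;
            value |= bit << b;
        }
        return value;
    }

    /** True if the value at the given index happens to start on a byte boundary. */
    static boolean startsOnByteBoundary(int bitsPerValue, int index) {
        return (index * bitsPerValue) % 8 == 0;
    }
}
```

With 5 bits per value, the five values 3, 17, 0, 30, 9 pack into 4 bytes, where the byte-aligned encoding needs 5. The price is that only every eighth value starts on a byte boundary, so a file pointer expressed as a byte offset can address the start of a packed block but not an arbitrary value inside it.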