robert engels wrote:
> Then why not always write segment.delXXXX, where XXXX is incremented.
This is what Lucene does today. It's "write once".
> Each file may be compressed or uncompressed based on the number of
> deletions it contains.
Lucene also does this.
Still, as Marvin pointed out, the cost of committing a delete is
proportional either to the number of deletes already on the segment
(if written sparse) or to the number of documents in the segment (if
written non-sparse). It doesn't scale well... though the constant
factor may be very small (i.e., it may not matter that much in
practice?). With tombstones the commit cost would be proportional to
how many deletes you did (scales perfectly), at the expense of added
per-search cost and search iterator state.
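To make the scaling concrete, here's a rough sketch of the three
commit costs (this is not Lucene's actual file format; the class and
method names are just illustrative):

  import java.io.DataOutput;
  import java.io.IOException;
  import java.util.BitSet;
  import java.util.List;

  class DeleteCommitCosts {

    // Non-sparse: one bit per document, so committing costs
    // O(maxDoc) no matter how few deletes this commit added.
    static void writeNonSparse(DataOutput out, BitSet deleted, int maxDoc)
        throws IOException {
      for (int doc = 0; doc < maxDoc; doc++) {
        out.writeBoolean(deleted.get(doc));
      }
    }

    // Sparse: one entry per deleted doc, so committing costs
    // O(total deletes on the segment), which still grows with
    // every commit.
    static void writeSparse(DataOutput out, BitSet deleted)
        throws IOException {
      for (int doc = deleted.nextSetBit(0); doc >= 0;
           doc = deleted.nextSetBit(doc + 1)) {
        out.writeInt(doc);
      }
    }

    // Tombstone: append only the deletes done since the last
    // commit, so committing costs O(new deletes) -- but every
    // search must now union all tombstone files written so far.
    static void appendTombstone(DataOutput out, List<Integer> newDeletes)
        throws IOException {
      for (int doc : newDeletes) {
        out.writeInt(doc);
      }
    }
  }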
For realtime search this could be a good tradeoff to make (lower
latency on add/delete -> refreshed searcher, at higher per-search
cost), but... in the realtime search discussion we are now thinking
that the deletes live with the reader and are carried in RAM over to
the reopened reader (LUCENE-1314), bypassing having to commit to the
filesystem at all.
One downside to this is that it's single-JRE only, i.e., to do
distributed realtime search you'd have to also re-apply the deletes
to the head IndexReader on each JRE. (Whereas added docs would be
written by a single IndexWriter and propagated via the filesystem.)
If we go forward with this model then indeed slowish commit times for
new deletes matter less, since the commit is only for crash recovery
and not for opening a new reader.
But we'd have many "control" issues to work through... e.g., how the
reader can re-open against the old segments right after a new merge
is committed (because the newly merged segment isn't warmed yet), and
how IndexReader can open segments written by the writer but not yet
truly committed (sync'd).
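For the warming issue, one possible shape (purely illustrative;
SegmentWarmer and MergePublisher are hypothetical names, not proposed
APIs) would be for the writer to warm a merged segment before
reopening readers can see it:

  interface SegmentWarmer {
    // Pre-load norms, caches, etc. for the newly merged segment so
    // the first search against it doesn't pay the warm-up cost.
    void warm(String segmentName) throws java.io.IOException;
  }

  class MergePublisher {
    private final SegmentWarmer warmer;
    private volatile String liveSegments;  // what reopening readers see

    MergePublisher(SegmentWarmer warmer, String initialSegments) {
      this.warmer = warmer;
      this.liveSegments = initialSegments;
    }

    // Readers reopening during warming still get the old segments.
    String currentSegments() {
      return liveSegments;
    }

    void publishMerge(String mergedSegment) throws java.io.IOException {
      warmer.warm(mergedSegment);    // warm before switching readers over
      liveSegments = mergedSegment;  // only now do reopens see the merge
    }
  }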
Mike