robert engels wrote:

> Then why not always write segment.delXXXX, where XXXX is incremented?

This is what Lucene does today.  It's "write once".

> Each file may be compressed or uncompressed based on the number of deletions it contains.

Lucene also does this.

Still, as Marvin pointed out, the cost of committing a delete is proportional to either the number of deletes already on the segment (if written sparse) or the number of documents in the segment (if written non-sparse).  It doesn't scale well... though the constant factor may be very small (i.e. it may not matter much in practice?).  With tombstones the commit cost would be proportional to how many deletes you made (scales perfectly), at the expense of added per-search cost and search iterator state.
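
To make the scaling concrete, here's a toy sketch of what each strategy has to write at commit time.  This is not Lucene's actual file format, and the class/method names are made up; it only shows what each commit's size is proportional to:

  import java.io.ByteArrayOutputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;
  import java.util.BitSet;
  import java.util.List;

  public class DeleteCommitCost {

    // Non-sparse: one bit per document in the segment.  Cost is
    // proportional to maxDoc, no matter how few deletes there are.
    static byte[] writeNonSparse(BitSet deleted, int maxDoc) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      for (int docID = 0; docID < maxDoc; docID++) {
        out.writeBoolean(deleted.get(docID));
      }
      out.close();
      return bytes.toByteArray();
    }

    // Sparse: only the deleted doc IDs, but all of them, every commit.
    // Cost is proportional to the total deletes on the segment.
    static byte[] writeSparse(BitSet deleted) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      for (int docID = deleted.nextSetBit(0); docID >= 0; docID = deleted.nextSetBit(docID + 1)) {
        out.writeInt(docID);
      }
      out.close();
      return bytes.toByteArray();
    }

    // Tombstones: append only the deletes made since the last commit.
    // Cost is proportional to the new deletes, but each search must
    // now consult every tombstone file written so far.
    static byte[] writeTombstones(List<Integer> newDeletes) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      for (int docID : newDeletes) {
        out.writeInt(docID);
      }
      out.close();
      return bytes.toByteArray();
    }
  }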

For realtime search this could be a good tradeoff to make (lower latency on add/delete -> refreshed searcher, at higher per-search cost), but... in the realtime search discussion we are now thinking that the deletes live with the reader and are carried in RAM over to the reopened reader (LUCENE-1314), bypassing the need to commit them to the filesystem at all.
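
Very roughly the idea looks like this (a hand-wavy sketch only; these classes are not Lucene's real API, LUCENE-1314 is where the actual work is):

  import java.util.BitSet;

  public class RamDeletesSketch {

    // Stand-in for a per-segment reader holding its deletions in RAM.
    static class SegmentView {
      final String segmentName;
      final BitSet deletedDocs;   // lives in RAM, not flushed on reopen

      SegmentView(String segmentName, BitSet deletedDocs) {
        this.segmentName = segmentName;
        this.deletedDocs = deletedDocs;
      }

      void deleteDoc(int docID) {
        deletedDocs.set(docID);
      }

      boolean isDeleted(int docID) {
        return deletedDocs.get(docID);
      }

      // On reopen, a segment that wasn't merged away simply carries its
      // RAM deletes forward; nothing is written to the filesystem.
      SegmentView reopen() {
        return new SegmentView(segmentName, (BitSet) deletedDocs.clone());
      }
    }

    public static void main(String[] args) {
      SegmentView before = new SegmentView("_0", new BitSet());
      before.deleteDoc(42);                    // delete arrives
      SegmentView after = before.reopen();     // refreshed "searcher"
      System.out.println(after.isDeleted(42)); // true, with no commit in between
    }
  }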

One downside to this is that it's single-JRE only, i.e. to do distributed realtime search you'd have to also re-apply the deletes to the head IndexReader on each JRE.  (Whereas added docs would be written with a single IndexWriter and propagated via the filesystem.)
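
Something with this shape, where in practice the "broadcast" would have to cross process/network boundaries (all of these names are made up, just to illustrate):

  import java.util.List;
  import java.util.concurrent.CopyOnWriteArrayList;

  public class DistributedDeletesSketch {

    // Stand-in for the head IndexReader living in one JVM.
    interface HeadReader {
      void applyDelete(String field, String term);  // applied in RAM
    }

    // Every delete is re-applied to each JVM's head reader so they all
    // converge on the same deleted-docs state; adds instead flow through
    // the single IndexWriter and the shared filesystem.
    static class DeleteBroadcaster {
      private final List<HeadReader> readers = new CopyOnWriteArrayList<>();

      void register(HeadReader reader) {
        readers.add(reader);
      }

      void deleteByTerm(String field, String term) {
        for (HeadReader reader : readers) {
          reader.applyDelete(field, term);   // re-applied on every JRE
        }
      }
    }
  }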

If we go forward with this model then indeed slowish commit times for new deletes are less important, since committing is only for crash recovery and not for opening a new reader.

But we'd have many "control" issues to work through... e.g. how the reader can re-open against old segments right after a new merge is committed (because the newly merged segment isn't warmed yet), and how IndexReader can open segments written by the writer but not truly committed (sync'd).

Mike
