robert engels wrote:
> Then why not always write segment.delXXXX, where XXXX is incremented.
This is what Lucene does today. It's "write once".
> Each file may be compressed or uncompressed based on the number of
> deletions it contains.
Lucene also does this.
Still, as Marvin pointed out, the cost of committing a delete is
proportional either to the number of deletes already on the segment
(if written sparse) or to the number of documents in the segment (if
written non-sparse). It doesn't scale well... though the constant
factor may be very small (i.e., it may not matter that much in
practice?). With tombstones the commit cost would be proportional to
how many deletes you did (scales perfectly), at the expense of added
per-search cost and search iterator state.
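To make the scaling concrete, here's a rough sketch of the three
commit costs (this is not Lucene's actual file format; the class and
method names are just illustrative):

  import java.io.DataOutput;
  import java.io.IOException;
  import java.util.BitSet;
  import java.util.List;

  class DeleteCommitCosts {

    // Non-sparse: one bit per document, so committing costs
    // O(maxDoc) no matter how few deletes this commit added.
    static void writeNonSparse(DataOutput out, BitSet deleted, int maxDoc)
        throws IOException {
      for (int doc = 0; doc < maxDoc; doc++) {
        out.writeBoolean(deleted.get(doc));
      }
    }

    // Sparse: one entry per deleted doc, so committing costs
    // O(total deletes on the segment), which still grows with
    // every commit.
    static void writeSparse(DataOutput out, BitSet deleted)
        throws IOException {
      for (int doc = deleted.nextSetBit(0); doc >= 0;
           doc = deleted.nextSetBit(doc + 1)) {
        out.writeInt(doc);
      }
    }

    // Tombstone: append only the deletes done since the last
    // commit, so committing costs O(new deletes) -- but every
    // search must now union all tombstone files written so far.
    static void appendTombstone(DataOutput out, List<Integer> newDeletes)
        throws IOException {
      for (int doc : newDeletes) {
        out.writeInt(doc);
      }
    }
  }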
For realtime search this could be a good tradeoff to make (lower
latency on add/delete -> refreshed searcher, at higher per-search
cost), but... in the realtime search discussion we are now thinking
that the deletes live with the reader and are carried in RAM over to
the reopened reader (LUCENE-1314), bypassing having to commit to the
filesystem at all.
One downside to this is that it's single-JRE only, i.e., to do
distributed realtime search you'd have to also re-apply the deletes
to the head IndexReader on each JRE. (Whereas added docs would be
written by a single IndexWriter and propagated via the filesystem.)
If we go forward with this model then indeed slowish commit times for
new deletes matter less, since the commit is only for crash recovery
and not for opening a new reader.
But we'd have many "control" issues to work through... e.g., how the
reader can re-open against the old segments right after a new merge
is committed (because the newly merged segment isn't warmed yet), and
how IndexReader can open segments written by the writer but not yet
truly committed (sync'd).
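For the warming issue, one possible shape (purely illustrative;
SegmentWarmer and MergePublisher are hypothetical names, not proposed
APIs) would be for the writer to warm a merged segment before
reopening readers can see it:

  interface SegmentWarmer {
    // Pre-load norms, caches, etc. for the newly merged segment so
    // the first search against it doesn't pay the warm-up cost.
    void warm(String segmentName) throws java.io.IOException;
  }

  class MergePublisher {
    private final SegmentWarmer warmer;
    private volatile String liveSegments;  // what reopening readers see

    MergePublisher(SegmentWarmer warmer, String initialSegments) {
      this.warmer = warmer;
      this.liveSegments = initialSegments;
    }

    // Readers reopening during warming still get the old segments.
    String currentSegments() {
      return liveSegments;
    }

    void publishMerge(String mergedSegment) throws java.io.IOException {
      warmer.warm(mergedSegment);    // warm before switching readers over
      liveSegments = mergedSegment;  // only now do reopens see the merge
    }
  }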
Mike