My point is that commit needs to be used in most applications, and
the commit in Lucene is very slow.
You don't have 2x the IO cost, mainly because only the log file needs
to be sync'd on commit. The index only has to be sync'd eventually, in
order to prune the log file - this can be done in the background,
improving the performance of the update and commit cycle.
Also, writing the log file is very efficient because it is an
append/sequential operation. Writing the segment files means writing
multiple files - essentially causing random access writes.
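
To make the append-and-sync idea concrete, here is a minimal sketch. It
assumes a hypothetical OperationLog class (not a Lucene API) and a simple
length-prefixed record format that I made up for illustration. The point
is only that commit syncs one sequentially written file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only operation log; illustrative only, not part of Lucene. */
final class OperationLog implements AutoCloseable {
    private final FileChannel channel;

    OperationLog(Path file) throws IOException {
        // APPEND makes every write land at the end of the file (sequential IO).
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    /** Appends one length-prefixed record; nothing is forced to disk yet. */
    void append(String operation) throws IOException {
        byte[] payload = operation.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length);
        buf.put(payload);
        buf.flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    /** Commit point: only this single file is synced. */
    void commit() throws IOException {
        channel.force(false); // false = file content only, metadata not required
    }

    /** Current log size in bytes. */
    long size() throws IOException {
        return channel.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Pruning the log once the real index files have been sync'd is left out
here; that part would run in the background.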
I guess I don't see the benefit of 1044 if you can't guarantee the
index is at a certain point (you can by calling commit(), but it is
VERY slow).
I was thinking a better design is to serialize the documents/
operations to disk, maintain an in-memory index of updates/
removes, and then merge that index into the main index when needed -
using a parallel reader over both in the meantime.
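
A rough sketch of that read path, using plain Java maps as stand-ins for
the main index and the in-memory index. OverlayIndex and all of its
methods are hypothetical names for illustration, not Lucene classes, and
a real version would sit on top of actual index readers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical overlay of pending updates/removes on top of a main index. */
final class OverlayIndex {
    private final Map<String, String> main = new HashMap<>();           // stands in for the on-disk index
    private final Map<String, String> pendingUpdates = new HashMap<>(); // in-memory updates
    private final Set<String> pendingRemoves = new HashSet<>();         // in-memory removes

    /** Records an add/update in memory only. */
    void update(String id, String doc) {
        pendingRemoves.remove(id);
        pendingUpdates.put(id, doc);
    }

    /** Records a remove in memory only. */
    void remove(String id) {
        pendingUpdates.remove(id);
        pendingRemoves.add(id);
    }

    /** Reads across both indexes: pending state wins over the main index. */
    String get(String id) {
        if (pendingRemoves.contains(id)) {
            return null;
        }
        String doc = pendingUpdates.get(id);
        return doc != null ? doc : main.get(id);
    }

    /** Folds the pending operations into the main index and clears them. */
    void mergeToMain() {
        main.keySet().removeAll(pendingRemoves);
        main.putAll(pendingUpdates);
        pendingRemoves.clear();
        pendingUpdates.clear();
    }

    /** Number of operations not yet merged. */
    int pendingCount() {
        return pendingUpdates.size() + pendingRemoves.size();
    }
}
```

Readers see the same result before and after mergeToMain(), so the merge
can happen whenever it is convenient.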
On Feb 7, 2008, at 3:06 PM, Michael McCandless wrote:
> robert engels wrote:
>
> > I might be misunderstanding 1044. There were several approaches,
> > and I am not certain what was the final???
>
> The final approach (take 7) is to make the index consistent (sync
> the files) after finishing a merge. Also, a new method ("commit")
> is added which will force a synchronous sync while you wait. Close
> also does this.
>
> > I reread the bug and am still a bit unclear.
> > If the segments are sync'd as part of the commit, then yes, that
> > would suffice. The merges don't need to commit, you just can't
> > delete the segments until the merge completes.
> > I think that building the segments, and syncing each segment -
> > since in most cases the caller is going to call commit as part of
> > each update, is going to be slower than writing the documents/
> > operations to a log file, but a lot depends on how Lucene is used
> > (interactive vs. batch, lots of updates vs. a few).
>
> Well, and based on how frequently you prune the transaction log
> (sync the real files). I think the 2X IO cost is going to make
> performance worse with the transaction log.
>
> > I am not sure how deletions are impacted by all of this.
>
> Should be fine? The *.del files need to be sync'd just like the
> rest of the segments files.
>
> Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]