But then you're back to syncing in a BG thread, right? We've come
full circle.
Asynchronously syncing gives the best performance we've seen so far,
and so that's the current patch on LUCENE-1044 (using CMS's threads).
Using a transaction log would also require async syncing, but it
would also add a 2X IO cost for flushing and 2X disk usage between
commits.
I don't see how that could be faster. I expect it to perform quite a
bit worse.
Also, I tested system wide sync in LUCENE-1044 and found it no better
than syncing individual files synchronously (which was our worst
performance number). And I don't think Lucene should be doing a system
wide sync. There may be other processes doing IO whose buffers we
shouldn't, and don't need to, sync.
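
For illustration only, here is a minimal sketch of syncing individual
files on a background thread. It assumes plain java.nio rather than
Lucene's Directory API, and it is not the LUCENE-1044 patch; the class
name BackgroundFileSyncer and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper (not Lucene code): forces a set of files to disk
// on a single background thread so the caller is not blocked.
class BackgroundFileSyncer implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Returns a future that completes with the number of files synced.
    CompletableFuture<Integer> syncAsync(List<Path> files) {
        return CompletableFuture.supplyAsync(() -> {
            int synced = 0;
            for (Path file : files) {
                // Open for WRITE so that force() is permitted to flush.
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                    ch.force(true); // syncs this file only, not the whole system
                    synced++;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return synced;
        }, executor);
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}
```

Only the files passed in are forced to disk, so buffers belonging to
other processes are left alone, unlike a system wide sync.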
Mike
robert engels wrote:
Yes, but this pruning could be more efficient. On a background
thread, get current segment from segments file, call the system
wide sync (e.g. System.exec("fsync")), then you can purge the
transaction logs for all segments up to that one. Since it is a
background operation, you are not blocking the writing of new
segments and tx logs.
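
The bookkeeping for that pruning step might look roughly like the
sketch below. It assumes each transaction log is tied to a segment
generation number, and that pruneUpTo is called only after a sync
covering that generation has returned. TxLogPruner and the log names
are hypothetical, and the sync itself is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical bookkeeping (not Lucene code): maps a segment generation
// to the name of the transaction log that covers it.
class TxLogPruner {
    private final TreeMap<Long, String> logsByGeneration = new TreeMap<>();

    // Called by the writer whenever a new segment and its log are created.
    synchronized void register(long generation, String logName) {
        logsByGeneration.put(generation, logName);
    }

    // Called by the background thread once everything up to and including
    // syncedGeneration is known to be on disk. Returns the purgeable logs.
    synchronized List<String> pruneUpTo(long syncedGeneration) {
        NavigableMap<Long, String> durable = logsByGeneration.headMap(syncedGeneration, true);
        List<String> purged = new ArrayList<>(durable.values());
        durable.clear(); // headMap is a view, so this removes from the backing map
        return purged;
    }

    synchronized int pendingLogs() {
        return logsByGeneration.size();
    }
}
```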
On Feb 6, 2008, at 4:42 PM, Michael McCandless wrote:
robert engels wrote:
Do we have any way of determining if a segment is definitely
OK/VALID?
The only way I know is the CheckIndex tool, and it's rather slow (and
it's not clear that it always catches all corruption).
If so, a much more efficient transactional system could be
developed.
Serialize the updates to a log file. Sync the log. Update the
lucene index WITHOUT any sync. Log file writing/sync is VERY
efficient since it is sequential, and a single file.
Upon open of the index, detect if index was not shut down cleanly.
If so, determine the last valid segment, delete the bad segments,
and then perform the updates (from the log file) since the last
valid segment was written.
The detection could be a VERY slow operation, but this is ok,
since it should be rare, and then you will only pay this price on
the rare occasion, not on every update.
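
The log-then-replay scheme described above can be sketched as follows.
This is an assumption-laden illustration, not an existing Lucene
facility: the UpdateLog class, its record layout (sequence number,
length, payload) and the idea of tagging each update with a sequence
number are all hypothetical, and there are no checksums.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only update log (not Lucene code).
class UpdateLog implements AutoCloseable {
    private final FileChannel channel;

    UpdateLog(Path file) {
        try {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends one record and syncs the log before returning. The index
    // itself would then be updated without any sync.
    void append(long seq, String update) {
        byte[] payload = update.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + payload.length);
        buf.putLong(seq).putInt(payload.length).put(payload).flip();
        try {
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            channel.force(true); // sequential write to a single file, then one sync
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On recovery: returns the updates logged after the last valid segment,
    // identified here by the sequence number of its last included update.
    static List<String> replayAfter(Path file, long lastValidSeq) {
        List<String> pending = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            while (true) {
                long seq = in.readLong();
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                if (seq > lastValidSeq) {
                    pending.add(new String(payload, StandardCharsets.UTF_8));
                }
            }
        } catch (EOFException endOfLog) {
            // Clean end of file, or a torn final record after a crash: stop here.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pending;
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Determining the last valid segment, and therefore the lastValidSeq
argument, is the slow detection step and is not shown here.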
Wouldn't you still need to sync periodically, so you can prune the
transaction log? Else your transaction log is growing as fast as the
index? (You've doubled disk usage).
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]