Yonik Seeley wrote:
>> Good point! A reader could easily know that it's dealing with an
>> unfinished segments file (since the file records how many segments it
>> has) and then sleep/retry until the file completes, which should be a
>> rare event. Note that such contention in the current Lucene (ie, on
>> the commit lock) results in a 1.0 second delay followed by a retry.
>>
>> Though what if the writer has crashed, so the new segments file will
>> never be finished? I guess the reader could fall back to the previous
>> segments_(N-1) file after some time, at the cost of more delay.
> If it happens so rarely, make it simpler and go directly for
> segments_(N-1)... (treat it the way your previous plan would have if
> segments_N.done hadn't been written yet).
Yes, true, we could just fall back to the prior segments_(N-1) file in
this case. Though that means the reader will likely just hit an
IOException trying to load the segments (since a commit is "in
progress") and then I'd have to retry against segments_N.
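To make that concrete, here's a rough sketch of the kind of reader
loop I mean (a sketch only: readSegments(), the timeout, and the poll
interval are all made up, not real Lucene code):

    import java.io.IOException;

    // Sketch: retry an in-progress segments_N file, falling back to
    // segments_(N-1) if the writer seems to have crashed.
    class RetryingSegmentsReader {

      private static final long TIMEOUT_MSEC = 5000;  // assumed crash timeout
      private static final long POLL_MSEC = 100;      // assumed poll interval

      Object readWithFallback(long gen) throws IOException, InterruptedException {
        final long deadline = System.currentTimeMillis() + TIMEOUT_MSEC;
        while (true) {
          try {
            // Hypothetical helper: throws IOException if the file holds
            // fewer segments than its header says it should.
            return readSegments("segments_" + gen);
          } catch (IOException e) {
            if (System.currentTimeMillis() >= deadline) {
              // Writer has likely crashed: fall back to the prior commit.
              return readSegments("segments_" + (gen - 1));
            }
            Thread.sleep(POLL_MSEC);  // give the writer time to finish
          }
        }
      }

      Object readSegments(String fileName) throws IOException {
        throw new UnsupportedOperationException("stub for illustration");
      }
    }
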
I think that approach would work, but I'm still worried about the
interaction with filesystem caching. Eg, how much latency is added by
the caching before it realizes the file now has more data?
> Local filesystems don't have that problem.
Right, on local filesystems the caching is always "coherent" (thank
goodness), so you don't have these issues.
Still I think Lucene needs to aim to work with the "least common
denominator" of all filesystems (to cause the fewest problems for all
our "diverse" users using "diverse" filesystems).
Meaning: don't use certain filesystem operations / semantics /
features when possible. Not using "file renaming" is a good example
(since it's unreliable on windows). And not expecting "cache
coherence for a file's contents" (by never re-using the same filename)
is another.
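As a strawman for the "never re-use a filename" approach (again just a
sketch; the decimal parsing of N is illustrative), a reader never
re-reads a name it has already seen; it just scans for the highest
generation:

    import java.io.File;

    // Sketch: every commit writes a brand new segments_N file, so a
    // reader finds the latest commit point by taking the highest N.
    class CommitPointFinder {

      static long latestGeneration(File dir) {
        long max = -1;
        String[] names = dir.list();
        if (names != null) {
          for (int i = 0; i < names.length; i++) {
            if (names[i].startsWith("segments_")) {
              try {
                long gen = Long.parseLong(names[i].substring("segments_".length()));
                if (gen > max) {
                  max = gen;  // newer commit point found
                }
              } catch (NumberFormatException e) {
                // not a commit file; ignore it
              }
            }
          }
        }
        return max;  // -1 means no commit has completed yet
      }
    }
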
I've been using NFS as my "proxy" for "least common denominator" in
testing a native locking implementation (will submit patch soon) and
for testing lock-less commits.
> Remote filesystems would hopefully check for new blocks on demand (as
> you try to read them).
Well, I wish I could share this optimism :) NFS clients at best try for
"close-to-open cache coherence" (not true cache coherence, since that's
too costly), eg as described here for recent Linux NFS clients:
http://nfs.sourceforge.net
I think some NFS clients don't achieve even that. I *think* if the
NFS client does implement this then you're right that re-opening the
file should flush the cache. But ... I'd rather not rely on the
caching at all if we can avoid it (by opening the file only once, when
it's complete). This way we don't require cache coherence on the file
contents ...
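For instance (just a sketch, reusing the hypothetical segments_N.done
marker from earlier in this thread), the reader would wait for the
marker and only then open segments_N, exactly once:

    import java.io.File;
    import java.io.IOException;

    // Sketch: open segments_N only after its "done" marker exists, so
    // we never read a partially written file and never depend on the
    // cache noticing that an already-open file grew.
    class OneShotOpener {

      static File awaitCompleteCommit(File dir, long gen, long timeoutMsec)
          throws IOException, InterruptedException {
        File marker = new File(dir, "segments_" + gen + ".done");
        long deadline = System.currentTimeMillis() + timeoutMsec;
        while (!marker.exists()) {
          if (System.currentTimeMillis() >= deadline) {
            throw new IOException("segments_" + gen + " never completed");
          }
          Thread.sleep(100);  // assumed poll interval
        }
        // The writer only creates the marker after the segments file is
        // fully written, so it is safe to open it now.
        return new File(dir, "segments_" + gen);
      }
    }
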
Mike