Yonik Seeley wrote:
> >> But, I'm still renaming segments_N.new -> segments_N,
> >
> > Hmmm, remind me why you need the .new file? Why can't you just create
> > segments_N after you are finished writing all of the segments?
>
> Because there could be a reader that tries to read the file before it's
> done being written.  It would hit EOF and throw an IOException.

> Ahh, right... unlikely (the segments file is pretty small), but possible.
>
> Another alternative (since this changes the index format anyway) is to
> put something in the segments file to detect if it's partially
> written... something like the size of the file or the number of
> segments.  I don't know if the extra complexity would be worth saving
> the creation time of an extra file or not...
>
> Hey wait... the segments file already has the number of segments.
> Can't you tell if it's not yet complete?

Good point!  A reader could easily tell that it's dealing with an
unfinished segments file (since the file says how many segments it
has) and then sleep/retry until the file is complete, which should be
a rare event.  Note that such contention in the current Lucene (i.e.,
on the commit lock) results in a 1.0 second delay and then a retry.
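
For illustration, here is a minimal Java sketch of that reader-side
check.  It assumes a made-up layout (an int segment count, then that
many name/docCount records), not Lucene's actual segments format, and
the class and method names are hypothetical.  Because the count is
written first, hitting EOF before that many records have been read
tells the reader the writer isn't done yet:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical layout, NOT Lucene's real segments format:
//   int count, then `count` records of (UTF name, int docCount).
final class SegmentsFileSketch {

    /** Writes the count first, then the records. */
    static void write(Path file, Map<String, Integer> segments) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(segments.size());
            for (Map.Entry<String, Integer> e : segments.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
        }
    }

    /**
     * Reads the file once.  Returns null if the file ends before the
     * promised number of records (or the count itself) could be read,
     * i.e. the writer is apparently not finished yet.
     */
    static Map<String, Integer> tryRead(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            Map<String, Integer> result = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                result.put(in.readUTF(), in.readInt());
            }
            return result;
        } catch (EOFException partial) {
            return null;
        }
    }

    /** Sleeps and retries until the file is complete or attempts run out. */
    static Map<String, Integer> readWithRetry(Path file, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Map<String, Integer> segs = tryRead(file);
            if (segs != null) {
                return segs;
            }
            Thread.sleep(sleepMillis);
        }
        return null; // still incomplete; the caller decides what to do next
    }
}
```

A caller mimicking today's commit-lock behavior might use
readWithRetry(file, 10, 1000), i.e. up to ten attempts one second
apart; those numbers are only an example.  This detects truncation
only; it still relies on the filesystem eventually showing the reader
the bytes the writer has added.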

Though what if the writer has crashed and so the new segments file
will never be finished?  I guess a reader could fall back to the
previous segments_(N-1) file after some timeout, at the cost of more
delay.
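
A sketch of that fallback, again in Java and again hypothetical: the
layout (an int count, then that many UTF name/int docCount records)
matches nothing real, file names are assumed to be "segments_" plus
the generation in decimal, and the timeout and poll interval are
arbitrary caller-chosen values.  The reader polls segments_N until it
is complete or the timeout expires, then assumes the writer died and
uses segments_(N-1):

```java
import java.io.*;
import java.nio.file.*;

final class SegmentsFallbackSketch {

    /** True if the file holds its declared count plus that many records. */
    static boolean isComplete(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                in.readUTF(); // segment name (hypothetical layout)
                in.readInt(); // doc count (hypothetical layout)
            }
            return true;
        } catch (EOFException partial) {
            return false;
        }
    }

    /**
     * Polls segments_<gen> until it is complete or timeoutMillis elapses,
     * then falls back to segments_<gen-1>.  Returns the file to open.
     */
    static Path chooseSegmentsFile(Path dir, long gen, long timeoutMillis, long pollMillis)
            throws IOException, InterruptedException {
        Path current = dir.resolve("segments_" + gen);
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (isComplete(current)) {
                return current;
            }
            if (System.nanoTime() - deadline >= 0) {
                break; // give up: assume the writer crashed
            }
            Thread.sleep(pollMillis);
        }
        Path previous = dir.resolve("segments_" + (gen - 1));
        if (isComplete(previous)) {
            return previous;
        }
        throw new IOException("neither " + current + " nor " + previous + " is complete");
    }
}
```

The price is the extra delay: a reader that runs into a crashed
writer's file waits out the full timeout before falling back.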

I think that approach would work, but I'm still worried about the
interaction with filesystem caching.  E.g., how much latency does the
caching add before a reader sees that this file now has more data?
I'd like a solution for Lucene that doesn't depend on the filesystem's
caching policies.

I think the good news here is that we have quite a few options for
never using file renaming in Lucene (thanks to its simplicity!).

Mike
