Doron Cohen wrote:
I am not happy with complicating the readers like this, conceptually
adding back commit locks (for deletion), this time with a keep-alive
thread, and again making readers not read-only.
To my understanding the only remaining issue with NFS is: a reader
might get an IO exception if the writer removes an old file that
the reader is still using.
We are not trying to solve a possible corruption here, right?
For that I think it is not worth adding that stuff again.
A writer's "two steps" policy - delete only files that
"would not be in use unless a reader failed to refresh for X minutes" -
is "fair enough" I think.
By "two steps" I mean: start measuring time not from when the segment to
be deleted was created, but rather from when its "next generation" was
created.
Right, this was my original proposed deletion policy (below) for
things to work on NFS.
It does assume/require your application can refresh readers within the
specified time period. A commit (and any segments whose ref count then
drops to zero) is deleted once it has been "obsoleted" for more than X
minutes.
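
To make the "obsoleted for more than X minutes" rule concrete, here is a
minimal sketch in Java. It is not Lucene code: the Commit class, the
deletable() method and the graceMillis parameter are hypothetical names
made up for illustration. The clock for a commit starts when the next
commit is created, not when the commit itself was created, and the
newest commit is never a candidate.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; none of these names exist in Lucene. */
final class ExpirationPolicySketch {

    /** Hypothetical commit record: generation number plus creation time. */
    static final class Commit {
        final long generation;
        final long createdMillis;

        Commit(long generation, long createdMillis) {
            this.generation = generation;
            this.createdMillis = createdMillis;
        }
    }

    /**
     * Returns the commits that may be deleted. The list must be sorted
     * oldest first. A commit is "obsoleted" at the moment its successor
     * was created; it becomes deletable once that moment lies more than
     * graceMillis in the past.
     */
    static List<Commit> deletable(List<Commit> commits, long nowMillis, long graceMillis) {
        List<Commit> result = new ArrayList<>();
        // Stop before the last element: the newest commit is still current.
        for (int i = 0; i + 1 < commits.size(); i++) {
            long obsoletedAt = commits.get(i + 1).createdMillis;
            if (nowMillis - obsoletedAt > graceMillis) {
                result.add(commits.get(i));
            }
        }
        return result;
    }
}
```

As long as every reader refreshes within graceMillis of a new commit,
the commit it has open is never in the returned list.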
Even though it's not perfect (progress not perfection!), I like it the
best of the three options discussed on this thread so far because 1)
it leaves the readers read only, and 2) it should work on all versions
of NFS.
This would just be a different deletion policy, and it wouldn't be the
default one. We would leave the default as "keep only last commit
and delete old one immediately", for backwards compatibility.
Finally, an application can always make its own deletion policy
(subclass IndexFileDeleter) if it needs to.
Mike
Michael McCandless <[EMAIL PROTECTED]> wrote on 18/01/2007 14:24:16:
Marvin Humphrey wrote:
On Jan 17, 2007, at 1:16 PM, Michael McCandless wrote:
This is the solution I have in mind for LUCENE-710: change the
IndexFileDeleter so that instead of always immediately deleting the
last commit when a new commit happens, allow some time before doing
so. This way readers have a chance to refresh. The actual time would
be settable by the developer. So if you set it to 6 hours, then, a
commit would remain usable for at least 6 hours after it had been
obsoleted by a new commit. This means if you can ensure your readers
refresh within 6 hours of a new commit happening, then the writer will
never delete an "in-use" commit.
I've been mulling this over. If you set the interval to 6 hours, and
there's a lot of churn (e.g. if you optimize frequently), you'll end up
with a lot of wasted disk space. On the flip side, the user has to set
up some sort of trigger for refreshing the IndexReaders anyway. It's
still not user-friendly by default, and we'd be polluting the API with
a hateful workaround.
Well, 6 hours would be a long time for such a high turnover site.
They would presumably set the time to something like 10 minutes
instead.
I think we should decouple the deletion policy from commits. This way
developers could subclass and make their own deletion policy that
suits their application. The IndexFileDeleter base class would do all
the legwork to keep ref counts to all specific index files based on
all segments_N commits that are still "live". Then the deletion
policy just decides which commits should be deleted, when. (This is
roughly what's outlined in LUCENE-710).
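
A rough sketch of that split, in Java. This is not the actual
IndexFileDeleter API: CommitDeletionPolicy, KeepOnlyLastCommit and
RefCountingDeleterSketch are hypothetical names, and commits are
reduced to a name plus the set of files they reference. The deleter
owns the ref counts; the policy only says which commits to drop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical policy hook: picks which live commits to delete. */
interface CommitDeletionPolicy {
    /** liveCommits is ordered oldest first; returns the names to delete. */
    List<String> selectForDeletion(List<String> liveCommits);
}

/** The current behaviour described above: keep only the newest commit. */
final class KeepOnlyLastCommit implements CommitDeletionPolicy {
    public List<String> selectForDeletion(List<String> liveCommits) {
        if (liveCommits.size() <= 1) {
            return new ArrayList<>();
        }
        return new ArrayList<>(liveCommits.subList(0, liveCommits.size() - 1));
    }
}

/** Illustrative deleter: tracks ref counts, delegates the decision. */
final class RefCountingDeleterSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    // Insertion order is commit order, oldest first.
    private final Map<String, Set<String>> filesByCommit = new LinkedHashMap<>();
    private final CommitDeletionPolicy policy;

    /** Files whose ref count reached zero (a real deleter would remove them). */
    final List<String> deletedFiles = new ArrayList<>();

    RefCountingDeleterSketch(CommitDeletionPolicy policy) {
        this.policy = policy;
    }

    void onCommit(String commitName, Set<String> files) {
        filesByCommit.put(commitName, new HashSet<>(files));
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
        List<String> live = new ArrayList<>(filesByCommit.keySet());
        for (String commit : policy.selectForDeletion(live)) {
            Set<String> released = filesByCommit.remove(commit);
            if (released == null) {
                continue; // policy named a commit we do not know about
            }
            for (String file : released) {
                int remaining = refCounts.merge(file, -1, Integer::sum);
                if (remaining == 0) {
                    refCounts.remove(file);
                    deletedFiles.add(file);
                }
            }
        }
    }
}
```

A time-based or filesystem-ref-count policy would then be just another
implementation of the same hook, with the ref counting left untouched.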
The current policy is to delete all prior commits after a new commit
and that would remain the default.
Chuck's idea (reference counting via filesystem) would be another
policy. My proposal (delete by time after being obsoleted) would be
another policy, etc.
The real problem is NFS. For background, see
<http://nfs.sourceforge.net/#section_d>, item D2, which deals with NFS
and "delete on last close".
Now I wonder. Version 4 of the NFS protocol introduces state, so it's
possible to implement file locking. Can we lock a segments file, then
have IndexFileDeleter detect which segments are locked that way? And if
that's the case, can we detect whether the locking mechanism is failing
and throw an exception if someone tries to use an earlier version of
NFS?
Locking and NFS make me very nervous :)
I'd be cool with making it impossible to put an index on an NFS volume
prior to version 4. That puts the blame where it belongs.
Well, most times users have no control over which NFS server and/or
client version is in use, so I think taking this approach of "pinning
the blame" can only hurt our users. I would rather find a solution
that's more portable, if we can (like the ref counting idea Chuck
brought up).
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]