Marvin Humphrey wrote:
On Jan 17, 2007, at 1:16 PM, Michael McCandless wrote:
This is the solution I have in mind for LUCENE-710: change the
IndexFileDeleter so that instead of always immediately deleting the
last commit when a new commit happens, allow some time before doing
so. This way readers have a chance to refresh. The actual time would
be settable by the developer. So if you set it to 6 hours, then a
commit would remain usable for at least 6 hours after it had been
obsoleted by a new commit. This means if you can ensure your readers
refresh within 6 hours of a new commit happening, then the writer will
never delete an "in-use" commit.
I've been mulling this over. If you set the interval to 6 hours, and
there's a lot of churn (e.g. if you optimize frequently), you'll end up
with a lot of wasted disk space. On the flip side, the user has to set
up some sort of trigger for refreshing the IndexReaders anyway. It's
still not user-friendly by default, and we'd be polluting the API with a
hateful workaround.
Well, 6 hours would be a long time for such a high turnover site.
They would presumably set the time to something like 10 minutes
instead.
I think we should decouple the deletion policy from commits. This way
developers could subclass and make their own deletion policy that
suits their application. The IndexFileDeleter base class would do all
the legwork to keep ref counts to all specific index files based on
all segments_N commits that are still "live". Then the deletion
policy just decides which commits should be deleted, and when. (This is
roughly what's outlined in LUCENE-710.)
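
As a rough illustration of the ref-counting legwork described above, here is a minimal, self-contained sketch. It is not Lucene's IndexFileDeleter: the class name RefCountingDeleter and its methods are hypothetical, and it records file names it would delete instead of touching a directory. Each live commit (a segments_N file plus the index files it references) holds one reference to each of its files, and a file becomes deletable only when the last commit referencing it is dropped.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; names and structure are assumptions, not Lucene's API.
class RefCountingDeleter {
    // file name -> number of live commits that reference it
    private final Map<String, Integer> refCounts = new HashMap<>();
    // segments_N name -> files referenced by that commit
    private final Map<String, Set<String>> liveCommits = new LinkedHashMap<>();
    // files whose ref count reached zero (a real deleter would remove them from disk)
    private final List<String> deletedFiles = new ArrayList<>();

    // Register a new live commit and take one reference on each of its files.
    void addCommit(String segmentsName, Collection<String> files) {
        Set<String> fileSet = new LinkedHashSet<>(files);
        if (liveCommits.putIfAbsent(segmentsName, fileSet) != null) {
            throw new IllegalArgumentException("commit already live: " + segmentsName);
        }
        for (String file : fileSet) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    // Drop a commit; any file no longer referenced by a live commit becomes deletable.
    void removeCommit(String segmentsName) {
        Set<String> files = liveCommits.remove(segmentsName);
        if (files == null) {
            return; // unknown or already removed
        }
        for (String file : files) {
            int remaining = refCounts.merge(file, -1, Integer::sum);
            if (remaining == 0) {
                refCounts.remove(file);
                deletedFiles.add(file);
            }
        }
    }

    List<String> deletedFiles() {
        return deletedFiles;
    }

    Set<String> liveCommitNames() {
        return liveCommits.keySet();
    }
}
```

With this split, the deleter never decides anything itself: something else (the policy) calls removeCommit, and shared files such as a segment referenced by two commits survive until both commits are gone.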
The current policy is to delete all prior commits after a new commit
and that would remain the default.
Chuck's idea (reference counting via filesystem) would be another
policy. My proposal (delete by time after being obsoleted) would be
another policy, etc.
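
The pluggable policies could look something like the sketch below. All names here (Commit, CommitDeletionPolicy, DeleteAllPriorCommits, DeleteAfterGracePeriod) are made up for illustration and are not an existing Lucene API. The sketch assumes the deleter hands the policy its live commits ordered oldest to newest, along with the time each one was obsoleted, and the policy returns the commits it wants removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: one live commit point.
final class Commit {
    final String segmentsName;
    // Time (ms) at which a newer commit obsoleted this one; -1 for the newest commit.
    final long obsoletedAtMillis;

    Commit(String segmentsName, long obsoletedAtMillis) {
        this.segmentsName = segmentsName;
        this.obsoletedAtMillis = obsoletedAtMillis;
    }
}

// Hypothetical policy interface: decides which commits to delete, and when.
interface CommitDeletionPolicy {
    // commits are ordered oldest to newest; the last entry is the current commit.
    List<Commit> selectForDeletion(List<Commit> commits, long nowMillis);
}

// The current behaviour: delete every prior commit as soon as a new one exists.
final class DeleteAllPriorCommits implements CommitDeletionPolicy {
    @Override
    public List<Commit> selectForDeletion(List<Commit> commits, long nowMillis) {
        if (commits.size() <= 1) {
            return new ArrayList<>();
        }
        return new ArrayList<>(commits.subList(0, commits.size() - 1));
    }
}

// Delete a prior commit only once it has been obsolete for at least graceMillis.
final class DeleteAfterGracePeriod implements CommitDeletionPolicy {
    private final long graceMillis;

    DeleteAfterGracePeriod(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    @Override
    public List<Commit> selectForDeletion(List<Commit> commits, long nowMillis) {
        List<Commit> expired = new ArrayList<>();
        // Never consider the last (current) commit.
        for (int i = 0; i < commits.size() - 1; i++) {
            Commit c = commits.get(i);
            if (nowMillis - c.obsoletedAtMillis >= graceMillis) {
                expired.add(c);
            }
        }
        return expired;
    }
}
```

A high-turnover site would construct DeleteAfterGracePeriod with something like 10 minutes, trading a bounded amount of extra disk space for the guarantee that readers which refresh within that window never lose their commit.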
The real problem is NFS. For background, see
<http://nfs.sourceforge.net/#section_d>, item D2, which deals with NFS
and "delete on last close".
Now I wonder. Version 4 of the NFS protocol introduces state, so it's
possible to implement file locking. Can we lock a segments file, then
have IndexFileDeleter detect which segments are locked that way? And if
that's the case, can we detect whether the locking mechanism is failing
and throw an exception if someone tries to use an earlier version of NFS?
Locking and NFS make me very nervous :)
I'd be cool with making it impossible to put an index on an NFS volume
prior to version 4. That puts the blame where it belongs.
Well, most times users have no control over which NFS server and/or
client version is in use, so I think taking this approach of "pinning
the blame" can only hurt our users. I would rather find a solution
that's more portable, if we can (like the ref counting idea Chuck
brought up).
Mike