[ https://issues.apache.org/jira/browse/LUCENE-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466639 ]
Doron Cohen commented on LUCENE-710:
------------------------------------
Michael McCandless wrote:
> The solution I have in mind abstracts away all the tricky details of
> deleting files. E.g., something like:
>
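> import java.util.List;
>
> public class OnlyLastCommitDeleter extends IndexFileDeleter {
>
>   void onInit(List commits) {
>     // At init, apply the same rule to the existing commits.
>     onCommit(commits);
>   }
>
>   void onCommit(List commits) {
>     // Delete every commit except the most recent (last) one.
>     if (commits.size() > 1) {
>       for (int i = 0; i < commits.size() - 1; i++) {
>         deleteCommit(commits.get(i));
>       }
>     }
>   }
> }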
>
> I.e., the sole responsibility of the IndexFileDeleter subclass (policy)
> is to decide when to delete a commit. The rest of the details
> (figuring out what actual files can be deleted now that a given commit
> segments_N is deleted) are handled by the base class (with in-memory
> ref counting).
>
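A minimal sketch of the in-memory ref counting described above, assuming each commit (segments_N) is known together with the list of files it references. The class and method names (FileRefCounter, incRef, decRef) are hypothetical, not Lucene's API, and a real deleter would remove files from the Directory rather than just recording their names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Lucene's API): counts how many live commits
// (segments_N files) reference each index file.
class FileRefCounter {
    private final Map<String, Integer> refCounts = new HashMap<>();
    // Stand-in for real deletion: a real deleter would remove these
    // files from the Directory instead of recording their names.
    private final List<String> deletedFiles = new ArrayList<>();

    // A new commit appeared: each file it references gains one reference.
    void incRef(List<String> commitFiles) {
        for (String file : commitFiles) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    // The policy deleted a commit: drop one reference per file and delete
    // any file that no remaining commit references.
    void decRef(List<String> commitFiles) {
        for (String file : commitFiles) {
            Integer count = refCounts.get(file);
            if (count == null) {
                continue; // unknown file; nothing to release
            }
            if (count == 1) {
                refCounts.remove(file);
                deletedFiles.add(file);
            } else {
                refCounts.put(file, count - 1);
            }
        }
    }

    List<String> getDeletedFiles() {
        return deletedFiles;
    }
}
```

For example, if segments_1 references _0.cfs and segments_2 references _0.cfs and _1.cfs, deleting the first commit removes only segments_1; _0.cfs survives because segments_2 still references it.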
I don't really understand this interface and so I cannot see how
you intend to rewrite the IndexFileDeleter as you describe, but I
agree that if this can be done it is a better solution. So I am
okay with waiting for this approach to mature into code.
(I would prefer the DeletionPolicy to be a
pluggable *interface* and the IndexFileDeleter to be
an internal *class*, so that at least we do not now expose something
that would stand in our way in the future. But again, since I do not
fully understand your solution, please bear with me if this
does not make sense.)
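To illustrate the "public interface, internal class" split suggested above, here is a sketch under assumed, hypothetical names (CommitPoint, DeletionPolicy, KeepOnlyLastCommitDeletionPolicy, IndexFileDeleterSketch): the policy is a small pluggable interface, while the class that tracks commits and would remove files stays package-private. The default policy mirrors the current behavior of keeping only the most recent segments_N. It is not Lucene's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: a commit point as a policy sees it.
interface CommitPoint {
    String getSegmentsFileName();
    void delete(); // ask the deleter to remove this commit
}

// Public and pluggable: decides *which* commits to delete.
interface DeletionPolicy {
    void onInit(List<? extends CommitPoint> commits);
    void onCommit(List<? extends CommitPoint> commits);
}

// Default policy: keep only the most recent commit (commits sorted oldest first).
class KeepOnlyLastCommitDeletionPolicy implements DeletionPolicy {
    public void onInit(List<? extends CommitPoint> commits) {
        onCommit(commits);
    }

    public void onCommit(List<? extends CommitPoint> commits) {
        for (int i = 0; i < commits.size() - 1; i++) {
            commits.get(i).delete();
        }
    }
}

// Internal (package-private): owns the commit list and acts on the
// policy's decisions. A real version would decRef each deleted commit's files.
class IndexFileDeleterSketch {
    private final DeletionPolicy policy;
    private final List<SimpleCommit> commits = new ArrayList<>();

    IndexFileDeleterSketch(DeletionPolicy policy, List<String> existingCommits) {
        this.policy = policy;
        for (String name : existingCommits) {
            commits.add(new SimpleCommit(name));
        }
        policy.onInit(commits);
        commits.removeIf(c -> c.deleted);
    }

    // Called after each new commit is written.
    void checkpoint(String segmentsFileName) {
        commits.add(new SimpleCommit(segmentsFileName));
        policy.onCommit(commits);
        commits.removeIf(c -> c.deleted);
    }

    List<String> liveCommits() {
        List<String> names = new ArrayList<>();
        for (SimpleCommit c : commits) {
            names.add(c.name);
        }
        return names;
    }

    private static final class SimpleCommit implements CommitPoint {
        final String name;
        boolean deleted;

        SimpleCommit(String name) {
            this.name = name;
        }

        public String getSegmentsFileName() {
            return name;
        }

        public void delete() {
            deleted = true;
        }
    }
}
```

Starting from segments_1 and segments_2 and then checkpointing segments_3, only segments_3 remains live. Because users only ever implement the interface, the internal class could later change without breaking their policies.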
> Implement "point in time" searching without relying on filesystem semantics
> ---------------------------------------------------------------------------
>
> Key: LUCENE-710
> URL: https://issues.apache.org/jira/browse/LUCENE-710
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.1
> Reporter: Michael McCandless
> Assigned To: Michael McCandless
> Priority: Minor
>
> This was touched on in a recent discussion on the dev list:
> http://www.gossamer-threads.com/lists/lucene/java-dev/41700#41700
> and then more recently on the user list:
> http://www.gossamer-threads.com/lists/lucene/java-user/42088
> Lucene's "point in time" searching currently relies on how the
> underlying storage handles deletion of files that are held open for
> reading.
> This is highly variable across filesystems. For example, UNIX-like
> filesystems usually do "delete on last close", and the Windows filesystem
> typically refuses to delete a file open for reading (so Lucene retries
> later). But NFS just removes the file out from under the reader, and
> for that reason "point in time" searching doesn't work on NFS
> (see LUCENE-673).
> With the lockless commits changes (LUCENE-701), it's quite simple to
> re-implement "point in time" searching so as to not rely on filesystem
> semantics: we can just keep more than the last segments_N file (as
> well as all files they reference).
> This is also in keeping with the design goal of "rely on as little as
> possible from the filesystem". E.g., with lockless commits we no longer
> re-use filenames (don't rely on filesystem cache being coherent) and we no
> longer use file renaming (because on Windows it can fail). This
> would be another step toward not relying on the semantics of "deleting open
> files". The less we require from the filesystem, the more portable Lucene
> will be!
> Where it gets interesting is what "policy" we would then use for
> removing segments_N files. The policy now is "remove all but the last
> one". I think we would keep this policy as the default. Then you
> could imagine other policies:
> * Keep the past N days' worth
> * Keep the last N
> * Keep only those in active use by a reader somewhere (note: it is tricky
> to reliably figure this out when readers have crashed, etc.)
> * Keep those "marked" as rollback points by some transaction, or
> marked explicitly as a "snapshot".
> * Or, roll your own: the "policy" would be an interface or abstract
> class and you could make your own implementation.
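Two of the policies listed above ("keep the last N" and "keep the past N days' worth"), sketched against a hypothetical minimal commit view (TimedCommit) and policy interface (CommitPolicy), assuming commits arrive sorted oldest first and carry a timestamp. The clock is injected so the day-based policy can be tested; none of these names are Lucene API.

```java
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical minimal view of a commit point for these sketches.
interface TimedCommit {
    long getTimestampMillis();
    void delete();
}

// Hypothetical policy interface; commits are assumed sorted oldest first.
interface CommitPolicy {
    void onCommit(List<? extends TimedCommit> commits);
}

// "Keep the last N": delete everything older than the N most recent commits.
class KeepLastNCommitsPolicy implements CommitPolicy {
    private final int n;

    KeepLastNCommitsPolicy(int n) {
        this.n = n;
    }

    public void onCommit(List<? extends TimedCommit> commits) {
        for (int i = 0; i < commits.size() - n; i++) {
            commits.get(i).delete();
        }
    }
}

// "Keep the past N days' worth": delete commits older than the cutoff,
// but never the most recent commit.
class KeepRecentDaysPolicy implements CommitPolicy {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;
    private final int days;
    private final LongSupplier clock; // injected so the policy is testable

    KeepRecentDaysPolicy(int days, LongSupplier clock) {
        this.days = days;
        this.clock = clock;
    }

    public void onCommit(List<? extends TimedCommit> commits) {
        long cutoff = clock.getAsLong() - days * MILLIS_PER_DAY;
        for (int i = 0; i < commits.size() - 1; i++) {
            if (commits.get(i).getTimestampMillis() < cutoff) {
                commits.get(i).delete();
            }
        }
    }
}

// Simple in-memory commit used to try the policies out.
class SimpleTimedCommit implements TimedCommit {
    final long timestamp;
    boolean deleted;

    SimpleTimedCommit(long timestamp) {
        this.timestamp = timestamp;
    }

    public long getTimestampMillis() {
        return timestamp;
    }

    public void delete() {
        deleted = true;
    }
}
```

Keeping the most recent commit unconditionally in the day-based policy is a design choice of this sketch: an index that has not been committed to for a while should still have a usable commit point.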
> I think for this issue we could just create the framework
> (interface/abstract class for "policy" and invoke it from
> IndexFileDeleter) and then implement the current policy (delete all
> but most recent segments_N) as the default policy.
> In separate issue(s) we could then create the above more interesting
> policies.
> I think there are some important advantages to doing this:
> * "Point in time" searching would work on NFS (it doesn't now
> because NFS doesn't do "delete on last close"; see LUCENE-673)
> and any other Directory implementations that don't work
> currently.
> * Transactional semantics become a possibility: you can set a
> snapshot, do a bunch of stuff to your index, and then rollback to
> the snapshot at a later time.
> * If a reader crashes or the machine gets rebooted, etc., it could choose
> to re-open the snapshot it had previously been using, whereas now
> the reader must always switch to the last commit point.
> * Searchers could search the same snapshot for follow-on actions.
> That is, a user does a search, then goes to the next page, drills down
> (Solr), drills up, etc. These are each separate trips to the server, and
> if the searcher has been re-opened in between, the user can get
> inconsistent results (= lost trust). But with snapshots, one series of
> search interactions could explicitly stay on the snapshot it had started with.
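Several of these advantages (rollback for transactions, re-opening a previous snapshot, keeping a paged search session on one snapshot) come down to a policy that never deletes explicitly marked commits. A sketch, assuming hypothetical names (NamedCommit, KeepSnapshotsPolicy, SimpleNamedCommit) and commits sorted oldest first; it is not Lucene's API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal view of a commit point for this sketch.
interface NamedCommit {
    String getSegmentsFileName();
    void delete();
}

// Hypothetical policy: keep the most recent commit plus any commit that has
// been explicitly snapshotted (for example by a transaction or a paged
// search session). Commits are assumed sorted oldest first.
class KeepSnapshotsPolicy {
    private final Set<String> snapshots = new HashSet<>();

    // Protect a commit from deletion until it is released.
    void snapshot(String segmentsFileName) {
        snapshots.add(segmentsFileName);
    }

    // Make a previously protected commit eligible for deletion again.
    void release(String segmentsFileName) {
        snapshots.remove(segmentsFileName);
    }

    void onCommit(List<? extends NamedCommit> commits) {
        for (int i = 0; i < commits.size() - 1; i++) {
            NamedCommit commit = commits.get(i);
            if (!snapshots.contains(commit.getSegmentsFileName())) {
                commit.delete();
            }
        }
    }
}

// Simple in-memory commit used to try the policy out.
class SimpleNamedCommit implements NamedCommit {
    final String name;
    boolean deleted;

    SimpleNamedCommit(String name) {
        this.name = name;
    }

    public String getSegmentsFileName() {
        return name;
    }

    public void delete() {
        deleted = true;
    }
}
```

A search session would call snapshot() on the commit it opened and release() when the user is done; releasing makes that commit eligible for deletion on the next onCommit call. As the issue notes for reader-tracking policies, a holder that crashes before releasing would leave its snapshot pinned.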
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.