Re: Snapshot

Michael McCandless Wed, 25 Mar 2009 04:39:16 -0700

Marvin Humphrey <[email protected]> wrote:
> On Tue, Mar 24, 2009 at 07:57:48AM -0400, Michael McCandless wrote:
>
>> Will it do the same write-once lockless approach (snapshot_N) that Lucene
>> does?
>
> More or less, I think.  Definitely I advocate embedding base-36 generation
> numbers in the filenames.


OK
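
For concreteness, a minimal sketch of base-36 generation naming. The
"snapshot_<gen>.json" pattern and the helper names are assumptions for
illustration, not Lucy's actual layout or API:

```python
import re
import string

# Digits 0-9 followed by a-z give the 36 symbols of base 36.
DIGITS = string.digits + string.ascii_lowercase

# Hypothetical filename pattern; the real naming may differ.
SNAPSHOT_RE = re.compile(r"^snapshot_([0-9a-z]+)\.json$")


def to_base36(n):
    """Encode a non-negative integer as a lowercase base-36 string."""
    if n < 0:
        raise ValueError("generation must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def snapshot_name(gen):
    """Build the (assumed) snapshot filename for a generation number."""
    return f"snapshot_{to_base36(gen)}.json"


def latest_generation(filenames):
    """Return the highest generation among snapshot files, or None."""
    gens = []
    for name in filenames:
        match = SNAPSHOT_RE.match(name)
        if match:
            # Compare numerically: as strings, "z" would sort after "10".
            gens.append(int(match.group(1), 36))
    return max(gens, default=None)
```

Note that the generations are decoded before comparing, since a plain
string sort of the filenames would not order them correctly.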

> The crucial innovation of lockless commits was the retry logic in
> IndexReader.open(), which depends on the snapshot generation numbers.  That
> retry logic is in the KS prototype.

Yes.

> However, I have found it difficult to stop the caught exception from leaking
> memory in the event of a retry.  Hopefully we can fix that, but it's tricky.

This is tricky in Lucene, too.  You must go and close any of the
segments that did succeed in opening.
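
Roughly, the retry-plus-cleanup shape looks like the sketch below. It is
not the Lucene or KS code; open_latest() and its three callables are
hypothetical stand-ins for "list snapshot generations", "read the file
list of a snapshot" and "open one segment":

```python
def open_latest(list_generations, read_snapshot, open_segment, max_retries=5):
    """Open every segment of the newest snapshot, retrying on races.

    If a file disappears mid-open (a writer committed a newer snapshot and
    the old files were purged), close whatever was already opened and
    start over from the newest generation.
    """
    last_error = None
    for _ in range(max_retries):
        generations = list_generations()
        if not generations:
            raise FileNotFoundError("no snapshot found in index")
        gen = max(generations)
        opened = []
        try:
            for name in read_snapshot(gen):
                opened.append(open_segment(name))
            return gen, opened
        except FileNotFoundError as err:
            # Release the segments that did open before retrying, so the
            # failed attempt does not leak handles or memory.
            for seg in opened:
                seg.close()
            last_error = err
    raise FileNotFoundError("gave up opening index after retries") from last_error
```

The cleanup in the except branch is the part that is easy to get wrong:
every segment opened before the failure has to be released before the
next attempt.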

>> It still seems like storing per-segment metadata in the snapshot would
>> be necessary/helpful.
>
> As you surmised over in the "Segment" thread, that's in segmeta.json.

Right.  Though, since you store it with the segment, it can't be
versioned (segment files are write-once)?

E.g., you will store new deletions against segment X with segment Y
(when X's new deletions get flushed at the same time that segment Y
is flushed).  So where will segment X's new delCount be recorded?

Also, what happens if I open a writer, do only deletes, and close?  Do
you flush an empty (no added docs) segment Y simply to record the new
deletions?

>> > Snapshot_Delete_Entry() does not delete the file from the index folder;
>> > all it does is remove the filename from the next snapshot to be written.
>> > Once the new snapshot has been committed, it is possible to identify
>> > candidates for deletion by determining which files are present in the old
>> > snapshot file but gone from the new one.
>>
>> Are you just doing reference counting to determine deletable files?
>
> Yes and no. The logic currently resides in a class called "FilePurger":
>
>  * Don't delete any file listed in the most recent snapshot.
>  * Don't delete any file listed in any snapshot file that's read-locked.
>
> By default, Readers don't do any locking, so only the first part matters.
>
> If you turn on read-locking, the "is-this-snapshot-file-locked" test uses
> reference counts in the form of numbered dot-lock files -- though you can
> override the locking mechanism if you choose.
>
> However, the Snapshot class itself is agnostic about that.  It's just a list
> of files.
>
> In a little while, I'll propose an "IndexManager" class from which all
> merging and deletion policies flow.

OK, I'll stay tuned.
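
To check my understanding of the two rules quoted above, here is a
minimal sketch. The function names and the dict-of-snapshots shape are
my own assumptions, not FilePurger's real interface, and it assumes each
snapshot lists every file that must be kept for it:

```python
def purge_candidates(old_snapshot, new_snapshot):
    """Files listed in the old snapshot but gone from the new one."""
    return set(old_snapshot) - set(new_snapshot)


def deletable_files(all_files, snapshots, latest_gen, locked_gens=frozenset()):
    """Apply the two rules: keep anything listed in the most recent
    snapshot, and anything listed in a read-locked snapshot.

    snapshots maps a generation number to that snapshot's file list.
    locked_gens is empty by default, mirroring readers that do no locking.
    """
    protected = set(snapshots[latest_gen])
    for gen in locked_gens:
        protected |= set(snapshots[gen])
    return set(all_files) - protected
```

With no read locks, only the newest snapshot protects files; a lock on
an older generation keeps its files alive as well.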

>> Will Lucy allow more than one snapshot to remain in the index?
>
> Sure.
>
> (Perhaps that would have been clear in my original post had I remembered to
> endorse the base-36 generation naming scheme.)
>
> The Snapshot class is supposed to be very simple and flexible.  Logically
> speaking, it's easy to leave more than one snapshot file around and to avoid
> deleting any file that's listed in an active snapshot.

OK.  Will Snapshot allow user-defined (opaque to Lucy) metadata to be
recorded inside it?

Mike
