Marvin Humphrey wrote:
> On Tue, Mar 24, 2009 at 07:57:48AM -0400, Michael McCandless wrote:
>
>> Will it do the same write-once lockless approach (snapshot_N) that Lucene
>> does?
>
> More or less, I think. Definitely I advocate embedding base-36 generation
> numbers in the filenames.
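For readers new to the scheme: Lucene names each commit point segments_N, writes the generation N in base 36, and has a reader open the highest generation it finds. The C sketch below shows the same idea applied to snapshot files. It is an illustration under assumptions, not the KS prototype's code: the `snapshot_` prefix, the `.json` suffix, and the functions `snapshot_name()`, `snapshot_gen()` and `latest_snapshot_gen()` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming convention; the real snapshot file names may differ. */
#define SNAPSHOT_PREFIX "snapshot_"
#define SNAPSHOT_SUFFIX ".json"

/* Write `gen` as lowercase base-36 digits into buf (needs >= 14 bytes). */
static void
gen_to_base36(uint64_t gen, char *buf, size_t buf_len) {
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[14]; /* 2^64 - 1 needs at most 13 base-36 digits */
    size_t n = 0;
    assert(buf_len >= 14);
    do {
        tmp[n++] = digits[gen % 36];
        gen /= 36;
    } while (gen > 0);
    /* Digits were produced least significant first, so reverse them. */
    for (size_t i = 0; i < n; i++) {
        buf[i] = tmp[n - 1 - i];
    }
    buf[n] = '\0';
}

/* Build the (hypothetical) file name for snapshot generation `gen`. */
static void
snapshot_name(uint64_t gen, char *buf, size_t buf_len) {
    char digits[14];
    gen_to_base36(gen, digits, sizeof digits);
    snprintf(buf, buf_len, "%s%s%s", SNAPSHOT_PREFIX, digits, SNAPSHOT_SUFFIX);
}

/* Parse the generation out of a snapshot file name; -1 if it isn't one. */
static int64_t
snapshot_gen(const char *filename) {
    size_t plen = strlen(SNAPSHOT_PREFIX);
    size_t slen = strlen(SNAPSHOT_SUFFIX);
    size_t len = strlen(filename);
    if (len <= plen + slen) return -1;
    if (strncmp(filename, SNAPSHOT_PREFIX, plen) != 0) return -1;
    if (strcmp(filename + len - slen, SNAPSHOT_SUFFIX) != 0) return -1;

    const char *digits = filename + plen;
    size_t ndigits = len - plen - slen;
    if (ndigits > 12) return -1; /* 12 base-36 digits always fit in int64_t */
    int64_t gen = 0;
    for (size_t i = 0; i < ndigits; i++) {
        char c = digits[i];
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'z') d = c - 'a' + 10;
        else return -1;
        gen = gen * 36 + d;
    }
    return gen;
}

/* Scan a directory listing and return the newest snapshot generation. */
static int64_t
latest_snapshot_gen(const char *const *files, size_t num_files) {
    int64_t latest = -1;
    for (size_t i = 0; i < num_files; i++) {
        int64_t gen = snapshot_gen(files[i]);
        if (gen > latest) latest = gen;
    }
    return latest;
}
```

Base-36 names do not sort numerically ("snapshot_z.json" is generation 35, "snapshot_10.json" is 36), so the reader parses every name rather than relying on the lexical order of the directory listing.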
OK

> The crucial innovation of lockless commits was the retry logic in
> IndexReader.open(), which depends on the snapshot generation numbers. That
> retry logic is in the KS prototype.

Yes.

> However, I have found it difficult to stop the caught exception from leaking
> memory in the event of a retry. Hopefully we can fix that, but it's tricky.

This is tricky in Lucene, too. You must go and close any of the segments
that did succeed in opening.

>> It still seems like storing per-segment metadata in the snapshot would
>> be necessary/helpful.
>
> As you surmised over in the "Segment" thread, that's in segmeta.json.

Right. But since you store it w/ the segment, and segment files are
write-once, it can't be versioned? E.g., you will store new deletions
against segment X with segment Y (when X's new deletions got flushed at the
same time that segment Y was flushed). So, where will segment X's new
delCount be recorded?

Also, what happens if I open a writer, do only deletes, and close? Do you
flush an empty (no added docs) segment Y simply to record the new
deletions?

>>> Snapshot_Delete_Entry() does not delete the file from the index folder;
>>> all it does is remove the filename from the next snapshot to be written.
>>> Once the new snapshot has been committed, it is possible to identify
>>> candidates for deletion by determining which files are present in the old
>>> snapshot file but gone from the new one.
>>
>> Are you just doing reference counting to determine deletable files?
>
> Yes and no. The logic currently resides in a class called "FilePurger":
>
> * Don't delete any file listed in the most recent snapshot.
> * Don't delete any file listed in any snapshot file that's read-locked.
>
> By default, Readers don't do any locking, so only the first part matters.
>
> If you turn on read-locking, the "is-this-snapshot-file-locked" test uses
> reference counts in the form of numbered dot-lock files -- though you can
> override the locking mechanism if you choose.
>
> However, the Snapshot class itself is agnostic about that. It's just a list
> of files.
>
> In a little while, I'll propose an "IndexManager" class from which all
> merging and deletion policies flow.

OK, I'll stay tuned.

>> Will Lucy allow more than one snapshot to remain in the index?
>
> Sure.
>
> (Perhaps that would have been clear in my original post had I remembered to
> endorse the base-36 generation naming scheme.)
>
> The Snapshot class is supposed to be very simple and flexible. Logically
> speaking, it's easy to leave more than one snapshot file around and to avoid
> deleting any file that's listed in an active snapshot.

OK. Will Snapshot allow user-defined (opaque to Lucy) metadata to be
recorded inside it?

Mike
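Marvin's two FilePurger rules, combined with the old-versus-new snapshot comparison described for Snapshot_Delete_Entry(), can be sketched in C roughly as follows. This is a minimal illustration under assumptions, not FilePurger's actual code: the `Snapshot` struct and `purge_candidates()` are hypothetical, a snapshot is modeled purely as a list of file names, and the set of read-locked snapshots is taken as given (in the KS prototype it would come from the dot-lock reference counts or an overridden locking mechanism).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of a snapshot: just a list of file names. */
typedef struct {
    const char *const *files;
    size_t num_files;
} Snapshot;

/* True if `file` is listed in `snap`. */
static bool
snapshot_lists(const Snapshot *snap, const char *file) {
    for (size_t i = 0; i < snap->num_files; i++) {
        if (strcmp(snap->files[i], file) == 0) return true;
    }
    return false;
}

/* Collect files from an obsolete snapshot that are safe to delete.
   Writes pointers into `out` (capacity `out_cap`) and returns the count. */
static size_t
purge_candidates(const Snapshot *old, const Snapshot *latest,
                 const Snapshot *locked, size_t num_locked,
                 const char **out, size_t out_cap) {
    size_t count = 0;
    for (size_t i = 0; i < old->num_files; i++) {
        const char *file = old->files[i];
        /* Rule 1: never delete a file listed in the most recent snapshot. */
        bool keep = snapshot_lists(latest, file);
        /* Rule 2: never delete a file listed in a read-locked snapshot. */
        for (size_t j = 0; !keep && j < num_locked; j++) {
            keep = snapshot_lists(&locked[j], file);
        }
        if (!keep) {
            assert(count < out_cap);
            out[count++] = file;
        }
    }
    return count;
}
```

With read-locking off (the default), `num_locked` is 0 and only the first rule applies, matching Marvin's description. The linear scans keep the sketch short; a real purger handling many files would more likely hash the names.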
