On Tue, Mar 24, 2009 at 07:57:48AM -0400, Michael McCandless wrote:
> Will it do the same write-once lockless approach (snapshot_N) that Lucene
> does?
More or less, I think. I definitely advocate embedding base-36 generation
numbers in the filenames.

The crucial innovation of lockless commits was the retry logic in
IndexReader.open(), which depends on the snapshot generation numbers. That
retry logic is in the KS prototype. However, I have found it difficult to stop
the caught exception from leaking memory in the event of a retry. Hopefully we
can fix that, but it's tricky.

> It still seems like storing per-segment metadata in the snapshot would
> be necessary/helpful.

As you surmised over in the "Segment" thread, that's in segmeta.json.

> > Snapshot_Delete_Entry() does not delete the file from the index folder;
> > all it does is remove the filename from the next snapshot to be written.
> > Once the new snapshot has been committed, it is possible to identify
> > candidates for deletion by determining which files are present in the old
> > snapshot file but gone from the new one.
>
> Are you just doing reference counting to determine deletable files?

Yes and no. The logic currently resides in a class called "FilePurger":

  * Don't delete any file listed in the most recent snapshot.
  * Don't delete any file listed in any snapshot file that's read-locked.

By default, Readers don't do any locking, so only the first rule matters. If
you turn on read-locking, the "is-this-snapshot-file-locked" test uses
reference counts in the form of numbered dot-lock files -- though you can
override the locking mechanism if you choose.

However, the Snapshot class itself is agnostic about that. It's just a list of
files.

In a little while, I'll propose an "IndexManager" class from which all merging
and deletion policies flow.

> Will Lucy allow more than one snapshot to remain in the index?

Sure. (Perhaps that would have been clear in my original post had I remembered
to endorse the base-36 generation naming scheme.) The Snapshot class is
supposed to be very simple and flexible.
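For readers following along, here is a minimal Python sketch of these ideas under stated assumptions; it is not the KS prototype or Lucy code. The filename pattern snapshot_<gen>.json and the helpers to_base36(), latest_generation(), open_latest() and purgeable() are all hypothetical. The sketch shows why base-36 generations must be parsed numerically (snapshot_10 is generation 36 and must beat snapshot_z, generation 35, even though it sorts first as a string), a lockless open that re-lists the folder and retries when a concurrent writer purges the snapshot it was about to read, and the two FilePurger rules applied across several retained snapshots. With no read locks and only two snapshots, purgeable() reduces to the files present in the old snapshot but gone from the new one, plus the old snapshot file itself.

```python
import re
import string
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical naming convention for this sketch: "snapshot_<gen>.json",
# with the commit generation <gen> written in base 36.
DIGITS = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z
SNAPSHOT_RE = re.compile(r"^snapshot_([0-9a-z]+)\.json$")


def to_base36(n: int) -> str:
    """Encode a non-negative generation number in base 36."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def snapshot_name(gen: int) -> str:
    return f"snapshot_{to_base36(gen)}.json"


def latest_generation(filenames: Iterable[str]) -> Optional[int]:
    """Return the highest snapshot generation present (compared as numbers,
    not strings), or None if there is no snapshot."""
    gens = [int(m.group(1), 36) for f in filenames if (m := SNAPSHOT_RE.match(f))]
    return max(gens, default=None)


def open_latest(list_files: Callable[[], List[str]],
                read_file: Callable[[str], List[str]],
                max_attempts: int = 3):
    """Lockless open: read the newest snapshot without taking a lock.

    If a writer commits and purges that snapshot between our directory
    listing and our read, re-list the folder and try the new generation.
    """
    failed_gen = None
    for _ in range(max_attempts):
        gen = latest_generation(list_files())
        if gen is None:
            raise FileNotFoundError("no snapshot found")
        try:
            return gen, read_file(snapshot_name(gen))
        except FileNotFoundError:
            if gen == failed_gen:  # same generation failed twice: give up
                raise
            failed_gen = gen
    raise FileNotFoundError("snapshot kept disappearing; giving up")


def purgeable(snapshots: Dict[int, List[str]], locked: Set[int]) -> Set[str]:
    """FilePurger-style rules (sketch): keep every file listed in the newest
    snapshot or in any read-locked snapshot. Files referenced only by older,
    unlocked snapshots, and those snapshot files themselves, may be deleted."""
    newest = max(snapshots)
    keep_gens = {newest} | locked.intersection(snapshots)
    keep = set()
    for gen in keep_gens:
        keep |= set(snapshots[gen]) | {snapshot_name(gen)}
    everything = set()
    for gen, files in snapshots.items():
        everything |= set(files) | {snapshot_name(gen)}
    return everything - keep
```

Read locking is modelled here only as a set of locked generations; the numbered dot-lock reference counting described above, or any overridden locking mechanism, would be what populates that set.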
Logically speaking, it's easy to leave more than one snapshot file around and
to avoid deleting any file that's listed in an active snapshot.

Marvin Humphrey
