Otis,

> You can remove the .lock file and try re-indexing or continuing
> indexing where you left off.
> I am not sure about the corrupt index.  I have never seen it happen,
> and I believe I recall reading some messages from Doug Cutting saying
> that index should never be left in an inconsistent state.

Obviously never "should" be, but if something's pulling the rug out
from under his JRE, changes could be only partially written, right?  Or
is the writing format in some sense transactionally safe?  I've never
worked directly on something like this, but I worked at a database
software company where they used transaction semantics and a journaling
scheme to fake a "bulletproof" file system.  Is this how the
index-writing code is implemented?

In general, I can guess Doug's response - just torch the old index
directory and rebuild it; Lucene's indexing is fast enough that you
don't need to get clever.  This seems to be Doug's stance in general
(i.e. "don't get fancy, I already put all the fanciness you'll need
into extremely fast indexing and searching").  So far, it seems to
work :-).

> I could be making this up, though, so I suggest you search through
> lucene-user and lucene-dev archives on www.mail-archive.com.
> A search for "corrupt" should do it.
> Once you figure things out maybe you can post a summary here.

I got a little curious, so I went and did the searches.  There is
exactly one message in each list archive (dev and users) with the
keyword "corrupt" in it.

The lucene-users instance is irrelevant:

http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html

The lucene-dev instance is more useful:

http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html

It's a post from Doug, dated Sept 27, 2001, about adding not just
thread-safety but process-safety:

    It should be impossible to corrupt an index through the Lucene API.
    However if a Lucene process exits unexpectedly it can leave the
    index locked.  The remedy is simply to, at a time when it is
    certain that no processes are accessing the index, remove all lock
    files.

So it sounds like it's worth trying just removing the lock files.

Hm, is there a way to come up with a "sanity check" you can run on an
index to make sure it's not corrupted?
This might be an excellent thing to reassure yourself with: something
went wrong?  Run a sanity check, if it fails just reindex.

Steven J. Owens
[EMAIL PROTECTED]
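
P.S. For anyone who wants to script the remedy Doug describes, here is
a minimal sketch in plain Java (java.nio.file only, no Lucene calls).
It assumes the lock files sit directly in the index directory and have
names ending in ".lock"; both of those are assumptions on my part, and
the class and method names are made up for illustration.  It makes no
attempt to detect whether another process is still using the index, so
per Doug's note, run it only when you are certain nothing is.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class LockCleaner {

    /**
     * Deletes every regular file named "*.lock" directly inside indexDir
     * and returns the names of the files it removed.
     * Assumption: lock files live in the index directory itself.
     */
    static List<String> removeLockFiles(Path indexDir) throws IOException {
        List<String> removed = new ArrayList<>();
        try (DirectoryStream<Path> locks =
                 Files.newDirectoryStream(indexDir, "*.lock")) {
            for (Path lock : locks) {
                if (Files.isRegularFile(lock)) {
                    Files.delete(lock);
                    removed.add(lock.getFileName().toString());
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LockCleaner <index-directory>");
            return;
        }
        // Only call this when no process is reading or writing the index.
        List<String> removed = removeLockFiles(Paths.get(args[0]));
        System.out.println("Removed lock files: " + removed);
    }
}
```

If the index still misbehaves after the locks are gone, the fallback is
the same as above: torch the directory and reindex.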