Re: Help with Lucene Indexer crash recovery
: That said, it should never in fact cause index corruption, as far as I
: know. Lucene is "semi-transactional": at any & all moments you should
: be able to destroy the JVM and the index will be unharmed. I would
: really like to get to the bottom of why this is not the case here.

At any point you can shut down the JVM and the index will be unharmed, but "destroying" it with "kill -9" goes a little further than that. Lucene can't make that claim because the JVM can't even guarantee that bytes are written to physical disk when we close() an OutputStream -- all it guarantees is that the bytes have been handed to the OS. When you "kill -9" a process the OS is free to make *EVERYTHING* about that process vanish without cleaning up after it ... I'm pretty sure even pending IO operations are fair game for disappearing.

-Hoss

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
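Hoss's point about close() only handing bytes to the OS can be seen directly in the standard java.io API: to actually ask the OS to push the bytes to the device you must sync the file descriptor before closing. A minimal stdlib sketch (the file name and contents are arbitrary):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("syncdemo", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        out.write("some index bytes".getBytes("UTF-8"));
        // close() alone only hands the bytes to the OS; they may sit in the
        // OS cache, and a power failure or crash can still lose them.
        // sync() (fsync() on POSIX systems) asks the OS to flush the file's
        // buffers to the physical device before we close.
        out.getFD().sync();
        out.close();
        System.out.println("length=" + f.length());
    }
}
```

Note that even sync() is a request to the OS; some disks with write caches can still reorder or delay the physical write.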
Re: Help with Lucene Indexer crash recovery
"vivek sar" <[EMAIL PROTECTED]> wrote:

> Sorry, I'm using Lucene 2.2. We are using Lucene to index our database
> (Oracle) into documents for a full-text search feature. Here is the
> process of indexing:
>
> 1) Have two IndexWriters which run in two different threads and write
> to two different directories (temporary indexes). They both read from
> the same queue (db resultset queue) and then write to the index. Close
> the IndexWriters once done.
> 2) Once the IndexWriters are done we start the MasterIndex, which is
> another IndexWriter. This merges the indexes in those two temporary
> indexes.
> 3) Once the writer.addIndexes is done I call writer.optimize() and
> then writer.close().
> 4) Our IndexSearcher reads only from the MasterIndex

This process sounds fine, though as Karl pointed out you could let the reader open the index before you start the optimize. You could also consider skipping the optimize entirely, unless the search latency is in fact too high (or throughput too low) without it.

> Once in a while we kill the running application using "kill -9". I
> think if the IndexWriter is in the process of merging and we kill it we
> run into this problem. It has already happened a few times in the last
> week. I do clean up the lock if there is a write.lock at the startup
> of the system. I cannot recreate the index as it may take hours to
> re-index.

As Hoss pointed out, "kill -9" really should be a means of last resort. That said, it should never in fact cause index corruption, as far as I know. Lucene is "semi-transactional": at any & all moments you should be able to destroy the JVM and the index will be unharmed. I would really like to get to the bottom of why this is not the case here.

So you've noticed that if kill -9 is sent while the addIndexes is happening, that can lead to this corruption? If possible, could you use IndexWriter.setInfoStream(...) during at least that step to get verbose details about what the writer is doing, and then capture that output & post it the next time this error happens? That would go a long way toward getting to the root cause here.

Which OS and file system are you using? Are all these steps happening on a single machine & JVM?

> I don't have any shutdown hook right now, but I'm thinking of adding
> one for graceful index closing. We use the following merge parameters:
>
> mergeFactor=100
> maxMergeDocs=9
> maxBufferedDocs=1000

Seems OK.

> I can try out your tool, is it something that can be integrated into
> the application itself? So, basically I'm looking to catch the
> "FileNotFoundException" and take some action to recover from it.

Well, once the tool has been tested and shown to be bug-free then you could in theory use this as a live recovery inside the application. But for starters I would run it from the command line without the -fix option. Be very careful: this is totally new code and it could make your situation even worse, if it has any bugs. And remember that when the tool works, it will have removed a whole segment from your index, which means possibly a great many documents are now gone. Also, it would be far better to get to the root cause & fix it, instead of having to use this tool perpetually.

Mike
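For reference, wiring setInfoStream into the merge step might look like this against the Lucene 2.2 API. This is only a sketch: the temporary index paths and the log file name are placeholders, and the append-mode IndexWriter constructor (create=false) assumes the MasterIndex already exists.

```java
import java.io.FileOutputStream;
import java.io.PrintStream;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class InfoStreamDuringMerge {
    public static void main(String[] args) throws Exception {
        Directory master =
            FSDirectory.getDirectory("/opt/manager/apps/conf/index/MasterIndex");
        IndexWriter writer =
            new IndexWriter(master, new StandardAnalyzer(), false);

        // Route the writer's verbose flush/merge diagnostics to a log file,
        // so the output can be posted if the FileNotFoundException recurs.
        writer.setInfoStream(new PrintStream(
            new FileOutputStream("indexwriter-infostream.log"), true));

        writer.addIndexes(new Directory[] {
            FSDirectory.getDirectory("/path/to/temp1"),   // placeholder path
            FSDirectory.getDirectory("/path/to/temp2") }); // placeholder path
        writer.close();
    }
}
```

This block cannot run without the Lucene 2.2 jar on the classpath; treat it as a starting point, not a drop-in.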
Re: Help with Lucene Indexer crash recovery
On 5 Oct 2007, at 21.50, vivek sar wrote:

> Once the writer.addIndexes is done I call writer.optimize()

No biggie, but IndexWriter.addIndexes() will automatically optimize, so that is one line of code you can get rid of.

> it may take hours to re-index

/Perhaps/ using IndexWriter.addIndexesNoOptimize(), closing the index, making it accessible and then optimizing it in a new thread could bring the "master index" up for use noticeably sooner.

--
karl
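Karl's suggestion could be sketched like this against the Lucene 2.2 API. The paths, analyzer choice, and the searcher-reopen step are assumptions; error handling in the background thread is kept minimal:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeThenOptimize {
    // Placeholder path -- substitute the real master index location.
    static final String MASTER = "/path/to/MasterIndex";

    public static void merge(Directory[] tempIndexes) throws Exception {
        IndexWriter writer = new IndexWriter(
            FSDirectory.getDirectory(MASTER), new StandardAnalyzer(), false);
        // Merge WITHOUT the implicit optimize, so the master index can be
        // opened for searching as soon as the merge itself completes.
        writer.addIndexesNoOptimize(tempIndexes);
        writer.close();

        // ... reopen the IndexSearcher on MASTER here; searches can proceed
        // against the un-optimized index ...

        // Optimize in a background thread; searchers pick up the optimized
        // segments the next time they reopen.
        new Thread(new Runnable() {
            public void run() {
                try {
                    IndexWriter w = new IndexWriter(
                        FSDirectory.getDirectory(MASTER),
                        new StandardAnalyzer(), false);
                    w.optimize();
                    w.close();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }).start();
    }
}
```

The trade-off is that searches run against a multi-segment index until the background optimize finishes, which is usually an acceptable price for bringing the index up hours sooner.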
Re: Help with Lucene Indexer crash recovery
: Once in a while we kill the running application using "kill -9". I

To quote a great man, who frequently quotes another great man: "Well there's your problem!"

Stop using "kill -9" ... I'll say it again because it's important, and I'm even going to violate etiquette and use all caps because it's *that* important...

STOP USING KILL -9

...it's an abhorrent practice that too many people make a habit of. SIGKILL (the signal sent when you run "kill -9") is meant to be a last resort, only if you can't get a rogue process to stop by any other means. Instead of using kill -9, add some sort of notification mechanism to your application so you can trigger graceful shutdowns, or at the very least just use "kill" (no -9) so that the process (the JVM) can at least exit on its own and do basic buffer flushing and file handle closing.

: I don't have any shutdown hook right now, but I'm thinking of adding
: one for graceful index closing. We use following merge parameters,

When you use SIGKILL the process has no idea it's about to die ... it is given no notice, it is wiped off the face of the earth in one blinding atomic action -- so a shutdown hook isn't going to do you any good if you keep using kill -9.

http://en.wikipedia.org/wiki/SIGKILL
http://speculation.org/garrick/kill-9.html

-Hoss
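For the shutdown hook being considered: plain "kill" sends SIGTERM, which lets the JVM run registered shutdown hooks before exiting. A minimal stdlib sketch, where closeIndex() is a placeholder for the real IndexWriter.close() call:

```java
public class GracefulShutdown {

    // Placeholder for the real cleanup, e.g. indexWriter.close().
    static void closeIndex() {
        System.out.println("index closed cleanly");
    }

    static Thread registerShutdownHook() {
        Thread hook = new Thread(new Runnable() {
            public void run() {
                closeIndex();
            }
        });
        // Runs on normal exit and on SIGTERM (plain "kill"), but NOT on
        // SIGKILL ("kill -9") -- the process never gets the chance.
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerShutdownHook();
        System.out.println("running; send SIGTERM (plain kill) to exit cleanly");
    }
}
```

Closing the writer in the hook also releases write.lock properly, which would remove the need for the manual lock cleanup at startup.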
Re: Help with Lucene Indexer crash recovery
Thanks for the response Michael.

Sorry, I'm using Lucene 2.2. We are using Lucene to index our database (Oracle) into documents for a full-text search feature. Here is the process of indexing:

1) Have two IndexWriters which run in two different threads and write to two different directories (temporary indexes). They both read from the same queue (db resultset queue) and then write to the index. Close the IndexWriters once done.
2) Once the IndexWriters are done we start the MasterIndex, which is another IndexWriter. This merges the indexes in those two temporary indexes.
3) Once the writer.addIndexes is done I call writer.optimize() and then writer.close().
4) Our IndexSearcher reads only from the MasterIndex.

Once in a while we kill the running application using "kill -9". I think if the IndexWriter is in the process of merging and we kill it we run into this problem. It has already happened a few times in the last week. I do clean up the lock if there is a write.lock at the startup of the system. I cannot recreate the index as it may take hours to re-index.

I don't have any shutdown hook right now, but I'm thinking of adding one for graceful index closing. We use the following merge parameters:

mergeFactor=100
maxMergeDocs=9
maxBufferedDocs=1000

I can try out your tool, is it something that can be integrated into the application itself? So, basically I'm looking to catch the "FileNotFoundException" and take some action to recover from it.

Thanks,
-vivek

On 10/5/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
> "vivek sar" <[EMAIL PROTECTED]> wrote:
>
> > We are using Lucene 2.3.
>
> Do you mean Lucene 2.2? Your stack trace seems to line up with 2.2,
> and 2.3 isn't quite released yet.
> > The problem we are facing is quite a few times if our application is
> > stopped (killed or crash) while the Indexer is doing its job, the next
> > time when we bring up the application the Indexer fails to run with
> > the following exception:
> >
> > 2007-10-04 12:29:53,089 ERROR [PS thread 10] IndexerJob - Full-text
> > indexer failed to index
> > java.io.FileNotFoundException:
> > /opt/manager/apps/conf/index/MasterIndex/_llb.cfs (No such file or
> > directory)
> >     at java.io.RandomAccessFile.open(Native Method)
> >     at java.io.RandomAccessFile.<init>(Unknown Source)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
> >     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
> >     at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
> >     at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
> >     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
> >     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:131)
> >     at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:206)
> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)
> >
> > The search also doesn't work after this.
>
> Can you share some details of how you are using Lucene, and how/why
> it's killed or crashed so often? When it crashes, do you get an
> exception from Lucene (which could be the root cause here)?
>
> What OS and filesystem is the index on? Are you changing any default
> settings like autoCommit, lock factory & lock file location, etc?
>
> Even if Lucene (the JVM) is killed, the index should not become corrupt in
> this particular way, unless the IO system fails to complete its
> "write" operations. Lucene always writes & closes new segment files
> (_llb.cfs) before writing the segments_N file that refers to them.
>
> > Looks like the indexes were left in some weird state (might be
> > corrupted). I was wondering if there is a tool or a way to repair the
> > indexes if we are not able to open them at run-time?
>
> I just took a first stab at just such a tool, here:
>
>     https://issues.apache.org/jira/browse/LUCENE-1020
>
> Please be very, very careful: I just wrote this code and it could have
> some horrible bug that destroys your index. So make a backup of your
> index first.
>
> Could you first run that tool without the "-fix" option and post back
> the resulting output?
>
> Mike
Re: Help with Lucene Indexer crash recovery
"vivek sar" <[EMAIL PROTECTED]> wrote:

> We are using Lucene 2.3.

Do you mean Lucene 2.2? Your stack trace seems to line up with 2.2, and 2.3 isn't quite released yet.

> The problem we are facing is quite a few times if our application is
> stopped (killed or crash) while the Indexer is doing its job, the next
> time when we bring up the application the Indexer fails to run with
> the following exception:
>
> 2007-10-04 12:29:53,089 ERROR [PS thread 10] IndexerJob - Full-text
> indexer failed to index
> java.io.FileNotFoundException:
> /opt/manager/apps/conf/index/MasterIndex/_llb.cfs (No such file or
> directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(Unknown Source)
>     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
>     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
>     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
>     at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
>     at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:131)
>     at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:206)
>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)
>
> The search also doesn't work after this.

Can you share some details of how you are using Lucene, and how/why it's killed or crashed so often? When it crashes, do you get an exception from Lucene (which could be the root cause here)?

What OS and filesystem is the index on? Are you changing any default settings like autoCommit, lock factory & lock file location, etc?

Even if Lucene (the JVM) is killed, the index should not become corrupt in this particular way, unless the IO system fails to complete its "write" operations. Lucene always writes & closes new segment files (_llb.cfs) before writing the segments_N file that refers to them.

> Looks like the indexes were left in some weird state (might be
> corrupted). I was wondering if there is a tool or a way to repair the
> indexes if we are not able to open them at run-time?

I just took a first stab at just such a tool, here:

    https://issues.apache.org/jira/browse/LUCENE-1020

Please be very, very careful: I just wrote this code and it could have some horrible bug that destroys your index. So make a backup of your index first.

Could you first run that tool without the "-fix" option and post back the resulting output?

Mike
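Running the tool from the command line might look like the following. This is a sketch under assumptions: the exact class name and flags come from the LUCENE-1020 patch (which later evolved into Lucene's CheckIndex), so verify them against the JIRA attachment; the jar name is a placeholder.

```shell
# Back up the index first -- the tool is brand new and may have bugs.
cp -r /opt/manager/apps/conf/index/MasterIndex \
      /opt/manager/apps/conf/index/MasterIndex.bak

# Inspect only (no -fix): report which segments, if any, are broken.
java -cp lucene-core.jar org.apache.lucene.index.CheckIndex \
     /opt/manager/apps/conf/index/MasterIndex

# Only after reviewing that report, and with the backup in place, re-run
# with -fix to drop the corrupt segment. Every document in the dropped
# segment is lost, so this is recovery of last resort, not a routine step.
```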