On May 25, 2012, at 8:11 AM, Sami Siren wrote:

> Just thinking out loud... shouldn't solr(cloud) manage such situation
> gracefully?

Currently, you can handle it gracefully by raising the graceful shutdown timeout in Jetty. That's easy enough to do with the Jetty we ship, but it's painful (extremely so, it seems) to do in tests. In any case, I don't think it hurts anything in practice: the merge thread fails, so you simply don't get those merges, I think.

The problem with the tests is that the exception is thrown from the merge thread. We have no effect on that from Solr: the test framework picks up an uncaught exception in that thread, and our goose is cooked.

> I mean in real life solr instances can be killed or even
> whole servers can go away. Would it be ok to ignore that exception
> instead?

It's really at the Lucene level, so unless we try very hard to work around it, I think we would have to figure out whether something different makes sense there. Right now, if the IndexWriter (IW) is waiting for merges to finish and gets interrupted, it throws an interrupted exception. Unless we explicitly try to kill the running merge threads, I'd think that could be a problem in any general code: you close the IW with waitForMerges = true, then start closing other resources because you assume you are done with the IW, but merges can in fact still be running if the thread was interrupted. And you might close resources that merging depends on (i.e. the Directory).

Lucene does not like interrupts in other cases either, but unfortunately, running in a webapp, it seems we can't always easily avoid them.

> --
> Sami Siren
>
> On Fri, May 25, 2012 at 3:01 PM, Mark Miller <[email protected]> wrote:
>> I actually know what this one is now.
>>
>> Jetty is shutting down, the graceful timeout is too low, and so Jetty
>> interrupts the webapp. While we are waiting for merges to finish in
>> IW#close, an interrupt is thrown and we stop waiting, so the directory is
>> then closed out from under the merge thread. So it's really mostly a test
>> issue, it seems?
>>
>> So I changed our Jetty instances in tests to a 30-second graceful shutdown.
>> Tests went from 6 minutes to 33 minutes for me. I won't make this fix for
>> now :) One idea is to do it just for this test, but even then it makes the
>> test *much* longer, and there is no reason it can't happen in other tests
>> that use Jetty instances. It just happens to show up only in this test
>> currently, AFAICT.
>>
>> On May 25, 2012, at 5:30 AM, Apache Jenkins Server wrote:
>>
>>> Build: https://builds.apache.org/job/Solr-trunk/1865/
>>>
>>> 1 tests failed.
>>> REGRESSION: org.apache.solr.cloud.RecoveryZkTest.testDistribSearch
>>>
>>> Error Message:
>>> Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #2,6,]
>>>
>>> Stack Trace:
>>> java.lang.RuntimeException: Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #2,6,]
>>>     at com.carrotsearch.randomizedtesting.RunnerThreadGroup.processUncaught(RunnerThreadGroup.java:96)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:857)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>>>     at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>>>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>>>     at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>>>     at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>>>     at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>>>     at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>>>     at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>>>     at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>>>     at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>>>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>>>     at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
>>>     at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
>>> Caused by: org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>>>     at __randomizedtesting.SeedInfo.seed([8B4A827F28B6F16]:0)
>>>     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
>>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:480)
>>> Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>>>     at org.apache.lucene.store.Directory.ensureOpen(Directory.java:244)
>>>     at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:241)
>>>     at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:345)
>>>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3031)
>>>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
>>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
>>>
>>> Build Log (for compile errors):
>>> [...truncated 41930 lines...]
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>
>> - Mark Miller
>> lucidimagination.com

- Mark Miller
lucidimagination.com
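For anyone who wants to see the failure mode without Solr, Jetty, or Lucene, here is a minimal, self-contained Java sketch of the sequence described in this thread: a thread waiting for a "merge" to finish is interrupted, stops waiting, closes the directory, and the still-running merge thread then dies with an uncaught exception. Everything in it is hypothetical: `FakeDirectory` and its `listAll()` only loosely mirror the `FSDirectory.listAll` frame in the trace, an `IllegalStateException` stands in for Lucene's `AlreadyClosedException`, `Thread.join()` stands in for waiting on merges, and the `CountDownLatch`es exist only to force the unlucky ordering so it reproduces every time. The per-thread uncaught-exception handler plays, very roughly, the role the test framework plays in detecting the failure.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class InterruptedCloseRace {

    /** Hypothetical stand-in for a Lucene Directory that refuses use once closed. */
    static final class FakeDirectory {
        private volatile boolean open = true;

        void close() {
            open = false;
        }

        String[] listAll() {
            if (!open) {
                // Plays the role of Lucene's AlreadyClosedException.
                throw new IllegalStateException("this Directory is closed");
            }
            return new String[0];
        }
    }

    /** Runs the scenario; returns whatever escaped the merge thread uncaught (null if nothing). */
    static Throwable run() throws InterruptedException {
        FakeDirectory dir = new FakeDirectory();
        CountDownLatch closerWaiting = new CountDownLatch(1);
        CountDownLatch dirClosed = new CountDownLatch(1);
        AtomicReference<Throwable> uncaught = new AtomicReference<>();

        // Stand-in merge thread: still busy when the writer is being closed.
        Thread merge = new Thread(() -> {
            try {
                dirClosed.await(); // force the unlucky ordering so the race always reproduces
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            dir.listAll(); // touches the directory after it has been closed
        }, "fake-merge-thread");
        // Record exceptions nobody caught, roughly as the test framework does.
        merge.setUncaughtExceptionHandler((t, e) -> uncaught.set(e));
        merge.start();

        // Stand-in for closing with waitForMerges = true on a thread that gets interrupted.
        Thread closer = new Thread(() -> {
            try {
                closerWaiting.countDown();
                merge.join(); // wait for the merge to finish...
            } catch (InterruptedException e) {
                // ...but the interrupt makes us stop waiting and carry on shutting down.
            }
            dir.close(); // closed out from under the still-running merge
            dirClosed.countDown();
        }, "closer");
        closer.start();

        closerWaiting.await();
        closer.interrupt(); // e.g. the container giving up on a graceful stop
        closer.join();
        merge.join();
        return uncaught.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable t = run();
        System.out.println(t == null
                ? "no uncaught exception"
                : t.getClass().getSimpleName() + ": " + t.getMessage());
    }
}
```

Running `main` prints `IllegalStateException: this Directory is closed`. In the real failure no latch is involved; the merge thread simply happened to still be running when the interrupted close moved on.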
