On May 25, 2012, at 8:11 AM, Sami Siren wrote:

> Just thinking out loud... shouldn't solr(cloud) manage such a situation
> gracefully?

Currently, you can handle it gracefully if you raise the graceful shutdown
timeout in Jetty. It's easy enough to do that with the Jetty we ship, but it's
painful (extremely so, it seems) to do it in tests.
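
For anyone less familiar with how a graceful stop behaves: the container stops
taking new work, waits up to the timeout, then interrupts whatever is still
running. Here is a rough stdlib-only sketch of that pattern using an
ExecutorService. It is not Jetty's API, and `GracefulStopDemo` and
`stopWithGrace` are made-up names; it only shows why a longer grace period lets
in-flight work finish instead of being interrupted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulStopDemo {

    /**
     * Starts a task that needs workMillis to finish, then "stops the server":
     * waits up to graceMillis for the task and, if it is still running,
     * interrupts it. Returns true if the task ran to completion.
     */
    static boolean stopWithGrace(long workMillis, long graceMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean completed = new AtomicBoolean(false);
        pool.submit(() -> {
            try {
                Thread.sleep(workMillis); // stand-in for work still in flight at shutdown
                completed.set(true);
            } catch (InterruptedException e) {
                // Interrupted by shutdownNow(): the work is abandoned part-way.
                Thread.currentThread().interrupt();
            }
        });
        pool.shutdown(); // stop accepting new tasks
        try {
            if (!pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
                pool.shutdownNow(); // grace period exceeded: interrupt running tasks
                pool.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("grace 50 ms:   completed=" + stopWithGrace(500, 50));
        System.out.println("grace 2000 ms: completed=" + stopWithGrace(500, 2000));
    }
}
```

With a 50 ms grace period the 500 ms task is interrupted part-way; with 2000 ms
it finishes. That is the same trade-off as the test slowdown: every stop may
now wait for the full grace period.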

In any case, I don't think it hurts anything in practice? The merge thread
fails, and so you simply don't get those merges, I think? The problem with the
tests is that the exception is thrown from the merge thread. We have no control
over that from Solr: the test framework picks up an uncaught exception in the
thread, and our goose is cooked.
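
To illustrate why Solr can't catch this: an exception thrown on another thread
never propagates to the code that started that thread; it goes to that thread's
uncaught-exception handler. A minimal sketch, with a made-up worker thread
standing in for the merge thread (the class, method, and thread names are
hypothetical):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class UncaughtInBackgroundThread {

    /**
     * Starts a background thread that fails, the way a merge thread might,
     * and returns whatever its uncaught-exception handler recorded. The
     * thread that started it never sees the exception via try/catch.
     */
    static List<Throwable> runFailingWorker() {
        List<Throwable> recorded = new CopyOnWriteArrayList<>();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("this Directory is closed");
        }, "Merge Thread (hypothetical)");
        // Record the failure instead of printing it; a test framework can do
        // something similar and fail the test afterwards.
        worker.setUncaughtExceptionHandler((t, e) -> recorded.add(e));
        try {
            worker.start();
            worker.join(); // returns normally: the worker's exception is not rethrown here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return recorded;
    }

    public static void main(String[] args) {
        System.out.println("uncaught in background thread: " + runFailingWorker());
    }
}
```

The `try` around `start()`/`join()` only guards the caller's own wait; the
worker's exception ends up in `recorded`, which is roughly the kind of thing a
test framework inspects before failing the test.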

> I mean in real life solr instances can be killed or even
> whole servers can go away. Would it be ok to ignore that exception
> instead?

It's really at the Lucene level, so unless we try really hard to work around
it, we would have to figure out whether something different makes sense there,
I think.

Right now, if it's waiting for merges to finish and gets interrupted, it throws
an interrupted exception. Unless we explicitly try to kill the current merge
threads, I'd think that could be a problem in any general code. You close the
IW (IndexWriter) with waitForMerges = true, then you start closing other
resources because you assume you are done with the IW, but in fact merges can
still be running if the thread was interrupted. And you might close resources
that merging depends on (such as the directory).
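
That sequence can be reproduced with plain Java threads. In this sketch
`FakeDirectory` is a hypothetical stand-in for a Lucene Directory, and
`Thread.join` plays the role of waiting for merges during IW close; no Lucene
API is used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptedCloseRace {

    /** Hypothetical stand-in for a Lucene Directory: unusable once closed. */
    static final class FakeDirectory {
        private volatile boolean closed;

        void ensureOpen() {
            if (closed) {
                throw new IllegalStateException("this Directory is closed");
            }
        }

        void close() {
            closed = true;
        }
    }

    /**
     * A "merge" thread is running; a "closer" thread waits for it (like close
     * with waitForMerges = true), is interrupted, stops waiting, and closes the
     * directory. Returns whatever the merge thread failed with, or null.
     */
    static Throwable simulate() {
        FakeDirectory dir = new FakeDirectory();
        AtomicReference<Throwable> mergeFailure = new AtomicReference<>();
        CountDownLatch mergeStarted = new CountDownLatch(1);
        CountDownLatch dirClosed = new CountDownLatch(1);

        Thread merge = new Thread(() -> {
            try {
                dir.ensureOpen();      // merge begins while the directory is open
                mergeStarted.countDown();
                dirClosed.await();     // ...and is still running when the closer gives up
                dir.ensureOpen();      // its next file operation hits the closed directory
            } catch (Throwable t) {
                mergeFailure.set(t);
            }
        }, "merge");

        Thread closer = new Thread(() -> {
            try {
                merge.join();          // "wait for merges to finish"
            } catch (InterruptedException e) {
                // Interrupted, e.g. by a container shutting down: stop waiting.
            }
            dir.close();               // assumes the writer is done with the directory
            dirClosed.countDown();
        }, "closer");

        try {
            merge.start();
            mergeStarted.await();
            closer.start();
            closer.interrupt();        // the interrupt arrives while the closer waits
            closer.join();
            merge.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return mergeFailure.get();
    }

    public static void main(String[] args) {
        System.out.println("merge thread failed with: " + simulate());
    }
}
```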

Lucene does not like interruptions in other cases either, but unfortunately,
when running in a webapp, it seems we can't always easily avoid them.
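
For what it's worth, the general Java idiom for a wait that should not be cut
short is to keep waiting after an interrupt and re-assert the interrupt flag
once done (Guava's `Uninterruptibles.joinUninterruptibly` follows this
pattern). This is only a sketch of that idiom with made-up class and method
names, not something Lucene or Solr does here; the trade-off is that a
container trying to stop the webapp can no longer hurry this thread along.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class UninterruptibleWait {

    /**
     * Waits for t to finish even if the calling thread is interrupted, then
     * re-asserts the interrupt so callers further up can still see it.
     */
    static void joinUninterruptibly(Thread t) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    t.join();
                    return;
                } catch (InterruptedException e) {
                    interrupted = true; // remember the interrupt, but keep waiting
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Returns {workerFinishedBeforeCloserContinued, closerStillInterrupted}. */
    static boolean[] demo() {
        AtomicBoolean workerDone = new AtomicBoolean(false);
        boolean[] result = new boolean[2];
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(200); // stand-in for a merge that is still running
            } catch (InterruptedException ignored) {
                // the worker itself is not interrupted in this demo
            }
            workerDone.set(true);
        });
        Thread closer = new Thread(() -> {
            joinUninterruptibly(worker);
            result[0] = workerDone.get();                       // safe to release resources now
            result[1] = Thread.currentThread().isInterrupted(); // interrupt was not swallowed
        });
        worker.start();
        closer.start();
        closer.interrupt();
        try {
            closer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("worker finished first: " + r[0] + ", interrupt preserved: " + r[1]);
    }
}
```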

> --
> Sami Siren
> 
> On Fri, May 25, 2012 at 3:01 PM, Mark Miller wrote:
>> I actually know what this one is now.
>> 
>> Jetty is shutting down and the graceful timeout is too low, so Jetty
>> interrupts the webapp. While we are waiting for merges to finish in
>> IW#close, an interrupt is thrown and we stop waiting, so the directory is
>> then closed out from under the merge thread. So really, it seems to be
>> mostly a test issue?
>> 
>> So I changed the Jetty instances in tests to use a 30-second graceful
>> shutdown. Tests went from 6 minutes for me to 33 minutes. I won't make this
>> fix for now :) One idea is to do it just for this test, but even then it
>> makes the test *much* longer, and there is no reason it can't happen in
>> other tests that use Jetty instances. It just happens to show up only in
>> this test currently, AFAICT.
>> 
>> On May 25, 2012, at 5:30 AM, Apache Jenkins Server wrote:
>> 
>>> Build: https://builds.apache.org/job/Solr-trunk/1865/
>>> 
>>> 1 tests failed.
>>> REGRESSION:  org.apache.solr.cloud.RecoveryZkTest.testDistribSearch
>>> 
>>> Error Message:
>>> Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #2,6,]
>>> 
>>> Stack Trace:
>>> java.lang.RuntimeException: Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #2,6,]
>>>       at com.carrotsearch.randomizedtesting.RunnerThreadGroup.processUncaught(RunnerThreadGroup.java:96)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:857)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>>>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>>>       at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>>>       at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>>>       at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>>>       at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>>>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>>>       at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>>>       at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>>>       at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>>>       at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>>>       at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
>>>       at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
>>> Caused by: org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>>>       at __randomizedtesting.SeedInfo.seed([8B4A827F28B6F16]:0)
>>>       at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
>>>       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:480)
>>> Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>>>       at org.apache.lucene.store.Directory.ensureOpen(Directory.java:244)
>>>       at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:241)
>>>       at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:345)
>>>       at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3031)
>>>       at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
>>>       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
>>> 
>>> Build Log (for compile errors):
>>> [...truncated 41930 lines...]
>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
> 

- Mark Miller
lucidimagination.com