[ https://issues.apache.org/jira/browse/SOLR-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man updated SOLR-4557:
---------------------------
Attachment: SOLR-4557_posthshutdown_stack.txt
Erick: you didn't mention which threads show up in thread dumps when you see
hangs with this patch, but when I try it, I see...
{noformat}
[junit4:junit4] 2> 4384 T10 oasc.SolrCore.closeSearcher [collection1] Closing main searcher on request.
[junit4:junit4] 2> 4385 T10 oas.SolrTestCaseJ4.tearDown ###Ending testReload
[junit4:junit4] 1> EOE: started thread 12
[junit4:junit4] 1> EOE: started thread 13
[junit4:junit4] 1> EOE: started thread 14
[junit4:junit4] 1> EOE: started thread 15
[junit4:junit4] 1> EOE: past join thread 12
[junit4:junit4] 1> EOE: past join thread 13
[junit4:junit4] 1> EOE: past join thread 14
[junit4:junit4] 1> EOE: past join thread 15
[junit4:junit4] OK 2.06s | TestCoreContainer.testReload
[junit4:junit4] 2> 4390 T10 oas.SolrTestCaseJ4.deleteCore ###deleteCore
[junit4:junit4] 2> 4390 T10 oasc.CoreContainer.shutdown Shutting down CoreContainer instance=2056171012
{noformat}
...at which point there is a pause. I took a thread dump with jstack (see the
SOLR-4557_posthshutdown_stack.txt attachment), then after waiting a bit longer I see...
{noformat}
[junit4:junit4] 2> 125406 T10 oas.SolrTestCaseJ4.endTrackingSearchers SEVERE ERROR: SolrIndexSearcher opens=9 closes=5
[junit4:junit4] 2> 125428 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 4 leaked thread(s).
[junit4:junit4] HEARTBEAT J0 PID(13420@frisbee): 2013-03-11T16:38:37, stalled for 126s at: TestCoreContainer.testReload
[junit4:junit4] 2> 145551 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 4 threads leaked from SUITE scope at org.apache.solr.core.TestCoreContainer:
[junit4:junit4] 2> 1) Thread[id=16, name=searcherExecutor-5-thread-1, state=WAITING, group=TGRP-TestCoreContainer]
[junit4:junit4] 2> at sun.misc.Unsafe.park(Native Method)
[junit4:junit4] 2> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4] 2> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4] 2> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit4:junit4] 2> at java.lang.Thread.run(Thread.java:679)
[junit4:junit4] 2> 2) Thread[id=17, name=searcherExecutor-8-thread-1, state=WAITING, group=TGRP-TestCoreContainer]
[junit4:junit4] 2> at sun.misc.Unsafe.park(Native Method)
[junit4:junit4] 2> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4] 2> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4] 2> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit4:junit4] 2> at java.lang.Thread.run(Thread.java:679)
[junit4:junit4] 2> 3) Thread[id=11, name=searcherExecutor-2-thread-1, state=WAITING, group=TGRP-TestCoreContainer]
[junit4:junit4] 2> at sun.misc.Unsafe.park(Native Method)
[junit4:junit4] 2> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4] 2> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4] 2> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit4:junit4] 2> at java.lang.Thread.run(Thread.java:679)
[junit4:junit4] 2> 4) Thread[id=18, name=searcherExecutor-11-thread-1, state=WAITING, group=TGRP-TestCoreContainer]
[junit4:junit4] 2> at sun.misc.Unsafe.park(Native Method)
[junit4:junit4] 2> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4] 2> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4] 2> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
[junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit4:junit4] 2> at java.lang.Thread.run(Thread.java:679)
{noformat}
...followed by errors from ThreadLeakControl.tryToInterruptAll reporting that it was
unable to terminate those searcherExecutor threads.
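All four leaked threads are parked in LinkedBlockingQueue.take() under ThreadPoolExecutor.getTask(), which is what an idle pool worker looks like when nobody has called shutdown() on its executor. That suggests (but doesn't prove) that the searcher executors belonging to the reloaded cores are never shut down. A minimal standalone sketch of the effect, using only java.util.concurrent; the IdleExecutorDemo class and terminatesWithin method are hypothetical and not Solr code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Solr.
class IdleExecutorDemo {

    /**
     * Submits one trivial task to a single-thread executor and reports whether the
     * executor terminates within the timeout. Without shutdown() the worker thread
     * stays parked in LinkedBlockingQueue.take(), waiting for more work.
     */
    static boolean terminatesWithin(boolean callShutdown, long millis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(() -> { /* stand-in for searcher warming work */ });
            if (callShutdown) {
                executor.shutdown(); // idle worker exits once the queue is drained
            }
            return executor.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow(); // clean up the demo thread either way
        }
    }

    public static void main(String[] args) {
        System.out.println("without shutdown: " + terminatesWithin(false, 200));
        System.out.println("with shutdown:    " + terminatesWithin(true, 2000));
    }
}
```

The first call times out because the pool never leaves the RUNNING state; the second terminates once the single task finishes.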
The thread dump I got from jstack agrees with the thread dump from the test
framework, and with the error from SolrTestCaseJ4.endTrackingSearchers about
"opens=9 closes=5": 9 - 5 = 4, so it would appear that one SolrIndexSearcher is
leaking for each of the 4 reload commands executed.
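If it helps to check for these leftover threads from inside a test rather than with jstack, the standard Thread.getAllStackTraces() gives a rough in-process equivalent. A minimal sketch, assuming you only care about threads whose names start with the searcherExecutor prefix seen above; the ThreadDumpFilter class and threadsNamed method are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or the test framework.
class ThreadDumpFilter {

    /** Returns "name state" for every live thread whose name starts with prefix. */
    static List<String> threadsNamed(String prefix) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getName().startsWith(prefix)) {
                // e.getValue() holds the frames; an idle pool worker shows LinkedBlockingQueue.take
                out.add(t.getName() + " " + t.getState());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(threadsNamed("searcherExecutor"));
    }
}
```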
I have to run, but I would suggest starting by looking closely at how the
SolrIndexSearcher references are tracked on core init/close and comparing that
with what's done on reload.
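To make the "one leak per reload" arithmetic concrete, here is a toy model of open/close counting in the spirit of endTrackingSearchers. Everything in it (SearcherTracking, Searcher, Core, leakedAfterReloads) is hypothetical and far simpler than the real SolrCore/SolrIndexSearcher bookkeeping. It only shows how a reload that forgets to close the old core's searcher leaves opens minus closes equal to the number of reloads:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; not Solr's actual classes or reload logic.
class SearcherTracking {
    static final AtomicInteger OPENS = new AtomicInteger();
    static final AtomicInteger CLOSES = new AtomicInteger();

    static final class Searcher implements AutoCloseable {
        Searcher() { OPENS.incrementAndGet(); }
        @Override public void close() { CLOSES.incrementAndGet(); }
    }

    static final class Core implements AutoCloseable {
        private Searcher searcher = new Searcher();

        /** Returns a new core with a fresh searcher; closes this core only if closeOld is true. */
        Core reload(boolean closeOld) {
            Core fresh = new Core();
            if (closeOld) {
                close();
            }
            return fresh;
        }

        @Override public void close() {
            if (searcher != null) { // idempotent close
                searcher.close();
                searcher = null;
            }
        }
    }

    /** Creates one core, reloads it n times, closes the final core, and returns opens - closes. */
    static int leakedAfterReloads(int n, boolean closeOld) {
        OPENS.set(0);
        CLOSES.set(0);
        Core core = new Core();
        for (int i = 0; i < n; i++) {
            core = core.reload(closeOld);
        }
        core.close();
        return OPENS.get() - CLOSES.get();
    }
}
```

Here leakedAfterReloads(4, false) leaves 4 searchers unclosed and leakedAfterReloads(4, true) leaves none. The absolute counts differ from the real test's opens=9 closes=5; only the difference is meant to line up.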
> Fix broken CoreContainerTest.testReload
> ---------------------------------------
>
> Key: SOLR-4557
> URL: https://issues.apache.org/jira/browse/SOLR-4557
> Project: Solr
> Issue Type: Test
> Affects Versions: 4.2, 5.0
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Attachments: SOLR-4557.patch, SOLR-4557_posthshutdown_stack.txt
>
>
> I was chasing down a test failure, and it turns out that
> CoreContainerTest.testReload has only succeeded by chance. The test fires up
> 4 threads that go out and reload the same core all at once. This caused me to
> look at properly synchronizing reloading cores pursuant to SOLR-4196, on the
> theory that we should serialize loading, unloading and reloading cores; we
> shouldn't be doing _any_ of those operations from different threads on the
> same core at the same time. It turns out that if you fire up multiple reloads
> at once without serializing them, an error is thrown instead of the core being
> properly reloaded, and that error is the only reason the test doesn't hang. The
> stack trace of the exception is below for reference; the exception no longer
> occurs with the code I'll attach to this patch:
> [junit4:junit4] 2> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
> [junit4:junit4] 2> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
> [junit4:junit4] 2> at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
> [junit4:junit4] 2> at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:536)
> [junit4:junit4] 2> at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:138)
> [junit4:junit4] 2> at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
> [junit4:junit4] 2> at org.apache.solr.core.RequestHandlers.register(RequestHandlers.java:106)
> [junit4:junit4] 2> at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:157)
> [junit4:junit4] 2> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:757)
> [junit4:junit4] 2> at org.apache.solr.core.SolrCore.reload(SolrCore.java:408)
> [junit4:junit4] 2> at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1076)
> [junit4:junit4] 2> at org.apache.solr.core.TestCoreContainer$1TestThread.run(TestCoreContainer.java:90)
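The description argues for serializing load, unload and reload on a per-core basis. One way to sketch that idea, purely as an illustration, is a lock per core name, exercised the same way testReload does it (several threads reloading one core at once). PerCoreSerializer, runExclusively and maxConcurrentReloads are hypothetical names; this is not how CoreContainer or the attached patch necessarily does it:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not Solr's CoreContainer API.
class PerCoreSerializer {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs op while holding the lock for coreName, so operations on one core never overlap. */
    void runExclusively(String coreName, Runnable op) {
        ReentrantLock lock = locks.computeIfAbsent(coreName, k -> new ReentrantLock());
        lock.lock();
        try {
            op.run();
        } finally {
            lock.unlock();
        }
    }

    /** Starts `threads` threads that each "reload" the same core; returns the max overlap seen. */
    static int maxConcurrentReloads(int threads) {
        PerCoreSerializer serializer = new PerCoreSerializer();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> serializer.runExclusively("collection1", () -> {
                int now = inFlight.incrementAndGet();
                maxSeen.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // simulated reload work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            }));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return maxSeen.get();
    }
}
```

With the lock in place the observed overlap stays at 1. Keying the lock by core name still lets different cores load or reload in parallel; note that this sketch never removes entries from the map, so a real version would need to clean up when a core is unloaded.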
--
This message is automatically generated by JIRA.