[ 
https://issues.apache.org/jira/browse/SOLR-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4557:
---------------------------

    Attachment: SOLR-4557_posthshutdown_stack.txt

Erick: you didn't mention what threads you see in thread dumps when you see 
hangs with this patch, but when i try it i see...

{noformat}
[junit4:junit4]   2> 4384 T10 oasc.SolrCore.closeSearcher [collection1] Closing main searcher on request.
[junit4:junit4]   2> 4385 T10 oas.SolrTestCaseJ4.tearDown ###Ending testReload
[junit4:junit4]   1> EOE: started thread 12
[junit4:junit4]   1> EOE: started thread 13
[junit4:junit4]   1> EOE: started thread 14
[junit4:junit4]   1> EOE: started thread 15
[junit4:junit4]   1> EOE: past join thread 12
[junit4:junit4]   1> EOE: past join thread 13
[junit4:junit4]   1> EOE: past join thread 14
[junit4:junit4]   1> EOE: past join thread 15
[junit4:junit4] OK      2.06s | TestCoreContainer.testReload
[junit4:junit4]   2> 4390 T10 oas.SolrTestCaseJ4.deleteCore ###deleteCore
[junit4:junit4]   2> 4390 T10 oasc.CoreContainer.shutdown Shutting down CoreContainer instance=2056171012
{noformat}

...at which point there is a pause, and i took a threaddump with jstack (see 
SOLR-4557_posthshutdown_stack.txt attachment), then after waiting a bit more...

{noformat}
[junit4:junit4]   2> 125406 T10 oas.SolrTestCaseJ4.endTrackingSearchers SEVERE ERROR: SolrIndexSearcher opens=9 closes=5
[junit4:junit4]   2> 125428 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 4 leaked thread(s).
[junit4:junit4] HEARTBEAT J0 PID(13420@frisbee): 2013-03-11T16:38:37, stalled for  126s at: TestCoreContainer.testReload
[junit4:junit4]   2> 145551 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 4 threads leaked from SUITE scope at org.apache.solr.core.TestCoreContainer:
[junit4:junit4]   2>       1) Thread[id=16, name=searcherExecutor-5-thread-1, state=WAITING, group=TGRP-TestCoreContainer]
[junit4:junit4]   2>            at sun.misc.Unsafe.park(Native Method)
[junit4:junit4]   2>            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4]   2>            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4]   2>            at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit4:junit4]   2>            at java.lang.Thread.run(Thread.java:679)
[junit4:junit4]   2>       2) Thread[id=17, name=searcherExecutor-8-thread-1, state=WAITING, group=TGRP-TestCoreContainer]
[junit4:junit4]   2>            at sun.misc.Unsafe.park(Native Method)
[junit4:junit4]   2>            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4]   2>            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4]   2>            at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit4:junit4]   2>            at java.lang.Thread.run(Thread.java:679)
[junit4:junit4]   2>       3) Thread[id=11, name=searcherExecutor-2-thread-1, state=WAITING, group=TGRP-TestCoreContainer]
[junit4:junit4]   2>            at sun.misc.Unsafe.park(Native Method)
[junit4:junit4]   2>            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4]   2>            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4]   2>            at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit4:junit4]   2>            at java.lang.Thread.run(Thread.java:679)
[junit4:junit4]   2>       4) Thread[id=18, name=searcherExecutor-11-thread-1, state=WAITING, group=TGRP-TestCoreContainer]
[junit4:junit4]   2>            at sun.misc.Unsafe.park(Native Method)
[junit4:junit4]   2>            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4]   2>            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4]   2>            at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1069)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1131)
[junit4:junit4]   2>            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[junit4:junit4]   2>            at java.lang.Thread.run(Thread.java:679)
{noformat}

...followed by errors from ThreadLeakControl.tryToInterruptAll that it was 
unable to terminate those searcherExecutor threads.
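
For anyone reading the dump: all four leaked threads are idle pool workers parked in LinkedBlockingQueue.take(), which is what a worker of a plain ThreadPoolExecutor looks like when nobody has called shutdown() on the pool. Below is a minimal standalone sketch of that JDK behavior. It is not Solr code; the class and method names (ExecutorLeakDemo, observe) are made up for illustration, and it only assumes that the searcherExecutor is an ordinary single-thread pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Solr.
public class ExecutorLeakDemo {

    /**
     * Runs one task on a fresh single-thread pool and returns
     * { worker was seen WAITING while idle, worker still alive at the end }.
     */
    public static boolean[] observe(boolean shutDown) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            AtomicReference<Thread> ref = new AtomicReference<>();
            // Capture the worker thread that executes the task.
            pool.submit(() -> ref.set(Thread.currentThread())).get();
            Thread worker = ref.get();

            // After the task, the worker goes back to the queue and parks in take().
            boolean parked = false;
            for (int i = 0; i < 200 && !parked; i++) {
                parked = worker.getState() == Thread.State.WAITING;
                if (!parked) Thread.sleep(10);
            }

            if (shutDown) {
                pool.shutdown();   // interrupts the idle worker so it can exit
                worker.join(2000);
            } else {
                Thread.sleep(200); // without shutdown the worker just keeps waiting
            }
            return new boolean[] { parked, worker.isAlive() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();    // cleanup so the demo itself does not leak a thread
        }
    }

    public static void main(String[] args) {
        boolean[] leaked = observe(false);
        boolean[] clean = observe(true);
        System.out.println("no shutdown:   parked=" + leaked[0] + " stillAlive=" + leaked[1]);
        System.out.println("with shutdown: parked=" + clean[0] + " stillAlive=" + clean[1]);
    }
}
```

Without the shutdown() call the worker stays alive and WAITING indefinitely, which would fit a searcher (and its executor) that was opened but never closed.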

The threaddump i got from jstack jibes with the threaddump from the test 
framework as well as the error from SolrTestCaseJ4.endTrackingSearchers about 
"opens=9 closes=5" -- it would appear that there is a SolrIndexSearcher leaking 
for each of the 4 reload commands executed.

I have to run, but i would suggest starting by looking closely at how the 
SolrIndexSearcher references are tracked on core init/close and comparing that 
with what's done on reload.
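
To make the opens/closes bookkeeping concrete, here is a rough sketch of how a reload path that swaps in a new searcher without closing the old one would leave exactly one unclosed searcher per reload. Everything in it (SearcherTrackingSketch, TrackedSearcher, Core, leakyReload, balancedReload) is hypothetical and only illustrates the counting; it is not how SolrCore actually manages searchers, and i'm not claiming this is the actual bug.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of open/close counting, not Solr code.
public class SearcherTrackingSketch {
    static final AtomicInteger opens = new AtomicInteger();
    static final AtomicInteger closes = new AtomicInteger();

    /** Stand-in for a searcher: counts every open and every close. */
    static final class TrackedSearcher implements AutoCloseable {
        TrackedSearcher() { opens.incrementAndGet(); }
        @Override public void close() { closes.incrementAndGet(); }
    }

    /** Stand-in for a core that owns one searcher at a time. */
    static final class Core {
        private TrackedSearcher searcher = new TrackedSearcher();

        // Replaces the searcher but forgets to close the old one.
        void leakyReload() { searcher = new TrackedSearcher(); }

        // Replaces the searcher and closes the one it replaced.
        void balancedReload() {
            TrackedSearcher old = searcher;
            searcher = new TrackedSearcher();
            old.close();
        }

        void close() { searcher.close(); }
    }

    /** Returns opens minus closes after the given number of reloads and a final close. */
    public static int leakedAfter(int reloads, boolean leaky) {
        opens.set(0);
        closes.set(0);
        Core core = new Core();
        for (int i = 0; i < reloads; i++) {
            if (leaky) core.leakyReload(); else core.balancedReload();
        }
        core.close();
        return opens.get() - closes.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky reloads:    " + leakedAfter(4, true) + " unclosed");
        System.out.println("balanced reloads: " + leakedAfter(4, false) + " unclosed");
    }
}
```

With 4 leaky reloads the difference between opens and closes is 4, the same gap as in the "opens=9 closes=5" error above.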
                
> Fix broken CoreContainerTest.testReload
> ---------------------------------------
>
>                 Key: SOLR-4557
>                 URL: https://issues.apache.org/jira/browse/SOLR-4557
>             Project: Solr
>          Issue Type: Test
>    Affects Versions: 4.2, 5.0
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>         Attachments: SOLR-4557.patch, SOLR-4557_posthshutdown_stack.txt
>
>
> I was chasing down a test failure, and it turns out that 
> CoreContainerTest.testReload has only succeeded by chance. The test fires up 
> 4 threads that go out and reload the same core all at once. This caused me to 
> look at properly synchronizing reloading cores pursuant to SOLR-4196, on the 
> theory that we should serialize loading, unloading and reloading cores; we 
> shouldn't be doing _any_ of those operations from different threads on the 
> same core at the same time. It turns out that if you fire up multiple reloads 
> at once without serializing them, an error is thrown instead of proper 
> reloading occurring, and that's the only reason the test doesn't hang. The 
> stack trace of the exception is below for reference, but it doesn't occur 
> with the code I'll attach to this patch:
> [junit4:junit4]   2>  at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
> [junit4:junit4]   2>  at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
> [junit4:junit4]   2>  at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
> [junit4:junit4]   2>  at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:536)
> [junit4:junit4]   2>  at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:138)
> [junit4:junit4]   2>  at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
> [junit4:junit4]   2>  at org.apache.solr.core.RequestHandlers.register(RequestHandlers.java:106)
> [junit4:junit4]   2>  at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:157)
> [junit4:junit4]   2>  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:757)
> [junit4:junit4]   2>  at org.apache.solr.core.SolrCore.reload(SolrCore.java:408)
> [junit4:junit4]   2>  at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1076)
> [junit4:junit4]   2>  at org.apache.solr.core.TestCoreContainer$1TestThread.run(TestCoreContainer.java:90)
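
The serialization idea in the quoted description (never run load, unload or reload on the same core from different threads at the same time) can be sketched with one lock object per core name. This is only an illustration under assumptions: SerializedReloadSketch, reload and runConcurrentReloads are invented names, the sleep is a stand-in for the real reload work, and the attached patch may well do it differently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-core serialization, not the attached patch.
public class SerializedReloadSketch {
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    /** Reloads one core; callers for the same core name run one at a time. */
    void reload(String coreName) throws InterruptedException {
        Object lock = locks.computeIfAbsent(coreName, k -> new Object());
        synchronized (lock) {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max); // record peak concurrency
            Thread.sleep(20);                             // stand-in for the real reload work
            inFlight.decrementAndGet();
            completed.incrementAndGet();
        }
    }

    /** Fires the given number of threads at the same core; returns { completed, peak concurrency }. */
    public static int[] runConcurrentReloads(int threads) {
        SerializedReloadSketch container = new SerializedReloadSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();                 // release all threads at once
                    container.reload("collection1");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] { container.completed.get(), container.maxInFlight.get() };
    }

    public static void main(String[] args) {
        int[] r = runConcurrentReloads(4);
        System.out.println("completed=" + r[0] + " peakConcurrency=" + r[1]);
    }
}
```

With 4 threads, as in testReload, all 4 reloads complete and at most one is ever inside the critical section. Locking per core name rather than on the whole container keeps reloads of different cores independent.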

