[ 
https://issues.apache.org/jira/browse/SOLR-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter reopened SOLR-14247:
---------------------------------------

Why did this issue modify SolrTestCase.java?

For reasons I don't understand, this issue removed the Logger from SolrTestCase, 
which (again for reasons I don't understand) seems to be causing suite-level 
thread leaks of Log4j AsyncLogger threads from any test that does not define 
its own loggers. That is, something about how we are using async logging means 
that any SolrCloudTestCase that doesn't initialize a logger anywhere will leak 
a logger thread, and evidently the SolrCloudTestCase Logger was ensuring this 
didn't happen until it was removed by this Jira...

As an example, starting with 71b869381ef0090a6e96eccbc9924ebdb4f57306 the 
trivial {{NamedListTest}} fails for me 100% of the time with leaked threads 
(regardless of seed) ...
{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=NamedListTest -Dtests.seed=F67D0AB0258C4521 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=yue-Hant -Dtests.timezone=Antarctica/South_Pole -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.00s | NamedListTest (suite) <<<
   [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.common.util.NamedListTest: 
   [junit4]    >    1) Thread[id=16, name=Log4j2-TF-1-AsyncLoggerConfig-1, state=TIMED_WAITING, group=TGRP-NamedListTest]
   [junit4]    >         at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method)
   [junit4]    >         at java.base@11.0.4/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
   [junit4]    >         at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
   [junit4]    >         at app//com.lmax.disruptor.TimeoutBlockingWaitStrategy.waitFor(TimeoutBlockingWaitStrategy.java:38)
   [junit4]    >         at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
   [junit4]    >         at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
   [junit4]    >         at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
   [junit4]    >         at java.base@11.0.4/java.lang.Thread.run(Thread.java:834)
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([F67D0AB0258C4521]:0)Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated:
   [junit4]    >    1) Thread[id=16, name=Log4j2-TF-1-AsyncLoggerConfig-1, state=TIMED_WAITING, group=TGRP-NamedListTest]
   [junit4]    >         at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method)
   [junit4]    >         at java.base@11.0.4/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
   [junit4]    >         at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
   [junit4]    >         at app//com.lmax.disruptor.TimeoutBlockingWaitStrategy.waitFor(TimeoutBlockingWaitStrategy.java:38)
   [junit4]    >         at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
   [junit4]    >         at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
   [junit4]    >         at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
   [junit4]    >         at java.base@11.0.4/java.lang.Thread.run(Thread.java:834)
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([F67D0AB0258C4521]:0)
   [junit4] Completed [1/1 (1!)] in 23.32s, 6 tests, 2 errors <<< FAILURES!
{noformat}
These failures do not happen with b21312f411bdfb069114846f31f45dcc6ec6ecb8 (the 
prior commit on the master branch) checked out.
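
For anyone who wants to confirm which thread lingers without running the full ant build, here is a rough sketch of the "snapshot threads before, snapshot after, report the difference" idea. This is not how randomizedtesting implements its leak check. The class name, the method names and the stand-in thread name "Hypothetical-AsyncLogger-1" are all made up for illustration, and only JDK classes are used.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;

// Illustrative only: a naive before/after thread diff, NOT the
// randomizedtesting implementation. All names here are hypothetical.
public class ThreadLeakSketch {

    /** Names of all threads the JVM currently reports as alive. */
    public static Set<String> liveThreadNames() {
        Set<String> names = new TreeSet<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    /** Thread names present in 'after' but absent from 'before'. */
    public static Set<String> leaked(Set<String> before, Set<String> after) {
        Set<String> diff = new TreeSet<>(after);
        diff.removeAll(before);
        return diff;
    }

    public static void main(String[] args) throws Exception {
        Set<String> before = liveThreadNames();

        // Stand-in for a background thread that outlives the "suite".
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: fall through and let the thread exit
            }
        }, "Hypothetical-AsyncLogger-1");
        worker.setDaemon(true);
        worker.start();
        started.await();

        Set<String> after = liveThreadNames();
        System.out.println("leaked: " + leaked(before, after));

        // Clean up so the sketch itself does not leave a thread behind.
        worker.interrupt();
        worker.join();
    }
}
```

Comparing by name is a simplification (two threads can share a name); a real checker would compare thread identities and give lingering threads a grace period to exit.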

 

> IndexSizeTriggerMixedBoundsTest does a lot of sleeping
> ------------------------------------------------------
>
>                 Key: SOLR-14247
>                 URL: https://issues.apache.org/jira/browse/SOLR-14247
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Tests
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Minor
>             Fix For: master (9.0)
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When I run tests locally, the slowest reported test is always 
> IndexSizeTriggerMixedBoundsTest  coming in at around 2 minutes.
> I took a look at the code and discovered that at least 80s of that is all 
> sleeps!
> There might need to be more synchronization and ordering added back in, but 
> when I removed all of the sleeps the test still passed locally for me, so I'm 
> not too sure what the point was or why we were slowing the system down so 
> much.


