[ https://issues.apache.org/jira/browse/SOLR-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris M. Hostetter reopened SOLR-14247:
---------------------------------------

Why did this issue modify SolrTestCase.java???

For reasons I don't understand, this issue removed the Logger from SolrTestCase, which (again for reasons I don't understand) seems to be causing suite-level thread leaks of Log4j AsyncLogger threads from any test that does not define its own loggers. That is, something about how we are using async logging means that any SolrCloudTestCase that doesn't initialize a logger anywhere will leak a logger thread, and evidently the SolrCloudTestCase Logger was ensuring this didn't happen until it was removed by this Jira...

As an example, starting with 71b869381ef0090a6e96eccbc9924ebdb4f57306 the trivial {{NamedListTest}} fails for me 100% of the time with leaked threads (regardless of seed) ...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=NamedListTest -Dtests.seed=F67D0AB0258C4521 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=yue-Hant -Dtests.timezone=Antarctica/South_Pole -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.00s | NamedListTest (suite) <<<
   [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.common.util.NamedListTest:
   [junit4]    >    1) Thread[id=16, name=Log4j2-TF-1-AsyncLoggerConfig-1, state=TIMED_WAITING, group=TGRP-NamedListTest]
   [junit4]    >         at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method)
   [junit4]    >         at java.base@11.0.4/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
   [junit4]    >         at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
   [junit4]    >         at app//com.lmax.disruptor.TimeoutBlockingWaitStrategy.waitFor(TimeoutBlockingWaitStrategy.java:38)
   [junit4]    >         at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
   [junit4]    >         at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
   [junit4]    >         at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
   [junit4]    >         at java.base@11.0.4/java.lang.Thread.run(Thread.java:834)
   [junit4]    >         at __randomizedtesting.SeedInfo.seed([F67D0AB0258C4521]:0)Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated:
   [junit4]    >    1) Thread[id=16, name=Log4j2-TF-1-AsyncLoggerConfig-1, state=TIMED_WAITING, group=TGRP-NamedListTest]
   [junit4]    >         at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method)
   [junit4]    >         at java.base@11.0.4/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
   [junit4]    >         at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
   [junit4]    >         at app//com.lmax.disruptor.TimeoutBlockingWaitStrategy.waitFor(TimeoutBlockingWaitStrategy.java:38)
   [junit4]    >         at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
   [junit4]    >         at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
   [junit4]    >         at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
   [junit4]    >         at java.base@11.0.4/java.lang.Thread.run(Thread.java:834)
   [junit4]    >         at __randomizedtesting.SeedInfo.seed([F67D0AB0258C4521]:0)
   [junit4] Completed [1/1 (1!)] in 23.32s, 6 tests, 2 errors <<< FAILURES!
{noformat}

These failures do not happen with b21312f411bdfb069114846f31f45dcc6ec6ecb8 (the prior commit on the master branch) checked out.

> IndexSizeTriggerMixedBoundsTest does a lot of sleeping
> ------------------------------------------------------
>
>                 Key: SOLR-14247
>                 URL: https://issues.apache.org/jira/browse/SOLR-14247
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: Tests
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Minor
>             Fix For: master (9.0)
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When I run tests locally, the slowest reported test is always
> IndexSizeTriggerMixedBoundsTest coming in at around 2 minutes.
> I took a look at the code and discovered that at least 80s of that is all
> sleeps!
> There might need to be more synchronization and ordering added back in, but
> when I removed all of the sleeps the test still passed locally for me, so I'm
> not too sure what the point was or why we were slowing the system down so
> much.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)