[ https://issues.apache.org/jira/browse/SOLR-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531885#comment-17531885 ]
Michael Gibney commented on SOLR-16154:
---------------------------------------

Hm.. can't argue with the clear reduction in failures following this merge!

That said, I still can't see how the ZKEventListenerThread threads would be useful once shutdown has been called, and I'm fairly convinced that the merged PR here fixed the thread leak in the sense that it causes "orderly shutdown" to block until these threads complete normally. That's certainly an improvement, but IIUC "these threads completing normally" can take a _really_ long time (standard nominal zkClientTimeout values, which are intended to control how long the listener will try to connect to zk, seem to run 10, 30, 45, or 90 seconds).

And I say nominal/intended, because I'm pretty sure the way the zkClientTimeout is [converted to backoff retryCount|https://github.com/apache/solr/blob/e53179f439244c33082632aa1e936fe2c39c76c7/solr/solrj/src/java/org/apache/solr/common/cloud/ZkCmdExecutor.java#L48] actually inflates the "real" timeout by approximately a factor of 2 (between roughly 1.65x and 1.83x for the values below):
{code}
timeoutms=30000, retryCount=8, retryDelay=1500 => sanityCheckms=54000
timeoutms=45000, retryCount=10, retryDelay=1500 => sanityCheckms=82500
timeoutms=60000, retryCount=11, retryDelay=1500 => sanityCheckms=99000
timeoutms=90000, retryCount=14, retryDelay=1500 => sanityCheckms=157500
{code}

If I'm thinking along the right lines here and we leave main (with the merged PR) as-is, we may never see these errors in logs again ... but the delayed shutdowns would still be there, some to the tune of over 2 minutes!

> ZKEventListenerThread leaks from tests
> --------------------------------------
>
>                 Key: SOLR-16154
>                 URL: https://issues.apache.org/jira/browse/SOLR-16154
>             Project: Solr
>          Issue Type: Test
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Seen repeatedly on Jenkins.
> {noformat}
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO:
>    1) Thread[id=1089, name=ZKEventListenerThread, state=TIMED_WAITING, group=TGRP-TestSchemaDesignerSettingsDAO]
>         at java.base@18/java.lang.Thread.sleep(Native Method)
>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161)
>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82)
>         at app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:361)
>         at app//org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:75)
>         at app//org.apache.lucene.analysis.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:302)
>         at app//org.apache.lucene.analysis.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:293)
>         at app//org.apache.lucene.analysis.en.AbstractWordsFileFilterFactory.inform(AbstractWordsFileFilterFactory.java:88)
>         at app//org.apache.solr.core.SolrResourceLoader.informAware(SolrResourceLoader.java:762)
>         at app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsInChain(ManagedIndexSchema.java:1470)
>         at app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsForFieldType(ManagedIndexSchema.java:1319)
>         at app//org.apache.solr.schema.ManagedIndexSchema.postReadInform(ManagedIndexSchema.java:1307)
>         at app//org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:654)
>         at app//org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:188)
>         at app//org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:119)
>         at app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:279)
>         at app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:51)
>         at app//org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:342)
>         at app//org.apache.solr.core.ConfigSetService.lambda$loadConfigSet$0(ConfigSetService.java:253)
>         at app//org.apache.solr.core.ConfigSetService$$Lambda$632/0x0000000801137758.get(Unknown Source)
>         at app//org.apache.solr.core.ConfigSet.<init>(ConfigSet.java:49)
>         at app//org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:249)
>         at app//org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1850)
>         at app//org.apache.solr.core.SolrCore.lambda$getConfListener$21(SolrCore.java:3394)
>         at app//org.apache.solr.core.SolrCore$$Lambda$742/0x00000008011f2560.run(Unknown Source)
>         at app//org.apache.solr.cloud.ZkController.lambda$fireEventListeners$18(ZkController.java:2761)
>         at app//org.apache.solr.cloud.ZkController$$Lambda$1153/0x00000008014e8938.run(Unknown Source)
>         at java.base@18/java.lang.Thread.run(Thread.java:833)
>         at __randomizedtesting.SeedInfo.seed([DE9B93CA6D75B373]:0)
> {noformat}
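The arithmetic behind the retryCount/sanityCheckms table in Michael Gibney's comment can be reproduced with a small Java sketch. It rests on two assumptions, reconstructed from the table rather than copied from Solr: that ZkCmdExecutor derives retryCount as the rounded inverse triangular number of the timeout in seconds, plus one, and that the n-th retry sleeps n * retryDelay (1500 ms). The class `ZkRetryBudget` and its methods are hypothetical and are not Solr API. Under these assumptions the worst-case total sleep is retryDelay * n(n+1)/2, which matches every sanityCheckms value in the table.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not Solr's ZkCmdExecutor) of how a zkClientTimeout
 * could be turned into a retry budget. ASSUMPTIONS, reconstructed from the
 * retryCount/sanityCheckms table:
 *   retryCount = round(0.5 * (sqrt(8 * timeoutSeconds + 1) - 1)) + 1
 *   the sleep before retry i (0-based) is (i + 1) * retryDelay
 */
public class ZkRetryBudget {
  // Assumed base delay per retry step, matching retryDelay=1500 in the table.
  static final long RETRY_DELAY_MS = 1500L;

  /** Assumed conversion from a timeout in ms to a retry count. */
  static int retryCount(int timeoutMs) {
    double timeoutSeconds = timeoutMs / 1000.0;
    // Inverse of the triangular number n(n+1)/2: the n whose 1+2+...+n is
    // closest to timeoutSeconds, then one extra attempt on top.
    return Math.round(0.5f * ((float) Math.sqrt(8.0 * timeoutSeconds + 1.0) - 1.0f)) + 1;
  }

  /** Total time spent sleeping if every attempt fails with a connection loss. */
  static long totalSleepMs(int retryCount, long retryDelayMs) {
    long total = 0;
    for (int i = 0; i < retryCount; i++) {
      total += (i + 1) * retryDelayMs; // linearly growing backoff
    }
    return total;
  }

  public static void main(String[] args) {
    for (int timeoutMs : new int[] {30000, 45000, 60000, 90000}) {
      int n = retryCount(timeoutMs);
      long sleepMs = totalSleepMs(n, RETRY_DELAY_MS);
      System.out.printf(Locale.ROOT,
          "timeoutms=%d, retryCount=%d, retryDelay=%d => sanityCheckms=%d (%.2fx)%n",
          timeoutMs, n, RETRY_DELAY_MS, sleepMs, (double) sleepMs / timeoutMs);
    }
  }
}
```

Read this way, the inflation has two sources: the triangular-number formula sizes retryCount as if each step were 1 s, but each step is actually 1.5 s, and the `+ 1` adds one more, longest, sleep. For the 30 s case that is 7 steps (1+...+7 = 28 s) plus one, giving 36 units of 1500 ms, or 54 s.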