[ 
https://issues.apache.org/jira/browse/SOLR-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531885#comment-17531885
 ] 

Michael Gibney commented on SOLR-16154:
---------------------------------------

Hm.. can't argue with the clear reduction in failures following this merge!

That said, I still can't think how the ZKEventListenerThread threads would be 
useful once shutdown has been called, and I'm fairly convinced that the 
merged PR here fixed the thread leak in the sense that it causes "orderly 
shutdown" to block until these threads complete normally.

That's certainly an improvement, but IIUC "these threads completing normally" 
can take a _really_ long time (standard nominal zkClientTimeout times, which 
are intended to control how long the listener will try to connect to zk, 
seem to run 10, 30, 45, 90 seconds). And I say nominal/intended, because I'm 
pretty sure the way the zkClientTimeout is [converted to backoff 
retryCount|https://github.com/apache/solr/blob/e53179f439244c33082632aa1e936fe2c39c76c7/solr/solrj/src/java/org/apache/solr/common/cloud/ZkCmdExecutor.java#L48]
 actually inflates the "real" timeout by approximately a factor of 2:

{code}
timeoutms=30000, retryCount=8, retryDelay=1500 => sanityCheckms=54000
timeoutms=45000, retryCount=10, retryDelay=1500 => sanityCheckms=82500
timeoutms=60000, retryCount=11, retryDelay=1500 => sanityCheckms=99000
timeoutms=90000, retryCount=14, retryDelay=1500 => sanityCheckms=157500
{code}
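
For anyone who wants to check the arithmetic, here is a rough sketch that reproduces the numbers above. It is not a copy of ZkCmdExecutor: it assumes the retry count is derived from the timeout by the square-root expression at the linked line, and that attempt i sleeps for (i + 1) * retryDelay ms with retryDelay fixed at 1500. The class and method names (RetryBudget, retryCount, worstCaseSleepMs) are made up for illustration.

```java
// Sketch only: reproduces the table above under the assumptions stated in the lead-in.
class RetryBudget {
    // Assumed fixed base delay between retries, in milliseconds.
    static final long RETRY_DELAY_MS = 1500L;

    // Assumed conversion from the configured timeout to a retry count.
    static int retryCount(int timeoutMs) {
        double timeoutSec = timeoutMs / 1000.0;
        return Math.round(0.5f * ((float) Math.sqrt(8.0f * timeoutSec + 1.0f) - 1.0f)) + 1;
    }

    // Total sleep if every attempt fails, assuming attempt i sleeps (i + 1) * RETRY_DELAY_MS.
    static long worstCaseSleepMs(int retryCount) {
        long total = 0;
        for (int attempt = 0; attempt < retryCount; attempt++) {
            total += (attempt + 1) * RETRY_DELAY_MS;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int timeoutMs : new int[] {30000, 45000, 60000, 90000}) {
            int count = retryCount(timeoutMs);
            System.out.printf(
                "timeoutms=%d, retryCount=%d, retryDelay=%d => sanityCheckms=%d%n",
                timeoutMs, count, RETRY_DELAY_MS, worstCaseSleepMs(count));
        }
    }
}
```

Under those assumptions the total sleep is 1500 * n * (n + 1) / 2 for n retries, which is where the inflation over the nominal timeout comes from: the retry count is rounded and then bumped by one.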

If I'm thinking along the right lines here, then if we leave main (with the 
merged PR) as-is, we may never see these errors in logs again ... but the 
delayed shutdowns would still be there -- some to the tune of over 2 
minutes!

> ZKEventListenerThread leaks from tests
> --------------------------------------
>
>                 Key: SOLR-16154
>                 URL: https://issues.apache.org/jira/browse/SOLR-16154
>             Project: Solr
>          Issue Type: Test
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Seen repeatedly on Jenkins.
> {noformat}
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from 
> SUITE scope at 
> org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO: 
>    1) Thread[id=1089, name=ZKEventListenerThread, state=TIMED_WAITING, 
> group=TGRP-TestSchemaDesignerSettingsDAO]
>         at java.base@18/java.lang.Thread.sleep(Native Method)
>         at 
> app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161)
>         at 
> app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82)
>         at 
> app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:361)
>         at 
> app//org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:75)
>         at 
> app//org.apache.lucene.analysis.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:302)
>         at 
> app//org.apache.lucene.analysis.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:293)
>         at 
> app//org.apache.lucene.analysis.en.AbstractWordsFileFilterFactory.inform(AbstractWordsFileFilterFactory.java:88)
>         at 
> app//org.apache.solr.core.SolrResourceLoader.informAware(SolrResourceLoader.java:762)
>         at 
> app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsInChain(ManagedIndexSchema.java:1470)
>         at 
> app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsForFieldType(ManagedIndexSchema.java:1319)
>         at 
> app//org.apache.solr.schema.ManagedIndexSchema.postReadInform(ManagedIndexSchema.java:1307)
>         at 
> app//org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:654)
>         at 
> app//org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:188)
>         at 
> app//org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:119)
>         at 
> app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:279)
>         at 
> app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:51)
>         at 
> app//org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:342)
>         at 
> app//org.apache.solr.core.ConfigSetService.lambda$loadConfigSet$0(ConfigSetService.java:253)
>         at 
> app//org.apache.solr.core.ConfigSetService$$Lambda$632/0x0000000801137758.get(Unknown
>  Source)
>         at app//org.apache.solr.core.ConfigSet.<init>(ConfigSet.java:49)
>         at 
> app//org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:249)
>         at 
> app//org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1850)
>         at 
> app//org.apache.solr.core.SolrCore.lambda$getConfListener$21(SolrCore.java:3394)
>         at 
> app//org.apache.solr.core.SolrCore$$Lambda$742/0x00000008011f2560.run(Unknown 
> Source)
>         at 
> app//org.apache.solr.cloud.ZkController.lambda$fireEventListeners$18(ZkController.java:2761)
>         at 
> app//org.apache.solr.cloud.ZkController$$Lambda$1153/0x00000008014e8938.run(Unknown
>  Source)
>         at java.base@18/java.lang.Thread.run(Thread.java:833)
>       at __randomizedtesting.SeedInfo.seed([DE9B93CA6D75B373]:0)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org