[ 
https://issues.apache.org/jira/browse/SOLR-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655085#comment-17655085
 ] 

Uwe Schindler edited comment on SOLR-15254 at 1/5/23 6:20 PM:
--------------------------------------------------------------

At least this was the last deadlock regarding to SPI, all others looking 
similar were fixed before (including one with TermsEnum). In generel the 
problem is when a superclass in its static initializers somehow indriectly 
initializes subclasses. This can lead to a deadlock, because while the 
Superclass initializes itsself with clinit() method the subclasses load and 
those refer indirectly to their superclass. If they call something from there 
it breaks completely. The new code also detects this "call something before 
init was done".

Actually there are checks in static analysis, but neither forbiddenapis, ECJ 
compiler, errorprone have abilities to detect it. Interestingly the Netbeans 
IDE finds some of them, but not those that rely on reflection/dynamic loading 
(like SPI).


was (Author: thetaphi):
At least this was the last deadlock regarding to SPI, all others looking 
similar were fixed before (including one with TermsEnum). In generel the 
problem is when a superclass in its static initializers somehow indriectly 
initializes subclasses. This can lead to a deadlock.

> Random deadlock during cores initialization
> -------------------------------------------
>
>                 Key: SOLR-15254
>                 URL: https://issues.apache.org/jira/browse/SOLR-15254
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 8.5.1
>         Environment: openjdk 11.0.9.1
>            Reporter: Thomas Mortagne
>            Priority: Major
>
> At XWiki we are using Solr in an embbedded way with 5 cores, and sometimes we 
> end up with what appears to be a race condition leading to a deadlock between 
> several of the cores's configuration loading while loading some required 
> classes.
> Here is an extract of a thread dump (also attached is a more complete version 
> of the thread dump) we get in such a case:
> {noformat}
> "coreLoadExecutor-9-thread-1" #41 prio=5 os_prio=0 cpu=305.06ms 
> elapsed=101.24s tid=0x00007fb5cfc8f800 nid=0x57a in Object.wait()  
> [0x00007fb5717ec000]
>    java.lang.Thread.State: RUNNABLE
>       at java.lang.Class.forName0(java.base@11.0.9.1/Native Method)
>       at java.lang.Class.forName(java.base@11.0.9.1/Class.java:398)
>       at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:554)
>       at 
> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:639)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:339)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:332)
>       at 
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:356)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:90)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:48)
>       at 
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
>       at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
>       at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:178)
>       at 
> org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:105)
>       at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:178)
>       at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:46)
>       at 
> org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:145)
>       at 
> org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83)
>       at 
> org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302)
>       at 
> org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802)
>       at 
> org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown
>  Source)
>       at 
> org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38)
>       at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
>       at 
> java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264)
>       at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>       at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown
>  Source)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628)
>       at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
> "coreLoadExecutor-9-thread-3" #43 prio=5 os_prio=0 cpu=294.37ms 
> elapsed=101.24s tid=0x00007fb5cfcdf800 nid=0x57c in Object.wait()  
> [0x00007fb5715ea000]
>    java.lang.Thread.State: RUNNABLE
>       at java.lang.Class.forName0(java.base@11.0.9.1/Native Method)
>       at java.lang.Class.forName(java.base@11.0.9.1/Class.java:398)
>       at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:554)
>       at 
> org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:639)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:339)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:332)
>       at 
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:356)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:90)
>       at 
> org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:48)
>       at 
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
>       at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
>       at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:178)
>       at 
> org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:105)
>       at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:178)
>       at 
> org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:46)
>       at 
> org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:145)
>       at 
> org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83)
>       at 
> org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302)
>       at 
> org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802)
>       at 
> org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown
>  Source)
>       at 
> org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38)
>       at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
>       at 
> java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264)
>       at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>       at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown
>  Source)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628)
>       at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
> "coreLoadExecutor-9-thread-2" #42 prio=5 os_prio=0 cpu=388.81ms 
> elapsed=101.24s tid=0x00007fb5cfcde800 nid=0x57b in Object.wait()  
> [0x00007fb5716eb000]
>    java.lang.Thread.State: RUNNABLE
>       at 
> jdk.internal.misc.Unsafe.ensureClassInitialized0(java.base@11.0.9.1/Native 
> Method)
>       at 
> jdk.internal.misc.Unsafe.ensureClassInitialized(java.base@11.0.9.1/Unsafe.java:1042)
>       at 
> jdk.internal.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(java.base@11.0.9.1/UnsafeFieldAccessorFactory.java:43)
>       at 
> jdk.internal.reflect.ReflectionFactory.newFieldAccessor(java.base@11.0.9.1/ReflectionFactory.java:186)
>       at 
> java.lang.reflect.Field.acquireFieldAccessor(java.base@11.0.9.1/Field.java:1105)
>       at 
> java.lang.reflect.Field.getFieldAccessor(java.base@11.0.9.1/Field.java:1086)
>       at java.lang.reflect.Field.get(java.base@11.0.9.1/Field.java:418)
>       at 
> org.apache.lucene.analysis.util.AbstractAnalysisFactory.lookupSPIName(AbstractAnalysisFactory.java:358)
>       at 
> org.apache.lucene.analysis.util.AnalysisSPILoader.reload(AnalysisSPILoader.java:90)
>       - locked <0x00000000e98e20a0> (a 
> org.apache.lucene.analysis.util.AnalysisSPILoader)
>       at 
> org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:66)
>       at 
> org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:52)
>       at 
> org.apache.lucene.analysis.util.TokenFilterFactory.<clinit>(TokenFilterFactory.java:33)
>       at 
> org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:256)
>       at 
> org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:196)
>       at 
> org.apache.solr.core.ConfigSetService$Standalone.createCoreResourceLoader(ConfigSetService.java:207)
>       at 
> org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:66)
>       at 
> org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302)
>       at 
> org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802)
>       at 
> org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown
>  Source)
>       at 
> org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38)
>       at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
>       at 
> java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264)
>       at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>       at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown
>  Source)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628)
>       at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
> "main" #1 prio=5 os_prio=0 cpu=25108.09ms elapsed=133.50s 
> tid=0x00007fb5cc018800 nid=0x541 waiting on condition  [0x00007fb5d2f73000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at jdk.internal.misc.Unsafe.park(java.base@11.0.9.1/Native Method)
>       - parking to wait for  <0x00000000ea480000> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.9.1/LockSupport.java:234)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.9.1/AbstractQueuedSynchronizer.java:2123)
>       at 
> java.util.concurrent.ThreadPoolExecutor.awaitTermination(java.base@11.0.9.1/ThreadPoolExecutor.java:1454)
>       at 
> com.codahale.metrics.InstrumentedExecutorService.awaitTermination(InstrumentedExecutorService.java:163)
>       at 
> org.apache.solr.common.util.ExecutorUtil.awaitTermination(ExecutorUtil.java:85)
>       at 
> org.apache.solr.common.util.ExecutorUtil.shutdownAndAwaitTermination(ExecutorUtil.java:77)
>       at org.apache.solr.core.CoreContainer.load(CoreContainer.java:843)
> "h2sc-4-thread-1" #38 prio=5 os_prio=0 cpu=0.20ms elapsed=101.89s 
> tid=0x00007fb5cfc16800 nid=0x577 runnable  [0x00007fb571cf0000]
>    java.lang.Thread.State: RUNNABLE
>       at sun.nio.ch.EPoll.wait(java.base@11.0.9.1/Native Method)
>       at 
> sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.9.1/EPollSelectorImpl.java:120)
>       at 
> sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.9.1/SelectorImpl.java:124)
>       - locked <0x00000000e9819cf0> (a sun.nio.ch.Util$2)
>       - locked <0x00000000e9819c98> (a sun.nio.ch.EPollSelectorImpl)
>       at 
> sun.nio.ch.SelectorImpl.select(java.base@11.0.9.1/SelectorImpl.java:141)
>       at 
> org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:149)
>       at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:156)
>       at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:572)
>       at 
> org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:509)
>       at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:360)
>       at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:184)
>       at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
>       at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
>       at 
> org.eclipse.jetty.io.ManagedSelector$$Lambda$437/0x0000000840b28c40.run(Unknown
>  Source)
>       at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>       at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown
>  Source)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628)
>       at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
> {noformat}
> Notice the coreLoadExecutor-9-thread-* thread which despite their "RUNNABLE" 
> state are actually internally blocked waiting on some unknown object. The 
> main thread is stuck because it's waiting for the cores's initialization to 
> finish.
> I included "h2sc-4-thread-1" to have all the threads doing stuff but it does 
> not seems to have much to do with the deadlock (other than being a Solr thing 
> too).
> It's not very easy to debug since we cannot always reproduce but maybe you 
> guys have a clue for where this issue might be comming from based on those 
> informations.
> When we hit this problem we can "workaround" it by restarting a few times 
> until it pass the Solr init.
> We never had this issue back when we only had 1 or 2 cores but having 5 of 
> those is very recent for us so I cannot really tell if it's some recent Solr 
> regression or an old bug we did not had before because we did not had enough 
> cores to trigger it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to