[ 
https://issues.apache.org/jira/browse/SOLR-17118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866525#comment-17866525
 ] 

ASF subversion and git services commented on SOLR-17118:
--------------------------------------------------------

Commit 53a2a36df2fb843df640ce51ead4a6e0f65d8e74 in solr's branch 
refs/heads/backport_SOLR-16842_to_branch_9x from David Smiley
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=53a2a36df2f ]

SOLR-17118: Simplify/fix CoreContainerProvider initialization (#2474)

Thereby fixing a rare occurrence of Solr hanging on startup.

CoreContainerProvider:  Don't need any CountDownLatch (x2), the synchronized 
WeakHashMap of "services", the ServiceHolder, the ContextInitializationKey.  No 
looping to wait for initialization.

JettySolrRunner: incorporate the CoreContainerProvider and various servlet 
filters in a normal way -- add all this before starting, not after.  Thus Jetty 
will shut them down properly so we don't have to.  Removed some needless 
synchronized wait/notify and other needless stuff.

HealthCheckHandlerTest was shutting down CoreContainer improperly; this subset 
of the test was removed.

(cherry picked from commit 1177796e32631c62d8f00e7df4341c92b75e1617)


> Solr deadlock during servlet container start
> --------------------------------------------
>
>                 Key: SOLR-17118
>                 URL: https://issues.apache.org/jira/browse/SOLR-17118
>             Project: Solr
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 9.2.1
>            Reporter: Andreas Hubold
>            Assignee: David Smiley
>            Priority: Major
>              Labels: deadlock, servlet-context
>             Fix For: 9.7
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In rare cases, Solr can run into a deadlock when started. The servlet 
> container startup thread gets blocked and there's no other thread that could 
> unblock it:
> {noformat}
> "main" #1 prio=5 os_prio=0 cpu=5922.39ms elapsed=7490.27s tid=0x00007f637402ae70 nid=0x47 waiting on condition [0x00007f6379488000]
>    java.lang.Thread.State: WAITING (parking)
>     at jdk.internal.misc.Unsafe.park(java.base@17.0.9/Native Method)
>     - parking to wait for  <0x0000000081da8000> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.park(java.base@17.0.9/Unknown Source)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.9/Unknown Source)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@17.0.9/Unknown Source)
>     at java.util.concurrent.CountDownLatch.await(java.base@17.0.9/Unknown Source)
>     at org.apache.solr.servlet.CoreContainerProvider$ContextInitializationKey.waitForReadyService(CoreContainerProvider.java:523)
>     at org.apache.solr.servlet.CoreContainerProvider$ServiceHolder.getService(CoreContainerProvider.java:562)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:148)
>     at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:133)
>     at org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:725)
>     at org.eclipse.jetty.servlet.ServletHandler$$Lambda$315/0x00007f62fc2674b8.accept(Unknown Source)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base@17.0.9/Unknown Source)
>     at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(java.base@17.0.9/Unknown Source)
>     at java.util.stream.ReferencePipeline$Head.forEach(java.base@17.0.9/Unknown Source)
>     at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)
>     at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
> {noformat}
> ContextInitializationKey.waitForReadyService should have been unblocked by 
> CoreContainerProvider#init, which calls ServiceHolder#setService. This 
> should work because CoreContainerProvider#init is always called before 
> SolrDispatchFilter#init (ServletContextListeners are initialized before 
> Filters). 
> But there's a problem: CoreContainerProvider#init stores the 
> ContextInitializationKey and the mapped ServiceHolder in 
> CoreContainerProvider#services, and that's a *WeakHashMap*: 
> {code:java}
>       services
>           .computeIfAbsent(new ContextInitializationKey(servletContext), ServiceHolder::new)
>           .setService(this);
> {code}
> The key is not referenced anywhere else, which makes the mapping a candidate 
> for garbage collection. The ServiceHolder value also does not reference the 
> key anymore, because #setService cleared the reference. 
> If a garbage collection runs at the wrong moment, the mapping is already gone 
> from the WeakHashMap before SolrDispatchFilter#init tries to retrieve it with 
> CoreContainerProvider#serviceForContext. That method then creates a 
> new ContextInitializationKey and ServiceHolder, which are then used for 
> #waitForReadyService. But such a new ContextInitializationKey has never 
> received a #makeReady call, so #waitForReadyService blocks forever.
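
The failure mode described in the issue can be reproduced outside Solr with a small stand-alone sketch. Key and Holder below are hypothetical stand-ins, not the actual ContextInitializationKey and ServiceHolder classes, and the latch is awaited with a timeout so the demo returns instead of hanging. Whether the unpinned entry really disappears depends on the JVM acting on System.gc(), which is only a hint, so the second result can differ between runs and JVM settings.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class WeakKeyLatchDemo {

  /** Hypothetical key: equal per context, but a fresh object on every lookup. */
  static final class Key {
    private final Object context;
    Key(Object context) { this.context = context; }
    @Override public boolean equals(Object o) {
      return o instanceof Key && ((Key) o).context == context;
    }
    @Override public int hashCode() { return System.identityHashCode(context); }
  }

  /** Hypothetical holder: the latch is released once the service is set. */
  static final class Holder {
    final CountDownLatch ready = new CountDownLatch(1);
  }

  // Same shape as the map in the issue: a synchronized WeakHashMap.
  static final Map<Key, Holder> services =
      Collections.synchronizedMap(new WeakHashMap<>());

  // Optional strong reference that keeps the first key alive.
  static Key pinnedKey;

  /** Returns true if the "filter" lookup finds a holder whose latch was released. */
  static boolean filterSeesReadyService(boolean keepKeyReachable)
      throws InterruptedException {
    services.clear();
    pinnedKey = null;
    Object context = new Object();

    // "Listener" side: register a holder and release its latch.
    Key key = new Key(context);
    if (keepKeyReachable) {
      pinnedKey = key;
    }
    services.computeIfAbsent(key, k -> new Holder()).ready.countDown();
    key = null; // without the pin, only the map's weak reference remains

    // Ask for GC a few times; isEmpty() expunges entries whose key was cleared.
    for (int i = 0; i < 20 && !services.isEmpty(); i++) {
      System.gc();
      Thread.sleep(10);
    }

    // "Filter" side: look up with an equal but new key. If the entry was
    // collected, this creates a fresh holder whose latch nobody releases.
    Holder seen = services.computeIfAbsent(new Key(context), k -> new Holder());
    return seen.ready.await(200, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("key pinned:   ready=" + filterSeesReadyService(true));
    System.out.println("key unpinned: ready=" + filterSeesReadyService(false));
  }
}
```

With the key pinned, the lookup always finds the original holder. Without the pin, a collected entry leaves the second lookup waiting on a latch that was never counted down, which matches the hang in the thread dump above when no timeout is used.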



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
