bruno-roustant commented on code in PR #1155:
URL: https://github.com/apache/solr/pull/1155#discussion_r1044244530


##########
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java:
##########
@@ -70,10 +80,15 @@ public TransientSolrCoreCacheDefault(CoreContainer 
coreContainer) {
     Caffeine<String, SolrCore> transientCoresCacheBuilder =
         Caffeine.newBuilder()
             .initialCapacity(initialCapacity)
-            // Use the current thread to queue evicted cores for closing. This 
ensures the
-            // cache max size is respected (with a different thread the max 
size would be
-            // respected asynchronously only eventually).
-            .executor(Runnable::run)
+            // Do NOT use the current thread to queue evicted cores for 
closing. Although this
+            // ensures the cache max size is respected only eventually, it 
should be a very
+            // short period of time until the cache maintenance task kicks in.
+            // The onEvict method needs the writeLock from SolrCores to 
operate safely.
+            // However, the current thread most probably has acquired a 
read-lock already
+            // somewhere up the call stack and would deadlock.
+            // Note that Caffeine cache has an internal maintenance task 
rescheduling logic which
+            // explicitly checks on the common pool.
+            .executor(ForkJoinPool.commonPool())

Review Comment:
   Here SolrCore instances are not usual object hold in memory, they can be 
still in-use, they are ref-counted, there can be pending operations on them. We 
have to close them carefully, not GCed as weak references.
   It becomes clearer to me that this transient core cache cannot have a strict 
limit, because it must manage the cores until they are not used anymore and 
unloaded. It's more a target size, that the cache should aim to keep when there 
is no heavy activity on the cores.
   If we go with an overflow structure, it is probably not a cache but an 
unlimited size map, with its maintenance thread to monitor the cores to close 
when they are not used anymore.
   Or we keep a single cache from which we do not remove cores if they are 
still in-use (so not Caffeine). But we would still need a maintenance thread to 
reduce the cache size down to the target size when it is possible.
   I'll take some time to reflect on this and probably create a Jira issue on 
this subject.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to