mlbiscoc commented on code in PR #3707:
URL: https://github.com/apache/solr/pull/3707#discussion_r2384458009


##########
gradle/testing/randomization/policies/solr-tests.policy:
##########
@@ -277,3 +277,7 @@ grant {
   // Allow testing effects of customized or bug-fixed dependencies locally 
(also need to add mavenLocal() to build)
   permission java.io.FilePermission "${user.home}${/}.m2${/}repository${/}-", 
"read";
 };
+
+grant {
+    permission jdk.jfr.FlightRecorderPermission "accessFlightRecorder";
+};

Review Comment:
   Added this because of JFR for JVM metrics from the OTEL library



##########
solr/core/src/test/org/apache/solr/core/TestSolrConfigHandler.java:
##########
@@ -1042,24 +1043,33 @@ public void testSetPropertyCacheSize() throws Exception 
{
 
   public void testSetPropertyEnableAndDisableCache() throws Exception {
     RESTfulServerProvider oldProvider = restTestHarness.getServerProvider();
-    // Enabling Cache
-    String payload = "{'set-property' : { 'query.documentCache.enabled': true} 
}";
-    runConfigCommand(restTestHarness, "/config", payload);
-    restTestHarness.setServerProvider(() -> getBaseUrl());
 
-    String prometheusMetrics = 
restTestHarness.query("/admin/metrics?wt=prometheus");
-    assertTrue(prometheusMetrics.contains("name=\"documentCache\""));
+    // We need a reference to the core because a reload from /config call 
closes the whole core thus
+    // closing metric registries
+    try (SolrCore core = 
solrClientTestRule.getCoreContainer().getCore("collection1")) {

Review Comment:
   I don't know if this was common knowledge but this was difficult to figure 
out. From what I can tell, when a reload happened here I think it closes the 
cores when there were no more references. Since we now close them on core 
close, `runConfigCommand` closed the registries just making all the metrics go 
null. So I have to keep a reference to the core so that the reload doesn't just 
tear down the meter provider in this test.



##########
solr/core/src/test/org/apache/solr/handler/admin/StatsReloadRaceTest.java:
##########
@@ -68,18 +69,19 @@ public void testParallelReloadAndStats() throws Exception {
                   "async",
                   "" + asyncId),
               new SolrQueryResponse());
-
-      boolean isCompleted;
-      do {
-        if (random.nextBoolean()) {
-          requestMetrics(true);
-        } else {
-          requestCoreStatus();
-        }
-
-        isCompleted = checkReloadCompletion(asyncId);
-      } while (!isCompleted);
-      requestMetrics(false);
+      try (SolrCore core = h.getCoreInc()) {

Review Comment:
   Same thing happened here as above, so need to keep references to avoid 
closing the meter providers



##########
solr/core/src/java/org/apache/solr/metrics/SolrCoreMetricManager.java:
##########
@@ -57,7 +56,7 @@ public class SolrCoreMetricManager implements Closeable {
   // rename
   private final List<MetricProducerInfo> registeredProducers = new 
ArrayList<>();
 
-  private record MetricProducerInfo(SolrMetricProducer producer, String scope) 
{}
+  private record MetricProducerInfo(SolrMetricProducer producer, Attributes 
attributes) {}

Review Comment:
   Did a little bit of cleanup on this because of this weird scope string.



##########
solr/core/src/java/org/apache/solr/metrics/SolrMetricManager.java:
##########
@@ -889,12 +888,11 @@ private static MetricRegistry getOrCreateRegistry(
    */
   // NOCOMMIT: Remove this
   public void removeRegistry(String registry) {
-    meterProviderAndReaders.computeIfPresent(
-        enforcePrefix(registry),
-        (key, meterAndReader) -> {
-          IOUtils.closeQuietly(meterAndReader.sdkMeterProvider());
-          return null;
-        });
+    String key = enforcePrefix(registry);
+    MeterProviderAndReaders mpr = meterProviderAndReaders.remove(key);
+    if (mpr != null) {
+      IOUtils.closeQuietly(mpr.sdkMeterProvider());
+    }

Review Comment:
   This here was difficult to track down. Tests were running into a dead lock 
because of this. What I had found is that when cores were being closed the 
Meter providers were going with it. The periodic metric reader was being 
registered even when the exporter was a Noop. When the periodic metric reader 
is closed, it triggers one last metric collection for the exporter calling the 
observable instruments callbacks holding the lock for that concurrent hashmaps 
bucket. One of the callbacks was getting index size in SolrCore which had a 
searcher lock. At the same time, the Solr core closing looked to hold that same 
searcher lock creating a deadlock. Basically from what I can tell:
   
   Thread A: holds ConcurrentHashMaps bucket lock → waiting for searcherLock 
(inside the callback).
   Thread B: holds searcherLock from the Solr close call → waiting for CHM bin 
lock.
   
   Deadlock!
   
   So changed this to just close the provider without the computeIfPresent.



##########
solr/core/src/test/org/apache/solr/search/TestFiltersQueryCaching.java:
##########
@@ -79,7 +79,9 @@ private static long lookupFilterCacheHits(SolrCore core) {
     return (long) getFilterCacheHits(core).getValue();
   }
 
+  // NOCOMMIT: Core reload causing a flaky test for cache metrics
   @Test
+  @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/SOLR-17458";)
   public void testRecursiveFilter() throws Exception {

Review Comment:
   I couldn't figure out this problem now that we close meter providers on 
unload and reload. All tests that pretty much do a reload and check cache 
metrics and SolrIndexSearcher metrics began to fail. When I test reload with a 
running solr cloud locally, metrics all reinitialize except for `Caffeine 
Cache` and `Solr index searcher` metrics as its hit or miss. Almost seems like 
a timing or race condition for a reload and when searchers register 
initializing metrics. I could not get to the bottom of why. Was hoping to get 
some help if someone understands why.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to