dsmiley commented on code in PR #3384:
URL: https://github.com/apache/solr/pull/3384#discussion_r2147630116


##########
solr/core/src/java/org/apache/solr/blockcache/Metrics.java:
##########
@@ -54,8 +55,10 @@ public class Metrics extends SolrCacheBase implements 
SolrInfoBean {
   private SolrMetricsContext solrMetricsContext;
   private long previous = System.nanoTime();
 
+  // TODO SOLR-17458: Migrate to Otel

Review Comment:
   The Lucene & Solr projects use the magic word "NOCOMMIT" not TODO.  TODO 
means someday maybe.  NOCOMMIT means, do NOT merge this into a release branch.  
"precommit" will fail.
   
   But since you've written this so many times already, we could leave the 
existing ones as-is and remember to search for "TODO SOLR-17458"



##########
solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java:
##########
@@ -193,14 +203,79 @@ public static class HandlerMetrics {
     public final Timer requestTimes;
     public final Counter totalTime;
 
-    public HandlerMetrics(SolrMetricsContext solrMetricsContext, String... 
metricPath) {
+    public AttributedLongCounter otelRequests;
+    public AttributedLongCounter otelNumErrors;
+    public AttributedLongCounter otelNumServerErrors;
+    public AttributedLongCounter otelNumClientErrors;
+    public AttributedLongCounter otelNumTimeouts;
+    public AttributedLongTimer otelRequestTimes;
+
+    public HandlerMetrics(
+        SolrMetricsContext solrMetricsContext, Attributes attributes, 
String... metricPath) {
+
+      // TODO SOLR-17458: To be removed
       numErrors = solrMetricsContext.meter("errors", metricPath);
       numServerErrors = solrMetricsContext.meter("serverErrors", metricPath);
       numClientErrors = solrMetricsContext.meter("clientErrors", metricPath);
       numTimeouts = solrMetricsContext.meter("timeouts", metricPath);
       requests = solrMetricsContext.counter("requests", metricPath);
       requestTimes = solrMetricsContext.timer("requestTimes", metricPath);
       totalTime = solrMetricsContext.counter("totalTime", metricPath);
+
+      var baseRequestMetric =
+          solrMetricsContext.longCounter("solr_metrics_core_requests", "HTTP 
Solr request counts");
+
+      var baseRequestTimeMetric =
+          solrMetricsContext.longHistogram(
+              "solr_metrics_core_requests_times", "HTTP Solr request times", 
"ms");
+
+      otelRequests =
+          new AttributedLongCounter(
+              baseRequestMetric,
+              Attributes.builder()
+                  .putAll(attributes)
+                  .put(AttributeKey.stringKey("type"), "requests")
+                  .build());
+
+      otelNumErrors =

Review Comment:
   I wonder if we're using OTEL the right way here to handle this scenario -- 
basically there are errors, and some are server and some are client.  I see you 
simply vary the "type" for each.  Maybe... instead we should only have the 
server & client, both have "type=errors", but differentiate by another 
attribute like "errorSource=client|server".  This still allows users to easily 
look at metrics undifferentiated if they don't care (i.e. errors of whatever 
source).  Presumably this would also work?



##########
solr/core/src/java/org/apache/solr/metrics/otel/instruments/AttributedLongTimer.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.metrics.otel.instruments;
+
+import io.opentelemetry.api.common.Attributes;
+import io.opentelemetry.api.metrics.LongHistogram;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * This class records the elapsed time (in milliseconds) to a {@link 
LongHistogram}. Each thread
+ * measures its own timer allowing concurrent timing operations on separate 
threads..
+ */
+public class AttributedLongTimer extends AttributedLongHistogram {
+
+  private final ThreadLocal<Long> startTimeNanos = new ThreadLocal<>();

Review Comment:
   The use of ThreadLocal concerns me; seems like an over-use of ThreadLocal.  
I could be convinced otherwise, though.  Instead of conceptually changing the 
state of this timer (scoped to this thread, granted), couldn't we return a 
Timer object that the caller closes?  Note that Solr has RTimer / RTimerTree 
you may want to see... perhaps return an RTimer subclass, for example.



##########
solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java:
##########
@@ -837,9 +838,11 @@ private CommitVersionInfo getIndexVersion() {
   }
 
   // TODO: Handle compatibility in 8.x

Review Comment:
   This comment adjacent to what you are adding seem obsolete; feel free to 
replace them



##########
solr/core/src/java/org/apache/solr/metrics/SolrMetricsContext.java:
##########
@@ -116,21 +134,94 @@ public SolrMetricsContext getChildContext(Object child) {
    * Register a metric name that this component reports. This method is called 
by various metric
    * registration methods in {@link org.apache.solr.metrics.SolrMetricManager} 
in order to capture
    * what metric names are reported from this component (which in turn is 
called from {@link
-   * 
org.apache.solr.metrics.SolrMetricProducer#initializeMetrics(SolrMetricsContext,
 String)}).
+   * SolrMetricProducer#initializeMetrics(SolrMetricsContext,
+   * io.opentelemetry.api.common.Attributes, String)}).
    */
+  // TODO We can continue to register metric names
   public void registerMetricName(String name) {
     metricNames.add(name);
   }
 
   /** Return a snapshot of metric values that this component reports. */
+  // TODO Don't think this is needed anymore
   public Map<String, Object> getMetricsSnapshot() {
     return MetricUtils.convertMetrics(getMetricRegistry(), metricNames);
   }
 
+  public LongCounter longCounter(String metricName, String description) {

Review Comment:
   Seeing all these simply pass through to metricManager, I wonder if you plan 
to remove SolrMetricsContext once DropWizard is gone?  Or maybe it stays but 
you have yet to introduce OTEL scope, so it shows up here?



##########
solr/core/src/java/org/apache/solr/jersey/RequestMetricHandling.java:
##########
@@ -117,13 +122,17 @@ public void filter(
         final SolrJerseyResponse response = (SolrJerseyResponse) 
responseContext.getEntity();
         if (Boolean.TRUE.equals(response.responseHeader.partialResults)) {
           metrics.numTimeouts.mark();
+          metrics.otelNumTimeouts.inc();
         }
       } else {
         log.debug("Skipping partialResults check because entity was not 
SolrJerseyResponse");
       }
+      final Timer.Context dropwizardTimer =
+          (Timer.Context) requestContext.getProperty("dropwizardTimer");
+      metrics.totalTime.inc(dropwizardTimer.stop());
 
-      final Timer.Context timer = (Timer.Context) 
requestContext.getProperty(TIMER);
-      metrics.totalTime.inc(timer.stop());
+      final AttributedLongTimer timer = (AttributedLongTimer) 
requestContext.getProperty(TIMER);

Review Comment:
   minor but I recommend using `var` in these cases rather then repeating the 
type



##########
solr/core/src/java/org/apache/solr/metrics/otel/OtelDoubleMetric.java:
##########
@@ -14,9 +14,9 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+package org.apache.solr.metrics.otel;
 
-/**
- * The {@link 
org.apache.solr.metrics.prometheus.jvm.SolrPrometheusJvmFormatter} is 
responsible for
- * exporting solr.jvm registry metrics to Prometheus.
- */
-package org.apache.solr.metrics.prometheus.jvm;
+@FunctionalInterface
+public interface OtelDoubleMetric {

Review Comment:
   Does this serve a purpose?  Ditto for OtelLongMetric.



##########
solr/core/src/java/org/apache/solr/util/stats/MetricUtils.java:
##########
@@ -858,4 +861,20 @@ public static <T extends PlatformManagedObject> void 
addMXBeanMetrics(
   public static MeterProvider getMeterProvider() {
     return GlobalOpenTelemetry.getMeterProvider();
   }
+
+  public static Attributes createAttributes(String... attributes) {

Review Comment:
   I'm susipicious that this is a good idea since Attributes has a nice builder.



##########
solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java:
##########
@@ -193,14 +203,79 @@ public static class HandlerMetrics {
     public final Timer requestTimes;
     public final Counter totalTime;
 
-    public HandlerMetrics(SolrMetricsContext solrMetricsContext, String... 
metricPath) {
+    public AttributedLongCounter otelRequests;
+    public AttributedLongCounter otelNumErrors;
+    public AttributedLongCounter otelNumServerErrors;
+    public AttributedLongCounter otelNumClientErrors;
+    public AttributedLongCounter otelNumTimeouts;
+    public AttributedLongTimer otelRequestTimes;
+
+    public HandlerMetrics(
+        SolrMetricsContext solrMetricsContext, Attributes attributes, 
String... metricPath) {
+
+      // TODO SOLR-17458: To be removed
       numErrors = solrMetricsContext.meter("errors", metricPath);
       numServerErrors = solrMetricsContext.meter("serverErrors", metricPath);
       numClientErrors = solrMetricsContext.meter("clientErrors", metricPath);
       numTimeouts = solrMetricsContext.meter("timeouts", metricPath);
       requests = solrMetricsContext.counter("requests", metricPath);
       requestTimes = solrMetricsContext.timer("requestTimes", metricPath);
       totalTime = solrMetricsContext.counter("totalTime", metricPath);
+
+      var baseRequestMetric =
+          solrMetricsContext.longCounter("solr_metrics_core_requests", "HTTP 
Solr request counts");
+
+      var baseRequestTimeMetric =
+          solrMetricsContext.longHistogram(
+              "solr_metrics_core_requests_times", "HTTP Solr request times", 
"ms");
+
+      otelRequests =
+          new AttributedLongCounter(
+              baseRequestMetric,
+              Attributes.builder()
+                  .putAll(attributes)
+                  .put(AttributeKey.stringKey("type"), "requests")
+                  .build());
+
+      otelNumErrors =
+          new AttributedLongCounter(
+              baseRequestMetric,
+              Attributes.builder()
+                  .putAll(attributes)
+                  .put(AttributeKey.stringKey("type"), "errors")
+                  .build());
+
+      otelNumServerErrors =
+          new AttributedLongCounter(
+              baseRequestMetric,
+              Attributes.builder()
+                  .putAll(attributes)
+                  .put(AttributeKey.stringKey("type"), "serverErrors")
+                  .build());
+
+      otelNumClientErrors =
+          new AttributedLongCounter(
+              baseRequestMetric,
+              Attributes.builder()
+                  .putAll(attributes)
+                  .put(AttributeKey.stringKey("type"), "clientErrors")
+                  .build());
+
+      otelNumTimeouts =
+          new AttributedLongCounter(
+              baseRequestMetric,
+              Attributes.builder()
+                  .putAll(attributes)
+                  .put(AttributeKey.stringKey("type"), "timeouts")
+                  .build());
+
+      otelRequestTimes = new AttributedLongTimer(baseRequestTimeMetric, 
attributes);
+
+      otelRequests.record(0L);
+      otelNumErrors.record(0L);
+      otelNumServerErrors.record(0L);
+      otelNumClientErrors.record(0L);
+      otelNumTimeouts.record(0L);

Review Comment:
   please remove.  I've argued in the past that we shouldn't be dumping metrics 
out for stuff that isn't used.



##########
solr/core/src/java/org/apache/solr/metrics/SolrMetricManager.java:
##########
@@ -527,6 +717,7 @@ public void removeRegistry(String registry) {
    *     exist, so the swap operation will only rename the existing registry 
without creating an
    *     empty one under the previous name.
    */
+  // TODO SOLR-17458: Don't think we need

Review Comment:
   We might choose not to support core swapping in 10 or might have a 
disclaimer around metrics not swapping. 



##########
solr/core/src/java/org/apache/solr/metrics/SolrMetricProducer.java:
##########
@@ -40,20 +41,16 @@ static String getUniqueMetricTag(Object o, String 
parentName) {
   }
 
   /**
-   * Initialize metrics specific to this producer.
-   *
-   * @param parentContext parent metrics context. If this component has the 
same life-cycle as the
-   *     parent it can simply use the parent context, otherwise it should 
obtain a child context
-   *     using {@link SolrMetricsContext#getChildContext(Object)} passing 
<code>this</code> as the
-   *     child object.
-   * @param scope component scope
+   * Deprecated entry point for initializing metrics. TODO SOLR-17458: This 
will be removed Change
+   * to only take attributes completely removing Dropwizard
    */
-  void initializeMetrics(SolrMetricsContext parentContext, String scope);
+  void initializeMetrics(SolrMetricsContext parentContext, Attributes 
attributes, String scope);

Review Comment:
   most of the lines of this PR were the impact of adding Attributes as a 
parameter.  Question:  could/should Attributes go into SolrMetricsContext?  
It's not clear to me what the javadocs of this method should say about 
Attributes.    I think writing some javadocs at this stage helps clarify to 
you/reviewers what the intention is.  I understand OTEL metrics is attributed 
but it's not clear why the caller of this method would be passing attributes to 
the component implementing SolrMetricProducer.



##########
solr/core/src/test/org/apache/solr/handler/RequestHandlerBaseTest.java:
##########
@@ -89,9 +116,16 @@ public void 
testEach4xxSolrExceptionIncrementsTheClientErrorCount() {
 
     RequestHandlerBase.processErrorMetricsOnException(e, metrics);
 
-    verify(metrics.numErrors).mark();
-    verify(metrics.numClientErrors).mark();
-    verifyNoInteractions(metrics.numServerErrors);
+    verify(mockLongCounter, times(1))
+        .add(eq(1L), argThat(attrs -> 
"errors".equals(attrs.get(AttributeKey.stringKey("type")))));

Review Comment:
   I suspect the idea here in this OTEL API is that you're expected to declare 
`AttributeKey.stringKey("type")` as a constant somewhere, perhaps TYPE_ATTR



##########
solr/core/src/test/org/apache/solr/metrics/SolrCoreMetricTest.java:
##########
@@ -1,82 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.solr.metrics;
-
-import com.codahale.metrics.Metric;
-import io.prometheus.metrics.model.snapshots.Labels;
-import org.apache.solr.SolrTestCaseJ4;
-import org.apache.solr.metrics.prometheus.SolrPrometheusFormatter;
-import org.apache.solr.metrics.prometheus.core.SolrCoreMetric;
-import org.junit.Test;
-
-public class SolrCoreMetricTest extends SolrTestCaseJ4 {

Review Comment:
   if you delete a test, it'd be good for your review to comment why



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to