dsmiley commented on code in PR #3384:
URL: https://github.com/apache/solr/pull/3384#discussion_r2147630116
##########
solr/core/src/java/org/apache/solr/blockcache/Metrics.java:
##########
@@ -54,8 +55,10 @@ public class Metrics extends SolrCacheBase implements
SolrInfoBean {
private SolrMetricsContext solrMetricsContext;
private long previous = System.nanoTime();
+ // TODO SOLR-17458: Migrate to Otel
Review Comment:
The Lucene & Solr projects use the magic word "NOCOMMIT" not TODO. TODO
means someday maybe. NOCOMMIT means, do NOT merge this into a release branch.
"precommit" will fail.
But since you've written this so many times already, we could leave the
existing ones as-is and remember to search for "TODO SOLR-17458"
##########
solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java:
##########
@@ -193,14 +203,79 @@ public static class HandlerMetrics {
public final Timer requestTimes;
public final Counter totalTime;
- public HandlerMetrics(SolrMetricsContext solrMetricsContext, String...
metricPath) {
+ public AttributedLongCounter otelRequests;
+ public AttributedLongCounter otelNumErrors;
+ public AttributedLongCounter otelNumServerErrors;
+ public AttributedLongCounter otelNumClientErrors;
+ public AttributedLongCounter otelNumTimeouts;
+ public AttributedLongTimer otelRequestTimes;
+
+ public HandlerMetrics(
+ SolrMetricsContext solrMetricsContext, Attributes attributes,
String... metricPath) {
+
+ // TODO SOLR-17458: To be removed
numErrors = solrMetricsContext.meter("errors", metricPath);
numServerErrors = solrMetricsContext.meter("serverErrors", metricPath);
numClientErrors = solrMetricsContext.meter("clientErrors", metricPath);
numTimeouts = solrMetricsContext.meter("timeouts", metricPath);
requests = solrMetricsContext.counter("requests", metricPath);
requestTimes = solrMetricsContext.timer("requestTimes", metricPath);
totalTime = solrMetricsContext.counter("totalTime", metricPath);
+
+ var baseRequestMetric =
+ solrMetricsContext.longCounter("solr_metrics_core_requests", "HTTP
Solr request counts");
+
+ var baseRequestTimeMetric =
+ solrMetricsContext.longHistogram(
+ "solr_metrics_core_requests_times", "HTTP Solr request times",
"ms");
+
+ otelRequests =
+ new AttributedLongCounter(
+ baseRequestMetric,
+ Attributes.builder()
+ .putAll(attributes)
+ .put(AttributeKey.stringKey("type"), "requests")
+ .build());
+
+ otelNumErrors =
Review Comment:
I wonder if we're using OTEL the right way here to handle this scenario --
basically there are errors, and some are server and some are client. I see you
simply vary the "type" for each. Maybe... instead we should only have the
server & client, both have "type=errors", but differentiate by another
attribute like "errorSource=client|server". This still allows users to easily
look at metrics undifferentiated if they don't care (i.e. errors of whatever
source). Presumably this would also work?
##########
solr/core/src/java/org/apache/solr/metrics/otel/instruments/AttributedLongTimer.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.metrics.otel.instruments;
+
+import io.opentelemetry.api.common.Attributes;
+import io.opentelemetry.api.metrics.LongHistogram;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * This class records the elapsed time (in milliseconds) to a {@link
LongHistogram}. Each thread
+ * measures its own timer allowing concurrent timing operations on separate
threads..
+ */
+public class AttributedLongTimer extends AttributedLongHistogram {
+
+ private final ThreadLocal<Long> startTimeNanos = new ThreadLocal<>();
Review Comment:
The use of ThreadLocal concerns me; seems like an over-use of ThreadLocal.
I could be convinced otherwise, though. Instead of conceptually changing the
state of this timer (scoped to this thread, granted), couldn't we return a
Timer object that the caller closes? Note that Solr has RTimer / RTimerTree
you may want to see... perhaps return an RTimer subclass, for example.
##########
solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java:
##########
@@ -837,9 +838,11 @@ private CommitVersionInfo getIndexVersion() {
}
// TODO: Handle compatibility in 8.x
Review Comment:
This comment adjacent to what you are adding seem obsolete; feel free to
replace them
##########
solr/core/src/java/org/apache/solr/metrics/SolrMetricsContext.java:
##########
@@ -116,21 +134,94 @@ public SolrMetricsContext getChildContext(Object child) {
* Register a metric name that this component reports. This method is called
by various metric
* registration methods in {@link org.apache.solr.metrics.SolrMetricManager}
in order to capture
* what metric names are reported from this component (which in turn is
called from {@link
- *
org.apache.solr.metrics.SolrMetricProducer#initializeMetrics(SolrMetricsContext,
String)}).
+ * SolrMetricProducer#initializeMetrics(SolrMetricsContext,
+ * io.opentelemetry.api.common.Attributes, String)}).
*/
+ // TODO We can continue to register metric names
public void registerMetricName(String name) {
metricNames.add(name);
}
/** Return a snapshot of metric values that this component reports. */
+ // TODO Don't think this is needed anymore
public Map<String, Object> getMetricsSnapshot() {
return MetricUtils.convertMetrics(getMetricRegistry(), metricNames);
}
+ public LongCounter longCounter(String metricName, String description) {
Review Comment:
Seeing all these simply pass through to metricManager, I wonder if you plan
to remove SolrMetricsContext once DropWizard is gone? Or maybe it stays but
you have yet to introduce OTEL scope, so it shows up here?
##########
solr/core/src/java/org/apache/solr/jersey/RequestMetricHandling.java:
##########
@@ -117,13 +122,17 @@ public void filter(
final SolrJerseyResponse response = (SolrJerseyResponse)
responseContext.getEntity();
if (Boolean.TRUE.equals(response.responseHeader.partialResults)) {
metrics.numTimeouts.mark();
+ metrics.otelNumTimeouts.inc();
}
} else {
log.debug("Skipping partialResults check because entity was not
SolrJerseyResponse");
}
+ final Timer.Context dropwizardTimer =
+ (Timer.Context) requestContext.getProperty("dropwizardTimer");
+ metrics.totalTime.inc(dropwizardTimer.stop());
- final Timer.Context timer = (Timer.Context)
requestContext.getProperty(TIMER);
- metrics.totalTime.inc(timer.stop());
+ final AttributedLongTimer timer = (AttributedLongTimer)
requestContext.getProperty(TIMER);
Review Comment:
minor but I recommend using `var` in these cases rather then repeating the
type
##########
solr/core/src/java/org/apache/solr/metrics/otel/OtelDoubleMetric.java:
##########
@@ -14,9 +14,9 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
+package org.apache.solr.metrics.otel;
-/**
- * The {@link
org.apache.solr.metrics.prometheus.jvm.SolrPrometheusJvmFormatter} is
responsible for
- * exporting solr.jvm registry metrics to Prometheus.
- */
-package org.apache.solr.metrics.prometheus.jvm;
+@FunctionalInterface
+public interface OtelDoubleMetric {
Review Comment:
Does this serve a purpose? Ditto for OtelLongMetric.
##########
solr/core/src/java/org/apache/solr/util/stats/MetricUtils.java:
##########
@@ -858,4 +861,20 @@ public static <T extends PlatformManagedObject> void
addMXBeanMetrics(
public static MeterProvider getMeterProvider() {
return GlobalOpenTelemetry.getMeterProvider();
}
+
+ public static Attributes createAttributes(String... attributes) {
Review Comment:
I'm susipicious that this is a good idea since Attributes has a nice builder.
##########
solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java:
##########
@@ -193,14 +203,79 @@ public static class HandlerMetrics {
public final Timer requestTimes;
public final Counter totalTime;
- public HandlerMetrics(SolrMetricsContext solrMetricsContext, String...
metricPath) {
+ public AttributedLongCounter otelRequests;
+ public AttributedLongCounter otelNumErrors;
+ public AttributedLongCounter otelNumServerErrors;
+ public AttributedLongCounter otelNumClientErrors;
+ public AttributedLongCounter otelNumTimeouts;
+ public AttributedLongTimer otelRequestTimes;
+
+ public HandlerMetrics(
+ SolrMetricsContext solrMetricsContext, Attributes attributes,
String... metricPath) {
+
+ // TODO SOLR-17458: To be removed
numErrors = solrMetricsContext.meter("errors", metricPath);
numServerErrors = solrMetricsContext.meter("serverErrors", metricPath);
numClientErrors = solrMetricsContext.meter("clientErrors", metricPath);
numTimeouts = solrMetricsContext.meter("timeouts", metricPath);
requests = solrMetricsContext.counter("requests", metricPath);
requestTimes = solrMetricsContext.timer("requestTimes", metricPath);
totalTime = solrMetricsContext.counter("totalTime", metricPath);
+
+ var baseRequestMetric =
+ solrMetricsContext.longCounter("solr_metrics_core_requests", "HTTP
Solr request counts");
+
+ var baseRequestTimeMetric =
+ solrMetricsContext.longHistogram(
+ "solr_metrics_core_requests_times", "HTTP Solr request times",
"ms");
+
+ otelRequests =
+ new AttributedLongCounter(
+ baseRequestMetric,
+ Attributes.builder()
+ .putAll(attributes)
+ .put(AttributeKey.stringKey("type"), "requests")
+ .build());
+
+ otelNumErrors =
+ new AttributedLongCounter(
+ baseRequestMetric,
+ Attributes.builder()
+ .putAll(attributes)
+ .put(AttributeKey.stringKey("type"), "errors")
+ .build());
+
+ otelNumServerErrors =
+ new AttributedLongCounter(
+ baseRequestMetric,
+ Attributes.builder()
+ .putAll(attributes)
+ .put(AttributeKey.stringKey("type"), "serverErrors")
+ .build());
+
+ otelNumClientErrors =
+ new AttributedLongCounter(
+ baseRequestMetric,
+ Attributes.builder()
+ .putAll(attributes)
+ .put(AttributeKey.stringKey("type"), "clientErrors")
+ .build());
+
+ otelNumTimeouts =
+ new AttributedLongCounter(
+ baseRequestMetric,
+ Attributes.builder()
+ .putAll(attributes)
+ .put(AttributeKey.stringKey("type"), "timeouts")
+ .build());
+
+ otelRequestTimes = new AttributedLongTimer(baseRequestTimeMetric,
attributes);
+
+ otelRequests.record(0L);
+ otelNumErrors.record(0L);
+ otelNumServerErrors.record(0L);
+ otelNumClientErrors.record(0L);
+ otelNumTimeouts.record(0L);
Review Comment:
please remove. I've argued in the past that we shouldn't be dumping metrics
out for stuff that isn't used.
##########
solr/core/src/java/org/apache/solr/metrics/SolrMetricManager.java:
##########
@@ -527,6 +717,7 @@ public void removeRegistry(String registry) {
* exist, so the swap operation will only rename the existing registry
without creating an
* empty one under the previous name.
*/
+ // TODO SOLR-17458: Don't think we need
Review Comment:
We might choose not to support core swapping in 10 or might have a
disclaimer around metrics not swapping.
##########
solr/core/src/java/org/apache/solr/metrics/SolrMetricProducer.java:
##########
@@ -40,20 +41,16 @@ static String getUniqueMetricTag(Object o, String
parentName) {
}
/**
- * Initialize metrics specific to this producer.
- *
- * @param parentContext parent metrics context. If this component has the
same life-cycle as the
- * parent it can simply use the parent context, otherwise it should
obtain a child context
- * using {@link SolrMetricsContext#getChildContext(Object)} passing
<code>this</code> as the
- * child object.
- * @param scope component scope
+ * Deprecated entry point for initializing metrics. TODO SOLR-17458: This
will be removed Change
+ * to only take attributes completely removing Dropwizard
*/
- void initializeMetrics(SolrMetricsContext parentContext, String scope);
+ void initializeMetrics(SolrMetricsContext parentContext, Attributes
attributes, String scope);
Review Comment:
most of the lines of this PR were the impact of adding Attributes as a
parameter. Question: could/should Attributes go into SolrMetricsContext?
It's not clear to me what the javadocs of this method should say about
Attributes. I think writing some javadocs at this stage helps clarify to
you/reviewers what the intention is. I understand OTEL metrics is attributed
but it's not clear why the caller of this method would be passing attributes to
the component implementing SolrMetricProducer.
##########
solr/core/src/test/org/apache/solr/handler/RequestHandlerBaseTest.java:
##########
@@ -89,9 +116,16 @@ public void
testEach4xxSolrExceptionIncrementsTheClientErrorCount() {
RequestHandlerBase.processErrorMetricsOnException(e, metrics);
- verify(metrics.numErrors).mark();
- verify(metrics.numClientErrors).mark();
- verifyNoInteractions(metrics.numServerErrors);
+ verify(mockLongCounter, times(1))
+ .add(eq(1L), argThat(attrs ->
"errors".equals(attrs.get(AttributeKey.stringKey("type")))));
Review Comment:
I suspect the idea here in this OTEL API is that you're expected to declare
`AttributeKey.stringKey("type")` as a constant somewhere, perhaps TYPE_ATTR
##########
solr/core/src/test/org/apache/solr/metrics/SolrCoreMetricTest.java:
##########
@@ -1,82 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.solr.metrics;
-
-import com.codahale.metrics.Metric;
-import io.prometheus.metrics.model.snapshots.Labels;
-import org.apache.solr.SolrTestCaseJ4;
-import org.apache.solr.metrics.prometheus.SolrPrometheusFormatter;
-import org.apache.solr.metrics.prometheus.core.SolrCoreMetric;
-import org.junit.Test;
-
-public class SolrCoreMetricTest extends SolrTestCaseJ4 {
Review Comment:
if you delete a test, it'd be good for your review to comment why
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]