mlbiscoc commented on code in PR #3430: URL: https://github.com/apache/solr/pull/3430#discussion_r2219601289
########## solr/core/src/test/org/apache/solr/metrics/SolrMetricsIntegrationTest.java: ########## @@ -49,6 +50,10 @@ import org.junit.Before; import org.junit.Test; +// NOCOMMIT: Test fails because of the @After assert on index path. Was going to migrate to just +// check the registry for the core is deleted but this test does a rename operation which otel has +// not addressed yet. Need to migrate the rename operation to otel first. [email protected](bugUrl = "https://issues.apache.org/jira/browse/SOLR-17458") public class SolrMetricsIntegrationTest extends SolrTestCaseJ4 { Review Comment: In that case we need the renaming or even swapping might be a bit awkward. From my knowledge, we can't rename an SDKMeterProvider and all the metrics it has collected from the attributes. Must have been easier in Dropwizard because it was hierarchical and you just rename the parent. For OTEL, I'm pretty sure we would need to completely delete the SDKMeterProviders and recreate it meaning you lose the state of the captured metrics at that point for the core. No different from a restart of the node technically but its something that would need to be noted. ########## solr/core/src/java/org/apache/solr/metrics/reporters/SolrJmxReporter.java: ########## @@ -37,6 +37,8 @@ * <p>NOTE: {@link com.codahale.metrics.jmx.JmxReporter} that this class uses exports only newly * added metrics (it doesn't process already existing metrics in a registry) */ +// NOCOMMIT: This JMX reporter looks to be wrapped over a Dropwizard registry. We need to migrate +// this to OTEL. Maybe we can look at available otel shims for JMX? public class SolrJmxReporter extends FilteringSolrMetricReporter { Review Comment: We can assume to drop it for now. If users decide they want a JMX port, maybe we just actually create a bridge for it in a later release of Solr 10.x? This can also apply for the Slf4j reporter ########## solr/core/src/java/org/apache/solr/core/SolrCore.java: ########## @@ -1305,92 +1321,129 @@ private SolrCoreMetricManager initCoreMetricManager(SolrConfig config) { return coreMetricManager; } - // TODO SOLR-17458: Migrate to Otel @Override public void initializeMetrics( SolrMetricsContext parentContext, Attributes attributes, String scope) { - newSearcherCounter = parentContext.counter("new", Category.SEARCHER.toString()); - newSearcherTimer = parentContext.timer("time", Category.SEARCHER.toString(), "new"); - newSearcherWarmupTimer = parentContext.timer("warmup", Category.SEARCHER.toString(), "new"); + + var searcherAttributesBuilder = + MetricUtils.baseAttributesSupplier( + attributes.toBuilder().put(CATEGORY_ATTR, Category.SEARCHER.toString()).build()); + var gaugeCoreAttributesBuilder = + MetricUtils.baseAttributesSupplier( + attributes.toBuilder().put(CATEGORY_ATTR, Category.CORE.toString()).build()); + + Attributes baseSearcherAttributes = searcherAttributesBuilder.get().build(); + Attributes baseGaugeCoreAttributes = gaugeCoreAttributesBuilder.get().build(); + + var baseSearcherTimerMetric = + parentContext.longHistogram("solr_searcher_timer", "Timer for opening new searchers", "ms"); + + newSearcherCounter = + new AttributedLongCounter( + parentContext.longCounter( + "solr_core_new_searcher", "Total number of new searchers opened"), + baseSearcherAttributes); + newSearcherMaxReachedCounter = - parentContext.counter("maxReached", Category.SEARCHER.toString(), "new"); + new AttributedLongCounter( + parentContext.longCounter( + "solr_core_max_reached_searcher", + "Total number of maximum number of concurrent warming searchers reached"), + baseSearcherAttributes); + newSearcherOtherErrorsCounter = - parentContext.counter("errors", Category.SEARCHER.toString(), "new"); - - parentContext.gauge( - () -> name == null ? parentContext.nullString() : name, - true, - "coreName", - Category.CORE.toString()); - parentContext.gauge(() -> startTime, true, "startTime", Category.CORE.toString()); + new AttributedLongCounter( + parentContext.longCounter( + "solr_core_searcher_errors", "Total number of searcher errors"), + baseSearcherAttributes); + + newSearcherTimer = + new AttributedLongTimer( + baseSearcherTimerMetric, searcherAttributesBuilder.get().put(TYPE_ATTR, "new").build()); + + newSearcherWarmupTimer = + new AttributedLongTimer( + baseSearcherTimerMetric, + searcherAttributesBuilder.get().put(TYPE_ATTR, "warmup").build()); + parentContext.gauge(() -> getOpenCount(), true, "refCount", Category.CORE.toString()); - parentContext.gauge( - () -> getInstancePath().toString(), true, "instanceDir", Category.CORE.toString()); - parentContext.gauge( - () -> isClosed() ? parentContext.nullString() : getIndexDir(), - true, - "indexDir", - Category.CORE.toString()); - parentContext.gauge( - () -> isClosed() ? parentContext.nullNumber() : getIndexSize(), - true, - "sizeInBytes", - Category.INDEX.toString()); - parentContext.gauge( - () -> isClosed() ? parentContext.nullString() : NumberUtils.readableSize(getIndexSize()), - true, - "size", - Category.INDEX.toString()); - parentContext.gauge( - () -> isReady() ? getSegmentCount() : parentContext.nullNumber(), - true, - "segments", - Category.INDEX.toString()); - - final CloudDescriptor cd = getCoreDescriptor().getCloudDescriptor(); - if (cd != null) { - parentContext.gauge(cd::getCollectionName, true, "collection", Category.CORE.toString()); - parentContext.gauge(cd::getShardId, true, "shard", Category.CORE.toString()); - parentContext.gauge(cd::isLeader, true, "isLeader", Category.CORE.toString()); - parentContext.gauge( - () -> String.valueOf(cd.getLastPublished()), - true, - "replicaState", - Category.CORE.toString()); - } - - // initialize disk total / free metrics - Path dataDirPath = Path.of(dataDir); - parentContext.gauge( - () -> { + + parentContext.observableLongGauge( + "solr_core_ref_count", + "The current number of active references to a Solr core", + (observableLongMeasurement -> { + observableLongMeasurement.record(getOpenCount(), baseGaugeCoreAttributes); + })); + + parentContext.observableLongGauge( + "solr_core_disk_space", + "Solr core disk space metrics", + (observableLongMeasurement -> { + + // initialize disk total / free metrics + Path dataDirPath = Path.of(dataDir); + var totalSpaceAttributes = + gaugeCoreAttributesBuilder.get().put(TYPE_ATTR, "total_space").build(); + var usableSpaceAttributes = + gaugeCoreAttributesBuilder.get().put(TYPE_ATTR, "usable_space").build(); try { - return Files.getFileStore(dataDirPath).getTotalSpace(); + observableLongMeasurement.record( + Files.getFileStore(dataDirPath).getTotalSpace(), totalSpaceAttributes); } catch (IOException e) { - return 0L; + observableLongMeasurement.record(0L, totalSpaceAttributes); } - }, - true, - "totalSpace", - Category.CORE.toString(), - "fs"); - parentContext.gauge( - () -> { try { - return Files.getFileStore(dataDirPath).getUsableSpace(); + observableLongMeasurement.record( + Files.getFileStore(dataDirPath).getUsableSpace(), usableSpaceAttributes); } catch (IOException e) { - return 0L; + observableLongMeasurement.record(0L, usableSpaceAttributes); } - }, - true, - "usableSpace", - Category.CORE.toString(), - "fs"); - parentContext.gauge( - () -> dataDirPath.toAbsolutePath().toString(), - true, - "path", - Category.CORE.toString(), - "fs"); + }), + "By"); + + parentContext.observableLongGauge( + "solr_core_index_size", + "Index size for a Solr core", + (observableLongMeasurement -> { + if (!isClosed()) + observableLongMeasurement.record(getIndexSize(), baseGaugeCoreAttributes); + }), + "By"); + + parentContext.observableLongGauge( + "solr_core_segment_count", + "Number of segments in a Solr core", + (observableLongMeasurement -> { + if (isReady()) + observableLongMeasurement.record(getSegmentCount(), baseGaugeCoreAttributes); + })); + + // NOCOMMIT: Do we need these start_time metrics? I think at minimum it should be optional Review Comment: I believe OTEL might have something to get timestamp but not sure. The prometheus response writer can be enabled with a boolean, but we will explode the cardinality as it will be the created timestamp of every metric created like this example: ``` solr_metrics_core_requests_errors_created{category="QUERY",collection="gettingstarted",core="gettingstarted_shard1_replica_n1",handler="/export",internal="false",otel_scope_name="org.apache.solr",replica="replica_n1",shard="shard1",source="client"} 1753111730.894 ``` If we want start_times we should do it ourselves as the start of the metric created doesn't necessarily match the component actually started. ########## solr/core/src/java/org/apache/solr/core/SolrCore.java: ########## @@ -1305,92 +1321,129 @@ private SolrCoreMetricManager initCoreMetricManager(SolrConfig config) { return coreMetricManager; } - // TODO SOLR-17458: Migrate to Otel @Override public void initializeMetrics( SolrMetricsContext parentContext, Attributes attributes, String scope) { - newSearcherCounter = parentContext.counter("new", Category.SEARCHER.toString()); - newSearcherTimer = parentContext.timer("time", Category.SEARCHER.toString(), "new"); - newSearcherWarmupTimer = parentContext.timer("warmup", Category.SEARCHER.toString(), "new"); + + var searcherAttributesBuilder = Review Comment: I created it as a simple way to get a base Attributes set. If I keep reusing a builder, then I would need to clone the builder to add a new label and/or mutate it to remove/overwrite the old ones then build. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
