kfaraz commented on code in PR #14443:
URL: https://github.com/apache/druid/pull/14443#discussion_r1236481793
##########
docs/operations/metrics.md:
##########
@@ -326,6 +326,12 @@ If `emitBalancingStats` is set to `true` in the
Coordinator [dynamic configurati
## General Health
+### Overlord/Coordinator
+
+| Metric | Description
| Dimensions | Normal
Value |
Review Comment:
Please use the formatting style used in the rest of the Druid docs.
##########
server/src/main/java/org/apache/druid/server/metrics/ServiceStatusMonitor.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.common.base.Supplier;
+import com.google.inject.Inject;
+import com.google.inject.name.Named;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+import java.util.Map;
+
+/**
+ * Monitor service running status.
+ * For Overlord/Coordinator, the metric reported is service leader count.
+ */
+public class ServiceStatusMonitor extends AbstractMonitor {
+
+ @Named("heartbeat")
+ @Inject(optional = true)
+ Supplier<Map<String, Number>> heartbeatTagsSupplier = null;
+
+ @Inject
+ public ServiceStatusMonitor() {
+ }
+
+ @Override
+ public boolean doMonitor(ServiceEmitter emitter) {
+ if (heartbeatTagsSupplier == null) {
+ return true;
+ }
+
+ heartbeatTagsSupplier.get().forEach((k, v) -> {
Review Comment:
If I am not mistaken, the tags should each be a separate dimension and not
be emitted as metric values.
The metric value will always be 1, as it is a simple count.
In the current code, you would be emitting the `druid/heartbeat` metric
multiple times in every invocation of `doMonitor`.
##########
server/src/main/java/org/apache/druid/server/metrics/ServiceStatusMonitor.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.common.base.Supplier;
+import com.google.inject.Inject;
+import com.google.inject.name.Named;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+import java.util.Map;
+
+/**
+ * Monitor service running status.
+ * For Overlord/Coordinator, the metric reported is service leader count.
+ */
+public class ServiceStatusMonitor extends AbstractMonitor {
+
+ @Named("heartbeat")
+ @Inject(optional = true)
+ Supplier<Map<String, Number>> heartbeatTagsSupplier = null;
+
+ @Inject
+ public ServiceStatusMonitor() {
+ }
+
+ @Override
+ public boolean doMonitor(ServiceEmitter emitter) {
+ if (heartbeatTagsSupplier == null) {
+ return true;
+ }
+
+ heartbeatTagsSupplier.get().forEach((k, v) -> {
Review Comment:
Please add a null check on the supplier, just in case someone wants to use
this monitor for other services, where the tags supplier is not being injected.
##########
docs/configuration/index.md:
##########
@@ -380,25 +380,26 @@ You can configure Druid processes to emit
[metrics](../operations/metrics.md) re
Metric monitoring is an essential part of Druid operations. The following
monitors are available:
-|Name|Description|
-|----|-----------|
-|`org.apache.druid.client.cache.CacheMonitor`|Emits metrics (to logs) about
the segment results cache for Historical and Broker processes. Reports typical
cache statistics include hits, misses, rates, and size (bytes and number of
entries), as well as timeouts and and errors.|
-|`org.apache.druid.java.util.metrics.SysMonitor`|Reports on various system
activities and statuses using the [SIGAR
library](https://github.com/hyperic/sigar). Requires execute privileges on
files in `java.io.tmpdir`. Do not set `java.io.tmpdir` to `noexec` when using
`SysMonitor`.|
-|`org.apache.druid.java.util.metrics.JvmMonitor`|Reports various JVM-related
statistics.|
-|`org.apache.druid.java.util.metrics.JvmCpuMonitor`|Reports statistics of CPU
consumption by the JVM.|
-|`org.apache.druid.java.util.metrics.CpuAcctDeltaMonitor`|Reports consumed CPU
as per the cpuacct cgroup.|
-|`org.apache.druid.java.util.metrics.JvmThreadsMonitor`|Reports Thread
statistics in the JVM, like numbers of total, daemon, started, died threads.|
-|`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and
quotas as per the `cpu` cgroup.|
-|`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT
and memory node allocations as per the `cpuset` cgroup.|
-|`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory
statistic as per the memory cgroup.|
-|`org.apache.druid.server.metrics.EventReceiverFirehoseMonitor`|Reports how
many events have been queued in the EventReceiverFirehose.|
-|`org.apache.druid.server.metrics.HistoricalMetricsMonitor`|Reports statistics
on Historical processes. Available only on Historical processes.|
-|`org.apache.druid.server.metrics.SegmentStatsMonitor` | **EXPERIMENTAL**
Reports statistics about segments on Historical processes. Available only on
Historical processes. Not to be used when lazy loading is configured.
|
-|`org.apache.druid.server.metrics.QueryCountStatsMonitor`|Reports how many
queries have been successful/failed/interrupted.|
-|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal
metrics of `http` or `parametrized` emitter (see below). Must not be used with
another emitter type. See the description of the metrics here:
https://github.com/apache/druid/pull/4973.|
-|`org.apache.druid.server.metrics.TaskCountStatsMonitor`|Reports how many
ingestion tasks are currently running/pending/waiting and also the number of
successful/failed tasks per emission period.|
-|`org.apache.druid.server.metrics.TaskSlotCountStatsMonitor`|Reports metrics
about task slot usage per emission period.|
-|`org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor`|Reports how
many ingestion tasks are currently running/pending/waiting, the number of
successful/failed tasks, and metrics about task slot usage for the reporting
worker, per emission period. Only supported by middleManager node types.|
+| Name | Description
|
Review Comment:
I think the original formatting aligns better with the rest of the Druid
docs.
##########
server/src/main/java/org/apache/druid/server/metrics/ServiceStatusMonitor.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.common.base.Supplier;
+import com.google.inject.Inject;
+import com.google.inject.name.Named;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+import java.util.Map;
+
+/**
+ * Monitor service running status.
+ * For Overlord/Coordinator, the metric reported is service leader count.
+ */
+public class ServiceStatusMonitor extends AbstractMonitor {
+
+ @Named("heartbeat")
+ @Inject(optional = true)
+ Supplier<Map<String, Number>> heartbeatTagsSupplier = null;
+
+ @Inject
+ public ServiceStatusMonitor() {
Review Comment:
Is the empty constructor needed?
##########
server/src/test/java/org/apache/druid/server/metrics/ServiceStatusMonitorTest.java:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.common.base.Supplier;
+import org.apache.druid.java.util.metrics.StubServiceEmitter;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+public class ServiceStatusMonitorTest {
+
+ private ServiceStatusMonitor monitor;
+ private Map<String, Number> heartbeatTags = new HashMap<>();
+ private Supplier<Map<String, Number>> heartbeatTagsSupplier = () ->
heartbeatTags;
+
+ @Before
+ public void setUp() throws Exception {
+ monitor = new ServiceStatusMonitor();
+ monitor.heartbeatTagsSupplier = heartbeatTagsSupplier;
+ }
+
+ @Test
+ public void testLeaderCount() {
+ heartbeatTags.put("leader", 0);
+ final StubServiceEmitter emitter = new StubServiceEmitter("service",
"host");
+ monitor.doMonitor(emitter);
+
+ Assert.assertEquals(1, emitter.getEvents().size());
+ Assert.assertEquals("druid/heartbeat",
emitter.getEvents().get(0).toMap().get("metric"));
+ Assert.assertEquals(0, emitter.getEvents().get(0).toMap().get("value"));
+
+ heartbeatTags.put("leader", 1);
+ emitter.flush();
+ monitor.doMonitor(emitter);
+ Assert.assertEquals(1, emitter.getEvents().size());
+ Assert.assertEquals("druid/heartbeat",
emitter.getEvents().get(0).toMap().get("metric"));
+ Assert.assertEquals(1, emitter.getEvents().get(0).toMap().get("value"));
+ }
+}
Review Comment:
Nit: newline at end of file.
##########
docs/operations/metrics.md:
##########
@@ -326,6 +326,12 @@ If `emitBalancingStats` is set to `true` in the
Coordinator [dynamic configurati
## General Health
+### Overlord/Coordinator
Review Comment:
I think it can be used for all services, even though it might not be very
useful right now except for coordinator and overlord.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]