kfaraz commented on code in PR #14443:
URL: https://github.com/apache/druid/pull/14443#discussion_r1236481793


##########
docs/operations/metrics.md:
##########
@@ -326,6 +326,12 @@ If `emitBalancingStats` is set to `true` in the 
Coordinator [dynamic configurati
 
 ## General Health
 
+### Overlord/Coordinator
+
+| Metric         | Description                                                 
                                                  | Dimensions      | Normal 
Value |

Review Comment:
   Please use the formatting style used in the rest of the Druid docs.



##########
server/src/main/java/org/apache/druid/server/metrics/ServiceStatusMonitor.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.common.base.Supplier;
+import com.google.inject.Inject;
+import com.google.inject.name.Named;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+import java.util.Map;
+
+/**
+ * Monitor service running status.
+ * For Overlord/Coordinator, the metric reported is service leader count.
+ */
+public class ServiceStatusMonitor extends AbstractMonitor {
+
+  @Named("heartbeat")
+  @Inject(optional = true)
+  Supplier<Map<String, Number>> heartbeatTagsSupplier = null;
+
+  @Inject
+  public ServiceStatusMonitor() {
+  }
+
+  @Override
+  public boolean doMonitor(ServiceEmitter emitter) {
+    if (heartbeatTagsSupplier == null) {
+      return true;
+    }
+
+    heartbeatTagsSupplier.get().forEach((k, v) -> {

Review Comment:
   If I am not mistaken, the tags should each be a separate dimension and not 
be emitted as metric values.
   The metric value will always be 1, as it is a simple count.
   
   In the current code, you would be emitting the `druid/heartbeat` metric 
multiple times in every invocation of `doMonitor`.



##########
server/src/main/java/org/apache/druid/server/metrics/ServiceStatusMonitor.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.common.base.Supplier;
+import com.google.inject.Inject;
+import com.google.inject.name.Named;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+import java.util.Map;
+
+/**
+ * Monitor service running status.
+ * For Overlord/Coordinator, the metric reported is service leader count.
+ */
+public class ServiceStatusMonitor extends AbstractMonitor {
+
+  @Named("heartbeat")
+  @Inject(optional = true)
+  Supplier<Map<String, Number>> heartbeatTagsSupplier = null;
+
+  @Inject
+  public ServiceStatusMonitor() {
+  }
+
+  @Override
+  public boolean doMonitor(ServiceEmitter emitter) {
+    if (heartbeatTagsSupplier == null) {
+      return true;
+    }
+
+    heartbeatTagsSupplier.get().forEach((k, v) -> {

Review Comment:
   Please add a null check on the supplier, just in case someone wants to use 
this monitor for other services, where the tags supplier is not being injected.



##########
docs/configuration/index.md:
##########
@@ -380,25 +380,26 @@ You can configure Druid processes to emit 
[metrics](../operations/metrics.md) re
 
 Metric monitoring is an essential part of Druid operations.  The following 
monitors are available:
 
-|Name|Description|
-|----|-----------|
-|`org.apache.druid.client.cache.CacheMonitor`|Emits metrics (to logs) about 
the segment results cache for Historical and Broker processes. Reports typical 
cache statistics include hits, misses, rates, and size (bytes and number of 
entries), as well as timeouts and and errors.|
-|`org.apache.druid.java.util.metrics.SysMonitor`|Reports on various system 
activities and statuses using the [SIGAR 
library](https://github.com/hyperic/sigar). Requires execute privileges on 
files in `java.io.tmpdir`. Do not set `java.io.tmpdir` to `noexec` when using 
`SysMonitor`.|
-|`org.apache.druid.java.util.metrics.JvmMonitor`|Reports various JVM-related 
statistics.|
-|`org.apache.druid.java.util.metrics.JvmCpuMonitor`|Reports statistics of CPU 
consumption by the JVM.|
-|`org.apache.druid.java.util.metrics.CpuAcctDeltaMonitor`|Reports consumed CPU 
as per the cpuacct cgroup.|
-|`org.apache.druid.java.util.metrics.JvmThreadsMonitor`|Reports Thread 
statistics in the JVM, like numbers of total, daemon, started, died threads.|
-|`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and 
quotas as per the `cpu` cgroup.|
-|`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT 
and memory node allocations as per the `cpuset` cgroup.|
-|`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory 
statistic as per the memory cgroup.|
-|`org.apache.druid.server.metrics.EventReceiverFirehoseMonitor`|Reports how 
many events have been queued in the EventReceiverFirehose.|
-|`org.apache.druid.server.metrics.HistoricalMetricsMonitor`|Reports statistics 
on Historical processes. Available only on Historical processes.|
-|`org.apache.druid.server.metrics.SegmentStatsMonitor` | **EXPERIMENTAL** 
Reports statistics about segments on Historical processes. Available only on 
Historical processes. Not to be used when lazy loading is configured.           
                                                              |
-|`org.apache.druid.server.metrics.QueryCountStatsMonitor`|Reports how many 
queries have been successful/failed/interrupted.|
-|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal 
metrics of `http` or `parametrized` emitter (see below). Must not be used with 
another emitter type. See the description of the metrics here: 
https://github.com/apache/druid/pull/4973.|
-|`org.apache.druid.server.metrics.TaskCountStatsMonitor`|Reports how many 
ingestion tasks are currently running/pending/waiting and also the number of 
successful/failed tasks per emission period.|
-|`org.apache.druid.server.metrics.TaskSlotCountStatsMonitor`|Reports metrics 
about task slot usage per emission period.|
-|`org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor`|Reports how 
many ingestion tasks are currently running/pending/waiting, the number of 
successful/failed tasks, and metrics about task slot usage for the reporting 
worker, per emission period. Only supported by middleManager node types.|
+| Name                                                           | Description 
                                                                                
                                                                                
                                                                |

Review Comment:
   I think the original formatting aligns better with the rest of the Druid 
docs.



##########
server/src/main/java/org/apache/druid/server/metrics/ServiceStatusMonitor.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.common.base.Supplier;
+import com.google.inject.Inject;
+import com.google.inject.name.Named;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+import java.util.Map;
+
+/**
+ * Monitor service running status.
+ * For Overlord/Coordinator, the metric reported is service leader count.
+ */
+public class ServiceStatusMonitor extends AbstractMonitor {
+
+  @Named("heartbeat")
+  @Inject(optional = true)
+  Supplier<Map<String, Number>> heartbeatTagsSupplier = null;
+
+  @Inject
+  public ServiceStatusMonitor() {

Review Comment:
   Is the empty constructor needed?



##########
server/src/test/java/org/apache/druid/server/metrics/ServiceStatusMonitorTest.java:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.common.base.Supplier;
+import org.apache.druid.java.util.metrics.StubServiceEmitter;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+public class ServiceStatusMonitorTest {
+
+  private ServiceStatusMonitor monitor;
+  private Map<String, Number> heartbeatTags = new HashMap<>();
+  private Supplier<Map<String, Number>> heartbeatTagsSupplier = () -> 
heartbeatTags;
+
+  @Before
+  public void setUp() throws Exception {
+    monitor = new ServiceStatusMonitor();
+    monitor.heartbeatTagsSupplier = heartbeatTagsSupplier;
+  }
+
+  @Test
+  public void testLeaderCount() {
+    heartbeatTags.put("leader", 0);
+    final StubServiceEmitter emitter = new StubServiceEmitter("service", 
"host");
+    monitor.doMonitor(emitter);
+
+    Assert.assertEquals(1, emitter.getEvents().size());
+    Assert.assertEquals("druid/heartbeat", 
emitter.getEvents().get(0).toMap().get("metric"));
+    Assert.assertEquals(0, emitter.getEvents().get(0).toMap().get("value"));
+
+    heartbeatTags.put("leader", 1);
+    emitter.flush();
+    monitor.doMonitor(emitter);
+    Assert.assertEquals(1, emitter.getEvents().size());
+    Assert.assertEquals("druid/heartbeat", 
emitter.getEvents().get(0).toMap().get("metric"));
+    Assert.assertEquals(1, emitter.getEvents().get(0).toMap().get("value"));
+  }
+}

Review Comment:
   Nit: newline at end of file.



##########
docs/operations/metrics.md:
##########
@@ -326,6 +326,12 @@ If `emitBalancingStats` is set to `true` in the 
Coordinator [dynamic configurati
 
 ## General Health
 
+### Overlord/Coordinator

Review Comment:
   I think it can be used for all services, even though it might not be very 
useful right now except for coordinator and overlord.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to