[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary

2018-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/2840


---


[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary

2018-09-18 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/2840#discussion_r218484299
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -2922,19 +2948,20 @@ public void launchServer() throws Exception {
 }
 });
 
-metricsRegistry.registerGauge("nimbus:num-supervisors", () -> 
state.supervisors(null).size());
-metricsRegistry.registerGauge("nimbus:fragmented-memory", 
this::fragmentedMemory);
-metricsRegistry.registerGauge("nimbus:fragmented-cpu", 
this::fragmentedCpu);
-metricsRegistry.registerGauge("nimbus:available-memory", () -> 
nodeIdToResources.get().values().parallelStream()
-.mapToDouble(SupervisorResources::getAvailableMem)
+
metricsRegistry.registerGauge("nimbus:total-available-memory-non-negative", () 
-> nodeIdToResources.get().values()
--- End diff --

There appears to be some oddness with the naming of the metrics.  It would 
be nice to have some kind of convention for them.

Perhaps we can have something like 

 * 'nimbus:cluster:available-memory-mb' to show that this came from nimbus, 
it is about the cluster and it is the available memory in MB.
 * 'nimbus:cluster:available-cpu-pct'
 * 'nimbus:cluster:total-memory-mb'
 * 'nimbus:cluster:total-cpu-pct'

etc.



---


[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary

2018-09-18 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/2840#discussion_r218485913
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -4767,4 +4797,205 @@ public IScheduler getForcedScheduler() {
 
 }
 
+private static class ClusterSummaryMetrics implements MetricSet {
+private static final String SUMMARY = "summary";
+private final Map metrics = 
new HashMap<>();
+
+public com.codahale.metrics.Metric put(String key, 
com.codahale.metrics.Metric value) {
+return metrics.put(MetricRegistry.name(SUMMARY, key), value);
+}
+
+@Override
+public Map getMetrics() {
+return metrics;
+}
+}
+
+private class ClusterSummaryMetricSet implements Runnable {
+private static final int CACHING_WINDOW = 5;
+
+private final ClusterSummaryMetrics clusterSummaryMetrics = new 
ClusterSummaryMetrics();
+
+private final Function registerHistogram = 
(name) -> {
+//This histogram reflects the data distribution across only 
one ClusterSummary, i.e.,
+// data distribution across all entities of a type (e.g., data 
from all nimbus/topologies) at one moment.
+// Hence we use half of the CACHING_WINDOW time to ensure it 
retains only data from the most recent update
+final Histogram histogram = new Histogram(new 
SlidingTimeWindowReservoir(CACHING_WINDOW / 2, TimeUnit.SECONDS));
+clusterSummaryMetrics.put(name, histogram);
+return histogram;
+};
+private volatile boolean active = false;
+
+//NImbus metrics distribution
+private final Histogram nimbusUptime = 
registerHistogram.apply("nimbuses:uptime-secs");
+
+//Supervisor metrics distribution
+private final Histogram supervisorsUptime = 
registerHistogram.apply("supervisors:uptime-secs");
+private final Histogram supervisorsNumWorkers = 
registerHistogram.apply("supervisors:num-workers");
+private final Histogram supervisorsNumUsedWorkers = 
registerHistogram.apply("supervisors:num-used-workers");
+private final Histogram supervisorsUsedMem = 
registerHistogram.apply("supervisors:used-mem");
+private final Histogram supervisorsUsedCpu = 
registerHistogram.apply("supervisors:used-cpu");
+private final Histogram supervisorsFragmentedMem = 
registerHistogram.apply("supervisors:fragmented-mem");
+private final Histogram supervisorsFragmentedCpu = 
registerHistogram.apply("supervisors:fragmented-cpu");
+
+//Topology metrics distribution
+private final Histogram topologiesNumTasks = 
registerHistogram.apply("topologies:num-tasks");
+private final Histogram topologiesNumExecutors = 
registerHistogram.apply("topologies:num-executors");
+private final Histogram topologiesNumWorker = 
registerHistogram.apply("topologies:num-workers");
+private final Histogram topologiesUptime = 
registerHistogram.apply("topologies:uptime-secs");
+private final Histogram topologiesReplicationCount = 
registerHistogram.apply("topologies:replication-count");
+private final Histogram topologiesRequestedMemOnHeap = 
registerHistogram.apply("topologies:requested-mem-on-heap");
+private final Histogram topologiesRequestedMemOffHeap = 
registerHistogram.apply("topologies:requested-mem-off-heap");
+private final Histogram topologiesRequestedCpu = 
registerHistogram.apply("topologies:requested-cpu");
+private final Histogram topologiesAssignedMemOnHeap = 
registerHistogram.apply("topologies:assigned-mem-on-heap");
+private final Histogram topologiesAssignedMemOffHeap = 
registerHistogram.apply("topologies:assigned-mem-off-heap");
+private final Histogram topologiesAssignedCpu = 
registerHistogram.apply("topologies:assigned-cpu");
--- End diff --

I am sorry, but I am really having a hard time of seeing a histogram of 
these being valuable, but perhaps I am wrong.  I would much rather see 
dimensions/tags being used, but since those are not available yet, I am okay 
with this.


---


[GitHub] storm pull request #2840: STORM-3147: Add metrics based on ClusterSummary

2018-09-17 Thread srdo
GitHub user srdo opened a pull request:

https://github.com/apache/storm/pull/2840

STORM-3147: Add metrics based on ClusterSummary

Rebase and update of https://github.com/apache/storm/pull/2764.

@zd-project Please take a look when you have a chance.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srdo/storm STORM-3147

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2840.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2840


commit 02428594b53801572f657f4d0bff85b2213b3723
Author: Zhengdai Hu 
Date:   2018-07-13T19:22:20Z

STORM-3147: Port ClusterSummary to StormMetricsRegistry

commit 392803c9bb2fe81a5df57f695a0e6d7c37bd1f2e
Author: Stig Rohde Døssing 
Date:   2018-09-17T20:21:12Z

STORM-3147: Fix minor nits, rebase to use non-static metrics registry




---