Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11468 to look at the new patch set (#7). Change subject: IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. ...................................................................... IMPALA-7596. Adding JvmPauseMonitor (and other GC) metrics to Impala metrics. Following up to IMPALA-6857, it's useful for monitoring tools to see if the pause monitor is getting triggered, and to see other GC metrics. The Java side here, and the Thrift side, were easy enough. However, the Impala metric implementation here caused us to call into the frontend to read through the JMX memory beans 72 times, because each call to GetValue() was getting all the data for the pool. This structure made it hard to add additional, non-pool, metrics, and it felt wasteful. To combat this, I added a cache of 10 seconds for getting the metrics from the Frontend. The counters will typically re-use the same data. There are five metrics here, and to avoid yet another enum class, I used C++ lambdas to capture which field of the Thrift object I care about. If folks like the approach, I think it can simplify way the enums for the pool metrics as well. I measured the cost of calling into the metrics code by looping the metrics-gathering 100 times and looking at CPU time for the process using this script: START_CPU=$(cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') for i in $(seq 100); do curl http://localhost:25000/jsonmetrics?json > /dev/null 2> /dev/null done END_CPU=$( cat /proc/$(fuser 25000/tcp 2> /dev/null | tr -d ' ')/stat | awk '{ print $14 + $15 }') echo $START_CPU $END_CPU $(($END_CPU - $START_CPU)) On a release build on my development machine, gathering metrics 100 times took 0.16 cpu seconds without this change and 0.07 cpu seconds with this change. The measurement accuracy here is 0.01 (I spot-checked this with using the cpuacct cgroup infrastructure which gives you nanos, but it was more painful to script), but this convinces me that this is a net improvement. Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 --- M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/memory-metrics.cc M be/src/util/memory-metrics.h M common/thrift/Frontend.thrift M common/thrift/metrics.json M fe/src/main/java/org/apache/impala/common/JniUtil.java M fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java M tests/custom_cluster/test_pause_monitor.py A tests/observability/__init__.py A tests/observability/test_jvm_metrics.py M tests/run-tests.py 12 files changed, 380 insertions(+), 145 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/11468/7 -- To view, visit http://gerrit.cloudera.org:8080/11468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia707393962ad94ef715ec015b3fe3bb1769104a2 Gerrit-Change-Number: 11468 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>