Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/10998 )
Change subject: IMPALA-6857: Add Jvm pause/GC Monitor utility and expose JMX metrics ...................................................................... Patch Set 5: (9 comments) Thanks. I'm very glad this is making it in. I've got a few comments, but nothing too major. http://gerrit.cloudera.org:8080/#/c/10998/5/be/src/util/default-path-handlers.cc File be/src/util/default-path-handlers.cc: http://gerrit.cloudera.org:8080/#/c/10998/5/be/src/util/default-path-handlers.cc@199 PS5, Line 199: if (!JniUtil::GetJMXJson(&result).ok()) return; > is it worth adding something like: Also, let's log on error here, at least once? http://gerrit.cloudera.org:8080/#/c/10998/5/be/src/util/default-path-handlers.cc@243 PS5, Line 243: webserver->RegisterUrlCallback("/jmx", "raw_text_pre.tmpl", JmxHandler); Is raw_text_pre.tmpl missing from this review? When I run curl against this endpoint, I should see "Content-type" header being application/json. Here's what the NN does. Your end-to-end test should validate that the HTTP headers are what we'd expect. I worry that "raw_text_pre.tmpl" is "<pre>$text</pre>", which means it's HTML, and the tools who read /jmx are expecting JSON and shouldn't be doing any more parsing than JSON-parsing. $curl --silent -v http://.../jmx | head * Trying ...... * TCP_NODELAY set * Connected to ... (...) port 20101 (#0) > GET /jmx HTTP/1.1 > Host: ...:20101 > User-Agent: curl/7.52.1 > Accept: */* > < HTTP/1.1 200 OK < Date: Mon, 30 Jul 2018 16:36:01 GMT < Cache-Control: no-cache < Expires: Mon, 30 Jul 2018 16:36:01 GMT < Date: Mon, 30 Jul 2018 16:36:01 GMT < Pragma: no-cache < Content-Type: application/json; charset=utf8 < X-Content-Type-Options: nosniff < X-FRAME-OPTIONS: SAMEORIGIN < X-XSS-Protection: 1; mode=block < Access-Control-Allow-Methods: GET < Access-Control-Allow-Origin: * < Transfer-Encoding: chunked < { [12968 bytes data] { "beans" : [ { "name" : "Hadoop:service=NameNode,name=JvmMetrics", "modelerType" : "JvmMetrics", "tag.Context" : "jvm", "tag.ProcessName" : "NameNode", http://gerrit.cloudera.org:8080/#/c/10998/5/be/src/util/jni-util.h File be/src/util/jni-util.h: http://gerrit.cloudera.org:8080/#/c/10998/5/be/src/util/jni-util.h@295 PS5, Line 295: /// Gets JMX metrics of the JVM. "encoded as a JSON string"? http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/common/JniUtil.java File fe/src/main/java/org/apache/impala/common/JniUtil.java: http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@102 PS5, Line 102: byte[] serializeToThrift(T input, F protocolFactory) throws ImpalaException { This is boring, but please add a unit test for it; should be easy. Is protocolFactory always the same? I bet it is, so you can remove that argument and call it something about "serializeToThriftWithBinaryProtocol()" http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/common/JniUtil.java@211 PS5, Line 211: return JniUtil.serializeToThrift(jvmMetrics, protocolFactory_); nit: You shouldn't need "JniUtil." here because you're already in the context. Try removing it here and line 238. http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/util/JMXJsonUtil.java File fe/src/main/java/org/apache/impala/util/JMXJsonUtil.java: http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/util/JMXJsonUtil.java@71 PS5, Line 71: public class JMXJsonUtil { Please add a simple unit test. Make sure it runs, generates a valid JSON, and contains something expected in it. http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/util/JMXJsonUtil.java@72 PS5, Line 72: // MBean server isntance instance http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/util/JMXJsonUtil.java@73 PS5, Line 73: protected static transient MBeanServer mBeanServer = why is this transient? http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java File fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java: http://gerrit.cloudera.org:8080/#/c/10998/5/fe/src/main/java/org/apache/impala/util/JvmPauseMonitor.java@173 PS5, Line 173: : if (extraSleepTime > warnThresholdMs_) { : LOG.warn(formatMessage( : extraSleepTime, gcTimesAfterSleep, gcTimesBeforeSleep)); : } else if (extraSleepTime > infoThresholdMs_) { : LOG.info(formatMessage( : extraSleepTime, gcTimesAfterSleep, gcTimesBeforeSleep)); : } Perhaps we should add Impala metrics for number of detected pauses? (Please take a look to see if HDFS and/or HBase do something similar.) The idea would be to let a monitoring tool quickly know that this warning is being sent out without having to scan through the logs. -- To view, visit http://gerrit.cloudera.org:8080/10998 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30d897b7e063846ad6d8f88243e2c04264da0341 Gerrit-Change-Number: 10998 Gerrit-PatchSet: 5 Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Mon, 30 Jul 2018 16:39:17 +0000 Gerrit-HasComments: Yes