[jira] [Updated] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ZOOKEEPER-3037:
--
    Labels: pull-request-available  (was: )

> Add JvmPauseMonitor to ZooKeeper
>
>                 Key: ZOOKEEPER-3037
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3037
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 3.5.3, 3.4.12
>            Reporter: Norbert Kalmar
>            Assignee: Norbert Kalmar
>            Priority: Minor
>              Labels: pull-request-available
>
> After a ZK crash or client timeout, it is sometimes hard to determine from
> the logs what happened. Knowing whether ZK was responsive at the time would
> help a lot. For example, ZK might spend a lot of time waiting on GC (there
> is still some misconception that ZK is a storage system).
>
> To help detect this, HADOOP already has a great tool called JVM Pause
> Monitor. (As the name suggests, it can also be used for monitoring, but it
> also helps post-mortem in a lot of cases.) Basically, it has a daemon that
> sleeps for one second, and if the sleep time exceeds the 1s by more than the
> threshold (1s: INFO, 10s: WARN by default; this can be made configurable in
> our case, see below), it will alert/make a log entry. It can also monitor
> the time GC took.
>
> The class implementing this is in HADOOP-common, but ZK should not depend on
> this package. Since this is a straightforward implementation, and the few
> commits it has had in the past five years were nothing really serious, I
> think we could just copy this class into ZooKeeper and introduce it as a
> configurable feature that is off by default.
>
> The class:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
>
> Task:
> - Create a class in ZK (under zookeeper/server/util/) called JvmPauseMonitor.
> - Make the feature configurable, by default: OFF
> - Make sleep time and threshold time configurable
> - Update documentation
> - Add [current size of the heap OR % of heap used] to the log entry whenever
>   the sleep threshold has been exceeded by a lot (10s)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
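
For readers who have not seen the Hadoop class, the mechanism described in the issue (sleep for a fixed interval, measure how much longer the sleep actually took, and log at INFO or WARN depending on the overshoot) can be sketched roughly as below. This is only an illustration under stated assumptions, not the Hadoop implementation and not the class proposed for ZooKeeper: the class name JvmPauseMonitorSketch, the classify and heapUsedPercent helpers, and the log message format are all made up for this example. The defaults (1s sleep, 1s INFO, 10s WARN) are taken from the description above.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; names and log format are hypothetical. */
class JvmPauseMonitorSketch implements Runnable {
    enum Level { NONE, INFO, WARN }

    private final long sleepMs;          // how long the daemon tries to sleep
    private final long infoThresholdMs;  // overshoot above which INFO is logged
    private final long warnThresholdMs;  // overshoot above which WARN is logged

    JvmPauseMonitorSketch(long sleepMs, long infoThresholdMs, long warnThresholdMs) {
        this.sleepMs = sleepMs;
        this.infoThresholdMs = infoThresholdMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Maps the extra time spent asleep to a log level. */
    static Level classify(long extraMs, long infoThresholdMs, long warnThresholdMs) {
        if (extraMs > warnThresholdMs) {
            return Level.WARN;
        }
        if (extraMs > infoThresholdMs) {
            return Level.INFO;
        }
        return Level.NONE;
    }

    /** Percentage of the maximum heap currently in use, as seen by Runtime. */
    static long heapUsedPercent() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return used * 100 / rt.maxMemory();
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                return; // stop monitoring when interrupted
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long extraMs = elapsedMs - sleepMs; // time beyond the requested sleep
            Level level = classify(extraMs, infoThresholdMs, warnThresholdMs);
            if (level != Level.NONE) {
                System.err.printf("%s: detected pause of approximately %d ms (heap used: %d%%)%n",
                        level, extraMs, heapUsedPercent());
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Defaults from the issue description: 1s sleep, 1s INFO, 10s WARN.
        Thread t = new Thread(new JvmPauseMonitorSketch(1000, 1000, 10000), "pause-monitor-sketch");
        t.setDaemon(true);
        t.start();
        Thread.sleep(2500); // let the monitor run a couple of cycles
        t.interrupt();
    }
}
```

Keeping the threshold comparison in a separate static method is just a convenience for this sketch, so the INFO/WARN decision can be checked without waiting for a real pause. The sketch does not cover the GC-time reporting mentioned in the description.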
[jira] [Updated] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norbert Kalmar updated ZOOKEEPER-3037:
--
    Description:
After a ZK crash or client timeout, it is sometimes hard to determine from the logs what happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might spend a lot of time waiting on GC (there is still some misconception that ZK is a storage system).

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. (As the name suggests, it can also be used for monitoring, but it also helps post-mortem in a lot of cases.) Basically, it has a daemon that sleeps for one second, and if the sleep time exceeds the 1s by more than the threshold (1s: INFO, 10s: WARN by default; this can be made configurable in our case, see below), it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in HADOOP-common, but ZK should not depend on this package. Since this is a straightforward implementation, and the few commits it has had in the past five years were nothing really serious, I think we could just copy this class into ZooKeeper and introduce it as a configurable feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK (under zookeeper/server/util/) called JvmPauseMonitor.
- Make the feature configurable, by default: OFF
- Make sleep time and threshold time configurable
- Update documentation
- Add [current size of the heap OR % of heap used] to the log entry whenever the sleep threshold has been exceeded by a lot (10s)

  was:
After a ZK crash or client timeout, it is sometimes hard to determine from the logs what happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might spend a lot of time waiting on GC (there is still some misconception that ZK is a storage system).

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. (As the name suggests, it can also be used for monitoring, but it also helps post-mortem in a lot of cases.) Basically, it has a daemon that sleeps for one second, and if the sleep time exceeds the 1s by more than the threshold (1s: INFO, 10s: WARN by default; this can be made configurable in our case, see below), it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in HADOOP-common, but ZK should not depend on this package. Since this is a straightforward implementation, and the few commits it has had in the past five years were nothing really serious, I think we could just copy this class into ZooKeeper and introduce it as a configurable feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK under contrib called JvmPauseMonitor.
- Make the feature configurable, by default: OFF
- Make sleep time and threshold time configurable
- Update documentation
- Add [current size of the heap OR % of heap used] to the log entry whenever the sleep threshold has been exceeded by a lot (10s)
[jira] [Updated] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norbert Kalmar updated ZOOKEEPER-3037:
--
    Description:
After a ZK crash or client timeout, it is sometimes hard to determine from the logs what happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might spend a lot of time waiting on GC (there is still some misconception that ZK is a storage system).

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. (As the name suggests, it can also be used for monitoring, but it also helps post-mortem in a lot of cases.) Basically, it has a daemon that sleeps for one second, and if the sleep time exceeds the 1s by more than the threshold (1s: INFO, 10s: WARN by default; this can be made configurable in our case, see below), it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in HADOOP-common, but ZK should not depend on this package. Since this is a straightforward implementation, and the few commits it has had in the past five years were nothing really serious, I think we could just copy this class into ZooKeeper and introduce it as a configurable feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK under contrib called JvmPauseMonitor.
- Make the feature configurable, by default: OFF
- Make sleep time and threshold time configurable
- Update documentation
- Add [current size of the heap OR % of heap used] to the log entry whenever the sleep threshold has been exceeded by a lot (10s)

  was:
After a ZK crash or client timeout, it is sometimes hard to determine from the logs what happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might spend a lot of time waiting on GC (there is still some misconception that ZK is a storage system).

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. (As the name suggests, it can also be used for monitoring, but it also helps post-mortem in a lot of cases.) Basically, it has a daemon that sleeps for one second, and if the sleep time exceeds the 1s by more than the threshold (1s: INFO, 10s: WARN by default; this can be made configurable in our case, see below), it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in HADOOP-common, but ZK should not depend on this package. Since this is a straightforward implementation, and the few commits it has had in the past five years were nothing really serious, I think we could just copy this class into ZooKeeper and introduce it as a configurable feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK under contrib called JvmPauseMonitor.
- Make the feature configurable, by default: OFF
- Make sleep time and threshold time configurable
[jira] [Updated] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norbert Kalmar updated ZOOKEEPER-3037:
--
    Description:
After a ZK crash or client timeout, it is sometimes hard to determine from the logs what happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might spend a lot of time waiting on GC (there is still some misconception that ZK is a storage system).

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. (As the name suggests, it can also be used for monitoring, but it also helps post-mortem in a lot of cases.) Basically, it has a daemon that sleeps for one second, and if the sleep time exceeds the 1s by more than the threshold (1s: INFO, 10s: WARN by default; this can be made configurable in our case, see below), it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in HADOOP-common, but ZK should not depend on this package. Since this is a straightforward implementation, and the few commits it has had in the past five years were nothing really serious, I think we could just copy this class into ZooKeeper and introduce it as a configurable feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK under contrib called JvmPauseMonitor.
- Make the feature configurable, by default: OFF
- Make sleep time and threshold time configurable

  was:
After a ZK crash or client timeout, it is sometimes hard to determine from the logs what happened. Knowing whether ZK was responsive at the time would help a lot. For example, ZK might spend a lot of time waiting on GC (there is still some misconception that ZK is a storage system).

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. (As the name suggests, it can also be used for monitoring, but it also helps post-mortem in a lot of cases.) Basically, it has a daemon that sleeps for one second, and if the sleep time exceeds the 1s by more than the threshold (1s: INFO, 10s: WARN by default; this can be made configurable in our case, see below), it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in HADOOP-common, but ZK should not depend on this package. Since this is a straightforward implementation, and the few commits it has had in the past five years were nothing really serious, I think we could just copy this class into ZooKeeper and introduce it as a configurable feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK under contrib called JvmPauseMonitor.
- Make the feature configurable, by default: OFF
- ?Make sleep time and threshold time configurable?