[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation
[ https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

C. Scott Andreas updated CASSANDRA-3670:
    Component/s: Observability

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
> Issue Type: Improvement
> Components: Observability
> Reporter: Peter Schuller
> Priority: Minor
>
> As discussed in CASSANDRA-3641, it would be nice to expose through JMX
> certain information which is almost without exception indicative of something
> being wrong with the node or cluster.
>
> In the CASSANDRA-3641 case, it was the detection of corrupt counter shards.
>
> Other examples include:
> * Number of times the selection of files to compact was adjusted due to disk
>   space heuristics
> * Number of times compaction has failed
> * Any I/O error reading from or writing to disk (the work here is collecting,
>   not exposing, so maybe not in an initial version)
> * Any data skipped due to checksum mismatches (when checksumming is being
>   used); e.g., "number of skips".
> * Any arbitrary exception at least in certain code paths (compaction, scrub,
>   cleanup for starters)
>
> Probably other things.
>
> The motivation is that if we have clear and obvious indications that
> something truly is wrong, it seems suboptimal to just leave that information
> in the log somewhere, for someone to discover later when something else broke
> as a result and a human investigates. You might argue that one should use
> non-trivial log analysis to detect these things, but I highly doubt a lot of
> people do this and it seems very wasteful to require that in comparison to
> just providing the MBean.
>
> It is important to note that the *lack* of a certain problem being advertised
> in this MBean is not supposed to be indicative of a *lack* of a problem.
> Rather, the point is that to the extent we can easily do so, it is nice to
> have a clear method of communicating to monitoring systems where there *is* a
> clear indication of something being wrong.
>
> The main part of this ticket is not to cover everything under the sun, but
> rather to reach agreement on adding an MBean where these types of indicators
> can be collected. Individual counters can then be added over time as one
> thinks of them.
>
> I propose:
> * Create an org.apache.cassandra.db.RedFlags MBean
> * Populate with a few things to begin with.
>
> I'll submit the patch if there is agreement.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
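
For readers unfamiliar with JMX, the proposal in the ticket could look roughly like the sketch below. It is only an illustration under stated assumptions, not the patch discussed in the ticket: the wrapper class `RedFlagsSketch`, the attribute names, the `record...` style methods, and the ObjectName `org.apache.cassandra.db:type=RedFlags` are all hypothetical. Only the idea of a `RedFlags` MBean holding counters comes from the description.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RedFlagsSketch {
    /** Management interface; JMX exposes each getter as a read-only attribute. */
    public interface RedFlagsMBean {
        long getCompactionFailures();
        long getChecksumMismatchSkips();
        long getCorruptCounterShards();
    }

    /** Hypothetical collector: one monotonically increasing counter per red flag. */
    public static class RedFlags implements RedFlagsMBean {
        private final AtomicLong compactionFailures = new AtomicLong();
        private final AtomicLong checksumMismatchSkips = new AtomicLong();
        private final AtomicLong corruptCounterShards = new AtomicLong();

        // Called from the code path that detects the problem (assumed call sites).
        public void compactionFailed() { compactionFailures.incrementAndGet(); }
        public void checksumMismatchSkipped() { checksumMismatchSkips.incrementAndGet(); }
        public void corruptCounterShardDetected() { corruptCounterShards.incrementAndGet(); }

        @Override public long getCompactionFailures() { return compactionFailures.get(); }
        @Override public long getChecksumMismatchSkips() { return checksumMismatchSkips.get(); }
        @Override public long getCorruptCounterShards() { return corruptCounterShards.get(); }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The ObjectName is an assumption chosen for this sketch.
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=RedFlags");
        RedFlags flags = new RedFlags();
        server.registerMBean(flags, name);

        // Simulate two failed compactions, then read the value back the way
        // a monitoring system would, through the MBean server.
        flags.compactionFailed();
        flags.compactionFailed();
        long seen = (Long) server.getAttribute(name, "CompactionFailures");
        System.out.println("CompactionFailures=" + seen);

        server.unregisterMBean(name);
    }
}
```

A monitoring system would then alert on any counter being non-zero. As the description stresses, a zero value would not mean the node is healthy, only that none of the instrumented conditions has been observed.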
[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation
[ https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3670:
    Revieweruser:   (was: Brandon Williams)
    Assignee:   (was: Tyler Hobbs)

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Priority: Minor
[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation
[ https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3670:
    Reviewer: brandon.williams  (was: slebresne)
    Assignee: Tyler Hobbs  (was: Peter Schuller)

WDYT [~thobbs]?

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Assignee: Tyler Hobbs
> Priority: Minor
[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation
[ https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller updated CASSANDRA-3670:
    Reviewer: slebresne

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Assignee: Peter Schuller
> Priority: Minor