[ https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Schuller updated CASSANDRA-3670:
--------------------------------------
    Reviewer: slebresne

> provide "red flags" JMX instrumentation
> ---------------------------------------
>
>                 Key: CASSANDRA-3670
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> As discussed in CASSANDRA-3641, it would be nice to expose through JMX
> certain information which is almost without exception indicative of something
> being wrong with the node or cluster.
> In the CASSANDRA-3641 case, it was the detection of corrupt counter shards.
> Other examples include:
> * Number of times the selection of files to compact was adjusted due to disk
> space heuristics
> * Number of times compaction has failed
> * Any I/O error reading from or writing to disk (the work here is collecting,
> not exposing, so maybe not in an initial version)
> * Any data skipped due to checksum mismatches (when checksumming is being
> used); e.g., "number of skips".
> * Any arbitrary exception at least in certain code paths (compaction, scrub,
> cleanup for starters)
> Probably other things.
> The motivation is that if we have clear and obvious indications that
> something truly is wrong, it seems suboptimal to just leave that information
> in the log somewhere, for someone to discover later when something else broke
> as a result and a human investigates. You might argue that one should use
> non-trivial log analysis to detect these things, but I highly doubt a lot of
> people do this and it seems very wasteful to require that in comparison to
> just providing the MBean.
> It is important to note that the *lack* of a certain problem being advertised
> in this MBean is not supposed to be indicative of a *lack* of a problem.
> Rather, the point is that to the extent we can easily do so, it is nice to
> have a clear method of communicating to monitoring systems where there *is* a
> clear indication of something being wrong.
> The main part of this ticket is not to cover everything under the sun, but
> rather to reach agreement on adding an MBean where these types of indicators
> can be collected. Individual counters can then be added over time as one
> thinks of them.
> I propose:
> * Create an org.apache.cassandra.db.RedFlags MBean
> * Populate with a few things to begin with.
> I'll submit the patch if there is agreement.
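
For illustration only, here is a rough sketch of what such a RedFlags MBean might look like using the standard JMX API. This is not the patch for this ticket. The wrapper class RedFlagsSketch, the counter names, the increment methods, and the object name "org.apache.cassandra.db:type=RedFlags" are all assumptions made for the example; only the RedFlags name and package come from the proposal above. The counters are picked from the examples listed in the ticket.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: everything except the "RedFlags" name is an assumption.
public class RedFlagsSketch {

    // Read-only attributes exposed over JMX. Standard MBean rules require the
    // interface to be public and named <ImplementationClass>MBean.
    public interface RedFlagsMBean {
        long getCompactionFailures();
        long getCompactionSelectionAdjustments();
        long getChecksumMismatchSkips();
    }

    public static class RedFlags implements RedFlagsMBean {
        // Assumed object name, derived from the package proposed in the ticket.
        public static final String MBEAN_NAME = "org.apache.cassandra.db:type=RedFlags";

        // AtomicLong lets any thread bump a counter without locking.
        private final AtomicLong compactionFailures = new AtomicLong();
        private final AtomicLong compactionSelectionAdjustments = new AtomicLong();
        private final AtomicLong checksumMismatchSkips = new AtomicLong();

        // Hypothetical hooks that the relevant code paths would call.
        public void compactionFailed() { compactionFailures.incrementAndGet(); }
        public void compactionSelectionAdjusted() { compactionSelectionAdjustments.incrementAndGet(); }
        public void checksumMismatchSkipped() { checksumMismatchSkips.incrementAndGet(); }

        @Override
        public long getCompactionFailures() { return compactionFailures.get(); }

        @Override
        public long getCompactionSelectionAdjustments() { return compactionSelectionAdjustments.get(); }

        @Override
        public long getChecksumMismatchSkips() { return checksumMismatchSkips.get(); }

        // Registers this instance with the given MBean server and returns its name.
        public ObjectName register(MBeanServer server) throws JMException {
            ObjectName name = new ObjectName(MBEAN_NAME);
            server.registerMBean(this, name);
            return name;
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        RedFlags flags = new RedFlags();
        ObjectName name = flags.register(server);

        flags.compactionFailed();

        // A monitoring system would read the same attribute remotely.
        System.out.println("CompactionFailures = " + server.getAttribute(name, "CompactionFailures"));
    }
}
```

With counters like these, a monitoring system could alert on any non-zero value. As the ticket notes, a zero value would not mean the node is healthy.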