provide "red flags" JMX instrumentation
---------------------------------------

                 Key: CASSANDRA-3670
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Peter Schuller
            Assignee: Peter Schuller
            Priority: Minor


As discussed in CASSANDRA-3641, it would be nice to expose through JMX certain 
information which is almost without exception indicative of something being 
wrong with the node or cluster.

In the CASSANDRA-3641 case, it was the detection of corrupt counter shards. 
Other examples include:

* Number of times the selection of files to compact was adjusted due to disk 
space heuristics
* Number of times compaction has failed
* Any I/O error reading from or writing to disk (the work here is collecting, 
not exposing, so maybe not in an initial version)
* Any data skipped due to checksum mismatches (when checksumming is being 
used); e.g., "number of skips".
* Any arbitrary exception, at least in certain code paths (compaction, scrub, 
cleanup for starters)

There are probably other things worth tracking as well.

The motivation is that if we have clear and obvious indications that something 
truly is wrong, it seems suboptimal to just leave that information in the log 
somewhere, for someone to discover later when something else breaks as a result 
and a human investigates. You might argue that one should use non-trivial log 
analysis to detect these things, but I highly doubt many people do this, and 
it seems very wasteful to require that compared to just providing the MBean.

It is important to note that the *absence* of a given problem from this MBean 
is not supposed to indicate the *absence* of a problem. Rather, the point is 
that, to the extent we can easily do so, it is nice to have a clear method of 
communicating to monitoring systems when there *is* a clear indication of 
something being wrong.

The main point of this ticket is not to cover everything under the sun, but 
rather to reach agreement on adding an MBean where these types of indicators 
can be collected. Individual counters can then be added over time as one thinks 
of them.

I propose:

* Create an org.apache.cassandra.db.RedFlags MBean
* Populate with a few things to begin with.
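
As a rough illustration of the shape such an MBean might take (a sketch only, 
not the patch itself): the attribute names, the recording methods, the 
RedFlagsExample wrapper class, and the object name 
"org.apache.cassandra.db:type=RedFlags" are all assumptions made up for this 
example. The counters mirror a few of the items listed above.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RedFlagsExample {

    /** Read-only view exposed over JMX. Attribute names are hypothetical. */
    public interface RedFlagsMBean {
        long getCompactionFailures();
        long getCompactionSelectionAdjustments();
        long getChecksumMismatchSkips();
    }

    /** Collects "red flag" counters; code paths call the record methods. */
    public static final class RedFlags implements RedFlagsMBean {
        // Assumed object name, chosen to match the proposed package.
        public static final String MBEAN_NAME = "org.apache.cassandra.db:type=RedFlags";

        private final AtomicLong compactionFailures = new AtomicLong();
        private final AtomicLong compactionSelectionAdjustments = new AtomicLong();
        private final AtomicLong checksumMismatchSkips = new AtomicLong();

        // Hypothetical hooks to be called where the problem is detected.
        public void compactionFailed() { compactionFailures.incrementAndGet(); }
        public void compactionSelectionAdjusted() { compactionSelectionAdjustments.incrementAndGet(); }
        public void checksumMismatchSkipped() { checksumMismatchSkips.incrementAndGet(); }

        @Override public long getCompactionFailures() { return compactionFailures.get(); }
        @Override public long getCompactionSelectionAdjustments() { return compactionSelectionAdjustments.get(); }
        @Override public long getChecksumMismatchSkips() { return checksumMismatchSkips.get(); }
    }

    public static void main(String[] args) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(RedFlags.MBEAN_NAME);

        RedFlags flags = new RedFlags();
        server.registerMBean(flags, name);

        // Simulate two failed compactions and one skipped block.
        flags.compactionFailed();
        flags.compactionFailed();
        flags.checksumMismatchSkipped();

        // A monitoring system would read the same attributes remotely.
        System.out.println("CompactionFailures = "
                + server.getAttribute(name, "CompactionFailures"));
        System.out.println("ChecksumMismatchSkips = "
                + server.getAttribute(name, "ChecksumMismatchSkips"));
    }
}
```

Monotonic counters keep the monitoring side simple: any value above zero, or 
any increase between two polls, is worth an alert. Adding a new indicator 
later only means adding one getter and one recording method.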

I'll submit the patch if there is agreement.


