[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation

C. Scott Andreas (JIRA) Sun, 18 Nov 2018 22:13:11 -0800


     [ https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


C. Scott Andreas updated CASSANDRA-3670:
----------------------------------------
    Component/s: Observability

> provide "red flags" JMX instrumentation
> ---------------------------------------
>
>                 Key: CASSANDRA-3670
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Observability
>            Reporter: Peter Schuller
>            Priority: Minor
>
> As discussed in CASSANDRA-3641, it would be nice to expose through JMX 
> certain information which is almost without exception indicative of something 
> being wrong with the node or cluster.
> In the CASSANDRA-3641 case, it was the detection of corrupt counter shards. 
> Other examples include:
> * Number of times the selection of files to compact was adjusted due to disk 
> space heuristics
> * Number of times compaction has failed
> * Any I/O error reading from or writing to disk (the effort here lies in 
> collecting these errors rather than exposing them, so this may not belong in 
> an initial version)
> * Any data skipped due to checksum mismatches (when checksumming is being 
> used); e.g., "number of skips".
> * Any arbitrary exception at least in certain code paths (compaction, scrub, 
> cleanup for starters)
> There are probably others.
> The motivation is that if we have clear and obvious indications that 
> something is truly wrong, it seems suboptimal to leave that information 
> in the log, where a human discovers it only later while investigating 
> something else that broke as a result. One could argue for detecting these 
> conditions with non-trivial log analysis, but I doubt many people do that, 
> and requiring it seems wasteful compared to simply providing the MBean.
> It is important to note that the *absence* of a reported problem in this 
> MBean does not imply the *absence* of a problem. Rather, to the extent we 
> can easily do so, it is useful to have a clear way of telling monitoring 
> systems when there *is* a clear indication of something being wrong.
> The main part of this ticket is not to cover everything under the sun, but 
> rather to reach agreement on adding an MBean where these types of indicators 
> can be collected. Individual counters can then be added over time as one 
> thinks of them.
> I propose:
> * Create an org.apache.cassandra.db.RedFlags MBean
> * Populate with a few things to begin with.
> I'll submit the patch if there is agreement.
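
To make the proposal above concrete, here is a minimal sketch of what such an MBean could look like. It is not taken from any actual patch: the attribute names, the increment methods, the wrapper class RedFlagsSketch, and the object name "org.apache.cassandra.db:type=RedFlags" are all assumptions chosen for illustration, using only the standard javax.management API.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: none of these names come from an actual Cassandra patch.
final class RedFlagsSketch {

    // Standard MBean convention: the management interface is named
    // <implementation class name> + "MBean" and must be public.
    public interface RedFlagsMBean {
        long getCompactionFailures();
        long getChecksumSkips();
        long getCorruptCounterShards();
    }

    // Each counter only ever increases; a non-zero value is the "red flag".
    public static final class RedFlags implements RedFlagsMBean {
        private final AtomicLong compactionFailures = new AtomicLong();
        private final AtomicLong checksumSkips = new AtomicLong();
        private final AtomicLong corruptCounterShards = new AtomicLong();

        // Called by the code paths that detect the condition (assumed hooks).
        public void compactionFailed() { compactionFailures.incrementAndGet(); }
        public void checksumSkipped() { checksumSkips.incrementAndGet(); }
        public void corruptCounterShardSeen() { corruptCounterShards.incrementAndGet(); }

        @Override public long getCompactionFailures() { return compactionFailures.get(); }
        @Override public long getChecksumSkips() { return checksumSkips.get(); }
        @Override public long getCorruptCounterShards() { return corruptCounterShards.get(); }
    }

    // Assumed object name, derived from the class name proposed in the ticket.
    static final String OBJECT_NAME = "org.apache.cassandra.db:type=RedFlags";

    // Registers a fresh RedFlags instance with the given MBean server.
    static RedFlags register(MBeanServer server) throws Exception {
        RedFlags flags = new RedFlags();
        server.registerMBean(flags, new ObjectName(OBJECT_NAME));
        return flags;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        RedFlags flags = register(server);

        // Simulate one failed compaction, then read the counter back the
        // way a monitoring agent would: by attribute name over JMX.
        flags.compactionFailed();
        Object value = server.getAttribute(new ObjectName(OBJECT_NAME), "CompactionFailures");
        System.out.println("CompactionFailures = " + value);
    }
}
```

With this shape, a monitoring system would only need to alert on any attribute being greater than zero, and new counters could be added over time by extending the interface.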



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

