[jira] [Issue Comment Edited] (CASSANDRA-3670) provide "red flags" JMX instrumentation

Peter Schuller (Issue Comment Edited) (JIRA) Fri, 27 Jan 2012 15:14:34 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195216#comment-13195216 ]


Peter Schuller edited comment on CASSANDRA-3670 at 1/27/12 11:13 PM:
---------------------------------------------------------------------

CodaHale Metrics is being evaluated in CASSANDRA-3671. If there's a +1 there, I 
will go for the same here.
                
      was (Author: scode):
    CodaHale Metrics being evaluated in CASSANDRA-3671. If there's a +1 here, 
will go for same here.
                  
> provide "red flags" JMX instrumentation
> ---------------------------------------
>
>                 Key: CASSANDRA-3670
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> As discussed in CASSANDRA-3641, it would be nice to expose through JMX 
> certain information which is almost without exception indicative of something 
> being wrong with the node or cluster.
> In the CASSANDRA-3641 case, it was the detection of corrupt counter shards. 
> Other examples include:
> * Number of times the selection of files to compact was adjusted due to disk 
> space heuristics
> * Number of times compaction has failed
> * Any I/O error reading from or writing to disk (the work here is in 
> collecting the errors, not exposing them, so maybe not in an initial version)
> * Any data skipped due to checksum mismatches (when checksumming is being 
> used); e.g., "number of skips".
> * Any exception, at least in certain code paths (compaction, scrub, and 
> cleanup, for starters)
> Probably other things.
> The motivation is that if we have clear and obvious indications that 
> something truly is wrong, it seems suboptimal to just leave that information 
> in the log somewhere for someone to discover later, after something else has 
> broken as a result and a human investigates. You might argue that one should 
> use non-trivial log analysis to detect these things, but I highly doubt many 
> people do this, and requiring it seems very wasteful compared to simply 
> providing the MBean.
> It is important to note that the *lack* of a certain problem being reported 
> in this MBean is not meant to indicate a *lack* of a problem. Rather, the 
> point is that, to the extent we can easily do so, it is nice to have a clear 
> way of communicating to monitoring systems when there *is* a clear indication 
> of something being wrong.
> The main point of this ticket is not to cover everything under the sun, but 
> rather to reach agreement on adding an MBean where these types of indicators 
> can be collected. Individual counters can then be added over time as people 
> think of them.
> I propose:
> * Create an org.apache.cassandra.db.RedFlags MBean
> * Populate with a few things to begin with.
> I'll submit the patch if there is agreement.
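
The proposal leaves the MBean's exact shape open. The following is a minimal sketch, not the eventual patch, assuming a plain JMX standard MBean with hand-rolled counters. The counter names (CorruptCounterShards, DiskSpaceCompactionAdjustments, CompactionFailures, ChecksumSkips), the record* hooks, the AnyFlagRaised attribute, and the ObjectName "org.apache.cassandra.db:type=RedFlags" are all hypothetical, chosen only to mirror the examples in the description.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RedFlagsDemo {

    // Standard MBean interface: JMX requires it to be public and named
    // <implementation class name> + "MBean". Each getter becomes a read-only attribute.
    public interface RedFlagsMBean {
        long getCorruptCounterShards();
        long getDiskSpaceCompactionAdjustments();
        long getCompactionFailures();
        long getChecksumSkips();
        boolean isAnyFlagRaised();
    }

    // Hypothetical red-flag counters; on a healthy node each should stay at zero.
    public static class RedFlags implements RedFlagsMBean {
        private final AtomicLong corruptCounterShards = new AtomicLong();
        private final AtomicLong diskSpaceCompactionAdjustments = new AtomicLong();
        private final AtomicLong compactionFailures = new AtomicLong();
        private final AtomicLong checksumSkips = new AtomicLong();

        // Hypothetical hooks, to be called from the code paths that detect each condition.
        // They are not part of the MBean interface, so JMX does not expose them.
        public void recordCorruptCounterShard() { corruptCounterShards.incrementAndGet(); }
        public void recordDiskSpaceCompactionAdjustment() { diskSpaceCompactionAdjustments.incrementAndGet(); }
        public void recordCompactionFailure() { compactionFailures.incrementAndGet(); }
        public void recordChecksumSkip() { checksumSkips.incrementAndGet(); }

        @Override public long getCorruptCounterShards() { return corruptCounterShards.get(); }
        @Override public long getDiskSpaceCompactionAdjustments() { return diskSpaceCompactionAdjustments.get(); }
        @Override public long getCompactionFailures() { return compactionFailures.get(); }
        @Override public long getChecksumSkips() { return checksumSkips.get(); }

        // True means something is definitely wrong; false does NOT mean the node is healthy.
        @Override public boolean isAnyFlagRaised() {
            return getCorruptCounterShards() > 0
                || getDiskSpaceCompactionAdjustments() > 0
                || getCompactionFailures() > 0
                || getChecksumSkips() > 0;
        }
    }

    public static void main(String[] args) throws Exception {
        RedFlags flags = new RedFlags();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical ObjectName in the org.apache.cassandra.db domain.
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=RedFlags");
        server.registerMBean(flags, name);

        flags.recordCompactionFailure();

        // A monitoring system would read these attributes over a remote JMX connection;
        // here they are read through the local platform MBean server.
        System.out.println("CompactionFailures = " + server.getAttribute(name, "CompactionFailures"));
        System.out.println("AnyFlagRaised = " + server.getAttribute(name, "AnyFlagRaised"));
    }
}
```

Running main should report a CompactionFailures value of 1 and AnyFlagRaised as true. AtomicLong keeps the increments lock-free when several compaction or read threads hit a red flag concurrently. If CodaHale Metrics is adopted via CASSANDRA-3671, its counters could presumably replace the hand-rolled AtomicLong fields while keeping the same "non-zero means trouble" contract.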

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
