[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation

2018-11-18 Thread C. Scott Andreas (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-3670:

Component/s: Observability

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
> Issue Type: Improvement
> Components: Observability
> Reporter: Peter Schuller
> Priority: Minor
>
> As discussed in CASSANDRA-3641, it would be nice to expose through JMX
> certain information which is almost without exception indicative of something
> being wrong with the node or cluster.
>
> In the CASSANDRA-3641 case, it was the detection of corrupt counter shards.
> Other examples include:
>
> * Number of times the selection of files to compact was adjusted due to disk
>   space heuristics
> * Number of times compaction has failed
> * Any I/O error reading from or writing to disk (the work here is collecting,
>   not exposing, so maybe not in an initial version)
> * Any data skipped due to checksum mismatches (when checksumming is being
>   used); e.g., "number of skips".
> * Any arbitrary exception at least in certain code paths (compaction, scrub,
>   cleanup for starters)
>
> Probably other things.
>
> The motivation is that if we have clear and obvious indications that
> something truly is wrong, it seems suboptimal to just leave that information
> in the log somewhere, for someone to discover later when something else broke
> as a result and a human investigates. You might argue that one should use
> non-trivial log analysis to detect these things, but I highly doubt a lot of
> people do this and it seems very wasteful to require that in comparison to
> just providing the MBean.
>
> It is important to note that the *lack* of a certain problem being advertised
> in this MBean is not supposed to be indicative of a *lack* of a problem.
> Rather, the point is that to the extent we can easily do so, it is nice to
> have a clear method of communicating to monitoring systems where there *is* a
> clear indication of something being wrong.
>
> The main part of this ticket is not to cover everything under the sun, but
> rather to reach agreement on adding an MBean where these types of indicators
> can be collected. Individual counters can then be added over time as one
> thinks of them.
>
> I propose:
>
> * Create an org.apache.cassandra.db.RedFlags MBean
> * Populate with a few things to begin with.
>
> I'll submit the patch if there is agreement.
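
For illustration only, here is a minimal sketch of what such an MBean might look like as a JMX standard MBean. It is not taken from any Cassandra patch. The attribute names, the record* methods, the RedFlagsDemo wrapper class, and the object name "org.apache.cassandra.db:type=RedFlags" are all assumptions made for the example; only the idea of a RedFlags MBean holding counters comes from the ticket.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical wrapper so the sketch fits in one file.
class RedFlagsDemo {

    // Hypothetical management interface. For a standard MBean the interface
    // must be public and named <ImplementationClass>MBean.
    public interface RedFlagsMBean {
        long getCompactionFailures();
        long getCompactionSelectionAdjustments();
        long getChecksumMismatchSkips();
    }

    // Counters only ever increase; a non-zero value is the "red flag".
    public static class RedFlags implements RedFlagsMBean {
        private final AtomicLong compactionFailures = new AtomicLong();
        private final AtomicLong compactionSelectionAdjustments = new AtomicLong();
        private final AtomicLong checksumMismatchSkips = new AtomicLong();

        // Hypothetical hooks that the relevant code paths would call.
        public void recordCompactionFailure() {
            compactionFailures.incrementAndGet();
        }

        public void recordCompactionSelectionAdjustment() {
            compactionSelectionAdjustments.incrementAndGet();
        }

        public void recordChecksumMismatchSkip() {
            checksumMismatchSkips.incrementAndGet();
        }

        @Override
        public long getCompactionFailures() {
            return compactionFailures.get();
        }

        @Override
        public long getCompactionSelectionAdjustments() {
            return compactionSelectionAdjustments.get();
        }

        @Override
        public long getChecksumMismatchSkips() {
            return checksumMismatchSkips.get();
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed object name; the ticket only names the class.
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=RedFlags");

        RedFlags flags = new RedFlags();
        server.registerMBean(flags, name);

        // Simulate one failed compaction and two checksum-mismatch skips.
        flags.recordCompactionFailure();
        flags.recordChecksumMismatchSkip();
        flags.recordChecksumMismatchSkip();

        // Read the attributes back through JMX, as a monitoring agent would.
        long failures = (Long) server.getAttribute(name, "CompactionFailures");
        long skips = (Long) server.getAttribute(name, "ChecksumMismatchSkips");
        System.out.println("compactionFailures=" + failures
                + " checksumMismatchSkips=" + skips);

        server.unregisterMBean(name);
    }
}
```

A monitoring system could then alert on any counter being greater than zero. As the description notes, a zero value would not prove that the node is healthy.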



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation

2014-02-27 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3670:
--

Reviewer:   (was: Brandon Williams)
Assignee: (was: Tyler Hobbs)

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Priority: Minor





[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation

2013-07-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3670:
--

Reviewer: brandon.williams  (was: slebresne)
Assignee: Tyler Hobbs  (was: Peter Schuller)

WDYT [~thobbs]?

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Assignee: Tyler Hobbs
> Priority: Minor



[jira] [Updated] (CASSANDRA-3670) provide "red flags" JMX instrumentation

2011-12-23 Thread Peter Schuller (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-3670:
--

Reviewer: slebresne

> provide "red flags" JMX instrumentation
> ---
>
> Key: CASSANDRA-3670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3670
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Assignee: Peter Schuller
> Priority: Minor
