[ https://issues.apache.org/jira/browse/FLUME-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010907#comment-16010907 ]
ASF GitHub Bot commented on FLUME-3092:
---------------------------------------

GitHub user adenes opened a pull request:

    https://github.com/apache/flume/pull/131

    FLUME-3092. Extend the FileChannel's monitoring metrics

    This patch adds the following new metrics to the FileChannel's counters:

    - eventPutErrorCount: incremented if an IOException occurs during a put operation.
    - eventTakeErrorCount: incremented if an IOException or CorruptEventException occurs during a take operation.
    - checkpointWriteErrorCount: incremented if an exception occurs during a checkpoint write.
    - unhealthy: this flag indicates that the channel did not start successfully (i.e. the replay ran into a problem), so the channel is not capable of normal operation.
    - closed: the numeric representation (1: closed, 0: open) of the negated open flag.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adenes/flume FLUME-3092

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flume/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #131

----
commit 7c5957e4692817482519e6b9da20d29324a7f332
Author: Denes Arvay <de...@cloudera.com>
Date:   2017-05-09T14:23:31Z

    FLUME-3092. Extend the FileChannel's monitoring metrics
----

> Extend the FileChannel's monitoring metrics
> -------------------------------------------
>
>                 Key: FLUME-3092
>                 URL: https://issues.apache.org/jira/browse/FLUME-3092
>             Project: Flume
>          Issue Type: Improvement
>          Components: File Channel
>    Affects Versions: 1.7.0
>            Reporter: Denes Arvay
>            Assignee: Denes Arvay
>
> There are already several generic metrics (e.g. {{eventPutAttemptCount}} and
> {{eventPutSuccessCount}}) which can be used to create compound metrics for
> monitoring the FileChannel's health.
> Some monitoring systems cannot calculate such derived metrics, though, so I
> recommend adding the following extra counters to represent whether a channel
> operation failed or the channel is in an unhealthy state.
> - {{eventPutErrorCount}}: incremented if an {{IOException}} occurs during a
> {{put}} operation.
> - {{eventTakeErrorCount}}: incremented if an {{IOException}} or
> {{CorruptEventException}} occurs during a {{take}} operation.
> - {{checkpointWriteErrorCount}}: incremented if an exception occurs during a
> checkpoint write.
> - {{unhealthy}}: this flag indicates that the channel did not start
> successfully (i.e. the replay ran into a problem). This is similar to the
> already existing {{open}} flag, except that {{open}} is initially {{false}}
> and is set to {{true}} once the initialization (including the log replay)
> completes successfully. The {{unhealthy}} flag, in contrast, is {{false}} by
> default and is set to {{true}} if there is an error during startup.
> Besides these flags I'd also introduce a {{closed}} flag, which is the numeric
> representation (1: closed, 0: open) of the negated (already existing)
> {{open}} flag.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
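
For readers who want the flag semantics in concrete form, the following is a minimal sketch of how the proposed counters and flags relate to each other. It is not the code from the pull request: the class name FileChannelMetricsSketch and the markStarted / markStartupFailed / markStopped methods are hypothetical, and exposing unhealthy as a boolean is an assumption (the issue only specifies a numeric form for closed).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the channel's counter object; NOT Flume's actual class.
class FileChannelMetricsSketch {
    private final AtomicLong eventPutErrorCount = new AtomicLong();
    private final AtomicLong eventTakeErrorCount = new AtomicLong();
    private final AtomicLong checkpointWriteErrorCount = new AtomicLong();

    // "open" starts as false and becomes true only after a successful startup.
    private final AtomicBoolean open = new AtomicBoolean(false);
    // "unhealthy" starts as false and becomes true only if startup fails.
    private final AtomicBoolean unhealthy = new AtomicBoolean(false);

    // Hypothetical lifecycle hooks, named here for illustration only.
    void markStarted() { open.set(true); }
    void markStartupFailed() { unhealthy.set(true); }
    void markStopped() { open.set(false); }

    // Called when an IOException occurs during put.
    long incrementEventPutErrorCount() { return eventPutErrorCount.incrementAndGet(); }
    // Called when an IOException or CorruptEventException occurs during take.
    long incrementEventTakeErrorCount() { return eventTakeErrorCount.incrementAndGet(); }
    // Called when any exception occurs during a checkpoint write.
    long incrementCheckpointWriteErrorCount() { return checkpointWriteErrorCount.incrementAndGet(); }

    long getEventPutErrorCount() { return eventPutErrorCount.get(); }
    long getEventTakeErrorCount() { return eventTakeErrorCount.get(); }
    long getCheckpointWriteErrorCount() { return checkpointWriteErrorCount.get(); }

    boolean isOpen() { return open.get(); }
    boolean isUnhealthy() { return unhealthy.get(); }

    // Numeric form of the negated open flag: 1 means closed, 0 means open.
    int getClosed() { return open.get() ? 0 : 1; }

    public static void main(String[] args) {
        FileChannelMetricsSketch metrics = new FileChannelMetricsSketch();
        System.out.println("closed before start: " + metrics.getClosed());
        metrics.markStarted();
        System.out.println("closed after start: " + metrics.getClosed());
        System.out.println("unhealthy: " + metrics.isUnhealthy());
    }
}
```

Note that a channel that has not started yet and a channel whose startup failed both report closed = 1; only the second also reports unhealthy = true, which is why the issue proposes both flags.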