Denes Arvay created FLUME-3092:
----------------------------------

Summary: Extend the FileChannel's monitoring metrics
Key: FLUME-3092
URL: https://issues.apache.org/jira/browse/FLUME-3092
Project: Flume
Issue Type: Improvement
Components: File Channel
Affects Versions: 1.7.0
Reporter: Denes Arvay
Assignee: Denes Arvay

There are already several generic metrics (e.g. {{eventPutAttemptCount}} and {{eventPutSuccessCount}}) which can be combined into compound metrics for monitoring the FileChannel's health. Some monitoring systems aren't capable of calculating such derived metrics, though, so I recommend adding the following extra counters to indicate that a channel operation failed or that the channel is in an unhealthy state.

- {{eventPutErrorCount}}: incremented if an {{IOException}} occurs during a {{put}} operation.
- {{eventTakeErrorCount}}: incremented if an {{IOException}} or {{CorruptEventException}} occurs during a {{take}} operation.
- {{checkpointWriteErrorCount}}: incremented if an exception occurs during a checkpoint write.
- {{unhealthy}}: this flag indicates whether the channel failed to start successfully (i.e. whether the replay ran into a problem). It is similar to the already existing {{open}} flag, except that {{open}} is initially {{false}} and is set to {{true}} once the initialization (including the log replay) has completed successfully. The {{unhealthy}} flag, by contrast, is {{false}} by default and is set to {{true}} if there is an error during startup.

Besides these counters and flags, I'd also introduce a {{closed}} flag, which is the numeric representation (1: closed, 0: open) of the negated (already existing) {{open}} flag.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
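
For illustration, the proposed counters and flags could be sketched roughly as follows. This is a minimal, self-contained sketch and not Flume's actual counter implementation: the class name {{FileChannelHealthCounters}} and all of its method names are hypothetical, and only the metric names come from the description above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for the proposed metrics; NOT part of Flume's API.
class FileChannelHealthCounters {
    private final AtomicLong eventPutErrorCount = new AtomicLong();
    private final AtomicLong eventTakeErrorCount = new AtomicLong();
    private final AtomicLong checkpointWriteErrorCount = new AtomicLong();
    // Flags are stored as 0/1 so that monitoring systems can read them as numbers.
    private final AtomicLong unhealthy = new AtomicLong(0); // false by default
    private final AtomicLong open = new AtomicLong(0);      // initially false

    // Would be called when an IOException occurs during a put operation.
    void incrementEventPutErrorCount() { eventPutErrorCount.incrementAndGet(); }

    // Would be called on IOException or CorruptEventException during a take.
    void incrementEventTakeErrorCount() { eventTakeErrorCount.incrementAndGet(); }

    // Would be called when an exception occurs during a checkpoint write.
    void incrementCheckpointWriteErrorCount() { checkpointWriteErrorCount.incrementAndGet(); }

    // Set to true if there is an error during startup (e.g. in the log replay).
    void setUnhealthy(boolean value) { unhealthy.set(value ? 1 : 0); }

    // Set to true once initialization, including the replay, has succeeded.
    void setOpen(boolean value) { open.set(value ? 1 : 0); }

    long getEventPutErrorCount() { return eventPutErrorCount.get(); }
    long getEventTakeErrorCount() { return eventTakeErrorCount.get(); }
    long getCheckpointWriteErrorCount() { return checkpointWriteErrorCount.get(); }
    int getUnhealthy() { return (int) unhealthy.get(); }

    // Numeric negation of the open flag: 1 means closed, 0 means open.
    int getClosed() { return open.get() == 1 ? 0 : 1; }
}
```

With this shape, a monitoring system that cannot compute derived metrics can alert directly on a non-zero error counter, or on {{unhealthy}} or {{closed}} being 1.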