Denes Arvay created FLUME-3092:
----------------------------------

             Summary: Extend the FileChannel's monitoring metrics
                 Key: FLUME-3092
                 URL: https://issues.apache.org/jira/browse/FLUME-3092
             Project: Flume
          Issue Type: Improvement
          Components: File Channel
    Affects Versions: 1.7.0
            Reporter: Denes Arvay
            Assignee: Denes Arvay


There are already several generic metrics (e.g. {{eventPutAttemptCount}} and
{{eventPutSuccessCount}}) which can be used to create compound metrics for
monitoring the FileChannel's health.
Some monitoring systems aren't capable of calculating such derived metrics,
though, so I recommend adding the following extra counters to indicate whether
a channel operation failed or the channel is in an unhealthy state.

- {{eventPutErrorCount}}: incremented if an {{IOException}} occurs during a
{{put}} operation.
- {{eventTakeErrorCount}}: incremented if an {{IOException}} or
{{CorruptEventException}} occurs during a {{take}} operation.
- {{checkpointWriteErrorCount}}: incremented if an exception occurs during a
checkpoint write.
- {{unhealthy}}: this flag indicates whether the channel failed to start
(i.e. the replay ran into a problem). It is similar to the already existing
{{open}} flag, except that {{open}} is initially {{false}} and is set to
{{true}} once the initialization (including the log replay) completes
successfully. The {{unhealthy}} flag, in contrast, is {{false}} by default and
is set to {{true}} if there is an error during startup.
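
To illustrate the counters listed above, here is a rough sketch of how they might be wired. It assumes plain {{AtomicLong}} fields; the class, interface and method names ({{FileChannelErrorCounters}}, {{IoAction}}, {{guardedPut}}, {{markStartupFailed}}, etc.) are made up for illustration and are not Flume's existing API.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for the proposed metrics; not Flume's actual counter class.
class FileChannelErrorCounters {
    // Illustrative callback type for a channel operation that may fail with an I/O error.
    interface IoAction {
        void run() throws IOException;
    }

    private final AtomicLong eventPutErrorCount = new AtomicLong();
    private final AtomicLong eventTakeErrorCount = new AtomicLong();
    private final AtomicLong checkpointWriteErrorCount = new AtomicLong();
    // false by default; flipped to true only if startup (e.g. the replay) fails.
    private volatile boolean unhealthy = false;

    // Runs a put and counts an IOException before rethrowing it to the caller.
    void guardedPut(IoAction put) throws IOException {
        try {
            put.run();
        } catch (IOException e) {
            eventPutErrorCount.incrementAndGet();
            throw e;
        }
    }

    // Same idea for take; a real channel would also count CorruptEventException here.
    void guardedTake(IoAction take) throws IOException {
        try {
            take.run();
        } catch (IOException e) {
            eventTakeErrorCount.incrementAndGet();
            throw e;
        }
    }

    // Counts a failed checkpoint write (only unchecked exceptions in this sketch).
    void guardedCheckpointWrite(Runnable write) {
        try {
            write.run();
        } catch (RuntimeException e) {
            checkpointWriteErrorCount.incrementAndGet();
            throw e;
        }
    }

    // Called by the (hypothetical) startup code when initialization fails.
    void markStartupFailed() {
        unhealthy = true;
    }

    long getEventPutErrorCount() { return eventPutErrorCount.get(); }
    long getEventTakeErrorCount() { return eventTakeErrorCount.get(); }
    long getCheckpointWriteErrorCount() { return checkpointWriteErrorCount.get(); }
    boolean isUnhealthy() { return unhealthy; }
}
```

The point of the wrapper methods is that the exception still reaches the caller unchanged; the counter is only a side effect, so a monitoring system can alert on a non-zero value without computing anything from the attempt/success counters.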

Besides these counters and flags, I'd also introduce a {{closed}} flag, which
is the numeric representation (1: closed, 0: open) of the negated, already
existing {{open}} flag.
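
The mapping itself is trivial; a minimal sketch, assuming a boolean {{open}} field (the {{ChannelState}} class and its method names are hypothetical, not existing Flume code):

```java
// Hypothetical state holder showing how closed could be derived from open.
class ChannelState {
    // Mirrors the existing open flag: initially false, true after successful init.
    private volatile boolean open = false;

    void setOpen(boolean open) {
        this.open = open;
    }

    boolean isOpen() {
        return open;
    }

    // Numeric form of the negated open flag: 1 = closed, 0 = open.
    int getClosed() {
        return open ? 0 : 1;
    }
}
```

Since {{open}} starts out as {{false}}, {{closed}} would report 1 until the channel has been initialized successfully.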



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
