[ 
https://issues.apache.org/jira/browse/FLUME-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885378#comment-15885378
 ] 

Yuval Lifshitz commented on FLUME-3050:
---------------------------------------

Hi Attila,
Thanks for looking into that. Some error counters we thought about:
* taildir source: fail to read file
* spooldir source: fail to read file; fail to delete file; file changed while 
reading
* tcp source: I assume we don't drop messages, but apply back-pressure on the 
socket if the channel is full. But do we handle malformed messages? Messages 
that are too long? A connection lost in the middle of a message?
* hdfs sink: fail to write file; connectivity error; failovers
* avro sink: fail to write event; connection error
* kafka sink: not sure about specific errors but there could be some as well
* avro interceptor: conversion failure. Since this is based on the Kite SDK, we 
may need an interface that allows third parties to publish stats as well?

Having the above as counters, rather than as one-time indicators in the log 
file, is very helpful when integrating with NMS and reporting systems.
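
To illustrate the monitoring side, here is a minimal Python sketch of what a 
poller could do once error counters are exposed. It assumes the JSON shape of 
the /metrics output quoted in the issue description. "ReadFailCount" is a 
hypothetical counter name used only for illustration and is not an existing 
Flume metric; "ConnectionFailedCount" is taken from the current output.

```python
import json

# Trimmed sample in the shape of the /metrics output quoted in the issue.
# "ReadFailCount" is a hypothetical error counter, not an existing Flume metric.
SAMPLE = """{
  "SINK.k1": {"Type": "SINK", "ConnectionFailedCount": "0",
              "EventDrainAttemptCount": "10", "EventDrainSuccessCount": "10"},
  "SOURCE.r1": {"Type": "SOURCE", "EventReceivedCount": "10",
                "EventAcceptedCount": "10", "ReadFailCount": "3"}
}"""


def error_counters(metrics_json, names):
    """Return {(component, counter): value} for requested counters that are nonzero."""
    metrics = json.loads(metrics_json)
    found = {}
    for component, counters in metrics.items():
        for name in names:
            # Counter values arrive as strings in the JSON; a missing counter counts as 0.
            value = int(counters.get(name, "0"))
            if value > 0:
                found[(component, name)] = value
    return found


alerts = error_counters(SAMPLE, ["ConnectionFailedCount", "ReadFailCount"])
print(alerts)
```

With counters, the poller only compares numbers between polls; with log-only 
errors it would have to tail and parse the log file instead.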


> add error stats to monitor URL
> ------------------------------
>
>                 Key: FLUME-3050
>                 URL: https://issues.apache.org/jira/browse/FLUME-3050
>             Project: Flume
>          Issue Type: Improvement
>          Components: Channel, Shell, Sinks+Sources
>    Affects Versions: v1.7.0
>            Reporter: Yuval Lifshitz
>              Labels: features
>
> Currently, error counters are not present when getting stats. For example:
> {code}
>  > curl http://my-flume-host:44444/metrics
> {"SINK.k1":{"ConnectionCreatedCount":"1","ConnectionClosedCount":"0","Type":"SINK","BatchCompleteCount":"0","BatchEmptyCount":"4","EventDrainAttemptCount":"10","StartTime":"1485348138992","EventDrainSuccessCount":"10","BatchUnderflowCount":"1","StopTime":"0","ConnectionFailedCount":"0"},"CHANNEL.c1":{"ChannelCapacity":"1000000","ChannelFillPercentage":"0.0","Type":"CHANNEL","ChannelSize":"0","EventTakeSuccessCount":"10","EventTakeAttemptCount":"15","StartTime":"1485348138990","EventPutAttemptCount":"10","EventPutSuccessCount":"10","StopTime":"0"},"SOURCE.r1":{"EventReceivedCount":"10","AppendBatchAcceptedCount":"0","Type":"SOURCE","AppendReceivedCount":"0","EventAcceptedCount":"10","StartTime":"1485348138993","AppendAcceptedCount":"0","OpenConnectionCount":"0","AppendBatchReceivedCount":"0","StopTime":"0"}}
> {code}
> returns only "good" stats for source, channel and sink.
> To get errors you need to look into the log file. This makes it hard to 
> integrate Flume into automatic monitoring systems, NMS, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
