[ 
https://issues.apache.org/jira/browse/KAFKA-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718063#comment-17718063
 ] 

Sagar Rao commented on KAFKA-14952:
-----------------------------------

Hi [~blacktooth], thanks for reporting this. This certainly looks like a good 
addition. Note that you would need to create a KIP for this and post it on the 
discussion forum. Besides that, a couple of other points:

1) I think this should be added to the task-level metrics, not the 
connector-level metrics.

2) Also, these metrics can go in the task error metrics listed here: 
[https://kafka.apache.org/31/generated/connect_metrics.html].

 

Looking forward to the KIP and contribution on this one!

> Publish metrics when source connector fails to poll data
> --------------------------------------------------------
>
>                 Key: KAFKA-14952
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14952
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 3.3.2
>            Reporter: Ravindranath Kakarla
>            Priority: Minor
>              Labels: connect, connect-api
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, there is no metric in Kafka Connect to track when a source 
> connector fails to poll data from the source. This information would be 
> useful to operators and developers to visualize, monitor and alert when the 
> connector fails to poll records from the source.
> Existing metrics like *kafka_producer_producer_metrics_record_error_total* 
> and *kafka_connect_task_error_metrics_total_record_failures* only cover 
> failures when producing data to the Kafka cluster but not when the source 
> task fails with a retryable exception or ConnectException.
> Polling from the source can fail due to unavailability of the source system 
> or errors in the connector configuration. Currently, this cannot be monitored 
> directly using metrics; instead, operators have to rely on log diving, which 
> is not consistent with how other failures are monitored.
> I propose adding new metrics to Kafka Connect, 
> "{_}source-record-poll-error-total{_}" and 
> "{_}source-record-poll-error-rate{_}" that can be used to monitor failures 
> during polling.
> *source-record-poll-error-total* - The total number of times a source 
> connector failed to poll data from the source. This will include both 
> retryable and non-retryable exceptions.
> *source-record-poll-error-rate* - The rate of the above failures per unit of 
> time.
> These metrics would be tracked at the connector level and could be exposed 
> through JMX along with the other metrics.
> I am willing to submit a PR if this looks good. Sample implementation code is 
> below:
> {code:java}
> // AbstractWorkerSourceTask.java
> protected List<SourceRecord> poll() throws InterruptedException {
>     try {
>         return task.poll();
>     } catch (RetriableException | org.apache.kafka.common.errors.RetriableException e) {
>         log.warn("{} failed to poll records from SourceTask. Will retry operation.", this, e);
>         // Proposed addition: count the retryable poll failure.
>         sourceTaskMetricsGroup.recordPollError();
>         // Do nothing. Let the framework poll whenever it's ready.
>         return null;
>     } catch (Throwable e) {
>         // Proposed addition: count the non-retryable failure, then propagate it.
>         sourceTaskMetricsGroup.recordPollError();
>         throw e;
>     }
> }
> {code}
> [Reference|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L460]
>  
>  
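
For illustration only, here is a minimal sketch of the bookkeeping that a 
recordPollError() call would need in order to back the two proposed metrics. 
The class name PollErrorMetrics, the explicit timestamp parameter and the 
sliding-window rate are assumptions made for this example. It is plain Java 
and does not use the Kafka metrics classes, so an actual patch would likely 
look different.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the proposed metrics; not part of Kafka Connect.
final class PollErrorMetrics {
    private final long windowMs;
    // Timestamps of errors that still fall inside the trailing window.
    private final Deque<Long> recentErrors = new ArrayDeque<>();
    private long total;

    PollErrorMetrics(long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        this.windowMs = windowMs;
    }

    // Called once per failed poll, whether the exception is retryable or not.
    synchronized void recordPollError(long nowMs) {
        total++;
        recentErrors.addLast(nowMs);
        evict(nowMs);
    }

    // Backs "source-record-poll-error-total": never decreases.
    synchronized long total() {
        return total;
    }

    // Backs "source-record-poll-error-rate": errors per second, averaged
    // over the trailing window ending at nowMs.
    synchronized double ratePerSecond(long nowMs) {
        evict(nowMs);
        return recentErrors.size() * 1000.0 / windowMs;
    }

    // Drops timestamps that are older than the window.
    private void evict(long nowMs) {
        while (!recentErrors.isEmpty() && recentErrors.peekFirst() <= nowMs - windowMs) {
            recentErrors.removeFirst();
        }
    }
}
```

Passing the clock value in explicitly keeps the sketch deterministic; a caller 
in a running worker would pass System.currentTimeMillis() or an equivalent.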



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
