[ https://issues.apache.org/jira/browse/KAFKA-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindranath Kakarla updated KAFKA-14952:
-----------------------------------------
    Description: 
Currently, there is no metric in Kafka Connect to track when a source connector 
fails to poll data from the source. Such a metric would let operators and 
developers visualize, monitor, and alert on these failures.

Existing metrics like *kafka_producer_producer_metrics_record_error_total* and 
*kafka_connect_task_error_metrics_total_record_failures* only cover failures 
when producing data to the Kafka cluster but not when the source task fails 
with a retryable exception or ConnectException.

Polling from the source can fail because the source system is unavailable or 
because of errors in the connector configuration. Currently, this cannot be 
monitored directly through metrics; operators have to rely on log diving 
instead, which is inconsistent with how other failures are monitored.

I propose adding new metrics to Kafka Connect, "source-record-poll-error-total" 
and "source-record-poll-error-rate" that can be used to monitor failures during 
polling.

_*source-record-poll-error-total*_ - The total number of times a source 
connector failed to poll data from the source. This will include both retryable 
and non-retryable exceptions.

_*source-record-poll-error-rate*_ - The rate of the above failures per unit of 
time.

These metrics would be tracked at the connector level and could be exposed 
through JMX along with the other metrics.
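
To make the intended semantics of the two metrics concrete, here is a minimal, 
dependency-free sketch. Everything in it (the PollErrorMetrics class, its 
methods, the injected clock) is hypothetical and is not existing Kafka Connect 
code. It also simplifies the rate to an average over the object's lifetime, 
whereas a real implementation would presumably use the Kafka metrics library 
and a sampling window.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the proposed metrics; not Kafka Connect code. */
final class PollErrorMetrics {
    private final LongAdder total = new LongAdder();
    private final LongSupplier clockNanos; // injected so the rate is testable
    private final long startNanos;

    PollErrorMetrics(LongSupplier clockNanos) {
        this.clockNanos = clockNanos;
        this.startNanos = clockNanos.getAsLong();
    }

    /** Called once per failed poll, whether retriable or not. */
    void recordPollError() {
        total.increment();
    }

    /** Corresponds to source-record-poll-error-total. */
    long pollErrorTotal() {
        return total.sum();
    }

    /** Corresponds to source-record-poll-error-rate, in failures per second since creation. */
    double pollErrorRate() {
        long elapsedNanos = clockNanos.getAsLong() - startNanos;
        if (elapsedNanos <= 0) {
            return 0.0; // no time has passed yet, so no meaningful rate
        }
        return total.sum() / (elapsedNanos / 1e9);
    }
}

final class PollErrorMetricsDemo {
    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in nanoseconds
        PollErrorMetrics metrics = new PollErrorMetrics(() -> now[0]);
        metrics.recordPollError();
        metrics.recordPollError();
        now[0] = TimeUnit.SECONDS.toNanos(4);
        System.out.println(metrics.pollErrorTotal()); // prints 2
        System.out.println(metrics.pollErrorRate());  // prints 0.5
    }
}
```

The clock is injected only so that the rate can be checked deterministically; 
production code would normally read the system clock.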

I am willing to submit a PR if this looks good. Sample implementation code is 
below:

 
{code:java}
// AbstractWorkerSourceTask.java

protected List<SourceRecord> poll() throws InterruptedException {
    try {
        return task.poll();
    } catch (RetriableException | org.apache.kafka.common.errors.RetriableException e) {
        log.warn("{} failed to poll records from SourceTask. Will retry operation.", this, e);

        // Count the retriable failure.
        sourceTaskMetricsGroup.recordPollError();

        // Do nothing. Let the framework poll whenever it's ready.
        return null;
    } catch (Throwable e) {
        // Count the non-retriable failure before propagating it.
        sourceTaskMetricsGroup.recordPollError();

        throw e;
    }
}
{code}
[Reference|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L460]
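
The control flow of the sample can be exercised in isolation with the 
following sketch. It uses hypothetical stand-ins (a one-method Task interface, 
a local RetriableException, and a plain counter in place of 
sourceTaskMetricsGroup), so it only illustrates the pattern and is not the 
actual AbstractWorkerSourceTask code.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class PollErrorCountingSketch {
    /** Hypothetical stand-in for Connect's RetriableException. */
    static final class RetriableException extends RuntimeException {
        RetriableException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for SourceTask, reduced to poll(). */
    interface Task {
        List<String> poll() throws InterruptedException;
    }

    /** Plays the role of the proposed total metric. */
    final AtomicLong pollErrors = new AtomicLong();
    private final Task task;

    PollErrorCountingSketch(Task task) {
        this.task = task;
    }

    List<String> poll() throws InterruptedException {
        try {
            return task.poll();
        } catch (RetriableException e) {
            // Retriable failure: count it and let the caller poll again later.
            pollErrors.incrementAndGet();
            return null;
        } catch (Throwable e) {
            // Any other failure: count it, then propagate unchanged.
            pollErrors.incrementAndGet();
            throw e;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PollErrorCountingSketch healthy =
                new PollErrorCountingSketch(() -> Collections.singletonList("record"));
        System.out.println(healthy.poll() + " errors=" + healthy.pollErrors.get());

        PollErrorCountingSketch flaky = new PollErrorCountingSketch(() -> {
            throw new RetriableException("source temporarily unavailable");
        });
        System.out.println(flaky.poll() + " errors=" + flaky.pollErrors.get());
    }
}
```

One thing the sketch makes visible: because the second handler catches 
Throwable, an InterruptedException thrown by poll() is also counted as a poll 
error. Whether that is desirable is a design choice the final implementation 
would need to settle.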

  was:
Currently, there is no metric in Kafka Connect to track when a source connector 
fails to poll data from the source. Such a metric would let operators and 
developers visualize, monitor, and alert on these failures.

Existing metrics like `kafka_producer_producer_metrics_record_error_total` and 
`kafka_connect_task_error_metrics_total_record_failures` only cover failures 
when producing data to the Kafka cluster but not when the source task fails 
with a retryable exception or ConnectException.

Polling from the source can fail because the source system is unavailable or 
because of errors in the connector configuration. Currently, this cannot be 
monitored directly through metrics; operators have to rely on log diving 
instead, which is inconsistent with how other failures are monitored.

I propose adding new metrics to Kafka Connect, "source-record-poll-error-total" 
and "source-record-poll-error-rate" that can be used to monitor failures during 
polling.

`source-record-poll-error-total` - The total number of times a source connector 
failed to poll data from the source. This will include both retryable and 
non-retryable exceptions.

`source-record-poll-error-rate` - The rate of the above failures per unit of 
time.

These metrics would be tracked at the connector level and could be exposed 
through JMX along with the other metrics.

I am willing to submit a PR if this looks good. Sample implementation code is 
below:

 
{code:java}
// AbstractWorkerSourceTask.java

protected List<SourceRecord> poll() throws InterruptedException {
    try {
        return task.poll();
    } catch (RetriableException | org.apache.kafka.common.errors.RetriableException e) {
        log.warn("{} failed to poll records from SourceTask. Will retry operation.", this, e);

        // Count the retriable failure.
        sourceTaskMetricsGroup.recordPollError();

        // Do nothing. Let the framework poll whenever it's ready.
        return null;
    } catch (Throwable e) {
        // Count the non-retriable failure before propagating it.
        sourceTaskMetricsGroup.recordPollError();

        throw e;
    }
}
{code}

> Publish metrics when source connector fails to poll data
> --------------------------------------------------------
>
>                 Key: KAFKA-14952
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14952
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 3.3.2
>            Reporter: Ravindranath Kakarla
>            Priority: Minor
>              Labels: connect, connect-api
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
