Ravindranath Kakarla created KAFKA-14952:
--------------------------------------------
Summary: Publish metrics when source connector fails to poll data
Key: KAFKA-14952
URL: https://issues.apache.org/jira/browse/KAFKA-14952
Project: Kafka
Issue Type: Improvement
Components: KafkaConnect
Affects Versions: 3.3.2
Reporter: Ravindranath Kakarla
Kafka Connect currently has no metric that tracks when a source connector
fails to poll data from the source system. Such a metric would let operators
and developers visualize, monitor, and alert on poll failures.
Existing metrics like `kafka_producer_producer_metrics_record_error_total` and
`kafka_connect_task_error_metrics_total_record_failures` cover only failures
when producing data to the Kafka cluster. They do not cover the case where the
source task fails with a retriable exception or a ConnectException.
Polling from the source can fail because the source system is unavailable or
because of errors in the Connect configuration. Today these failures cannot be
monitored directly through metrics, so operators have to search the logs
instead, which is inconsistent with how other failures are monitored.
I propose adding two new metrics to Kafka Connect,
`source-record-poll-error-total` and `source-record-poll-error-rate`, that can
be used to monitor failures during polling.
`source-record-poll-error-total` - The total number of times a source connector
failed to poll data from the source. This will include both retryable and
non-retryable exceptions.
`source-record-poll-error-rate` - The rate of the above failures per unit of time.
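To make the intended semantics of the two metrics concrete, here is a minimal, self-contained sketch. It is not Kafka's metrics API: `PollErrorMetrics`, its sliding-window rate, and the explicit `nowMs` parameter are all assumptions made for illustration. A real implementation would presumably reuse the sensor and stat classes that Kafka Connect's existing metrics are built on.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical stand-in for the proposed metrics (not Kafka's metrics API).
 * Tracks a cumulative error count and a simple sliding-window error rate.
 */
public class PollErrorMetrics {
    private final long windowMs;
    // Timestamps of errors that still fall inside the rate window.
    private final Deque<Long> recentErrors = new ArrayDeque<>();
    private long total;

    public PollErrorMetrics(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Called for every failed poll, retriable or not. */
    public synchronized void recordPollError(long nowMs) {
        total++;
        recentErrors.addLast(nowMs);
        expire(nowMs);
    }

    /** Analogue of source-record-poll-error-total: never decreases. */
    public synchronized long total() {
        return total;
    }

    /** Analogue of source-record-poll-error-rate: errors per second over the window. */
    public synchronized double ratePerSecond(long nowMs) {
        expire(nowMs);
        return recentErrors.size() / (windowMs / 1000.0);
    }

    // Drop timestamps that are a full window (or more) older than nowMs.
    private void expire(long nowMs) {
        while (!recentErrors.isEmpty() && recentErrors.peekFirst() <= nowMs - windowMs) {
            recentErrors.removeFirst();
        }
    }
}
```

The total only ever grows, while the rate falls back toward zero once failures stop. That makes the rate the more natural signal for alerting and the total the more natural one for dashboards.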
These metrics would be tracked at the connector level and could be exposed
through JMX along with the other metrics.
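As a rough illustration of what JMX exposure involves, the sketch below registers a counter as a standard MBean on the platform MBean server and reads it back by attribute name. It uses only the JDK's `javax.management` API. The class names, the `example.connect` domain, and the `PollErrorTotal` attribute are hypothetical and are not the names Kafka Connect would use.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class PollErrorJmxSketch {
    /** Management interface; JMX derives the attribute "PollErrorTotal" from the getter name. */
    public interface PollErrorsMBean {
        long getPollErrorTotal();
    }

    /** Hypothetical holder for the poll error count. */
    public static class PollErrors implements PollErrorsMBean {
        private final AtomicLong total = new AtomicLong();

        public void recordPollError() {
            total.incrementAndGet();
        }

        @Override
        public long getPollErrorTotal() {
            return total.get();
        }
    }

    /** Registers the bean under a hypothetical object name and reads the attribute via JMX. */
    public static long registerAndRead(PollErrors errors, String connector) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // ObjectName.quote escapes characters that are not legal in a key value.
            ObjectName name = new ObjectName(
                "example.connect:type=source-poll-errors,connector=" + ObjectName.quote(connector));
            if (!server.isRegistered(name)) {
                server.registerMBean(errors, name);
            }
            return (Long) server.getAttribute(name, "PollErrorTotal");
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration or lookup failed", e);
        }
    }
}
```

Once registered this way, the attribute is visible to any JMX client attached to the worker JVM (for example JConsole).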
I am willing to submit a PR if this looks good. Sample implementation code is
below:
{code:java}
// AbstractWorkerSourceTask.java
protected List<SourceRecord> poll() throws InterruptedException {
    try {
        return task.poll();
    } catch (RetriableException | org.apache.kafka.common.errors.RetriableException e) {
        log.warn("{} failed to poll records from SourceTask. Will retry operation.", this, e);
        // Proposed: count the retriable poll failure.
        sourceTaskMetricsGroup.recordPollError();
        // Do nothing. Let the framework poll whenever it's ready.
        return null;
    } catch (Throwable e) {
        // Proposed: count the non-retriable failure, then propagate it.
        sourceTaskMetricsGroup.recordPollError();
        throw e;
    }
}
{code}