[ https://issues.apache.org/jira/browse/KAFKA-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718063#comment-17718063 ]
Sagar Rao commented on KAFKA-14952:
-----------------------------------

Hi [~blacktooth], thanks for reporting this. This certainly looks like a good addition. One thing to note is that you would need to create a KIP for this and post it on the discussion forum. Besides that, a couple of other points:

1) I think this should be added to the task-level metrics and not the connector-level metrics.

2) Also, these metrics can go in the task error metrics as listed here: [https://kafka.apache.org/31/generated/connect_metrics.html].

Looking forward to the KIP and contribution on this one!

> Publish metrics when source connector fails to poll data
> --------------------------------------------------------
>
>                 Key: KAFKA-14952
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14952
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 3.3.2
>            Reporter: Ravindranath Kakarla
>            Priority: Minor
>              Labels: connect, connect-api
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently, there is no metric in Kafka Connect to track when a source
> connector fails to poll data from the source. This information would be
> useful to operators and developers to visualize, monitor and alert when the
> connector fails to poll records from the source.
> Existing metrics like *kafka_producer_producer_metrics_record_error_total*
> and *kafka_connect_task_error_metrics_total_record_failures* only cover
> failures when producing data to the Kafka cluster but not when the source
> task fails with a retryable exception or ConnectException.
> Polling from source can fail due to unavailability of the source system or
> errors with the connect configuration. Currently, this cannot be monitored
> directly using metrics and instead operators have to rely on log diving which
> is not consistent with how other metrics are monitored.
> I propose adding new metrics to Kafka Connect,
> "{_}source-record-poll-error-total{_}" and
> "{_}source-record-poll-error-rate{_}" that can be used to monitor failures
> during polling.
> *source-record-poll-error-total* - The total number of times a source
> connector failed to poll data from the source. This will include both
> retryable and non-retryable exceptions.
> *source-record-poll-error-rate* - The rate of above failures per unit of time.
> These metrics would be tracked at the connector level and could be exposed
> through the JMX along with the other metrics.
> I am willing to submit a PR if this looks good, sample implementation code
> below,
> {code:java}
> //AbstractWorkerSourceTask.java
> protected List<SourceRecord> poll() throws InterruptedException {
>     try {
>         return task.poll();
>     } catch (RetriableException | org.apache.kafka.common.errors.RetriableException e) {
>         log.warn("{} failed to poll records from SourceTask. Will retry operation.", this, e);
>         sourceTaskMetricsGroup.recordPollError();
>         // Do nothing. Let the framework poll whenever it's ready.
>         return null;
>     } catch (Throwable e) {
>         sourceTaskMetricsGroup.recordPollError();
>         throw e;
>     }
> }
> {code}
> [Reference|https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L460]
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
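
For illustration only, here is a minimal, standalone sketch of the bookkeeping the proposal describes: a counter that backs a "total" value and a simple average rate, incremented on both the retriable and the non-retriable path. Everything in it is hypothetical and uses only the JDK: PollErrorMetrics, InstrumentedPoller and DemoRetriableException are stand-in names, not Kafka Connect classes, and the rate is a plain average since start rather than the windowed rate Kafka's own metrics library computes. An actual patch would presumably register sensors in the existing source task metrics group instead.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical stand-in for Connect's RetriableException (assumption, not the real class).
class DemoRetriableException extends RuntimeException {
    DemoRetriableException(String message) {
        super(message);
    }
}

// Hypothetical holder for the two proposed values: poll error total and poll error rate.
class PollErrorMetrics {
    private final AtomicLong total = new AtomicLong();
    private final long startMs;

    PollErrorMetrics(long startMs) {
        this.startMs = startMs;
    }

    // Called once per failed poll, retriable or not.
    void recordPollError() {
        total.incrementAndGet();
    }

    long total() {
        return total.get();
    }

    // Average errors per second since startMs (simplification: not a sliding window).
    double ratePerSecond(long nowMs) {
        long elapsedMs = Math.max(1L, nowMs - startMs);
        return total.get() * 1000.0 / elapsedMs;
    }
}

// Hypothetical wrapper mirroring the control flow of the sample poll() in the issue.
class InstrumentedPoller {
    static <T> List<T> poll(Supplier<List<T>> source, PollErrorMetrics metrics) {
        try {
            return source.get();
        } catch (DemoRetriableException e) {
            // Retriable failure: count it, return null so the caller polls again later.
            metrics.recordPollError();
            return null;
        } catch (RuntimeException e) {
            // Non-retriable failure: count it, then propagate to fail the task.
            metrics.recordPollError();
            throw e;
        }
    }
}
```

In this sketch a retriable failure increments the counter and returns null, while any other runtime failure increments the counter and is rethrown, which matches the "both retryable and non-retryable exceptions" wording in the description.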