[ https://issues.apache.org/jira/browse/KAFKA-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472754#comment-16472754 ]
ASF GitHub Bot commented on KAFKA-6738:
---------------------------------------

wicknicks opened a new pull request #5010: KAFKA-6738: Error Handling in Connect
URL: https://github.com/apache/kafka/pull/5010

**_This PR is a WIP. It has been created to serve as a reference for discussions on the proposal at https://cwiki.apache.org/confluence/display/KAFKA/KIP-298%3A+Error+Handling+in+Connect._**

This feature aims to change the Connect framework to allow it to automatically deal with errors while processing records in a Connector. The following behavior changes are introduced here:

1. **Retry on Failure**: Retry the failed operation a configurable number of times, with backoff between each retry.
2. **Task Tolerance Limits**: Tolerate up to a configurable number of failures in a task.

We also add the following ways to report errors, along with sufficient context to simplify the debugging process:

1. **Log Error Context**: The error information and processing context are logged alongside the standard application logs.
2. **Dead Letter Queue**: Produce the error information and processing context to a Kafka topic.

The logged information consists of the following:

1. Descriptions of the different stages (source/sink tasks, transformations and converters) in the connector, and their configs.
2. The record which caused the exception.
3. The exception and stack trace, if available.
4. Number of attempts (if applicable) made to execute the failed operation.
5. The time of error.

New metrics are added to monitor the number of failures and the behavior of the response handler.

The changes proposed here are **backward compatible**. The current behavior in Connect is to kill the task on the first error in any stage. This will remain the default behavior if the connector does not override any of the new configurations which are provided as part of this feature.
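As a rough illustration of how "Retry on Failure" and "Task Tolerance Limits" could interact, here is a minimal Java sketch. It is not taken from PR #5010 or KIP-298: the class name `RetryWithToleranceSketch`, its constructor parameters, the doubling backoff, and the choice to return `null` for a skipped record are all assumptions made for this example.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch only: these names are NOT from the PR or from Kafka Connect.
final class RetryWithToleranceSketch {
    private final int maxRetries;          // extra attempts after the first failure (assumed config)
    private final long initialBackoffMs;   // first backoff; doubled after every failed attempt (assumed policy)
    private final int toleratedFailures;   // failed operations a task may absorb (assumed config)
    private int failures = 0;              // operations that failed even after all retries

    RetryWithToleranceSketch(int maxRetries, long initialBackoffMs, int toleratedFailures) {
        this.maxRetries = maxRetries;
        this.initialBackoffMs = initialBackoffMs;
        this.toleratedFailures = toleratedFailures;
    }

    // Runs the operation, retrying with backoff. Returns null when the operation
    // failed but the failure is still within the tolerance limit (record skipped).
    <T> T execute(Callable<T> operation) throws Exception {
        Exception last = null;
        long backoffMs = initialBackoffMs;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxRetries) {
                    Thread.sleep(backoffMs);   // wait before the next attempt
                    backoffMs *= 2;
                }
            }
        }
        failures++;
        if (failures > toleratedFailures) {
            // Tolerance exhausted: surface the error so the task is killed.
            throw new IllegalStateException("Tolerance exceeded after " + failures + " failures", last);
        }
        return null;
    }

    int failures() {
        return failures;
    }
}
```

In this sketch, constructing the handler with `maxRetries = 0` and `toleratedFailures = 0` makes the first error propagate immediately, which mirrors the fail-fast default described above.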
### Committer Checklist (excluded from commit message)

- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Kafka Connect handling of bad data
> ----------------------------------
>
>                 Key: KAFKA-6738
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6738
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.1.0
>            Reporter: Randall Hauch
>            Assignee: Arjun Satish
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> Kafka Connect connectors and tasks fail when they run into an unexpected
> situation or error, but the framework should provide more general "bad data
> handling" options, including (perhaps among others):
> # fail fast, which is what we do today (assuming connector actually fails and
> doesn't eat errors)
> # retry (possibly with configs to limit)
> # drop data and move on
> # dead letter queue
> This needs to be addressed in a way that handles errors from:
> # The connector itself (e.g. connectivity issues to the other system)
> # Converters/serializers (bad data, unexpected format, etc)
> # SMTs
> # Ideally the framework as well, though we obviously want to fix known bugs
> anyway

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)