[ https://issues.apache.org/jira/browse/KAFKA-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472754#comment-16472754 ]

ASF GitHub Bot commented on KAFKA-6738:
---------------------------------------

wicknicks opened a new pull request #5010: KAFKA-6738: Error Handling in Connect
URL: https://github.com/apache/kafka/pull/5010
 
 
   **_This PR is a WIP. It has been created to serve as a reference for 
discussions on the proposal at 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-298%3A+Error+Handling+in+Connect._**
   
   This feature changes the Connect framework so that it can automatically 
deal with errors encountered while processing records in a connector. The 
following behavior changes are introduced here (a configuration sketch 
follows the list):
   1. **Retry on Failure**: Retry the failed operation a configurable number of 
times, with backoff between each retry.
   2. **Task Tolerance Limits**: Tolerate up to a configurable number of 
failures in a task.
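   
   A minimal sketch of how a connector might opt into these behaviors. The 
key names below follow the KIP-298 proposal (`errors.retry.timeout`, 
`errors.retry.delay.max.ms`, `errors.tolerance`); the exact names, value 
formats, and defaults in this WIP may differ, and the connector/topic values 
are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: error-handling keys as proposed in KIP-298; the exact names
// and defaults in this WIP PR may differ from what finally ships.
public class RetryToleranceConfigSketch {
    public static Map<String, String> connectorConfig() {
        Map<String, String> config = new HashMap<>();
        config.put("connector.class", "org.apache.kafka.connect.file.FileStreamSinkConnector");
        config.put("topics", "orders");
        config.put("file", "/tmp/orders.out");

        // Retry a failed operation for up to 5 minutes, backing off up to 30s between attempts.
        config.put("errors.retry.timeout", "300000");
        config.put("errors.retry.delay.max.ms", "30000");

        // Tolerate failures instead of killing the task on the first error
        // (the WIP may express this as a numeric limit rather than "all").
        config.put("errors.tolerance", "all");
        return config;
    }

    public static void main(String[] args) {
        connectorConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```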
   
   We also add the following ways to report errors, along with sufficient 
context to simplify the debugging process (a configuration sketch follows 
the list):
   1. **Log Error Context**: The error information and the processing 
context are written to the standard application logs.
   2. **Dead Letter Queue**: Produce the error information and processing 
context into a Kafka topic.
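   
   A similarly hedged sketch of the reporting options, assuming the 
`errors.log.*` and `errors.deadletterqueue.*` keys described in KIP-298; the 
exact names may differ in this WIP, and the topic name is made up for 
illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: reporting keys as described in KIP-298; subject to change in this WIP.
public class ErrorReportingConfigSketch {
    public static Map<String, String> reportingConfig() {
        Map<String, String> config = new HashMap<>();

        // Write the error and its processing context to the application logs,
        // optionally including the failed record itself.
        config.put("errors.log.enable", "true");
        config.put("errors.log.include.messages", "true");

        // Also produce the failed record and its context to a dead letter queue topic.
        config.put("errors.deadletterqueue.topic.name", "connect-dlq-orders");
        config.put("errors.deadletterqueue.topic.replication.factor", "3");
        return config;
    }

    public static void main(String[] args) {
        reportingConfig().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```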
   
   The logged information consists of the following (an illustrative sketch 
follows the list):
   1. Descriptions of the different stages (source/sink tasks, transformations 
and converters) in the connector, and their configs.
   2. The record which caused the exception.
   3. The exception and stack trace, if available.
   4. The number of attempts (if applicable) made to execute the failed operation.
   5. The time of the error.
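   
   Purely illustrative, not the PR's actual classes: a hypothetical holder 
showing the pieces of context listed above.

```java
// Hypothetical sketch of the logged context described above; the real classes,
// field names, and formats are defined in the PR and may differ.
public class ErrorContextSketch {
    public String stageDescriptions; // the stages (task, transformations, converters) and their configs
    public String failedRecord;      // string form of the record that caused the exception
    public Throwable error;          // the exception and stack trace, if available
    public int attempts;             // number of attempts made to execute the failed operation
    public long errorTimeMs;         // the time of the error, as epoch milliseconds
}
```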
   
   New metrics are added to monitor the number of failures and the behavior 
of the response handler.
   
   The changes proposed here are **backward compatible**. The current behavior 
in Connect is to kill the task on the first error in any stage. This remains 
the default behavior if the connector does not override any of the new 
configurations provided as part of this feature.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Kafka Connect handling of bad data
> ----------------------------------
>
>                 Key: KAFKA-6738
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6738
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.1.0
>            Reporter: Randall Hauch
>            Assignee: Arjun Satish
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> Kafka Connect connectors and tasks fail when they run into an unexpected 
> situation or error, but the framework should provide more general "bad data 
> handling" options, including (perhaps among others):
> # fail fast, which is what we do today (assuming the connector actually 
> fails and doesn't eat errors)
> # retry (possibly with configs to limit)
> # drop data and move on
> # dead letter queue
> This needs to be addressed in a way that handles errors from:
> # The connector itself (e.g. connectivity issues to the other system)
> # Converters/serializers (bad data, unexpected format, etc.)
> # SMTs
> # Ideally the framework as well, though we obviously want to fix known bugs 
> anyway



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
