Ewen Cheslack-Postava created KAFKA-2479:
--------------------------------------------
Summary: Add CopycatExceptions to indicate transient and permanent
errors in a connector/task
Key: KAFKA-2479
URL: https://issues.apache.org/jira/browse/KAFKA-2479
Project: Kafka
Issue Type: Sub-task
Components: copycat
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Sometimes the connector will need to indicate to the framework that an error
occurred, but the error could have multiple responses by the framework.
For source connectors, there's not much they need to indicate since they can
block indefinitely. They probably only need to indicate permanent errors for
correctness, though we may want them to indicate transient errors so we can
report health of the task in a metric.
For sink connectors, there are at least a couple of scenarios:
1. A task encounters some error while processing a {{put(records)}} call and
was unable to fully process it, but thinks it could be resolved in the future.
The task doesn't want to see any new records until the issue is resolved, but
will need to see the same set of records again. (It would be nice if the task
doesn't have to deal with saving these to a buffer itself.)
2. A task encounters some error while processing data, but it has
enqueued/handled the data passed into the {{put(records)}} call. For example,
it may have passed it to some library which buffers it, but then the library
indicated that it is having some connection issues. The connector might be able
accept more data, but the task is not in a healthy state.
3. The task encounters some error that it decides is unrecoverable. This might
just be transient errors that repeat for long enough that the task thinks its
time to give up. Unclear what to do here, but one option is relocating the task
to another worker, hoping that the issue is specific to the worker.
Note that it is not, generally, safe for sink tasks to do their own backoff or
we'd potentially starve the consumer, which needs to poll() in order to
heartbeat. So we need to make sure whatever mechanism we implement encourages
the user to throw an exception and pass control back to us instead.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)