[
https://issues.apache.org/jira/browse/KAFKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138093#comment-15138093
]
Jason Gustafson commented on KAFKA-3093:
----------------------------------------
[[email protected]] I'm going to go ahead and pick this up. Let me know if
you've already gotten started or if you want to help out. Maybe in this ticket,
we can focus only on exposing status information. This is already a fairly big
piece, so it might make more sense to do pause/restart commands in another JIRA
(there is already KAFKA-2482 for pause). It'd be great if you want to help out
with those issues.
Here's a quick sketch of the proposed implementation:
First, we differentiate the connector's target state, as set by the user, from
the runtime state of the connector and its tasks. The target state is the
persistent state of the connector that will be resumed after every rebalance or
cluster restart. Initially, the only target state will be "started," but we
will add "paused" in KAFKA-2482. The runtime states, on the other hand,
represent the actual current states of the connector and its tasks. They will
include the following: rebalancing, running, and failed (we'll also add
paused later). In the failed state, we'll include the exception trace so
that users don't need to inspect the logs to find the actual problem.
Connector target states will be persisted in the config topic. This works
nicely since there is already a synchronization protocol on this topic which
ensures that all workers have read up to the same offset. This guarantees that
the workers will see the same target state of each connector after every
rebalance/restart.
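The following Java sketch illustrates why reading up to the same offset gives every worker the same target states. It models the config topic as an in-memory list; TargetStateRecord and TargetStateView are hypothetical names, and the "paused" value in the usage note is the future state from KAFKA-2482, used here only to show an overwrite.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One hypothetical target-state record in the (simulated) config topic. */
final class TargetStateRecord {
    final String connector;
    final String targetState;

    TargetStateRecord(String connector, String targetState) {
        this.connector = connector;
        this.targetState = targetState;
    }
}

/** A worker's local view, built by replaying the log in offset order. */
final class TargetStateView {
    private final Map<String, String> targetStates = new HashMap<>();
    private int nextOffset = 0;

    /** Applies all records in [nextOffset, endOffset); later records win. */
    void readTo(List<TargetStateRecord> log, int endOffset) {
        while (nextOffset < endOffset) {
            TargetStateRecord record = log.get(nextOffset);
            targetStates.put(record.connector, record.targetState);
            nextOffset++;
        }
    }

    /** Returns a copy of the current connector-to-target-state mapping. */
    Map<String, String> snapshot() {
        return new HashMap<>(targetStates);
    }
}
```

Two views that call readTo with the same end offset produce equal snapshots, no matter how many intermediate reads each one performed.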
Connector and task runtime states will be persisted in a new topic configured
with "status.storage.topic", which is consumed by all workers. We could
alternatively have only the leader consume this topic, but then the leader
would have to handle all status requests. It would also delay leader failover
since the new leader would have to read the entire log to catch up. The basic
idea is to have the owner of each connector/task write status updates to this
topic as they occur. For example, if the task raises an exception, the worker
will catch it and immediately write the failed state to the topic (note that we
won't attempt to implement restarting or any handling in this ticket).
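The catch-and-report behavior described above could be sketched in Java as follows. StatusSink, InMemoryStatusSink, and ReportingTaskRunner are hypothetical names, and an in-memory map stands in for the status topic; a real implementation would presumably produce to the topic instead.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical destination for status updates (stands in for the status topic). */
interface StatusSink {
    void publish(String taskId, String state, String trace);
}

/** Keeps only the latest {state, trace} pair per task, like a compacted view. */
final class InMemoryStatusSink implements StatusSink {
    final Map<String, String[]> latest = new ConcurrentHashMap<>();

    @Override
    public void publish(String taskId, String state, String trace) {
        latest.put(taskId, new String[] {state, trace});
    }
}

/** Runs a task on behalf of its owning worker and reports state changes. */
final class ReportingTaskRunner {
    static void run(String taskId, Runnable task, StatusSink sink) {
        sink.publish(taskId, "running", null);
        try {
            task.run();
        } catch (Throwable t) {
            // Record the failure immediately; no restart or recovery is attempted.
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace, true));
            sink.publish(taskId, "failed", trace.toString());
        }
    }
}
```

For example, running a task that throws leaves "failed" plus the stack trace as the latest entry for that task, while a task that returns normally stays at "running" in this sketch.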
We will add two APIs: one to get the full status of the connector (including
all of its tasks), and one task-level status API.
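As a rough illustration of the two lookups, the Java sketch below shows a connector-level query that returns the connector's state together with all of its tasks, and a task-level query that returns a single task's state. StatusLookup and ConnectorStatusView are hypothetical names; the REST paths and response format are not specified here, so none are assumed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical result of the connector-level query. */
final class ConnectorStatusView {
    final String state;                 // connector's own runtime state
    final Map<Integer, String> tasks;   // task id -> task runtime state

    ConnectorStatusView(String state, Map<Integer, String> tasks) {
        this.state = state;
        this.tasks = tasks;
    }
}

/** In-memory stand-in for whatever each worker has read from the status topic. */
final class StatusLookup {
    private final Map<String, String> connectorStates = new HashMap<>();
    private final Map<String, Map<Integer, String>> taskStates = new HashMap<>();

    void putConnector(String connector, String state) {
        connectorStates.put(connector, state);
    }

    void putTask(String connector, int taskId, String state) {
        Map<Integer, String> tasks = taskStates.get(connector);
        if (tasks == null) {
            tasks = new TreeMap<Integer, String>();
            taskStates.put(connector, tasks);
        }
        tasks.put(taskId, state);
    }

    /** Connector-level query: the connector's state plus all of its tasks. */
    ConnectorStatusView connectorStatus(String connector) {
        Map<Integer, String> tasks = taskStates.get(connector);
        Map<Integer, String> copy = tasks == null
                ? new TreeMap<Integer, String>()
                : new TreeMap<Integer, String>(tasks);
        return new ConnectorStatusView(
                connectorStates.get(connector), Collections.unmodifiableMap(copy));
    }

    /** Task-level query: one task's state, or null if it is unknown. */
    String taskStatus(String connector, int taskId) {
        Map<Integer, String> tasks = taskStates.get(connector);
        return tasks == null ? null : tasks.get(taskId);
    }
}
```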
> Keep track of connector and task status info, expose it via the REST API
> ------------------------------------------------------------------------
>
> Key: KAFKA-3093
> URL: https://issues.apache.org/jira/browse/KAFKA-3093
> Project: Kafka
> Issue Type: Improvement
> Components: copycat
> Reporter: jin xing
> Assignee: jin xing
>
> Related to KAFKA-3054.
> We should keep track of the status of connectors and tasks during their
> startup and execution, and handle exceptions thrown by connectors and tasks.
> Users should be able to fetch this information via the REST API and send
> necessary commands (reconfiguring, restarting, pausing, unpausing) to
> connectors and tasks via the REST API.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)