Bruno Cadonna created KAFKA-17489:
-------------------------------------
Summary: IllegalStateException if failed task is removed from
state updater
Key: KAFKA-17489
URL: https://issues.apache.org/jira/browse/KAFKA-17489
Project: Kafka
Issue Type: Task
Components: streams
Reporter: Bruno Cadonna
Assignee: Bruno Cadonna
Fix For: 3.9.0
If a task that is managed by the state updater fails (e.g., with an
{{OffsetOutOfRangeException}}) and this same task is then removed from the
state updater, the task is regarded as corrupted and put into the task
registry, where it waits for handling.
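The hand-over described above can be modelled with a small, self-contained
sketch. This is not the Kafka Streams implementation: {{ToyStateUpdater}},
{{ToyTaskRegistry}}, and all method names are hypothetical stand-ins that only
illustrate how a failed task might end up in the registry once it is removed
from the state updater.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: these types and methods are hypothetical, not Kafka Streams classes.
public class FailedTaskRemovalSketch {

    // Hypothetical stand-in for the state updater: it owns tasks and records failures.
    static class ToyStateUpdater {
        private final Set<String> tasks = new HashSet<>();
        private final Map<String, RuntimeException> failures = new HashMap<>();

        void add(String taskId) {
            tasks.add(taskId);
        }

        // Records a failure only for tasks that the updater currently owns.
        void recordFailure(String taskId, RuntimeException cause) {
            if (tasks.contains(taskId)) {
                failures.put(taskId, cause);
            }
        }

        // Removes the task and returns its recorded failure, or null if it was healthy.
        RuntimeException remove(String taskId) {
            tasks.remove(taskId);
            return failures.remove(taskId);
        }
    }

    // Hypothetical stand-in for the task registry: failed tasks wait here for handling.
    static class ToyTaskRegistry {
        private final Map<String, RuntimeException> corruptedPending = new HashMap<>();

        void addCorrupted(String taskId, RuntimeException cause) {
            corruptedPending.put(taskId, cause);
        }

        boolean isAwaitingHandling(String taskId) {
            return corruptedPending.containsKey(taskId);
        }
    }

    // Removing a failed task from the updater moves it into the registry as corrupted.
    static void removeFromStateUpdater(ToyStateUpdater updater, ToyTaskRegistry registry, String taskId) {
        RuntimeException failure = updater.remove(taskId);
        if (failure != null) {
            registry.addCorrupted(taskId, failure);
        }
    }

    public static void main(String[] args) {
        ToyStateUpdater updater = new ToyStateUpdater();
        ToyTaskRegistry registry = new ToyTaskRegistry();
        updater.add("1_0");
        updater.recordFailure("1_0", new RuntimeException("simulated restore failure"));
        removeFromStateUpdater(updater, registry, "1_0");
        System.out.println("1_0 awaiting handling: " + registry.isAwaitingHandling("1_0"));
    }
}
```

In this model a healthy task that is removed from the updater never reaches
the registry; only a task with a recorded failure does.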
There are multiple ways in which a failed task that was put into the task
registry leads to an {{IllegalStateException}}:
1. In {{handleAssignment()}}, the tasks in the state updater are handled before
the tasks in the task registry. A failed standby task can therefore be removed
from the state updater and put into the task registry. When the tasks in the
task registry are then handled, the standby task is found there. However, with
the state updater it is illegal to have standby tasks in the task registry, so
the following {{IllegalStateException}} is thrown:
{code:java}
java.lang.IllegalStateException: Standby tasks should only be managed by the
state updater, but standby task 1_0 is managed by the stream thread
{code}
2. If a failed active task is removed from the state updater while revocation
is handled (the {{onPartitionRevoked()}} call in the {{ConsumerCoordinator}}),
the exception of the failed task is not immediately thrown by the
{{ConsumerCoordinator#onJoinComplete()}} method. Instead, the exception is
stored and {{onAssignment}} is called. Additionally, the failed task is put
into the task registry for later handling. Method {{onAssignment}} calls
{{handleAssignment()}}, which, as above, handles the tasks in the task
registry. Here, two {{IllegalStateException}}s are thrown:
{code:java}
java.lang.IllegalStateException: Illegal state RESTORING while recycling active
task 2_1
{code}
(This exception may differ according to the handling, e.g., recycling or
re-assigning)
and
{code:java}
java.lang.IllegalStateException: Task unknown: 2_1
{code}
The latter occurs because the failed task is handled and removed from the task
registry in {{handleAssignment()}}, although it should stay there until the
original exception is handled.
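Both failure modes can be reproduced in a toy model. The sketch below is an
illustration under assumptions, not the Kafka Streams code: the class
{{ToyTaskRegistry}}, the enum {{Kind}}, and every method name are hypothetical.
Only the two exception messages are taken from the report above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these types and methods are hypothetical, not Kafka Streams classes.
public class HandleAssignmentSketch {

    enum Kind { ACTIVE, STANDBY }

    // Hypothetical stand-in for the task registry of the stream thread.
    static class ToyTaskRegistry {
        private final Map<String, Kind> tasks = new HashMap<>();

        void add(String taskId, Kind kind) {
            tasks.put(taskId, kind);
        }

        void remove(String taskId) {
            tasks.remove(taskId);
        }

        // Looking up a task that was already removed fails with "Task unknown".
        Kind get(String taskId) {
            Kind kind = tasks.get(taskId);
            if (kind == null) {
                throw new IllegalStateException("Task unknown: " + taskId);
            }
            return kind;
        }

        // In this model, any standby task found in the registry is illegal.
        void handleTasksInRegistry() {
            for (Map.Entry<String, Kind> entry : tasks.entrySet()) {
                if (entry.getValue() == Kind.STANDBY) {
                    throw new IllegalStateException(
                        "Standby tasks should only be managed by the state updater, "
                            + "but standby task " + entry.getKey()
                            + " is managed by the stream thread");
                }
            }
        }
    }

    // Case 1: a failed standby task lands in the registry before the registry is handled.
    static String failedStandbyScenario() {
        ToyTaskRegistry registry = new ToyTaskRegistry();
        registry.add("1_0", Kind.STANDBY); // moved here while state updater tasks were handled
        try {
            registry.handleTasksInRegistry(); // registry tasks are handled second
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // Case 2: a failed active task is removed before the stored exception is handled.
    static String failedActiveScenario() {
        ToyTaskRegistry registry = new ToyTaskRegistry();
        registry.add("2_1", Kind.ACTIVE); // parked during revocation
        registry.remove("2_1");           // removed too early while the assignment is handled
        try {
            registry.get("2_1");          // later handling of the original exception
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(failedStandbyScenario());
        System.out.println(failedActiveScenario());
    }
}
```

The sketch deliberately omits the recycling path that produces the
{{Illegal state RESTORING}} message, since that depends on task state
transitions that are not described here.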