[ https://issues.apache.org/jira/browse/KAFKA-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sophie Blee-Goldman updated KAFKA-9178:
---------------------------------------
    Affects Version/s: 2.4.0

> restoredPartitions is not cleared until the last restoring task completes
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-9178
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9178
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Blocker
>              Labels: streams
>             Fix For: 2.4.0
>
>
> We check that the {{active}} set is empty during {{closeLostTasks()}}. However, 
> we currently fail to clear the {{restoredPartitions}} set properly in some edge 
> cases. We only remove partitions from {{restoredPartitions}} when a) all tasks 
> are done restoring, at which point we clear it entirely (in 
> {{AssignedStreamTasks#updateRestored}}), or b) one task at a time, when that 
> task is still restoring and is closed.
> Suppose a rebalance occurs while some tasks are still restoring and others 
> have completed restoration and transitioned to running. The still-restoring 
> tasks are all revoked and closed immediately, and their partitions are 
> removed from {{restoredPartitions}}. We also suspend and revoke some running 
> tasks that have finished restoring, and remove them from 
> {{running}}/{{runningByPartition}}.
> Now only running tasks are left, so {{TaskManager#updateNewAndRestoringTasks}} 
> never calls {{AssignedStreamTasks#updateRestored}}, and therefore we never 
> clear {{restoredPartitions}}. We then close each of the currently running 
> tasks and remove their partitions from everything, BUT we never remove or 
> clear the partitions of the running tasks that we revoked earlier.
> It turns out we can't simply remove partitions from {{restoredPartitions}} 
> upon completion, since they would just be added back during the next loop 
> (blocked by KAFKA-9177). For now, we should also remove partitions from 
> {{restoredPartitions}} when closing or suspending running tasks.
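A minimal sketch of the bookkeeping described above, to make the edge case concrete. Everything in it is hypothetical: the class RestoredPartitionsSketch, its methods, and the task and changelog partition names are illustrative only and are not the actual AssignedStreamTasks or TaskManager code. It assumes the set is cleared only when the last restoring task completes (case a), replays a simplified version of the rebalance scenario, and toggles whether revoking a running task also removes that task's partitions from restoredPartitions (the proposed fix).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the restoredPartitions bookkeeping; not Kafka code.
final class RestoredPartitionsSketch {
    // Task id -> partitions of tasks that are still restoring state.
    private final Map<String, Set<String>> restoring = new HashMap<>();
    // Task id -> partitions of tasks that finished restoring and are running.
    private final Map<String, Set<String>> running = new HashMap<>();
    // Partitions whose restoration has completed (the set this ticket is about).
    private final Set<String> restoredPartitions = new HashSet<>();
    // false = behavior described in the ticket, true = proposed fix.
    private final boolean removeOnRevoke;

    RestoredPartitionsSketch(boolean removeOnRevoke) {
        this.removeOnRevoke = removeOnRevoke;
    }

    void addRestoringTask(String taskId, Set<String> partitions) {
        restoring.put(taskId, new HashSet<>(partitions));
    }

    // A task finishes restoring and starts running. The set is cleared only
    // when no restoring task remains (case a); closing a restoring task never
    // reaches this clear.
    void completeRestoration(String taskId) {
        Set<String> partitions = restoring.remove(taskId);
        running.put(taskId, partitions);
        restoredPartitions.addAll(partitions);
        if (restoring.isEmpty()) {
            restoredPartitions.clear();
        }
    }

    // Case b: a still-restoring task is revoked and closed.
    void closeRestoringTask(String taskId) {
        Set<String> partitions = restoring.remove(taskId);
        restoredPartitions.removeAll(partitions);
    }

    // A running task is suspended and revoked during a rebalance.
    void revokeRunningTask(String taskId) {
        Set<String> partitions = running.remove(taskId);
        if (removeOnRevoke) {
            restoredPartitions.removeAll(partitions); // the proposed fix
        }
    }

    // Stand-in for the emptiness check made in closeLostTasks().
    boolean isEmpty() {
        return restoring.isEmpty() && running.isEmpty() && restoredPartitions.isEmpty();
    }

    Set<String> restoredPartitions() {
        return restoredPartitions;
    }

    // Replays a simplified version of the rebalance scenario from the ticket.
    static RestoredPartitionsSketch replayScenario(boolean removeOnRevoke) {
        RestoredPartitionsSketch tasks = new RestoredPartitionsSketch(removeOnRevoke);
        tasks.addRestoringTask("0_0", Set.of("changelog-0"));
        tasks.addRestoringTask("0_1", Set.of("changelog-1"));
        tasks.completeRestoration("0_0"); // 0_0 now running, 0_1 still restoring
        tasks.closeRestoringTask("0_1");  // rebalance: restoring task closed
        tasks.revokeRunningTask("0_0");   // rebalance: running task revoked
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + replayScenario(false).restoredPartitions());
        System.out.println("with fix:    " + replayScenario(true).restoredPartitions());
    }
}
```

In this model, running main prints [changelog-0] without the fix and [] with it: once every task is gone, the revoked running task's partition is still sitting in restoredPartitions unless it is removed at revocation time.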



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
