[jira] [Commented] (KAFKA-10716) Streams processId is unstable across restarts resulting in task mass migration

A. Sophie Blee-Goldman (Jira) Thu, 12 Nov 2020 18:46:09 -0800


    [ 
https://issues.apache.org/jira/browse/KAFKA-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231116#comment-17231116
 ]


A. Sophie Blee-Goldman commented on KAFKA-10716:
------------------------------------------------

There are a few possible ways forward here:

1) Generate the processId from the client.id config, if specified. This 
requires users to set this config and ensure that it's unique to each instance.
2) Generate the processId from the group.instance.id, if specified. This would 
only work for static membership users.
3) Write the processId to, and load it from, the checkpoint file in the task 
directories.
4) Write the processId to, and load it from, a single file in the top-level 
application directory.

Options 1 and 2 would be simple for us to implement, but it's somewhat 
obnoxious to require this of a user just for basic functionality of their app. 
That said, if a user has already specified either the client.id or the 
group.instance.id, I don't see any reason _not_ to generate the processId from 
that. This might be a good stop-gap measure, but it's not a good permanent 
solution. However, if we plan to implement KAFKA-10121 right away, then maybe 
it's best not to mess around with options 3 or 4.
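
To make options 1 and 2 a bit more concrete, here is a rough sketch of how a 
stable processId could be derived from a user-supplied config value. The class 
and method names are hypothetical and nothing like this exists in Streams 
today. Preferring group.instance.id over client.id is just an assumption for 
illustration, as is treating null or empty as "not specified". It relies only 
on the JDK's name-based UUID.nameUUIDFromBytes, so the same input always yields 
the same UUID.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of Kafka Streams.
final class ProcessIdFromConfig {
    private ProcessIdFromConfig() {}

    // Derives a processId that is stable across restarts whenever the user
    // has set group.instance.id or client.id; otherwise falls back to a
    // random UUID, which matches the current (unstable) behavior.
    static UUID deriveProcessId(String clientId, String groupInstanceId) {
        String seed = null;
        if (groupInstanceId != null && !groupInstanceId.isEmpty()) {
            seed = groupInstanceId;   // assumption: prefer the static member id
        } else if (clientId != null && !clientId.isEmpty()) {
            seed = clientId;
        }
        if (seed == null) {
            return UUID.randomUUID();
        }
        // Name-based UUID: deterministic for a given seed string.
        return UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that this only helps if the chosen config value really is unique per 
instance; two instances sharing a client.id would end up with the same 
processId.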

Options 3 and 4 would be a bit trickier. Option 3 in particular seems to open 
up a lot of nasty possibilities, like the processId differing from one task 
directory to another, or even between threads in the same app. But option 4 
seems pretty clean: we load the processId file within the KafkaStreams 
constructor, and if it's not found we generate a random UUID like we do now. 
This would all happen before any threads are created, so there's no need to 
worry about synchronizing them at all.
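
As a minimal sketch of what option 4 might look like (assuming a file named 
"process-id" in the application's state directory; the file name, the class, 
and the method are all hypothetical, and regenerating on an unreadable file is 
my own assumption):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper, not part of Kafka Streams.
final class ProcessIdFile {
    // Assumed file name; the real name and location would need to be decided.
    static final String FILE_NAME = "process-id";

    private ProcessIdFile() {}

    // Intended to be called once, from the constructor, before any stream
    // threads exist, so no locking is attempted here.
    static UUID loadOrCreate(Path appDir) {
        Path file = appDir.resolve(FILE_NAME);
        try {
            if (Files.exists(file)) {
                String text = new String(Files.readAllBytes(file),
                                         StandardCharsets.UTF_8).trim();
                try {
                    return UUID.fromString(text);
                } catch (IllegalArgumentException e) {
                    // Unparseable contents: fall through and regenerate.
                }
            }
            // Not found (or unreadable): generate a random UUID as is done
            // today, and persist it so the next restart sees the same value.
            UUID id = UUID.randomUUID();
            Files.createDirectories(appDir);
            Files.write(file, id.toString().getBytes(StandardCharsets.UTF_8));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This ignores the case where the state directory is wiped between restarts, in 
which case the processId would simply change as it does now.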

> Streams processId is unstable across restarts resulting in task mass migration
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-10716
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10716
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> The new high availability feature of KIP-441 relies on deterministic 
> assignment to produce an eventually-stable assignment. The 
> HighAvailabilityTaskAssignor assigns tasks based on the unique processId 
> assigned to each client, so if the same set of Kafka Streams applications 
> participate in a rebalance it should generate the same task assignment every 
> time.
> Unfortunately, the processIds aren't stable across restarts. We generate a 
> random UUID in the KafkaStreams constructor, so each time the process starts 
> up it is assigned a completely different processId. Unless this new 
> processId happens to sort into exactly the same position as the previous 
> one, a single bounce or crash/restart can result in a large-scale shuffling 
> of tasks based on a completely different eventual assignment.
> Ultimately we should fix this via KAFKA-10121, but that's a nontrivial 
> undertaking and this bug merits some immediate relief if we don't intend to 
> tackle the larger problem in the upcoming releases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
