[ https://issues.apache.org/jira/browse/KAFKA-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321183#comment-16321183 ]
ASF GitHub Bot commented on KAFKA-4969:
---------------------------------------

bbejeck opened a new pull request #4410: KAFKA-4969: attempt to evenly distribute load of tasks
URL: https://github.com/apache/kafka/pull/4410

This PR is an initial attempt to evenly distribute tasks with heavy processing across clients, using a somewhat naive approach. The rationale: if each client's assignment is not made up entirely of tasks with the same `topicGroupId`, then when one sub-topology does heavy processing and another sub-topology is relatively light, the processing load is spread somewhat evenly.

This process only looks at active tasks. Standby tasks are not given this consideration, as we could end up in a state where clients have the same task assignments, i.e. [aT1, sT2] [aT2, sT1].

We plan to do a follow-on task at a later date in which we weigh tasks with state stores, so that tasks with state stores are also distributed evenly.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> State-store workload-aware StreamsPartitionAssignor
> ---------------------------------------------------
>
> Key: KAFKA-4969
> URL: https://issues.apache.org/jira/browse/KAFKA-4969
> Project: Kafka
> Issue Type: Sub-task
> Components: streams
> Reporter: Matthias J. Sax
>
> Currently, {{StreamPartitionAssignor}} does not distinguish different
> "types" of tasks. For example, a task can be stateless or have one or
> multiple stores.
> This can lead to a suboptimal task placement: assume there are 2 stateless
> and 2 stateful tasks and the app is running with 2 instances. To share the
> "store load" it would be good to place one stateless and one stateful task
> per instance. Right now, there is no guarantee about this, and it can happen
> that one instance processes both stateless tasks while the other processes
> both stateful tasks.
> We should improve {{StreamPartitionAssignor}} and introduce "task types"
> including a cost model for task placement. We should consider the following
> parameters:
> - number of stores
> - number of sources/sinks
> - number of processors
> - regular task vs standby task
> This improvement should be backed by a design document in the project wiki
> (no KIP required though) as it's a fairly complex change.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
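
For illustration, the idea in the pull request description above (do not let one client end up with only tasks of a single `topicGroupId`) can be sketched roughly as follows. This is a minimal sketch under stated assumptions and not the code in the PR: the `Task` class, the `assign` method, and the client names are hypothetical stand-ins, and only active tasks are considered. Tasks are ordered by `topicGroupId` first and then dealt out round-robin, so consecutive tasks of the same sub-topology go to different clients.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InterleavedAssignmentSketch {

    // Hypothetical stand-in for a Streams task id: sub-topology id plus partition.
    static final class Task {
        final int topicGroupId;
        final int partition;

        Task(int topicGroupId, int partition) {
            this.topicGroupId = topicGroupId;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return topicGroupId + "_" + partition;
        }
    }

    // Deals active tasks to clients round-robin after ordering them by
    // topicGroupId (then partition), so tasks of one sub-topology are
    // spread across clients instead of piling up on a single one.
    static Map<String, List<Task>> assign(List<String> clients, List<Task> activeTasks) {
        List<Task> ordered = new ArrayList<>(activeTasks);
        ordered.sort(Comparator.comparingInt((Task t) -> t.topicGroupId)
                               .thenComparingInt(t -> t.partition));

        Map<String, List<Task>> assignment = new LinkedHashMap<>();
        for (String client : clients) {
            assignment.put(client, new ArrayList<>());
        }

        int next = 0;
        for (Task task : ordered) {
            assignment.get(clients.get(next)).add(task);
            next = (next + 1) % clients.size();
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two sub-topologies (0 = heavy, 1 = light), two partitions each.
        List<Task> tasks = Arrays.asList(
                new Task(0, 0), new Task(1, 0), new Task(0, 1), new Task(1, 1));
        // prints {clientA=[0_0, 1_0], clientB=[0_1, 1_1]}
        System.out.println(assign(Arrays.asList("clientA", "clientB"), tasks));
    }
}
```

Dealing the same four tasks in partition-first order (0_0, 1_0, 0_1, 1_1) would give one client both tasks of sub-topology 0, which is the kind of skew the issue describes. The sketch ignores standby tasks and any per-task cost, both of which the issue lists as inputs to a fuller cost model.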