[
https://issues.apache.org/jira/browse/KAFKA-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321183#comment-16321183
]
ASF GitHub Bot commented on KAFKA-4969:
---------------------------------------
bbejeck opened a new pull request #4410: KAFKA-4969: attempt to evenly
distribute load of tasks
URL: https://github.com/apache/kafka/pull/4410
This PR is an initial attempt to evenly distribute tasks with heavy
processing across clients, using a somewhat naive approach.
The rationale: if each client's assignment is not composed entirely of tasks
with the same `topicGroupId`,
then when one sub-topology does heavy processing and another
sub-topology is relatively light, the processing load is distributed
somewhat evenly.
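As a rough illustration of the idea (not the code in this PR), the sketch below orders active tasks by `topicGroupId` and deals them to clients round-robin, so consecutive tasks of one sub-topology land on different clients. The names `TopicGroupSpreadSketch`, `SketchTaskId`, and `assign` are hypothetical and exist only for this example; the real assignor works with Kafka's own task and client types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the implementation from the PR.
final class TopicGroupSpreadSketch {

    // Minimal stand-in for a task id: sub-topology (topic group) id plus partition.
    static final class SketchTaskId {
        final int topicGroupId;
        final int partition;

        SketchTaskId(int topicGroupId, int partition) {
            this.topicGroupId = topicGroupId;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return topicGroupId + "_" + partition;
        }
    }

    // Deals active tasks to clients so that each topic group is spread across clients.
    static Map<String, List<SketchTaskId>> assign(List<String> clients,
                                                  List<SketchTaskId> activeTasks) {
        if (clients.isEmpty()) {
            throw new IllegalArgumentException("at least one client is required");
        }

        // Keep tasks of the same topic group adjacent, ordered by partition.
        List<SketchTaskId> ordered = new ArrayList<>(activeTasks);
        ordered.sort(Comparator.comparingInt((SketchTaskId t) -> t.topicGroupId)
                               .thenComparingInt(t -> t.partition));

        Map<String, List<SketchTaskId>> assignment = new LinkedHashMap<>();
        for (String client : clients) {
            assignment.put(client, new ArrayList<>());
        }

        // Round-robin over the ordered list: adjacent tasks of one group go to
        // different clients, so per-group counts differ by at most one.
        int cursor = 0;
        for (SketchTaskId task : ordered) {
            assignment.get(clients.get(cursor)).add(task);
            cursor = (cursor + 1) % clients.size();
        }
        return assignment;
    }
}
```

With two clients and tasks `0_0`, `0_1`, `1_0`, `1_1`, each client receives one task from each sub-topology rather than one client taking both tasks of the heavy one.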
This process only looks at active tasks. Standby tasks are not given this
consideration, because we could end up in a state where clients hold the
same tasks, e.g. [aT1, sT2] [aT2, sT1].
We plan a follow-on task at a later date that weighs tasks with
state stores so that
those tasks are also distributed evenly.
### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
> State-store workload-aware StreamsPartitionAssignor
> ---------------------------------------------------
>
> Key: KAFKA-4969
> URL: https://issues.apache.org/jira/browse/KAFKA-4969
> Project: Kafka
> Issue Type: Sub-task
> Components: streams
> Reporter: Matthias J. Sax
>
> Currently, {{StreamPartitionAssignor}} does not distinguish different
> "types" of tasks. For example, a task can be stateless or have one or multiple
> stores.
> This can lead to a suboptimal task placement: assume there are 2 stateless
> and 2 stateful tasks and the app is running with 2 instances. To share the
> "store load" it would be good to place one stateless and one stateful task
> per instance. Right now, there is no guarantee of this, and it can happen
> that one instance processes both stateless tasks while the other processes
> both stateful tasks.
> We should improve {{StreamPartitionAssignor}} and introduce "task types"
> including a cost model for task placement. We should consider the following
> parameters:
> - number of stores
> - number of sources/sinks
> - number of processors
> - regular task vs standby task
> This improvement should be backed by a design document in the project wiki
> (no KIP required though) as it's a fairly complex change.
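The parameters listed in the issue could feed a simple additive cost model. The sketch below is one possible shape, not a design from the issue or from Kafka: the weights, the choice to charge standby tasks only for their stores, and the names `TaskCostSketch`, `TaskProfile`, and `place` are all assumptions made for illustration. It places the most expensive tasks first, each on the instance with the lowest accumulated cost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a task cost model; weights are illustrative guesses.
final class TaskCostSketch {
    static final int STORE_WEIGHT = 10;
    static final int SOURCE_SINK_WEIGHT = 2;
    static final int PROCESSOR_WEIGHT = 1;

    static final class TaskProfile {
        final String name;
        final int stores;
        final int sourcesAndSinks;
        final int processors;
        final boolean standby;

        TaskProfile(String name, int stores, int sourcesAndSinks,
                    int processors, boolean standby) {
            this.name = name;
            this.stores = stores;
            this.sourcesAndSinks = sourcesAndSinks;
            this.processors = processors;
            this.standby = standby;
        }

        int cost() {
            int storeCost = stores * STORE_WEIGHT;
            // Assumption: a standby task only maintains its stores.
            if (standby) {
                return storeCost;
            }
            return storeCost
                + sourcesAndSinks * SOURCE_SINK_WEIGHT
                + processors * PROCESSOR_WEIGHT;
        }
    }

    // Greedy placement: most expensive task first, onto the least-loaded instance.
    static Map<String, List<String>> place(List<String> instances,
                                           List<TaskProfile> tasks) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        List<TaskProfile> ordered = new ArrayList<>(tasks);
        ordered.sort(Comparator.comparingInt(TaskProfile::cost).reversed());

        Map<String, List<String>> placement = new LinkedHashMap<>();
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String instance : instances) {
            placement.put(instance, new ArrayList<>());
            load.put(instance, 0);
        }

        for (TaskProfile task : ordered) {
            // Pick the instance with the smallest accumulated cost; ties go to
            // the instance listed first.
            String target = instances.get(0);
            for (String instance : instances) {
                if (load.get(instance) < load.get(target)) {
                    target = instance;
                }
            }
            placement.get(target).add(task.name);
            load.merge(target, task.cost(), Integer::sum);
        }
        return placement;
    }
}
```

For the scenario in the description (2 stateless and 2 stateful tasks on 2 instances), this greedy placement puts one stateful and one stateless task on each instance, since the two stateful tasks are placed first and go to different instances.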