[ https://issues.apache.org/jira/browse/KAFKA-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321183#comment-16321183 ]

ASF GitHub Bot commented on KAFKA-4969:
---------------------------------------

bbejeck opened a new pull request #4410: KAFKA-4969: attempt to evenly distribute load of tasks
URL: https://github.com/apache/kafka/pull/4410
 
 
   This PR is an initial attempt to evenly distribute tasks with heavy processing across clients using a somewhat naive approach.
   
   The rationale is that by making sure each client's tasks do not all share the same `topicGroupId` (that is, do not all come from the same sub-topology), the processing load is distributed somewhat evenly when one sub-topology does heavy processing and another is relatively light.
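
   A minimal Java sketch of this idea, not the code in PR #4410: it assumes a task is identified by a (`topicGroupId`, partition) pair, modeled by a hypothetical `Task` record, and it ignores stickiness and client capacity. Sorting tasks by `topicGroupId` and dealing them round-robin sends consecutive tasks of the same sub-topology to different clients, so each client receives a mix of sub-topologies. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpreadBySubTopology {
    // Hypothetical stand-in for a Streams task id: (topicGroupId, partition).
    public record Task(int topicGroupId, int partition) {
        @Override
        public String toString() {
            return topicGroupId + "_" + partition;
        }
    }

    public static Map<String, List<Task>> assign(List<Task> tasks, List<String> clients) {
        List<Task> sorted = new ArrayList<>(tasks);
        // Sorting keeps all tasks of one sub-topology next to each other...
        sorted.sort(Comparator.comparingInt(Task::topicGroupId).thenComparingInt(Task::partition));
        Map<String, List<Task>> assignment = new LinkedHashMap<>();
        for (String client : clients) {
            assignment.put(client, new ArrayList<>());
        }
        // ...so dealing them round-robin sends neighboring tasks of the same
        // sub-topology to different clients.
        for (int i = 0; i < sorted.size(); i++) {
            assignment.get(clients.get(i % clients.size())).add(sorted.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(new Task(1, 1), new Task(0, 0), new Task(1, 0), new Task(0, 1));
        System.out.println(assign(tasks, List.of("clientA", "clientB")));
    }
}
```

   With two sub-topologies of two partitions each and two clients, this prints `{clientA=[0_0, 1_0], clientB=[0_1, 1_1]}`: each client gets one task from each sub-topology instead of, say, one client holding both `0_x` tasks.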
   
   This process only considers active tasks. Standby tasks are not balanced this way because doing so could leave clients with the same set of tasks, e.g., [aT1, sT2] and [aT2, sT1], where `a` marks an active task and `s` a standby task.
   
   We plan to do a follow-on task at a later date that weights tasks by their state stores so that stateful tasks are also distributed evenly.
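
   One way such a follow-on could look, sketched under assumptions that are not part of the plan above: each task gets a hypothetical weight of 1 plus its number of state stores, and tasks are placed heaviest-first on the currently least-loaded client (a greedy longest-processing-time-first heuristic). The `Task` record and the weight formula are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WeightedAssignmentSketch {
    // Hypothetical task descriptor: a task id plus the number of state stores it owns.
    public record Task(String id, int stateStores) {}

    // Assumed weight: one unit of base cost plus one unit per state store.
    static int weight(Task task) {
        return 1 + task.stateStores();
    }

    public static Map<String, List<String>> assign(List<Task> tasks, List<String> clients) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        Map<String, Integer> load = new HashMap<>();
        for (String client : clients) {
            assignment.put(client, new ArrayList<>());
            load.put(client, 0);
        }
        // Heaviest tasks first; ties broken by id so the result is deterministic.
        List<Task> sorted = new ArrayList<>(tasks);
        sorted.sort(Comparator.comparingInt((Task t) -> weight(t)).reversed()
                .thenComparing(Task::id));
        for (Task task : sorted) {
            // Place the task on the least-loaded client; ties go to the earlier client.
            String target = clients.get(0);
            for (String client : clients) {
                if (load.get(client) < load.get(target)) {
                    target = client;
                }
            }
            assignment.get(target).add(task.id());
            load.merge(target, weight(task), Integer::sum);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                new Task("0_0", 0), new Task("0_1", 0),
                new Task("1_0", 2), new Task("1_1", 2), new Task("1_2", 2));
        System.out.println(assign(tasks, List.of("clientA", "clientB")));
    }
}
```

   For three tasks with two stores each plus two stateless tasks on two clients, this prints `{clientA=[1_0, 1_2], clientB=[1_1, 0_0, 0_1]}`: the stateful tasks are split 2/1 and the stateless tasks fill the lighter client, for loads of 6 and 5.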
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> State-store workload-aware StreamsPartitionAssignor
> ---------------------------------------------------
>
>                 Key: KAFKA-4969
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4969
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Matthias J. Sax
>
> Currently, {{StreamPartitionAssignor}} does not distinguish different 
> "types" of tasks. For example, a task can be stateless or have one or multiple 
> stores.
> This can lead to a suboptimal task placement: assume there are 2 stateless 
> and 2 stateful tasks and the app is running with 2 instances. To share the 
> "store load" it would be good to place one stateless and one stateful task 
> per instance. Right now, there is no guarantee about this, and it can happen 
> that one instance processes both stateless tasks while the other processes 
> both stateful tasks.
> We should improve {{StreamPartitionAssignor}} and introduce "task types" 
> including a cost model for task placement. We should consider the following 
> parameters:
>  - number of stores
>  - number of sources/sinks
>  - number of processors
>  - regular task vs standby task
> This improvement should be backed by a design document in the project wiki 
> (no KIP required though) as it's a fairly complex change.
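
A rough Java sketch of such a cost model, using the parameters listed in the issue. The `TaskProfile` record, the weights (4 per store, 1 per source/sink and per processor), and the choice to charge standby tasks only for their stores (a standby task keeps its stores up to date from the changelog and runs no processors) are all assumptions for illustration. It scores the two placements from the example by the difference between the most and least loaded instance.

```java
import java.util.List;
import java.util.Map;

public class TaskCostModelSketch {
    // Hypothetical profile built from the parameters listed in the issue.
    public record TaskProfile(int stores, int sourcesAndSinks, int processors, boolean standby) {
        // Assumed weights: each store costs 4, each source/sink and processor costs 1.
        // A standby task only maintains its stores and runs no processors, so it is
        // charged half the store cost and nothing else.
        public int cost() {
            if (standby) {
                return 2 * stores;
            }
            return 4 * stores + sourcesAndSinks + processors;
        }
    }

    // Difference between the most and least loaded instance for a placement.
    public static int imbalance(Map<String, List<TaskProfile>> placement) {
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (List<TaskProfile> tasks : placement.values()) {
            int load = 0;
            for (TaskProfile task : tasks) {
                load += task.cost();
            }
            max = Math.max(max, load);
            min = Math.min(min, load);
        }
        return max - min;
    }

    public static void main(String[] args) {
        TaskProfile stateless = new TaskProfile(0, 2, 3, false);
        TaskProfile stateful = new TaskProfile(1, 2, 3, false);
        // The placement the issue warns about: both stateful tasks on one instance.
        Map<String, List<TaskProfile>> skewed = Map.of(
                "instance1", List.of(stateless, stateless),
                "instance2", List.of(stateful, stateful));
        // The preferred placement: one stateless and one stateful task per instance.
        Map<String, List<TaskProfile>> mixed = Map.of(
                "instance1", List.of(stateless, stateful),
                "instance2", List.of(stateless, stateful));
        System.out.println("skewed imbalance = " + imbalance(skewed));
        System.out.println("mixed imbalance = " + imbalance(mixed));
    }
}
```

With these assumed weights a stateless task costs 5 and a stateful one 9, so the skewed placement has an imbalance of 8 (10 vs. 18) while the mixed placement has none (14 vs. 14).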



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
