Guozhang Wang created KAFKA-9756:
------------------------------------

             Summary: Refactor the main loop to process more than one record of 
one task at a time
                 Key: KAFKA-9756
                 URL: https://issues.apache.org/jira/browse/KAFKA-9756
             Project: Kafka
          Issue Type: New Feature
          Components: streams
            Reporter: Guozhang Wang
            Assignee: Guozhang Wang


Our current main loop is implemented as follows:

1. Loop over all tasks that have records to process, processing one record 
from each task at a time.
2. After processing one record from each task, check whether a commit / 
punctuate / poll etc. is needed.

Because we process one record at a time from a task and then move on to the 
next task, we are effectively spending a lot of time on context switches. Maybe 
we can first investigate what happens if we just have each task hosted by an 
individual thread, and see whether the context switch cost is not already worse 
(which would mean our current implementation is already a baseline). If that's 
true, we can consider working on one task at a time and see if it is more 
efficient.


For num.Iterations:
1. process one record from each of the tasks the thread owns.
2. check if commit / punctuate / poll / etc. is needed.

But in 1) above we process tasks A,B,C,A,B,C,... and effectively we are 
introducing context switches within the thread as it needs to load the task 
variables etc for each record processed.

What I was thinking is to process tasks as A,A,A,B,B,B,C,C,C... so that we can 
reduce the context switches.
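
The two orderings described above can be compared with a small sketch. This is 
not the Kafka Streams implementation: the class TaskLoopSketch, its methods, 
and the maxPerTask parameter are hypothetical names chosen for illustration, 
and each task is modelled as a plain in-memory queue of records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the actual StreamThread code.
public class TaskLoopSketch {

    // Current behaviour: one record per task per pass (A,B,C,A,B,C,...).
    static List<String> roundRobin(Map<String, Deque<String>> tasks) {
        List<String> order = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Map.Entry<String, Deque<String>> task : tasks.entrySet()) {
                // poll() returns null when the task has no buffered records
                if (task.getValue().poll() != null) {
                    order.add(task.getKey());
                    progressed = true;
                }
            }
            // a commit / punctuate / poll check would go here
        }
        return order;
    }

    // Proposed behaviour: up to maxPerTask records from one task before
    // moving on to the next task (A,A,A,B,B,B,C,C,C,...).
    static List<String> batched(Map<String, Deque<String>> tasks, int maxPerTask) {
        List<String> order = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Map.Entry<String, Deque<String>> task : tasks.entrySet()) {
                for (int i = 0; i < maxPerTask; i++) {
                    if (task.getValue().poll() == null) {
                        break; // this task is drained, move to the next one
                    }
                    order.add(task.getKey());
                    progressed = true;
                }
            }
            // a commit / punctuate / poll check would go here
        }
        return order;
    }

    // Three tasks A, B, C with three buffered records each.
    static Map<String, Deque<String>> sampleTasks() {
        Map<String, Deque<String>> tasks = new LinkedHashMap<>();
        for (String name : List.of("A", "B", "C")) {
            tasks.put(name, new ArrayDeque<>(List.of("r1", "r2", "r3")));
        }
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(roundRobin(sampleTasks()));
        System.out.println(batched(sampleTasks(), 3));
    }
}
```

In the batched variant the commit / punctuate / poll check runs less often per 
record processed, so the batch size would presumably need an upper bound to 
keep those checks timely.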



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
