lucasbru opened a new pull request, #20172:
URL: https://github.com/apache/kafka/pull/20172

   The original implementation uses a linear search to find the least loaded 
process in O(n), and we can replace this by look-ups in a heap is O(log(n)), as 
described below
   
   Active tasks: For active tasks, we can do exactly the same assignment as in 
the original algorithm by first building a heap (by load) of all processes. 
When we assign a task, we pick the head off the heap, assign the task to it, 
update the load, and re-insert it into the heap in O(log(n)).
   
   Standby tasks: For standby tasks, we cannot do this optimization directly, 
because of the order in which we assign tasks:
   
   1. We first try to assign task A to a process that previously owned A.
   2. If we did not find such a process, we assign A to the least loaded node.
   3. We now try to assign task B to a process that previously owned B
   4. If we did not find such a process, we assign B to the least loaded node
      ...
   
   The problem is that we cannot efficiently keep a heap (by load) throughout 
this process, because finding and removing process that previously owned A (and 
B and…) in the heap is O(n). We therefore need to change the order of 
evaluation to be able to use a heap:
   
   1. Try to assign all tasks A, B.. to a process that previously owned the task
   2. Build a heap.
   3. Assign all remaining tasks to the least-loaded process that does not yet 
own the task. Since at most NumStandbyReplicas already own the task, we can do 
it by removing up to NumStandbyReplicas from the top of the heap in O(log(n)), 
so we get O(log(NumProcesses)*NumStandbyReplicas).
   
   Note that the change in order changes the resulting standby assignments 
(although this difference does not show up in the existing unit tests). I would 
argue that the new order of assignment will actually yield better assignments, 
since the assignment will be more sticky, which has the potential to reduce the 
amount of store we have to restore from the changelog topic after assingments.
   
   In our worst-performing benchmark, this improves the runtime by ~107x.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to