[ https://issues.apache.org/jira/browse/YUNIKORN-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720209#comment-17720209 ]
Peter Bacsko commented on YUNIKORN-1724:
----------------------------------------

[~wwei] the profiler shows {{FSM.Current()}} as the culprit. It can be expensive to call {{RLock()}} n*10000 times in a loop. We perform this twice each time we schedule.

With few applications and lots of pods, sorting is not a big deal if it's performed a few times per second. But I updated my PR with BTree, so it's no longer a problem either.

> Improve the performance of shim side scheduling cycle
> -----------------------------------------------------
>
>                 Key: YUNIKORN-1724
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1724
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: shim - kubernetes
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: getNewTasks.png
>
>
> Performance testing of Yunikorn uncovered that a lot of time is spent in
> {{Application.Schedule()}} in the shim. The problem is related to the fact
> that we collect task objects based on their state, which is maintained by
> {{fsm.FSM}}. Even though we run {{Application.Schedule()}} only once per
> second, it's still an issue due to the large number of {{RWMutex.RLock()}}
> calls. With a lot of pods, this consumes a significant amount of CPU time.
> Two different code paths are affected:
> The first is inside the switch-case part in {{Schedule()}}. We want to
> know the number of tasks in the "New" state, and we end up scanning all task
> objects for their status.
> The second is retrieving the "New" tasks from the {{taskMap}} structure. This
> is done by {{GetNewTasks()}} / {{getTasks()}}, which copy tasks to a new
> slice based on their respective state.
> To speed things up, we have to track the "New" tasks in a new map which is
> dynamically maintained when a new task is added and when it leaves the New
> state (or the task gets removed). Knowing how many tasks we have also becomes
> trivial and won't require slice iteration/filtering.
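
The idea in the issue description, keeping a separately maintained map of "New" tasks so that counting and retrieving them does not require scanning every task and querying its state, could look roughly like the sketch below. This is only an illustration under stated assumptions: {{TaskIndex}}, {{Task}}, {{TaskState}}, {{SetState()}} and the other names are hypothetical and are not the YuniKorn shim's actual types or API, and a plain string field stands in for the {{fsm.FSM}} state.

```go
package main

import (
	"fmt"
	"sync"
)

// TaskState is a simplified, hypothetical stand-in for the FSM-managed state.
type TaskState string

const (
	StateNew     TaskState = "New"
	StatePending TaskState = "Pending"
)

// Task is a hypothetical, stripped-down task object.
type Task struct {
	ID    string
	state TaskState
}

// TaskIndex holds all tasks plus a dynamically maintained map of "New" tasks.
type TaskIndex struct {
	mu       sync.RWMutex
	all      map[string]*Task
	newTasks map[string]*Task
}

func NewTaskIndex() *TaskIndex {
	return &TaskIndex{all: map[string]*Task{}, newTasks: map[string]*Task{}}
}

// Add registers a task in the "New" state and tracks it in both maps.
func (ti *TaskIndex) Add(id string) {
	ti.mu.Lock()
	defer ti.mu.Unlock()
	t := &Task{ID: id, state: StateNew}
	ti.all[id] = t
	ti.newTasks[id] = t
}

// SetState updates the state and keeps the "New" map in sync.
func (ti *TaskIndex) SetState(id string, s TaskState) {
	ti.mu.Lock()
	defer ti.mu.Unlock()
	t, ok := ti.all[id]
	if !ok {
		return
	}
	t.state = s
	if s == StateNew {
		ti.newTasks[id] = t
	} else {
		delete(ti.newTasks, id)
	}
}

// Remove drops the task from both maps.
func (ti *TaskIndex) Remove(id string) {
	ti.mu.Lock()
	defer ti.mu.Unlock()
	delete(ti.all, id)
	delete(ti.newTasks, id)
}

// NewCount takes one read lock in total, instead of one per task.
func (ti *TaskIndex) NewCount() int {
	ti.mu.RLock()
	defer ti.mu.RUnlock()
	return len(ti.newTasks)
}

// NewTasks copies only the tracked "New" tasks into a slice (unordered).
func (ti *TaskIndex) NewTasks() []*Task {
	ti.mu.RLock()
	defer ti.mu.RUnlock()
	out := make([]*Task, 0, len(ti.newTasks))
	for _, t := range ti.newTasks {
		out = append(out, t)
	}
	return out
}

func main() {
	idx := NewTaskIndex()
	idx.Add("a")
	idx.Add("b")
	idx.Add("c")
	idx.SetState("a", StatePending) // "a" leaves the New state
	idx.Remove("b")                 // "b" is removed entirely
	fmt.Println(idx.NewCount())     // prints 1
}
```

With this layout the count is a single {{len()}} call under one read lock, and the cost of retrieving the "New" tasks depends on how many tasks are in that state rather than on the total number of tasks. Map iteration order is random in Go, so a caller that needs sorted tasks would still have to sort the result or use an ordered structure.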