[ https://issues.apache.org/jira/browse/AURORA-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15679664#comment-15679664 ]
Zameer Manji commented on AURORA-1823:
--------------------------------------

Upon further analysis, {{BatchWorker}} might not help us here.

After some JMH benchmarking and profiling, the biggest problem with {{insertPendingTasks}} is that it doesn't use the bulk storage API {{saveTasks}}. Instead, it calls {{mutateTask}} for every task that is moving to {{PENDING}}. I can get a 10x+ improvement in throughput by simply queueing up the mutations and side effects that result from the state machine and then calling {{saveTasks}} once all of the mutations have been computed.

I'm going to look into refactoring {{StateManagerImpl}} to support evaluating multiple task state machines concurrently and then merging all of the side effects from those state machines into a single operation.

> `createJob` API uses single thread to move all tasks to PENDING
> ----------------------------------------------------------------
>
> Key: AURORA-1823
> URL: https://issues.apache.org/jira/browse/AURORA-1823
> Project: Aurora
> Issue Type: Bug
> Reporter: Zameer Manji
> Priority: Minor
>
> If you create a single job with many tasks (let's say 10k+), the `createJob`
> API will take a long time. This is because the `createJob` API only returns
> when all of the tasks have moved to PENDING, and it uses a single thread to do
> so. Here is a snippet of the logs:
> {noformat}
> ...
> I1116 17:11:53.964 [qtp1219612889-50, StateMachine$Builder:389]
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57114-8aff8e77-3bde-4a83-99eb-8c6e52f14a7a
> state machine transition INIT -> PENDING
> I1116 17:11:53.965 [qtp1219612889-50, TaskStateMachine:474] Adding work
> command SAVE_STATE for
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57114-8aff8e77-3bde-4a83-99eb-8c6e52f14a7a
> I1116 17:11:54.094 [qtp1219612889-50, StateMachine$Builder:389]
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57115-f5baa93f-78af-470d-bcdf-1d86c0b98c80
> state machine transition INIT -> PENDING
> I1116 17:11:54.094 [qtp1219612889-50, TaskStateMachine:474] Adding work
> command SAVE_STATE for
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57115-f5baa93f-78af-470d-bcdf-1d86c0b98c80
> I1116 17:11:54.223 [qtp1219612889-50, StateMachine$Builder:389]
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57116-0553d98c-f5de-4857-9a70-c5c748ddee03
> state machine transition INIT -> PENDING
> I1116 17:11:54.224 [qtp1219612889-50, TaskStateMachine:474] Adding work
> command SAVE_STATE for
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57116-0553d98c-f5de-4857-9a70-c5c748ddee03
> I1116 17:11:54.353 [qtp1219612889-50, StateMachine$Builder:389]
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57117-46e168f6-8753-4be0-873d-f18d1f562570
> state machine transition INIT -> PENDING
> I1116 17:11:54.353 [qtp1219612889-50, TaskStateMachine:474] Adding work
> command SAVE_STATE for
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57117-46e168f6-8753-4be0-873d-f18d1f562570
> I1116 17:11:54.482 [qtp1219612889-50, StateMachine$Builder:389]
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57118-ac94b4fb-f319-4ca2-b788-2ee093ef1c67
> state machine transition INIT -> PENDING
> I1116 17:11:54.482 [qtp1219612889-50, TaskStateMachine:474] Adding work
> command SAVE_STATE for
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57118-ac94b4fb-f319-4ca2-b788-2ee093ef1c67
> I1116 17:11:54.611 [qtp1219612889-50, StateMachine$Builder:389]
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57119-060ef7fc-7e17-4f8c-83dc-216550332153
> state machine transition INIT -> PENDING
> I1116 17:11:54.612 [qtp1219612889-50, TaskStateMachine:474] Adding work
> command SAVE_STATE for
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57119-060ef7fc-7e17-4f8c-83dc-216550332153
> I1116 17:11:54.741 [qtp1219612889-50, StateMachine$Builder:389]
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57120-c163c750-3658-44b7-b1ea-43f5d503f7c9
> state machine transition INIT -> PENDING
> I1116 17:11:54.742 [qtp1219612889-50, TaskStateMachine:474] Adding work
> command SAVE_STATE for
> sparker1-devel-echo-8017fae7-f592-49c7-bfef-fac912abecaa-57120-c163c750-3658-44b7-b1ea-43f5d503f7c9
> ...
> {noformat}
> Observe that a single jetty thread is doing this.
> We should leverage {{BatchWorker}} to have concurrent mutations here.
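
The batching idea from the comment at the top of this message (compute every INIT -> PENDING mutation first, queue the resulting side effects, then write once) can be illustrated with a small, self-contained sketch. Everything below is hypothetical: `Task`, `CountingStore`, `insertOneByOne`, and `insertBatched` are stand-ins invented for this example, and the method signatures are deliberately simplified. It is not Aurora's {{StateManagerImpl}} or its storage API, and it makes no claim about the measured throughput gain; it only shows why the number of storage writes drops from one per task to one per job.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchedPendingSketch {
    enum Status { INIT, PENDING }

    /** Immutable task; a simplified stand-in for a scheduled task (assumption). */
    static final class Task {
        final String id;
        final Status status;

        Task(String id, Status status) {
            this.id = id;
            this.status = status;
        }
    }

    /** Hypothetical in-memory store that counts how many write operations it receives. */
    static final class CountingStore {
        final Map<String, Task> tasks = new LinkedHashMap<>();
        int writeOps = 0;

        /** One write operation per task, analogous to a per-task mutation. */
        void saveTask(Task task) {
            writeOps++;
            tasks.put(task.id, task);
        }

        /** One write operation for the whole batch, analogous to a bulk save. */
        void saveTasks(Collection<Task> batch) {
            writeOps++;
            for (Task task : batch) {
                tasks.put(task.id, task);
            }
        }
    }

    /** The state machine step: INIT -> PENDING, with no storage access. */
    static Task toPending(String id) {
        Task created = new Task(id, Status.INIT);
        return new Task(created.id, Status.PENDING);
    }

    /** Per-task approach: every transition is written to the store immediately. */
    static void insertOneByOne(CountingStore store, List<String> ids) {
        for (String id : ids) {
            store.saveTask(toPending(id));
        }
    }

    /**
     * Batched approach: compute all mutations and queue their side effects,
     * then issue a single bulk write. Returns the queued side effects.
     */
    static List<String> insertBatched(CountingStore store, List<String> ids) {
        List<Task> mutated = new ArrayList<>();
        List<String> sideEffects = new ArrayList<>();
        for (String id : ids) {
            mutated.add(toPending(id));
            sideEffects.add("SAVE_STATE " + id);
        }
        store.saveTasks(mutated);
        return sideEffects;
    }
}
```

With three task IDs, `insertOneByOne` leaves `writeOps` at 3 while `insertBatched` leaves it at 1 and returns three queued side effects. In a real scheduler the queued side effects would still have to be applied after the bulk write; how to merge them is exactly the refactoring the comment proposes to investigate.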