[ https://issues.apache.org/jira/browse/TEZ-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119672#comment-14119672 ]
Rajesh Balamohan commented on TEZ-1522:
---------------------------------------

Update: The patch for TEZ-1494 would not completely solve the issue listed here. I was able to simulate the out-of-order execution scenario with the patch by having R3->M5 via Scatter_Gather and M7->M5 via broadcast (instead of the two broadcast edges listed in this JIRA).

> Scheduling can result in out of order execution and slowdown of upstream work
> -----------------------------------------------------------------------------
>
>                 Key: TEZ-1522
>                 URL: https://issues.apache.org/jira/browse/TEZ-1522
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Critical
>              Labels: performance
>         Attachments: TEZ-1522.am.log.gz, task_runtime.svg
>
>
>   M2          M7
>    \          /
> (sg)\        /
>      R3     / (b)
>       \    /
>   (b)  \  /
>         \/
>         M5
>         |
>         R6
>
> Please refer to the attachment (task runtime SVG). In this case, M5 got
> scheduled much earlier than R3 (green in the diagram) and retained a large
> number of containers, so R3 got fewer containers to work with.
> Attaching the output from the status monitor when the job ran; Map_5 has
> taken up almost all of the cluster resources, whereas Reducer_3 got only a
> fraction of the capacity.
> Map_2: 1/1 Map_5: 0(+373)/1000 Map_7: 1/1 Reducer_3: 0/8000 Reducer_6: 0/1
> Map_2: 1/1 Map_5: 0(+374)/1000 Map_7: 1/1 Reducer_3: 0/8000 Reducer_6: 0/1
> Map_2: 1/1 Map_5: 0(+374)/1000 Map_7: 1/1 Reducer_3: 0(+1)/8000 Reducer_6: 0/1
> ....
> Map_2: 1/1 Map_5: 0(+374)/1000 Map_7: 1/1 Reducer_3: 14(+7)/8000 Reducer_6: 0/1
> Map_2: 1/1 Map_5: 0(+374)/1000 Map_7: 1/1 Reducer_3: 63(+14)/8000 Reducer_6: 0/1
> Map_2: 1/1 Map_5: 0(+374)/1000 Map_7: 1/1 Reducer_3: 159(+22)/8000 Reducer_6: 0/1
> Map_2: 1/1 Map_5: 0(+374)/1000 Map_7: 1/1 Reducer_3: 308(+29)/8000 Reducer_6: 0/1
> ...
> Creating this JIRA as a placeholder for a scheduler enhancement.
> One possibility could be to schedule fewer tasks in downstream vertices,
> based on the information available for the upstream vertex.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
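The idea of scheduling fewer downstream tasks based on upstream information can be made concrete with a small sketch. Everything in it is hypothetical: `parse_status`, `running_share`, `downstream_task_cap` and the `min_tasks` floor are illustrative names, not Tez APIs, and capping Map_5 in proportion to Reducer_3's completed fraction is only one possible policy (loosely similar in spirit to the slow-start source fractions that Tez's ShuffleVertexManager applies to scatter-gather edges). The sketch parses one status-monitor line (each vertex printed as completed(+running)/total), measures how much of the running capacity Map_5 holds, and derives a cap for Map_5 from Reducer_3's progress.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical parser for one status-monitor line, where each vertex is
# printed as "Name: completed(+running)/total" and "(+running)" is optional.
STATUS_RE = re.compile(r"(\w+): (\d+)(?:\(\+(\d+)\))?/(\d+)")


@dataclass
class VertexStatus:
    completed: int
    running: int
    total: int


def parse_status(line: str) -> dict[str, VertexStatus]:
    """Map each vertex name to its completed/running/total task counts."""
    return {
        name: VertexStatus(int(done), int(running or 0), int(total))
        for name, done, running, total in STATUS_RE.findall(line)
    }


def running_share(statuses: dict[str, VertexStatus], vertex: str) -> float:
    """Fraction of all currently running tasks that belong to `vertex`."""
    running = sum(s.running for s in statuses.values())
    return statuses[vertex].running / running if running else 0.0


def downstream_task_cap(upstream: VertexStatus, downstream_total: int,
                        min_tasks: int = 1) -> int:
    """Hypothetical throttle: allow the downstream vertex only as many tasks
    as the upstream vertex's completed fraction justifies (at least min_tasks)."""
    fraction = upstream.completed / upstream.total
    return max(min_tasks, math.ceil(downstream_total * fraction))


line = ("Map_2: 1/1 Map_5: 0(+374)/1000 Map_7: 1/1 "
        "Reducer_3: 14(+7)/8000 Reducer_6: 0/1")
st = parse_status(line)
# prints "Map_5 share of running tasks: 98.2%"
print(f"Map_5 share of running tasks: {running_share(st, 'Map_5'):.1%}")
# prints "Map_5 cap: 2"
print("Map_5 cap:", downstream_task_cap(st["Reducer_3"], st["Map_5"].total))
```

With this sample line, Map_5 holds about 98% of the running tasks, while the proportional cap would admit only 2 Map_5 tasks until more of Reducer_3 completes. A real policy would also have to consider that, over a broadcast edge, every Map_5 task reads the output of every Reducer_3 task, so Map_5 tasks cannot finish until Reducer_3 is done.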