Rajesh Balamohan created TEZ-978:
------------------------------------
Summary: Enhance auto parallelism tuning for queries having empty
outputs or data skewness
Key: TEZ-978
URL: https://issues.apache.org/jira/browse/TEZ-978
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Rajesh Balamohan
Running tpcds (query-92) with auto-tuning
"tez.am.shuffle-vertex-manager.enable.auto-parallel" degraded the performance
than original run.
Query has lots of empty outputs and these tasks tend to complete a lot more
faster than others. Tez computes the parallelism with the given information
(wherein most of the output is empty) and set the reducers to "1". When other
tasks complete, single reducer has to do the heavy lifting and this causes the
performance degradation.
Map 1: 2/181 Map 5: 16/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
Map 1: 2/181 Map 5: 22/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
Map 1: 2/181 Map 5: 25/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
Map 1: 2/181 Map 5: 30/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
Map 1: 2/181 Map 5: 35/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
Map 1: 2/181 Map 5: 36/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
Map 1: 2/181 Map 5: 39/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
Map 1: 3/181 Map 5: 43/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/166
Map 1: 5/181 Map 5: 46/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1 <===
ShuffleVertexManager changing parallelism
Map 1: 5/181 Map 5: 63/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 7/181 Map 5: 72/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 7/181 Map 5: 83/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 8/181 Map 5: 95/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 8/181 Map 5: 104/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 9/181 Map 5: 116/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 12/181 Map 5: 123/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 13/181 Map 5: 127/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 16/181 Map 5: 127/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 17/181 Map 5: 128/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 18/181 Map 5: 131/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 19/181 Map 5: 131/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 25/181 Map 5: 132/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 33/181 Map 5: 132/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 42/181 Map 5: 134/179 Map 7: 1/1 Map 8: 1/1 Reducer 2:
0/109 Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1 <===
ShuffleVertexManager changing parallelism
Map 1: 51/181 Map 5: 135/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/1
Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 58/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/1
Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 63/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/1
Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Map 1: 70/181 Map 5: 136/179 Map 7: 1/1 Map 8: 1/1 Reducer 2: 0/1
Reducer 3: 0/137 Reducer 4: 0/1 Reducer 6: 0/1
Suggestion is to include
1. Empty output information when computing auto-parallelism.
2. Have a configurable value for determining the average output from the source
(e.g minimum of 1 MB output from each source). If the average task output size
does not meet this criteria (which means all the completed tasks are small
tasks), we can defer the computation of auto-parallelism until other tasks are
completed.
--
This message was sent by Atlassian JIRA
(v6.2#6252)