Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15926 )
Change subject: IMPALA-9655: Dynamic intra-node load balancing for HDFS scans
......................................................................
Patch Set 3:
(1 comment)
I also added an optimization that adds ranges marked to use the HDFS cache to the
front of the shared queue. I found a 10% regression for TPC-H Q21 and noticed
that a scan node reading lineitem was slow. That stood out because lineitem was
already being read by other fragments of the plan on the same node. I reran the
test with this optimization, and the results showed no significant perf change.
Results without this optimization:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.01    | +2.80%     | 4.38       | +1.15%         |
+----------+-----------------------+---------+------------+------------+----------------+
Results with this optimization:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 5.85    | +0.35%     | 4.35       | +0.60%         |
+----------+-----------------------+---------+------------+------------+----------------+
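For illustration only, a minimal C++ sketch of the HDFS-cache optimization described above, assuming a single mutex-protected deque shared by all fragment instances on a node. ScanRange, SharedScanRangeQueue, Add() and Next() are hypothetical names, not the actual classes or methods in this patch. Ranges marked to use the HDFS cache are pushed to the front, so an instance asking for more work picks them up before uncached ranges:

```cpp
// Hypothetical sketch: a node-wide queue of scan ranges shared by all fragment
// instances, where ranges marked to use the HDFS cache are handed out first.
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <string>

// Simplified, hypothetical stand-in for a scan range.
struct ScanRange {
  std::string file;
  long long offset;
  long long len;
  bool use_hdfs_cache;  // true if the range is marked to be read from the HDFS cache
};

class SharedScanRangeQueue {
 public:
  // Cached ranges go to the front of the queue, all other ranges to the back.
  void Add(const ScanRange& range) {
    assert(range.len >= 0);
    std::lock_guard<std::mutex> guard(lock_);
    if (range.use_hdfs_cache) {
      queue_.push_front(range);
    } else {
      queue_.push_back(range);
    }
  }

  // Called by any instance that needs more work; returns std::nullopt once
  // the queue is empty.
  std::optional<ScanRange> Next() {
    std::lock_guard<std::mutex> guard(lock_);
    if (queue_.empty()) return std::nullopt;
    ScanRange range = queue_.front();
    queue_.pop_front();
    return range;
  }

 private:
  std::mutex lock_;
  std::deque<ScanRange> queue_;
};
```

Because cached ranges are added with push_front, several cached ranges come out in reverse insertion order; in this sketch that is harmless, since the only goal is to start them before the uncached ranges.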
http://gerrit.cloudera.org:8080/#/c/15926/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:
http://gerrit.cloudera.org:8080/#/c/15926/1/be/src/exec/hdfs-scan-node-base.cc@242
PS1, Line 242: for (auto ctx : instance_ctxs) {
> It's kinda weird that we split up the scan ranges between instances then me
Done. I initially planned to rip out the per-instance assignment along with
this in a separate patch, but didn't realize that the Kudu and HBase scan nodes
still use the per-instance assignment. So instead I just removed the LPT
algorithm in this patch.
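For context, "LPT" here presumably refers to the longest-processing-time-first greedy heuristic for static assignment. Below is a generic C++ sketch, not the code removed by this patch, assuming range length in bytes as the cost proxy; LptAssign is a hypothetical function. Ranges are sorted by length, longest first, and each one goes to the instance with the smallest total assigned so far:

```cpp
// Hypothetical sketch of LPT-style static assignment of scan ranges to
// instances, using range length in bytes as the cost estimate.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns, for each instance, the indices of the ranges assigned to it.
std::vector<std::vector<int>> LptAssign(const std::vector<int64_t>& range_lens,
                                        int num_instances) {
  assert(num_instances > 0);
  std::vector<int> order(range_lens.size());
  for (size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
  // Longest ranges first; stable_sort keeps ties in input order.
  std::stable_sort(order.begin(), order.end(),
                   [&](int a, int b) { return range_lens[a] > range_lens[b]; });

  // Min-heap of (bytes assigned so far, instance index).
  using Load = std::pair<int64_t, int>;
  std::priority_queue<Load, std::vector<Load>, std::greater<Load>> heap;
  for (int i = 0; i < num_instances; ++i) heap.push({0, i});

  std::vector<std::vector<int>> assignment(num_instances);
  for (int idx : order) {
    // Give the next-longest range to the least-loaded instance.
    Load least = heap.top();
    heap.pop();
    assignment[least.second].push_back(idx);
    least.first += range_lens[idx];
    heap.push(least);
  }
  return assignment;
}
```

With a shared queue, instances instead pull ranges as they finish earlier ones, so for HDFS scans this kind of up-front balancing based on estimated costs is no longer needed.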
--
To view, visit http://gerrit.cloudera.org:8080/15926
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a101d0d98dff6e3779f85bc466e4c0bdb38094b
Gerrit-Change-Number: 15926
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig
Gerrit-Reviewer: Bikramjeet Vig
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Thu, 28 May 2020 19:12:37 +0000
Gerrit-HasComments: Yes