Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/14384 )
Change subject: IMPALA-8999: make union scheduling work with mt_dop ...................................................................... Patch Set 11: (9 comments) http://gerrit.cloudera.org:8080/#/c/14384/11/be/src/scheduling/scheduler.h File be/src/scheduling/scheduler.h: http://gerrit.cloudera.org:8080/#/c/14384/11/be/src/scheduling/scheduler.h@381 PS11, Line 381: For HDFS, this attempts to load balance among instances by computing the average : /// number of bytes per instances and then in a single pass assigning scan ranges to : /// each instance to roughly meet that average. nit: looks like we forgot to update this in the commit for IMPALA-9015 http://gerrit.cloudera.org:8080/#/c/14384/11/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/14384/11/be/src/scheduling/scheduler.cc@291 PS11, Line 291: one or more scan nodes for the cases with non-union fragments, we are now considering the parallelism of all scan nodes in the fragment as opposed to the left most scan node, what are the cases that this can happen in a plan? Maybe add that as a test case too http://gerrit.cloudera.org:8080/#/c/14384/11/be/src/scheduling/scheduler.cc@332 PS11, Line 332: Instance selection for an interior fragment what does instance selection and interior fragment mean here? http://gerrit.cloudera.org:8080/#/c/14384/11/be/src/scheduling/scheduler.cc@360 PS11, Line 360: hosts nit: instances_per_host http://gerrit.cloudera.org:8080/#/c/14384/11/be/src/scheduling/scheduler.cc@363 PS11, Line 363: only factor in nit: got confused with this, read it as "only factor... in" vs the intended meaning of "only... factor in". maybe say "only consider parallelism" http://gerrit.cloudera.org:8080/#/c/14384/11/be/src/scheduling/scheduler.cc@368 PS11, Line 368: input_fragment_hosts nit: input_fragment_instances_per_host http://gerrit.cloudera.org:8080/#/c/14384/11/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: http://gerrit.cloudera.org:8080/#/c/14384/11/fe/src/main/java/org/apache/impala/planner/UnionNode.java@122 PS11, Line 122: Union fragments are scheduled on the union of hosts that the child fragments run : // on. nit: Union fragments are scheduled on the union of hosts of all scans in the fragment as well as the hosts of all its input fragments http://gerrit.cloudera.org:8080/#/c/14384/11/testdata/workloads/functional-query/queries/QueryTest/mt-dop-parquet-scheduling.test File testdata/workloads/functional-query/queries/QueryTest/mt-dop-parquet-scheduling.test: http://gerrit.cloudera.org:8080/#/c/14384/11/testdata/workloads/functional-query/queries/QueryTest/mt-dop-parquet-scheduling.test@52 PS11, Line 52: it nit: if http://gerrit.cloudera.org:8080/#/c/14384/11/testdata/workloads/functional-query/queries/QueryTest/mt-dop-parquet-scheduling.test@133 PS11, Line 133: add a test for a union fragment where the scan ranges for scan node are more than 4 but the max instances gets bounded by mt_dop -- To view, visit http://gerrit.cloudera.org:8080/14384 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d2e9c86b530da3053e49d42b837dca0b1348ff2 Gerrit-Change-Number: 14384 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Sat, 19 Oct 2019 00:54:34 +0000 Gerrit-HasComments: Yes