Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 )
Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE ...................................................................... Patch Set 10: (26 comments) http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@455 PS10, Line 455: jbyteArray thrift_in_query_options) { > Not sure what 'thrift_in_query_options' means. How about using 'tquery_opti I've renamed the parameter. I decided to pass in the existing query options because the thrift-generated C++ and Java TQueryOptions classes don't handle the isset flags properly. If I don't pass in the existing query options, then: 1. I'd have to create an "empty" TQueryOptions instance inside this function. 2. The __isset member of that instance would contain true values for every option even though they haven't been set explicitly. Therefore I'd have to overwrite __isset with false values somehow before calling impala::ParseQueryOptions(). 3. When FeSupport.ParseQueryOptions() deserializes the TQueryOptions instance returned by Java_org_apache_impala_service_FeSupport_NativeParseQueryOptions(), it would still find that some query options have been set, even if they haven't been touched (e.g. TQueryOptions.setExplain_levelIsSet() always returns true since explain_level != null even if it hasn't been set explicitly). I'm not sure why thrift-generated classes behave this way. Maybe there's a better way to work around the isset bugs, but I couldn't think of any. http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@457 PS10, Line 457: DeserializeThriftMsg(env, thrift_in_query_options, &options); > Needs error handling, e.g. using THROW_IF_ERROR_RET Done http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@458 PS10, Line 458: const char *o = env->GetStringUTFChars(options_str, NULL); > * You need to release the string UTF chars, take a look at JniUtfCharGuard Done http://gerrit.cloudera.org:8080/#/c/7564/10/be/src/service/fe-support.cc@461 PS10, Line 461: TParseQueryOptionsResult result; > I think it's simpler to convert all Status to an exception. That way we don Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/Planner.java@119 PS10, Line 119: // Create runtime filters. 'singleNodePlan' now points to the root of the distributed > Comment is wrong. 'singleNodePlan' definitely does not point to the root of Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@354 PS10, Line 354: public void addTarget(RuntimeFilterTarget target) { targets_.add(target); } > add newline before Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@397 PS10, Line 397: Preconditions.checkState(maxNumFilters >= 0); > Why is 0 not a valid value? 0 is a valid value. MAX_NUM_RUNTIME_FILTERS=0 effectively removes every runtime filter. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@423 PS10, Line 423: filter.computeNdvEstimate(); > My understanding is that IMPALA-3450 has been fixed and we can remove this Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@537 PS10, Line 537: * The assigned filters are the ones for which 'scanNode' can be used a destination > can be used as a destination node Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@539 PS10, Line 539: * 1. If DISABLE_ROW_RUNTIME_FILTERING query option is set, a filter is only assigned to > If the DISABLE_ROW_RUNTIME_FILTERING query option is set ... Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@541 PS10, Line 541: * 2. If RUNTIME_FILTER_MODE query option is set to LOCAL, a filter is only assigned to > If the RUNTIME_FILTER_MODE query option is set ... Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@542 PS10, Line 542: * 'scanNode' if the filter is local to the scan node. > This doesn't really explain what the LOCAL option means. How about: Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@563 PS10, Line 563: if (!isSingleNodeExec && runtimeFilterMode == TRuntimeFilterMode.LOCAL && > Why the !isSingleNodeExec condition? I think that !isLocalTarget implies !i Correct, removed !isSingleNodeExec http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@576 PS10, Line 576: static private boolean IsLocalTarget(RuntimeFilter filter, ScanNode targetNode) { > isLocalTarget() Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@583 PS10, Line 583: static private boolean IsBoundByPartitionColumns(Analyzer analyzer, Expr targetExpr, > isBoundByPartitionColumns() Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@92 PS10, Line 92: public native static byte[] NativeParseQueryOptions(String optionsStr, > to be clearer call the first arg csvQueryOptions Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@93 PS10, Line 93: > Can we get rid of the second argument and use PlannerTestBase.mergeQueryOpt See my response to be/src/service/fe-support.cc L455. The thrift-generated TQueryOptions classes don't handle the isset flags properly which would confuse mergeQueryOptions(). http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/main/java/org/apache/impala/service/FeSupport.java@282 PS10, Line 282: public static TParseQueryOptionsResult ParseQueryOptions(String optionsStr, > Public function needs a comment. I've modified the method to return TQueryOptions and throw an exception if there was an error. http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@300 PS10, Line 300: public void testRuntimeFilterPruning() { > testRuntimeFilterQueryOptions()? Done http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/10/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@288 PS10, Line 288: parseQueryOptions(currentTestCase); > Why do we need to call this here and in L338? Isn't the call in L338 enough L288-L293 handles test cases found in the middle of the test file. L338-L344 handles the very last test case found at the end of the test file. We need to parse the query-options section in both places. http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test: http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@2 PS10, Line 2: select count(*) from functional.alltypes a > When using join hints it is recommended to also use straight_join. Also ple Done http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@85 PS10, Line 85: select count(*) from functional.alltypes a > Can we make it more obvious which runtime filters are the most selective? T Done http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@91 PS10, Line 91: ---- PLAN > To reduce the number of test lines, let's only keep those sections that are Done http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@167 PS10, Line 167: # DISABLE_ROW_RUNTIME_FILTERING is set: only partition column filters are applied > What's the effect of setting MAX_NUM_RUNTIME_FILTERS=3 in this test? Does i Yes, it does. I've added a new test case to test this. http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@252 PS10, Line 252: # RUNTIME_FILTER_MODE is set to LOCAL: only local filters are applied > Same question as above. What happens when we set MAX_NUM_RUNTIME_FILTERS=4. Yes, it does. I've added a new test case to test this. http://gerrit.cloudera.org:8080/#/c/7564/10/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-pruning.test@418 PS10, Line 418: # RUNTIME_FILTER_MODE is OFF: no filters are applied > Is this the same as MAX_NUM_RUNTIME_FILTERS=0? Yes, it is. Added a new test case for MAX_NUM_RUNTIME_FILTERS=0 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 10 Gerrit-Owner: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <mjac...@apache.org> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Tue, 24 Oct 2017 18:59:40 +0000 Gerrit-HasComments: Yes