[Impala-ASF-CR] IMPALA-10434: Fix impala-shell's unicode regressions on Python2
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16960 ) Change subject: IMPALA-10434: Fix impala-shell's unicode regressions on Python2 .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8013/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4 Gerrit-Change-Number: 16960 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 17 Jan 2021 03:49:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16922 ) Change subject: IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option .. Patch Set 2: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/616/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/16922 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d422889c433062456748a953b33e3d43799be14 Gerrit-Change-Number: 16922 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 17 Jan 2021 03:42:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16922 ) Change subject: IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16922/1/docs/topics/impala_join_rows_produced_limit.xml File docs/topics/impala_join_rows_produced_limit.xml: http://gerrit.cloudera.org:8080/#/c/16922/1/docs/topics/impala_join_rows_produced_limit.xml@44 PS1, Line 44: any one of the > To be more accurate: ...when any one of the joins in the query produces mo Done -- To view, visit http://gerrit.cloudera.org:8080/16922 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d422889c433062456748a953b33e3d43799be14 Gerrit-Change-Number: 16922 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 17 Jan 2021 03:35:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option
Hello Aman Sinha, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16922 to look at the new patch set (#2). Change subject: IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option .. IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option - Minor edit Change-Id: I3d422889c433062456748a953b33e3d43799be14 --- M docs/impala.ditamap A docs/topics/impala_join_rows_produced_limit.xml 2 files changed, 73 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16922/2 -- To view, visit http://gerrit.cloudera.org:8080/16922 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3d422889c433062456748a953b33e3d43799be14 Gerrit-Change-Number: 16922 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16922 ) Change subject: IMPALA-10421: [DOCS] Documented the JOIN_ROWS_PRODUCED_LIMIT query option .. Patch Set 2: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/616/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/16922 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d422889c433062456748a953b33e3d43799be14 Gerrit-Change-Number: 16922 Gerrit-PatchSet: 2 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sun, 17 Jan 2021 03:35:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10434: Fix impala-shell's unicode regressions on Python2
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16960 ) Change subject: IMPALA-10434: Fix impala-shell's unicode regressions on Python2 ..

IMPALA-10434: Fix impala-shell's unicode regressions on Python2

To make impala-shell compatible with Python3, we explicitly distinguish bytes and text in Python2 by decoding the bytes for all inputs.

Regression 1: multiple queries in one line with unicode chars will break

In precmd() of impala-shell, if there are multiple queries present in one input line, we split it into individual queries (by sqlparse.split()) and append them back to the 'cmdqueue'. They will be passed to precmd() again. In our Python2 implementation, precmd() expects them to be of str type and decodes them into unicode type. However, the output type of sqlparse.split() is unicode, which is already decoded. Calling decode() on a unicode var makes Python2 implicitly encode it to str first. This may cause a UnicodeEncodeError since the implicit encoding uses 'ascii'.

Regression 2: multi-line query with unicode chars will break when command history is enabled

In _check_for_command_completion(), when calling readline.replace_history_item in Python2, we encode the completed_cmd into bytes. However, we shouldn't replace completed_cmd itself, since the return type is expected to be unicode.

Tests:
- Add tests for these two regressions in Python2.

Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4 --- M shell/impala_shell.py M tests/shell/test_shell_interactive.py 2 files changed, 30 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16960/1 -- To view, visit http://gerrit.cloudera.org:8080/16960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icc4a8d31311a5c59e5fc0e65fe09f770df41bea4 Gerrit-Change-Number: 16960 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
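The first regression in this change can be reproduced in miniature. The sketch below is illustrative only and runs on Python 3: `py2_style_decode` is a hypothetical helper that imitates what Python 2 does when decode() is called on a unicode value (an implicit 'ascii' encode first), and `to_text` is a hypothetical guard showing one way to decode only real bytes. Neither name comes from impala_shell.py, and the actual patch may be structured differently.

```python
# -*- coding: utf-8 -*-
def py2_style_decode(text, encoding="utf-8"):
    """Imitate Python 2's unicode.decode(): implicitly encode with 'ascii' first."""
    return text.encode("ascii").decode(encoding)


def to_text(value, encoding="utf-8"):
    """Hypothetical guard: decode bytes, pass already-decoded text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Already text, like the pieces returned by sqlparse.split().
query = u"select '\u4f60\u597d'"

try:
    py2_style_decode(query)
    failed = False
except UnicodeEncodeError:
    # Non-ASCII characters cannot pass through the implicit 'ascii' encode.
    failed = True

# Bytes, like raw input that has not been decoded yet.
raw = query.encode("utf-8")
```

With the guard, both `to_text(query)` and `to_text(raw)` yield the same text value, so a query that is re-queued after splitting can safely pass through the same code path a second time.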
[Impala-ASF-CR] IMPALA-10296: Fix analytic limit pushdown when predicates are present
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16942 ) Change subject: IMPALA-10296: Fix analytic limit pushdown when predicates are present .. Patch Set 11: (12 comments)

Sending comments based on 1st pass of the planner changes.

http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@9
PS11, Line 9: when
nit: remove

http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@22
PS11, Line 22: limit
did you mean partition limit >= order by limit?

http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@32
PS11, Line 32: was
nit: 'is'

http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@40
PS11, Line 40: This patch implements tie handling in the backend (I took most
I had previously wondered about how the planner and backend work for this functionality would be combined but it now starts falling into place with the handling of the duplicates :-)

http://gerrit.cloudera.org:8080/#/c/16942/11//COMMIT_MSG@68
PS11, Line 68: The for
nit: 'The elapsed time for..' ?

http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@507
PS11, Line 507: upper top-n.
Since we do a distributed top-n, there are 3 top-n's in the plan and the 'upper top-n' may be confusing. Here it refers to the outermost top-n or final top-n or something similar.

http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@511
PS11, Line 511: doesn't matter
nit: worth clarifying that it doesn't matter for the purpose of the pushdown decision.

http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@530
PS11, Line 530: include all of the rows in the final
This does not literally mean all rows in the final partition, right? Should it be all eligible rows or all relevant rows? (based on the P+N value)

http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@531
PS11, Line 531: was
nit: 'we'

http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@603
PS11, Line 603: if (analyticLimit < limit) return falseStatus;
One special case where this could work is if each partition had a maximum of analyticLimit rows. Then we know we are not excluding rows as we iterate through the partitions until we reach the final limit. Of course, this knowledge is not readily available or may not be relied upon due to ndv estimates, but in practice I suspect this may not be uncommon. For this patch, it makes sense to be conservative in applying the optimization.

http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/16942/11/fe/src/main/java/org/apache/impala/planner/SortNode.java@83
PS11, Line 83: row.s
nit: 'rows'

http://gerrit.cloudera.org:8080/#/c/16942/11/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test
File testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test:

http://gerrit.cloudera.org:8080/#/c/16942/11/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test@1139
PS11, Line 1139: # rank() predicate is not pushed down because TOPN_BYTES_LIMIT prevents conversion
The plan shows the lower top-n, which indicates the rank was pushed down. Perhaps the TOPN_BYTES_LIMIT should be even smaller if this is supposed to be a negative test.

-- To view, visit http://gerrit.cloudera.org:8080/16942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I801d7799b0d649c73d2dd1703729a9b58a662509 Gerrit-Change-Number: 16942 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 17 Jan 2021 00:36:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 48: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8012/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 48 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sat, 16 Jan 2021 18:12:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#48). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate ..

IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

This patch adds a new class of predicates called overlap predicates to aid in determining whether a Parquet row group or page overlaps with a range computed from an equi hash join. If it does not, the entire row group or page is skipped. A row that survives this way can then be subjected to a row-level overlap test against the same overlap predicate.

For the following query, the min and max in the overlap predicate are computed with the values from the join column from table 'b'. To evaluate the overlap predicate, these two values are compared against the min/max of each row group or page at the scan node for 'a'.

select straight_join count(*) from lineitem a join [SHUFFLE] lineitem b where a.l_shipdate = b.l_receiptdate and b.l_commitdate = "1992-01-31";

An overlap predicate associated with the column type J (in hash table) and scan column type S will be formed when one of the following is true:
- Both J and S are booleans
- Both J and S are integers (tinyint, smallint, int, or bigint)
- Both J and S are approximate numeric (float or double)
- Both J and S are decimals with the same precision and scale
- Both J and S are strings (STRING, CHAR or VARCHAR)
- Both J and S are date
- Both J and S are timestamp

The overlap predicate is implemented as a min/max filter. Unlike existing min/max filters, the MAX_NUM_RUNTIME_FILTERS query option does not apply to min/max filters created for overlap predicates. An overlap predicate will be evaluated as long as the overlap ratio is less than a threshold specified in a new query option 'minmax_filter_threshold'. Setting the threshold to its minimal value 0.0 disables the feature, and setting it to the maximal value 1.0 applies the filtering in all cases.

In addition, two new run-time profile counters are added to report the number of row groups or pages filtered out via the overlap predicates respectively:
1. NumRuntimeFilteredRowGroups
2. NumRuntimeFilteredPages

Testing:
1. Unit tested on various column types with TPCH and TPCDS tables. Benefits were significant when the join column on the outer table is sorted and there exist many row groups or pages not overlapping with the implementing min/max filters;
2. Added new tests in min_max_filters.test to demonstrate the number of filtered out pages and row groups with the two new profile counters;
3. Added new tests in runtime-filter-propagation.test to demonstrate that the overlap predicates work with different column types;
4. Added data type specific overlap method tests in min-max-filter-test.cc;
5. Core testing;
6. Performance measurement.

To do in follow-up JIRAs:
1. Improve filtering efficiency;
2. Apply the overlap predicate on partition columns;
3. IR code-gen for various MinMaxFilter::EvalOverlap methods.

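The row-group skipping idea described in this commit message can be sketched as a plain interval test. This is a minimal illustration, not the backend implementation: `Range`, `overlaps`, and `surviving_row_groups` are hypothetical names, integer bounds stand in for any comparable column type, and the threshold controlled by minmax_filter_threshold is not modeled.

```python
from typing import List, NamedTuple


class Range(NamedTuple):
    """Closed interval [lo, hi]; hypothetical stand-in for min/max statistics."""
    lo: int
    hi: int


def overlaps(stats: Range, filt: Range) -> bool:
    """True when the two closed intervals share at least one value."""
    return stats.lo <= filt.hi and filt.lo <= stats.hi


def surviving_row_groups(row_group_stats: List[Range], filt: Range) -> List[int]:
    """Indexes of row groups that must still be scanned; the rest are skipped."""
    return [i for i, stats in enumerate(row_group_stats) if overlaps(stats, filt)]


# Min/max of the scan column per row group (sorted data gives disjoint ranges).
row_groups = [Range(1, 10), Range(11, 20), Range(21, 30)]

# Min/max of the join column values collected on the build side of the join.
join_filter = Range(12, 15)

kept = surviving_row_groups(row_groups, join_filter)
```

With these sample ranges only the second row group overlaps the filter, which mirrors the observation in the testing notes that the benefit is largest when the scanned column is sorted.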
Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/runtime/date-value.cc M be/src/runtime/date-value.h M be/src/runtime/raw-value.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/string-value.cc M be/src/runtime/string-value.h M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-ir.cc M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java M fe/src/main/java/org/apache/impala/analysis/Predicate.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit-large.test M testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test A testdata/workloads/functional-planner/queries/PlannerTest/disable-overlap-filter.test M testdata/workloads