[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8804/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 27 May 2021 01:46:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8803/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 27 May 2021 01:43:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Qifan Chen has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables This patch enables min/max filters for equi-joins on sort by columns in a Parquet table created by Impala. This is to take advantage of Impala sorting the min/max values in column index in each data file for the table. When there are multiple sort by columns in the table, only the leading column will be assigned a min/max filter. The control knob is query option minmax_filter_sorted_columns, default to true. When minmax_filter_sorted_columns is true and the threshold (query option minmax_filter_threshold) is 0, the patch automatically assigns a reasonable value for the threshhold, and selects PAGE to be the filtering level (query option minmax_filtering_level). When the threshold is greater than 0, no adjustment will be made to either the threshold or the filtering level. When the min and max column stats exist on the leading sort column, these stats can be used to help select filters that are most likely helpful. When minmax_filter_sorted_columns is set to false, no min/max filters will be specifically assigned to the leading sort by columns. In the backend, the skiped pages are quickly identified by finding the lower and the upper bounds in the sorted min and max value arrays, given the min and max range in the filter. Testing: 1). Added two new tests in overlap_min_max_filters.test to verify a) Min/max filters are only created for leading sort by column; b) Query option minmax_filter_sorted_columns works. 2). Added new tests in parquet-page-index-test.cc to cover the fast code path; 3). Core [TBD] 4). Performance [TBD] Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-common.cc M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-page-index-test.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test 16 files changed, 605 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/20 -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/17478/19/be/src/exec/parquet/parquet-page-index-test.cc File be/src/exec/parquet/parquet-page-index-test.cc: http://gerrit.cloudera.org:8080/#/c/17478/19/be/src/exec/parquet/parquet-page-index-test.cc@264 PS19, Line 264: // Change the min/max values and the range in the filter: line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 27 May 2021 01:22:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Qifan Chen has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables This patch enables min/max filters for equi-joins on sort by columns in a Parquet table created by Impala. This is to take advantage of Impala sorting the min/max values in column index in each data file for the table. When there are multiple sort by columns in the table, only the leading column will be assigned a min/max filter. The control knob is query option minmax_filter_sorted_columns, default to true. When minmax_filter_sorted_columns is true and the threshold (query option minmax_filter_threshold) is 0, the patch automatically assigns a reasonable value for the threshhold, and selects PAGE to be the filtering level (query option minmax_filtering_level). When the threshold is greater than 0, no adjustment will be made to either the threshold or the filtering level. When the min and max column stats exist on the leading sort column, these stats can be used to help select filters that are most likely helpful. When minmax_filter_sorted_columns is set to false, no min/max filters will be specifically assigned to the leading sort by columns. In the backend, the skiped pages are quickly identified by finding the lower and the upper bounds in the sorted min and max value arrays, given the min and max range in the filter. Testing: 1). Added two new tests in overlap_min_max_filters.test to verify a) Min/max filters are only created for leading sort by column; b) Query option minmax_filter_sorted_columns works. 2). Added new tests in parquet-page-index-test.cc to cover the fast code path; 3). Core [TBD] 4). Performance [TBD] Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-common.cc M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-page-index-test.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test 16 files changed, 608 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/19 -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17521 ) Change subject: IMPALA-10700: Add query options to skip deleting stats .. Patch Set 1: Code-Review+2 (7 comments) LGTM. Just have some minor comments. Feel free to forward my +2. http://gerrit.cloudera.org:8080/#/c/17521/1/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/17521/1/be/src/service/query-options.h@248 PS1, Line 248: QUERY_OPT_FN(skip_delete_table_stats, SKIP_DELETE_TABLE_STATS,\ : TQueryOptionLevel::ADVANCED)\ : QUERY_OPT_FN(skip_delete_column_stats, SKIP_DELETE_COLUMN_STATS,\ : TQueryOptionLevel::ADVANCED) nit: Can we mention TRUNCATE in the names since these only affects TRUNCATE? e.g. SKIP_DELETE_TABLE_STATS_IN_TRUNCATE SKIP_DELETE_COLUMN_STATS_IN_TRUNCATE http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@184 PS1, Line 184: import org.apache.impala.thrift.TQueryOptions; nit: unused import http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2193 PS1, Line 2193: to take nit: after taking http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2207 PS1, Line 2207: to truncate nit: after truncating http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2213 PS1, Line 2213: to create nit: after creating http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2239 PS1, Line 2239: to unset nit: after unsetting http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2312 PS1, Line 2312: if (!isTableBeingReplicated) { Ah, I think this fixes an existing bug. It'd be good to mention it in the commit message as well. -- To view, visit http://gerrit.cloudera.org:8080/17521 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519 Gerrit-Change-Number: 17521 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 27 May 2021 01:11:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17521 ) Change subject: IMPALA-10700: Add query options to skip deleting stats .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7177/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17521 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519 Gerrit-Change-Number: 17521 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 27 May 2021 00:52:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8802/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 27 May 2021 00:29:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17521 ) Change subject: IMPALA-10700: Add query options to skip deleting stats .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8801/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17521 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519 Gerrit-Change-Number: 17521 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 27 May 2021 00:22:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. Patch Set 18: (3 comments) http://gerrit.cloudera.org:8080/#/c/17478/18/be/src/exec/parquet/parquet-page-index-test.cc File be/src/exec/parquet/parquet-page-index-test.cc: http://gerrit.cloudera.org:8080/#/c/17478/18/be/src/exec/parquet/parquet-page-index-test.cc@165 PS18, Line 165: // Expect to skip no pages. line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17478/18/be/src/exec/parquet/parquet-page-index-test.cc@182 PS18, Line 182: // Expect to skip all pages. line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17478/18/be/src/exec/parquet/parquet-page-index-test.cc@194 PS18, Line 194: // Expect to skip all pages. line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Thu, 27 May 2021 00:07:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Qifan Chen has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables This patch enables min/max filters for equi-joins on sort by columns in a Parquet table created by Impala. This is to take advantage of Impala sorting the min/max values in column index in each data file for the table. When there are multiple sort by columns in the table, only the leading column will be assigned a min/max filter. The control knob is query option minmax_filter_sorted_columns, default to true. When minmax_filter_sorted_columns is true and the threshold (query option minmax_filter_threshold) is 0, the patch automatically assigns a reasonable value for the threshhold, and selects PAGE to be the filtering level (query option minmax_filtering_level). When the threshold is greater than 0, no adjustment will be made to either the threshold or the filtering level. When the min and max column stats exist on the leading sort column, these stats can be used to help select filters that are most likely helpful. When minmax_filter_sorted_columns is set to false, no min/max filters will be specifically assigned to the leading sort by columns. In the backend, the skiped pages are quickly identified by finding the lower and the upper bounds in the sorted min and max value arrays, given the min and max range in the filter. Testing: 1). Added two new tests in overlap_min_max_filters.test to verify a) Min/max filters are only created for leading sort by column; b) Query option minmax_filter_sorted_columns works. 2). Added new tests in parquet-page-index-test.cc to cover the fast code path; 3). Core [TBD] 4). Performance [TBD] Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-common.cc M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-page-index-test.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test 16 files changed, 529 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/18 -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17521 ) Change subject: IMPALA-10700: Add query options to skip deleting stats .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2239 PS1, Line 2239: "Time elapsed to unset partition and column statistics for table {}: {} msec", line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17521 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519 Gerrit-Change-Number: 17521 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 27 May 2021 00:02:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats
Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17521 Change subject: IMPALA-10700: Add query options to skip deleting stats .. IMPALA-10700: Add query options to skip deleting stats When a truncate table command is issued, in case of non-transactional tables, the table and column statistics for the table are also deleted by default. This can be a expensive operation especially when many truncate table commands are running concurrently. As the concurrency increases, the response time from Hive metastore slows down the delete table and column statistics RPC calls. In cases where truncate operation is to used to remove the existing data and then reload new data, it is likely that users will compute stats again as soon as the new data is reloaded. This would overwrite the existing statistics and hence the additional time spent by the truncate operation to delete column and table statistics becomes unnecessary. To improve this, this change introduces 2 new query options: 1. SKIP_DELETE_TABLE_STATS 2. SKIP_DELETE_COLUMN_STATS As the names suggest, when these query options are set to true or 1, a truncate operation will not delete the respective statistics for the table. Testing: Modified truncate-table.test to include variations of these query options and making sure that the statistics are deleted or skipped from deletion after truncate operation. Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/JniCatalog.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test 8 files changed, 424 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17521/1 -- To view, visit http://gerrit.cloudera.org:8080/17521 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519 Gerrit-Change-Number: 17521 Gerrit-PatchSet: 1 Gerrit-Owner: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17484 ) Change subject: IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge() .. IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge() In gperftools 2.5, PageHeap::AllocLarge() has O(n) behavior. This switches to a patched version of gperftools 2.5 that changes the behavior to O(log n). This corresponds to these commits: https://github.com/gperftools/gperftools/commit/06c9414ec423ffe442c047b2560555f9d5847b1d https://github.com/gperftools/gperftools/commit/f1d3fe4a21e339a3fd6e4592ee784a7b92dc Testing: - Core job Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f Reviewed-on: http://gerrit.cloudera.org:8080/17484 Tested-by: Impala Public Jenkins Reviewed-by: Kurt Deschler Reviewed-by: Joe McDonnell --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Kurt Deschler: Looks good to me, but someone else must approve Joe McDonnell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f Gerrit-Change-Number: 17484 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler
[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17484 ) Change subject: IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge() .. Patch Set 2: Code-Review+2 Going ahead with this and bumping to +2 Performance testing of TPC-DS against Kudu showed no performance change. Tests of TPC-H also showed no performance change. -- To view, visit http://gerrit.cloudera.org:8080/17484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f Gerrit-Change-Number: 17484 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Wed, 26 May 2021 23:51:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8800/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 15 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 26 May 2021 23:42:05 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17482 ) Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge() .. IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge() In gperftools 2.5, PageHeap::AllocLarge() has O(n) behavior. This patches gperftool 2.5 to incorporate the patches that change PageHeap::AllocLarge() to have O(log n) searching: https://github.com/gperftools/gperftools/commit/06c9414ec423ffe442c047b2560555f9d5847b1d https://github.com/gperftools/gperftools/commit/f1d3fe4a21e339a3fd6e4592ee784a7b92dc This also adds a build for gperftools 2.8.1 to allow future development/testing for actually upgrading gperftools (IMPALA-6784). Testing: - Ran a toolchain build with this - Ran Impala locally with the resulting gperftools-2.5-p4 Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2 Reviewed-on: http://gerrit.cloudera.org:8080/17482 Tested-by: Joe McDonnell Reviewed-by: Kurt Deschler Reviewed-by: Joe McDonnell --- M buildall.sh A source/gperftools/gperftools-2.5-patches/0003-Implemented-O-log-n-searching-among-large-spans.patch A source/gperftools/gperftools-2.5-patches/0004-refactored-handling-of-reverse-span-set-iterator-for.patch A source/gperftools/gperftools-2.8.1-patches/0001-Port-gperftools-to-adapt-aarch64.patch 4 files changed, 873 insertions(+), 1 deletion(-) Approvals: Joe McDonnell: Looks good to me, approved; Verified Kurt Deschler: Looks good to me, but someone else must approve -- To view, visit http://gerrit.cloudera.org:8080/17482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2 Gerrit-Change-Number: 17482 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal
[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17482 ) Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge() .. Patch Set 1: Bumped to +2, everything is passing on the Impala side -- To view, visit http://gerrit.cloudera.org:8080/17482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2 Gerrit-Change-Number: 17482 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 26 May 2021 23:48:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. Patch Set 15: (2 comments) http://gerrit.cloudera.org:8080/#/c/17298/15/tests/custom_cluster/test_metastore_service.py File tests/custom_cluster/test_metastore_service.py: http://gerrit.cloudera.org:8080/#/c/17298/15/tests/custom_cluster/test_metastore_service.py@443 PS15, Line 443: e flake8: E501 line too long (96 > 90 characters) http://gerrit.cloudera.org:8080/#/c/17298/15/tests/custom_cluster/test_metastore_service.py@444 PS15, Line 444: c flake8: E501 line too long (93 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 15 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 26 May 2021 23:21:22 + Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17482 ) Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge() .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2 Gerrit-Change-Number: 17482 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 26 May 2021 23:47:57 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17482 ) Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge() .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2 Gerrit-Change-Number: 17482 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 26 May 2021 23:41:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17298 to look at the new patch set (#15). Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis For transactional tables, catalogd already guarantees consitent table metadata reads based on the writeIdList passed in the request. For non transactional tables, the reads are eventually consistent as in event processor thread in the background, processes HMS events for the table and updates its metadata. In this patch, to ensure strong consistency guarantees for external tables,we invalidate the table metadata from cache if HMS DDL apis like alter/drop table/partition are accessed from catalogd's metastore server. As a result of which, any subsequent get table request fetches the table from HMS and loads it in cache. This ensures that any get_table/get_partition requests after DDL operations on same table return updated table metadata. This behavior has a performance penalty since metadata loading in cache takes time specially for large tables. The change is behind catalogd server's flag: invalidate_hms_cache_on_ddls which is enabled by default. The flag needs to be turned off in case of a performance bottleneck. Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M tests/custom_cluster/test_metastore_service.py 6 files changed, 597 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/15 -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 15 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17484 ) Change subject: IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge() .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f Gerrit-Change-Number: 17484 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Wed, 26 May 2021 23:40:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. Patch Set 14: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8799/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 14 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 26 May 2021 22:35:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17298 ) Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. Patch Set 14: (2 comments) http://gerrit.cloudera.org:8080/#/c/17298/14/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java: http://gerrit.cloudera.org:8080/#/c/17298/14/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2978 PS14, Line 2978: private void renameNonTransactionalTableIfExists(String oldDbNameWithCatalog, String oldTableName, line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/17298/14/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2979 PS14, Line 2979:String newDbNameWithCatalog, String newTableName, line too long (100 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 14 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 26 May 2021 22:25:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17298 to look at the new patch set (#14). Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis .. IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis For transactional tables, catalogd already guarantees consitent table metadata reads based on the writeIdList passed in the request. For non transactional tables, the reads are eventually consistent as in event processor thread in the background, processes HMS events for the table and updates its metadata. In this patch, to ensure strong consistency guarantees for external tables,we invalidate the table metadata from cache if HMS DDL apis like alter/drop table/partition are accessed from catalogd's metastore server. As a result of which, any subsequent get table request fetches the table from HMS and loads it in cache. This ensures that any get_table/get_partition requests after DDL operations on same table return updated table metadata. This behavior has a performance penalty since metadata loading in cache takes time specially for large tables. The change is behind catalogd server's flag: invalidate_hms_cache_on_ddls which is enabled by default. The flag needs to be turned off in case of a performance bottleneck. Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 --- M be/src/catalog/catalog-server.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M tests/custom_cluster/test_metastore_service.py 6 files changed, 596 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/14 -- To view, visit http://gerrit.cloudera.org:8080/17298 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98 Gerrit-Change-Number: 17298 Gerrit-PatchSet: 14 Gerrit-Owner: Sourabh Goyal Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17484 ) Change subject: IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge() .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f Gerrit-Change-Number: 17484 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Wed, 26 May 2021 22:13:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8798/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Wed, 26 May 2021 20:24:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. Patch Set 17: (3 comments) http://gerrit.cloudera.org:8080/#/c/17478/17/be/src/exec/parquet/parquet-page-index-test.cc File be/src/exec/parquet/parquet-page-index-test.cc: http://gerrit.cloudera.org:8080/#/c/17478/17/be/src/exec/parquet/parquet-page-index-test.cc@165 PS17, Line 165: // Expect to skip no pages. line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17478/17/be/src/exec/parquet/parquet-page-index-test.cc@170 PS17, Line 170: // Expect to skip all pages. line has trailing whitespace http://gerrit.cloudera.org:8080/#/c/17478/17/be/src/exec/parquet/parquet-page-index-test.cc@182 PS17, Line 182: // Expect to skip all pages. line has trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Wed, 26 May 2021 20:04:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables
Qifan Chen has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/17478 ) Change subject: IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables .. IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables This patch enables min/max filters for equi-joins on sort by columns in a Parquet table created by Impala. This is to take advantage of Impala sorting the min/max values in column index in each data file for the table. When there are multiple sort by columns in the table, only the leading column will be assigned a min/max filter. The control knob is query option minmax_filter_sorted_columns, default to true. When minmax_filter_sorted_columns is true and the threshold (query option minmax_filter_threshold) is 0, the patch automatically assigns a reasonable value for the threshhold, and selects PAGE to be the filtering level (query option minmax_filtering_level). When the threshold is greater than 0, no adjustment will be made to either the threshold or the filtering level. When the min and max column stats exist on the leading sort column, these stats can be used to help select filters that are most likely helpful. When minmax_filter_sorted_columns is set to false, no min/max filters will be specifically assigned to the leading sort by columns. In the backend, the skiped pages are quickly identified by finding the lower and the upper bounds in the sorted min and max value arrays, given the min and max range in the filter. Testing: 1). Added two new tests in overlap_min_max_filters.test to verify a) Min/max filters are only created for leading sort by column; b) Query option minmax_filter_sorted_columns works. 2). Added new tests in parquet-page-index-test.cc to cover the fast code path; 3). Core [TBD] 4). Performance [TBD] Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 --- M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-common.cc M be/src/exec/parquet/parquet-common.h M be/src/exec/parquet/parquet-page-index-test.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test 16 files changed, 515 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/17 -- To view, visit http://gerrit.cloudera.org:8080/17478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963 Gerrit-Change-Number: 17478 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen
[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17469 ) Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation .. Patch Set 2: Verified+1 Build Successful https://jenkins.impala.io/job/gerrit-docs-auto-test/632/ : Doc tests passed. -- To view, visit http://gerrit.cloudera.org:8080/17469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d Gerrit-Change-Number: 17469 Gerrit-PatchSet: 2 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 26 May 2021 18:15:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17469 ) Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation .. Patch Set 2: I talked with Shajini, and we agreed that I could go ahead and address my comments with a new upload. -- To view, visit http://gerrit.cloudera.org:8080/17469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d Gerrit-Change-Number: 17469 Gerrit-PatchSet: 2 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 26 May 2021 18:09:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17469 ) Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation .. Patch Set 2: Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/632/ Testing docs change - this change appears to modify docs/ and no code. This is experimental - please report any issues to tarmstr...@cloudera.com or on this JIRA: IMPALA-7317 -- To view, visit http://gerrit.cloudera.org:8080/17469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d Gerrit-Change-Number: 17469 Gerrit-PatchSet: 2 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 26 May 2021 18:08:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation
Joe McDonnell has uploaded a new patch set (#2) to the change originally created by Shajini Thayasingh. ( http://gerrit.cloudera.org:8080/17469 ) Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation .. IMPALA-9770: [DOCS] Remove Sentry references in documentation Updated all the associated topics. Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d --- M docs/shared/impala_common.xml M docs/topics/impala_adls.xml M docs/topics/impala_alter_database.xml M docs/topics/impala_alter_table.xml M docs/topics/impala_alter_view.xml M docs/topics/impala_authorization.xml M docs/topics/impala_create_role.xml M docs/topics/impala_delegation.xml M docs/topics/impala_drop_role.xml M docs/topics/impala_grant.xml M docs/topics/impala_insert.xml M docs/topics/impala_invalidate_metadata.xml M docs/topics/impala_kudu.xml M docs/topics/impala_langref_unsupported.xml M docs/topics/impala_ldap.xml M docs/topics/impala_lineage.xml M docs/topics/impala_logging.xml M docs/topics/impala_refresh.xml M docs/topics/impala_refresh_authorization.xml M docs/topics/impala_revoke.xml M docs/topics/impala_scaling_limits.xml M docs/topics/impala_security.xml M docs/topics/impala_security_files.xml M docs/topics/impala_show.xml M docs/topics/impala_ssl.xml 25 files changed, 156 insertions(+), 354 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17469/2 -- To view, visit http://gerrit.cloudera.org:8080/17469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d Gerrit-Change-Number: 17469 Gerrit-PatchSet: 2 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17469 ) Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation .. Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/17469/1/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/17469/1/docs/shared/impala_common.xml@4579 PS1, Line 4579:Impala now does not support privileges of DELETE, : UPDATE, and UPSERT operations. > I'm thinking this sentence is already covered in the sentence just above it Removed http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_database.xml File docs/topics/impala_alter_database.xml: http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_database.xml@a68 PS1, Line 68: > Thanks Joe! The syntax is supported however the functionality is not due to Since it doesn't work, we'll remove it for now until IMPALA-10712 is fixed. http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_table.xml File docs/topics/impala_alter_table.xml: http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_table.xml@a74 PS1, Line 74: > Thanks Joe! The syntax is supported however the functionality is not due to Since it doesn't work, we'll remove it for now until IMPALA-10712 is fixed. http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_table.xml@a322 PS1, Line 322: > Thanks Joe! The syntax is supported however the functionality is not due to Since it doesn't work, we'll remove it for now until IMPALA-10712 is fixed. http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_view.xml File docs/topics/impala_alter_view.xml: http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_view.xml@a71 PS1, Line 71: > Thanks Joe! The syntax is supported however the functionality is not due to Since it doesn't work, we'll remove it for now until IMPALA-10712 is fixed. http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_authorization.xml File docs/topics/impala_authorization.xml: http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_authorization.xml@164 PS1, Line 164: before starting Impala cluster > I think we can omit this. Done http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_authorization.xml@229 PS1, Line 229: fe/src/test/resources/ > I don't think we should use this specific path. This is true for an Impala Returned to the old wording. http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_authorization.xml@306 PS1, Line 306: The following examples show how to set up authorization to deal with various scenarios : and how to grant privileges on objects to groups of users via roles, but note that you : could also grant privileges on objects to a user or a group directly without involving a : role. > This is a very long sentence with a lot going on. Let's cut it down: Done http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_grant.xml File docs/topics/impala_grant.xml: http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_grant.xml@101 PS1, Line 101: belonging to a group > Nit: I don't think this addition gets us much. The original phrase was clea Cut this out. http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_show.xml File docs/topics/impala_show.xml: http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_show.xml@38 PS1, Line 38: The following statements are supported in Impala through Ranger to : manage authorization. > Two things here: Removed this -- To view, visit http://gerrit.cloudera.org:8080/17469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d Gerrit-Change-Number: 17469 Gerrit-PatchSet: 1 Gerrit-Owner: Shajini Thayasingh Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Comment-Date: Wed, 26 May 2021 18:08:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17389 ) Change subject: IMPALA-10680: Replace StringToFloatInternal using fast_double_parser library .. Patch Set 6: > Agree that the performance numbers are OK WRT strings of 20 or 30 > bytes long. > > Would it be possible to make another performance by calling > strtod() instead? > > According to the performance numbers documented here > (https://github.com/lemire/fast_double_parser), fast_double is > about 9.4x faster than strtod, tested with Apple Clang. > > I just curious how fast fast_double is in Impala, compared to > strtod(). So for both 20 and 30 length string fast_double_parser also uses strtod so not any noticeable difference observed. On checking code, they specify that more than 19 digits will generally cause overflow and are highly unlikely to occur so they just send it to strtod(). Just that even for strtod we need a null terminated string. So I think short string optimisation should work well here as longer strings are not likely. -- To view, visit http://gerrit.cloudera.org:8080/17389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141 Gerrit-Change-Number: 17389 Gerrit-PatchSet: 6 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 26 May 2021 17:58:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10502: Handle CREATE/DROP events correctly
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17308 ) Change subject: IMPALA-10502: Handle CREATE/DROP events correctly .. Patch Set 7: I realized that for dynamically generated the partitions in the inserts we need to handle the ADD_PARTITION events as well. I will upload the patch once I handle that. -- To view, visit http://gerrit.cloudera.org:8080/17308 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a Gerrit-Change-Number: 17308 Gerrit-PatchSet: 7 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sourabh Goyal Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 26 May 2021 17:29:45 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17482 ) Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge() .. Patch Set 1: This already builds and works with Impala, adding +1 Verified -- To view, visit http://gerrit.cloudera.org:8080/17482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2 Gerrit-Change-Number: 17482 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 26 May 2021 16:37:16 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17482 ) Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge() .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2 Gerrit-Change-Number: 17482 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Comment-Date: Wed, 26 May 2021 16:37:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17484 ) Change subject: IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge() .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7176/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17484 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f Gerrit-Change-Number: 17484 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Wed, 26 May 2021 16:26:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17466 ) Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8797/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17466 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead Gerrit-Change-Number: 17466 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 26 May 2021 16:24:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API
Hello wangsheng, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17466 to look at the new patch set (#5). Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API .. IMPALA-10626: Add support for Iceberg's Catalogs API Iceberg recently switched to use its Catalogs class to define catalog and table properties. Catalog information is stored in a configuration file such as hive-site.xml. And the table properties contain information about which catalog is being used and what is the Iceberg table id. E.g. in the Hive conf we can have the following properties to define catalogs: iceberg.catalog..type = hadoop iceberg.catalog..warehouse = somelocation or iceberg.catalog..type = hive And at the table level we can have the following: iceberg.catalog = name = Table property 'iceberg.catalog' refers to a Catalog defined in the configuration file. This is in contradiction with Impala's current behavior where we are already using 'iceberg.catalog', and it can have the following values: * hive.catalog for HiveCatalog * hadoop.catalog for HadoopCatalog * hadoop.tables for HadoopTables To be backward-compatible and also support the new Catalogs properties Impala still recognizes the above special values. But, from now Impala doesn't define 'iceberg.catalog' by default. 'iceberg.catalog' being NULL means HiveCatalog for both Impala and Iceberg's Catalogs API, hence for Hive and Spark as well. If 'iceberg.catalog' has a different value than the special values it indicates that Iceberg's Catalogs API is being used, so Impala will try to look up the catalog configuration from the Hive config file. Testing: * added SHOW CREATE TABLE tests * added e2e tests that create/insert/drop Iceberg tables with Catalogs * manually tested interop behavior with Hive Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M fe/src/test/resources/hive-site.xml.py A testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test M tests/query_test/test_iceberg.py 15 files changed, 476 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17466/5 -- To view, visit http://gerrit.cloudera.org:8080/17466 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead Gerrit-Change-Number: 17466 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. Patch Set 7: (10 comments) http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.h File be/src/runtime/io/scan-buffer-manager.h: http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.h@136 PS7, Line 136: void SetCachedBuffer() { nit. may inline. http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.h@140 PS7, Line 140: void SetClientBuffer() { same as above. http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.h@144 PS7, Line 144: void SetNoBuffer() { same as above http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h File be/src/runtime/io/scan-buffer-manager.h: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@169 PS4, Line 169: iomgr_buffer_cu > That's the name from the existing code and it still is a tag used for exter Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc File be/src/runtime/io/scan-buffer-manager.cc: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@199 PS4, Line 199: << " cancel_status: > Thats a precondition i.e., callers of PopReadyBuffer (like ScanRange::GetNe Done http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.cc File be/src/runtime/io/scan-buffer-manager.cc: http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.cc@28 PS7, Line 28: // Implementation of the buffer management for ScanRange. Each ScanRange contains a queue : // of ready buffers. For each ScanRange, there is only a single producer and consumer : // thread, i.e. only one disk thread will push to a scan range at any time and only one : // thread will remove from the queue. This is to guarantee that buffers are queued and : // read in file order. nit. This para describes the class very well. Maybe move it to the .h file? http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.cc@138 PS7, Line 138: DCHECK(scan_range_->is_locked(scan_range_lock)); Passing the scan_range_lock and calling DCHECK() is the price we pay for the refactoring. Just wonder if scan_range_lock can be removed entirely. That is, we assume the lock is acquired. The exception is the CloseReader() call below takes the lock. I was not able to find the implementation of the method. Maybe the lock argument is not needed? http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.cc@186 PS7, Line 186: for (auto& buffer : unused_iomgr_buffers_) nit. Should add '{' here because the statement below is not at the same line. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@427 PS4, Line 427: int disk_id, bool expected_local, int64_t mtime, const BufferOpts& buffer_opts, : vector&& sub_ranges, void* meta_data, DiskFile > This is an assignment not a check. I guess you meant to add a Setter which Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@432 PS4, Line 432: HECK(file != nullptr); : DCHECK_GE(len, 0); > This is an assignment not a check. I guess you meant to add a Setter which Done -- To view, visit http://gerrit.cloudera.org:8080/17413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 Gerrit-Change-Number: 17413 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 26 May 2021 15:06:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7501: Slim down partition meta in LocalCatalog mode
Hello Aman Sinha, Vihang Karajgaonkar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17505 to look at the new patch set (#3). Change subject: IMPALA-7501: Slim down partition meta in LocalCatalog mode .. IMPALA-7501: Slim down partition meta in LocalCatalog mode In LocalCatalog mode, the coordinator caches HMS partition objects in its local cache. HMS partition contains many fields that Impala won't used, and some fields that are suboptimal, most of them are in the StorageDescriptor: - partition-level schema (list) which is never used. - location string which can be prefix-compressed since the prefix is usually the table location. - input/outputFormat string which is always duplicated and can be represented by an integer(enum). An experiment on a large table with 478 columns and 87320 partitions (one non-empty file per partition) shows that more than 90% of the memory space (1.1GB) are occupied by these fields. The dominant part is the partition-level schema which consumed 76% of the cache. On the other hand, these unused or suboptimal fields are got in one response from catalogd, wrapped in TPartialPartitionInfo which finally belongs to a TGetPartialCatalogObjectResponse. They dramatically increase the serialized thrift object size of the response, which has a 2GB array size limit in JVM. Fetching meta of many partitions from catalogd could cause it runs into OOM error that hits the 2GB limit (e.g. IMPALA-9896). This patch extracts the HMS partition object and replaces it with the fields that Impala actually uses. In the LocalCatalog cache, the HMS partition object is replaced with - hms parameters - write id - HdfsStorageDescriptor which represents the input/output format and some delimiters. - prefix-compressed location The hms_partition field of TPartialPartitionInfo is also extracted with corresponding fields. However, CatalogHmsAPIHelper still requires the whole hms partition object. So the hms_partition field is kept for its usage. To distinguish the different requirements, we add a new field, want_hms_partition in TTableInfoSelector. The existing 'want_partition_metadata' field means returning these extracted fields, and the 'want_hms_partition' field means returning the whole HMS partition object. Improvement results in the above case: - reduce the heap usage from 1.1GB to 113.2MB, objects from 41m to 2.3m - reduce the response size from 1.7GB to 28.41MB. Tests: - Run local-catalog related tests locally - Run CORE tests Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977 --- M be/src/runtime/descriptors.cc M common/thrift/CatalogObjects.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/GetPartialCatalogObjectRequestBuilder.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java 18 files changed, 279 insertions(+), 134 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/17505/3 -- To view, visit http://gerrit.cloudera.org:8080/17505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977 Gerrit-Change-Number: 17505 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8796/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 Gerrit-Change-Number: 17413 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 26 May 2021 13:51:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Amogh Margoor has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr Currently BufferManagement is tightly coupled with ScanRange. Every ScanRange maintains list of unused buffers and ready buffers. Unused buffers are buffers used to read scanned data and ready buffers are buffers with the data already read. For managing these buffers, ScanRange defines various functions like AddUnusedBuffer, GetUsedBuffer, EnqueueReadyBuffer and functions to allocate and cleanup buffers. This patch has created ScanBufferManager which would be responsible for the managing these buffers for ScanRange. ScanBufferManager's logic is still coupled with the ScanRange, but refactorig it into a seperate class is a good first step. As ScanRange's lock is used to synchronize various functions including the buffer management, a new class for lock store has been created too. Testing: 1. Ran these existing tests: EE, BackEnd, JDBC and Cluster test. Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 --- M be/src/runtime/io/CMakeLists.txt M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-ranges.h A be/src/runtime/io/scan-buffer-manager.cc A be/src/runtime/io/scan-buffer-manager.h M be/src/runtime/io/scan-range.cc 7 files changed, 584 insertions(+), 230 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17413/7 -- To view, visit http://gerrit.cloudera.org:8080/17413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 Gerrit-Change-Number: 17413 Gerrit-PatchSet: 7 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8795/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 Gerrit-Change-Number: 17413 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 26 May 2021 11:30:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. Patch Set 5: (25 comments) http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h File be/src/runtime/io/request-ranges.h: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@250 PS4, Line 250: class ScanRange : public RequestRan > We need to specify what fields it protects in ScanRange. Or, we should stil Removed this class. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@252 PS4, Line 252: > nit. needs Code deleted. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@514 PS4, Line 514: > +1 on "ensuring no new locks are taken in ScanBufferManager methods. This is done now. No locks acquired in ScanBufferManager. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h File be/src/runtime/io/scan-buffer-manager.h: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@36 PS4, Line 36: ange using this buffer manager. > Actually there isn't much info at 'external_buffer_tag_'. Could you add com Have updated the documentation. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@40 PS4, Line 40: ///This buffer is external to this buffer manager and is not managed by it > nit. missing '.'. Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@47 PS4, Line 47: > nit. nullptr. Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@52 PS4, Line 52: > nit. should be 'scan_range_'. Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@53 PS4, Line 53: > nit. this scan range (i.e. scan_range_). Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@62 PS4, Line 62: > nit. scan_range_->len(). Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@106 PS4, Line 106: scan_range_loc > nit. PopFirstReadyBuffer() describes the semantics better. Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@109 PS4, Line 109: /// Clean > nit. should be inline. Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@131 PS4, Line 131: /// ScanRange using thi > Instead of the friend declaration, can we add public functions that ScanRan Have removed friend from both the classes. Neither ScanRange can access private methods/members of ScanBufferManager nor the other way is possible now. This also prevents ScanBufferManager taking ScanRange's lock_. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@136 PS4, Line 136: void SetCachedBuffer() { : external_buffer_tag_ = ExternalBuffe > I think having this pointer is a bit confusing, probably it'd be a bit clea This code is deleted now as discussed. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@169 PS4, Line 169: iomgr_buffer_cu > nit: maybe name it BufferTag, since the buffer is managed by this new class That's the name from the existing code and it still is a tag used for external buffers. External buffers are tagged as CLIENT_BUFFER and CACHED_BUFFER, whereas managed buffers NO_BUFFER (to depict that it is not an external buffer). http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@177 PS4, Line 177: > nit: add +2 indentation Done http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc File be/src/runtime/io/scan-buffer-manager.cc: http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@28 PS4, Line 28: mplementation of the buffer managem > This probably should be moved inside the cstr body, after DCHECK(). Code is deleted now. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@85 PS4, Line 85: nused_iomgr_buffer_bytes_ -= result->buffer_le > please update comment Done. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@135 PS4, Line 135: ::CleanUpBu > nit. can we replace goto with the actual code below? done. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@162 PS4, Line 162: Buff > this->scan_range? done. http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@165 PS4, Line 165: return false; : } : *buffer = move(ready_buffers_.front()); : ready_buffers_.pop_front(); : // If eosr is seen, then unused buffer should be empty. : DCHECK(!(*buffer)->eosr() ||
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17413/5/be/src/runtime/io/scan-buffer-manager.cc File be/src/runtime/io/scan-buffer-manager.cc: http://gerrit.cloudera.org:8080/#/c/17413/5/be/src/runtime/io/scan-buffer-manager.cc@89 PS5, Line 89: void ScanBufferManager::EnqueueReadyBuffer(const std::unique_lock& scan_range_lock, line too long (95 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 Gerrit-Change-Number: 17413 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 26 May 2021 11:11:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr
Amogh Margoor has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17413 ) Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr .. IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr Currently BufferManagement is tightly coupled with ScanRange. Every ScanRange maintains list of unused buffers and ready buffers. Unused buffers are buffers used to read scanned data and ready buffers are buffers with the data already read. For managing these buffers, ScanRange defines various functions like AddUnusedBuffer, GetUsedBuffer, EnqueueReadyBuffer and functions to allocate and cleanup buffers. This patch has created ScanBufferManager which would be responsible for the managing these buffers for ScanRange. ScanBufferManager's logic is still coupled with the ScanRange, but refactorig it into a seperate class is a good first step. As ScanRange's lock is used to synchronize various functions including the buffer management, a new class for lock store has been created too. Testing: 1. Ran these existing tests: EE, BackEnd, JDBC and Cluster test. Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 --- M be/src/runtime/io/CMakeLists.txt M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-ranges.h A be/src/runtime/io/scan-buffer-manager.cc A be/src/runtime/io/scan-buffer-manager.h M be/src/runtime/io/scan-range.cc 7 files changed, 585 insertions(+), 231 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17413/5 -- To view, visit http://gerrit.cloudera.org:8080/17413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97 Gerrit-Change-Number: 17413 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor Gerrit-Reviewer: Amogh Margoor Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10197 (Part 2): Add KUDU REPLICA SELECTION query option
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17502 ) Change subject: IMPALA-10197 (Part 2): Add KUDU_REPLICA_SELECTION query option .. IMPALA-10197 (Part 2): Add KUDU_REPLICA_SELECTION query option The previous patch added a new test case in PlannerTest for Kudu, which check for specific number of hosts and instances for distributed plan with query option KUDU_REPLICA_SELECTION set as LEADER_ONLY. However, the leadership isn't deterministic since each Kudu partition has its own Raft group and the leaders of these groups are not exactly balanced across hosts in a cluster. There's no guarantee that we'll get a certain number of leader hosts for a query which access multiple KUDU partitions. This makes the unit-test flaky. This patch removed the distributed plans to avoid nondeterminism. Testing: - Reran the Planner test and verified the issue did not happen. Change-Id: I3e23667c06c273a261e03de3d81fc7ee1f6b0682 Reviewed-on: http://gerrit.cloudera.org:8080/17502 Reviewed-by: Quanlong Huang Tested-by: Impala Public Jenkins --- M testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-closest-replica.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test 2 files changed, 11 insertions(+), 182 deletions(-) Approvals: Quanlong Huang: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/17502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3e23667c06c273a261e03de3d81fc7ee1f6b0682 Gerrit-Change-Number: 17502 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10197 (Part 2): Add KUDU REPLICA SELECTION query option
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17502 ) Change subject: IMPALA-10197 (Part 2): Add KUDU_REPLICA_SELECTION query option .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17502 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3e23667c06c273a261e03de3d81fc7ee1f6b0682 Gerrit-Change-Number: 17502 Gerrit-PatchSet: 5 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 26 May 2021 09:59:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17373 ) Change subject: IMPALA-10688: Implement ds_cpc_stringify() function .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Gerrit-Change-Number: 17373 Gerrit-PatchSet: 5 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 26 May 2021 06:41:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function
Gabor Kaszab has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17373 ) Change subject: IMPALA-10688: Implement ds_cpc_stringify() function .. IMPALA-10688: Implement ds_cpc_stringify() function This function receives a string that is a serialized Apache DataSketches CPC sketch and returns its stringified format. A stringified format should look like and contains the following data: select ds_cpc_stringify(ds_cpc_sketch(float_col)) from functional_parquet.alltypestiny; ++ | ds_cpc_stringify(ds_cpc_sketch(float_col)) | ++ | ### CPC sketch summary:| |lg_k : 11 | |seed hash : 93cc | |C : 2 | |flavor : 1 | |merged : true | |intresting col : 0 | |table entries : 2 | |window : not allocated | | ### End sketch summary | || ++ Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Reviewed-on: http://gerrit.cloudera.org:8080/17373 Tested-by: Impala Public Jenkins Reviewed-by: Gabor Kaszab --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test 4 files changed, 59 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Verified Gabor Kaszab: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17373 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285 Gerrit-Change-Number: 17373 Gerrit-PatchSet: 6 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins