[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8804/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 27 May 2021 01:46:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8803/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 27 May 2021 01:43:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..

IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in 
Parquet tables

This patch enables min/max filters for equi-joins on sort by
columns in a Parquet table created by Impala. This is to take advantage
of Impala sorting the min/max values in column index in each data
file for the table. When there are multiple sort by columns in the
table, only the leading column will be assigned a min/max filter. The
control knob is query option minmax_filter_sorted_columns, default to
true.

When minmax_filter_sorted_columns is true and the threshold (query
option minmax_filter_threshold) is 0, the patch automatically assigns
a reasonable value for the threshhold, and selects PAGE to be the
filtering level (query option minmax_filtering_level). When the
threshold is greater than 0, no adjustment will be made to either the
threshold or the filtering level. When the min and max column stats
exist on the leading sort column, these stats can be used to help
select filters that are most likely helpful.

When minmax_filter_sorted_columns is set to false, no min/max filters
will be specifically assigned to the leading sort by columns.

In the backend, the skiped pages are quickly identified by finding the
lower and the upper bounds in the sorted min and max value arrays,
given the min and max range in the filter.

Testing:
  1). Added two new tests in overlap_min_max_filters.test to verify
  a) Min/max filters are only created for leading sort by column;
  b) Query option minmax_filter_sorted_columns works.
  2). Added new tests in parquet-page-index-test.cc to cover the fast
  code path;
  3). Core [TBD]
  4). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-page-index-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
16 files changed, 605 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/20
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17478/19/be/src/exec/parquet/parquet-page-index-test.cc
File be/src/exec/parquet/parquet-page-index-test.cc:

http://gerrit.cloudera.org:8080/#/c/17478/19/be/src/exec/parquet/parquet-page-index-test.cc@264
PS19, Line 264:   // Change the min/max values and the range in the filter:
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 27 May 2021 01:22:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..

IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in 
Parquet tables

This patch enables min/max filters for equi-joins on sort by
columns in a Parquet table created by Impala. This is to take advantage
of Impala sorting the min/max values in column index in each data
file for the table. When there are multiple sort by columns in the
table, only the leading column will be assigned a min/max filter. The
control knob is query option minmax_filter_sorted_columns, default to
true.

When minmax_filter_sorted_columns is true and the threshold (query
option minmax_filter_threshold) is 0, the patch automatically assigns
a reasonable value for the threshhold, and selects PAGE to be the
filtering level (query option minmax_filtering_level). When the
threshold is greater than 0, no adjustment will be made to either the
threshold or the filtering level. When the min and max column stats
exist on the leading sort column, these stats can be used to help
select filters that are most likely helpful.

When minmax_filter_sorted_columns is set to false, no min/max filters
will be specifically assigned to the leading sort by columns.

In the backend, the skiped pages are quickly identified by finding the
lower and the upper bounds in the sorted min and max value arrays,
given the min and max range in the filter.

Testing:
  1). Added two new tests in overlap_min_max_filters.test to verify
  a) Min/max filters are only created for leading sort by column;
  b) Query option minmax_filter_sorted_columns works.
  2). Added new tests in parquet-page-index-test.cc to cover the fast
  code path;
  3). Core [TBD]
  4). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-page-index-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
16 files changed, 608 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/19
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats

2021-05-26 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17521 )

Change subject: IMPALA-10700: Add query options to skip deleting stats
..


Patch Set 1: Code-Review+2

(7 comments)

LGTM. Just have some minor comments. Feel free to forward my +2.

http://gerrit.cloudera.org:8080/#/c/17521/1/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/17521/1/be/src/service/query-options.h@248
PS1, Line 248:   QUERY_OPT_FN(skip_delete_table_stats, SKIP_DELETE_TABLE_STATS,\
 :   TQueryOptionLevel::ADVANCED)\
 :   QUERY_OPT_FN(skip_delete_column_stats, 
SKIP_DELETE_COLUMN_STATS,\
 :   TQueryOptionLevel::ADVANCED)
nit: Can we mention TRUNCATE in the names since these only affects TRUNCATE? 
e.g.

 SKIP_DELETE_TABLE_STATS_IN_TRUNCATE
 SKIP_DELETE_COLUMN_STATS_IN_TRUNCATE


http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@184
PS1, Line 184: import org.apache.impala.thrift.TQueryOptions;
nit: unused import


http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2193
PS1, Line 2193: to take
nit: after taking


http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2207
PS1, Line 2207: to truncate
nit: after truncating


http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2213
PS1, Line 2213: to create
nit: after creating


http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2239
PS1, Line 2239: to unset
nit: after unsetting


http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2312
PS1, Line 2312: if (!isTableBeingReplicated) {
Ah, I think this fixes an existing bug. It'd be good to mention it in the 
commit message as well.



--
To view, visit http://gerrit.cloudera.org:8080/17521
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519
Gerrit-Change-Number: 17521
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 27 May 2021 01:11:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17521 )

Change subject: IMPALA-10700: Add query options to skip deleting stats
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7177/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17521
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519
Gerrit-Change-Number: 17521
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 27 May 2021 00:52:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8802/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 27 May 2021 00:29:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17521 )

Change subject: IMPALA-10700: Add query options to skip deleting stats
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8801/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17521
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519
Gerrit-Change-Number: 17521
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 27 May 2021 00:22:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 18:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17478/18/be/src/exec/parquet/parquet-page-index-test.cc
File be/src/exec/parquet/parquet-page-index-test.cc:

http://gerrit.cloudera.org:8080/#/c/17478/18/be/src/exec/parquet/parquet-page-index-test.cc@165
PS18, Line 165:   // Expect to skip no pages.
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17478/18/be/src/exec/parquet/parquet-page-index-test.cc@182
PS18, Line 182:   // Expect to skip all pages.
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17478/18/be/src/exec/parquet/parquet-page-index-test.cc@194
PS18, Line 194:   // Expect to skip all pages.
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 27 May 2021 00:07:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..

IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in 
Parquet tables

This patch enables min/max filters for equi-joins on sort by
columns in a Parquet table created by Impala. This is to take advantage
of Impala sorting the min/max values in column index in each data
file for the table. When there are multiple sort by columns in the
table, only the leading column will be assigned a min/max filter. The
control knob is query option minmax_filter_sorted_columns, default to
true.

When minmax_filter_sorted_columns is true and the threshold (query
option minmax_filter_threshold) is 0, the patch automatically assigns
a reasonable value for the threshhold, and selects PAGE to be the
filtering level (query option minmax_filtering_level). When the
threshold is greater than 0, no adjustment will be made to either the
threshold or the filtering level. When the min and max column stats
exist on the leading sort column, these stats can be used to help
select filters that are most likely helpful.

When minmax_filter_sorted_columns is set to false, no min/max filters
will be specifically assigned to the leading sort by columns.

In the backend, the skiped pages are quickly identified by finding the
lower and the upper bounds in the sorted min and max value arrays,
given the min and max range in the filter.

Testing:
  1). Added two new tests in overlap_min_max_filters.test to verify
  a) Min/max filters are only created for leading sort by column;
  b) Query option minmax_filter_sorted_columns works.
  2). Added new tests in parquet-page-index-test.cc to cover the fast
  code path;
  3). Core [TBD]
  4). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-page-index-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
16 files changed, 529 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/18
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17521 )

Change subject: IMPALA-10700: Add query options to skip deleting stats
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17521/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2239
PS1, Line 2239:   "Time elapsed to unset partition and column 
statistics for table {}: {} msec",
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17521
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519
Gerrit-Change-Number: 17521
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 27 May 2021 00:02:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10700: Add query options to skip deleting stats

2021-05-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17521


Change subject: IMPALA-10700: Add query options to skip deleting stats
..

IMPALA-10700: Add query options to skip deleting stats

When a truncate table command is issued, in case of
non-transactional tables, the table and column statistics for the table
are also deleted by default. This can be a expensive operation
especially when many truncate table commands are running concurrently.
As the concurrency increases, the response time from Hive metastore
slows down the delete table and column statistics RPC calls.

In cases where truncate operation is to used to remove the existing
data and then reload new data, it is likely that users will compute
stats again as soon as the new data is reloaded. This would overwrite
the existing statistics and hence the additional time spent by
the truncate operation to delete column and table statistics becomes
unnecessary.

To improve this, this change introduces 2 new query options:
1. SKIP_DELETE_TABLE_STATS
2. SKIP_DELETE_COLUMN_STATS

As the names suggest, when these query options are set to true or 1,
a truncate operation will not delete the respective statistics for the
table.

Testing:
Modified truncate-table.test to include variations of these query
options and making sure that the statistics are deleted or skipped
from deletion after truncate operation.

Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/JniCatalog.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test
8 files changed, 424 insertions(+), 25 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17521/1
--
To view, visit http://gerrit.cloudera.org:8080/17521
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9400c3586b4bdf46d9b4056ea1023aabae8cc519
Gerrit-Change-Number: 17521
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17484 )

Change subject: IMPALA-8737: Switch to gperftool with O(log n) 
PageHeap::AllocLarge()
..

IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()

In gperftools 2.5, PageHeap::AllocLarge() has O(n) behavior.
This switches to a patched version of gperftools 2.5 that
changes the behavior to O(log n). This corresponds to
these commits:

https://github.com/gperftools/gperftools/commit/06c9414ec423ffe442c047b2560555f9d5847b1d
https://github.com/gperftools/gperftools/commit/f1d3fe4a21e339a3fd6e4592ee784a7b92dc

Testing:
 - Core job

Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f
Reviewed-on: http://gerrit.cloudera.org:8080/17484
Tested-by: Impala Public Jenkins 
Reviewed-by: Kurt Deschler 
Reviewed-by: Joe McDonnell 
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Kurt Deschler: Looks good to me, but someone else must approve
  Joe McDonnell: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f
Gerrit-Change-Number: 17484
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 


[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17484 )

Change subject: IMPALA-8737: Switch to gperftool with O(log n) 
PageHeap::AllocLarge()
..


Patch Set 2: Code-Review+2

Going ahead with this and bumping to +2

Performance testing of TPC-DS against Kudu showed no performance change. Tests 
of TPC-H also showed no performance change.


-- 
To view, visit http://gerrit.cloudera.org:8080/17484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f
Gerrit-Change-Number: 17484
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Wed, 26 May 2021 23:51:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8800/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 15
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 26 May 2021 23:42:05 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17482 )

Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching 
for PageHeap::AllocLarge()
..

IMPALA-8737: Patch gperftools to implement O(log n) searching for 
PageHeap::AllocLarge()

In gperftools 2.5, PageHeap::AllocLarge() has O(n) behavior.
This patches gperftool 2.5 to incorporate the patches that
change PageHeap::AllocLarge() to have O(log n) searching:

https://github.com/gperftools/gperftools/commit/06c9414ec423ffe442c047b2560555f9d5847b1d
https://github.com/gperftools/gperftools/commit/f1d3fe4a21e339a3fd6e4592ee784a7b92dc

This also adds a build for gperftools 2.8.1 to allow future
development/testing for actually upgrading gperftools (IMPALA-6784).

Testing:
 - Ran a toolchain build with this
 - Ran Impala locally with the resulting gperftools-2.5-p4

Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
Reviewed-on: http://gerrit.cloudera.org:8080/17482
Tested-by: Joe McDonnell 
Reviewed-by: Kurt Deschler 
Reviewed-by: Joe McDonnell 
---
M buildall.sh
A 
source/gperftools/gperftools-2.5-patches/0003-Implemented-O-log-n-searching-among-large-spans.patch
A 
source/gperftools/gperftools-2.5-patches/0004-refactored-handling-of-reverse-span-set-iterator-for.patch
A 
source/gperftools/gperftools-2.8.1-patches/0001-Port-gperftools-to-adapt-aarch64.patch
4 files changed, 873 insertions(+), 1 deletion(-)

Approvals:
  Joe McDonnell: Looks good to me, approved; Verified
  Kurt Deschler: Looks good to me, but someone else must approve

--
To view, visit http://gerrit.cloudera.org:8080/17482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
Gerrit-Change-Number: 17482
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 


[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17482 )

Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching 
for PageHeap::AllocLarge()
..


Patch Set 1:

Bumped to +2, everything is passing on the Impala side


--
To view, visit http://gerrit.cloudera.org:8080/17482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
Gerrit-Change-Number: 17482
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 26 May 2021 23:48:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17298/15/tests/custom_cluster/test_metastore_service.py
File tests/custom_cluster/test_metastore_service.py:

http://gerrit.cloudera.org:8080/#/c/17298/15/tests/custom_cluster/test_metastore_service.py@443
PS15, Line 443: e
flake8: E501 line too long (96 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/17298/15/tests/custom_cluster/test_metastore_service.py@444
PS15, Line 444: c
flake8: E501 line too long (93 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 15
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 26 May 2021 23:21:22 +
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17482 )

Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching 
for PageHeap::AllocLarge()
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
Gerrit-Change-Number: 17482
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 26 May 2021 23:47:57 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()

2021-05-26 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17482 )

Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching 
for PageHeap::AllocLarge()
..


Patch Set 1: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
Gerrit-Change-Number: 17482
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 26 May 2021 23:41:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-05-26 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#15).

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..

IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

For transactional tables, catalogd already guarantees consitent table
metadata reads based on the writeIdList passed in the request. For
non transactional tables, the reads are eventually consistent as in
event processor thread in the background, processes HMS events for the
table and updates its metadata.
In this patch, to ensure strong consistency guarantees for external
tables,we invalidate the table metadata from cache if HMS DDL apis
like alter/drop table/partition are accessed from catalogd's metastore
server. As a result of which, any subsequent get table request fetches
the table from HMS and loads it in cache. This ensures that any
get_table/get_partition requests after DDL operations on same table
return updated table metadata. This behavior has a performance penalty
since metadata loading in cache takes time specially for large tables.
The change is behind catalogd server's flag:
invalidate_hms_cache_on_ddls which is enabled by default. The flag
needs to be turned off in case of a performance bottleneck.

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 597 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/15
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 15
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()

2021-05-26 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17484 )

Change subject: IMPALA-8737: Switch to gperftool with O(log n) 
PageHeap::AllocLarge()
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f
Gerrit-Change-Number: 17484
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Wed, 26 May 2021 23:40:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 14:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8799/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 14
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 26 May 2021 22:35:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17298/14/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17298/14/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2978
PS14, Line 2978:   private void renameNonTransactionalTableIfExists(String 
oldDbNameWithCatalog, String oldTableName,
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/14/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2979
PS14, Line 2979:String 
newDbNameWithCatalog, String newTableName,
line too long (100 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 14
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 26 May 2021 22:25:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-05-26 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#14).

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..

IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

For transactional tables, catalogd already guarantees consitent table
metadata reads based on the writeIdList passed in the request. For
non transactional tables, the reads are eventually consistent as in
event processor thread in the background, processes HMS events for the
table and updates its metadata.
In this patch, to ensure strong consistency guarantees for external
tables,we invalidate the table metadata from cache if HMS DDL apis
like alter/drop table/partition are accessed from catalogd's metastore
server. As a result of which, any subsequent get table request fetches
the table from HMS and loads it in cache. This ensures that any
get_table/get_partition requests after DDL operations on same table
return updated table metadata. This behavior has a performance penalty
since metadata loading in cache takes time specially for large tables.
The change is behind catalogd server's flag:
invalidate_hms_cache_on_ddls which is enabled by default. The flag
needs to be turned off in case of a performance bottleneck.

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 596 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/14
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 14
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17484 )

Change subject: IMPALA-8737: Switch to gperftool with O(log n) 
PageHeap::AllocLarge()
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f
Gerrit-Change-Number: 17484
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Wed, 26 May 2021 22:13:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8798/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 26 May 2021 20:24:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 17:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17478/17/be/src/exec/parquet/parquet-page-index-test.cc
File be/src/exec/parquet/parquet-page-index-test.cc:

http://gerrit.cloudera.org:8080/#/c/17478/17/be/src/exec/parquet/parquet-page-index-test.cc@165
PS17, Line 165:   // Expect to skip no pages.
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17478/17/be/src/exec/parquet/parquet-page-index-test.cc@170
PS17, Line 170:   // Expect to skip all pages.
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17478/17/be/src/exec/parquet/parquet-page-index-test.cc@182
PS17, Line 182:   // Expect to skip all pages.
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 26 May 2021 20:04:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-05-26 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..

IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in 
Parquet tables

This patch enables min/max filters for equi-joins on sort by
columns in a Parquet table created by Impala. This is to take advantage
of Impala sorting the min/max values in column index in each data
file for the table. When there are multiple sort by columns in the
table, only the leading column will be assigned a min/max filter. The
control knob is query option minmax_filter_sorted_columns, default to
true.

When minmax_filter_sorted_columns is true and the threshold (query
option minmax_filter_threshold) is 0, the patch automatically assigns
a reasonable value for the threshhold, and selects PAGE to be the
filtering level (query option minmax_filtering_level). When the
threshold is greater than 0, no adjustment will be made to either the
threshold or the filtering level. When the min and max column stats
exist on the leading sort column, these stats can be used to help
select filters that are most likely helpful.

When minmax_filter_sorted_columns is set to false, no min/max filters
will be specifically assigned to the leading sort by columns.

In the backend, the skiped pages are quickly identified by finding the
lower and the upper bounds in the sorted min and max value arrays,
given the min and max range in the filter.

Testing:
  1). Added two new tests in overlap_min_max_filters.test to verify
  a) Min/max filters are only created for leading sort by column;
  b) Query option minmax_filter_sorted_columns works.
  2). Added new tests in parquet-page-index-test.cc to cover the fast
  code path;
  3). Core [TBD]
  4). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-page-index-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
16 files changed, 515 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/17
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17469 )

Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation
..


Patch Set 2: Verified+1

Build Successful

https://jenkins.impala.io/job/gerrit-docs-auto-test/632/ : Doc tests passed.


--
To view, visit http://gerrit.cloudera.org:8080/17469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
Gerrit-Change-Number: 17469
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 26 May 2021 18:15:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17469 )

Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation
..


Patch Set 2:

I talked with Shajini, and we agreed that I could go ahead and address my 
comments with a new upload.


--
To view, visit http://gerrit.cloudera.org:8080/17469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
Gerrit-Change-Number: 17469
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 26 May 2021 18:09:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17469 )

Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation
..


Patch Set 2:

Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/632/

Testing docs change - this change appears to modify docs/ and no code. This is 
experimental - please report any issues to tarmstr...@cloudera.com or on this 
JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/17469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
Gerrit-Change-Number: 17469
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 26 May 2021 18:08:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#2) to the change originally 
created by Shajini Thayasingh. ( http://gerrit.cloudera.org:8080/17469 )

Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation
..

IMPALA-9770: [DOCS] Remove Sentry references in documentation

Updated all the associated topics.

Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
---
M docs/shared/impala_common.xml
M docs/topics/impala_adls.xml
M docs/topics/impala_alter_database.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_alter_view.xml
M docs/topics/impala_authorization.xml
M docs/topics/impala_create_role.xml
M docs/topics/impala_delegation.xml
M docs/topics/impala_drop_role.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_insert.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_langref_unsupported.xml
M docs/topics/impala_ldap.xml
M docs/topics/impala_lineage.xml
M docs/topics/impala_logging.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_refresh_authorization.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_scaling_limits.xml
M docs/topics/impala_security.xml
M docs/topics/impala_security_files.xml
M docs/topics/impala_show.xml
M docs/topics/impala_ssl.xml
25 files changed, 156 insertions(+), 354 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17469/2
--
To view, visit http://gerrit.cloudera.org:8080/17469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
Gerrit-Change-Number: 17469
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17469 )

Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation
..


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/17469/1/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/17469/1/docs/shared/impala_common.xml@4579
PS1, Line 4579:Impala now does not support privileges of 
DELETE,
  :   UPDATE, and 
UPSERT operations. 
> I'm thinking this sentence is already covered in the sentence just above it
Removed


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_database.xml
File docs/topics/impala_alter_database.xml:

http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_database.xml@a68
PS1, Line 68:
> Thanks Joe! The syntax is supported however the functionality is not due to
Since it doesn't work, we'll remove it for now until IMPALA-10712 is fixed.


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_table.xml
File docs/topics/impala_alter_table.xml:

http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_table.xml@a74
PS1, Line 74:
> Thanks Joe! The syntax is supported however the functionality is not due to
Since it doesn't work, we'll remove it for now until IMPALA-10712 is fixed.


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_table.xml@a322
PS1, Line 322:
> Thanks Joe! The syntax is supported however the functionality is not due to
Since it doesn't work, we'll remove it for now until IMPALA-10712 is fixed.


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_view.xml
File docs/topics/impala_alter_view.xml:

http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_alter_view.xml@a71
PS1, Line 71:
> Thanks Joe! The syntax is supported however the functionality is not due to
Since it doesn't work, we'll remove it for now until IMPALA-10712 is fixed.


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_authorization.xml
File docs/topics/impala_authorization.xml:

http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_authorization.xml@164
PS1, Line 164: before starting Impala cluster 
> I think we can omit this.
Done


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_authorization.xml@229
PS1, Line 229: fe/src/test/resources/
> I don't think we should use this specific path. This is true for an Impala
Returned to the old wording.


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_authorization.xml@306
PS1, Line 306:   The following examples show how to set up 
authorization to deal with various scenarios
 :   and how to grant privileges on objects to groups of 
users via roles, but note that you
 :   could also grant privileges on objects to a user or a 
group directly without involving a
 :   role.
> This is a very long sentence with a lot going on. Let's cut it down:
Done


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_grant.xml
File docs/topics/impala_grant.xml:

http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_grant.xml@101
PS1, Line 101: belonging to a group
> Nit: I don't think this addition gets us much. The original phrase was clea
Cut this out.


http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_show.xml
File docs/topics/impala_show.xml:

http://gerrit.cloudera.org:8080/#/c/17469/1/docs/topics/impala_show.xml@38
PS1, Line 38: The following statements are supported in Impala through Ranger to
:   manage authorization.
> Two things here:
Removed this



--
To view, visit http://gerrit.cloudera.org:8080/17469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
Gerrit-Change-Number: 17469
Gerrit-PatchSet: 1
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 26 May 2021 18:08:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10680: Replace StringToFloatInternal using fast double parser library

2021-05-26 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17389 )

Change subject: IMPALA-10680: Replace StringToFloatInternal using 
fast_double_parser library
..


Patch Set 6:

> Agree that the performance numbers are OK WRT strings of 20 or 30
 > bytes long.
 >
 > Would it be possible to make another performance by calling
 > strtod() instead?
 >
 > According to the performance numbers documented here
 > (https://github.com/lemire/fast_double_parser), fast_double is
 > about 9.4x faster than strtod, tested with Apple Clang.
 >
 > I just curious how fast fast_double is in Impala, compared to
 > strtod().

So for both 20 and 30 length string fast_double_parser also uses strtod so not 
any noticeable difference observed. On checking code, they specify that more 
than 19 digits will generally cause overflow and are highly unlikely to occur 
so they just send it to strtod(). Just that even for strtod we need a null 
terminated string. So I think short string optimisation should work well here 
as longer strings are not likely.


--
To view, visit http://gerrit.cloudera.org:8080/17389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic105ad38a2fcbf2fb4e8ae8af6d9a8e251a9c141
Gerrit-Change-Number: 17389
Gerrit-PatchSet: 6
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 26 May 2021 17:58:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10502: Handle CREATE/DROP events correctly

2021-05-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17308 )

Change subject: IMPALA-10502: Handle CREATE/DROP events correctly
..


Patch Set 7:

I realized that for dynamically generated the partitions in the inserts we need 
to handle the ADD_PARTITION events as well. I will upload the patch once I 
handle that.


--
To view, visit http://gerrit.cloudera.org:8080/17308
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2c5e96b48abac015240f20295b3ec3b1d71f24a
Gerrit-Change-Number: 17308
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 26 May 2021 17:29:45 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17482 )

Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching 
for PageHeap::AllocLarge()
..


Patch Set 1:

This already builds and works with Impala, adding +1 Verified


--
To view, visit http://gerrit.cloudera.org:8080/17482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
Gerrit-Change-Number: 17482
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 26 May 2021 16:37:16 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-8737: Patch gperftools to implement O(log n) searching for PageHeap::AllocLarge()

2021-05-26 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17482 )

Change subject: IMPALA-8737: Patch gperftools to implement O(log n) searching 
for PageHeap::AllocLarge()
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f41f2d296e26e160fd9bcc8be29085d1e9a8bc2
Gerrit-Change-Number: 17482
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Comment-Date: Wed, 26 May 2021 16:37:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8737: Switch to gperftool with O(log n) PageHeap::AllocLarge()

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17484 )

Change subject: IMPALA-8737: Switch to gperftool with O(log n) 
PageHeap::AllocLarge()
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7176/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6377f7087111cf10ae35ce102beabcafb505579f
Gerrit-Change-Number: 17484
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Wed, 26 May 2021 16:26:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17466 )

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8797/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 26 May 2021 16:24:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10626: Add support for Iceberg's Catalogs API

2021-05-26 Thread Zoltan Borok-Nagy (Code Review)
Hello wangsheng, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17466

to look at the new patch set (#5).

Change subject: IMPALA-10626: Add support for Iceberg's Catalogs API
..

IMPALA-10626: Add support for Iceberg's Catalogs API

Iceberg recently switched to use its Catalogs class to define
catalog and table properties. Catalog information is stored in
a configuration file such as hive-site.xml. And the table properties
contain information about which catalog is being used and what is
the Iceberg table id.

E.g. in the Hive conf we can have the following properties to define
catalogs:

 iceberg.catalog..type = hadoop
 iceberg.catalog..warehouse = somelocation

 or

 iceberg.catalog..type = hive

And at the table level we can have the following:

iceberg.catalog = 
name = 

Table property 'iceberg.catalog' refers to a Catalog defined in the
configuration file. This is in contradiction with Impala's current
behavior where we are already using 'iceberg.catalog', and it can
have the following values:

 * hive.catalog for HiveCatalog
 * hadoop.catalog for HadoopCatalog
 * hadoop.tables for HadoopTables

To be backward-compatible and also support the new Catalogs properties
Impala still recognizes the above special values. But, from now Impala
doesn't define 'iceberg.catalog' by default. 'iceberg.catalog' being
NULL means HiveCatalog for both Impala and Iceberg's Catalogs API,
hence for Hive and Spark as well.

If 'iceberg.catalog' has a different value than the special values it
indicates that Iceberg's Catalogs API is being used, so Impala will
try to look up the catalog configuration from the Hive config file.

Testing:
 * added SHOW CREATE TABLE tests
 * added e2e tests that create/insert/drop Iceberg tables with Catalogs
 * manually tested interop behavior with Hive

Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalog.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/resources/hive-site.xml.py
A testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/query_test/test_iceberg.py
15 files changed, 476 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17466/5
--
To view, visit http://gerrit.cloudera.org:8080/17466
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5dfa150986117fc55b28034c4eda38a736460ead
Gerrit-Change-Number: 17466
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-26 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.h
File be/src/runtime/io/scan-buffer-manager.h:

http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.h@136
PS7, Line 136:   void SetCachedBuffer() {
nit. may inline.


http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.h@140
PS7, Line 140:   void SetClientBuffer() {
same as above.


http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.h@144
PS7, Line 144:   void SetNoBuffer() {
same as above


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h
File be/src/runtime/io/scan-buffer-manager.h:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@169
PS4, Line 169:   iomgr_buffer_cu
> That's the name from the existing code and it still is a tag used for exter
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc
File be/src/runtime/io/scan-buffer-manager.cc:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@199
PS4, Line 199:  << " cancel_status:
> Thats a precondition i.e., callers of PopReadyBuffer (like ScanRange::GetNe
Done


http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.cc
File be/src/runtime/io/scan-buffer-manager.cc:

http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.cc@28
PS7, Line 28: // Implementation of the buffer management for ScanRange. Each 
ScanRange contains a queue
: // of ready buffers. For each ScanRange, there is only a single 
producer and consumer
: // thread, i.e. only one disk thread will push to a scan range at 
any time and only one
: // thread will remove from the queue. This is to guarantee that 
buffers are queued and
: // read in file order.
nit. This para describes the class very well. Maybe move it to the .h file?


http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.cc@138
PS7, Line 138: DCHECK(scan_range_->is_locked(scan_range_lock));
Passing the scan_range_lock and calling DCHECK() is the price we pay for the 
refactoring. Just wonder if scan_range_lock can be removed entirely. That is, 
we assume the lock is acquired.

The exception is the CloseReader() call below takes the lock. I was not able to 
find the implementation of the method. Maybe the lock argument is not needed?


http://gerrit.cloudera.org:8080/#/c/17413/7/be/src/runtime/io/scan-buffer-manager.cc@186
PS7, Line 186:   for (auto& buffer : unused_iomgr_buffers_)
nit. Should add '{' here because the statement below is not at the same line.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@427
PS4, Line 427: int disk_id, bool expected_local, int64_t mtime, const 
BufferOpts& buffer_opts,
 : vector&& sub_ranges, void* meta_data, DiskFile
> This is an assignment not a check. I guess you meant to add a Setter which
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-range.cc@432
PS4, Line 432: HECK(file != nullptr);
 :   DCHECK_GE(len, 0);
> This is an assignment not a check. I guess you meant to add a Setter which
Done



--
To view, visit http://gerrit.cloudera.org:8080/17413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 7
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 26 May 2021 15:06:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down partition meta in LocalCatalog mode

2021-05-26 Thread Quanlong Huang (Code Review)
Hello Aman Sinha, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17505

to look at the new patch set (#3).

Change subject: IMPALA-7501: Slim down partition meta in LocalCatalog mode
..

IMPALA-7501: Slim down partition meta in LocalCatalog mode

In LocalCatalog mode, the coordinator caches HMS partition objects in
its local cache. HMS partition contains many fields that Impala won't
used, and some fields that are suboptimal, most of them are in the
StorageDescriptor:
 - partition-level schema (list) which is never used.
 - location string which can be prefix-compressed since the prefix is
   usually the table location.
 - input/outputFormat string which is always duplicated and can be
   represented by an integer(enum).
An experiment on a large table with 478 columns and 87320 partitions
(one non-empty file per partition) shows that more than 90% of the
memory space (1.1GB) are occupied by these fields. The dominant part is
the partition-level schema which consumed 76% of the cache.

On the other hand, these unused or suboptimal fields are got in one
response from catalogd, wrapped in TPartialPartitionInfo which finally
belongs to a TGetPartialCatalogObjectResponse. They dramatically
increase the serialized thrift object size of the response, which has a
2GB array size limit in JVM. Fetching meta of many partitions from
catalogd could cause it runs into OOM error that hits the 2GB limit
(e.g. IMPALA-9896).

This patch extracts the HMS partition object and replaces it with the
fields that Impala actually uses. In the LocalCatalog cache, the HMS
partition object is replaced with
 - hms parameters
 - write id
 - HdfsStorageDescriptor which represents the input/output format and
   some delimiters.
 - prefix-compressed location
The hms_partition field of TPartialPartitionInfo is also extracted with
corresponding fields. However, CatalogHmsAPIHelper still requires the
whole hms partition object. So the hms_partition field is kept for its
usage. To distinguish the different requirements, we add a new field,
want_hms_partition in TTableInfoSelector. The existing
'want_partition_metadata' field means returning these extracted fields,
and the 'want_hms_partition' field means returning the whole HMS
partition object.

Improvement results in the above case:
 - reduce the heap usage from 1.1GB to 113.2MB, objects from 41m to 2.3m
 - reduce the response size from 1.7GB to 28.41MB.

Tests:
 - Run local-catalog related tests locally
 - Run CORE tests

Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M 
fe/src/main/java/org/apache/impala/catalog/GetPartialCatalogObjectRequestBuilder.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/local/CatalogdMetaProviderTest.java
18 files changed, 279 insertions(+), 134 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/17505/3
--
To view, visit http://gerrit.cloudera.org:8080/17505
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977
Gerrit-Change-Number: 17505
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8796/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 7
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 26 May 2021 13:51:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-26 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..

IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

Currently BufferManagement is tightly coupled with ScanRange.
Every ScanRange maintains list of unused buffers and ready buffers.
Unused buffers are buffers used to read scanned data and ready
buffers are buffers with the data already read. For managing these
buffers, ScanRange defines various functions like AddUnusedBuffer,
GetUsedBuffer, EnqueueReadyBuffer and functions to allocate and
cleanup buffers. This patch has created ScanBufferManager which
would be responsible for the managing these buffers for ScanRange.
ScanBufferManager's logic is still coupled with the ScanRange,
but refactorig it into a seperate class is a good first step.

As ScanRange's lock is used to synchronize various functions
including the buffer management, a new class for lock store
has been created too.

Testing:
 1. Ran these existing tests: EE, BackEnd, JDBC and Cluster test.

Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-ranges.h
A be/src/runtime/io/scan-buffer-manager.cc
A be/src/runtime/io/scan-buffer-manager.h
M be/src/runtime/io/scan-range.cc
7 files changed, 584 insertions(+), 230 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17413/7
--
To view, visit http://gerrit.cloudera.org:8080/17413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 7
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8795/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 26 May 2021 11:30:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-26 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 5:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@250
PS4, Line 250: class ScanRange : public RequestRan
> We need to specify what fields it protects in ScanRange. Or, we should stil
Removed this class.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@252
PS4, Line 252:
> nit. needs
Code deleted.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/request-ranges.h@514
PS4, Line 514:
> +1 on "ensuring no new locks are taken in ScanBufferManager methods.
This is done now. No locks acquired in ScanBufferManager.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h
File be/src/runtime/io/scan-buffer-manager.h:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@36
PS4, Line 36: ange using this buffer manager.
> Actually there isn't much info at 'external_buffer_tag_'. Could you add com
Have updated the documentation.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@40
PS4, Line 40:   ///This buffer is external to this buffer manager and is 
not managed by it
> nit. missing '.'.
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@47
PS4, Line 47:
> nit. nullptr.
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@52
PS4, Line 52:
> nit. should be 'scan_range_'.
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@53
PS4, Line 53:
> nit. this scan range (i.e. scan_range_).
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@62
PS4, Line 62:
> nit. scan_range_->len().
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@106
PS4, Line 106: scan_range_loc
> nit. PopFirstReadyBuffer() describes the semantics better.
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@109
PS4, Line 109: /// Clean
> nit. should be inline.
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@131
PS4, Line 131: /// ScanRange using thi
> Instead of the friend declaration, can we add public functions that ScanRan
Have removed friend from both the classes. Neither ScanRange can access private 
methods/members of ScanBufferManager nor the other way is possible now. This 
also prevents ScanBufferManager taking ScanRange's lock_.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@136
PS4, Line 136:   void SetCachedBuffer() {
 : external_buffer_tag_ = ExternalBuffe
> I think having this pointer is a bit confusing, probably it'd be a bit clea
This code is deleted now as discussed.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@169
PS4, Line 169:   iomgr_buffer_cu
> nit: maybe name it BufferTag, since the buffer is managed by this new class
That's the name from the existing code and it still is a tag used for external 
buffers. External buffers are tagged as CLIENT_BUFFER and CACHED_BUFFER, 
whereas managed buffers NO_BUFFER (to depict that it is not an external buffer).


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.h@177
PS4, Line 177: 
> nit: add +2 indentation
Done


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc
File be/src/runtime/io/scan-buffer-manager.cc:

http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@28
PS4, Line 28: mplementation of the buffer managem
> This probably should be moved inside the cstr body, after DCHECK().
Code is deleted now.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@85
PS4, Line 85: nused_iomgr_buffer_bytes_ -= result->buffer_le
> please update comment
Done.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@135
PS4, Line 135: ::CleanUpBu
> nit. can we replace goto with the actual code below?
done.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@162
PS4, Line 162: Buff
> this->scan_range?
done.


http://gerrit.cloudera.org:8080/#/c/17413/4/be/src/runtime/io/scan-buffer-manager.cc@165
PS4, Line 165:   return false;
 :   }
 :   *buffer = move(ready_buffers_.front());
 :   ready_buffers_.pop_front();
 :   // If eosr is seen, then unused buffer should be empty.
 :   DCHECK(!(*buffer)->eosr() || 

[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17413/5/be/src/runtime/io/scan-buffer-manager.cc
File be/src/runtime/io/scan-buffer-manager.cc:

http://gerrit.cloudera.org:8080/#/c/17413/5/be/src/runtime/io/scan-buffer-manager.cc@89
PS5, Line 89: void ScanBufferManager::EnqueueReadyBuffer(const 
std::unique_lock& scan_range_lock,
line too long (95 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 26 May 2021 11:11:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-05-26 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..

IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

Currently BufferManagement is tightly coupled with ScanRange.
Every ScanRange maintains list of unused buffers and ready buffers.
Unused buffers are buffers used to read scanned data and ready
buffers are buffers with the data already read. For managing these
buffers, ScanRange defines various functions like AddUnusedBuffer,
GetUsedBuffer, EnqueueReadyBuffer and functions to allocate and
cleanup buffers. This patch has created ScanBufferManager which
would be responsible for the managing these buffers for ScanRange.
ScanBufferManager's logic is still coupled with the ScanRange,
but refactorig it into a seperate class is a good first step.

As ScanRange's lock is used to synchronize various functions
including the buffer management, a new class for lock store
has been created too.

Testing:
 1. Ran these existing tests: EE, BackEnd, JDBC and Cluster test.

Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
---
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-ranges.h
A be/src/runtime/io/scan-buffer-manager.cc
A be/src/runtime/io/scan-buffer-manager.h
M be/src/runtime/io/scan-range.cc
7 files changed, 585 insertions(+), 231 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17413/5
--
To view, visit http://gerrit.cloudera.org:8080/17413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10197 (Part 2): Add KUDU REPLICA SELECTION query option

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17502 )

Change subject: IMPALA-10197 (Part 2): Add KUDU_REPLICA_SELECTION query option
..

IMPALA-10197 (Part 2): Add KUDU_REPLICA_SELECTION query option

The previous patch added a new test case in PlannerTest for Kudu,
which check for specific number of hosts and instances for distributed
plan with query option KUDU_REPLICA_SELECTION set as LEADER_ONLY.
However, the leadership isn't deterministic since each Kudu partition
has its own Raft group and the leaders of these groups are not exactly
balanced across hosts in a cluster. There's no guarantee that we'll get
a certain number of leader hosts for a query which access multiple KUDU
partitions. This makes the unit-test flaky.

This patch removed the distributed plans to avoid nondeterminism.

Testing:
 - Reran the Planner test and verified the issue did not happen.

Change-Id: I3e23667c06c273a261e03de3d81fc7ee1f6b0682
Reviewed-on: http://gerrit.cloudera.org:8080/17502
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 
---
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-closest-replica.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-replica-selection-leader-only.test
2 files changed, 11 insertions(+), 182 deletions(-)

Approvals:
  Quanlong Huang: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/17502
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3e23667c06c273a261e03de3d81fc7ee1f6b0682
Gerrit-Change-Number: 17502
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10197 (Part 2): Add KUDU REPLICA SELECTION query option

2021-05-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17502 )

Change subject: IMPALA-10197 (Part 2): Add KUDU_REPLICA_SELECTION query option
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17502
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e23667c06c273a261e03de3d81fc7ee1f6b0682
Gerrit-Change-Number: 17502
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 26 May 2021 09:59:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function

2021-05-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17373 )

Change subject: IMPALA-10688: Implement ds_cpc_stringify() function
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Gerrit-Change-Number: 17373
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 26 May 2021 06:41:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10688: Implement ds cpc stringify() function

2021-05-26 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17373 )

Change subject: IMPALA-10688: Implement ds_cpc_stringify() function
..

IMPALA-10688: Implement ds_cpc_stringify() function

This function receives a string that is a serialized Apache
DataSketches CPC sketch and returns its stringified format.

A stringified format should look like and contains the following data:

select ds_cpc_stringify(ds_cpc_sketch(float_col)) from
functional_parquet.alltypestiny;
++
| ds_cpc_stringify(ds_cpc_sketch(float_col)) |
++
| ### CPC sketch summary:|
|lg_k   : 11 |
|seed hash  : 93cc   |
|C  : 2  |
|flavor : 1  |
|merged : true   |
|intresting col : 0  |
|table entries  : 2  |
|window : not allocated  |
| ### End sketch summary |
||
++

Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Reviewed-on: http://gerrit.cloudera.org:8080/17373
Tested-by: Impala Public Jenkins 
Reviewed-by: Gabor Kaszab 
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
4 files changed, 59 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Gabor Kaszab: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8c9d089bfada6bebd078d8f388d2e146c79e5285
Gerrit-Change-Number: 17373
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins