[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 17: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7187/


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 17
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 07:13:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7501: Slim down partition metadata in LocalCatalog mode

2021-06-02 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17505 )

Change subject: IMPALA-7501: Slim down partition metadata in LocalCatalog mode
..


Patch Set 6:

Thanks for the feedback, Aman and Vihang! Addressed the comments.


--
To view, visit http://gerrit.cloudera.org:8080/17505

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977
Gerrit-Change-Number: 17505
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 08:12:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-06-02 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17413/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17413/9//COMMIT_MSG@24
PS9, Line 24: Change-
> Ran exhaustive + TSAN and it passed: https://master-03.jenkins.cloudera.com
Just exhaustive test pass: https://jenkins.impala.io/job/pre-review-test/967/



--
To view, visit http://gerrit.cloudera.org:8080/17413

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 Jun 2021 10:05:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 37:

4 tests FAILED, but they may be unrelated: they also fail when run locally 
on master.


--
To view, visit http://gerrit.cloudera.org:8080/17026

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 37
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 Jun 2021 10:29:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7556: Decouple BufferManagement from the ScanRange and IoMgr

2021-06-02 Thread Amogh Margoor (Code Review)
Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17413 )

Change subject: IMPALA-7556: Decouple BufferManagement from the ScanRange and 
IoMgr
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17413/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17413/9//COMMIT_MSG@24
PS9, Line 24: Change-
> Just exhaustive test pass: https://jenkins.impala.io/job/pre-review-test/96
Private exhaustive tests passed too: 
https://master-03.jenkins.cloudera.com/job/impala-private-parameterized/233/



--
To view, visit http://gerrit.cloudera.org:8080/17413

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd74691b50b46114f95a8641034c05d07ddeec97
Gerrit-Change-Number: 17413
Gerrit-PatchSet: 10
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 Jun 2021 12:03:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5569: Add statement ALTER TABLE UNSET TBLPROPERTIES/SERDEPROPERTIES

2021-06-02 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17530 )

Change subject: IMPALA-5569: Add statement ALTER TABLE UNSET 
TBLPROPERTIES/SERDEPROPERTIES
..

IMPALA-5569: Add statement ALTER TABLE UNSET TBLPROPERTIES/SERDEPROPERTIES

This patch adds the ability to unset (delete) table properties or serde
properties for a table. It supports an 'IF EXISTS' clause for cases
where users are not sure whether the property being unset exists.
Without 'IF EXISTS', trying to unset a property that doesn't exist fails.
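
The 'IF EXISTS' behavior described above can be modeled on a plain
dictionary. This is only a sketch of the semantics, not Impala code: the
helper name unset_properties and the sample property names are
hypothetical, and the real statement is analyzed and executed by the
frontend and catalogd.

```python
def unset_properties(props, keys, if_exists=False):
    """Return a copy of props with the given keys removed.

    Hypothetical helper that mimics the described semantics: without
    if_exists, unsetting a missing property is an error.
    """
    missing = [k for k in keys if k not in props]
    if missing and not if_exists:
        raise KeyError("property does not exist: " + ", ".join(missing))
    # Keep every property that was not asked to be removed.
    return {k: v for k, v in props.items() if k not in keys}


tbl_props = {"comment": "demo", "owner_team": "analytics"}

# Removing an existing property works with or without IF EXISTS.
after = unset_properties(tbl_props, ["comment"])

# With IF EXISTS, a missing property is silently ignored.
tolerant = unset_properties(tbl_props, ["no_such_key"], if_exists=True)
```

Calling unset_properties(tbl_props, ["no_such_key"]) without
if_exists=True raises KeyError in this model, which mirrors the failure
the commit message describes.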

Tests:
1. Added unit tests and end-to-end tests.
2. Covered tables of different storage types: Kudu,
   Iceberg, and HDFS tables.
Change-Id: Ife4f6561dcdcd20c76eb299c6661c778e342509d
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AlterTableUnSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
12 files changed, 448 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17530/2
--
To view, visit http://gerrit.cloudera.org:8080/17530

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife4f6561dcdcd20c76eb299c6661c778e342509d
Gerrit-Change-Number: 17530
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-5569: Add statement ALTER TABLE UNSET TBLPROPERTIES/SERDEPROPERTIES

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17530 )

Change subject: IMPALA-5569: Add statement ALTER TABLE UNSET 
TBLPROPERTIES/SERDEPROPERTIES
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8832/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17530

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife4f6561dcdcd20c76eb299c6661c778e342509d
Gerrit-Change-Number: 17530
Gerrit-PatchSet: 2
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 02 Jun 2021 13:32:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#38). ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..

IMPALA-10640: Support reading Parquet Bloom filters - most common types

This change adds read support for Parquet Bloom filters for types that
can reasonably be supported in Impala. Other types, such as CHAR(N),
would be very difficult to support because the length may be different
in Parquet and in Impala which results in truncation or padding, and
that changes the hash which makes using the Bloom filter impossible.
Write support will be added in a later change.
The supported Parquet type - Impala type pairs are the following:

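 ---------------------------------------
|Parquet type |  Impala type            |
|---------------------------------------|
|INT32        |  TINYINT, SMALLINT, INT |
|INT64        |  BIGINT                 |
|FLOAT        |  FLOAT                  |
|DOUBLE       |  DOUBLE                 |
|BYTE_ARRAY   |  STRING                 |
 ---------------------------------------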

The following types are not supported for the given reasons:

 
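 ----------------------------------------------------------------
|Impala type |  Problem                                          |
|----------------------------------------------------------------|
|VARCHAR(N)  | truncation can change hash                        |
|CHAR(N)     | padding / truncation can change hash              |
|DECIMAL     | multiple encodings supported                      |
|TIMESTAMP   | multiple encodings supported, timezone conversion |
|DATE        | not considered yet                                |
 ----------------------------------------------------------------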
 

Support may be added for these types later, see IMPALA-10641.
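
To illustrate why padding or truncation defeats the Bloom filter, the
sketch below hashes the same logical value in its unpadded form and in
CHAR(N)-style forms. SHA-256 from the Python standard library is used
purely as a stand-in for the real hash function, and the helper names
are hypothetical. It also assumes the writer hashed the unpadded bytes.

```python
import hashlib


def stand_in_hash(value: bytes) -> str:
    # Stand-in for the Bloom filter hash; chosen only because it ships
    # with Python, not because the patch uses it.
    return hashlib.sha256(value).hexdigest()


def as_char_n(value: str, n: int) -> bytes:
    # Model CHAR(N): truncate to n characters, then right-pad with spaces.
    return value[:n].ljust(n).encode("utf-8")


written = "abc".encode("utf-8")      # bytes assumed hashed at write time
read_as_char5 = as_char_n("abc", 5)  # padded form
read_as_char2 = as_char_n("abc", 2)  # truncated form

padded_differs = stand_in_hash(written) != stand_in_hash(read_as_char5)
truncated_differs = stand_in_hash(written) != stand_in_hash(read_as_char2)
```

Because the hashed bytes differ, a lookup with the padded or truncated
value would probe different bits than the ones set at write time, so
the filter could wrongly report the value as absent.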

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.
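
The rules above (supported type pairs plus the dictionary exception)
can be summarized as a small decision function. This is a sketch of the
described policy only; the names SUPPORTED_PAIRS and
should_use_bloom_filter are hypothetical and do not correspond to
identifiers in the patch.

```python
# Parquet physical type -> Impala types listed as supported above.
SUPPORTED_PAIRS = {
    "INT32": {"TINYINT", "SMALLINT", "INT"},
    "INT64": {"BIGINT"},
    "FLOAT": {"FLOAT"},
    "DOUBLE": {"DOUBLE"},
    "BYTE_ARRAY": {"STRING"},
}


def should_use_bloom_filter(parquet_type, impala_type, fully_dict_encoded):
    """Decide whether a column's Bloom filter should be consulted."""
    if fully_dict_encoded:
        # The dictionary already gives exact answers, so the
        # probabilistic filter adds nothing.
        return False
    return impala_type in SUPPORTED_PAIRS.get(parquet_type, set())


use_int = should_use_bloom_filter("INT32", "SMALLINT", False)
use_dict = should_use_bloom_filter("INT32", "SMALLINT", True)
use_varchar = should_use_bloom_filter("BYTE_ARRAY", "VARCHAR", False)
```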

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests
    whether Parquet Bloom filtering works for the supported types and
    that we do not incorrectly discard row groups for the unsupported
    type VARCHAR. The Parquet file used in the test was generated with
    an external tool.
  - Added unit tests for ParquetBloomFilter in file
    be/src/util/parquet-bloom-filter-test.cc
  - A minor, unrelated change was done in
    be/src/util/bloom-filter-test.cc: the MakeRandom() function had
    return type uint64_t, the documentation claimed it returned a 64-bit
    random number, but the actual number of random bits is 32, which is
    what is intended in the tests. The return type and documentation
    have been corrected to use 32 bits.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
A be/src/exec/parquet/parquet-bloom-filter-util.cc
A be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter-test.cc
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M bin/jenkins/critique-gerrit-review.py
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/parquet-bloom-filtering.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter-disabled.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
36 files changed, 7,310 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/38
--
To view, visit http://gerrit.cloudera.org:8080/17026

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 38
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 38:

(182 comments)

http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h
File be/src/thirdparty/xxhash/xxhash.h:

http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@70
PS38, Line 70: 
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@92
PS38, Line 92:  *  
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@113
PS38, Line 113: #  elif defined (__cplusplus) || (defined (__STDC_VERSION__) && 
(__STDC_VERSION__ >= 199901L) /* C99 */)
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@243
PS38, Line 243: #  define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, 
XXH3_64bits_reset_withSecret)
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@253
PS38, Line 253: #  define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, 
XXH3_128bits_reset_withSeed)
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@254
PS38, Line 254: #  define XXH3_128bits_reset_withSecret 
XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret)
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@270
PS38, Line 270: #define XXH_VERSION_NUMBER  (XXH_VERSION_MAJOR *100*100 + 
XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE)
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@429
PS38, Line 429:  * @param statePtr A pointer to an @ref XXH32_state_t allocated 
with @ref XXH32_createState().
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@441
PS38, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, 
const XXH32_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@476
PS38, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@628
PS38, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, 
const XXH64_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@631
PS38, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@700
PS38, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@724
PS38, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* 
data, size_t len, const void* secret, size_t secretSize);
line too long (120 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@743
PS38, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, 
const XXH3_state_t* src_state);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@756
PS38, Line 756: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed);
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@766
PS38, Line 766: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t 
secretSize);
line too long (121 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@768
PS38, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* 
statePtr, const void* input, size_t length);
line too long (107 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@791
PS38, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/38/be/src/thirdparty/xxhash/xxhash.h@792
PS38, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const 
void* data, size_t len, const void* secret, size_t secretSize);
line too long (122 > 90)


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 39:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7188/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17026

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 39
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 Jun 2021 14:56:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 38:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8833/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17026

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 38
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 Jun 2021 15:12:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 39: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17026

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 39
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 Jun 2021 15:49:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10721: MetastoreServiceHandler should extend AbstractThriftHiveMetastore

2021-06-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17523 )

Change subject: IMPALA-10721: MetastoreServiceHandler should extend 
AbstractThriftHiveMetastore
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17523

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5ca0d5f632f8d4461570c4fc89ea468019a7e9df
Gerrit-Change-Number: 17523
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 16:33:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10721: MetastoreServiceHandler should extend AbstractThriftHiveMetastore

2021-06-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17523 )

Change subject: IMPALA-10721: MetastoreServiceHandler should extend 
AbstractThriftHiveMetastore
..

IMPALA-10721: MetastoreServiceHandler should extend AbstractThriftHiveMetastore

MetastoreServiceHandler should extend AbstractThriftHiveMetastore,
which has default implementations of all the HMS APIs.
This avoids broken builds in Impala whenever it dynamically
picks a Hive GBN that might have new HMS APIs.
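
The build-stability argument can be sketched with Python abstract base
classes. This is an analogy rather than the Java code: all class and
method names below are hypothetical, and raising NotImplementedError in
the default bodies is an assumption about what a default implementation
might do.

```python
from abc import ABC, abstractmethod


class MetastoreIface(ABC):
    """Stand-in for the generated service interface."""

    @abstractmethod
    def get_table(self, name): ...

    @abstractmethod
    def newly_added_api(self): ...  # imagine this arrived with a newer build


class AbstractMetastore(MetastoreIface):
    """Stand-in for a base class with a default body for every API."""

    def get_table(self, name):
        raise NotImplementedError("get_table")

    def newly_added_api(self):
        raise NotImplementedError("newly_added_api")


class DirectHandler(MetastoreIface):
    # Implements the interface directly; breaks when a new API appears.
    def get_table(self, name):
        return {"name": name}


class DefaultingHandler(AbstractMetastore):
    # Overrides only what it needs; new APIs fall back to the defaults.
    def get_table(self, name):
        return {"name": name}


try:
    DirectHandler()
    direct_ok = True
except TypeError:
    direct_ok = False

table = DefaultingHandler().get_table("t")
```

In Java the failure for the direct implementer would be a compile
error rather than a runtime TypeError, but the shape of the problem is
the same.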

Change-Id: I5ca0d5f632f8d4461570c4fc89ea468019a7e9df
Reviewed-on: http://gerrit.cloudera.org:8080/17523
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
2 files changed, 14 insertions(+), 13 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Vihang Karajgaonkar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17523

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5ca0d5f632f8d4461570c4fc89ea468019a7e9df
Gerrit-Change-Number: 17523
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Sourabh Goyal (Code Review)
Sourabh Goyal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17298/17/tests/custom_cluster/test_metastore_service.py
File tests/custom_cluster/test_metastore_service.py:

http://gerrit.cloudera.org:8080/#/c/17298/17/tests/custom_cluster/test_metastore_service.py@443
PS17, Line 443: e
> flake8: E501 line too long (96 > 90 characters)
Ack


http://gerrit.cloudera.org:8080/#/c/17298/17/tests/custom_cluster/test_metastore_service.py@444
PS17, Line 444: c
> flake8: E501 line too long (93 > 90 characters)
Ack



--
To view, visit http://gerrit.cloudera.org:8080/17298

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 17
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 17:28:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#18).

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..

IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

For transactional tables, catalogd already guarantees consistent table
metadata reads based on the writeIdList passed in the request. For
non-transactional tables, reads are only eventually consistent, because
the event processor thread processes HMS events for the table in the
background and updates its metadata.
In this patch, to ensure strong consistency guarantees for external
tables, we invalidate the table metadata in the cache when HMS DDL APIs
such as alter/drop table/partition are invoked through catalogd's
metastore server. As a result, any subsequent get table request fetches
the table from HMS and loads it into the cache. This ensures that any
get_table/get_partition requests after DDL operations on the same table
return updated table metadata. This behavior has a performance penalty,
since loading metadata into the cache takes time, especially for large
tables. The change is behind the catalogd server flag
invalidate_hms_cache_on_ddls, which is enabled by default. The flag
can be turned off in case of a performance bottleneck.
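
The invalidate-on-DDL behavior can be sketched with a toy cache in
front of a dictionary that stands in for HMS. This is a simplified
model, not the catalogd implementation: the class MetadataCache, its
methods, and the loads counter are hypothetical, and the background
event processor is deliberately left out.

```python
class MetadataCache:
    """Toy read-through cache with optional invalidation on DDL."""

    def __init__(self, backing_store, invalidate_on_ddl=True):
        self._store = backing_store          # dict standing in for HMS
        self._cache = {}
        self._invalidate_on_ddl = invalidate_on_ddl
        self.loads = 0                       # counts (costly) reloads

    def get_table(self, name):
        if name not in self._cache:
            # Cache miss: load a snapshot from the backing store.
            self._cache[name] = dict(self._store[name])
            self.loads += 1
        return self._cache[name]

    def alter_table(self, name, **changes):
        self._store[name].update(changes)
        if self._invalidate_on_ddl:
            # Drop the cached copy so the next read reloads it.
            self._cache.pop(name, None)


strict = MetadataCache({"t": {"owner": "a"}}, invalidate_on_ddl=True)
strict.get_table("t")
strict.alter_table("t", owner="b")
strict_owner = strict.get_table("t")["owner"]

relaxed = MetadataCache({"t": {"owner": "a"}}, invalidate_on_ddl=False)
relaxed.get_table("t")
relaxed.alter_table("t", owner="b")
relaxed_owner = relaxed.get_table("t")["owner"]
```

With invalidation on, the read after the DDL sees the new value at the
cost of a second load. With it off, this toy model keeps serving the
stale copy, whereas the real system would catch up once the event
processor handles the HMS event.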

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 605 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/18
--
To view, visit http://gerrit.cloudera.org:8080/17298

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 18
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7189/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17298

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 18
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 17:30:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7501: Slim down partition metadata in LocalCatalog mode

2021-06-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17505 )

Change subject: IMPALA-7501: Slim down partition metadata in LocalCatalog mode
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17505/4/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/17505/4/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@960
PS4, Line 960: hdfsStorageDescriptor
> Yeah, the optimization is borrowed from the legacy catalog mode.
Ah, I see. Even though it is supported, it is not common practice AFAIK. I see a 
TODO in 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L543
to optimize this. Since legacy mode also duplicates this for each partition, I 
think it is okay to take this up as a separate patch.



--
To view, visit http://gerrit.cloudera.org:8080/17505

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977
Gerrit-Change-Number: 17505
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 17:35:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down partition metadata in LocalCatalog mode

2021-06-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17505 )

Change subject: IMPALA-7501: Slim down partition metadata in LocalCatalog mode
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17505

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977
Gerrit-Change-Number: 17505
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 17:35:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8834/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 18
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 17:51:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-06-02 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#28). ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..

IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in 
Parquet tables

This patch enables min/max filters for equi-joins on sort by
columns in a Parquet table created by Impala. This is to take advantage
of Impala sorting the min/max values in column index in each data
file for the table. When there are multiple sort by columns in the
table, only the leading column will be assigned a min/max filter. The
control knob is the query option minmax_filter_sorted_columns, which
defaults to true.

When minmax_filter_sorted_columns is true and the threshold (query
option minmax_filter_threshold) is 0, the patch automatically assigns
a reasonable value for the threshold, and selects PAGE to be the
filtering level (query option minmax_filtering_level). When the
threshold is greater than 0, no adjustment will be made to either the
threshold or the filtering level. When the min and max column stats
exist on the leading sort column, these stats can be used to help
select filters that are most likely helpful. When
minmax_filter_sorted_columns is set to false, no min/max filters
will be specifically assigned to the leading sort by columns.

In the backend, the skipped pages can be quickly found by taking a
fast code path to find the corresponding lower and the upper bounds
in the sorted min and max value arrays, given a range in the filter.
The skipped pages are expressed as page ranges which later are
translated into row ranges.
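
The lookup described above can be sketched roughly as follows. This is
only an illustration in Python, not the code in the patch: the function
names, the argument layout and the use of the bisect module are all
assumptions, and it presumes that the per-page min and max arrays are
both sorted in ascending order.

```python
from bisect import bisect_left, bisect_right


def skipped_page_ranges(page_mins, page_maxs, filter_min, filter_max):
    """Return inclusive (first_page, last_page) ranges that cannot match.

    Hypothetical helper: page_mins/page_maxs are the per-page min and max
    values from the column index, assumed sorted in ascending order.
    """
    n = len(page_mins)
    # First page whose max is >= filter_min (lower bound in the max array).
    first = bisect_left(page_maxs, filter_min)
    # Last page whose min is <= filter_max (upper bound in the min array).
    last = bisect_right(page_mins, filter_max) - 1
    if first > last:
        # No page overlaps the filter range, so every page is skipped.
        return [(0, n - 1)] if n else []
    ranges = []
    if first > 0:
        ranges.append((0, first - 1))
    if last < n - 1:
        ranges.append((last + 1, n - 1))
    return ranges


def pages_to_row_ranges(page_ranges, first_row_of_page, num_rows):
    """Translate inclusive page ranges into inclusive row ranges."""
    rows = []
    for start, end in page_ranges:
        row_start = first_row_of_page[start]
        if end + 1 < len(first_row_of_page):
            row_end = first_row_of_page[end + 1] - 1
        else:
            row_end = num_rows - 1
        rows.append((row_start, row_end))
    return rows
```

Two binary searches replace a scan over every page, which is presumably
where the speedup of the fast path comes from when the column is sorted.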

A new query option minmax_filter_fast_code_path is added to control
the fast code path. It takes one of three values: ON (default), OFF,
or VERIFICATION. The last verifies that the results from the fast and
the regular code paths are the same.

Also fixed are abnormal min/max displays in "Final filter table"
section in a profile for DECIMAL, TIMESTAMP and DATE data types.

Testing:
  1). Added new tests in overlap_min_max_filters.test to verify
  a) Min/max filters are only created for leading sort by column;
  b) Query option minmax_filter_sorted_columns works.
  2). Added new tests in parquet-page-index-test.cc to cover the fast
  code path under various conditions;
  3). Core [TBD]
  4). Performance [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-page-index-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/raw-value.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
21 files changed, 972 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/28
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Shajini Thayasingh (Code Review)
Shajini Thayasingh has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17533 )


Change subject: IMPALA-10700: [DOCS] add a new query option
..

IMPALA-10700: [DOCS] add a new query option

Introduced a new query option to skip deleting column statistics on truncate 
operation.

Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
---
M docs/impala.ditamap
A docs/topics/impala_delete_stats_in_truncate.xml
2 files changed, 66 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/17533/1
--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 1
Gerrit-Owner: Shajini Thayasingh 


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17533 )

Change subject: IMPALA-10700: [DOCS] add a new query option
..


Patch Set 1:

Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/633/

Testing docs change - this change appears to modify docs/ and no code. This is 
experimental - please report any issues to tarmstr...@cloudera.com or on this 
JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 1
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 02 Jun 2021 18:37:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 28:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8835/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Wed, 02 Jun 2021 18:44:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17533 )

Change subject: IMPALA-10700: [DOCS] add a new query option
..


Patch Set 1: Verified+1

Build Successful

https://jenkins.impala.io/job/gerrit-docs-auto-test/633/ : Doc tests passed.


--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 1
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 02 Jun 2021 18:44:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17533 )

Change subject: IMPALA-10700: [DOCS] add a new query option
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17533/1/docs/topics/impala_delete_stats_in_truncate.xml
File docs/topics/impala_delete_stats_in_truncate.xml:

http://gerrit.cloudera.org:8080/#/c/17533/1/docs/topics/impala_delete_stats_in_truncate.xml@39
PS1, Line 39: This release introduces a new query option
I think we should skip this part because once we release 4.1 this text would 
seem misleading.



--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 1
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 19:05:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#19).

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..

IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

For transactional tables, catalogd already guarantees consistent table
metadata reads based on the writeIdList passed in the request. For
non-transactional tables, the reads are eventually consistent because
an event processor thread in the background processes HMS events for
the table and updates its metadata.
In this patch, to ensure strong consistency guarantees for external
tables, we invalidate the table metadata in the cache if HMS DDL apis
like alter/drop table/partition are accessed from catalogd's metastore
server. As a result, any subsequent get table request fetches
the table from HMS and loads it into the cache. This ensures that any
get_table/get_partition requests after DDL operations on the same table
return updated table metadata. This behavior has a performance penalty
since loading metadata into the cache takes time, especially for large
tables. The change is behind the catalogd server flag
invalidate_hms_cache_on_ddls, which is enabled by default. The flag
needs to be turned off in case of a performance bottleneck.
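
The invalidate-on-DDL behavior described above can be illustrated with a
toy model. This is a sketch under stated assumptions, not the
MetastoreServiceHandler code: the class name, its methods and the plain
dict standing in for HMS are all hypothetical; only the idea of a flag
that drops the cached entry after a DDL call comes from the description.

```python
class CachingMetastore:
    """Toy stand-in for a metastore front end with a table metadata cache."""

    def __init__(self, hms, invalidate_on_ddl=True):
        self._hms = hms            # dict: table name -> metadata dict (stand-in for HMS)
        self._cache = {}           # table name -> cached copy of the metadata
        self._invalidate_on_ddl = invalidate_on_ddl
        self.loads = 0             # counts the (expensive) loads from HMS

    def get_table(self, name):
        # Serve from the cache when possible, otherwise load from HMS.
        if name not in self._cache:
            self._cache[name] = dict(self._hms[name])
            self.loads += 1
        return self._cache[name]

    def alter_table(self, name, **changes):
        # Apply the DDL to HMS first, then drop the cached copy so that the
        # next get_table() reloads fresh metadata.
        self._hms[name].update(changes)
        if self._invalidate_on_ddl:
            self._cache.pop(name, None)


strict = CachingMetastore({"t": {"owner": "a"}})
strict.get_table("t")
strict.alter_table("t", owner="b")
fresh_owner = strict.get_table("t")["owner"]   # reloaded after the DDL

relaxed = CachingMetastore({"t": {"owner": "a"}}, invalidate_on_ddl=False)
relaxed.get_table("t")
relaxed.alter_table("t", owner="b")
stale_owner = relaxed.get_table("t")["owner"]  # still the cached copy
```

With the flag on, every DDL costs one extra load on the next read, which
is the performance penalty mentioned above. With it off, the read stays
cheap but can be stale until some background process refreshes the cache.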

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 706 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/19
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 19
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17298/19/tests/custom_cluster/test_metastore_service.py
File tests/custom_cluster/test_metastore_service.py:

http://gerrit.cloudera.org:8080/#/c/17298/19/tests/custom_cluster/test_metastore_service.py@696
PS19, Line 696: ,
flake8: E231 missing whitespace after ','



--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 19
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 19:22:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Shajini Thayasingh (Code Review)
Hello Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17533

to look at the new patch set (#2).

Change subject: IMPALA-10700: [DOCS] add a new query option
..

IMPALA-10700: [DOCS] add a new query option

Introduced a new query option to skip deleting column statistics on truncate 
operation.
Updated text to incorporate the comments received.
Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
---
M docs/impala.ditamap
A docs/topics/impala_delete_stats_in_truncate.xml
2 files changed, 67 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/17533/2
--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17533 )

Change subject: IMPALA-10700: [DOCS] add a new query option
..


Patch Set 2:

Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/634/

Testing docs change - this change appears to modify docs/ and no code. This is 
experimental - please report any issues to tarmstr...@cloudera.com or on this 
JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 19:29:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation

2021-06-02 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17469 )

Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
Gerrit-Change-Number: 17469
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 02 Jun 2021 19:34:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9770: [DOCS] Remove Sentry references in documentation

2021-06-02 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17469 )

Change subject: IMPALA-9770: [DOCS] Remove Sentry references in documentation
..

IMPALA-9770: [DOCS] Remove Sentry references in documentation

Updated all the associated topics.

Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
Reviewed-on: http://gerrit.cloudera.org:8080/17469
Tested-by: Impala Public Jenkins 
Reviewed-by: Joe McDonnell 
---
M docs/shared/impala_common.xml
M docs/topics/impala_adls.xml
M docs/topics/impala_alter_database.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_alter_view.xml
M docs/topics/impala_authorization.xml
M docs/topics/impala_create_role.xml
M docs/topics/impala_delegation.xml
M docs/topics/impala_drop_role.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_insert.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_langref_unsupported.xml
M docs/topics/impala_ldap.xml
M docs/topics/impala_lineage.xml
M docs/topics/impala_logging.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_refresh_authorization.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_scaling_limits.xml
M docs/topics/impala_security.xml
M docs/topics/impala_security_files.xml
M docs/topics/impala_show.xml
M docs/topics/impala_ssl.xml
25 files changed, 156 insertions(+), 354 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Joe McDonnell: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id4c5e9aa4d060ceaa426908a444d280a5564749d
Gerrit-Change-Number: 17469
Gerrit-PatchSet: 3
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17533 )

Change subject: IMPALA-10700: [DOCS] add a new query option
..


Patch Set 2: Verified+1

Build Successful

https://jenkins.impala.io/job/gerrit-docs-auto-test/634/ : Doc tests passed.


--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 19:37:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8836/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 19
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 19:42:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7190/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 19
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 19:44:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17533 )

Change subject: IMPALA-10700: [DOCS] add a new query option
..

IMPALA-10700: [DOCS] add a new query option

Introduced a new query option to skip deleting column statistics on truncate 
operation.
Updated text to incorporate the comments received.
Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Reviewed-on: http://gerrit.cloudera.org:8080/17533
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 
---
M docs/impala.ditamap
A docs/topics/impala_delete_stats_in_truncate.xml
2 files changed, 67 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Vihang Karajgaonkar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 3
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10700: [DOCS] add a new query option

2021-06-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17533 )

Change subject: IMPALA-10700: [DOCS] add a new query option
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17533
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie753f84b233b06bf4554cab71263671aff36f570
Gerrit-Change-Number: 17533
Gerrit-PatchSet: 2
Gerrit-Owner: Shajini Thayasingh 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 19:55:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 39: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7188/


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 39
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 02 Jun 2021 20:59:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-06-02 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..


Patch Set 9:

(21 comments)

I have made a first full pass over this.

I think this provides a level of functionality that will allow us to develop 
the client side like Impyla and others. There are pieces of functionality that 
can be expanded in later changes (more than RSA algorithms, support more JWKS 
options, etc). Looking at the RFCs, we may be requiring fields that are 
optional, etc.

http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/rpc/authentication.cc@156
PS9, Line 156: DEFINE_string(jwt_custom_claim_username
Do we have any code that would validate this value? (i.e. it should not be the 
empty string)


http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/rpc/authentication.cc@156
PS9, Line 156: "Custom claim 'username'"
Nit: I would expand on this description to note that this specifies the custom 
claim in the JWT that contains the username for the session.


http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/rpc/authentication.cc@1432
PS9, Line 1432: if (use_saml) {
  :   SecureAuthProvider* sap = NULL;
  :   external_http_auth_provider_.reset(sap = new 
SecureAuthProvider(false));
  :   sap->InitSaml();
  :   LOG(INFO) <<
  :   "External communication is authenticated for hs2-http 
protocol with SAML2 SSO";
  : } else {
  :   LOG(INFO) << "External communication is not authenticated 
for hs2-http protocol";
  : }
I think we want to be able to have JWT as the only authentication method 
(without LDAP or anything else), which means that this code needs to detect JWT 
support and use SecureAuthProvider() like it does for SAML.


http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/transport/THttpServer.cpp
File be/src/transport/THttpServer.cpp:

http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/transport/THttpServer.cpp@278
PS9, Line 278: } else if (use_jwt_token_
 : && stripped_bearer_auth_token.find('.') != 
string::npos) {
 :   // Since Impala supports multiple auth mechanism in 
parallel, falls back to JWT.
 :   if 
(callbacks_.jwt_token_auth_fn(stripped_bearer_auth_token)) {
 : authorized = true;
 : if (metrics_enabled_) {
 :   
http_metrics_->total_jwt_token_auth_success_->Increment(1);
 : }
 :   }
The other way to do this is to move the below JWT auth section above this "if 
(!authorized && has_saml_)" section. The JWT section would always fall through 
to the SAML section if it fails.


http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/util/jwt-util-test.cc
File be/src/util/jwt-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/util/jwt-util-test.cc@145
PS9, Line 145:   // Delete this temporary file
 :   void Delete();
If this isn't used directly, then you can make it private.


http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/util/jwt-util-test.cc@149
PS9, Line 149: const char* Filename() const { return name_.c_str(); }
Since JsonWebKeySet::Init() takes the filename in as a "const std::string&", 
this can return a "const std::string&".


http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/util/jwt-util-test.cc@169
PS9, Line 169: NULL
Nit: Use nullptr rather than NULL for new code.


http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/util/jwt-util-test.cc@224
PS9, Line 224:   TempTestDataFile* jwks_file = new TempTestDataFile(
 :   "{"
 :   "  \"keys\": ["
 :   "{"
 :   "  \"use\": \"sig\","
 :   "  \"kty\": \"RSA\","
 :   "  \"alg\": \"RS256\","
 :   "  \"n\": 
\"sttddbg-_yjXzcFpbMJB1fIFam9lQBeXWbTqzJwbuFbspHMsRowa8FaPw\","
 :   "  \"e\": \"AQAB\""
 :   "}"
 :   "  ]"
 :   "}");
 :   JsonWebKeySet* jwks = new JsonWebKeySet();
Here and elsewhere in this file, use smart pointers / std::unique_ptr wherever 
possible rather than new/delete pairs.


http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/util/jwt-util-test.cc@348
PS9, Line 348: TEST(JwtUtilTest, VerifyJwtRS512) {
 :   // Sign JWT token with RS512.
 :   TempTestDataFile jwks_file(Substitute(jwks_file_format, kid_1, 
"RS512", rsa512_pub_key,
 :   kid_2, "RS512", rsa512_pub_key_invalid));
 :   JsonWebKeySet* jwks = new JsonWebKeySet();
 :   Status status = jwks->Init(jwks_file.Filename());

[Impala-ASF-CR] IMPALA-10489: Implement JWT support

2021-06-02 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17435 )

Change subject: IMPALA-10489: Implement JWT support
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/util/jwt-util.cc
File be/src/util/jwt-util.cc:

http://gerrit.cloudera.org:8080/#/c/17435/9/be/src/util/jwt-util.cc@133
PS9, Line 133:  if (strcmp("use", member->name.GetString()) == 0) {
 : RETURN_IF_ERROR(ReadKeyProperty("use", json_key, 
&key_use, /*required*/ false));
> Do we care about what "use" is set to? Do we want to require any specific v
It sounds like this is optional, so maybe we don't need to require a specific 
value.
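
For illustration, the required/optional distinction discussed here could
look like the sketch below. It is written in Python rather than the
patch's C++, the helper names merely mirror the ReadKeyProperty call
quoted above and are otherwise hypothetical, and it relies only on
RFC 7517 marking "kty" as required and "use" as optional.

```python
def read_key_property(json_key, name, required=True):
    """Return the string property `name` of a JWK, or "" if optional and absent."""
    value = json_key.get(name)
    if value is None:
        if required:
            raise ValueError("JWK is missing required property '%s'" % name)
        return ""
    if not isinstance(value, str):
        raise ValueError("JWK property '%s' must be a string" % name)
    return value


def parse_jwk(json_key):
    """Extract the properties this sketch cares about from one JWK entry."""
    return {
        "kty": read_key_property(json_key, "kty"),
        # "use" is optional, so its absence is not an error and no
        # particular value is enforced here.
        "use": read_key_property(json_key, "use", required=False),
    }
```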



--
To view, visit http://gerrit.cloudera.org:8080/17435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b71fa854c9ddc8ca882878853395e1eb866143c
Gerrit-Change-Number: 17435
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 02 Jun 2021 21:24:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 18: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7189/


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 18
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 02 Jun 2021 23:26:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#20).

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..

IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

For transactional tables, catalogd already guarantees consistent table
metadata reads based on the writeIdList passed in the request. For
non-transactional tables, reads are only eventually consistent: an
event processor thread in the background processes HMS events for the
table and updates its metadata.
In this patch, to ensure strong consistency guarantees for external
tables, we invalidate the table metadata in the cache if HMS DDL APIs
like alter/drop table/partition are accessed from catalogd's metastore
server. As a result, any subsequent get table request fetches
the table from HMS and loads it into the cache. This ensures that any
get_table/get_partition requests after DDL operations on the same table
return updated table metadata. This behavior has a performance penalty
since loading metadata into the cache takes time, especially for large
tables. The change is behind the catalogd server flag
invalidate_hms_cache_on_ddls, which is enabled by default. The flag
can be turned off in case of a performance bottleneck.
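
The invalidate-on-DDL behavior described above can be sketched roughly
as follows. This is a toy model, not the code in
MetastoreServiceHandler.java: the class TableMetadataCache, its methods,
and the dict standing in for HMS are all hypothetical names chosen for
illustration. Only the idea (drop the cached entry on DDL, reload on the
next read, guard with a flag) comes from the commit message.

```python
class TableMetadataCache:
    """Toy stand-in for a table metadata cache (hypothetical, not Impala code)."""

    def __init__(self, load_from_hms, invalidate_on_ddl=True):
        self._load_from_hms = load_from_hms        # callable: table name -> metadata
        self._invalidate_on_ddl = invalidate_on_ddl  # mirrors the flag's intent
        self._cache = {}

    def get_table(self, name):
        # Load on a cache miss, otherwise serve the cached copy.
        if name not in self._cache:
            self._cache[name] = self._load_from_hms(name)
        return self._cache[name]

    def on_ddl(self, name):
        # Called after an alter/drop style request was forwarded to HMS.
        if self._invalidate_on_ddl:
            self._cache.pop(name, None)


# A dict plays the role of HMS in this sketch.
hms = {"t1": {"cols": ["a"]}}

cache = TableMetadataCache(lambda name: dict(hms[name]))
before = cache.get_table("t1")
hms["t1"] = {"cols": ["a", "b"]}   # an ALTER TABLE applied in HMS
cache.on_ddl("t1")
after = cache.get_table("t1")      # reloaded, sees the new column

# With the flag off, the read stays stale until something else refreshes it.
stale_cache = TableMetadataCache(lambda name: dict(hms[name]), invalidate_on_ddl=False)
stale_cache.get_table("t1")
hms["t1"] = {"cols": ["a", "b", "c"]}
stale_cache.on_ddl("t1")
stale = stale_cache.get_table("t1")
```

The reload after on_ddl() is where the performance penalty mentioned
above would show up for large tables.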

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 708 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/20
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17298/20/tests/custom_cluster/test_metastore_service.py
File tests/custom_cluster/test_metastore_service.py:

http://gerrit.cloudera.org:8080/#/c/17298/20/tests/custom_cluster/test_metastore_service.py@698
PS20, Line 698: ,
flake8: E231 missing whitespace after ','



--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 03 Jun 2021 00:15:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8837/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 20
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 03 Jun 2021 00:34:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7190/


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 19
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 03 Jun 2021 01:45:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList

2021-06-02 Thread Yu-Wen Lai (Code Review)
Yu-Wen Lai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17538 )


Change subject: IMPALA-10724: Add mutable validWriteIdList
..

IMPALA-10724: Add mutable validWriteIdList

In this patch, we add a new class for manually updating writeIdList.
To update the writeIdList, we introduce three methods:
addOpenWriteId, addAbortedWriteIds, and addCommittedWriteIds.

We will use this class in MetastoreEventProcessor for fine-grained
table refreshing. With control over the writeIdList, we will be able to
update the transactional table partially and keep it consistent.

There are some restrictions for MutableValidWriteIdList.
1. We need to mark a writeId open before marking it committed/aborted.
2. We only allow two writeId state transitions, open -> committed or
open -> aborted. Any other transition is NOT allowed.
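
The two restrictions above amount to a small state machine. A minimal
sketch follows, assuming a plain dict of write id states; the class
ToyWriteIdList and its Python method names only mirror the three method
names from the commit message and are not the Java class added by the
patch. Rejecting a second "open" for the same write id is an assumption
of this sketch.

```python
from enum import Enum


class WriteIdState(Enum):
    OPEN = "open"
    COMMITTED = "committed"
    ABORTED = "aborted"


class ToyWriteIdList:
    """Hypothetical model of the allowed write id transitions."""

    def __init__(self):
        self._states = {}

    def add_open_write_id(self, write_id):
        # Assumption: re-opening a known write id is treated as an error.
        if write_id in self._states:
            raise ValueError(f"write id {write_id} already tracked")
        self._states[write_id] = WriteIdState.OPEN

    def _finish(self, write_ids, new_state):
        # Validate everything first so a rejected call changes nothing.
        for write_id in write_ids:
            if self._states.get(write_id) is not WriteIdState.OPEN:
                raise ValueError(f"write id {write_id} is not open")
        for write_id in write_ids:
            self._states[write_id] = new_state

    def add_committed_write_ids(self, write_ids):
        self._finish(write_ids, WriteIdState.COMMITTED)

    def add_aborted_write_ids(self, write_ids):
        self._finish(write_ids, WriteIdState.ABORTED)

    def state(self, write_id):
        return self._states.get(write_id)


ids = ToyWriteIdList()
ids.add_open_write_id(1)
ids.add_open_write_id(2)
ids.add_committed_write_ids([1])
ids.add_aborted_write_ids([2])

try:
    ids.add_aborted_write_ids([1])   # committed -> aborted is not allowed
    rejected = False
except ValueError:
    rejected = True
```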

Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A 
fe/src/main/java/org/apache/impala/hive/common/MutableValidReaderWriteIdList.java
A fe/src/main/java/org/apache/impala/hive/common/MutableValidWriteIdList.java
A 
fe/src/test/java/org/apache/impala/hive/common/MutableValidReaderWriteIdListTest.java
4 files changed, 557 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/17538/1
--
To view, visit http://gerrit.cloudera.org:8080/17538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
Gerrit-Change-Number: 17538
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 


[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17538 )

Change subject: IMPALA-10724: Add mutable validWriteIdList
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7191/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
Gerrit-Change-Number: 17538
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 03 Jun 2021 01:52:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10724: Add mutable validWriteIdList

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17538 )

Change subject: IMPALA-10724: Add mutable validWriteIdList
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8838/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28e60db0afd5d4398af24449b72abc928421f7c6
Gerrit-Change-Number: 17538
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 03 Jun 2021 02:11:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-06-02 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#29). ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..

IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in 
Parquet tables

This patch enables min/max filters for equi-joins on sort-by
columns in a Parquet table created by Impala. This takes advantage
of Impala sorting the min/max values in the column index of each data
file for the table. When there are multiple sort-by columns in the
table, only the leading column is assigned a min/max filter. The
control knob is the query option minmax_filter_sorted_columns, which
defaults to true.

When minmax_filter_sorted_columns is true and the threshold (query
option minmax_filter_threshold) is 0, the patch automatically assigns
a reasonable value for the threshold, and selects PAGE to be the
filtering level (query option minmax_filtering_level). When the
threshold is greater than 0, no adjustment will be made to either the
threshold or the filtering level. When the min and max column stats
exist on the leading sort column, these stats can be used to help
select filters that are most likely helpful.

When minmax_filter_sorted_columns is set to false, no min/max filters
will be specifically assigned to the leading sort-by column.

In the backend, the skipped pages can be quickly found by taking a
fast code path to find the corresponding lower and upper bounds
in the sorted min and max value arrays, given a range in the filter.
The skipped pages are expressed as page ranges which are translated
into row ranges later on.
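
The bound lookup in the paragraph above can be illustrated with two
binary searches. This is a sketch under stated assumptions, not the code
in hdfs-parquet-scanner.cc: it assumes per-page min and max arrays that
are both sorted ascending, an inclusive filter range, and the helper
names overlapping_page_range and skipped_page_ranges are made up for
this example.

```python
from bisect import bisect_left, bisect_right


def overlapping_page_range(page_mins, page_maxs, filter_min, filter_max):
    """Return (first, last) indexes of pages that may match, or None."""
    # First page whose max is >= filter_min; earlier pages lie entirely below.
    first = bisect_left(page_maxs, filter_min)
    # Last page whose min is <= filter_max; later pages lie entirely above.
    last = bisect_right(page_mins, filter_max) - 1
    if first > last:
        return None
    return first, last


def skipped_page_ranges(num_pages, kept):
    """Express the pages outside `kept` as inclusive (start, end) ranges."""
    if num_pages == 0:
        return []
    if kept is None:
        return [(0, num_pages - 1)]
    first, last = kept
    ranges = []
    if first > 0:
        ranges.append((0, first - 1))
    if last < num_pages - 1:
        ranges.append((last + 1, num_pages - 1))
    return ranges


# Four pages over a sorted column, with per-page min and max values.
page_mins = [1, 10, 20, 30]
page_maxs = [9, 19, 29, 39]

kept = overlapping_page_range(page_mins, page_maxs, 12, 25)
skipped = skipped_page_ranges(len(page_mins), kept)
```

Each lookup costs O(log n) in the number of pages, compared with
checking every page's min and max against the filter one by one.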

A new query option minmax_filter_fast_code_path is added to control
the work of the fast code path. It accepts one of three values: ON
(default), OFF, or VERIFICATION. The last helps verify that the results
from both the fast and the regular code path are the same.

Preliminary performance testing (joining into a simplified TPCH
lineitem table of 2 sorted BIGINT columns and a total of 6001215
rows) confirms that min/max filtering on leading sort-by columns
improves the performance of scan operators greatly. The best result
is seen with pages containing no more than 24000 rows: 84.62ms
(page level filtering) vs. 115.27ms (row group level filtering)
vs. 137.14ms (no filtering). The query used is as follows.

  select straight_join a.l_orderkey from
  simpflified_lineitem a join [SHUFFLE] tpch_parquet.lineitem b
  where a.l_orderkey = b.l_orderkey and b.l_receiptdate = "1998-12-31"

Also fixed in the patch are abnormal min/max displays in "Final
filter table" section in a profile for DECIMAL, TIMESTAMP and DATE
data types, and reading DATE column index in batch without validation.

Testing:
  1). Added new tests in overlap_min_max_filters.test to verify
  a) Min/max filters are only created for leading sort by column;
  b) Query option minmax_filter_sorted_columns works;
  c) Query option minmax_filter_fast_code_path works.
  2). Added new tests in parquet-page-index-test.cc to test fast
  code path under various conditions;
  3). Core [TBD]

Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-page-index-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/raw-value.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
21 files changed, 972 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/17478/29
--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 


[Impala-ASF-CR] IMPALA-10709: Min/max filters should be enabled for joins on sorted columns in Parquet tables

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17478 )

Change subject: IMPALA-10709: Min/max filters should be enabled for joins on 
sorted columns in Parquet tables
..


Patch Set 29:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8839/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28c19c4b39b01ffa7d275fb245be85c28e9b2963
Gerrit-Change-Number: 17478
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Thu, 03 Jun 2021 03:26:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 39:

The only failed test is TestFetchAndSpooling.test_rows_sent_counters which is 
totally unrelated to this change and was known to be flaky in the past (it 
should be fixed though, see IMPALA-8957)


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 39
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 03 Jun 2021 06:23:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 40: Verified+1 Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 40
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 03 Jun 2021 06:31:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..

IMPALA-10640: Support reading Parquet Bloom filters - most common types

This change adds read support for Parquet Bloom filters for types that
can reasonably be supported in Impala. Other types, such as CHAR(N),
would be very difficult to support because the length may be different
in Parquet and in Impala which results in truncation or padding, and
that changes the hash which makes using the Bloom filter impossible.
Write support will be added in a later change.
The supported Parquet type - Impala type pairs are the following:

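 ---------------------------------------
|Parquet type |  Impala type            |
|---------------------------------------|
|INT32        |  TINYINT, SMALLINT, INT |
|INT64        |  BIGINT                 |
|FLOAT        |  FLOAT                  |
|DOUBLE       |  DOUBLE                 |
|BYTE_ARRAY   |  STRING                 |
 ---------------------------------------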

The following types are not supported for the given reasons:

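 ----------------------------------------------------------------
|Impala type |  Problem                                          |
|----------------------------------------------------------------|
|VARCHAR(N)  | truncation can change hash                        |
|CHAR(N)     | padding / truncation can change hash              |
|DECIMAL     | multiple encodings supported                      |
|TIMESTAMP   | multiple encodings supported, timezone conversion |
|DATE        | not considered yet                                |
 ----------------------------------------------------------------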

Support may be added for these types later, see IMPALA-10641.
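
The CHAR(N) problem listed above can be shown with a small experiment.
This is only an illustration: the hash here is BLAKE2b from Python's
hashlib, used as a stand-in for whatever hash the Bloom filter was built
with, and pad_char is a hypothetical helper that models CHAR(N) padding
and truncation.

```python
import hashlib


def toy_hash(value: bytes) -> int:
    # Stand-in 64-bit hash; an assumption, not the hash Parquet uses.
    return int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "little")


def pad_char(value: bytes, n: int) -> bytes:
    # Model of CHAR(N) semantics: truncate to n bytes, then pad with spaces.
    return value[:n].ljust(n, b" ")


stored = b"abc"                    # value as written to the file
as_char5 = pad_char(stored, 5)     # value as seen through a CHAR(5) column
truncated = pad_char(b"abcdefg", 5)

hashes_differ = toy_hash(stored) != toy_hash(as_char5)
```

Because the probe value no longer hashes to the bits that were set when
the filter was written, a lookup could wrongly report the value as
absent, which is why such types are excluded.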

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests
    whether Parquet Bloom filtering works for the supported types and
    that we do not incorrectly discard row groups for the unsupported
    type VARCHAR. The Parquet file used in the test was generated with
    an external tool.
  - Added unit tests for ParquetBloomFilter in file
    be/src/util/parquet-bloom-filter-test.cc
  - A minor, unrelated change was done in
    be/src/util/bloom-filter-test.cc: the MakeRandom() function had
    return type uint64_t, the documentation claimed it returned a 64 bit
    random number, but the actual number of random bits is 32, which is
    what is intended in the tests. The return type and documentation
    have been corrected to use 32 bits.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Reviewed-on: http://gerrit.cloudera.org:8080/17026
Reviewed-by: Csaba Ringhofer 
Tested-by: Csaba Ringhofer 
---
M LICENSE.txt
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
A be/src/exec/parquet/parquet-bloom-filter-util.cc
A be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter-test.cc
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M bin/jenkins/critique-gerrit-review.py
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M common/thrift/parquet.thrift
M testdata/data/README
A testdata/data/parquet-bloom-filtering.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter-disabled.test
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
36 files changed, 7,310 insertions(+), 125 deletions(-)

Approvals:
  Csaba Ringhofer: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 41
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 

[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-06-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 40:

Submitting manually.


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 40
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 03 Jun 2021 06:32:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Sourabh Goyal (Code Review)
Hello Quanlong Huang, Vihang Karajgaonkar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#21).

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..

IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

For transactional tables, catalogd already guarantees consistent table
metadata reads based on the writeIdList passed in the request. For
non-transactional tables, reads are only eventually consistent: an
event processor thread in the background processes HMS events for the
table and updates its metadata.
In this patch, to ensure strong consistency guarantees for external
tables, we invalidate the table metadata in the cache if HMS DDL APIs
like alter/drop table/partition are accessed from catalogd's metastore
server. As a result, any subsequent get table request fetches
the table from HMS and loads it into the cache. This ensures that any
get_table/get_partition requests after DDL operations on the same table
return updated table metadata. This behavior has a performance penalty
since loading metadata into the cache takes time, especially for large
tables. The change is behind the catalogd server flag
invalidate_hms_cache_on_ddls, which is enabled by default. The flag
can be turned off in case of a performance bottleneck.

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 713 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/21
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 21
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7192/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 21
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 03 Jun 2021 06:36:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table metadata cache for HMS DDL apis

2021-06-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table metadata cache for HMS 
DDL apis
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8840/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 21
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sourabh Goyal 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 03 Jun 2021 06:56:47 +
Gerrit-HasComments: No