[Impala-ASF-CR] IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse functions

2021-01-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16908 )

Change subject: IMPALA-2019(Part-1): Provide UTF-8 support in length, substring 
and reverse functions
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8018/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16908
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Gerrit-Change-Number: 16908
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Jan 2021 03:38:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse functions

2021-01-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16908 )

Change subject: IMPALA-2019(Part-1): Provide UTF-8 support in length, substring 
and reverse functions
..


Patch Set 11:

Added tests for using utf8 functions in where, group by and having clauses.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Gerrit-Change-Number: 16908
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Jan 2021 03:17:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse functions

2021-01-24 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16908

to look at the new patch set (#11).

Change subject: IMPALA-2019(Part-1): Provide UTF-8 support in length, substring 
and reverse functions
..

IMPALA-2019(Part-1): Provide UTF-8 support in length, substring and reverse 
functions

A Unicode character is encoded as 1-4 bytes in UTF-8. String
functions return unexpected results when the input contains multi-byte
characters, because we treat a string as a byte array. For instance,
length() returns the length in bytes, not in Unicode characters.

UTF-8 is the dominant Unicode encoding used in the Hadoop ecosystem.
This patch adds UTF-8 support to some string functions so they can
behave in a UTF-8 aware way. For compatibility with older versions, a
new query option, UTF8_MODE, is added to turn the UTF-8 aware behavior
on or off. Currently, only length(), substring() and reverse() support
it. Support for other functions will be added in later patches.

String functions check the query option and switch to the
corresponding implementation, similar to how builtin functions use the
decimal_v2 query option.
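
A minimal sketch of this kind of option-based dispatch, written in Python for brevity. The QueryOptions class, its utf8_mode field, and the function names are assumptions made for illustration; the actual patch does this in the C++ backend, and its structure may differ.

```python
from dataclasses import dataclass


@dataclass
class QueryOptions:
    # Hypothetical stand-in for the UTF8_MODE query option; off by default
    # so existing queries keep the byte-oriented behavior.
    utf8_mode: bool = False


def length_bytes(data: bytes) -> int:
    """Legacy behavior: length in bytes."""
    return len(data)


def utf8_length(data: bytes) -> int:
    """UTF-8 aware behavior: count code points, skipping continuation bytes."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def length(data: bytes, opts: QueryOptions) -> int:
    """Pick the implementation based on the query option."""
    return utf8_length(data) if opts.utf8_mode else length_bytes(data)


value = "你好".encode("utf-8")
print(length(value, QueryOptions()))                # byte-oriented result
print(length(value, QueryOptions(utf8_mode=True)))  # UTF-8 aware result
```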

For easy testing, the UTF-8 aware versions of the string functions are
also exposed as builtin functions (named utf8_*, e.g. utf8_length).

Tests:
 - Add BE tests for utf8 functions.
 - Add e2e tests for the UTF8_MODE query option.

Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
---
M be/src/codegen/llvm-codegen.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
M be/src/runtime/runtime-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/udf/udf-internal.h
M be/src/udf/udf.cc
M be/src/util/bit-util.h
M common/function-registry/impala_functions.py
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M testdata/datasets/functional/functional_schema_template.sql
A testdata/workloads/functional-query/queries/QueryTest/utf8-string-functions.test
A tests/query_test/test_utf8_strings.py
16 files changed, 402 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16908/11

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0aaf3544e89f8a3d531ad6afe056b3658b525b7c
Gerrit-Change-Number: 16908
Gerrit-PatchSet: 11
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] [WIP] IMPALA-9234: Support Ranger row filtering policies

2021-01-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16976 )

Change subject: [WIP] IMPALA-9234: Support Ranger row filtering policies
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8017/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I580517be241225ca15e45686381b78890178d7cc
Gerrit-Change-Number: 16976
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 25 Jan 2021 02:56:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10447: Add a newline when exporting shell output to a file.

2021-01-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16966 )

Change subject: IMPALA-10447: Add a newline when exporting shell output to a 
file.
..


Patch Set 3: Code-Review+2

(1 comment)

Thanks for fixing this quickly! LGTM, just have a minor question.

http://gerrit.cloudera.org:8080/#/c/16966/3/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/16966/3/tests/shell/test_shell_commandline.py@1089
PS3, Line 1089: 'i_item_sk'
Is it intended to make this a constant string literal?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9
Gerrit-Change-Number: 16966
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 25 Jan 2021 02:52:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] IMPALA-9234: Support Ranger row filtering policies

2021-01-24 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16976 )


Change subject: [WIP] IMPALA-9234: Support Ranger row filtering policies
..

[WIP] IMPALA-9234: Support Ranger row filtering policies

Ranger row filtering policies provide customized expressions to filter
out rows for specific users when reading from a table. This patch adds
support for this feature. A new feature flag, enable_row_filtering, is
added so that this experimental feature can be disabled. It defaults to
true, so the feature is enabled by default.

Note that row filtering policies take effect before any column
masking policies, because column masking policies apply to result data.

Implementation:
The existing table masking view infrastructure can be extended to
support row filtering. Currently when analyzing a table with column
masking policies, we replace the TableRef with an InlineViewRef which
contains a SelectStmt wrapping the columns with masking expressions.
This patch adds the row filtering expressions to the WhereClause of the
SelectStmt.
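
The rewrite described above can be sketched at the SQL-text level. This is a simplified illustration in Python: the function build_masked_view and its parameters are hypothetical, and the real implementation operates on the analyzer's TableRef, InlineViewRef and SelectStmt objects rather than on strings.

```python
def build_masked_view(table, columns, column_masks=None, row_filters=None):
    """Build the SELECT that would replace a plain table reference.

    column_masks maps a column name to its masking expression;
    row_filters is a list of boolean filter expressions.
    """
    column_masks = column_masks or {}
    # Wrap masked columns with their masking expression, keep the rest as-is.
    select_items = [
        f"{column_masks[c]} AS {c}" if c in column_masks else c
        for c in columns
    ]
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    # Row filters go into the WHERE clause of the same SELECT.
    if row_filters:
        sql += " WHERE " + " AND ".join(f"({f})" for f in row_filters)
    return sql


print(build_masked_view(
    "db.t", ["id", "name"],
    column_masks={"name": "mask(name)"},
    row_filters=["id > 10"],
))
```

In this sketch the filter sits in the WHERE clause and therefore sees the unmasked column values, while the masking expressions only affect the select list.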

TODO: refactor code shared with column masking
TODO: add more tests (including audit tests)

Tests:
 - Add FE test for error message when disabling row filtering
 - Add e2e test with row filtering policies
 - Add e2e test where both column masking and row filtering policies
   apply

Change-Id: I580517be241225ca15e45686381b78890178d7cc
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java
M fe/src/main/java/org/apache/impala/authorization/NoopAuthorizationFactory.java
M fe/src/main/java/org/apache/impala/authorization/TableMask.java
M fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java
M fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationTestBase.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test
A testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
M tests/authorization/test_ranger.py
16 files changed, 209 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/16976/1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I580517be241225ca15e45686381b78890178d7cc
Gerrit-Change-Number: 16976
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10447: Add a newline when exporting shell output to a file.

2021-01-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16966 )

Change subject: IMPALA-10447: Add a newline when exporting shell output to a 
file.
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8016/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9
Gerrit-Change-Number: 16966
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 24 Jan 2021 17:26:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10447: Add a newline when exporting shell output to a file.

2021-01-24 Thread Andrew Sherman (Code Review)
Andrew Sherman has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16966 )

Change subject: IMPALA-10447: Add a newline when exporting shell output to a 
file.
..

IMPALA-10447: Add a newline when exporting shell output to a file.

Impala shell outputs a batch of rows using OutputStream. Inside
OutputStream, output to a file is handled slightly differently from
output that is written to stdout. When writing to stdout we use print()
(which appends a newline), whereas when writing to a file we use write()
(which adds nothing). This difference was introduced in IMPALA-3343, so
this bug may be a regression from that change. To ensure that output is
the same in either case we need to add a newline after writing each
batch of rows to a file.
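
The behavior difference can be reproduced with a short standalone sketch. The function names below are hypothetical and are not the ones in shell/shell_output.py; the sketch only shows how print() and write() differ when several batches are emitted in sequence.

```python
import io


def emit_batch_stdout_style(stream, rows):
    # print() appends a trailing newline after the batch.
    print("\n".join(rows), file=stream)


def emit_batch_file_buggy(stream, rows):
    # write() adds nothing, so the next batch continues on the same line.
    stream.write("\n".join(rows))


def emit_batch_file_fixed(stream, rows):
    # Explicitly terminate each batch with a newline.
    stream.write("\n".join(rows))
    stream.write("\n")


batches = [["a", "b"], ["c", "d"]]
for emit in (emit_batch_stdout_style, emit_batch_file_buggy, emit_batch_file_fixed):
    out = io.StringIO()
    for batch in batches:
        emit(out, batch)
    print(emit.__name__, repr(out.getvalue()))
```

With the buggy variant, the last row of one batch and the first row of the next end up joined on a single line, which is the symptom the fix addresses.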

TESTING:
Added a new test for this case.

Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9
---
M shell/shell_output.py
M tests/shell/test_shell_commandline.py
2 files changed, 29 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/16966/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I078a06c54e0834bc1f898626afbfff4ded579fa9
Gerrit-Change-Number: 16966
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang