[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Reviewed-on: http://gerrit.cloudera.org:8080/21131
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 152 insertions(+), 20 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 29
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 28: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 09 May 2024 15:09:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-09 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 27: Code-Review+2

Thanks Pranav!


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 09 May 2024 10:06:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 28: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 09 May 2024 10:07:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 28:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10624/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 09 May 2024 10:07:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-08 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 27: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 08 May 2024 21:48:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 27:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16140/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 08 May 2024 18:39:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-08 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 27: Verified+1 Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 08 May 2024 18:15:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-08 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 27:

(1 comment)

> Uploaded patch set 27.

http://gerrit.cloudera.org:8080/#/c/21131/26/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/26/tests/query_test/test_insert.py@343
PS26, Line 343:
> This.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 08 May 2024 18:12:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-08 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#27). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 152 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/27
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-08 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 26:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/26/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/26/tests/query_test/test_insert.py@343
PS26, Line 343: =
> flake8: E225 missing whitespace around operator
This.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 08 May 2024 18:01:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 26: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 08 May 2024 15:12:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 26:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10619/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 08 May 2024 10:43:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 26:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16122/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 21:16:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#26). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 152 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/26
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 26:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/26/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/26/tests/query_test/test_insert.py@343
PS26, Line 343: =
flake8: E225 missing whitespace around operator



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 20:51:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 25: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 20:34:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 25:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16120/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 20:21:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 24:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/16119/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 20:18:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 25:

(5 comments)

> Uploaded patch set 25.

http://gerrit.cloudera.org:8080/#/c/21131/23/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/23/be/src/util/coding-util-test.cc@94
PS23, Line 94:   string test_path = "/home/impala/directory/";
> These strings could also be extracted into variables.
Done


http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@320
PS23, Line 320: is
> "is a"
Done


http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@320
PS23, Line 320:  the result
> "the result line..."
Done


http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@321
PS23, Line 321: we only verify that the value is present in str
> "we only verify that the value is present in string representation of the w
Done


http://gerrit.cloudera.org:8080/#/c/21131/24/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/24/tests/query_test/test_insert.py@301
PS24, Line 301: "create table {}(i int) partitioned by (p string) 
stored as parquet".format(tbl))
I've added a new test for iceberg, since there are a lot of differences in how 
we create, insert and how show partition works in iceberg, I've skipped the 
helper function to improve the readabilty. The only thing that was left common 
were a few asserts and the strings. I also think that if we're to add any new 
tests for iceberg, it'd be easier and more convenient if it'll have a separate 
function.
I've an optimized version for the same as well and have shared the same with 
you, but I think it just gets more complicated when we'll need to handle 
iceberg and parquet tables separately because of their show partition 
evaluation.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 19:56:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#25). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 141 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/25
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#24). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 141 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/24
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@301
PS23, Line 301: "create table {}(i int) partitioned by (p string) 
stored as parquet".format(tbl))
One more thing: you should run this test with an Iceberg table too. You could 
extract the common parts (mostly everything except for the create and insert 
statements) into a helper function.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 14:19:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@320
PS23, Line 320: result line
"the result line..."


http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@321
PS23, Line 321: the values present inside it are just verified.
"we only verify that the value is present in string representation of the whole 
row"



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 12:46:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23:

(2 comments)

Just a few nits.

http://gerrit.cloudera.org:8080/#/c/21131/23/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/23/be/src/util/coding-util-test.cc@94
PS23, Line 94:   TestUrl("/home/impala/directory/", 
"%2Fhome%2Fimpala%2Fdirectory%2F", false);
These strings could also be extracted into variables.


http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/23/tests/query_test/test_insert.py@320
PS23, Line 320: are
"is a"



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 12:24:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16108/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 06:25:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23: Code-Review+1

LGTM


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 06:23:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 23:

(1 comment)

> Uploaded patch set 23.

http://gerrit.cloudera.org:8080/#/c/21131/22/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/22/tests/query_test/test_insert.py@310
PS22, Line 310: \
> We need double back slashes to fix this, i.e. "\\"
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 06:00:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-07 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#23). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 113 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/23
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16107/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 05:50:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/22/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/22/tests/query_test/test_insert.py@310
PS22, Line 310: \
> flake8: W605 invalid escape sequence '\{'
We need double back slashes to fix this, i.e. "\\"



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 05:33:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 22:

(1 comment)

> Uploaded patch set 22.

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@299
PS20, Line 299:   tb
> Not done yet. In PS21, it's still using 4 spaces.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 05:25:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#22). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 113 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/22
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/22/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/22/tests/query_test/test_insert.py@310
PS22, Line 310: \
flake8: W605 invalid escape sequence '\{'



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 05:25:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 21:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16106/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 05:03:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@299
PS20, Line 299:
> Done
Not done yet. In PS21, it's still using 4 spaces.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 04:49:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 21:

(5 comments)

> Patch Set 21:
>
> (11 comments)
>
> > Uploaded patch set 21.

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc@96
PS20, Line 96:   string test = 
"SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r\x0E\x0F\x10"
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py@301
PS21, Line 301: )
> flake8: E501 line too long (91 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py@310
PS21, Line 310: \
> flake8: W605 invalid escape sequence '\{'
Done


http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py@311
PS21, Line 311: \
> flake8: E501 line too long (91 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py@320
PS21, Line 320: e
> flake8: E501 line too long (94 > 90 characters)
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 04:39:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 21:

(11 comments)

> Uploaded patch set 21.

http://gerrit.cloudera.org:8080/#/c/21131/19/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/19/be/src/util/coding-util-test.cc@97
PS19, Line 97: 
"\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F\x7F\"#%'"
> Why is the single quote escaped?
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc@98
PS20, Line 98: "*/:=?\\{[]^";
> Please add '^'
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc@104
PS20, Line 104: }
> nit: remove blank line
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@297
PS20, Line 297:   def test_escaped_partition_values(self, unique_database):
> nit: move this inside the method, i.e.
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@299
PS20, Line 299:
> nit: use 2 spaces indention
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@300
PS20, Line 300:
> wrap this to be
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@301
PS20, Line 301: )
> flake8: E501 line too long (104 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@302
PS20, Line 302:
> Single quote is missing here. The two single quotes in the middle doesn't r
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@303
PS20, Line 303:
> wrap this to be
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@305
PS20, Line 305:   special_characters = 
"SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r" \
> This just verifies the partition value. We still need to verify the partiti
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@306
PS20, Line 306:
"\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B" \
> Can you add a comment about why splitting this out was necessary?
"" is used to represent a backslash as it gets escaped again during parsing 
of the insert statement. It is no more needed as we introduce a part_value 
variable to store the correct string.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 04:38:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 21:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py@301
PS21, Line 301: )
flake8: E501 line too long (91 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py@310
PS21, Line 310: \
flake8: W605 invalid escape sequence '\{'


http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py@311
PS21, Line 311: \
flake8: E501 line too long (91 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/21131/21/tests/query_test/test_insert.py@320
PS21, Line 320: e
flake8: E501 line too long (94 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 04:38:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test,
coding-util-test.cc and test_insert.py.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 113 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/21
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 20:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc@98
PS20, Line 98: "*/:=?\\{[]";
Please add '^'


http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc@104
PS20, Line 104:
nit: remove blank line


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@297
PS20, Line 297:   """Test for special characters in partition values."""
nit: move this inside the method, i.e.

  def test_escaped_partition_values(self, unique_database):
"""Test for special characters in partition values."""


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@299
PS20, Line 299:
nit: use 2 spaces indention


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@300
PS20, Line 300: e
> flake8: E501 line too long (106 > 90 characters)
wrap this to be

self.execute_query(
"create table {}(i int) partitioned by (p string) stored as 
parquet".format(tbl))


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@302
PS20, Line 302: 7
> flake8: E501 line too long (108 > 90 characters)
Single quote is missing here. The two single quotes in the middle doesn't 
represent a single quote. They become the end and start of two strings. We can 
double quote the string and wrap it as

special_characters_ = 
"SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r" \
  
"\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B" \
  "\x1C\x1D\x1E\x1F\"\x7F'%*/:=?\\{[]#^"

Also add '^' and remove "()"


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@303
PS20, Line 303: _
> flake8: E501 line too long (104 > 90 characters)
wrap this to be

self.execute_query(
"insert into {} partition(p='{}') values (0)".format(tbl, 
special_characters_))


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@305
PS20, Line 305:   assert a.data[0] == special_characters_
This just verifies the partition value. We still need to verify the partition 
dir created by Impala is correct. This seems better:

  def test_escaped_partition_values(self, unique_database):
"""Test for special characters in partition values."""
tbl = unique_database + ".tbl"
self.execute_query(
"create table {}(i int) partitioned by (p string) stored as 
parquet".format(tbl))
# Use "\\'" for single quote since the insert statement use single quote on 
the
# partition value. Use "" for back slash since it will be escaped again 
in
# parsing the insert statement
special_characters = 
"SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r" \
 
"\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B" \
 "\x1C\x1D\x1E\x1F\"\x7F\\'%*/:=?{[]#^"
part_value = "SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r" \
 "\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B" \
 "\x1C\x1D\x1E\x1F\"\x7F'%*/:=?\{[]#^"
part_dir = 
"p=SpecialCharacters%01%02%03%04%05%06%07%08%09%0A%0B%0C%0D%0E%0F%10%11" \
   
"%12%13%14%15%16%17%18%19%1A%1B%1C%1D%1E%1F%22%7F%27%25%2A%2F%3A%3D%3F" \
   "%5C%7B%5B%5D%23%5E"
res = self.execute_query(
"insert into {} partition(p='{}') values (0)".format(tbl, 
special_characters))
assert res.data[0] == part_dir + ": 1"
res = self.client.execute("select p from {}".format(tbl))
assert res.data[0] == part_value
res = self.execute_query("show partitions " + tbl)
# There are "\t" in the partition value so we can't easily split the result 
line.
# Just verify the values are inside it.
assert part_value in res.data[0]
assert part_dir in res.data[0]



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 07 May 2024 03:26:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16102/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 21:46:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16101/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 21:44:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 20: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21131/19/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/19/be/src/util/coding-util-test.cc@97
PS19, Line 97: 
"\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F\x7F\"#%'"
> Why is the single quote escaped?
Done


http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py@302
PS19, Line 302:  
'\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F\"\x7F''()%*/:=?{[]#')
> This looks like you forgot to escape the double quote. The rest are omitted
Done


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@306
PS20, Line 306:   self.execute_query("insert into {} partition(p='') 
values (0)".format(tbl))
Can you add a comment about why splitting this out was necessary?



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 21:21:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test and
coding-util-test.cc.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 100 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/20
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test and
coding-util-test.cc.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
M tests/query_test/test_insert.py
5 files changed, 96 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/19
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc@96
PS20, Line 96:   string test = 
"SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r\x0E\x0F\x10\x11"
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@300
PS20, Line 300: e
flake8: E501 line too long (106 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@301
PS20, Line 301: E
flake8: E501 line too long (104 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@302
PS20, Line 302: 7
flake8: E501 line too long (108 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@303
PS20, Line 303: _
flake8: E501 line too long (104 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 21:21:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 19:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/21131/19/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/19/be/src/util/coding-util-test.cc@96
PS19, Line 96:   string test = 
"SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r\x0E\x0F\x10\x11"
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py@300
PS19, Line 300: e
flake8: E501 line too long (106 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py@301
PS19, Line 301: E
flake8: E501 line too long (104 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py@302
PS19, Line 302: "
flake8: E501 line too long (107 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py@302
PS19, Line 302: #
flake8: E261 at least two spaces before inline comment


http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py@302
PS19, Line 302: #
flake8: E262 inline comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/21131/19/tests/query_test/test_insert.py@303
PS19, Line 303: _
flake8: E501 line too long (104 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 21:19:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-06 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 18:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21131/18/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/18/be/src/util/coding-util-test.cc@96
PS18, Line 96:   
TestUrl("Impala\x01""dev\x02""\tenv\v\x14\x12""for\x1D\x05\b\x03\x04"
 :   
"\x05\x06\x07\x09\x0C\r\x0E""beginners\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18"
 :   "\x19\x10""Amuststart\x1B\x1C\x1D\x1E\x1F\x7F",
 :   "Impala%01dev%02%09env%0b%14%12for%1d%05"
 :   
"%08%03%04%05%06%07%09%0c%0d%0ebeginners%0f%10%11%12%13%14%15%16%17%18%19%10Amust"
 :   "start%1b%1c%1d%1e%1f%7f", false);
 :   
TestUrl("Impala\x01""dev\x02""\tenv\v\x14\x12""for\x1D\x05\b\x03\x04\x05\x06"
 :   
"\x07\x09\x0C\r\x0E""beginners\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19"
 :   "\x10""Amuststart\x1B\x1C\x1D\x1E\x1F\x7F",
 :   "Impala%01dev%02%09env%0b%14%12for%1d"
 :   
"%05%08%03%04%05%06%07%09%0c%0d%0ebeginners%0f%10%11%12%13%14%15%16%17%18%19%10"
 :   "Amuststart%1b%1c%1d%1e%1f%7f", true);
The tests look a bit messy for me. Not sure if all escaped characters are 
covered. Can we add a simple case that the input string just consists of all 
the escaped characters?
Please extract the strings into variables to avoid writing them twice.

If the following tests don't add more coverage, we can probably remove them. 
Just keep this one and add the simple case before this.
If they do provide more coverage, please add some comments to explain it.


http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66: !isalnum(static_cast(ch)) && 
!ShouldNotEscape(ch))) {
> Done.
We can write the test in python codes instead of using test files, e.g. add 
this in tests/query_test/test_insert.py under class TestInsertPartKey:

  def test_escaped_partition_values(self, unique_database):
tbl = unique_database + ".tbl"
self.execute_query("create table {}(i int) partitioned by(p 
string)".format(tbl))
import struct
# TODO: add more values
for b in range(1, 32):
  stmt = "insert into {} partition(p='{}') values (0)".format(tbl, 
struct.pack("B", b))
  self.execute_query(stmt)
res = self.execute_query("show partitions " + tbl)
# TODO: verify more values
for i in range(8):
  columns = res.data[i].split("\t")
  # Verify the partition value
  assert columns[0] == struct.pack("B", i+1)
  # Verify the partition location
  assert "p=%0" + str(i+1) in columns[8]
assert len(res.data) == 32

Note that we still use python2 in tests so we need 'struct.pack("B", b)' to 
create a byte string. In python3, we can use bytes directly.
The above test is incomplete. Please resolve the TODOs.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 07:15:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16095/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 04:54:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-05 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 18:

(2 comments)

> Uploaded patch set 18.

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66: !isalnum(static_cast(ch)) && 
!ShouldNotEscape(ch))) {
> I played with it a bit and it doesn't work with for example '\t' and '\n' (
Done.
Note: I tested those cases manually but to include those tests we'll have to 
bring a change in the test framework, which in its current state prevents us to 
write such tests. For example, '\t ' is considered a separator in our current 
test framework, so to incorporate such tests we might have to change the test 
framework which might lead to other errors, That's why I haven't included such 
cases in the current tests.


http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66:  c
> I think it would be good if we first cast it to unsigned char because of wh
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 04:37:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-05 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test and
coding-util-test.cc.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
4 files changed, 177 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/18
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 15:

(2 comments)

Thanks Pranav! In the meantime I discovered something else, but that shouldn't 
be too hard to fix.

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66:   ss << '%' << static_cast(ch);
I played with it a bit and it doesn't work with for example '\t' and '\n' (for 
example
  insert into my_part_tbl partition(p='a\tt') values (0);
).
Let's take '\t'. Its value is 9, which only has a single hexadecimal digit. Now 
we'll escape it as "%9" but Hive has "%09". To match Hive's behaviour, do this:
1. Add this include in the appropriate include group: #include 
2. Add "<< setfill('0')" after "hex" on L58.
3. Add "<< setw(2)" after inserting the % sign on L66.

This will pad hex values so that they are always at least 2 chars wide.
Note that setw is reset after insertion, so it has to be within the loop.

This also reveals a test deficiency: we should ideally test every character 
that is in 'SpecialCharacters'. I don't know if we can insert arbitrary byte 
values in Impala strings, maybe Quanlong can help there. But even if that's not 
possible, we should cover the ones for which  an escape sequence exists (i.e. 
'\t', '\n', '\v' etc.).


http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66: ch
I think it would be good if we first cast it to unsigned char because of what 
happens if we encounter a character > 127. This is possible if the input 
contains one, and we are in non-Hive-compat mode.

If char is a signed type (that is implementation dependent), then the value > 
127 will be interpreted as a negative number. When we convert it to uint32_t, 
it will be sign-extended. For example, instead of "%FF" we will have 
"%". If we convert it first to unsigned char, it will not be 
sign-extended.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 14:01:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16065/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 12:11:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16064/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 12:08:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 15:

(1 comment)

> Uploaded patch set 15: Commit message was updated.

http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18
PS13, Line 18: makes thr
> I think that's inferred.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 11:45:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test and
coding-util-test.cc.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
4 files changed, 177 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/15
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 14:

(4 comments)

> Uploaded patch set 14: Commit message was updated.

http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18
PS13, Line 18: thr
> I've almost forgotten the third: the list of characters to be escaped was e
Done


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18
PS13, Line 18: addresses
> makes
I think that's inferred.


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@19
PS13, Line 19: the set of charac
> I didn't mean it as a literal text to be included. Instead, it could be:
Done


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@25
PS13, Line 25: n
> Nit: "... byte and whose inclusion ...".
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 11:44:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch addresses three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test and
coding-util-test.cc.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
4 files changed, 177 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/14
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16063/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 11:38:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18
PS13, Line 18: addresses
makes


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@18
PS13, Line 18: two
I've almost forgotten the third: the list of characters to be escaped was 
extended to match the current list in Hive.


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@19
PS13, Line 19: there was a "set"
I didn't mean it as a literal text to be included. Instead, it could be:
"Before the change, the set of characters that need to be escaped was stored as 
a string. The current patch uses an unordered_set instead."


http://gerrit.cloudera.org:8080/#/c/21131/13//COMMIT_MSG@25
PS13, Line 25:
Nit: "... byte and whose inclusion ...".



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 11:31:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 13:

(6 comments)

> Uploaded patch set 13.

http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@15
PS12, Line 15: "
> Use double quotes here ("\u00FF"), this is not a valid character literal bu
Done


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18: wo key changes:
> "the special characters that need to be escaped" would be better, it's not
Done


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18:
> I think it could be formulated better. There are two things that have chang
Done


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18:
> Nit: no need for the comma here.
Done


http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc@19
PS12, Line 19:
> Let's fix include groups: there should be an empty line after #include "uti
Done


http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc@50
PS12, Line 50: '\f',
> This could be '\f' (form feed), see https://www.compart.com/en/unicode/U+00
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 11:13:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch addresses two key changes:
1. Before the change, there was a "set" of characters that needed
to be encoded, which were stored as a string. The current patch
uses an unordered_set which contains all the special characters
that needs to be escaped to handle the character encoding
efficiently.

2. '\xFF', which is an invalid UTF-8 byte whose inclusion was erroneous
from the beginning, is replaced with '\x7F', which is a control
character for DELETE, ensuring consistency and correctness in URL
encoding.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test and
coding-util-test.cc.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
4 files changed, 177 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/13
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 12:

(6 comments)

Thanks Pranav, we're close.

http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@15
PS12, Line 15: '
Use double quotes here ("\u00FF"), this is not a valid character literal but it 
is a valid string literal.
Same also later on this line.


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18: the special characters
"the special characters that need to be escaped" would be better, it's not all 
special chars.


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18: ,
Nit: no need for the comma here.


http://gerrit.cloudera.org:8080/#/c/21131/12//COMMIT_MSG@18
PS12, Line 18: included
I think it could be formulated better. There are two things that have changed:
1. '\xFF' was corrected to '\x7F' and
2. before this change we also had a "set" of characters that needed to be 
encoded, but it used to be stored as a string and now it's stored as an 
unordered_set (which is much better in my opinion). I think we should mention 
these separately.


http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc@19
PS12, Line 19: #include 
Let's fix include groups: there should be an empty line after #include 
"util/coding-util.h": after it come the standard library includes. Also, leave 
a line after  because the boost and sasl includes are includes 
from other libraries than std.

Usually we group includes like this:

1. header belonging to the current .cc file
2. standard library includes
3. other (third party) library includes
4. impala includes

Each group is ordered alphabetically and there is an empty line between groups.


http://gerrit.cloudera.org:8080/#/c/21131/12/be/src/util/coding-util.cc@50
PS12, Line 50: '\x0C'
This could be '\f' (form feed), see https://www.compart.com/en/unicode/U+000C.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 10:36:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16060/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 05:39:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 12:

(16 comments)

> Uploaded patch set 12.

http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@9
PS11, Line 9: An
> An error - we're introducing it now.
Done


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@14
PS11, Line 14: matched one of th
> matched one of the ...
Done


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@15
PS11, Line 15: '\u00FF'
> '\u00FF' is not a valid character literal in C++. "\u00FF" was included in
Done


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@18
PS11, Line 18: A set
> See the comment above, we should explain that it was a mistake from the beg
Done


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22
PS11, Line 22:
> Parquet and Iceberg are not exclusive things, we also use Parquet in our Ic
Done


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22
PS11, Line 22:
> It would be good to include in which file the new tests are.
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@51
PS11, Line 51:  '\x1A
> Why adding space character here? Hive doesn't escape it in partition paths.
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@57
PS11, Line 57: "upp
> It would be clearer like this: "uppercase" and "hex" only affect the insert
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@62
PS11, Line 62: // the character is not alphanumeric and it is not one of 
the characters specifically
> This doesn't match the code. Try this:
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/insert.test
File testdata/workloads/functional-query/queries/QueryTest/insert.test:

http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/insert.test@450
PS11, Line 450: 'R
> We just need double single quotes in the RESULTS section to escape a single
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/insert.test@452
PS11, Line 452:
> I don't think need this. It's used for non-ascii characters.
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/insert.test@470
PS11, Line 470:
> This will fail if we escape space.
Ack


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:

http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@327
PS11, Line 327:  TYPES
> You don't need to drop the table, the tables are created in a unique databa
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@337
PS11, Line 337:
> I don't think this is needed.
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@344
PS11, Line 344:
> You should do the same insertions into the Iceberg table as into the other
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@354
PS11, Line 354:
> See L327.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 05:14:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the '\u00FF' literal.Inclusion of '\u00FF' was likely
a mistake from the beginning and it should have been '\x7F'.

A set containing all the special characters has been included and,
'\xFF', which is an invalid UTF-8 byte whose inclusion was erroneous
from the beginning, is replaced with '\x7F', which is a control
character for DELETE, ensuring consistency and correctness in URL
encoding.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test and
coding-util-test.cc.
Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
4 files changed, 178 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/12
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@51
PS11, Line 51: '\x20'
> For those characters that have a readable literal representation, e.g. " ",
Why adding space character here? Hive doesn't escape it in partition paths. E.g.

 beeline -u "jdbc:hive2://localhost:11050"
 hive> create table my_part_tbl(i int) partitioned by (p string);
 hive> alter table my_part_tbl add partition(p='a b');
 hive> show partitions my_part_tbl;
++
| partition  |
++
| p=a b  |
++
 hive> describe formatted my_part_tbl partition(p='a b');
...
| Location: | 
hdfs://localhost:20500/test-warehouse/my_part_tbl/p=a b | NULL  
|
...


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/insert.test
File testdata/workloads/functional-query/queries/QueryTest/insert.test:

http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/insert.test@450
PS11, Line 450: ''
We just need double single quotes in the RESULTS section to escape a single 
quote. Don't need to do this in the QUERY section.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/insert.test@452
PS11, Line 452: RAW_STRING
I don't think need this. It's used for non-ascii characters.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/insert.test@470
PS11, Line 470:
This will fail if we escape space.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 30 Apr 2024 01:47:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 11:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@9
PS11, Line 9: The
An error - we're introducing it now.


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@14
PS11, Line 14: was matching with
matched one of the ...


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@15
PS11, Line 15: '\u00FF'
'\u00FF' is not a valid character literal in C++. "\u00FF" was included in a 
string, it was part of a string literal. At last I found out that it can 
actually be used in C++ to specify arbitrary unicode values, but of course it 
may be encoded on multiple bytes, see 
https://en.cppreference.com/w/cpp/language/escape.

Also include in the commit message that including this character was probably a 
mistake from the beginning, it should have been '\x7F'.


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@18
PS11, Line 18: '\xFF'
See the comment above, we should explain that it was a mistake from the 
beginning.


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22
PS11, Line 22: Some
It would be good to include in which file the new tests are.


http://gerrit.cloudera.org:8080/#/c/21131/11//COMMIT_MSG@22
PS11, Line 22: for both parquet and iceberg
Parquet and Iceberg are not exclusive things, we also use Parquet in our 
Iceberg tables. Parquet is a file format and Iceberg is a table format. We 
could say "tests on both traditional Hive tables and Iceberg tables ...".


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@25
PS11, Line 25: #include 
The includes within a group should be in alphabetical order, so this should go 
before L23.


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@31
PS11, Line 31: #include "sasl/saslutil.h"
Includes from the standard library should be the first group after the header, 
i.e. this should come after #include .


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@51
PS11, Line 51: '\x20'
> Represents space character
For those characters that have a readable literal representation, e.g. " ", 
"\n" etc., we should use that even if Hive doesn't.


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@57
PS11, Line 57: Used
It would be clearer like this: "uppercase" and "hex" only affect the insertion 
of integers, not that of char values.


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@62
PS11, Line 62: // Hive-compat mode, and the character is not alphanumeric 
or is one
This doesn't match the code. Try this:
"... b) we are not in Hive-compat mode and the character is not alphanumeric 
and it is not one of the
characters specifically excluded from escaping (see ShouldNotEscape())."


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:

http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@302
PS11, Line 302: show tables;
I don't think it's needed. Also, if another table is added later before this 
query, this will have to be changed as well.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@309
PS11, Line 309: insert into unicode_partition_values partition(p='运营业务数据') 
values (0);
You don't need to put these inserts into separate QUERY segments. It could also 
go together with the select on L318.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@327
PS11, Line 327: drop table unicode_partition_values;
You don't need to drop the table, the tables are created in a unique database, 
see test_column_unicode.py::TestColumnUnicode::test_create_unicode_table.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@337
PS11, Line 337: show tables;
I don't think this is needed.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@344
PS11, Line 344: insert into unicode_partition_values_iceberg values (0, '运');
You should do the same insertions into the Iceberg table as into the other one.


http://gerrit.cloudera.org:8080/#/c/21131/11/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@354
PS11, Line 354: drop table 

[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16052/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 29 Apr 2024 12:07:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 11:

(5 comments)

> Patch Set 11:
>
> (2 comments)

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util-test.cc@94
PS10, Line 94:   TestUrl("/home/impala/directory/", 
"%2Fhome%2Fimpala%2Fdirectory%2F", false);
 :   TestUrl("/home/impala/directory/", 
"%2Fhome%2Fimpala%2Fdirectory%2F", true);
> Please add some tests like these for newly added marks like '\n', '\r', '^'
Done


http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc@a40
PS10, Line 40:
 :
 :
 :
> Can you keep this comment? We should still mention common/src/java/org/apac
Done


http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc@45
PS10, Line 45: // characters it will encode.
 : // See common/src/java/org/apache/hadoop/hive/common
> We are still missing several characters here, e.g. single quote, '\n' and '
Done


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util.cc@51
PS11, Line 51: '\x20'
Represents space character


http://gerrit.cloudera.org:8080/#/c/21131/10/testdata/workloads/functional-query/queries/QueryTest/insert.test
File testdata/workloads/functional-query/queries/QueryTest/insert.test:

http://gerrit.cloudera.org:8080/#/c/21131/10/testdata/workloads/functional-query/queries/QueryTest/insert.test@452
PS10, Line 452:  RESULTS: RAW_STRING
  : '0','O''Reilly'
  :  TYPES
  : STRING, STRING
  : 
  :  QUERY
  : truncate insert_string_partitioned;
  : INSERT INTO TABLE insert_string_partitioned 
PARTITION(s2="Impala''#%*/:=?{}[]^") values ('0');
  : SELECT * FROM insert_string_partitioned where s2= 
"Impala''#%*/:=?{}[]^";
  :  RESULTS: RAW_STRING
  : '0','Impala''#%*/:=?{}[]^'
  :  TYPES
> e2e tests for ASCII characters can be added here, e.g.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 29 Apr 2024 11:43:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

The error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte was matching with one of the two bytes that
represented the '\u00FF' literal.

A set containing all the special characters has been included and,
'\xFF', which is an invalid UTF-8 byte is replaced with '\x7F',
which is a control character for DELETE, ensuring consistency and
correctness in URL encoding.

Testing: Some basic tests for both parquet and iceberg are provided
in unicode-column-name.test.
Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
4 files changed, 204 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/11
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-29 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util-test.cc@100
PS11, Line 100:   
"%08%03%04%05%06%07%09%0c%0d%0ebeginners%0f%10%11%12%13%14%15%16%17%18%19%10Amuststart"
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/21131/11/be/src/util/coding-util-test.cc@103
PS11, Line 103:   
"\x07\x09\x0C\r\x0E""beginners\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x10"""
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 29 Apr 2024 11:42:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-27 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util-test.cc@94
PS10, Line 94:   TestUrl("/home/impala/directory/", 
"%2Fhome%2Fimpala%2Fdirectory%2F", false);
 :   TestUrl("/home/impala/directory/", 
"%2Fhome%2Fimpala%2Fdirectory%2F", true);
Please add some tests like these for newly added marks like '\n', '\r', '^', 
etc.

EDIT: please include all the escaped characters.


http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc@a40
PS10, Line 40:
 :
 :
 :
Can you keep this comment? We should still mention 
common/src/java/org/apache/hadoop/hive/common/FileUtils.java since the list 
would change there.

Also mention here that "Impala creates the partition directories using these 
encoded strings. So it's crucial to keep it consistent with Hive."


http://gerrit.cloudera.org:8080/#/c/21131/10/be/src/util/coding-util.cc@45
PS10, Line 45: static const std::unordered_set SpecialCharacters = {'"', 
'#', '\\', '*', '/',
 : ':', '=', '?', '%', '^', '[', ']', '{', '\x7F'};
We are still missing several characters here, e.g. single quote, '\n' and '\r'. 
I think we should use the same list as Hive's. The list in Hive consists of 
control characters, i.e. ASCII characters from 0 to 31 and 127 (0x7F), and 
marks like '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', 
'^'.

Code snipper of Hive:

  static BitSet charToEscape = new BitSet(128);
  static {
for (char c = 0; c < ' '; c++) {
  charToEscape.set(c);
}

/**
 * ASCII 01-1F are HTTP control characters that need to be escaped.
 * \u000A and \u000D are \n and \r, respectively.
 */
char[] clist = new char[] {'\u0001', '\u0002', '\u0003', '\u0004',
'\u0005', '\u0006', '\u0007', '\u0008', '\u0009', '\n', '\u000B',
'\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012',
'\u0013', '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019',
'\u001A', '\u001B', '\u001C', '\u001D', '\u001E', '\u001F',
'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F', '{',
'[', ']', '^'};

for (char c : clist) {
  charToEscape.set(c);
}

Note that Hive adds some characters twice. ASCII code of ' ' is 32. So in the 
first loop, 0-31 are already added.
In Java, char is a "16-bit integer" that represents a single 16-bit Unicode 
character. So Hive code uses unicodes like '\u007F'. But in C++, such unicodes 
are represented by UTF-8 bytes. And char in C++ is just 8 bits. So in the 
original code "\u00FF" is actually two chars.


http://gerrit.cloudera.org:8080/#/c/21131/10/testdata/workloads/functional-query/queries/QueryTest/insert.test
File testdata/workloads/functional-query/queries/QueryTest/insert.test:

http://gerrit.cloudera.org:8080/#/c/21131/10/testdata/workloads/functional-query/queries/QueryTest/insert.test@452
PS10, Line 452: INSERT INTO TABLE insert_string_partitioned PARTITION(s2="_.~ 
+")
  : SELECT "value" FROM functional.alltypessmall LIMIT 1;
  :  RESULTS
  : s2=_.~ +: 1
  : 
  :  QUERY
  : # select with unencoded partition key
  : SELECT * FROM insert_string_partitioned;
  :  RESULTS
  : 'value','_.~ +'
  :  TYPES
  : string, string
e2e tests for ASCII characters can be added here, e.g.

INSERT INTO TABLE insert_string_partitioned PARTITION(s2="O'Reilly") values 
('0');

Or use a string with all escaped marks.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sun, 28 Apr 2024 02:12:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sat, 27 Apr 2024 09:13:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16042/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sat, 27 Apr 2024 04:56:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 10:

(2 comments)

> Patch Set 10:
>
> Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10587/ 
> DRY_RUN=true

http://gerrit.cloudera.org:8080/#/c/21131/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/9//COMMIT_MSG@18
PS9, Line 18: '\xFF', which is an invalid UTF-8 byte is replaced with '\x7F',
> I don't see '\x7F' in the final code, did you change the approach and not u
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@69
PS6, Line 69: *out = "";
> Not without an adapter. string_view(in, in_len) would work.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sat, 27 Apr 2024 04:37:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10587/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Sat, 27 Apr 2024 04:32:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

The error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte was matching with one of the two bytes that
represented the '\u00FF' literal.

A set containing all the special characters has been included and,
'\xFF', which is an invalid UTF-8 byte is replaced with '\x7F',
which is a control character for DELETE, ensuring consistency and
correctness in URL encoding.

Testing: Some basic tests for both parquet and iceberg are provided
in unicode-column-name.test.
Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
2 files changed, 79 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/10
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21131/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/9//COMMIT_MSG@18
PS9, Line 18: '\xFF', which is an invalid UTF-8 byte is replaced with '\x7F',
I don't see '\x7F' in the final code, did you change the approach and not 
update the commit? Daniel's comment suggested we should include it.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@69
PS6, Line 69:   if (in.empty()) {
> Could use a range-based for-loop: for (char ch : in) {...}.
Not without an adapter. string_view(in, in_len) would work.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 26 Apr 2024 21:10:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16035/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 26 Apr 2024 17:13:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

The error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte was matching with one of the two bytes that
represented the '\u00FF' literal.

A set containing all the special characters has been included and,
'\xFF', which is an invalid UTF-8 byte is replaced with '\x7F',
which is a control character for DELETE, ensuring consistency and
correctness in URL encoding.

Testing: Some basic tests for both parquet and iceberg are provided
in unicode-column-name.test.
Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
2 files changed, 79 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/9
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 8:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/16032/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 26 Apr 2024 16:10:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

The error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte was matching with one of the two bytes that
represented the '\u00FF' literal.

A set containing all the special characters has been included and,
'\xFF', which is an invalid UTF-8 byte is replaced with '\x7F',
which is a control character for DELETE, ensuring consistency and
correctness in URL encoding.

Testing: Some basic tests for both parquet and iceberg are provided
in unicode-column-name.test.
Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
2 files changed, 79 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/8
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 8:

(15 comments)

> Uploaded patch set 8: Commit message was updated.

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@7
PS6, Line 7: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
> We don't usually break the title into two lines, and it is not too long eit
Done


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@8
PS6, Line 8:
> nit: let's put the title in one line
Done


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@12
PS6, Line 12:  bytes
> Nit: surround with quotes ('specialCharacterMap').
Ack


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@13
PS6, Line 13: 0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
> It's unclear to me how the unicode characters are handled incorrectly. E.g.
Ack


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@18
PS6, Line 18: '\xFF', which is an invalid UTF-8 byte is replaced
> '*out' was assigned to also before this change, so its old contents were di
Ack


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@23
PS6, Line 23: #include 
> I don't think we need iostream.
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@46
PS6, Line 46: ':', '=', '?',
> Could also be static. Alternatively, we could put all these static function
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@55
PS6, Line 55: / cha
> The problem in the old code was that "\u00FF" translates to two bytes: [194
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@66
PS6, Line 66: }
> No need to clear it if it is assigned to at the end of the function.
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@74
PS6, Line 74:
:
> Could be misleading: does "not" also apply to "one of the commonly..."? If
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@80
PS6, Line 80:
> This is easier to understand if converted to a form without parentheses:
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@80
PS6, Line 80:
> It's an existing issue but I think we should cast 'ch' to unsigned char as
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@85
PS6, Line 85:
> nit: I think we can ignore explicitly using "std::" since std::uppercase is
Done


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@85
PS6, Line 85:
> We could put std::uppercase and std::hex after L68, before the loop, as the
Done


http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:

http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test@298
PS3, Line 298:
> This comment has not been addressed.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 26 Apr 2024 15:51:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@7
PS6, Line 7: IMPALA-11499: Refactor UrlEncode function to handle special
We don't usually break the title into two lines, and it is not too long either.


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@12
PS6, Line 12: specialCharacterMap
Nit: surround with quotes ('specialCharacterMap').


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@18
PS6, Line 18: Additionally, the function clears the output string
'*out' was assigned to also before this change, so its old contents were 
discarded. This sentence could be removed.


http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@48
PS3, Line 48: {'#', "%23"},
> Should be const.
This comment has not been addressed.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@23
PS6, Line 23: #include 
I don't think we need iostream.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@46
PS6, Line 46: std::unordered_map
Could also be static. Alternatively, we could put all these static functions 
and variables into an anonymous namespace, which is a more modern way of 
ensuring that these are not visible outside this translation unit.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@55
PS6, Line 55: '\xFF
The problem in the old code was that "\u00FF" translates to two bytes: 
[194,191]. Note that the notation '\u' is Java-specific. The process is 
broken because the code takes the input bytes one by one.

The hanzi character '?', in UTF-8, is [232, 191, 144]. Its second character, 
191, matched with the second character of "\u00FF" and that caused the problem.

I don't know why '\xFF' was included in the first place. If we'd like to handle 
UTF-8 only, then '\xFF' is not a valid byte. If we'd like to handle non-UTF-8 
bytes, then we should handle all other bytes > 127.

After some digging in Hive, I found this commit that expanded the list of 
escaped characters: 
https://github.com/apache/hive/commit/32b046bf93e3d041b2bb07a1c9b4e1ef5d977ddf
The list before that commit is very similar to what we have here in the old 
code, except they have '\u007F' instead of '\u00FF'. That makes more sense as 
that is a control character for DELETE, see 
https://www.compart.com/en/unicode/U+007F.

If we remove '\xFF', we can see that the escaped values in the map are the same 
as how we HEX encode values on L85, so there is no need for these values to be 
included in a map. Instead, we could have a set (std::unordered_set) with the 
keys in the current map (without '\xFF' but including '\x7F'), and we would 
encode input characters if they are in the set (taking into account hive_compat 
of course). That would also simplify the code.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@66
PS6, Line 66:   (*out).clear();
No need to clear it if it is assigned to at the end of the function.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@69
PS6, Line 69:   for (int i = 0; i < in_len; ++i) {
Could use a range-based for-loop: for (char ch : in) {...}.


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@74
PS6, Line 74: is not
: // alphanumeric or one of the commonly excluded characters
Could be misleading: does "not" also apply to "one of the commonly..."? If not, 
we could write for example "is not alphanumeric or is one of the commonly 
excluded ...".


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@80
PS6, Line 80: !(std::isalnum(ch) || ShouldNotEscape(ch))
This is easier to understand if converted to a form without parentheses:

!std::isalnum(ch) && !ShouldNotEscape(ch)


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@85
PS6, Line 85: std::uppercase << std::hex
We could put std::uppercase and std::hex after L68, before the loop, as these 
settings don't get reset automatically, so no need to set them every time. On 
the other hand, I would include a comment stating that this only applies to 
when integer values are inserted and not when char values, therefore it doesn't 
cause a problem when not escaping characters (like on L88).


http://gerrit.cloudera.org:8080/#/c/21131/3/testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
File 
testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test:


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-26 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@8
PS6, Line 8: characters
nit: let's put the title in one line


http://gerrit.cloudera.org:8080/#/c/21131/6//COMMIT_MSG@13
PS6, Line 13: that accurately maps special characters to their URL-encoded 
forms.
It's unclear to me how the unicode characters are handled incorrectly. E.g. the 
one mentioned in the JIRA description is "?" which is encoded into 3 bytes in 
UTF-8: 0xe8 0xbf 0x90. Could you mention in the commit message how this example 
fails before and works now?

FWIW, there are online tools to convert Unicode to UTF-8, e.g. 
https://onlinetools.com/unicode/convert-unicode-to-utf8


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@80
PS6, Line 80: std::isalnum(ch)
It's an existing issue but I think we should cast 'ch' to unsigned char as 
memtioned in https://en.cppreference.com/w/cpp/string/byte/isalnum

> the behavior of std::isalnum is undefined if the argument's value is neither 
> representable as unsigned char nor equal to EOF. To use these functions 
> safely with plain chars (or signed chars), the argument should first be 
> converted to unsigned char:
> std::isalnum(static_cast(ch));


http://gerrit.cloudera.org:8080/#/c/21131/6/be/src/util/coding-util.cc@85
PS6, Line 85: std::
nit: I think we can ignore explicitly using "std::" since std::uppercase is 
introduced at L37.

For std::hex, it's introduced in "common/names.h":
https://github.com/apache/impala/blob/b39cd79ae84c415e0aebec2c2b4d7690d2a0cc7a/be/src/common/names.h#L82

Maybe the same for "std::isalnum".



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Fri, 26 Apr 2024 09:03:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-25 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 25 Apr 2024 15:42:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-24 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16014/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 25 Apr 2024 03:46:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-24 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 6:

(1 comment)

> Uploaded patch set 6.

http://gerrit.cloudera.org:8080/#/c/21131/4/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/4/be/src/util/coding-util.cc@117
PS4, Line 117: istringstream is(in.substr(i + 1, 2));
> Not actually done, there's still a newline added here.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 25 Apr 2024 03:20:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-24 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special
characters

The previous code suffered from URL encoding errors due to incomplete
character whitelisting and inadequate handling of Unicode characters.
This commit addresses these issues by introducing a specialCharacterMap
that accurately maps special characters to their URL-encoded forms.

The UrlEncode function now utilizes this map to determine whether a
character should be escaped. If present in the map, the character's
encoded form is used; otherwise, it falls back to hexadecimal encoding.
Additionally, the function clears the output string before encoding to
ensure a clean start and efficiently accumulates encoded characters
using stringstream.

These improvements ensure accurate encoding of all characters,
including special and Unicode characters, resolving the errors
encountered during URL encoding.

Testing: Some basic tests are provided in unicode-column-name.test.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
2 files changed, 77 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/6
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21131/4/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/4/be/src/util/coding-util.cc@83
PS4, Line 83:   } else {
> Done
Ack


http://gerrit.cloudera.org:8080/#/c/21131/4/be/src/util/coding-util.cc@117
PS4, Line 117: int value = 0;
> Done
Not actually done, there's still a newline added here.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 24 Apr 2024 18:30:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15981/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 22 Apr 2024 16:59:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-22 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 5:

(3 comments)

> Uploaded patch set 5.

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@83
PS3, Line 83:   } else {
> The exception is thrown here in the catalogd:
I've tried adding all the cases in the map now.


http://gerrit.cloudera.org:8080/#/c/21131/4/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/4/be/src/util/coding-util.cc@83
PS4, Line 83:   } else {
> Repeated lookup on specialCharacterMap. You could do
Done


http://gerrit.cloudera.org:8080/#/c/21131/4/be/src/util/coding-util.cc@117
PS4, Line 117: int value = 0;
> Avoid unnecessary whitespace change.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 22 Apr 2024 16:33:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-04-22 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special
characters

The previous code suffered from URL encoding errors due to incomplete
character whitelisting and inadequate handling of Unicode characters.
This commit addresses these issues by introducing a specialCharacterMap
that accurately maps special characters to their URL-encoded forms.

The UrlEncode function now utilizes this map to determine whether a
character should be escaped. If present in the map, the character's
encoded form is used; otherwise, it falls back to hexadecimal encoding.
Additionally, the function clears the output string before encoding to
ensure a clean start and efficiently accumulates encoded characters
using stringstream.

These improvements ensure accurate encoding of all characters,
including special and Unicode characters, resolving the errors
encountered during URL encoding.

Testing: Some basic tests are provided in unicode-column-name.test.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
2 files changed, 78 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/5
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-03-19 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@83
PS3, Line 83:   if (!hive_compat) {
> I think '$' and other special characters are not supported or is it another
That was previously possible. The following works for me before this change:

  create table foo (i int) partitioned by (s string);
  insert into foo partition (s = "$") values (1), (2);



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 19 Mar 2024 18:52:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-03-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10380/


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 18 Mar 2024 14:59:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-03-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10380/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 18 Mar 2024 10:24:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-03-18 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 3:

(1 comment)

> Patch Set 3:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/3/be/src/util/coding-util.cc@83
PS3, Line 83:   if (!hive_compat) {
> Are these additional encodings the same that were done before this change?
I think '$' and other special characters are not supported or is it another bug:
ERROR: TableLoadingException: Failed to load metadata for table: 
default.my_part_tbl
CAUSED BY: NullPointerException: Invalid partition name: p=$@%23%^&



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 18 Mar 2024 09:24:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-03-12 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/1/be/src/util/coding-util.cc@84
PS1, Line 84: // Encode forward slashes to prevent misinterpretation in 
paths or URLs
> Yes, that's correct. The UrlEncode function in the provided code is designe
As Daniel said, please add a clearer description of the problem and solution to 
the commit message. What you just said here is a good basis for that.



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Tue, 12 Mar 2024 18:40:19 +
Gerrit-HasComments: Yes


  1   2   >