[Impala-ASF-CR] IMPALA-13020: Change thrift rpc max message size to int64 t

2024-05-05 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21367 )

Change subject: IMPALA-13020: Change thrift_rpc_max_message_size to int64_t
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21367/4/be/src/rpc/thrift-util.cc
File be/src/rpc/thrift-util.cc:

http://gerrit.cloudera.org:8080/#/c/21367/4/be/src/rpc/thrift-util.cc@61
PS4, Line 61: DEFINE_int64(thrift_rpc_max_message_size, 4L * 1024 * 1024 * 1024,
Can we set it to unlimited or a larger value? We have customers using 
impala-3.4 and their catalog-topic size reaches 4GB. Once they upgrade to 
Impala-4.x, the catalog-topic size will grow larger since impala-3.4 has better 
compression rate on catalog objects. The reason is in Impala-3.4, partition 
objects are sent inside the table object so more duplicated bytes can be saved. 
In Impala-4.x, partition objects are compressed and sent individually for 
incremental updates (IMPALA-3127).



--
To view, visit http://gerrit.cloudera.org:8080/21367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I681b1849cc565dcb25de8c070c18776ce69cbb87
Gerrit-Change-Number: 21367
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 May 2024 06:06:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16095/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 04:54:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-05 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..


Patch Set 18:

(2 comments)

> Uploaded patch set 18.

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc
File be/src/util/coding-util.cc:

http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66: !isalnum(static_cast(ch)) && 
!ShouldNotEscape(ch))) {
> I played with it a bit and it doesn't work with for example '\t' and '\n' (
Done.
Note: I tested those cases manually but to include those tests we'll have to 
bring a change in the test framework, which in its current state prevents us to 
write such tests. For example, '\t ' is considered a separator in our current 
test framework, so to incorporate such tests we might have to change the test 
framework which might lead to other errors, That's why I haven't included such 
cases in the current tests.


http://gerrit.cloudera.org:8080/#/c/21131/15/be/src/util/coding-util.cc@66
PS15, Line 66:  c
> I think it would be good if we first cast it to unsigned char because of wh
Done



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Mon, 06 May 2024 04:37:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11499: Refactor UrlEncode function to handle special characters

2024-05-05 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
..

IMPALA-11499: Refactor UrlEncode function to handle special characters

An error came from an issue with URL encoding, where certain Unicode
characters were being incorrectly encoded due to their UTF-8
representation matching characters in the set of characters to escape.
For example, the string '运', which consists of three bytes
0xe8 0xbf 0x90 was wrongly getting encoded into '\E8%FFBF\90',
because the middle byte matched one of the two bytes that
represented the "\u00FF" literal. Inclusion of "\u00FF" was likely
a mistake from the beginning and it should have been '\x7F'.

The patch makes three key changes:
1. Before the change, the set of characters that need to be escaped
was stored as a string. The current patch uses an unordered_set
instead.

2. '\xFF', which is an invalid UTF-8 byte and whose inclusion was
erroneous from the beginning, is replaced with '\x7F', which is a
control character for DELETE, ensuring consistency and correctness in
URL encoding.

3. The list of characters to be escaped is extended to match the
current list in Hive.

Testing: Tests on both traditional Hive tables and Iceberg tables
are included in unicode-column-name.test, insert.test and
coding-util-test.cc.

Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
---
M be/src/util/coding-util-test.cc
M be/src/util/coding-util.cc
M testdata/workloads/functional-query/queries/QueryTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/unicode-column-name.test
4 files changed, 177 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/21131/18
--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Zihao Ye