Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21131 )

Change subject: IMPALA-11499: Refactor UrlEncode function to handle special 
characters
......................................................................


Patch Set 20:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc@98
PS20, Line 98:                 "*/:=?\\{[]";
Please add '^'


http://gerrit.cloudera.org:8080/#/c/21131/20/be/src/util/coding-util-test.cc@104
PS20, Line 104:
nit: remove blank line


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@297
PS20, Line 297:   """Test for special characters in partition values."""
nit: move this inside the method, i.e.

  def test_escaped_partition_values(self, unique_database):
    """Test for special characters in partition values."""


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@299
PS20, Line 299:
nit: use 2 spaces indention


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@300
PS20, Line 300: e
> flake8: E501 line too long (106 > 90 characters)
wrap this to be

    self.execute_query(
        "create table {}(i int) partitioned by (p string) stored as 
parquet".format(tbl))


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@302
PS20, Line 302: 7
> flake8: E501 line too long (108 > 90 characters)
Single quote is missing here. The two single quotes in the middle doesn't 
represent a single quote. They become the end and start of two strings. We can 
double quote the string and wrap it as

    special_characters_ = 
"SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r" \
                          
"\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B" \
                          "\x1C\x1D\x1E\x1F\"\x7F'%*/:=?\\{[]#^"

Also add '^' and remove "()"


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@303
PS20, Line 303: _
> flake8: E501 line too long (104 > 90 characters)
wrap this to be

    self.execute_query(
        "insert into {} partition(p='{}') values (0)".format(tbl, 
special_characters_))


http://gerrit.cloudera.org:8080/#/c/21131/20/tests/query_test/test_insert.py@305
PS20, Line 305:       assert a.data[0] == special_characters_
This just verifies the partition value. We still need to verify the partition 
dir created by Impala is correct. This seems better:

  def test_escaped_partition_values(self, unique_database):
    """Test for special characters in partition values."""
    tbl = unique_database + ".tbl"
    self.execute_query(
        "create table {}(i int) partitioned by (p string) stored as 
parquet".format(tbl))
    # Use "\\'" for single quote since the insert statement use single quote on 
the
    # partition value. Use "\\\\" for back slash since it will be escaped again 
in
    # parsing the insert statement
    special_characters = 
"SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r" \
                         
"\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B" \
                         "\x1C\x1D\x1E\x1F\"\x7F\\'%*/:=?\\\\{[]#^"
    part_value = "SpecialCharacters\x01\x02\x03\x04\x05\x06\x07\b\t\n\v\f\r" \
                 "\x0E\x0F\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B" \
                 "\x1C\x1D\x1E\x1F\"\x7F'%*/:=?\{[]#^"
    part_dir = 
"p=SpecialCharacters%01%02%03%04%05%06%07%08%09%0A%0B%0C%0D%0E%0F%10%11" \
               
"%12%13%14%15%16%17%18%19%1A%1B%1C%1D%1E%1F%22%7F%27%25%2A%2F%3A%3D%3F" \
               "%5C%7B%5B%5D%23%5E"
    res = self.execute_query(
        "insert into {} partition(p='{}') values (0)".format(tbl, 
special_characters))
    assert res.data[0] == part_dir + ": 1"
    res = self.client.execute("select p from {}".format(tbl))
    assert res.data[0] == part_value
    res = self.execute_query("show partitions " + tbl)
    # There are "\t" in the partition value so we can't easily split the result 
line.
    # Just verify the values are inside it.
    assert part_value in res.data[0]
    assert part_dir in res.data[0]



--
To view, visit http://gerrit.cloudera.org:8080/21131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88c4aba5d811dfcec809583d0c16fcbc0ca730fb
Gerrit-Change-Number: 21131
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward <pranav.lo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <pranav.lo...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zihao Ye <eyiz...@163.com>
Gerrit-Comment-Date: Tue, 07 May 2024 03:26:22 +0000
Gerrit-HasComments: Yes

Reply via email to