Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22049 )

Change subject: IMPALA-10319: Support arbitrary encodings on Text files
......................................................................


Patch Set 24: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22049/21/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/21/tests/query_test/test_charcodec.py@72
PS21, Line 72: class TestCharCodecGen(ImpalaTestSuite):
> If it's not important to have 'disable_codegen' both True and False of exha
I would be surprised if file sizes really mattered for Impala (they may matter 
for the file generation). The current speed seems acceptable to me, but it 
could still be interesting to see what takes time; some slow tests could 
probably be moved to exhaustive only.


http://gerrit.cloudera.org:8080/#/c/22049/23/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/23/tests/query_test/test_charcodec.py@379
PS23, Line 379:         "charcod
> This comes down simply to 'IS_HDFS' check and I thought many tests use this
There are some Impala test environments where Hive is not available (e.g. S3, 
mainly for historical reasons), so tests that use Hive are skipped there.


http://gerrit.cloudera.org:8080/#/c/22049/23/tests/query_test/test_charcodec.py@432
PS23, Line 432:     """Write table with Impala and read it back with Hive."""
              :     db = unique_database
              :     enc = vector.get_value('charset')
              :
> Interestingly enough. after adding 'order by name' to both queries the comp
The ordering comes after decoding.
My guess is that the difference comes from Hive and Impala ordering Unicode 
characters differently. This is actually interesting to know, even if we don't 
want to "fix" it.
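
To illustrate how two engines could disagree on "order by" over the same decoded strings, here is a minimal Python 3 sketch. It compares two common byte-level ordering rules (UTF-8 bytes vs UTF-16 code units). Which rule Hive or Impala actually applies is an assumption I have not verified, and the helper names (utf8_order, utf16_order, same_rows) are hypothetical, not part of the test suite.

```python
# Illustration only: two common ordering rules for Unicode strings.
# Whether Hive or Impala uses either rule is NOT established here.


def utf8_order(names):
    """Sort by UTF-8 bytes, which matches code point order."""
    return sorted(names, key=lambda s: s.encode("utf-8"))


def utf16_order(names):
    """Sort by UTF-16 code units (big-endian bytes compare like code units)."""
    return sorted(names, key=lambda s: s.encode("utf-16-be"))


def same_rows(rows_a, rows_b):
    """Order-independent comparison: re-sort both sides with one rule."""
    return utf8_order(rows_a) == utf8_order(rows_b)


# U+FF5E is in the BMP above the surrogate range; U+10000 needs a
# surrogate pair in UTF-16, so the two rules place them differently.
names = ["\uff5e", "\U00010000", "a"]
by_utf8 = utf8_order(names)
by_utf16 = utf16_order(names)
```

Here by_utf8 and by_utf16 contain the same rows in a different order, while same_rows(by_utf8, by_utf16) still holds. Re-sorting both result sets on the test side with one rule would make the comparison independent of the engines' ordering.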


http://gerrit.cloudera.org:8080/#/c/22049/24/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/24/tests/query_test/test_charcodec.py@426
PS24, Line 426:
nit: extra whitespace



--
To view, visit http://gerrit.cloudera.org:8080/22049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Gerrit-Change-Number: 22049
Gerrit-PatchSet: 24
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 29 May 2025 15:12:35 +0000
Gerrit-HasComments: Yes
