Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/22049 )
Change subject: IMPALA-10319: Support arbitrary encodings on Text files
......................................................................

Patch Set 24: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22049/21/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/21/tests/query_test/test_charcodec.py@72
PS21, Line 72: class TestCharCodecGen(ImpalaTestSuite):
> If it's not important to have 'disable_codegen' both True and False of exha
I would be surprised if file sizes really mattered for Impala (maybe they do matter for the file generation). The current speed seems acceptable to me, but it could still be interesting to see what takes time; probably some slow tests could be moved to exhaustive only.

http://gerrit.cloudera.org:8080/#/c/22049/23/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/23/tests/query_test/test_charcodec.py@379
PS23, Line 379: "charcod
> This comes down simply to 'IS_HDFS' check and I thought many tests use this
There are some Impala test environments where Hive is not available (e.g. S3, mainly for historical reasons), so tests that use Hive are skipped there.

http://gerrit.cloudera.org:8080/#/c/22049/23/tests/query_test/test_charcodec.py@432
PS23, Line 432: """Write table with Impala and read it back with Hive."""
 : db = unique_database
 : enc = vector.get_value('charset')
 :
> Interestingly enough. after adding 'order by name' to both queries the comp
The ordering is applied after decoding. My guess is that the difference comes from Hive and Impala ordering Unicode characters differently. This is actually interesting to know, even if we don't want to "fix" it.
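To make the ordering guess concrete: one way two engines can disagree on Unicode order is if one compares raw UTF-8 bytes (equivalent to code point order) while the other compares UTF-16 code units, which is what Java's String.compareTo does. The snippet below is only an illustration under that assumption; it does not claim that this is what Hive or Impala actually do, and the two sample strings are arbitrary.

```python
# Illustration only: two string orderings that can disagree on non-ASCII data.
# Which ordering Hive and Impala actually use is an assumption to verify.

def utf8_byte_key(s):
    """Sort key: raw UTF-8 bytes (same as Unicode code point order)."""
    return s.encode("utf-8")


def utf16_unit_key(s):
    """Sort key: UTF-16 code units, the order Java's String.compareTo uses."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A and U+1F600 GRINNING FACE.
# The emoji lies outside the BMP, so UTF-16 stores it as a surrogate pair
# (0xD83D 0xDE00), which sorts below 0xFF21; in UTF-8 it starts with 0xF0,
# which sorts above the 0xEF lead byte of U+FF21.
names = ["\uff21", "\U0001F600"]

by_bytes = sorted(names, key=utf8_byte_key)
by_units = sorted(names, key=utf16_unit_key)

print(by_bytes == by_units)  # False: the two orderings disagree
```

Note that these two rules only diverge when supplementary characters are compared with characters in U+E000..U+FFFF. If the test data contains only BMP characters, the observed difference would have to come from some other ordering rule.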

http://gerrit.cloudera.org:8080/#/c/22049/24/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/24/tests/query_test/test_charcodec.py@426
PS24, Line 426:
nit: extra whitespace

--
To view, visit http://gerrit.cloudera.org:8080/22049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Gerrit-Change-Number: 22049
Gerrit-PatchSet: 24
Gerrit-Owner: Mihaly Szjatinya
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Mihaly Szjatinya
Gerrit-Reviewer: Peter Rozsa
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 29 May 2025 15:12:35 +0000
Gerrit-HasComments: Yes
