Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/22049 )
Change subject: IMPALA-10319: Support arbitrary encodings on Text/Sequence files ...................................................................... Patch Set 21: (19 comments) It looks good. I left mostly minor comments. http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/exec/hdfs-scanner.h@326 PS21, Line 326: decocoded typo: decoded http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/exec/hdfs-text-table-writer.cc File be/src/exec/hdfs-text-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/exec/hdfs-text-table-writer.cc@56 PS21, Line 56: HdfsTextTableWriter::~HdfsTextTableWriter() { nit: can fit in one line http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc File be/src/util/char-codec.cc: http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@61 PS21, Line 61: scopedMemTracker nit: could be scoped_mem_tracker http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@103 PS21, Line 103: prefix Is it possible to use partial_symbol to build up the prefix? That way, we could avoid an additional vector and copying. http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@113 PS21, Line 113: (char*) nit: reinterpret_cast instead of c-style casts http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@159 PS21, Line 159: ( nit: reinterpret_cast instead of c-style casts http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@289 PS21, Line 289: != It should be .equals() http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@294 PS21, Line 294: SERIALIZATION.ENCODING nit: could be lowercase with apostrophe: 'serialization.encoding'. http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@295 PS21, Line 295: %1$s nit: could be simplified as %s http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@303 PS21, Line 303: %1$s / %2$s nit: could be simplified as %s / %s http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@308 PS21, Line 308: FeFsTable nit: unnecessary cast http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@314 PS21, Line 314: %1$s / %2$s nit: could be simplified as %s / %s http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@320 PS21, Line 320: Unsupported encoding: " + encoding + "." nit: it could use a format-string similarly to the previous exception messages. http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@318 PS21, Line 318: String encoding = tblProperties.get(serdeConstants.SERIALIZATION_ENCODING); : if (!Charset.isSupported(encoding)) { : throw new AnalysisException("Unsupported encoding: " + encoding + "."); : } nit: it could be moved after the first property check at L291 http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@331 PS21, Line 331: %1$s nit: could be simplified as %s http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@332 PS21, Line 332: Please refer to Jira ticket IMPALA-10319 for more info. nit: I'd keep it simpler by skipping "Jira ticket": Please refer to IMPALA-10319 for more info. See L350, and the similar error message in CreateTableStmt http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@337 PS21, Line 337: FeFsTable nit: unneccessary cast http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@21 PS21, Line 21: import java.nio.charset.StandardCharsets; nit: not used http://gerrit.cloudera.org:8080/#/c/22049/21/tests/query_test/test_charcodec.py File tests/query_test/test_charcodec.py: http://gerrit.cloudera.org:8080/#/c/22049/21/tests/query_test/test_charcodec.py@87 PS21, Line 87: data_file_path = os.path.join(os.environ['IMPALA_HOME'], "testdata", It could use tempfile instead, no extra cleanup needed with it -- To view, visit http://gerrit.cloudera.org:8080/22049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65 Gerrit-Change-Number: 22049 Gerrit-PatchSet: 21 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 08 May 2025 12:53:15 +0000 Gerrit-HasComments: Yes
