Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22049 )

Change subject: IMPALA-10319: Support arbitrary encodings on Text/Sequence files
......................................................................


Patch Set 21:

(19 comments)

It looks good. I left mostly minor comments.

http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/exec/hdfs-scanner.h@326
PS21, Line 326: decocoded
typo: decoded


http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/exec/hdfs-text-table-writer.cc@56
PS21, Line 56: HdfsTextTableWriter::~HdfsTextTableWriter() {
nit: can fit in one line


http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc
File be/src/util/char-codec.cc:

http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@61
PS21, Line 61: scopedMemTracker
nit: could be scoped_mem_tracker


http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@103
PS21, Line 103: prefix
Is it possible to use partial_symbol to build up the prefix? That way, we could 
avoid an additional vector and copying.


http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@113
PS21, Line 113: (char*)
nit: reinterpret_cast instead of c-style casts


http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@159
PS21, Line 159: (
nit: reinterpret_cast instead of c-style casts


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@289
PS21, Line 289: !=
It should be .equals()


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@294
PS21, Line 294: SERIALIZATION.ENCODING
nit: could be lowercase with apostrophe: 'serialization.encoding'.


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@295
PS21, Line 295: %1$s
nit: could be simplified as %s


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@303
PS21, Line 303: %1$s / %2$s
nit: could be simplified as %s / %s


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@308
PS21, Line 308: FeFsTable
nit: unnecessary cast


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@314
PS21, Line 314: %1$s / %2$s
nit: could be simplified as %s / %s


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@320
PS21, Line 320: Unsupported encoding: " + encoding + "."
nit: it could use a format-string similarly to the previous exception messages.


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@318
PS21, Line 318: String encoding = 
tblProperties.get(serdeConstants.SERIALIZATION_ENCODING);
              :     if (!Charset.isSupported(encoding)) {
              :       throw new AnalysisException("Unsupported encoding: " + 
encoding + ".");
              :     }
nit: it could be moved after the first property check at L291


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@331
PS21, Line 331: %1$s
nit: could be simplified as %s


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@332
PS21, Line 332: Please refer to Jira ticket IMPALA-10319 for more info.
nit: I'd keep it simpler by skipping "Jira ticket": Please refer to 
IMPALA-10319 for more info. See L350, and the similar error message in 
CreateTableStmt


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@337
PS21, Line 337: FeFsTable
nit: unneccessary cast


http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/22049/21/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@21
PS21, Line 21: import java.nio.charset.StandardCharsets;
nit: not used


http://gerrit.cloudera.org:8080/#/c/22049/21/tests/query_test/test_charcodec.py
File tests/query_test/test_charcodec.py:

http://gerrit.cloudera.org:8080/#/c/22049/21/tests/query_test/test_charcodec.py@87
PS21, Line 87:       data_file_path = os.path.join(os.environ['IMPALA_HOME'], 
"testdata",
It could use tempfile instead, no extra cleanup needed with it



--
To view, visit http://gerrit.cloudera.org:8080/22049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Gerrit-Change-Number: 22049
Gerrit-PatchSet: 21
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 08 May 2025 12:53:15 +0000
Gerrit-HasComments: Yes

Reply via email to