Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/22289 )
Change subject: IMPALA-12927: Support specifying format for reading JSON BINARY columns
......................................................................


Patch Set 1:

(4 comments)

Thanks for the code review! These suggestions are very helpful.

http://gerrit.cloudera.org:8080/#/c/22289/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22289/1//COMMIT_MSG@22
PS1, Line 22: perty unset or
            : incorrectly set, and will provide an error message.
> It could be useful to add a flag that acts as a default for this property i
Good idea. I added a query option to serve as this flag.


http://gerrit.cloudera.org:8080/#/c/22289/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/22289/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@572
PS1, Line 572: slotDesc.setType(Type.STRING);
> This doesn't look correct to me because even if it works for JSON, theoreti
Done. You are right: we do need to handle the case where a table contains files in different formats. The format is now passed to the backend as a query option and only takes effect in the HdfsJsonScanner. The final decision on whether to base64-decode a value is made in the TextConverter.


http://gerrit.cloudera.org:8080/#/c/22289/1/testdata/workloads/functional-query/queries/QueryTest/json-binary-format.test
File testdata/workloads/functional-query/queries/QueryTest/json-binary-format.test:

http://gerrit.cloudera.org:8080/#/c/22289/1/testdata/workloads/functional-query/queries/QueryTest/json-binary-format.test@3
PS1, Line 3: lter table binary_tbl unset tblproperties
> This modifies a shared table and can leave it in a dirty state if the test
Done. The test now clones a temporary table instead of modifying the shared one.
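
To illustrate the decode decision described in the HdfsScanNode.java reply above, here is a minimal Python sketch. It is not the Impala implementation: the function names are hypothetical, and it assumes that an unset table property falls back to the query option, that an invalid value is rejected with an error, and that a value which fails base64 decoding becomes NULL (None here).

```python
import base64

# Hypothetical names for illustration only; these are not Impala identifiers.
VALID_FORMATS = ("base64", "rawstring")


def resolve_binary_format(table_property, query_option_default=None):
    """Return the effective format for JSON BINARY columns.

    Assumption: the table property wins when it is set; otherwise the
    query option acts as the default. An unset or invalid result is an error.
    """
    chosen = table_property if table_property is not None else query_option_default
    if chosen is None or chosen.lower() not in VALID_FORMATS:
        raise ValueError(
            "No valid 'json.binary.format' (base64 or rawstring) provided")
    return chosen.lower()


def convert_binary_value(raw, binary_format):
    """Convert one JSON string value into bytes according to the format.

    Assumption: a value that is not valid base64 yields None (NULL).
    """
    if binary_format == "rawstring":
        # Keep the string bytes unchanged.
        return raw.encode("utf-8")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(raw, validate=True)
    except ValueError:
        # binascii.Error is a subclass of ValueError.
        return None
```

With these assumptions, resolve_binary_format(None, "base64") selects "base64" from the query option, and convert_binary_value("aGVsbG8=", "base64") yields b"hello".
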

http://gerrit.cloudera.org:8080/#/c/22289/1/testdata/workloads/functional-query/queries/QueryTest/json-binary-format.test@12
PS1, Line 12: No valid table properties 'json.binary.format' (base64 or rawstring) provided for scanning binary column of json table '$DATABASE.binary_tbl'
> Can you add a test for the case when there is no valid json.binary.format,
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/22289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf61fa3afc0f33caa63fbc05393e975733165e82
Gerrit-Change-Number: 22289
Gerrit-PatchSet: 1
Gerrit-Owner: Zihao Ye <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zihao Ye <[email protected]>
Gerrit-Comment-Date: Thu, 16 Jan 2025 08:12:15 +0000
Gerrit-HasComments: Yes
