Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16066 )
Change subject: IMPALA-9482: Support for BINARY columns ...................................................................... Patch Set 14: (11 comments) Ps 15 is rebase + conflict resolution, ps 16 contains fixes for the comments. http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/exec/hbase-table-writer.cc File be/src/exec/hbase-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/exec/hbase-table-writer.cc@161 PS14, Line 161: data_len = encoded_value.length(); > I just realized we support writing to HBase. It worths adding tests for thi Thanks for the comment, it turned out that the writer was buggy as 'base64_encoded_value' went out of scope while 'data' kept a pointer to its buffer. Added a binary column to insertalltypesagg/insertalltypesaggbinary (only used by HBase tests) and added few tests to hbase-inserts.test Also checked manually that Hive can read our data. http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/exec/text-converter.cc File be/src/exec/text-converter.cc: http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/exec/text-converter.cc@359 PS14, Line 359: return auxType.string_subtype != AuxColumnType::StringSubtype::BINARY; > Add JIRA for codegen on reading BINARY from text files? I guess we just nee Created a jira. http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/descriptors.h@256 PS14, Line 256: return col_descs_[slot_desc->col_path().back()]; > Not sure if this is correct for nested types. Can we add tests on BINARY in Added tests with binary in struct, binary in array was only tested manually so far. http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/types.cc File be/src/runtime/types.cc: http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/runtime/types.cc@141 PS14, Line 141: return TPrimitiveType::BINARY; > Should we return INVALID_TYPE here because of DCHECK(false) so that we catc Agree with returning INVALID_TYPE to improve error handling in release builds. http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/service/hs2-util.cc File be/src/service/hs2-util.cc: http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/service/hs2-util.cc@872 PS14, Line 872: AuxColumnType aux_type(columnType); > If this is only used in the TYPE_STRING, perhaps it should be initialized t Done http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/service/hs2-util.cc@950 PS14, Line 950: type_entry.__set_type(thrift::TTypeId::STRING_TYPE); > If DCHECK is false, do we want to set this to STRING_TYPE? Or maybe we sho There is no "invalid type" concept in the HS2 thrift api + this field is required, so I don't see a better idea here (maybe we could return some status in case of error, but that seems a bit too much effort for the unlikely case of not catching this scenario in debug builds. http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/service/impala-beeswax-server.cc File be/src/service/impala-beeswax-server.cc: http://gerrit.cloudera.org:8080/#/c/16066/14/be/src/service/impala-beeswax-server.cc@212 PS14, Line 212: return "string"; > How come we don't return TypeToOdbcString here? What do you mean for inlineing? It is also used in some tests, so we can't keep it in this file. About returning TypeToOdbcString - while odbc does support binary type, my guess is that existing beeswax clients don't, so it seemed safer to return "string" http://gerrit.cloudera.org:8080/#/c/16066/14/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/16066/14/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@187 PS14, Line 187: // No built-in function needed for BINARY <-> STRING conversion. > I think the comment should also reflect other datatypes versus binary are n Done http://gerrit.cloudera.org:8080/#/c/16066/14/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@357 PS14, Line 357: boolean isStringBinaryCast = > I may be understanding this wrong... Added a precondition to ensure that if one is binary, the other must be string. http://gerrit.cloudera.org:8080/#/c/16066/14/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java File fe/src/main/java/org/apache/impala/analysis/LikePredicate.java: http://gerrit.cloudera.org:8080/#/c/16066/14/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java@116 PS14, Line 116: private static boolean isLikeableType(Type type) { > No real comment here. But I feel the other types might get upset for not be :D poor other types - tbh I don't remember whether the naming was intentional http://gerrit.cloudera.org:8080/#/c/16066/14/testdata/workloads/functional-query/queries/QueryTest/binary-type.test File testdata/workloads/functional-query/queries/QueryTest/binary-type.test: http://gerrit.cloudera.org:8080/#/c/16066/14/testdata/workloads/functional-query/queries/QueryTest/binary-type.test@68 PS14, Line 68: where binary_col_with_nulls = cast("01/02/09" as binary) > This is identical to the above query. oops, I think I wanted to test > here -- To view, visit http://gerrit.cloudera.org:8080/16066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 Gerrit-Change-Number: 16066 Gerrit-PatchSet: 14 Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Steve Carlin <scar...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 03 Aug 2022 18:10:42 +0000 Gerrit-HasComments: Yes