Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/15866 )
Change subject: IMPALA-9531: Dropped support for dateless timestamps ...................................................................... Patch Set 4: (8 comments) Nice work! I feel that this patch is getting into shape. I have some follow-up comments but none of the seems super difficult to address. http://gerrit.cloudera.org:8080/#/c/15866/3/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/15866/3/be/src/exprs/timestamp-functions-ir.cc@167 PS3, Line 167: dt_ctx->Reset(reinterpret_cast<co Did you find out what this check is for? Don't you break anything with simply getting rig of this check? http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/exprs/timestamp-functions-ir.cc@168 PS4, Line 168: if (!SimpleDateFormatTokenizer::Tokenize(dt_ctx, true, PARSE)) { Anyway, if you tokenize the format here then you do the tokenization for each row of the table even though it would be enough to do once before Impala starts reading rows. http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/exprs/timestamp-functions.h File be/src/exprs/timestamp-functions.h: http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/exprs/timestamp-functions.h@104 PS4, Line 104: bool format Please use datetime_parse_util::CastDirection http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/exprs/timestamp-functions.cc File be/src/exprs/timestamp-functions.cc: http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/exprs/timestamp-functions.cc@155 PS4, Line 155: bool format = true Why don't you use the parameter type that could be directly passed to Tokenize()? http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/exprs/timestamp-functions.cc@167 PS4, Line 167: if (format) : parse_result = SimpleDateFormatTokenizer::Tokenize(dt_ctx, true, FORMAT); : else : parse_result = SimpleDateFormatTokenizer::Tokenize(dt_ctx, true, PARSE); Please check how to format if-else in Impala http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/runtime/datetime-simple-date-format-parser.h File be/src/runtime/datetime-simple-date-format-parser.h: http://gerrit.cloudera.org:8080/#/c/15866/4/be/src/runtime/datetime-simple-date-format-parser.h@88 PS4, Line 88: CastDirection cast_mode = FORMAT I wouldn't give this a default value unless you have any specific reason making it this way. http://gerrit.cloudera.org:8080/#/c/15866/4/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/15866/4/testdata/data/README@503 PS4, Line 503: Timestamp_parquet_test The file name is not that verbose. After reading it it doesn't give any hints what timestamp tests it is used for. "dateless_timestamps.parq" would be better for this purpose. nit: No need to include "parquet" into the file name as the extension already expresses the format. nit: The file name itself starts with a lowercase letter. http://gerrit.cloudera.org:8080/#/c/15866/4/testdata/data/timestamp_text_test.txt File testdata/data/timestamp_text_test.txt: http://gerrit.cloudera.org:8080/#/c/15866/4/testdata/data/timestamp_text_test.txt@1 PS4, Line 1: 1996-04-22 10:00:00.432100000 Same comment for the file name as for the parquet file: "dateless_timestamps.txt" would be more verbose. -- To view, visit http://gerrit.cloudera.org:8080/15866 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I48c49bf027cc4b917849b3d58518facba372b322 Gerrit-Change-Number: 15866 Gerrit-PatchSet: 4 Gerrit-Owner: Adam Tamas <ta...@cloudera.com> Gerrit-Reviewer: Adam Tamas <ta...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Tue, 16 Jun 2020 15:12:25 +0000 Gerrit-HasComments: Yes