Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13722 )
Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1 ...................................................................... Patch Set 9: (27 comments) http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-expr.h File be/src/exprs/cast-expr.h: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-expr.h@28 PS9, Line 28: // Note, that it should be verified at the callsite that TExprNode.cast_expr is set. nit: Move this comment in front of L31. http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@171 PS9, Line 171: if (val.is_null) return StringVal::null(); Add DCHECK(ctx != nullptr); http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@178 PS9, Line 178: lexical_cast<string>(tv) It would be better to use ToString() directly rather than involving the indirection of lexical_cast and operator<<. (just like it's done in L199). http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@191 PS9, Line 191: if (val.is_null) return StringVal::null(); Add DCHECK(ctx != nullptr); http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@347 PS9, Line 347: if (val.is_null) return DateVal::null(); Add DCHECK(ctx != nullptr); http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h File be/src/runtime/datetime-iso-sql-format-parser.h: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@28 PS9, Line 28: // nit: in most other header files comments are prefixed with /// http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@46 PS9, Line 46: index position http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@47 PS9, Line 47: token group It hasn't been properly defined what "token group" means. Whst's the difference between a token and a token group? http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@48 PS9, Line 48: GetNextTokenGroupFromInput Naming is a bit confusing as it is used to find the end of the current token (pointed to by input_str). Again, instead of "TokenGroup", you might want to just use "Token" in the wording. http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@116 PS3, Line 116: // Note the addition > My assumption was that it is initialized to zero when the 'result' is creat Maybe add some DCHECKs at the beginning of the function then? http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.cc File be/src/runtime/datetime-iso-sql-format-parser.cc: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.cc@47 PS9, Line 47: if (!IsoSqlFormatTokenizer::IsSeparator(*current_pos)) return false; : // Advance to the end of the separator sequence. : ++current_pos; : while (current_pos < end_pos && IsoSqlFormatTokenizer::IsSeparator(*current_pos)) { : ++current_pos; : } : // Advance to the end of the separator sequence in the expected tokens list. : ++i; : while (i < dt_ctx.toks.size() && dt_ctx.toks[i].type == SEPARATOR) ++i; : : // If we reached the end of input or the end of token sequence, we can return. : if (current_pos >= end_pos || i >= dt_ctx.toks.size()) { : return (current_pos >= end_pos && i >= dt_ctx.toks.size()); : } : : // Next token, following the separator sequence. : tok = &dt_ctx.toks[i]; : : // The last '-' of a separator sequence might be taken as a sign for timezone hour. : if (*(current_pos - 1) == '-' && tok->type == TIMEZONE_HOUR) { : --current_pos; : } Please consider moving this block to a separate function. http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h File be/src/runtime/datetime-iso-sql-format-tokenizer.h: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@73 PS9, Line 73: judge nit: decide http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@75 PS9, Line 75: token group Again, the wording here and below is a bit confusing. Shouldn't it just be "token"? http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91 PS9, Line 91: observed nit: found http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc File be/src/runtime/datetime-iso-sql-format-tokenizer.cc: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110 PS9, Line 110: unsigned I think it would be safer to just use 'int' here. http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110 PS9, Line 110: (long)MAX_TOKEN_SIZE Is explicit cast to long necessary here? http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.h File be/src/runtime/datetime-parser-common.h: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.h@204 PS9, Line 204: bool ParseAndValidate(const char* token_group, int token_len, int min, int max, : int* result); : : bool ParseFractionTokenGroup(const char* token_group, int token_len, : DateTimeParseResult* result); Add WARN_UNUSED_RESULT to these declarations. http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.cc File be/src/runtime/datetime-parser-common.cc: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.cc@64 PS9, Line 64: const StringValue& fmt = StringValue::FromStringVal(format); nit: move this after L112. http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@48 PS9, Line 48: static bool ParseSimpleDateFormat(const char* str, int len, boost::gregorian::date* d, : boost::posix_time::time_duration* t); Add WARN_UNUSED_RESULT here and L60, L69. http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@65 PS9, Line 65: run when a user provided nit: is used when a user specifies a datetime format string http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@106 PS9, Line 106: static int AdjustWithTimezone(boost::posix_time::time_duration* t, : const boost::posix_time::time_duration& tz_offset); Please add a comment explaining the return value. http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@174 PS9, Line 174: return false; Any reason the call to IndicateTimestampParseFailure() was removed? http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@263 PS9, Line 263: str_val = indicator.c_str(); This is potentially dangerous: 'indicator' object is destructed after leaving the current L256-L265 block. http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/cup/sql-parser.cup@2938 PS9, Line 2938: STRING_LITERA My understanding is that to_timestamp() and from_timestamp() accept format strings specified as STRING/CHAR/VARCHAR columns of a table. The FORMAT clause accepts string literals only? Is this limitation intentional or an oversight? If it is intentional, please comment on it in the commit message. http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@227 PS9, Line 227: his is not an intermediate : // step between a char and a datetime. this is a 'datetime to string' or 'string to datetime' cast. http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@229 PS9, Line 229: (type_.getPrimitiveType() != PrimitiveType.CHAR || : children_.get(0).getType().getPrimitiveType() != PrimitiveType.STRING) Consider rewriting this condition to test directly if we have datetime->string or string->datetime cast. That might make the code easier to understand. http://gerrit.cloudera.org:8080/#/c/13722/9/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test File testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test: http://gerrit.cloudera.org:8080/#/c/13722/9/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test@109 PS9, Line 109: ==== Would it be possible to specify the format string as a string/char/varchar column of table? Please add some tests to test these scenarios. -- To view, visit http://gerrit.cloudera.org:8080/13722 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23 Gerrit-Change-Number: 13722 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Tue, 16 Jul 2019 11:07:51 +0000 Gerrit-HasComments: Yes