[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Attila Jeges (Code Review) Tue, 16 Jul 2019 04:08:45 -0700

Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13722 )


Change subject: IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1
......................................................................


Patch Set 9:

(27 comments)

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-expr.h
File be/src/exprs/cast-expr.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-expr.h@28
PS9, Line 28: // Note, that it should be verified at the callsite that 
TExprNode.cast_expr is set.
nit: Move this comment in front of L31.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@171
PS9, Line 171:   if (val.is_null) return StringVal::null();
Add DCHECK(ctx != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@178
PS9, Line 178: lexical_cast<string>(tv)
It would be better to use ToString() directly rather than involving the 
indirection of lexical_cast and operator<<. (just like it's done in L199).


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@191
PS9, Line 191:   if (val.is_null) return StringVal::null();
Add DCHECK(ctx != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/exprs/cast-functions-ir.cc@347
PS9, Line 347:   if (val.is_null) return DateVal::null();
Add DCHECK(ctx != nullptr);


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h
File be/src/runtime/datetime-iso-sql-format-parser.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@28
PS9, Line 28: //
nit: in most other header files comments are prefixed with ///


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@46
PS9, Line 46: index
position


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@47
PS9, Line 47: token group
It hasn't been properly defined what "token group" means. Whst's the difference 
between a token and a token group?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.h@48
PS9, Line 48: GetNextTokenGroupFromInput
Naming is a bit confusing as it is used to find the end of the current token 
(pointed to by input_str).

Again, instead of "TokenGroup", you might want to just use "Token" in the 
wording.


http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/3/be/src/runtime/datetime-iso-sql-format-parser.cc@116
PS3, Line 116: // Note the addition
> My assumption was that it is initialized to zero when the 'result' is creat
Maybe add some DCHECKs at the beginning of the function then?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.cc
File be/src/runtime/datetime-iso-sql-format-parser.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-parser.cc@47
PS9, Line 47:      if (!IsoSqlFormatTokenizer::IsSeparator(*current_pos)) 
return false;
            :       // Advance to the end of the separator sequence.
            :       ++current_pos;
            :       while (current_pos < end_pos && 
IsoSqlFormatTokenizer::IsSeparator(*current_pos)) {
            :         ++current_pos;
            :       }
            :       // Advance to the end of the separator sequence in the 
expected tokens list.
            :       ++i;
            :       while (i < dt_ctx.toks.size() && dt_ctx.toks[i].type == 
SEPARATOR) ++i;
            :
            :       // If we reached the end of input or the end of token 
sequence, we can return.
            :       if (current_pos >= end_pos || i >= dt_ctx.toks.size()) {
            :         return (current_pos >= end_pos && i >= 
dt_ctx.toks.size());
            :       }
            :
            :       // Next token, following the separator sequence.
            :       tok = &dt_ctx.toks[i];
            :
            :       // The last '-' of a separator sequence might be taken as a 
sign for timezone hour.
            :       if (*(current_pos - 1) == '-' && tok->type == 
TIMEZONE_HOUR) {
            :         --current_pos;
            :       }
Please consider moving this block to a separate function.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h
File be/src/runtime/datetime-iso-sql-format-tokenizer.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@73
PS9, Line 73: judge
nit: decide


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@75
PS9, Line 75: token group
Again, the wording here and below is a bit confusing. Shouldn't it just be 
"token"?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.h@91
PS9, Line 91: observed
nit: found


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc
File be/src/runtime/datetime-iso-sql-format-tokenizer.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS9, Line 110: unsigned
I think it would be safer to just use 'int' here.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-iso-sql-format-tokenizer.cc@110
PS9, Line 110: (long)MAX_TOKEN_SIZE
Is explicit cast to long necessary here?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.h
File be/src/runtime/datetime-parser-common.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.h@204
PS9, Line 204: bool ParseAndValidate(const char* token_group, int token_len, 
int min, int max,
             :     int* result);
             :
             : bool ParseFractionTokenGroup(const char* token_group, int 
token_len,
             :     DateTimeParseResult* result);
Add WARN_UNUSED_RESULT to these declarations.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.cc
File be/src/runtime/datetime-parser-common.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/datetime-parser-common.cc@64
PS9, Line 64: const StringValue& fmt = StringValue::FromStringVal(format);
nit: move this after L112.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h
File be/src/runtime/timestamp-parse-util.h:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@48
PS9, Line 48: static bool ParseSimpleDateFormat(const char* str, int len, 
boost::gregorian::date* d,
            :       boost::posix_time::time_duration* t);
Add WARN_UNUSED_RESULT here and L60, L69.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@65
PS9, Line 65: run when a user provided
nit: is used when a user specifies a datetime format string


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.h@106
PS9, Line 106: static int AdjustWithTimezone(boost::posix_time::time_duration* 
t,
             :       const boost::posix_time::time_duration& tz_offset);
Please add a comment explaining the return value.


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@174
PS9, Line 174: return false;
Any reason the call to IndicateTimestampParseFailure() was removed?


http://gerrit.cloudera.org:8080/#/c/13722/9/be/src/runtime/timestamp-parse-util.cc@263
PS9, Line 263: str_val = indicator.c_str();
This is potentially dangerous: 'indicator' object is destructed after leaving 
the current L256-L265 block.


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/cup/sql-parser.cup@2938
PS9, Line 2938: STRING_LITERA
My understanding is that to_timestamp() and from_timestamp() accept format 
strings specified as STRING/CHAR/VARCHAR columns of a table.

The FORMAT clause accepts string literals only? Is this limitation intentional 
or an oversight? If it is intentional, please comment on it in the commit 
message.


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@227
PS9, Line 227: his is not an intermediate
             :     // step between a char and a datetime.
this is a 'datetime to string' or 'string to datetime' cast.


http://gerrit.cloudera.org:8080/#/c/13722/9/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@229
PS9, Line 229: (type_.getPrimitiveType() != PrimitiveType.CHAR ||
             :             children_.get(0).getType().getPrimitiveType() != 
PrimitiveType.STRING)
Consider rewriting this condition to test directly if we have datetime->string 
or string->datetime cast. That might make the code easier to understand.


http://gerrit.cloudera.org:8080/#/c/13722/9/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test
File 
testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test:

http://gerrit.cloudera.org:8080/#/c/13722/9/testdata/workloads/functional-query/queries/QueryTest/cast_format_from_table.test@109
PS9, Line 109: ====
Would it be possible to specify the format string as a string/char/varchar 
column of table?

Please add some tests to test these scenarios.



--
To view, visit http://gerrit.cloudera.org:8080/13722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I19d8d097a45ae6f103b6cd1b2d81aad38dfd9e23
Gerrit-Change-Number: 13722
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Jul 2019 11:07:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8703: ISO:SQL:2016 datetime patterns - Milestone 1

Reply via email to