Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12267 )

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
......................................................................


Patch Set 4:

(11 comments)

Thanks everyone for taking a look and leaving comments!

Paul, those are excellent questions about compatibility! Let me move that 
conversation back to the Jira so that we get Greg's opinion as well.

http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/CMakeLists.txt
File be/src/exprs/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/CMakeLists.txt@33
PS3, Line 33:  cast-functi
> Is adding header files necessary?
Done


http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc
File be/src/exprs/cast-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc@167
PS3, Line 167: const
> const string
Done


http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc@172
PS3, Line 172:     DateTimeFormatContext format_ctx(cast_format.c_str(), 
cast_format.length());
             :     if (!ParseFormatTokens(&format_ctx)) return 
StringVal::null();
> I am concerned about the performance impact of doing this for every row - T
Excellent point!
I'll dig a bit into this. I'm sure the parsing can be done before we reach the 
cast functions. What I don't know from the top of my head is that how can I 
return an error or null in case of a parse error. (StringVal::null() vs 
TimestampVal::null())
I'll be back with the outcome, thx!


http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc@283
PS3, Line 283: const
> const string
Done


http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr-evaluator.cc
File be/src/exprs/scalar-expr-evaluator.cc:

http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr-evaluator.cc@129
PS3, Line 129:   string cast_format = root_.GetCastFormat();
> I am not too familiar with ScalarExprEvaluator, but this functions seems st
That's again a good catch. The above scenarios are broken now. Let me sort this 
out and get back.


http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr.h@162
PS3, Line 162: std::s
> In headers use fully qualified names: std::string
Done


http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@220
PS2, Line 220:     return Objects.toStringHelper(this)
> Probably doable. As you noted, we load functions on startup. Cast functions
Yeah, I think this is kind of what I wrote above :) But it won't be enough to 
change CastExpr.initBuiltins() to make this work because the Expr tree has to 
also be extended with an additional child node. This is needed because the 
parameters of the cast functions on the BE are gathered from there. And then we 
end up with adding "if this is a string vs timestamp cast than do this 
otherwise do that" multiple locations of the code that I don't find future 
proof.

In addition Csaba had a good point: It's not the best idea to do the format 
parsing in the cast functions because that would be run for each row. So if we 
want to make this more efficient than we can't pass the format as it is to the 
cast functions as parsing should happen before we reach them.


http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@88
PS3, Line 88:     this(targetTypeDef, e, null);
            :   }
            :
            :   public CastExpr(TypeDef targetTypeDef, Expr e, String format) {
            :     Preconditions.checkNotNull(targetTypeDef);
            :     Preconditions.che
> Can you call here 'this(targetTypeDef, e, null)' instead?
Done


http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@228
PS3, Line 228:   public boolean isImplicit() { return isImplicit_; }
             :
> I think these two lines whould be switched
Done


http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java:

http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2926
PS3, Line 2926:  public void TestCastFormatClauseFromString() throws 
AnalysisException {
              :     AnalysisError("select cast('05-01-2017' AS DATE FORMAT 
'MM-dd-yyyy')",
              :         "Unsupported data type: DATE");
              :     AnalysisError("select cast('05-01-2017' AS DATETIME FORMAT 
'MM-dd-yyyy')",
              :         "Unsupported data type: DATETIME");
              :     AnalysisError("select cast('05-01-2017' AS INT FORMAT 
'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from STRING to INT");
              :     AnalysisError("select cast('05-01-2017' AS STRING FORMAT 
'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from STRING to 
STRING");
              :     AnalysisError("select cast('05-01-2017' AS BOOLEAN FORMAT 
'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from STRING to 
BOOLEAN");
              :     AnalysisError("select cast('05-01-2017' AS DOUBLE FORMAT 
'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from STRING to 
DOUBLE");
              :     AnalysisError("select cast('05-01-2017' AS TIMESTAMP FORMAT 
'')",
              :         "FORMAT clause can't be empty");
              :     AnalyzesOk("select cast('05-01-2017' AS TIMESTAMP FORMAT 
'MM-dd-yyyy')");
              :   }
              :
              :   @Test
              :   public void TestCastFormatClauseFromTimestamp() throws 
AnalysisException {
              :     String to_timestamp_cast = "cast('05-01-2017' as 
timestamp)";
              :     AnalysisError("select cast("+to_timestamp_cast+" as DATE 
FORMAT 'MM-dd-yyyy')",
              :         "Unsupported data type: DATE");
              :     AnalysisError("select cast("+to_timestamp_cast+" as 
DATETIME FORMAT 'MM-dd-yyyy')",
              :         "Unsupported data type: DATETIME");
              :     AnalysisError("select cast("+to_timestamp_cast+" AS INT 
FORMAT 'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from TIMESTAMP to 
INT");
              :     AnalysisError("select cast("+to_timestamp_cast+" AS BOOLEAN 
FORMAT 'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from TIMESTAMP to 
BOOLEAN");
              :     AnalysisError("select cast("+to_timestamp_cast+" AS DOUBLE 
FORMAT 'MM-dd-yyyy')",
              :         "FORMAT clause is not applicable from TIMESTAMP to 
DOUBLE");
              :     AnalysisError("select cast("+to_timestamp_cast+" AS STRING 
FORMAT '')",
              :         "FORMAT clause can't be empty");
              :     AnalyzesOk("select cast("+to_timestamp_cast+" AS STRING 
FORMAT 'MM-dd-yyyy')");
              :     AnalyzesOk("select cast("+to_timestamp_cast+" AS VARCHAR 
FORMAT 'MM-dd-yyyy')");
              :     AnalyzesOk("select cast("+to_timestamp_cast+" AS CHAR(10) 
FORMAT 'MM-dd-yyyy')");
              :   }
> Maybe you should add tests for short format strings, e.g.: "yyyy-M-d"
CastExpr.analyze() doesn't check the content of the format parameter so no need 
to cover scenarios from that aspect in analysis tests.


http://gerrit.cloudera.org:8080/#/c/12267/3/testdata/workloads/functional-query/queries/QueryTest/cast_format.test
File testdata/workloads/functional-query/queries/QueryTest/cast_format.test:

http://gerrit.cloudera.org:8080/#/c/12267/3/testdata/workloads/functional-query/queries/QueryTest/cast_format.test@62
PS3, Line 62: yyyy/dd/MM HH:mm:ss
> Again, add some tests using short format strings, e.g. yyyy-M-d
Hmm. As the underlying code is actually a full re-use of the existing pattern 
matching from to_timestamp() and from_timestamp() I feel that testing the 
pattern matching itself is not needed here.

Another note: It seem that CAST(..FORMAT..) is expected to use SQL patterns 
instead of the implemented Java patterns from the beginning. So I wouldn't 
focus that much right now on the exact patterns.



--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Feb 2019 14:37:42 +0000
Gerrit-HasComments: Yes

Reply via email to