Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/12267 )
Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() ...................................................................... Patch Set 4: (11 comments) Thanks everyone for taking a look and leaving comments! Paul, those are excellent questions about compatibility! Let me move that conversation back to the Jira so that we get Greg's opinion as well. http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/CMakeLists.txt File be/src/exprs/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/CMakeLists.txt@33 PS3, Line 33: cast-functi > Is adding header files necessary? Done http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc File be/src/exprs/cast-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc@167 PS3, Line 167: const > const string Done http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc@172 PS3, Line 172: DateTimeFormatContext format_ctx(cast_format.c_str(), cast_format.length()); : if (!ParseFormatTokens(&format_ctx)) return StringVal::null(); > I am concerned about the performance impact of doing this for every row - T Excellent point! I'll dig a bit into this. I'm sure the parsing can be done before we reach the cast functions. What I don't know from the top of my head is that how can I return an error or null in case of a parse error. (StringVal::null() vs TimestampVal::null()) I'll be back with the outcome, thx! http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/cast-functions-ir.cc@283 PS3, Line 283: const > const string Done http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr-evaluator.cc@129 PS3, Line 129: string cast_format = root_.GetCastFormat(); > I am not too familiar with ScalarExprEvaluator, but this functions seems st That's again a good catch. The above scenarios are broken now. Let me sort this out and get back. http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr.h File be/src/exprs/scalar-expr.h: http://gerrit.cloudera.org:8080/#/c/12267/3/be/src/exprs/scalar-expr.h@162 PS3, Line 162: std::s > In headers use fully qualified names: std::string Done http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/12267/2/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@220 PS2, Line 220: return Objects.toStringHelper(this) > Probably doable. As you noted, we load functions on startup. Cast functions Yeah, I think this is kind of what I wrote above :) But it won't be enough to change CastExpr.initBuiltins() to make this work because the Expr tree has to also be extended with an additional child node. This is needed because the parameters of the cast functions on the BE are gathered from there. And then we end up with adding "if this is a string vs timestamp cast than do this otherwise do that" multiple locations of the code that I don't find future proof. In addition Csaba had a good point: It's not the best idea to do the format parsing in the cast functions because that would be run for each row. So if we want to make this more efficient than we can't pass the format as it is to the cast functions as parsing should happen before we reach them. http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/main/java/org/apache/impala/analysis/CastExpr.java File fe/src/main/java/org/apache/impala/analysis/CastExpr.java: http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@88 PS3, Line 88: this(targetTypeDef, e, null); : } : : public CastExpr(TypeDef targetTypeDef, Expr e, String format) { : Preconditions.checkNotNull(targetTypeDef); : Preconditions.che > Can you call here 'this(targetTypeDef, e, null)' instead? Done http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@228 PS3, Line 228: public boolean isImplicit() { return isImplicit_; } : > I think these two lines whould be switched Done http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: http://gerrit.cloudera.org:8080/#/c/12267/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java@2926 PS3, Line 2926: public void TestCastFormatClauseFromString() throws AnalysisException { : AnalysisError("select cast('05-01-2017' AS DATE FORMAT 'MM-dd-yyyy')", : "Unsupported data type: DATE"); : AnalysisError("select cast('05-01-2017' AS DATETIME FORMAT 'MM-dd-yyyy')", : "Unsupported data type: DATETIME"); : AnalysisError("select cast('05-01-2017' AS INT FORMAT 'MM-dd-yyyy')", : "FORMAT clause is not applicable from STRING to INT"); : AnalysisError("select cast('05-01-2017' AS STRING FORMAT 'MM-dd-yyyy')", : "FORMAT clause is not applicable from STRING to STRING"); : AnalysisError("select cast('05-01-2017' AS BOOLEAN FORMAT 'MM-dd-yyyy')", : "FORMAT clause is not applicable from STRING to BOOLEAN"); : AnalysisError("select cast('05-01-2017' AS DOUBLE FORMAT 'MM-dd-yyyy')", : "FORMAT clause is not applicable from STRING to DOUBLE"); : AnalysisError("select cast('05-01-2017' AS TIMESTAMP FORMAT '')", : "FORMAT clause can't be empty"); : AnalyzesOk("select cast('05-01-2017' AS TIMESTAMP FORMAT 'MM-dd-yyyy')"); : } : : @Test : public void TestCastFormatClauseFromTimestamp() throws AnalysisException { : String to_timestamp_cast = "cast('05-01-2017' as timestamp)"; : AnalysisError("select cast("+to_timestamp_cast+" as DATE FORMAT 'MM-dd-yyyy')", : "Unsupported data type: DATE"); : AnalysisError("select cast("+to_timestamp_cast+" as DATETIME FORMAT 'MM-dd-yyyy')", : "Unsupported data type: DATETIME"); : AnalysisError("select cast("+to_timestamp_cast+" AS INT FORMAT 'MM-dd-yyyy')", : "FORMAT clause is not applicable from TIMESTAMP to INT"); : AnalysisError("select cast("+to_timestamp_cast+" AS BOOLEAN FORMAT 'MM-dd-yyyy')", : "FORMAT clause is not applicable from TIMESTAMP to BOOLEAN"); : AnalysisError("select cast("+to_timestamp_cast+" AS DOUBLE FORMAT 'MM-dd-yyyy')", : "FORMAT clause is not applicable from TIMESTAMP to DOUBLE"); : AnalysisError("select cast("+to_timestamp_cast+" AS STRING FORMAT '')", : "FORMAT clause can't be empty"); : AnalyzesOk("select cast("+to_timestamp_cast+" AS STRING FORMAT 'MM-dd-yyyy')"); : AnalyzesOk("select cast("+to_timestamp_cast+" AS VARCHAR FORMAT 'MM-dd-yyyy')"); : AnalyzesOk("select cast("+to_timestamp_cast+" AS CHAR(10) FORMAT 'MM-dd-yyyy')"); : } > Maybe you should add tests for short format strings, e.g.: "yyyy-M-d" CastExpr.analyze() doesn't check the content of the format parameter so no need to cover scenarios from that aspect in analysis tests. http://gerrit.cloudera.org:8080/#/c/12267/3/testdata/workloads/functional-query/queries/QueryTest/cast_format.test File testdata/workloads/functional-query/queries/QueryTest/cast_format.test: http://gerrit.cloudera.org:8080/#/c/12267/3/testdata/workloads/functional-query/queries/QueryTest/cast_format.test@62 PS3, Line 62: yyyy/dd/MM HH:mm:ss > Again, add some tests using short format strings, e.g. yyyy-M-d Hmm. As the underlying code is actually a full re-use of the existing pattern matching from to_timestamp() and from_timestamp() I feel that testing the pattern matching itself is not needed here. Another note: It seem that CAST(..FORMAT..) is expected to use SQL patterns instead of the implemented Java patterns from the beginning. So I wouldn't focus that much right now on the exact patterns. -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Greg Rahn <gr...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com> Gerrit-Comment-Date: Fri, 01 Feb 2019 14:37:42 +0000 Gerrit-HasComments: Yes