Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12001 )
Change subject: IMPALA-7902: NumericLiteral fixes, refactoring ...................................................................... Patch Set 8: (10 comments) Thanks Tim for your review comments. I've addressed them via explanations here or by code changes. This change is a step along the way to doing single-pass analysis. The "explicit type" thing will go away once we get there. But, until then, this prevents the bugs that otherwise crop up as I do the work. The tests exist because of the old maxim to wrap code with unit tests before mucking with it. http://gerrit.cloudera.org:8080/#/c/12001/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12001/7//COMMIT_MSG@33 PS7, Line 33: IMPALA-7891: Analyzer does not detect numeric overflow in CAST > This behaviour of ignoring overflow doesn't change does it? This logic turns out to be highly complex. We have two paths: * Analyzer code does a type conversion. This is the case that was ignoring overflow and is fixed. * BE does the type conversion as part of constant folding. The BE reports the error and the FE retains the cast so that the query will fail when run. Example: SELECT CAST(1000 AS TINYINT) Ideally, the analyzer would fail the query due to an invalid cast. Looks like there is some subtlety here and the code decided it is better to have the BE handle the error. (Though, if you have a union with 1B rows, followed by an overflow, not sure it is helpful to grind through the 1B rows then fail...) A better solution overall is to let the BE handle all type conversions via constant folding. That is not in this patch to limit size, but might be a good change in the future. http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/Expr.java@377 PS7, Line 377: public long getNumDistinctValues() { return numDistinctValues_; } > Do we even need this on the Expr base class? It seems like it's only actual You are right, the explicit type is a bit of a hack. It works around the "repeated type widening" issue without changing much code. Without this, each time we analyze a + b, we pick a size for the result that is larger than the inputs: TINYINT + TINYINT --> SMALLINT We then cast the inputs to that type. Then if we analyze again: SMALLINT + SMALLINT --> INT And we cast the same inputs to INT. Having this explicit type breaks that loop; a condition unique to numbers. (Though, strings have a similar problem.) The ultimate fix is to run the type propagation logic only once, which is the ultimate goal. Will take a few more moves to get there, so this is a temporary fix. That said, I have refined this code a bit more, so I'll update the patch with those changes; that will remove the method here in the Expr class. http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java File fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java: http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java@66 PS7, Line 66: t > extra space Done http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java File fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java: http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@53 PS7, Line 53: * > Are "natural type" and "implicit type" the same thing? Natural type is computed from input: 1:TINYINT, 1000:SMALLINT, etc. The explicit type is the result of a user-defined cast: CAST(1 AS SMALLINT) An implicit type is the result of parameter matching: 1:TINYINT + 1:TINYINT matches the SMALLINT addition, so we cast the inputs to that type. If we repeat the process (because we analyze twice in the current version), types can widen again. The code to handle this at present is very fragile. The explicit type provides a temporary fix that keeps the wheels on long enough to move to a single-pass analysis. http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@131 PS7, Line 131: > double? I don't think this needs to be nullable, right? Fixed. http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@283 PS7, Line 283: > I find these function names potentially confusing. Something like isInFloat Renamed to fitsInMumble. Trying to keep noise wording to a minimum... http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java@332 PS7, Line 332: /** > It looks like you're preserving some existing behaviour, but I'm not sure t Yeah, I thought this was odd given how little extra range FLOAT handles on top of DECIMAL: both support up to 10^38. (Oh, typo in the comment: "within the range of DECIMAL" (not BIGINT)... Have made the change to always use DOUBLE. Let's see if the tests complain. http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/common/SqlCastException.java File fe/src/main/java/org/apache/impala/common/SqlCastException.java: http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/main/java/org/apache/impala/common/SqlCastException.java@26 PS7, Line 26: > typo: planner Done http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@492 PS7, Line 492: try { > nit: Blank line seems unnecessary Done http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/test/java/org/apache/impala/analysis/NumericLiteralTest.java File fe/src/test/java/org/apache/impala/analysis/NumericLiteralTest.java: http://gerrit.cloudera.org:8080/#/c/12001/7/fe/src/test/java/org/apache/impala/analysis/NumericLiteralTest.java@445 PS7, Line 445: { > I think it would help to have a very brief comment for some of these tests. Done -- To view, visit http://gerrit.cloudera.org:8080/12001 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I484600747b2871d3a6fe9153751973af9a8534f2 Gerrit-Change-Number: 12001 Gerrit-PatchSet: 8 Gerrit-Owner: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Fredy Wijaya <fwij...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Tue, 04 Dec 2018 01:11:38 +0000 Gerrit-HasComments: Yes