[ https://issues.apache.org/jira/browse/CALCITE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949259#comment-16949259 ]
Zoltan Haindrich commented on CALCITE-3368: ------------------------------------------- I also think that we should probably add some more control to these kind of things... We could probably get back kinda the same simplifiaction power if we could get min/max values for every column - and using range logics for the nodes; but that would I left some comments on the RB; I can understand that if we need {{isAlwaysTrue()}} changes - but that could have serious consequences... > Some problems simplifying ‘expression IS NULL’ > ---------------------------------------------- > > Key: CALCITE-3368 > URL: https://issues.apache.org/jira/browse/CALCITE-3368 > Project: Calcite > Issue Type: Bug > Components: core > Affects Versions: 1.21.0 > Reporter: Leonard Xu > Assignee: Leonard Xu > Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > 'is null' expression in SQL may be optimized incorrectly in the underlying > implementation. > > When I write a Fink SQL to test overflow just like > {code:java} > select > case when (f0 + f1) is null then 'null' else 'not null' end > from testTable > {code} > , I found expression '(f0 + f1) is null ' has been optimized by Calcite, and > the optimization may be incorrect. > > The underlying implementation is that Calcite's simplification logic of > isNull expression in SQL will convert from > *"f(operand0, operand1) IS NULL"* to > *"operand0 IS NULL OR operand1 IS NULL"* if the Policy of RexNode‘s SqlKind > is ANY。 > This simplification leads to the expression will not calculate the real > value of *f(operand0, operand1)* (eg.. '(f0 + f1)' in my case ),but '(f0 + > f1)' maybe overflows after operation. > {code:java} > //org.apache.calcite.rex.RexSimplify.java > private RexNode simplifyIsNull(RexNode a) { > // Simplify the argument first, > // call ourselves recursively to see whether we can make more progress. > // For example, given > // "(CASE WHEN FALSE THEN 1 ELSE 2) IS NULL" we first simplify the > // argument to "2", and only then we can simplify "2 IS NULL" to "FALSE". > a = simplify(a, UNKNOWN); > if (!a.getType().isNullable() && isSafeExpression(a)) { > return rexBuilder.makeLiteral(false); > } > if (RexUtil.isNull(a)) { > return rexBuilder.makeLiteral(true); > } > if (a.getKind() == SqlKind.CAST) { > return null; > } > switch (Strong.policy(a.getKind())) { > case NOT_NULL: > return rexBuilder.makeLiteral(false); > case ANY: > // "f" is a strong operator, so "f(operand0, operand1) IS NULL" simplifies > // to "operand0 IS NULL OR operand1 IS NULL" > final List<RexNode> operands = new ArrayList<>(); > for (RexNode operand : ((RexCall) a).getOperands()) { > final RexNode simplified = simplifyIsNull(operand); > if (simplified == null) { > operands.add( > rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operand)); > } else { > operands.add(simplified); > } > } > return RexUtil.composeDisjunction(rexBuilder, operands, false); > case AS_IS: > default: > return null; > } > }{code} > And most of calculating SqlKinds are assigned *Policy.ANY* at present. > {code:java} > //org.apache.calcite.plan.Strong.java > public static Policy policy(SqlKind kind) { > return MAP.getOrDefault(kind, Policy.AS_IS); > } > .... > map.put(SqlKind.PLUS, Policy.ANY); > map.put(SqlKind.PLUS_PREFIX, Policy.ANY); > map.put(SqlKind.MINUS, Policy.ANY); > map.put(SqlKind.MINUS_PREFIX, Policy.ANY); > map.put(SqlKind.TIMES, Policy.ANY); > map.put(SqlKind.DIVIDE, Policy.ANY); > * that operator evaluates to null. */ > public enum Policy { > /** This kind of expression is never null. No need to look at its arguments, > * if it has any. */ > NOT_NULL, > /** This kind of expression has its own particular rules about whether it > * is null. */ > CUSTOM, > /** This kind of expression is null if and only if at least one of its > * arguments is null. */ > ANY, > /** This kind of expression may be null. There is no way to rewrite. */ > AS_IS, > }{code} > > It may be an obvious nonequivalent simplification in SQL. And this issue come > from Flink (FLINK-14030). > [~danny0405], Could you have a look at this? -- This message was sent by Atlassian Jira (v8.3.4#803005)