[ https://issues.apache.org/jira/browse/CALCITE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leonard Xu updated CALCITE-3368: -------------------------------- Description: 'is null' expression in SQL may be optimized incorrectly in the underlying implementation. When I write a Fink SQL to test overflow just like {code:java} select case when (f0 + f1) is null then 'null' else 'not null' end from testTable {code} , I found expression '(f0 + f1) is null ' has been optimized by Calcite, and the optimization may be incorrect. The underlying implementation is that Calcite's simplification logic of isNull expression in SQL will convert from *"f(operand0, operand1) IS NULL"* to *"operand0 IS NULL OR operand1 IS NULL"* if the Policy of RexNode‘s SqlKind is ANY。 This simplification leads to the expression will not calculate the real value of *f(operand0, operand1)* (eg.. '(f0 + f1)' in my case ),but '(f0 + f1)' maybe overflows after operation. {code:java} //org.apache.calcite.rex.RexSimplify.java private RexNode simplifyIsNull(RexNode a) { // Simplify the argument first, // call ourselves recursively to see whether we can make more progress. // For example, given // "(CASE WHEN FALSE THEN 1 ELSE 2) IS NULL" we first simplify the // argument to "2", and only then we can simplify "2 IS NULL" to "FALSE". a = simplify(a, UNKNOWN); if (!a.getType().isNullable() && isSafeExpression(a)) { return rexBuilder.makeLiteral(false); } if (RexUtil.isNull(a)) { return rexBuilder.makeLiteral(true); } if (a.getKind() == SqlKind.CAST) { return null; } switch (Strong.policy(a.getKind())) { case NOT_NULL: return rexBuilder.makeLiteral(false); case ANY: // "f" is a strong operator, so "f(operand0, operand1) IS NULL" simplifies // to "operand0 IS NULL OR operand1 IS NULL" final List<RexNode> operands = new ArrayList<>(); for (RexNode operand : ((RexCall) a).getOperands()) { final RexNode simplified = simplifyIsNull(operand); if (simplified == null) { operands.add( rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operand)); } else { operands.add(simplified); } } return RexUtil.composeDisjunction(rexBuilder, operands, false); case AS_IS: default: return null; } }{code} And most of calculating SqlKinds are assigned *Policy.ANY* at present. {code:java} //org.apache.calcite.plan.Strong.java public static Policy policy(SqlKind kind) { return MAP.getOrDefault(kind, Policy.AS_IS); } .... map.put(SqlKind.PLUS, Policy.ANY); map.put(SqlKind.PLUS_PREFIX, Policy.ANY); map.put(SqlKind.MINUS, Policy.ANY); map.put(SqlKind.MINUS_PREFIX, Policy.ANY); map.put(SqlKind.TIMES, Policy.ANY); map.put(SqlKind.DIVIDE, Policy.ANY); * that operator evaluates to null. */ public enum Policy { /** This kind of expression is never null. No need to look at its arguments, * if it has any. */ NOT_NULL, /** This kind of expression has its own particular rules about whether it * is null. */ CUSTOM, /** This kind of expression is null if and only if at least one of its * arguments is null. */ ANY, /** This kind of expression may be null. There is no way to rewrite. */ AS_IS, }{code} It may be an obvious nonequivalent simplification in SQL. And this issue come from Flink (FLINK-14030). [~danny0405], Could you have a look at this? was: 'is null' expression in SQL may be optimized incorrectly in the underlying implementation. When I test a Fink SQL to check overflow situation like {code:java} select case when (f0 + f1) is null then 'null' else 'not null' end from testTable {code} , I found expression '(f0 + f1) is null ' has been optimized by Calcite, and the optimization may be incorrect. The und The Calcite's simplification logic of isNull expression in SQL will convert from *"f(operand0, operand1) IS NULL"* to *"operand0 IS NULL OR operand1 IS NULL"* if the Policy of RexNode‘s SqlKind is ANY。 This simplification leads to the expression will not calculate the real value of *f(operand0, operand1)* (eg..'f0 + 'f1 )which maybe overflow. {code:java} //org.apache.calcite.rex.RexSimplify.java private RexNode simplifyIsNull(RexNode a) { // Simplify the argument first, // call ourselves recursively to see whether we can make more progress. // For example, given // "(CASE WHEN FALSE THEN 1 ELSE 2) IS NULL" we first simplify the // argument to "2", and only then we can simplify "2 IS NULL" to "FALSE". a = simplify(a, UNKNOWN); if (!a.getType().isNullable() && isSafeExpression(a)) { return rexBuilder.makeLiteral(false); } if (RexUtil.isNull(a)) { return rexBuilder.makeLiteral(true); } if (a.getKind() == SqlKind.CAST) { return null; } switch (Strong.policy(a.getKind())) { case NOT_NULL: return rexBuilder.makeLiteral(false); case ANY: // "f" is a strong operator, so "f(operand0, operand1) IS NULL" simplifies // to "operand0 IS NULL OR operand1 IS NULL" final List<RexNode> operands = new ArrayList<>(); for (RexNode operand : ((RexCall) a).getOperands()) { final RexNode simplified = simplifyIsNull(operand); if (simplified == null) { operands.add( rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operand)); } else { operands.add(simplified); } } return RexUtil.composeDisjunction(rexBuilder, operands, false); case AS_IS: default: return null; } }{code} And most of calculating SqlKinds are assigned *Policy.ANY* at present. {code:java} //org.apache.calcite.plan.Strong.java public static Policy policy(SqlKind kind) { return MAP.getOrDefault(kind, Policy.AS_IS); } .... map.put(SqlKind.PLUS, Policy.ANY); map.put(SqlKind.PLUS_PREFIX, Policy.ANY); map.put(SqlKind.MINUS, Policy.ANY); map.put(SqlKind.MINUS_PREFIX, Policy.ANY); map.put(SqlKind.TIMES, Policy.ANY); map.put(SqlKind.DIVIDE, Policy.ANY); * that operator evaluates to null. */ public enum Policy { /** This kind of expression is never null. No need to look at its arguments, * if it has any. */ NOT_NULL, /** This kind of expression has its own particular rules about whether it * is null. */ CUSTOM, /** This kind of expression is null if and only if at least one of its * arguments is null. */ ANY, /** This kind of expression may be null. There is no way to rewrite. */ AS_IS, }{code} It may be an obvious nonequivalent simplification in SQL. And this issue come from Flink (FLINK-14030). [~danny0405], Could you have a look at this? > 'is null' expression in SQL may be optimized incorrectly in the underlying > implementation > ----------------------------------------------------------------------------------------- > > Key: CALCITE-3368 > URL: https://issues.apache.org/jira/browse/CALCITE-3368 > Project: Calcite > Issue Type: Bug > Components: core > Affects Versions: 1.21.0 > Reporter: Leonard Xu > Assignee: Danny Chan > Priority: Major > > 'is null' expression in SQL may be optimized incorrectly in the underlying > implementation. > > When I write a Fink SQL to test overflow just like > {code:java} > select > case when (f0 + f1) is null then 'null' else 'not null' end > from testTable > {code} > , I found expression '(f0 + f1) is null ' has been optimized by Calcite, and > the optimization may be incorrect. > > The underlying implementation is that Calcite's simplification logic of > isNull expression in SQL will convert from > *"f(operand0, operand1) IS NULL"* to > *"operand0 IS NULL OR operand1 IS NULL"* if the Policy of RexNode‘s SqlKind > is ANY。 > This simplification leads to the expression will not calculate the real > value of *f(operand0, operand1)* (eg.. '(f0 + f1)' in my case ),but '(f0 + > f1)' maybe overflows after operation. > {code:java} > //org.apache.calcite.rex.RexSimplify.java > private RexNode simplifyIsNull(RexNode a) { > // Simplify the argument first, > // call ourselves recursively to see whether we can make more progress. > // For example, given > // "(CASE WHEN FALSE THEN 1 ELSE 2) IS NULL" we first simplify the > // argument to "2", and only then we can simplify "2 IS NULL" to "FALSE". > a = simplify(a, UNKNOWN); > if (!a.getType().isNullable() && isSafeExpression(a)) { > return rexBuilder.makeLiteral(false); > } > if (RexUtil.isNull(a)) { > return rexBuilder.makeLiteral(true); > } > if (a.getKind() == SqlKind.CAST) { > return null; > } > switch (Strong.policy(a.getKind())) { > case NOT_NULL: > return rexBuilder.makeLiteral(false); > case ANY: > // "f" is a strong operator, so "f(operand0, operand1) IS NULL" simplifies > // to "operand0 IS NULL OR operand1 IS NULL" > final List<RexNode> operands = new ArrayList<>(); > for (RexNode operand : ((RexCall) a).getOperands()) { > final RexNode simplified = simplifyIsNull(operand); > if (simplified == null) { > operands.add( > rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operand)); > } else { > operands.add(simplified); > } > } > return RexUtil.composeDisjunction(rexBuilder, operands, false); > case AS_IS: > default: > return null; > } > }{code} > And most of calculating SqlKinds are assigned *Policy.ANY* at present. > {code:java} > //org.apache.calcite.plan.Strong.java > public static Policy policy(SqlKind kind) { > return MAP.getOrDefault(kind, Policy.AS_IS); > } > .... > map.put(SqlKind.PLUS, Policy.ANY); > map.put(SqlKind.PLUS_PREFIX, Policy.ANY); > map.put(SqlKind.MINUS, Policy.ANY); > map.put(SqlKind.MINUS_PREFIX, Policy.ANY); > map.put(SqlKind.TIMES, Policy.ANY); > map.put(SqlKind.DIVIDE, Policy.ANY); > * that operator evaluates to null. */ > public enum Policy { > /** This kind of expression is never null. No need to look at its arguments, > * if it has any. */ > NOT_NULL, > /** This kind of expression has its own particular rules about whether it > * is null. */ > CUSTOM, > /** This kind of expression is null if and only if at least one of its > * arguments is null. */ > ANY, > /** This kind of expression may be null. There is no way to rewrite. */ > AS_IS, > }{code} > > It may be an obvious nonequivalent simplification in SQL. And this issue come > from Flink (FLINK-14030). > [~danny0405], Could you have a look at this? > -- This message was sent by Atlassian Jira (v8.3.4#803005)