zabetak commented on code in PR #5196: URL: https://github.com/apache/hive/pull/5196#discussion_r1933775051
########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterJoinRule.java: ########## @@ -66,18 +65,18 @@ public abstract class HiveFilterJoinRule extends FilterJoinRule { /** * Creates a PushFilterPastJoinRule with an explicit root operand. */ - protected HiveFilterJoinRule(RelOptRuleOperand operand, String id, boolean smart, + protected HiveFilterJoinRule(Config conf, RelOptRuleOperand operand, String id, boolean smart, RelBuilderFactory relBuilderFactory) { super( - (Config) RelRule.Config.EMPTY + conf Review Comment: We can possibly remove the conf arg from the constructor and use directly `FilterIntoJoinRule.FilterIntoJoinRuleConfig.DEFAULT` in the body. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java: ########## @@ -22,8 +22,6 @@ import org.apache.calcite.rel.RelNode; import org.apache.calcite.rel.RelShuttle; import org.apache.calcite.rel.core.Filter; -import org.apache.calcite.rel.core.Join; -import org.apache.calcite.rel.core.TableScan; Review Comment: nit: Changes seem unrelated to the upgrade. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCExpandExpressionsRule.java: ########## @@ -182,12 +191,75 @@ public RexNode visitCall(RexCall inputCall) { switch (call.getKind()) { case IN: return transformIntoOrAndClause(rexBuilder, call); + case SEARCH: + return expandSearchAndRemoveRowOperator(rexBuilder, call); default: break; } } return RexUtil.isFlat(node) ? node : RexUtil.flatten(rexBuilder, node); } + + private RexNode expandSearchAndRemoveRowOperator(RexBuilder rexBuilder, RexCall expression) { + RexLiteral literal = (RexLiteral) expression.getOperands().get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + + if (sarg.isPoints()) { + // if it is a row operator, we'll have + // SEARCH(ROW($1, $2), Sarg[[1, 2], [3, 4]] Review Comment: How do we end-up with `SEARCH(ROW($1, $2), Sarg[[1, 2], [3, 4]]`? What is the SQL equivalent? Why can't we handle it with `RexUtil.searchShuttle`? How will the shuttle rewrite the SEARCH operator in the example above? ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCExpandExpressionsRule.java: ########## @@ -182,12 +191,75 @@ public RexNode visitCall(RexCall inputCall) { switch (call.getKind()) { case IN: return transformIntoOrAndClause(rexBuilder, call); Review Comment: I get the impression that `case IN` is now dead/unreachable code. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: Review Comment: The introduction of SEARCH makes the `case IN` and `case BETWEEN` kinda obsolete and probably dead code. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + RexBuilder rexBuilder = childRel.getCluster().getRexBuilder(); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // TODO: revisit this to better handle INs Review Comment: Instead of TODOs we should create JIRA tickets that outline the remaining work that needs to be done. The ticket should clearly mention if it is optional improvement or a regression that needs to be addressed. We can postpone the creation of the ticket till we finalize the review for this PR. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ExprNodeConverter.java: ########## @@ -203,6 +207,32 @@ public ExprNodeDesc visitCall(RexCall call) { for (RexNode operand : call.operands) { args.add(operand.accept(this)); } + } else if (call.getKind() == SqlKind.SEARCH) { Review Comment: Most of the comments that I did in ASTConverter about the SEARCH operator apply also here. The code is also very similar so we should check if we can somehow refactor this logic. ########## ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java: ########## @@ -322,18 +324,24 @@ public RexNode getExpression(String functionText, FunctionInfo fi, // unix_timestamp(args) -> to_unix_timestamp(args) calciteOp = HiveToUnixTimestampSqlOperator.INSTANCE; } - expr = rexBuilder.makeCall(returnType, calciteOp, inputs); + expr = getExpression(returnType, calciteOp, inputs); } if (expr instanceof RexCall && !expr.isA(SqlKind.CAST)) { RexCall call = (RexCall) expr; expr = rexBuilder.makeCall(returnType, call.getOperator(), RexUtil.flatten(call.getOperands(), call.getOperator())); } - return expr; } + private RexNode getExpression(RelDataType returnType, SqlOperator calciteOp, List<RexNode> inputs) { + if (calciteOp.getKind() == SqlKind.IN) { + return rexBuilder.makeIn(inputs.get(0), inputs.subList(1, inputs.size())); + } + return rexBuilder.makeCall(returnType, calciteOp, inputs); + } + Review Comment: The code would be more uniform if instead of overloading the `getExpression` method we added the new `if clause` in the existing one. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSortPredicates.java: ########## @@ -198,6 +208,12 @@ public Double visitCall(RexCall call) { return null; } + if (call.getKind() == SqlKind.SEARCH) { + RexCall expandedCall = (RexCall) RexUtil.expandSearch(rexBuilder, null, call); + + return visitCall(expandedCall); + } + Review Comment: I think that the best approach would be to handle SEARCH inside the `functionCost` method and not expand it. When we do the expansion we never end up with an IN or BETWEEN call so the code for them inside `functionCost` is basically dead/unreachable. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java: ########## @@ -1177,4 +1185,633 @@ public static List<Integer> getNewRelDistributionKeys( } return distributionKeys; } + + /** + * This class just wraps around a RexNode enables equals/hashCode based on toString. + * After CALCITE-2632 this might not be needed anymore */ + public static class RexNodeRef { + + public static Comparator<RexNodeRef> COMPARATOR = + Comparator.comparing((RexNodeRef o) -> o.node.toString()); + private final RexNode node; + + public RexNodeRef(RexNode node) { + this.node = node; + } + + public RexNode getRexNode() { + return node; + } + + @Override + public int hashCode() { + return node.toString().hashCode(); + } + + @Override + public boolean equals(Object o) { + if (o instanceof RexNodeRef) { + RexNodeRef otherRef = (RexNodeRef) o; + return node.toString().equals(otherRef.node.toString()); + } + return false; + } + + @Override + public String toString() { + return "ref for:" + node.toString(); + } + } + + /** + * Represents a constraint. + * + * Example: a=1 + * substr(a,1,2) = concat('asd','xxx') + */ + public static class Constraint { + + private final RexNode exprNode; + private final RexNode constNode; + + public Constraint(RexNode exprNode, RexNode constNode) { + this.exprNode = exprNode; + this.constNode = constNode; + } + + /** + * Interprets argument as a constraint; if not possible returns null. + */ + public static Constraint of(RexNode n) { + if (!(n instanceof RexCall)) { + return null; + } + RexCall call = (RexCall) n; + if (call.getOperator().getKind() != SqlKind.EQUALS) { + return null; + } + RexNode opA = call.operands.get(0); + RexNode opB = call.operands.get(1); + if (RexUtil.isNull(opA) || RexUtil.isNull(opB)) { + // dont try to compare nulls + return null; + } + if (isConstExpr(opA) && isColumnExpr(opB)) { + return new Constraint(opB, opA); + } + if (isColumnExpr(opA) && isConstExpr(opB)) { + return new Constraint(opA, opB); + } + return null; + } + + private static boolean isColumnExpr(RexNode node) { + return !node.getType().isStruct() && !HiveCalciteUtil.getInputRefs(node).isEmpty() + && HiveCalciteUtil.isDeterministic(node); + } + + private static boolean isConstExpr(RexNode node) { + return !node.getType().isStruct() && HiveCalciteUtil.getInputRefs(node).isEmpty() + && HiveCalciteUtil.isDeterministic(node); + } + + public RexNodeRef getKey() { + return new RexNodeRef(exprNode); + } + + public RexNode getValue() { + return constNode; + } + + } + + /** + * Merge IN clauses, when possible. + */ + public static class RexMergeInClause extends RexShuttle { + private final RexBuilder rexBuilder; + + public RexMergeInClause(RexBuilder rexBuilder) { + this.rexBuilder = rexBuilder; + } + + @Override public RexNode visitCall(RexCall call) { + switch (call.getKind()) { + case AND: + return handleAND(rexBuilder, call); + case OR: + return handleOR(rexBuilder, call); + default: + return super.visitCall(call); + } + } + + private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) { + // Visited nodes + final Set<RexNode> visitedRefs = new LinkedHashSet<>(); + // IN clauses need to be combined by keeping only common elements + final Multimap<RexNode,SimilarRexNodeElement> inLHSExprToRHSExprs = LinkedHashMultimap.create(); + // We will use this set to keep those expressions that may evaluate + // into a null value. + final Multimap<RexNode,RexNode> inLHSExprToRHSNullableExprs = LinkedHashMultimap.create(); + final List<RexNode> operands = new ArrayList<>(RexUtil.flattenAnd(call.getOperands())); + + for (int i = 0; i < operands.size(); i++) { + RexNode operand = operands.get(i); + if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { + continue; + } + RexNode ref = inCall.getOperands().get(0); + visitedRefs.add(ref); + if (ref.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, ref); + } + if (inLHSExprToRHSExprs.containsKey(ref)) { + Set<SimilarRexNodeElement> expressions = Sets.newHashSet(); + for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + expressions.add(new SimilarRexNodeElement(constNode)); + if (constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, constNode); + } + } + Collection<SimilarRexNodeElement> knownConstants = inLHSExprToRHSExprs.get(ref); + if (!shareSameType(knownConstants, expressions)) { + return call; + } + knownConstants.retainAll(expressions); + } else { + for (int j = 1; j < inCall.getOperands().size(); j++) { + RexNode constNode = inCall.getOperands().get(j); + inLHSExprToRHSExprs.put(ref, new SimilarRexNodeElement(constNode)); + if (constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(ref, constNode); + } + } + } + operands.remove(i); + --i; + } else if (operand.getKind() == SqlKind.EQUALS) { + Constraint c = Constraint.of(operand); + if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) { + continue; + } + visitedRefs.add(c.exprNode); + if (c.exprNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(c.exprNode, c.exprNode); + } + if (c.constNode.getType().isNullable()) { + inLHSExprToRHSNullableExprs.put(c.exprNode, c.constNode); + } + if (inLHSExprToRHSExprs.containsKey(c.exprNode)) { + Collection<SimilarRexNodeElement> knownConstants = inLHSExprToRHSExprs.get(c.exprNode); + Collection<SimilarRexNodeElement> nextConstant = Collections.singleton(new SimilarRexNodeElement(c.constNode)); + if (!shareSameType(knownConstants, nextConstant)) { + return call; + } + knownConstants.retainAll(nextConstant); + } else { + inLHSExprToRHSExprs.put(c.exprNode, new SimilarRexNodeElement(c.constNode)); + } + operands.remove(i); + --i; + } + } + // Create IN clauses + final List<RexNode> newOperands = createInClauses(rexBuilder, + visitedRefs, inLHSExprToRHSExprs, inLHSExprToRHSNullableExprs); + newOperands.addAll(operands); + // Return node + return RexUtil.composeConjunction(rexBuilder, newOperands, false); + } + + protected static class SimilarRexNodeElement { + private final RexNode rexNode; + + protected SimilarRexNodeElement(RexNode rexNode) { + this.rexNode = rexNode; + } + + public RexNode getRexNode() { + return rexNode; + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (o == null || getClass() != o.getClass()) return false; + SimilarRexNodeElement that = (SimilarRexNodeElement) o; + return equalsWithSimilarType(rexNode, that.rexNode); + } + + private static boolean equalsWithSimilarType(RexNode rexNode1, RexNode rexNode2) { + if (!(rexNode1 instanceof RexLiteral) || !(rexNode2 instanceof RexLiteral)) { + return rexNode1.equals(rexNode2); + } + + RexLiteral rexLiteral1 = (RexLiteral) rexNode1; + RexLiteral rexLiteral2 = (RexLiteral) rexNode2; + + if (rexLiteral1.getValue() == null && rexLiteral2.getValue() == null) { + return true; + } + + return rexLiteral1.getValue() != null && rexLiteral1.getValue().compareTo(rexLiteral2.getValue()) == 0 && + rexLiteral1.getType().getSqlTypeName().equals(rexLiteral2.getType().getSqlTypeName()); + } + + @Override + public int hashCode() { + if (rexNode instanceof RexLiteral) { + RexLiteral rexLiteral = (RexLiteral) rexNode; + return Objects.hash(rexLiteral.getValue(), rexLiteral.getType().getSqlTypeName()); + } + return Objects.hash(rexNode); + } + } + + /** + * Check if the type of nodes in the two collections is homogeneous within the collections + * and identical between them. + * @param nodes1 the first collection of nodes + * @param nodes2 the second collection of nodes + * @return true if nodes in both collections is unique and identical, false otherwise + */ + private static boolean shareSameType( + Collection<SimilarRexNodeElement> nodes1, Collection<SimilarRexNodeElement> nodes2) { + return Stream.of(nodes1, nodes2).flatMap(Collection::stream) + .map(n -> n.getRexNode().getType().getSqlTypeName()) + .distinct() + .count() == 1; + } + + private static RexNode handleOR(RexBuilder rexBuilder, RexCall call) { + // IN clauses need to be combined by keeping all elements + final List<RexNode> operands = new ArrayList<>(RexUtil.flattenOr(call.getOperands())); + final Multimap<RexNode,SimilarRexNodeElement> inLHSExprToRHSExprs = LinkedHashMultimap.create(); + for (int i = 0; i < operands.size(); i++) { + RexNode operand = operands.get(i); + if (operand.getKind() == SqlKind.IN) { + RexCall inCall = (RexCall) operand; + if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) { + continue; + } + RexNode ref = inCall.getOperands().get(0); + for (int j = 1; j < inCall.getOperands().size(); j++) { + inLHSExprToRHSExprs.put(ref, new SimilarRexNodeElement(inCall.getOperands().get(j))); + } + operands.remove(i); + --i; + } + } + // Create IN clauses (fourth parameter is not needed since no expressions were removed) + final List<RexNode> newOperands = createInClauses(rexBuilder, + inLHSExprToRHSExprs.keySet(), inLHSExprToRHSExprs, null); + newOperands.addAll(operands); + // Return node + RexNode result = RexUtil.composeDisjunction(rexBuilder, newOperands, false); + if (!result.getType().equals(call.getType())) { + return rexBuilder.makeCast(call.getType(), result, true); + } + return result; + } + + private static RexNode createResultFromEmptySet(RexBuilder rexBuilder, + RexNode ref, Multimap<RexNode, RexNode> inLHSExprToRHSNullableExprs) { + if (inLHSExprToRHSNullableExprs.containsKey(ref)) { + // We handle possible null values in the expressions. + List<RexNode> nullableExprs = + inLHSExprToRHSNullableExprs.get(ref) + .stream() + .map(n -> rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, ImmutableList.of(n))) + .collect(Collectors.toList()); + return RexUtil.composeConjunction(rexBuilder, + ImmutableList.of( + RexUtil.composeDisjunction(rexBuilder, nullableExprs, false), + rexBuilder.makeNullLiteral(rexBuilder.getTypeFactory().createSqlType(SqlTypeName.BOOLEAN))), + false); + } + return rexBuilder.makeLiteral(false); + } + + private static List<RexNode> createInClauses(RexBuilder rexBuilder, Set<RexNode> visitedRefs, + Multimap<RexNode, SimilarRexNodeElement> inLHSExprToRHSExprs, Multimap<RexNode,RexNode> inLHSExprToRHSNullableExprs) { + final List<RexNode> newExpressions = new ArrayList<>(); + for (RexNode ref : visitedRefs) { + Collection<SimilarRexNodeElement> exprs = inLHSExprToRHSExprs.get(ref); + if (exprs.isEmpty()) { + // Note that Multimap does not keep a key if all its values are removed. + newExpressions.add(createResultFromEmptySet(rexBuilder, ref, inLHSExprToRHSNullableExprs)); + } else if (exprs.size() == 1) { + List<RexNode> newOperands = new ArrayList<>(2); + newOperands.add(ref); + newOperands.add(exprs.iterator().next().getRexNode()); + newExpressions.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, newOperands)); + } else { + List<RexNode> newOperands = new ArrayList<>(exprs.size() + 1); + newOperands.add(ref); + newOperands.addAll(exprs.stream().map(SimilarRexNodeElement::getRexNode).collect(Collectors.toList())); + newExpressions.add(rexBuilder.makeIn(newOperands.get(0), newOperands.subList(1, newOperands.size()))); + } + } + return newExpressions; + } + + } + + /** + * Transforms inequality candidates into [NOT] BETWEEN calls. + * + */ + public static class RexTransformIntoBetween extends RexShuttle { + private final RexBuilder rexBuilder; + + public RexTransformIntoBetween(RexBuilder rexBuilder) { + this.rexBuilder = rexBuilder; + } + + @Override + public RexNode visitCall(RexCall inputCall) { + RexNode node = super.visitCall(inputCall); + if (node instanceof RexCall) { + RexCall call = (RexCall) node; + switch (call.getKind()) { + case AND: + return processComparisons(call, SqlKind.LESS_THAN_OR_EQUAL, false); + case OR: + return processComparisons(call, SqlKind.GREATER_THAN, true); + default: + break; + } + } + return node; + } + + /** + * Represents a replacement candidate. + */ + static class BetweenCandidate { + + private final RexNode newNode; + private final RexNode[] oldNodes; + // keeps track if this candidate was already used during replacement + private boolean used; + + public BetweenCandidate(RexNode newNode, RexNode... oldNodes) { + this.newNode = newNode; + this.oldNodes = oldNodes; + } + } + + private RexNode processComparisons(RexCall call, SqlKind forwardEdge, boolean invert) { + DiGraph<RexNodeRef, RexCall> g = + buildComparisonGraph(call.getOperands(), forwardEdge); + Map<RexNode, BetweenCandidate> replacedNodes = new IdentityHashMap<>(); + for (RexNodeRef n : g.nodes()) { + Set<RexNodeRef> pred = g.predecessors(n); + Set<RexNodeRef> succ = g.successors(n); + if (!pred.isEmpty() && !succ.isEmpty()) { + RexNodeRef p = pred.iterator().next(); + RexNodeRef s = succ.iterator().next(); + + RexNode between = rexBuilder.makeCall(HiveBetween.INSTANCE, + rexBuilder.makeLiteral(invert), n.getRexNode(), p.getRexNode(), s.getRexNode()); + BetweenCandidate bc = new BetweenCandidate( + between, + g.removeEdge(p, n), + g.removeEdge(n, s)); + + for (RexNode node : bc.oldNodes) { + replacedNodes.put(node, bc); + } + } + } + if (replacedNodes.isEmpty()) { + // no effect + return call; + } + List<RexNode> newOperands = new ArrayList<>(); + for (RexNode o : call.getOperands()) { + BetweenCandidate candidate = replacedNodes.get(o); + if (candidate == null) { + newOperands.add(o); + } else { + if (!candidate.used) { + newOperands.add(candidate.newNode); + candidate.used = true; + } + } + } + + if (newOperands.size() == 1) { + return newOperands.get(0); + } else { + return rexBuilder.makeCall(call.getOperator(), newOperands); + } + } + + /** + * Builds a graph of the given comparison type. + * The graph edges are annotated with the RexNodes representing the comparison. + */ + private DiGraph<RexNodeRef, RexCall> buildComparisonGraph(List<RexNode> operands, SqlKind cmpForward) { + DiGraph<RexNodeRef, RexCall> g = new DiGraph<>(); + for (RexNode node : operands) { + if(!(node instanceof RexCall) ) { + continue; + } + RexCall rexCall = (RexCall) node; + SqlKind kind = rexCall.getKind(); + if (kind == cmpForward) { + RexNode opA = rexCall.getOperands().get(0); + RexNode opB = rexCall.getOperands().get(1); + g.putEdgeValue(new RexNodeRef(opA), new RexNodeRef(opB), rexCall); + } else if (kind == cmpForward.reverse()) { + RexNode opA = rexCall.getOperands().get(1); + RexNode opB = rexCall.getOperands().get(0); + g.putEdgeValue(new RexNodeRef(opA), new RexNodeRef(opB), rexCall); + } + } + return g; + } + } + + /** + * Transforms OR clauses into IN clauses, when possible. + */ + public static class RexTransformIntoInClause extends RexShuttle { + private final RexBuilder rexBuilder; + private final int minNumORClauses; + + public RexTransformIntoInClause(RexBuilder rexBuilder, int minNumORClauses) { + this.rexBuilder = rexBuilder; + this.minNumORClauses = minNumORClauses; + } + + @Override + public RexNode visitCall(RexCall inputCall) { + RexNode node = super.visitCall(inputCall); + if (node instanceof RexCall) { + RexCall call = (RexCall) node; + if (call.getKind() == SqlKind.OR) { + try { + RexNode newNode = transformIntoInClauseCondition(rexBuilder, + call, minNumORClauses); + if (newNode != null) { + return newNode; + } + } catch (SemanticException e) { + LOG.error("Exception in HivePointLookupOptimizerRule", e); + return call; + } + } + } + return node; + } + + /** + * A group of Constraints. + * + * Examples: + * (a=1 && b=1) + * (a=1) + * + * Note: any rexNode is accepted as constraint; but it might be keyed with the empty key; + * which means it can't be parsed as a constraint for some reason; but for completeness... + * + */ + static class ConstraintGroup { + private final Map<RexNodeRef, Constraint> constraints = new HashMap<>(); + private final RexNode originalRexNode; + private final Set<RexNodeRef> key; + + public ConstraintGroup(RexNode rexNode) { + originalRexNode = rexNode; + + final List<RexNode> conjunctions = RelOptUtil.conjunctions(rexNode); + + for (RexNode n : conjunctions) { + + Constraint c = Constraint.of(n); + if (c == null) { + // interpretation failed; make this node opaque + key = Collections.emptySet(); + return; + } + constraints.put(c.getKey(), c); + } + if (constraints.size() != conjunctions.size()) { + LOG.debug("unexpected situation; giving up on this branch"); + key = Collections.emptySet(); + return; + } + key = constraints.keySet(); + } + + public List<RexNode> getValuesInOrder(List<RexNodeRef> columns) throws SemanticException { + List<RexNode> ret = new ArrayList<>(); + for (RexNodeRef rexInputRef : columns) { + Constraint constraint = constraints.get(rexInputRef); + if (constraint == null) { + throw new SemanticException("Unable to find constraint which was earlier added."); + } + ret.add(constraint.getValue()); + } + return ret; + } + } + + private RexNode transformIntoInClauseCondition(RexBuilder rexBuilder, RexNode condition, + int minNumORClauses) throws SemanticException { + assert condition.getKind() == SqlKind.OR; + + ImmutableList<RexNode> operands = RexUtil.flattenOr(((RexCall) condition).getOperands()); + if (operands.size() < minNumORClauses) { + // We bail out + return null; + } + List<ConstraintGroup> allNodes = new ArrayList<>(); + List<ConstraintGroup> processedNodes = new ArrayList<>(); + for (int i = 0; i < operands.size(); i++) { + ConstraintGroup m = new ConstraintGroup(operands.get(i)); + allNodes.add(m); + } + + Multimap<Set<RexNodeRef>, ConstraintGroup> assignmentGroups = + Multimaps.index(allNodes, cg -> cg.key); + + for (Map.Entry<Set<RexNodeRef>, Collection<ConstraintGroup>> sa : assignmentGroups.asMap().entrySet()) { + // skip opaque + if (sa.getKey().isEmpty()) { + continue; + } + // not enough equalities should not be handled + if (sa.getValue().size() < 2 || sa.getValue().size() < minNumORClauses) { + continue; + } + + allNodes.add(new ConstraintGroup(buildInFor(sa.getKey(), sa.getValue()))); + processedNodes.addAll(sa.getValue()); + } + + if (processedNodes.isEmpty()) { + return null; + } + allNodes.removeAll(processedNodes); + List<RexNode> ops = new ArrayList<>(); + for (ConstraintGroup mx : allNodes) { + ops.add(mx.originalRexNode); + } + if (ops.size() == 1) { + return ops.get(0); + } else { + return rexBuilder.makeCall(SqlStdOperatorTable.OR, ops); + } + + } + + private RexNode buildInFor(Set<RexNodeRef> set, Collection<ConstraintGroup> value) throws SemanticException { + + List<RexNodeRef> columns = new ArrayList<>(set); + columns.sort(RexNodeRef.COMPARATOR); + List<RexNode >operands = new ArrayList<>(); + + List<RexNode> columnNodes = columns.stream().map(RexNodeRef::getRexNode).collect(Collectors.toList()); + operands.add(useStructIfNeeded(columnNodes)); + for (ConstraintGroup node : value) { + List<RexNode> values = node.getValuesInOrder(columns); + operands.add(useStructIfNeeded(values)); + } + + return rexBuilder.makeIn(operands.get(0), operands.subList(1, operands.size())); + } + + private RexNode useStructIfNeeded(List<? extends RexNode> columns) { + // Create STRUCT clause + if (columns.size() == 1) { + return columns.get(0); + } else { + return rexBuilder.makeCall(SqlStdOperatorTable.ROW, columns); + } + } + + } + + public static RexNode transformOrToInAndInequalityToBetween( + RexBuilder rexBuilder, RexNode condition, int minNumORClauses) { + // 1. We try to transform possible candidates + RexTransformIntoInClause transformIntoInClause = new RexTransformIntoInClause(rexBuilder, minNumORClauses); + RexNode newCondition = transformIntoInClause.apply(condition); + + // 2. We merge IN expressions + RexMergeInClause mergeInClause = new RexMergeInClause(rexBuilder); + newCondition = mergeInClause.apply(newCondition); + + // 3. Close BETWEEN expressions if possible + RexTransformIntoBetween t = new RexTransformIntoBetween(rexBuilder); + newCondition = t.apply(newCondition); + return newCondition; + } Review Comment: There is quite some overlap with the functionality of RexSimplify and the capabilities of the `SEARCH` operator. If we commit this code as it is we will need to create some follow-up tickets for cleanup. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HivePushdownSnapshotFilterRule.java: ########## @@ -80,7 +82,12 @@ private HivePushdownSnapshotFilterRule(Config config) { @Override public void onMatch(RelOptRuleCall call) { HiveFilter filter = call.rel(0); - RexNode newCondition = filter.getCondition().accept(new SnapshotIdShuttle(call.builder().getRexBuilder(), call.getMetadataQuery(), filter)); + RexNode expandedCondition = RexUtil.expandSearch( + filter.getCluster().getRexBuilder(), null, filter.getCondition() + ); Review Comment: The fact that we expand the SEARCH in this rule may be OK but somewhat surprising. Firstly, I was not expecting to have a SEARCH operator over the `snapshotId` virtual column since we are supposed to have a single condition in the filter that is `snapshotId <= 12345677899`. I was under the impression that simple inequality conditions are not transformed to SEARCH. Secondly, assuming that for whatever reason we end-up with a SEARCH operator then it should be strictly in the form `SEARCH($snapshotId, Sarg[-∞, 12345677899])` so I guess we could handle it explicitly in the `SnapshotIdShuttle`; to be discussed if its worth it. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // convert Sarg to IN when they are points. + if (sarg.isPoints()) { Review Comment: I am wondering if we need to handle also org.apache.calcite.util.Sarg#isComplementedPoints to create NOT IN. Let's check the implementation in [SqlImplementor](https://github.com/apache/calcite/blob/5c7ead4ba7f29dbf5412fb2fb551fef01f83ce62/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L922) to see if there is anything else that we missing. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java: ########## @@ -171,7 +175,36 @@ public Double visitCall(RexCall call) { break; } - default: + case SEARCH: + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + RexBuilder rexBuilder = childRel.getCluster().getRexBuilder(); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // TODO: revisit this to better handle INs + if (sarg.isPoints()) { + selectivity = computeFunctionSelectivity(call); + if (selectivity != null) { + selectivity = selectivity * sarg.pointCount; + if (selectivity <= 0.0) { + selectivity = 0.10; + } else if (selectivity >= 1.0) { + selectivity = 1.0; + } + } + break; + + } else { + return visitCall((RexCall) + transformOrToInAndInequalityToBetween( Review Comment: This is not an appropriate place to perform transformations. This code should just estimate the cost based on the expressions it gets as an input. I understand that this is an attempt to restore/retain the old cost behavior but doing back and forth transformations is costly and fragile. The sarg argument is basically a set of ranges so we could probably come up with the same cost by reusing the code in `computeBetweenPredicateSelectivity` and `computeDisjunctionSelectivity`. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveRowIsDeletedPropagator.java: ########## @@ -264,9 +263,10 @@ public RelNode visit(HiveJoin join, Context context) { projectNames.add(ANY_INSERTED_COLUMN_NAME); // Create input refs to derived expressions in project - RelDataType boolIntType = relBuilder.getTypeFactory().createSqlType(SqlTypeName.BOOLEAN); - RexNode anyDeleted = rexBuilder.makeInputRef(boolIntType, projects.size() - 2); - RexNode anyInserted = rexBuilder.makeInputRef(boolIntType, projects.size() - 1); + int anyDeletedIndex = projects.size() - 2; + int anyInsertedIndex = projects.size() - 1; + RexNode anyDeleted = rexBuilder.makeInputRef(projects.get(anyDeletedIndex).getType(), anyDeletedIndex); + RexNode anyInserted = rexBuilder.makeInputRef(projects.get(anyInsertedIndex).getType(), anyInsertedIndex); Review Comment: Is this refactoring necessary/required for the upgrade? ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -869,6 +875,12 @@ public ASTNode visitInputRef(RexInputRef inputRef) { @Override public ASTNode visitLiteral(RexLiteral literal) { + if (!RexUtil.isNull(literal) && literal.getType().isStruct()) { + return rexBuilder + .makeCall(SqlStdOperatorTable.ROW, (List<? extends RexNode>) literal.getValue()) + .accept(this); + } + Review Comment: Why we didn't have STRUCT literals before? Why we didn't need to handle them here? ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1067,9 +1079,17 @@ public ASTNode visitCall(RexCall call) { return SqlFunctionConverter.buildAST(SqlStdOperatorTable.NOT, Collections.singletonList(SqlFunctionConverter.buildAST(SqlStdOperatorTable.IS_NOT_DISTINCT_FROM, astNodeLst, call.getType())), call.getType()); case CAST: - assert(call.getOperands().size() == 1); + if (call.getOperands().size() != 1) { + throw new IllegalArgumentException("CASTs should have only 1 operand"); + } + RexNode castOperand = call.getOperands().get(0); + + // Extract RexNode out of CAST when it's not a literal and the types are equal + if (!(castOperand instanceof RexLiteral) && call.getType().equals(castOperand.getType())) { + return castOperand.accept(this); + } Review Comment: This doesn't seem like the appropriate place to do such simplifications. They are usually performed by Calcite (RexSimplify/RexBuilder/RexExecutor) when it is valid to do so. Why did we opt to add this here? ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // convert Sarg to IN when they are points. + if (sarg.isPoints()) { + // just expand SEARCH to ORs when point count is less than HIVE_POINT_LOOKUP_OPTIMIZER_MIN + if (sarg.pointCount < minOrClauses) { + return visitCall((RexCall) call.accept(RexUtil.searchShuttle(rexBuilder, null, -1))); + } + + // else convert to IN + for (Range<?> range : sarg.rangeSet.asRanges()) { + astNodeLst.add(visitLiteral((RexLiteral) rexBuilder.makeLiteral( + range.lowerEndpoint(), literal.getType(), true, true))); Review Comment: Should we rather use `astNodeLst.add(rexBuilder.makeLiteral(...).accept(this))`. The same holds for other places we employ directly `visitCall` or `visitLiteral`. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); Review Comment: I am not sure if it makes sense to consider `HIVE_POINT_LOOKUP_OPTIMIZER_MIN` property at this stage. At this point the OR clauses are already transformed to IN (or rather SEARCH) so we already payed the conversion price (perf) that this property is supposed to guard against. If we now transform back the IN to OR we are gonna spend extra CPU/MEM cost for a transformation that I am not sure if its better. ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java: ########## @@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) { // proceed correctly if we just ignore the <time_unit> astNodeLst.add(call.operands.get(1).accept(this)); break; + case SEARCH: + ASTNode astNode = call.getOperands().get(0).accept(this); + astNodeLst.add(astNode); + RexLiteral literal = (RexLiteral) call.operands.get(1); + Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), "Sarg"); + int minOrClauses = SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN); + + // convert Sarg to IN when they are points. + if (sarg.isPoints()) { + // just expand SEARCH to ORs when point count is less than HIVE_POINT_LOOKUP_OPTIMIZER_MIN + if (sarg.pointCount < minOrClauses) { + return visitCall((RexCall) call.accept(RexUtil.searchShuttle(rexBuilder, null, -1))); + } + + // else convert to IN + for (Range<?> range : sarg.rangeSet.asRanges()) { + astNodeLst.add(visitLiteral((RexLiteral) rexBuilder.makeLiteral( + range.lowerEndpoint(), literal.getType(), true, true))); + } + + return SqlFunctionConverter.buildAST(HiveIn.INSTANCE, astNodeLst, call.getType()); + // Expand SEARCH operator + } else { + return visitCall((RexCall) transformOrToInAndInequalityToBetween( + rexBuilder, + call.accept(RexUtil.searchShuttle(rexBuilder, null, -1)), + minOrClauses + ) Review Comment: The converter is not an appropriate place to perform again transformations that are already done during optimization phase. At this stage we should just focus on transforming the sarg ranges to Hive SQL. We still need to detect that a range is equivalent to `[NOT] BETWEEN` and generate that instead of generating individual comparison operators (=,>,<,>=,<=). One open question that remains is how important it is to detect and generate `[NOT] BETWEEN` predicates at this stage. If we don't need `BETWEEN` for doing some specific physical optimizations then we could possibly skip the BETWEEN detect/transform part. It may not be necessary to answer this question now but leaving a comment so that I don't forget. ########## ql/src/java/org/apache/hadoop/hive/ql/parse/type/RexNodeExprFactory.java: ########## @@ -615,8 +616,9 @@ protected RexNode createConstantExpr(TypeInfo typeInfo, Object constantValue) SqlStdOperatorTable.ROW, operands); } - return rexBuilder.makeLiteral(constantValue, - TypeConverter.convert(typeInfo, rexBuilder.getTypeFactory()), false); + RelDataType finalType = TypeConverter.convert(typeInfo, rexBuilder.getTypeFactory()); + boolean allowCast = finalType.getFamily() == SqlTypeFamily.CHARACTER; + return rexBuilder.makeLiteral(constantValue, finalType, allowCast); Review Comment: Why do we need to change the type casting rules after the upgrade? ########## ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/HiveJdbcImplementor.java: ########## @@ -128,6 +129,11 @@ public HiveJdbcImplementor(SqlDialect dialect, JavaTypeFactory typeFactory) { return result(join, leftResult, rightResult); } +public Result visit(JdbcTableScan scan) { + return result(scan.jdbcTable.tableName(), + ImmutableList.of(Clause.FROM), scan, null); +} + Review Comment: Even though `visit(JdbcTableScan scan)` was removed there is still `visit(TableScan scan)`. If we cannot use the latter the I guess there is a problem lying somewhere else either in Calcite or Hive. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org