Re: [PR] HIVE-27102: Upgrade Calcite to 1.33.0 and Avatica to 1.23.0 [hive]

via GitHub Wed, 29 Jan 2025 08:28:26 -0800


zabetak commented on code in PR #5196:
URL: https://github.com/apache/hive/pull/5196#discussion_r1933775051



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterJoinRule.java:
##########
@@ -66,18 +65,18 @@ public abstract class HiveFilterJoinRule extends 
FilterJoinRule {
   /**
    * Creates a PushFilterPastJoinRule with an explicit root operand.
    */
-  protected HiveFilterJoinRule(RelOptRuleOperand operand, String id, boolean 
smart,
+  protected HiveFilterJoinRule(Config conf, RelOptRuleOperand operand, String 
id, boolean smart,
       RelBuilderFactory relBuilderFactory) {
 
     super(
-      (Config) RelRule.Config.EMPTY
+      conf

Review Comment:
   We can possibly remove the conf arg from the constructor and use directly 
`FilterIntoJoinRule.FilterIntoJoinRuleConfig.DEFAULT` in the body.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java:
##########
@@ -22,8 +22,6 @@
 import org.apache.calcite.rel.RelNode;
 import org.apache.calcite.rel.RelShuttle;
 import org.apache.calcite.rel.core.Filter;
-import org.apache.calcite.rel.core.Join;
-import org.apache.calcite.rel.core.TableScan;

Review Comment:
   nit: Changes seem unrelated to the upgrade.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCExpandExpressionsRule.java:
##########
@@ -182,12 +191,75 @@ public RexNode visitCall(RexCall inputCall) {
         switch (call.getKind()) {
           case IN:
             return transformIntoOrAndClause(rexBuilder, call);
+          case SEARCH:
+            return expandSearchAndRemoveRowOperator(rexBuilder, call);
           default:
             break;
         }
       }
       return RexUtil.isFlat(node) ? node : RexUtil.flatten(rexBuilder, node);
     }
+    
+    private RexNode expandSearchAndRemoveRowOperator(RexBuilder rexBuilder, 
RexCall expression) {
+      RexLiteral literal = (RexLiteral) expression.getOperands().get(1);
+      Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), 
"Sarg");
+
+      if (sarg.isPoints()) {
+        // if it is a row operator, we'll have 
+        // SEARCH(ROW($1, $2), Sarg[[1, 2], [3, 4]]

Review Comment:
   How do we end-up with `SEARCH(ROW($1, $2), Sarg[[1, 2], [3, 4]]`? What is 
the SQL equivalent?
   
   Why can't we handle it with `RexUtil.searchShuttle`? How will the shuttle 
rewrite the SEARCH operator in the example above?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/JDBCExpandExpressionsRule.java:
##########
@@ -182,12 +191,75 @@ public RexNode visitCall(RexCall inputCall) {
         switch (call.getKind()) {
           case IN:
             return transformIntoOrAndClause(rexBuilder, call);

Review Comment:
   I get the impression that `case IN` is now dead/unreachable code.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -171,7 +175,36 @@ public Double visitCall(RexCall call) {
       break;
     }
 
-    default:
+    case SEARCH:

Review Comment:
   The introduction of SEARCH makes the `case IN` and `case BETWEEN` kinda 
obsolete and probably dead code.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -171,7 +175,36 @@ public Double visitCall(RexCall call) {
       break;
     }
 
-    default:
+    case SEARCH:
+      RexLiteral literal = (RexLiteral) call.operands.get(1);
+      Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), 
"Sarg");
+      RexBuilder rexBuilder = childRel.getCluster().getRexBuilder();
+      int minOrClauses = 
SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN);
+
+      // TODO: revisit this to better handle INs

Review Comment:
   Instead of TODOs we should create JIRA tickets that outline the remaining 
work that needs to be done. The ticket should clearly mention if it is optional 
improvement or a regression that needs to be addressed. We can postpone the 
creation of the ticket till we finalize the review for this PR.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ExprNodeConverter.java:
##########
@@ -203,6 +207,32 @@ public ExprNodeDesc visitCall(RexCall call) {
       for (RexNode operand : call.operands) {
         args.add(operand.accept(this));
       }
+    } else if (call.getKind() == SqlKind.SEARCH) {

Review Comment:
   Most of the comments that I did in ASTConverter about the SEARCH operator 
apply also here. The code is also very similar so we should check if we can 
somehow refactor this logic.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java:
##########
@@ -322,18 +324,24 @@ public RexNode getExpression(String functionText, 
FunctionInfo fi,
         // unix_timestamp(args) -> to_unix_timestamp(args)
         calciteOp = HiveToUnixTimestampSqlOperator.INSTANCE;
       }
-      expr = rexBuilder.makeCall(returnType, calciteOp, inputs);
+      expr = getExpression(returnType, calciteOp, inputs);
     }
 
     if (expr instanceof RexCall && !expr.isA(SqlKind.CAST)) {
       RexCall call = (RexCall) expr;
       expr = rexBuilder.makeCall(returnType, call.getOperator(),
           RexUtil.flatten(call.getOperands(), call.getOperator()));
     }
-
     return expr;
   }
 
+  private RexNode getExpression(RelDataType returnType, SqlOperator calciteOp, 
List<RexNode> inputs) {
+    if (calciteOp.getKind() == SqlKind.IN) {
+      return rexBuilder.makeIn(inputs.get(0), inputs.subList(1, 
inputs.size()));
+    }
+    return rexBuilder.makeCall(returnType, calciteOp, inputs);
+  }
+

Review Comment:
   The code would be more uniform if instead of overloading the `getExpression` 
method we added the new `if clause` in the existing one. 



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSortPredicates.java:
##########
@@ -198,6 +208,12 @@ public Double visitCall(RexCall call) {
         return null;
       }
 
+      if (call.getKind() == SqlKind.SEARCH) {
+        RexCall expandedCall = (RexCall) RexUtil.expandSearch(rexBuilder, 
null, call);
+
+        return visitCall(expandedCall);
+      }
+

Review Comment:
   I think that the best approach would be to handle SEARCH inside the 
`functionCost` method and not expand it. When we do the expansion we never end 
up with an IN or BETWEEN call so the code for them inside `functionCost` is 
basically dead/unreachable.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java:
##########
@@ -1177,4 +1185,633 @@ public static List<Integer> getNewRelDistributionKeys(
     }
     return distributionKeys;
   }
+
+  /**
+   * This class just wraps around a RexNode enables equals/hashCode based on 
toString.
+   * After CALCITE-2632 this might not be needed anymore */
+  public static class RexNodeRef {
+
+    public static Comparator<RexNodeRef> COMPARATOR =
+        Comparator.comparing((RexNodeRef o) -> o.node.toString());
+    private final RexNode node;
+
+    public RexNodeRef(RexNode node) {
+      this.node = node;
+    }
+
+    public RexNode getRexNode() {
+      return node;
+    }
+
+    @Override
+    public int hashCode() {
+      return node.toString().hashCode();
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (o instanceof RexNodeRef) {
+        RexNodeRef otherRef = (RexNodeRef) o;
+        return node.toString().equals(otherRef.node.toString());
+      }
+      return false;
+    }
+
+    @Override
+    public String toString() {
+      return "ref for:" + node.toString();
+    }
+  }
+
+  /**
+   * Represents a constraint.
+   *
+   * Example: a=1
+   * substr(a,1,2) = concat('asd','xxx')
+   */
+  public static class Constraint {
+
+    private final RexNode exprNode;
+    private final RexNode constNode;
+
+    public Constraint(RexNode exprNode, RexNode constNode) {
+      this.exprNode = exprNode;
+      this.constNode = constNode;
+    }
+
+    /**
+     * Interprets argument as a constraint; if not possible returns null.
+     */
+    public static Constraint of(RexNode n) {
+      if (!(n instanceof RexCall)) {
+        return null;
+      }
+      RexCall call = (RexCall) n;
+      if (call.getOperator().getKind() != SqlKind.EQUALS) {
+        return null;
+      }
+      RexNode opA = call.operands.get(0);
+      RexNode opB = call.operands.get(1);
+      if (RexUtil.isNull(opA) || RexUtil.isNull(opB)) {
+        // dont try to compare nulls
+        return null;
+      }
+      if (isConstExpr(opA) && isColumnExpr(opB)) {
+        return new Constraint(opB, opA);
+      }
+      if (isColumnExpr(opA) && isConstExpr(opB)) {
+        return new Constraint(opA, opB);
+      }
+      return null;
+    }
+
+    private static boolean isColumnExpr(RexNode node) {
+      return !node.getType().isStruct() && 
!HiveCalciteUtil.getInputRefs(node).isEmpty()
+          && HiveCalciteUtil.isDeterministic(node);
+    }
+
+    private static boolean isConstExpr(RexNode node) {
+      return !node.getType().isStruct() && 
HiveCalciteUtil.getInputRefs(node).isEmpty()
+          && HiveCalciteUtil.isDeterministic(node);
+    }
+
+    public RexNodeRef getKey() {
+      return new RexNodeRef(exprNode);
+    }
+
+    public RexNode getValue() {
+      return constNode;
+    }
+
+  }
+
+  /**
+   * Merge IN clauses, when possible.
+   */
+  public static class RexMergeInClause extends RexShuttle {
+    private final RexBuilder rexBuilder;
+
+    public RexMergeInClause(RexBuilder rexBuilder) {
+      this.rexBuilder = rexBuilder;
+    }
+
+    @Override public RexNode visitCall(RexCall call) {
+      switch (call.getKind()) {
+        case AND:
+          return handleAND(rexBuilder, call);
+        case OR:
+          return handleOR(rexBuilder, call);
+        default:
+          return super.visitCall(call);
+      }
+    }
+
+    private static RexNode handleAND(RexBuilder rexBuilder, RexCall call) {
+      // Visited nodes
+      final Set<RexNode> visitedRefs = new LinkedHashSet<>();
+      // IN clauses need to be combined by keeping only common elements
+      final Multimap<RexNode,SimilarRexNodeElement> inLHSExprToRHSExprs = 
LinkedHashMultimap.create();
+      // We will use this set to keep those expressions that may evaluate
+      // into a null value.
+      final Multimap<RexNode,RexNode> inLHSExprToRHSNullableExprs = 
LinkedHashMultimap.create();
+      final List<RexNode> operands = new 
ArrayList<>(RexUtil.flattenAnd(call.getOperands()));
+
+      for (int i = 0; i < operands.size(); i++) {
+        RexNode operand = operands.get(i);
+        if (operand.getKind() == SqlKind.IN) {
+          RexCall inCall = (RexCall) operand;
+          if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) {
+            continue;
+          }
+          RexNode ref = inCall.getOperands().get(0);
+          visitedRefs.add(ref);
+          if (ref.getType().isNullable()) {
+            inLHSExprToRHSNullableExprs.put(ref, ref);
+          }
+          if (inLHSExprToRHSExprs.containsKey(ref)) {
+            Set<SimilarRexNodeElement> expressions = Sets.newHashSet();
+            for (int j = 1; j < inCall.getOperands().size(); j++) {
+              RexNode constNode = inCall.getOperands().get(j);
+              expressions.add(new SimilarRexNodeElement(constNode));
+              if (constNode.getType().isNullable()) {
+                inLHSExprToRHSNullableExprs.put(ref, constNode);
+              }
+            }
+            Collection<SimilarRexNodeElement> knownConstants = 
inLHSExprToRHSExprs.get(ref);
+            if (!shareSameType(knownConstants, expressions)) {
+              return call;
+            }
+            knownConstants.retainAll(expressions);
+          } else {
+            for (int j = 1; j < inCall.getOperands().size(); j++) {
+              RexNode constNode = inCall.getOperands().get(j);
+              inLHSExprToRHSExprs.put(ref, new 
SimilarRexNodeElement(constNode));
+              if (constNode.getType().isNullable()) {
+                inLHSExprToRHSNullableExprs.put(ref, constNode);
+              }
+            }
+          }
+          operands.remove(i);
+          --i;
+        } else if (operand.getKind() == SqlKind.EQUALS) {
+          Constraint c = Constraint.of(operand);
+          if (c == null || !HiveCalciteUtil.isDeterministic(c.exprNode)) {
+            continue;
+          }
+          visitedRefs.add(c.exprNode);
+          if (c.exprNode.getType().isNullable()) {
+            inLHSExprToRHSNullableExprs.put(c.exprNode, c.exprNode);
+          }
+          if (c.constNode.getType().isNullable()) {
+            inLHSExprToRHSNullableExprs.put(c.exprNode, c.constNode);
+          }
+          if (inLHSExprToRHSExprs.containsKey(c.exprNode)) {
+            Collection<SimilarRexNodeElement> knownConstants = 
inLHSExprToRHSExprs.get(c.exprNode);
+            Collection<SimilarRexNodeElement> nextConstant = 
Collections.singleton(new SimilarRexNodeElement(c.constNode));
+            if (!shareSameType(knownConstants, nextConstant)) {
+              return call;
+            }
+            knownConstants.retainAll(nextConstant);
+          } else {
+            inLHSExprToRHSExprs.put(c.exprNode, new 
SimilarRexNodeElement(c.constNode));
+          }
+          operands.remove(i);
+          --i;
+        }
+      }
+      // Create IN clauses
+      final List<RexNode> newOperands = createInClauses(rexBuilder,
+          visitedRefs, inLHSExprToRHSExprs, inLHSExprToRHSNullableExprs);
+      newOperands.addAll(operands);
+      // Return node
+      return RexUtil.composeConjunction(rexBuilder, newOperands, false);
+    }
+
+    protected static class SimilarRexNodeElement {
+      private final RexNode rexNode;
+
+      protected SimilarRexNodeElement(RexNode rexNode) {
+        this.rexNode = rexNode;
+      }
+
+      public RexNode getRexNode() {
+        return rexNode;
+      }
+
+      @Override
+      public boolean equals(Object o) {
+        if (this == o) return true;
+        if (o == null || getClass() != o.getClass()) return false;
+        SimilarRexNodeElement that = (SimilarRexNodeElement) o;
+        return equalsWithSimilarType(rexNode, that.rexNode);
+      }
+
+      private static boolean equalsWithSimilarType(RexNode rexNode1, RexNode 
rexNode2) {
+        if (!(rexNode1 instanceof RexLiteral) || !(rexNode2 instanceof 
RexLiteral)) {
+          return rexNode1.equals(rexNode2);
+        }
+
+        RexLiteral rexLiteral1 = (RexLiteral) rexNode1;
+        RexLiteral rexLiteral2 = (RexLiteral) rexNode2;
+
+        if (rexLiteral1.getValue() == null && rexLiteral2.getValue() == null) {
+          return true;
+        }
+
+        return rexLiteral1.getValue() != null && 
rexLiteral1.getValue().compareTo(rexLiteral2.getValue()) == 0 &&
+            
rexLiteral1.getType().getSqlTypeName().equals(rexLiteral2.getType().getSqlTypeName());
+      }
+
+      @Override
+      public int hashCode() {
+        if (rexNode instanceof RexLiteral) {
+          RexLiteral rexLiteral = (RexLiteral) rexNode;
+          return Objects.hash(rexLiteral.getValue(), 
rexLiteral.getType().getSqlTypeName());
+        }
+        return Objects.hash(rexNode);
+      }
+    }
+
+    /**
+     * Check if the type of nodes in the two collections is homogeneous within 
the collections
+     * and identical between them.
+     * @param nodes1 the first collection of nodes
+     * @param nodes2 the second collection of nodes
+     * @return true if nodes in both collections is unique and identical, 
false otherwise
+     */
+    private static boolean shareSameType(
+        Collection<SimilarRexNodeElement> nodes1, 
Collection<SimilarRexNodeElement> nodes2) {
+      return Stream.of(nodes1, nodes2).flatMap(Collection::stream)
+          .map(n -> n.getRexNode().getType().getSqlTypeName())
+          .distinct()
+          .count() == 1;
+    }
+
+    private static RexNode handleOR(RexBuilder rexBuilder, RexCall call) {
+      // IN clauses need to be combined by keeping all elements
+      final List<RexNode> operands = new 
ArrayList<>(RexUtil.flattenOr(call.getOperands()));
+      final Multimap<RexNode,SimilarRexNodeElement> inLHSExprToRHSExprs = 
LinkedHashMultimap.create();
+      for (int i = 0; i < operands.size(); i++) {
+        RexNode operand = operands.get(i);
+        if (operand.getKind() == SqlKind.IN) {
+          RexCall inCall = (RexCall) operand;
+          if (!HiveCalciteUtil.isDeterministic(inCall.getOperands().get(0))) {
+            continue;
+          }
+          RexNode ref = inCall.getOperands().get(0);
+          for (int j = 1; j < inCall.getOperands().size(); j++) {
+            inLHSExprToRHSExprs.put(ref, new 
SimilarRexNodeElement(inCall.getOperands().get(j)));
+          }
+          operands.remove(i);
+          --i;
+        }
+      }
+      // Create IN clauses (fourth parameter is not needed since no 
expressions were removed)
+      final List<RexNode> newOperands = createInClauses(rexBuilder,
+          inLHSExprToRHSExprs.keySet(), inLHSExprToRHSExprs, null);
+      newOperands.addAll(operands);
+      // Return node
+      RexNode result = RexUtil.composeDisjunction(rexBuilder, newOperands, 
false);
+      if (!result.getType().equals(call.getType())) {
+        return rexBuilder.makeCast(call.getType(), result, true);
+      }
+      return result;
+    }
+
+    private static RexNode createResultFromEmptySet(RexBuilder rexBuilder,
+                                                    RexNode ref, 
Multimap<RexNode, RexNode> inLHSExprToRHSNullableExprs) {
+      if (inLHSExprToRHSNullableExprs.containsKey(ref)) {
+        // We handle possible null values in the expressions.
+        List<RexNode> nullableExprs =
+            inLHSExprToRHSNullableExprs.get(ref)
+                .stream()
+                .map(n -> rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, 
ImmutableList.of(n)))
+                .collect(Collectors.toList());
+        return RexUtil.composeConjunction(rexBuilder,
+            ImmutableList.of(
+                RexUtil.composeDisjunction(rexBuilder, nullableExprs, false),
+                
rexBuilder.makeNullLiteral(rexBuilder.getTypeFactory().createSqlType(SqlTypeName.BOOLEAN))),
+            false);
+      }
+      return rexBuilder.makeLiteral(false);
+    }
+
+    private static List<RexNode> createInClauses(RexBuilder rexBuilder, 
Set<RexNode> visitedRefs,
+                                                 Multimap<RexNode, 
SimilarRexNodeElement> inLHSExprToRHSExprs, Multimap<RexNode,RexNode> 
inLHSExprToRHSNullableExprs) {
+      final List<RexNode> newExpressions = new ArrayList<>();
+      for (RexNode ref : visitedRefs) {
+        Collection<SimilarRexNodeElement> exprs = inLHSExprToRHSExprs.get(ref);
+        if (exprs.isEmpty()) {
+          // Note that Multimap does not keep a key if all its values are 
removed.
+          newExpressions.add(createResultFromEmptySet(rexBuilder, ref, 
inLHSExprToRHSNullableExprs));
+        } else if (exprs.size() == 1) {
+          List<RexNode> newOperands = new ArrayList<>(2);
+          newOperands.add(ref);
+          newOperands.add(exprs.iterator().next().getRexNode());
+          newExpressions.add(rexBuilder.makeCall(SqlStdOperatorTable.EQUALS, 
newOperands));
+        } else {
+          List<RexNode> newOperands = new ArrayList<>(exprs.size() + 1);
+          newOperands.add(ref);
+          
newOperands.addAll(exprs.stream().map(SimilarRexNodeElement::getRexNode).collect(Collectors.toList()));
+          newExpressions.add(rexBuilder.makeIn(newOperands.get(0), 
newOperands.subList(1, newOperands.size())));
+        }
+      }
+      return newExpressions;
+    }
+
+  }
+
+  /**
+   * Transforms inequality candidates into [NOT] BETWEEN calls.
+   *
+   */
+  public static class RexTransformIntoBetween extends RexShuttle {
+    private final RexBuilder rexBuilder;
+
+    public RexTransformIntoBetween(RexBuilder rexBuilder) {
+      this.rexBuilder = rexBuilder;
+    }
+
+    @Override
+    public RexNode visitCall(RexCall inputCall) {
+      RexNode node = super.visitCall(inputCall);
+      if (node instanceof RexCall) {
+        RexCall call = (RexCall) node;
+        switch (call.getKind()) {
+          case AND:
+            return processComparisons(call, SqlKind.LESS_THAN_OR_EQUAL, false);
+          case OR:
+            return processComparisons(call, SqlKind.GREATER_THAN, true);
+          default:
+            break;
+        }
+      }
+      return node;
+    }
+
+    /**
+     * Represents a replacement candidate.
+     */
+    static class BetweenCandidate {
+
+      private final RexNode newNode;
+      private final RexNode[] oldNodes;
+      // keeps track if this candidate was already used during replacement
+      private boolean used;
+
+      public BetweenCandidate(RexNode newNode, RexNode... oldNodes) {
+        this.newNode = newNode;
+        this.oldNodes = oldNodes;
+      }
+    }
+
+    private RexNode processComparisons(RexCall call, SqlKind forwardEdge, 
boolean invert) {
+      DiGraph<RexNodeRef, RexCall> g =
+          buildComparisonGraph(call.getOperands(), forwardEdge);
+      Map<RexNode, BetweenCandidate> replacedNodes = new IdentityHashMap<>();
+      for (RexNodeRef n : g.nodes()) {
+        Set<RexNodeRef> pred = g.predecessors(n);
+        Set<RexNodeRef> succ = g.successors(n);
+        if (!pred.isEmpty() && !succ.isEmpty()) {
+          RexNodeRef p = pred.iterator().next();
+          RexNodeRef s = succ.iterator().next();
+
+          RexNode between = rexBuilder.makeCall(HiveBetween.INSTANCE,
+              rexBuilder.makeLiteral(invert), n.getRexNode(), p.getRexNode(), 
s.getRexNode());
+          BetweenCandidate bc = new BetweenCandidate(
+              between,
+              g.removeEdge(p, n),
+              g.removeEdge(n, s));
+
+          for (RexNode node : bc.oldNodes) {
+            replacedNodes.put(node, bc);
+          }
+        }
+      }
+      if (replacedNodes.isEmpty()) {
+        // no effect
+        return call;
+      }
+      List<RexNode> newOperands = new ArrayList<>();
+      for (RexNode o : call.getOperands()) {
+        BetweenCandidate candidate = replacedNodes.get(o);
+        if (candidate == null) {
+          newOperands.add(o);
+        } else {
+          if (!candidate.used) {
+            newOperands.add(candidate.newNode);
+            candidate.used = true;
+          }
+        }
+      }
+
+      if (newOperands.size() == 1) {
+        return newOperands.get(0);
+      } else {
+        return rexBuilder.makeCall(call.getOperator(), newOperands);
+      }
+    }
+
+    /**
+     * Builds a graph of the given comparison type.
+     * The graph edges are annotated with the RexNodes representing the 
comparison.
+     */
+    private DiGraph<RexNodeRef, RexCall> buildComparisonGraph(List<RexNode> 
operands, SqlKind cmpForward) {
+      DiGraph<RexNodeRef, RexCall> g = new DiGraph<>();
+      for (RexNode node : operands) {
+        if(!(node instanceof RexCall) ) {
+          continue;
+        }
+        RexCall rexCall = (RexCall) node;
+        SqlKind kind = rexCall.getKind();
+        if (kind == cmpForward) {
+          RexNode opA = rexCall.getOperands().get(0);
+          RexNode opB = rexCall.getOperands().get(1);
+          g.putEdgeValue(new RexNodeRef(opA), new RexNodeRef(opB), rexCall);
+        } else if (kind == cmpForward.reverse()) {
+          RexNode opA = rexCall.getOperands().get(1);
+          RexNode opB = rexCall.getOperands().get(0);
+          g.putEdgeValue(new RexNodeRef(opA), new RexNodeRef(opB), rexCall);
+        }
+      }
+      return g;
+    }
+  }
+
+  /**
+   * Transforms OR clauses into IN clauses, when possible.
+   */
+  public static class RexTransformIntoInClause extends RexShuttle {
+    private final RexBuilder rexBuilder;
+    private final int minNumORClauses;
+
+    public RexTransformIntoInClause(RexBuilder rexBuilder, int 
minNumORClauses) {
+      this.rexBuilder = rexBuilder;
+      this.minNumORClauses = minNumORClauses;
+    }
+
+    @Override
+    public RexNode visitCall(RexCall inputCall) {
+      RexNode node = super.visitCall(inputCall);
+      if (node instanceof RexCall) {
+        RexCall call = (RexCall) node;
+        if (call.getKind() == SqlKind.OR) {
+          try {
+            RexNode newNode = transformIntoInClauseCondition(rexBuilder,
+                call, minNumORClauses);
+            if (newNode != null) {
+              return newNode;
+            }
+          } catch (SemanticException e) {
+            LOG.error("Exception in HivePointLookupOptimizerRule", e);
+            return call;
+          }
+        }
+      }
+      return node;
+    }
+
+    /**
+     * A group of Constraints.
+     *
+     * Examples:
+     *  (a=1 && b=1)
+     *  (a=1)
+     *
+     * Note: any rexNode is accepted as constraint; but it might be keyed with 
the empty key;
+     * which means it can't be parsed as a constraint for some reason; but for 
completeness...
+     *
+     */
+    static class ConstraintGroup {
+      private final Map<RexNodeRef, Constraint> constraints = new HashMap<>();
+      private final RexNode originalRexNode;
+      private final Set<RexNodeRef> key;
+
+      public ConstraintGroup(RexNode rexNode) {
+        originalRexNode = rexNode;
+
+        final List<RexNode> conjunctions = RelOptUtil.conjunctions(rexNode);
+
+        for (RexNode n : conjunctions) {
+
+          Constraint c = Constraint.of(n);
+          if (c == null) {
+            // interpretation failed; make this node opaque
+            key = Collections.emptySet();
+            return;
+          }
+          constraints.put(c.getKey(), c);
+        }
+        if (constraints.size() != conjunctions.size()) {
+          LOG.debug("unexpected situation; giving up on this branch");
+          key = Collections.emptySet();
+          return;
+        }
+        key = constraints.keySet();
+      }
+
+      public List<RexNode> getValuesInOrder(List<RexNodeRef> columns) throws 
SemanticException {
+        List<RexNode> ret = new ArrayList<>();
+        for (RexNodeRef rexInputRef : columns) {
+          Constraint constraint = constraints.get(rexInputRef);
+          if (constraint == null) {
+            throw new SemanticException("Unable to find constraint which was 
earlier added.");
+          }
+          ret.add(constraint.getValue());
+        }
+        return ret;
+      }
+    }
+
+    private RexNode transformIntoInClauseCondition(RexBuilder rexBuilder, 
RexNode condition,
+                                                   int minNumORClauses) throws 
SemanticException {
+      assert condition.getKind() == SqlKind.OR;
+
+      ImmutableList<RexNode> operands = RexUtil.flattenOr(((RexCall) 
condition).getOperands());
+      if (operands.size() < minNumORClauses) {
+        // We bail out
+        return null;
+      }
+      List<ConstraintGroup> allNodes = new ArrayList<>();
+      List<ConstraintGroup> processedNodes = new ArrayList<>();
+      for (int i = 0; i < operands.size(); i++) {
+        ConstraintGroup m = new ConstraintGroup(operands.get(i));
+        allNodes.add(m);
+      }
+
+      Multimap<Set<RexNodeRef>, ConstraintGroup> assignmentGroups =
+          Multimaps.index(allNodes, cg -> cg.key);
+
+      for (Map.Entry<Set<RexNodeRef>, Collection<ConstraintGroup>> sa : 
assignmentGroups.asMap().entrySet()) {
+        // skip opaque
+        if (sa.getKey().isEmpty()) {
+          continue;
+        }
+        // not enough equalities should not be handled
+        if (sa.getValue().size() < 2 || sa.getValue().size() < 
minNumORClauses) {
+          continue;
+        }
+
+        allNodes.add(new ConstraintGroup(buildInFor(sa.getKey(), 
sa.getValue())));
+        processedNodes.addAll(sa.getValue());
+      }
+
+      if (processedNodes.isEmpty()) {
+        return null;
+      }
+      allNodes.removeAll(processedNodes);
+      List<RexNode> ops = new ArrayList<>();
+      for (ConstraintGroup mx : allNodes) {
+        ops.add(mx.originalRexNode);
+      }
+      if (ops.size() == 1) {
+        return ops.get(0);
+      } else {
+        return rexBuilder.makeCall(SqlStdOperatorTable.OR, ops);
+      }
+
+    }
+
+    private RexNode buildInFor(Set<RexNodeRef> set, 
Collection<ConstraintGroup> value) throws SemanticException {
+
+      List<RexNodeRef> columns = new ArrayList<>(set);
+      columns.sort(RexNodeRef.COMPARATOR);
+      List<RexNode >operands = new ArrayList<>();
+
+      List<RexNode> columnNodes = 
columns.stream().map(RexNodeRef::getRexNode).collect(Collectors.toList());
+      operands.add(useStructIfNeeded(columnNodes));
+      for (ConstraintGroup node : value) {
+        List<RexNode> values = node.getValuesInOrder(columns);
+        operands.add(useStructIfNeeded(values));
+      }
+
+      return rexBuilder.makeIn(operands.get(0), operands.subList(1, 
operands.size()));
+    }
+
+    private RexNode useStructIfNeeded(List<? extends RexNode> columns) {
+      // Create STRUCT clause
+      if (columns.size() == 1) {
+        return columns.get(0);
+      } else {
+        return rexBuilder.makeCall(SqlStdOperatorTable.ROW, columns);
+      }
+    }
+
+  }
+
+  public static RexNode transformOrToInAndInequalityToBetween(
+      RexBuilder rexBuilder, RexNode condition, int minNumORClauses) {
+    // 1. We try to transform possible candidates
+    RexTransformIntoInClause transformIntoInClause = new 
RexTransformIntoInClause(rexBuilder, minNumORClauses);
+    RexNode newCondition = transformIntoInClause.apply(condition);
+
+    // 2. We merge IN expressions
+    RexMergeInClause mergeInClause = new RexMergeInClause(rexBuilder);
+    newCondition = mergeInClause.apply(newCondition);
+
+    // 3. Close BETWEEN expressions if possible
+    RexTransformIntoBetween t = new RexTransformIntoBetween(rexBuilder);
+    newCondition = t.apply(newCondition);
+    return newCondition;
+  }

Review Comment:
   There is quite some overlap with the functionality of RexSimplify and the 
capabilities of the `SEARCH` operator. If we commit this code as it is we will 
need to create some follow-up tickets for cleanup.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HivePushdownSnapshotFilterRule.java:
##########
@@ -80,7 +82,12 @@ private HivePushdownSnapshotFilterRule(Config config) {
   @Override
   public void onMatch(RelOptRuleCall call) {
     HiveFilter filter = call.rel(0);
-    RexNode newCondition = filter.getCondition().accept(new 
SnapshotIdShuttle(call.builder().getRexBuilder(), call.getMetadataQuery(), 
filter));
+    RexNode expandedCondition = RexUtil.expandSearch(
+        filter.getCluster().getRexBuilder(), null, filter.getCondition()
+    );

Review Comment:
   The fact that we expand the SEARCH in this rule may be OK but somewhat 
surprising. 
   
   Firstly, I was not expecting to have a SEARCH operator over the `snapshotId` 
virtual column since we are supposed to have a single condition in the filter 
that is `snapshotId <= 12345677899`. I was under the impression that simple 
inequality conditions are not transformed to SEARCH.
   
   Secondly, assuming that for whatever reason we end-up with a SEARCH operator 
then it should be strictly in the form `SEARCH($snapshotId, Sarg[-∞, 
12345677899])` so I guess we could handle it explicitly in the 
`SnapshotIdShuttle`; to be discussed if its worth it.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) {
         // proceed correctly if we just ignore the <time_unit>
         astNodeLst.add(call.operands.get(1).accept(this));
         break;
+      case SEARCH:
+        ASTNode astNode = call.getOperands().get(0).accept(this);
+        astNodeLst.add(astNode);
+        RexLiteral literal = (RexLiteral) call.operands.get(1);
+        Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), 
"Sarg");
+        int minOrClauses = 
SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN);
+
+        // convert Sarg to IN when they are points.
+        if (sarg.isPoints()) {

Review Comment:
   I am wondering if we need to handle also 
org.apache.calcite.util.Sarg#isComplementedPoints to create NOT IN. Let's check 
the implementation in 
[SqlImplementor](https://github.com/apache/calcite/blob/5c7ead4ba7f29dbf5412fb2fb551fef01f83ce62/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L922)
 to see if there is anything else that we missing.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -171,7 +175,36 @@ public Double visitCall(RexCall call) {
       break;
     }
 
-    default:
+    case SEARCH:
+      RexLiteral literal = (RexLiteral) call.operands.get(1);
+      Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), 
"Sarg");
+      RexBuilder rexBuilder = childRel.getCluster().getRexBuilder();
+      int minOrClauses = 
SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN);
+
+      // TODO: revisit this to better handle INs
+      if (sarg.isPoints()) {
+        selectivity = computeFunctionSelectivity(call);
+        if (selectivity != null) {
+          selectivity = selectivity * sarg.pointCount;
+          if (selectivity <= 0.0) {
+            selectivity = 0.10;
+          } else if (selectivity >= 1.0) {
+            selectivity = 1.0;
+          }
+        }
+        break;
+
+      } else {
+        return  visitCall((RexCall) 
+            transformOrToInAndInequalityToBetween(

Review Comment:
   This is not an appropriate place to perform transformations. This code 
should just estimate the cost based on the expressions it gets as an input.
   
   I understand that this is an attempt to restore/retain the old cost behavior 
but doing back and forth transformations is costly and fragile. The sarg 
argument is basically a set of ranges so we could probably come up with the 
same cost by reusing the code in `computeBetweenPredicateSelectivity` and 
`computeDisjunctionSelectivity`.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveRowIsDeletedPropagator.java:
##########
@@ -264,9 +263,10 @@ public RelNode visit(HiveJoin join, Context context) {
     projectNames.add(ANY_INSERTED_COLUMN_NAME);
 
     // Create input refs to derived expressions in project
-    RelDataType boolIntType = 
relBuilder.getTypeFactory().createSqlType(SqlTypeName.BOOLEAN);
-    RexNode anyDeleted = rexBuilder.makeInputRef(boolIntType, projects.size() 
- 2);
-    RexNode anyInserted = rexBuilder.makeInputRef(boolIntType, projects.size() 
- 1);
+    int anyDeletedIndex = projects.size() - 2;
+    int anyInsertedIndex = projects.size() - 1;
+    RexNode anyDeleted = 
rexBuilder.makeInputRef(projects.get(anyDeletedIndex).getType(), 
anyDeletedIndex);
+    RexNode anyInserted = 
rexBuilder.makeInputRef(projects.get(anyInsertedIndex).getType(), 
anyInsertedIndex);

Review Comment:
   Is this refactoring necessary/required for the upgrade?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -869,6 +875,12 @@ public ASTNode visitInputRef(RexInputRef inputRef) {
     @Override
     public ASTNode visitLiteral(RexLiteral literal) {
 
+      if (!RexUtil.isNull(literal) && literal.getType().isStruct()) {
+        return rexBuilder
+            .makeCall(SqlStdOperatorTable.ROW, (List<? extends RexNode>) 
literal.getValue())
+            .accept(this);
+      }
+

Review Comment:
   Why we didn't have STRUCT literals before? Why we didn't need to handle them 
here?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -1067,9 +1079,17 @@ public ASTNode visitCall(RexCall call) {
         return SqlFunctionConverter.buildAST(SqlStdOperatorTable.NOT,
           
Collections.singletonList(SqlFunctionConverter.buildAST(SqlStdOperatorTable.IS_NOT_DISTINCT_FROM,
 astNodeLst, call.getType())), call.getType());
       case CAST:
-        assert(call.getOperands().size() == 1);
+        if (call.getOperands().size() != 1) {
+          throw new IllegalArgumentException("CASTs should have only 1 
operand");
+        }
+        RexNode castOperand = call.getOperands().get(0);
+        
+        // Extract RexNode out of CAST when it's not a literal and the types 
are equal
+        if (!(castOperand instanceof RexLiteral) && 
call.getType().equals(castOperand.getType())) {
+          return castOperand.accept(this);
+        }

Review Comment:
   This doesn't seem like the appropriate place to do such simplifications. 
They are usually performed by Calcite (RexSimplify/RexBuilder/RexExecutor) when 
it is valid to do so. Why did we opt to add this here?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) {
         // proceed correctly if we just ignore the <time_unit>
         astNodeLst.add(call.operands.get(1).accept(this));
         break;
+      case SEARCH:
+        ASTNode astNode = call.getOperands().get(0).accept(this);
+        astNodeLst.add(astNode);
+        RexLiteral literal = (RexLiteral) call.operands.get(1);
+        Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), 
"Sarg");
+        int minOrClauses = 
SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN);
+
+        // convert Sarg to IN when they are points.
+        if (sarg.isPoints()) {
+          // just expand SEARCH to ORs when point count is less than 
HIVE_POINT_LOOKUP_OPTIMIZER_MIN
+          if (sarg.pointCount < minOrClauses) {
+            return visitCall((RexCall) 
call.accept(RexUtil.searchShuttle(rexBuilder, null, -1)));
+          }
+
+          // else convert to IN
+          for (Range<?> range : sarg.rangeSet.asRanges()) {
+            astNodeLst.add(visitLiteral((RexLiteral) rexBuilder.makeLiteral(
+                    range.lowerEndpoint(), literal.getType(), true, true)));

Review Comment:
   Should we rather use 
`astNodeLst.add(rexBuilder.makeLiteral(...).accept(this))`. The same holds for 
other places we employ directly `visitCall` or `visitLiteral`.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) {
         // proceed correctly if we just ignore the <time_unit>
         astNodeLst.add(call.operands.get(1).accept(this));
         break;
+      case SEARCH:
+        ASTNode astNode = call.getOperands().get(0).accept(this);
+        astNodeLst.add(astNode);
+        RexLiteral literal = (RexLiteral) call.operands.get(1);
+        Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), 
"Sarg");
+        int minOrClauses = 
SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN);

Review Comment:
   I am not sure if it makes sense to consider 
`HIVE_POINT_LOOKUP_OPTIMIZER_MIN` property at this stage. At this point the OR 
clauses are already transformed to IN (or rather SEARCH) so we already payed 
the conversion price (perf)  that this property is supposed to guard against. 
If we now transform back the IN to OR we are gonna spend extra CPU/MEM cost for 
a transformation that I am not sure if its better.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java:
##########
@@ -1078,6 +1098,36 @@ public ASTNode visitCall(RexCall call) {
         // proceed correctly if we just ignore the <time_unit>
         astNodeLst.add(call.operands.get(1).accept(this));
         break;
+      case SEARCH:
+        ASTNode astNode = call.getOperands().get(0).accept(this);
+        astNodeLst.add(astNode);
+        RexLiteral literal = (RexLiteral) call.operands.get(1);
+        Sarg<?> sarg = Objects.requireNonNull(literal.getValueAs(Sarg.class), 
"Sarg");
+        int minOrClauses = 
SessionState.getSessionConf().getIntVar(HiveConf.ConfVars.HIVE_POINT_LOOKUP_OPTIMIZER_MIN);
+
+        // convert Sarg to IN when they are points.
+        if (sarg.isPoints()) {
+          // just expand SEARCH to ORs when point count is less than 
HIVE_POINT_LOOKUP_OPTIMIZER_MIN
+          if (sarg.pointCount < minOrClauses) {
+            return visitCall((RexCall) 
call.accept(RexUtil.searchShuttle(rexBuilder, null, -1)));
+          }
+
+          // else convert to IN
+          for (Range<?> range : sarg.rangeSet.asRanges()) {
+            astNodeLst.add(visitLiteral((RexLiteral) rexBuilder.makeLiteral(
+                    range.lowerEndpoint(), literal.getType(), true, true)));
+          }
+          
+          return SqlFunctionConverter.buildAST(HiveIn.INSTANCE, astNodeLst, 
call.getType());
+          // Expand SEARCH operator
+        } else {
+          return visitCall((RexCall) transformOrToInAndInequalityToBetween(
+                  rexBuilder,
+                  call.accept(RexUtil.searchShuttle(rexBuilder, null, -1)),
+                  minOrClauses
+              )

Review Comment:
   The converter is not an appropriate place to perform again transformations 
that are already done during optimization phase. At this stage we should just 
focus on transforming the sarg ranges to Hive SQL.
   
   We still need to detect that a range is equivalent to `[NOT] BETWEEN` and 
generate that instead of generating individual comparison operators 
(=,>,<,>=,<=). 
   
   One open question that remains is how important it is to detect and generate 
`[NOT] BETWEEN` predicates at this stage. If we don't need `BETWEEN` for doing 
some specific physical optimizations then we could possibly skip the BETWEEN 
detect/transform part. It may not be necessary to answer this question now but 
leaving a comment so that I don't forget.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/RexNodeExprFactory.java:
##########
@@ -615,8 +616,9 @@ protected RexNode createConstantExpr(TypeInfo typeInfo, 
Object constantValue)
           SqlStdOperatorTable.ROW,
           operands);
     }
-    return rexBuilder.makeLiteral(constantValue,
-        TypeConverter.convert(typeInfo, rexBuilder.getTypeFactory()), false);
+    RelDataType finalType = TypeConverter.convert(typeInfo, 
rexBuilder.getTypeFactory());
+    boolean allowCast = finalType.getFamily() == SqlTypeFamily.CHARACTER;
+    return rexBuilder.makeLiteral(constantValue, finalType, allowCast);

Review Comment:
   Why do we need to change the type casting rules after the upgrade?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/HiveJdbcImplementor.java:
##########
@@ -128,6 +129,11 @@ public HiveJdbcImplementor(SqlDialect dialect, 
JavaTypeFactory typeFactory) {
     return result(join, leftResult, rightResult);
   }
 
+public Result visit(JdbcTableScan scan) {
+  return result(scan.jdbcTable.tableName(),
+      ImmutableList.of(Clause.FROM), scan, null);
+}
+

Review Comment:
   Even though `visit(JdbcTableScan scan)` was removed there is still 
`visit(TableScan scan)`. If we cannot use the latter the I guess there is a 
problem lying somewhere else either in Calcite or Hive.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Re: [PR] HIVE-27102: Upgrade Calcite to 1.33.0 and Avatica to 1.23.0 [hive]

Reply via email to