[GitHub] [hive] kgyrtkirk commented on a change in pull request #1439: HIVE-24084 cost aggr

GitBox Wed, 02 Sep 2020 06:59:03 -0700


kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r482090289




##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##########
@@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) {
     }
   }
 
+  /**
+   * Determines weather the give grouping is unique.
+   *
+   * Consider a join which might produce non-unique rows; but later the 
results are aggregated again.
+   * This method determines if there are sufficient columns in the grouping 
which have been present previously as unique column(s).
+   */
+  private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) {
+    if (groups.isEmpty()) {
+      return false;
+    }
+    RelMetadataQuery mq = input.getCluster().getMetadataQuery();
+    Set<ImmutableBitSet> uKeys = mq.getUniqueKeys(input);
+    for (ImmutableBitSet u : uKeys) {
+      if (groups.contains(u)) {
+        return true;
+      }
+    }
+    if (input instanceof Join) {
+      Join join = (Join) input;
+      RexBuilder rexBuilder = input.getCluster().getRexBuilder();
+      SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), 
rexBuilder);
+
+      if (cond.valid) {
+        ImmutableBitSet newGroup = 
groups.intersect(ImmutableBitSet.fromBitSet(cond.fields));
+        RelNode l = join.getLeft();
+        RelNode r = join.getRight();
+
+        int joinFieldCount = join.getRowType().getFieldCount();
+        int lFieldCount = l.getRowType().getFieldCount();
+
+        ImmutableBitSet groupL = newGroup.get(0, lFieldCount);
+        ImmutableBitSet groupR = newGroup.get(lFieldCount, 
joinFieldCount).shift(-lFieldCount);
+
+        if (isGroupingUnique(l, groupL)) {

Review comment:
       this method does a bit different thing - honestly I feeled like I'm in 
trouble when I've given this name to it :)
   
   this method checks if the given columns contain an unique column somewhere 
in the covered joins; (this still sound fuzzy) so let's take an example
   
   consider:
   ```
   select c_id, sum(i_prize) from customer c join item i on(i.c_id=c.c_id)
   ```
   
   * do an aggregate grouping by the column C_ID  ; and sum up something 
   * below is a join which joins by C_ID
   * asking wether C_ID  is a unique column on top of the join is false; but 
there is subtree in which C_ID is unique => so if we push the aggregate on that 
branch the aggregation will be a no-op
   
   I think this case is not handled by `areColumnsUnique`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hive] kgyrtkirk commented on a change in pull request #1439: HIVE-24084 cost aggr

Reply via email to