kgyrtkirk commented on a change in pull request #1439:
URL: https://github.com/apache/hive/pull/1439#discussion_r482090289
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveAggregateJoinTransposeRule.java
##########
@@ -303,6 +305,90 @@ public void onMatch(RelOptRuleCall call) {
}
}
+ /**
+ * Determines whether the given grouping is unique.
+ *
+ * Consider a join which might produce non-unique rows; but later the results are aggregated again.
+ * This method determines if there are sufficient columns in the grouping which have been present previously as unique column(s).
+ */
+ private boolean isGroupingUnique(RelNode input, ImmutableBitSet groups) {
+ if (groups.isEmpty()) {
+ return false;
+ }
+ RelMetadataQuery mq = input.getCluster().getMetadataQuery();
+ Set<ImmutableBitSet> uKeys = mq.getUniqueKeys(input);
+ for (ImmutableBitSet u : uKeys) {
+ if (groups.contains(u)) {
+ return true;
+ }
+ }
+ if (input instanceof Join) {
+ Join join = (Join) input;
+ RexBuilder rexBuilder = input.getCluster().getRexBuilder();
+ SimpleConditionInfo cond = new SimpleConditionInfo(join.getCondition(), rexBuilder);
+
+ if (cond.valid) {
+ ImmutableBitSet newGroup = groups.intersect(ImmutableBitSet.fromBitSet(cond.fields));
+ RelNode l = join.getLeft();
+ RelNode r = join.getRight();
+
+ int joinFieldCount = join.getRowType().getFieldCount();
+ int lFieldCount = l.getRowType().getFieldCount();
+
+ ImmutableBitSet groupL = newGroup.get(0, lFieldCount);
+ ImmutableBitSet groupR = newGroup.get(lFieldCount, joinFieldCount).shift(-lFieldCount);
+
+ if (isGroupingUnique(l, groupL)) {
Review comment:
This method does something a bit different. Honestly, I felt I was in
trouble when I gave it this name :)
It checks whether the given columns contain a unique column somewhere
in the covered joins. That still sounds fuzzy, so let's take an example.
Consider:
```
select c.c_id, sum(i.i_prize) from customer c join item i on (i.c_id = c.c_id) group by c.c_id
```
* the aggregate groups by the column C_ID and sums up something
* below it is a join which joins on C_ID
* asking whether C_ID is a unique column on top of the join returns false; but
there is a subtree in which C_ID is unique => so if we push the aggregate onto that
branch, the aggregation will be a no-op

I think this case is not handled by `areColumnsUnique`.
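
To make the example concrete, here is a minimal sketch of the idea in plain Java. It does not use Calcite: `Node`, `Table`, `Join` and `exampleJoin` are hypothetical stand-ins invented for illustration, and column sets are plain `Set<Integer>` rather than `ImmutableBitSet`. It also assumes that uniqueness on either join branch is sufficient, which is a guess because the hunk above is cut off before that point.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified model, NOT the Calcite/Hive API: every type here is a hypothetical stand-in.
final class GroupingUniqueSketch {

  /** Stand-in for a RelNode: exposes a field count and its known unique keys. */
  interface Node {
    int fieldCount();
    List<Set<Integer>> uniqueKeys();
  }

  /** A base table with declared unique keys (sets of column indexes). */
  static final class Table implements Node {
    private final int fieldCount;
    private final List<Set<Integer>> uniqueKeys;

    Table(int fieldCount, List<Set<Integer>> uniqueKeys) {
      this.fieldCount = fieldCount;
      this.uniqueKeys = uniqueKeys;
    }

    public int fieldCount() { return fieldCount; }
    public List<Set<Integer>> uniqueKeys() { return uniqueKeys; }
  }

  /** An equi-join; condFields are the join-output column indexes used in the condition. */
  static final class Join implements Node {
    final Node left;
    final Node right;
    final Set<Integer> condFields;

    Join(Node left, Node right, Set<Integer> condFields) {
      this.left = left;
      this.right = right;
      this.condFields = condFields;
    }

    public int fieldCount() { return left.fieldCount() + right.fieldCount(); }
    // Simplification: this model derives no unique keys for the join output itself.
    public List<Set<Integer>> uniqueKeys() { return List.of(); }
  }

  /** True if the grouping covers a unique key here or in some joined subtree. */
  static boolean isGroupingUnique(Node input, Set<Integer> groups) {
    if (groups.isEmpty()) {
      return false;
    }
    for (Set<Integer> key : input.uniqueKeys()) {
      if (groups.containsAll(key)) {
        return true;
      }
    }
    if (input instanceof Join) {
      Join join = (Join) input;
      int lFieldCount = join.left.fieldCount();
      Set<Integer> groupL = new TreeSet<>();
      Set<Integer> groupR = new TreeSet<>();
      for (int g : groups) {
        if (!join.condFields.contains(g)) {
          continue; // only grouping columns that are also join columns are pushed down
        }
        if (g < lFieldCount) {
          groupL.add(g);
        } else {
          groupR.add(g - lFieldCount); // shift into the right input's numbering
        }
      }
      // Assumption: a unique grouping on either branch is enough.
      return isGroupingUnique(join.left, groupL) || isGroupingUnique(join.right, groupR);
    }
    return false;
  }

  /** customer(c_id, name) join item(i_id, c_id, i_prize) on c.c_id = i.c_id. */
  static Join exampleJoin() {
    Node customer = new Table(2, List.of(Set.of(0))); // c_id is unique
    Node item = new Table(3, List.of(Set.of(0)));     // i_id is unique, c_id is not
    return new Join(customer, item, Set.of(0, 3));    // columns 0 and 3 of the join output
  }

  public static void main(String[] args) {
    System.out.println(isGroupingUnique(exampleJoin(), Set.of(0))); // prints "true"
    System.out.println(isGroupingUnique(exampleJoin(), Set.of(3))); // prints "false"
  }
}
```

In this model the join output itself reports no unique keys, so a check on top of the join fails, yet grouping by the customer-side C_ID (column 0) is found to be unique in the left subtree. Grouping by the item-side C_ID (column 3) is not, because C_ID is not a key of `item`.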