tjbanghart commented on code in PR #4544:
URL: https://github.com/apache/calcite/pull/4544#discussion_r2445609682
##########
core/src/test/java/org/apache/calcite/test/RelBuilderTest.java:
##########
@@ -5585,6 +5590,284 @@ private static RelNode
buildCorrelateWithJoin(JoinRelType type, RelBuilder build
assertThat(root, hasTree(expected));
}
+
+ @Test void testCombine() {
+ // Create two separate queries
+ final RelBuilder builder = RelBuilder.create(config().build());
+ RelNode scan1 = builder.scan("EMP")
+ .filter(builder.equals(builder.field("DEPTNO"), builder.literal(10)))
+ .project(builder.field("ENAME"))
+ .build();
+ RelNode scan2 = builder.scan("EMP")
+ .filter(builder.equals(builder.field("DEPTNO"), builder.literal(20)))
+ .project(builder.field("SAL"))
+ .build();
+
+ // Test combine with varargs
+ RelNode combine1 = builder.combine(scan1, scan2).build();
+ assertThat(combine1, instanceOf(Combine.class));
+ assertThat(combine1.getInputs(), hasSize(2));
+
+ // Test combine with Iterable
+ RelNode combine2 = builder.combine(Arrays.asList(scan1, scan2)).build();
+ assertThat(combine2, instanceOf(Combine.class));
+ assertThat(combine2.getInputs(), hasSize(2));
+
+ // Test combine with stack
+ builder.push(scan1)
+ .push(scan2);
+ RelNode combine3 = builder.combine(2).build();
+ assertThat(combine3, instanceOf(Combine.class));
+ assertThat(((Combine) combine3).getInputs(), hasSize(2));
+
+ // Test combine with all stack
+ builder.clear();
+ builder.push(scan1)
+ .push(scan2);
+ RelNode combine4 = builder.combine().build();
+ assertThat(combine4, instanceOf(Combine.class));
+ assertThat(combine4.getInputs(), hasSize(2));
+ }
+
+ @Test void testCombineWithSharedSubexpression() {
+ // Test that demonstrates how Combine can optimize queries with shared
subexpressions
+ final RelBuilder builder = RelBuilder.create(config().build());
+
+ // Create a common subexpression - employees with salary > 1000
+ builder.scan("EMP")
+ .filter(
+ builder.call(SqlStdOperatorTable.GREATER_THAN,
+ builder.field("SAL"), builder.literal(1000)));
+ RelNode commonSubexpr = builder.build();
+
+ // Query 1: Count high earners by department
+ RelNode query1 = builder.push(commonSubexpr)
+ .aggregate(builder.groupKey("DEPTNO"), builder.count())
+ .build();
+
+ // Query 2: Get names of high earners
+ RelNode query2 = builder.push(commonSubexpr)
+ .project(builder.field("ENAME"), builder.field("SAL"))
+ .build();
+
+ // Combine both queries - planner can recognize shared subexpression
+ RelNode combined = builder.combine(query1, query2).build();
+
+ assertThat(combined, instanceOf(Combine.class));
+ assertThat(combined.getInputs(), hasSize(2));
+ // Both queries share the same filter as their base
Review Comment:
These are great questions, and I’ll do my best to answer them. I’ll note
that in this PR, my goal is simply to introduce the operator so that folks
(myself included) can start thinking about more interesting problems where
`Combine` could be made useful.
That said, I now see that this isn’t a particularly useful test case. Here
we’ve explicitly created a shared expression and then built two queries off of
it. In this situation, `Spool` would have been the better choice since we’ve
already identified the shared expression.
`Spool` will likely be a component of any well-optimized plan involving
`Combine`. In particular, `Combine` should help identify opportunities where
`Spool` can be leveraged across multiple queries. Even in this test case, a
good plan would detect the repeated scan -> filter and replace one of them with
a `Spool` reference. We might even consider making the scan -> filter the root
(i.e., pulling it above `Combine`) and then have both distinct queries act as
`Spool` dependents.
I agree that `Combine` makes more sense as a transparent operator, and I
think the natural follow-up to this PR is to introduce a way to wrap multiple
expressions in a `Combine`.
[CALCITE-6188](https://issues.apache.org/jira/browse/CALCITE-6188) proposes
some novel SQL syntax to enforce a `Combine`, but I believe there’s
lower-hanging fruit we could tackle first. For example, we could start with a
single query containing multiple CTEs, wrap them all in a `Combine`, and
ideally detect shared expressions. It would make sense to have a simple planner
rule that could look for trivial equality of sub-expressions at first. I am
very excited to see if `RelCommonExpressionSuggester` could help with this task
([CALCITE-7111](https://issues.apache.org/jira/browse/CALCITE-7111)).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]