[
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728893#comment-17728893
]
Rong Rong commented on CALCITE-5740:
------------------------------------
Hmm. IIUC, PROJECT_TO_SEMI_JOIN doesn't actually touch the PROJECT node
above.
The difference between PROJECT_TO_SEMI_JOIN and JOIN_TO_SEMI_JOIN is basically
only whether the PROJECT above the JOIN references any fields from
the right-side aggregate:
- If the PROJECT doesn't exist, the rule uses the `isEmptyAggregate(aggregate)`
check to determine whether to convert the JOIN to a SEMI_JOIN. See:
https://github.com/walterddr/calcite/blob/3817b0e42c07a6b185f3c1b921f648ff28e8a3b7/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L84-L87
- If the PROJECT exists, the rule checks whether the project's field bits
intersect with the right aggregate's bits. See:
https://github.com/walterddr/calcite/blob/3817b0e42c07a6b185f3c1b921f648ff28e8a3b7/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L80-L82
- After the rule has been applied to convert the join to a semi-join, the
project seems to be put back?
https://github.com/walterddr/calcite/blob/3817b0e42c07a6b185f3c1b921f648ff28e8a3b7/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L126-L128C1
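A minimal sketch of the bit-intersection check described in the second bullet,
assuming plain `java.util.BitSet` as a stand-in for Calcite's bit sets. The
class and method names (`SemiJoinCheckSketch`, `rightBits`, `bitsOf`,
`parentIgnoresRight`) are hypothetical and are not taken from `SemiJoinRule`:

```java
import java.util.BitSet;

// Simplified stand-in, NOT Calcite code: models only the "does the node above
// the join read any right-side column" test.
final class SemiJoinCheckSketch {
  // Join output columns [leftFieldCount, joinFieldCount) come from the right input.
  static BitSet rightBits(int leftFieldCount, int joinFieldCount) {
    BitSet bits = new BitSet();
    bits.set(leftFieldCount, joinFieldCount);
    return bits;
  }

  // Builds the set of join output columns that the parent node reads.
  static BitSet bitsOf(int... fields) {
    BitSet bits = new BitSet();
    for (int f : fields) {
      bits.set(f);
    }
    return bits;
  }

  // True when the parent reads no right-side column, i.e. this particular
  // precondition for converting the join to a semi-join holds.
  static boolean parentIgnoresRight(BitSet usedByParent,
      int leftFieldCount, int joinFieldCount) {
    return !usedByParent.intersects(rightBits(leftFieldCount, joinFieldCount));
  }
}
```

For a join whose output has columns 0 and 1 from the left and column 2 from
the right aggregate, `parentIgnoresRight(bitsOf(0, 1), 2, 3)` holds, while
`parentIgnoresRight(bitsOf(0, 2), 2, 3)` does not.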
With this logic I don't think PROJECT is all that special. It is simply used to
verify that the logic above the JOIN doesn't touch the RHS, and thus we can
safely convert the JOIN to a SEMI-JOIN. Was my understanding incorrect here?
If what I understood is correct, I think adding AGG is probably not that
special either. Technically it can be any RelNode on top of the JOIN, as long
as there's a way to extract the bit references. I could even make this more
generic; I just can't think of a reason to do so.
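To illustrate the "any RelNode with extractable bit references" idea, here is
a rough sketch under stated assumptions. `ProjectNode`, `AggregateNode`, and
`touchesRight` are hypothetical stand-ins, not Calcite classes or the code in
the experimental PR. The sketch models only the column-reference check; the
other preconditions of the rule are out of scope:

```java
import java.util.BitSet;
import java.util.List;
import java.util.function.Function;

// Hypothetical model: each parent type supplies its own "which join output
// columns do I read" extractor, and one shared check handles the rest.
final class GenericSemiJoinSketch {
  // Stand-in for a Project: the input columns read by each expression.
  static final class ProjectNode {
    final List<int[]> exprInputs;
    ProjectNode(List<int[]> exprInputs) { this.exprInputs = exprInputs; }
  }

  // Stand-in for an Aggregate: group keys plus arguments of each aggregate call.
  static final class AggregateNode {
    final int[] groupKeys;
    final List<int[]> aggCallArgs;
    AggregateNode(int[] groupKeys, List<int[]> aggCallArgs) {
      this.groupKeys = groupKeys;
      this.aggCallArgs = aggCallArgs;
    }
  }

  static BitSet projectBits(ProjectNode p) {
    BitSet bits = new BitSet();
    for (int[] inputs : p.exprInputs) {
      for (int i : inputs) bits.set(i);
    }
    return bits;
  }

  static BitSet aggregateBits(AggregateNode a) {
    BitSet bits = new BitSet();
    for (int k : a.groupKeys) bits.set(k);
    for (int[] args : a.aggCallArgs) {
      for (int i : args) bits.set(i);
    }
    return bits;
  }

  // Shared check: does the parent read any column in [leftFieldCount, joinFieldCount)?
  static <T> boolean touchesRight(T parent, Function<T, BitSet> extractor,
      int leftFieldCount, int joinFieldCount) {
    BitSet right = new BitSet();
    right.set(leftFieldCount, joinFieldCount);
    return extractor.apply(parent).intersects(right);
  }
}
```

With this shape, supporting another parent type would only require another
extractor function; whether that generality is worth having is the open
question above.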
> Support for AggToSemiJoinRule
> -----------------------------
>
> Key: CALCITE-5740
> URL: https://issues.apache.org/jira/browse/CALCITE-5740
> Project: Calcite
> Issue Type: New Feature
> Reporter: Rong Rong
> Priority: Major
>
> **Description**
> Currently we only have the JoinToSemiJoin and ProjectToSemiJoin rules, where
> the rule itself performs a check to see whether the project accesses columns
> from the RHS result.
> This can be extended to Aggregate as well. Experimental code:
> https://github.com/walterddr/calcite/pull/1/files
> **Alternative**
> The alternative is to add a project/calc between the join and the aggregate
> to activate the project-to-semi-join rule. Please share if there's any other
> alternative that I haven't considered.
> Thanks
--
This message was sent by Atlassian Jira
(v8.20.10#820010)