[ https://issues.apache.org/jira/browse/CALCITE-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706444#comment-14706444 ]
Jesus Camacho Rodriguez commented on CALCITE-850:
-------------------------------------------------
Thanks for the feedback [~jni]. Please let me know when you finish running
those additional tests, so I can push the changes to master; I'd like this to
be included in 1.4.

Concerning your comments about the regressions: I agree that pushing
expressions on both sides of a join shouldn't create performance regressions in
most cases. I also agree that ultimately it should be a cost-based decision.
Hive is a special case; I'll try to keep it short.

Currently Hive rewrites the plan returned by Calcite into a HiveQL query, which
is in turn parsed again, physically optimized, and executed by the Hive engine.
Because we do not do, e.g., algorithm selection in Calcite and reflect it
directly by translating the Calcite operators into Hive operators, even minimal
plan changes can have a huge impact on our logic. The reason is that some
physical optimizations might not kick in if they don't recognize a given plan
pattern (e.g. transformation of reduce-side joins into map joins).

We have been working on closing the gap between Calcite and Hive through
operator-to-operator translation (the umbrella JIRA is HIVE-9132), which would
definitively solve this kind of issue.
> Remove push down expressions from FilterJoinRule and create a new rule for it
> -----------------------------------------------------------------------------
>
> Key: CALCITE-850
> URL: https://issues.apache.org/jira/browse/CALCITE-850
> Project: Calcite
> Issue Type: Bug
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
>
> CALCITE-457 added pushing expressions in join conditions into projects below
> the join in the FilterJoinRule, so the expression would be computed
> beforehand and not in the join predicate.
> While this can be an interesting feature for some projects using Calcite, it
> is separate functionality and should be a standalone rule.
> For instance, in Hive we do not want to enable it at the moment, as it causes
> some performance regressions in many test cases.