[ https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348293#comment-14348293 ]
Cheng Hao commented on SPARK-5791:
----------------------------------
I think this is a typical case where we need to optimize joins against
dimension tables, since much of their data is filtered out by the join
condition.
In this case, it's possible that most of the data is filtered out by the join
condition of
{panel}
JOIN date_dim d ON inv.inv_date_sk = d.d_date_sk
WHERE datediff(d_date, '2001-05-08') >= -30
AND datediff(d_date, '2001-05-08') <= 30
{/panel}
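To make the idea concrete, here is a minimal sketch in plain Python (not Spark code, and not how Spark SQL implements joins). It assumes the date predicate can be applied to the dimension table before the join, so that only a small map of surviving date_dim rows has to be probed by the fact rows. The helper name filtered_dimension_join, the sample data, and the inv_item_sk column are made up for illustration.

```python
from datetime import date, timedelta


def filtered_dimension_join(inventory, date_dim, center, window_days=30):
    """Filter the dimension rows first, then hash-join the fact rows against them."""
    lo = center - timedelta(days=window_days)
    hi = center + timedelta(days=window_days)

    # Step 1: apply the date predicate to the dimension table before joining.
    # Same rows as: datediff(d_date, center) >= -window_days
    #           AND datediff(d_date, center) <= window_days
    small_dim = {d_date_sk: d_date
                 for d_date_sk, d_date in date_dim
                 if lo <= d_date <= hi}

    # Step 2: probe the small in-memory map with each fact row. In a
    # distributed engine this is the role a broadcast hash join would play.
    return [(inv_item_sk, inv_date_sk, small_dim[inv_date_sk])
            for inv_item_sk, inv_date_sk in inventory
            if inv_date_sk in small_dim]


# Hypothetical dimension table: one (d_date_sk, d_date) row per day of 2001.
start = date(2001, 1, 1)
date_dim = [(i, start + timedelta(days=i)) for i in range(365)]

# Hypothetical fact table: (inv_item_sk, inv_date_sk), one snapshot per week.
inventory = [(item, day) for item in range(3) for day in range(0, 365, 7)]

joined = filtered_dimension_join(inventory, date_dim, date(2001, 5, 8))
print(len(joined), "of", len(inventory), "fact rows survive the join")
```

Only the dimension rows inside the 61-day window are kept in the map, so the join side that must be held in memory is far smaller than the full date_dim table.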
> [Spark SQL] show poor performance when multiple table do join operation
> -----------------------------------------------------------------------
>
> Key: SPARK-5791
> URL: https://issues.apache.org/jira/browse/SPARK-5791
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0
> Reporter: Yi Zhou
> Attachments: Physcial_Plan_Hive.txt, Physical_Plan.txt
>
>
> Spark SQL shows poor performance when multiple tables do join operation