[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

Cheng Hao (JIRA) Wed, 04 Mar 2015 23:08:00 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348293#comment-14348293
 ]


Cheng Hao commented on SPARK-5791:
----------------------------------

I think this is a typical case where we need to optimize the join with the
dimension tables, since most of their rows are filtered out by the join
condition.
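A minimal sketch of the general idea in plain Python. This is not Spark's planner code. The dimension-table predicate is applied before the hash table is built, so the join only hashes and probes the surviving dimension keys. This is only equivalent to filtering after the join because the join is an inner join and the predicate references only dimension columns. The helper name `filtered_hash_join`, the dict-based rows and the sample data are illustrative assumptions; only the column names come from the query.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def filtered_hash_join(
    fact_rows: Iterable[Row],
    dim_rows: Iterable[Row],
    fact_key: str,
    dim_key: str,
    dim_predicate: Callable[[Row], bool],
) -> List[Row]:
    """Inner hash join that filters the dimension side before hashing it.

    Hypothetical helper: it mimics a hash join whose build side is the
    already-filtered dimension table.
    """
    # Build phase: only dimension rows that pass the predicate are hashed.
    build: Dict[object, List[Row]] = {}
    for d in dim_rows:
        if dim_predicate(d):
            build.setdefault(d[dim_key], []).append(d)

    # Probe phase: fact rows whose key has no surviving dimension row are
    # dropped immediately instead of being carried through the join.
    out: List[Row] = []
    for f in fact_rows:
        for d in build.get(f[fact_key], ()):
            out.append({**f, **d})
    return out


# Illustrative data: 10 date keys, of which only the even ones pass the filter.
dates = [{"d_date_sk": k, "keep": k % 2 == 0} for k in range(10)]
inventory = [{"inv_date_sk": k % 10, "qty": k} for k in range(100)]
rows = filtered_hash_join(
    inventory, dates, "inv_date_sk", "d_date_sk", lambda d: d["keep"]
)
print(len(rows))  # 50
```

The smaller the filtered build side, the cheaper the join. A dimension table that shrinks to a handful of rows could even be broadcast to every task instead of shuffled.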

In this case, most of the data is probably filtered out by the join condition of
{panel}
JOIN date_dim d ON inv.inv_date_sk = d.d_date_sk
    WHERE datediff(d_date, '2001-05-08') >= -30
    AND datediff(d_date, '2001-05-08') <= 30
{/panel}
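Hive's `datediff(enddate, startdate)` returns the number of days from `startdate` to `enddate`. The two predicates above therefore amount to a constant range on `d_date`: 2001-04-08 through 2001-06-07 inclusive. If `date_dim` holds one row per calendar day (as in TPC-DS, which this schema appears to follow; that is an assumption), at most 61 rows survive the filter. The sketch below computes that window in plain Python. `date_window` is a hypothetical helper for illustration, not a Hive or Spark API.

```python
from datetime import date, timedelta
from typing import Tuple


def date_window(anchor: str, lower: int, upper: int) -> Tuple[date, date]:
    """Turn `lower <= datediff(d_date, anchor) <= upper` into a date range.

    Hypothetical helper: datediff(end, start) is taken to mean (end - start)
    in whole days, matching Hive's documented semantics.
    """
    a = date.fromisoformat(anchor)
    return a + timedelta(days=lower), a + timedelta(days=upper)


start, end = date_window("2001-05-08", -30, 30)
print(start, end)  # 2001-04-08 2001-06-07

# Assuming one date_dim row per calendar day, this many rows survive.
surviving_days = (end - start).days + 1
print(surviving_days)  # 61
```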

> [Spark SQL] show poor performance when multiple table do join operation
> -----------------------------------------------------------------------
>
>                 Key: SPARK-5791
>                 URL: https://issues.apache.org/jira/browse/SPARK-5791
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Yi Zhou
>         Attachments: Physcial_Plan_Hive.txt, Physical_Plan.txt
>
>
> Spark SQL shows poor performance when multiple tables do join operation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

