[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348293#comment-14348293
 ] 

Cheng Hao edited comment on SPARK-5791 at 3/5/15 7:08 AM:
----------------------------------------------------------

I think this is a typical case where we need to optimize joins against 
dimension tables, since much of the data is filtered out by the join 
condition.

In this case it's possible that most of the records in the fact table 'inv' 
are filtered out by the join condition:
{panel}
JOIN date_dim d ON inv.inv_date_sk = d.d_date_sk
    WHERE datediff(d_date, '2001-05-08') >= -30
    AND datediff(d_date, '2001-05-08') <= 30
{panel}
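
To make the idea concrete, here is a minimal sketch in plain Python (not Spark code, and not how Spark SQL actually plans this query). It assumes date_dim is small enough to hold in memory, applies the date predicate to it first, and then streams 'inv' once against the surviving keys. The `qty` column, the sample rows, and the helper names are hypothetical; `datediff` is written to mirror Hive's datediff(enddate, startdate).

```python
from datetime import date


def datediff(end, start):
    """Days from start to end, mirroring Hive's datediff(enddate, startdate)."""
    return (end - start).days


def filtered_dim_join(inv_rows, date_dim_rows, center=date(2001, 5, 8), window=30):
    """Filter the dimension table first, then hash-join the fact table against it."""
    # Step 1: apply the date predicate to the (small) dimension table and
    # index the surviving rows by their surrogate key.
    kept = {
        d["d_date_sk"]: d
        for d in date_dim_rows
        if -window <= datediff(d["d_date"], center) <= window
    }
    # Step 2: stream the fact table once; rows whose key did not survive
    # the filter are dropped without any further work.
    return [
        {**row, "d_date": kept[row["inv_date_sk"]]["d_date"]}
        for row in inv_rows
        if row["inv_date_sk"] in kept
    ]


# Hypothetical sample data; only keys 2 and 3 fall inside the +/-30 day window.
date_dim = [
    {"d_date_sk": 1, "d_date": date(2001, 1, 1)},
    {"d_date_sk": 2, "d_date": date(2001, 5, 1)},
    {"d_date_sk": 3, "d_date": date(2001, 6, 7)},
    {"d_date_sk": 4, "d_date": date(2001, 6, 8)},
]
inv = [
    {"inv_date_sk": 1, "qty": 10},
    {"inv_date_sk": 2, "qty": 20},
    {"inv_date_sk": 3, "qty": 30},
    {"inv_date_sk": 4, "qty": 40},
    {"inv_date_sk": 2, "qty": 50},
]

joined = filtered_dim_join(inv, date_dim)
```

The point of the sketch is the ordering: if the predicate on date_dim runs before the join, the fact table is only ever compared against a handful of keys, instead of being fully joined and filtered afterwards.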


> [Spark SQL] show poor performance when multiple table do join operation
> -----------------------------------------------------------------------
>
>                 Key: SPARK-5791
>                 URL: https://issues.apache.org/jira/browse/SPARK-5791
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Yi Zhou
>         Attachments: Physcial_Plan_Hive.txt, Physical_Plan.txt
>
>
> Spark SQL shows poor performance when multiple tables do join operation


