[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

Yin Huai (JIRA) Thu, 05 Mar 2015 09:41:06 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349139#comment-14349139
 ]


Yin Huai commented on SPARK-5791:
---------------------------------

[~jameszhouyi] Thank you for the updated physical plan. What file format do 
those tables use, ORC or Parquet? Also, which version of Spark are you running? 
If the tables are stored as Parquet, HiveTableScan is not as efficient as our 
native Parquet support (ParquetRelation2 in Spark SQL). In fact, if you are 
using Spark 1.3 and the data is stored as Parquet, you should not see 
HiveTableScan when reading Parquet data.
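
One quick way to check which scan operator a plan uses is to count the operator names in a saved physical plan dump, such as the attached text files. The following is only a minimal sketch: the helper name count_scan_operators and the sample plan fragment are hypothetical, and the fragment is a placeholder for illustration, not real Spark output.

```python
import re

# Operator names taken from the comment above; which ones actually
# appear in a plan depends on the Spark version and table format.
SCAN_OPERATORS = ("HiveTableScan", "ParquetRelation2")


def count_scan_operators(plan_text):
    """Count whole-word occurrences of each scan operator name in a plan dump."""
    return {
        name: len(re.findall(r"\b" + re.escape(name) + r"\b", plan_text))
        for name in SCAN_OPERATORS
    }


# Hypothetical plan fragment, for illustration only (not real Spark output).
sample_plan = """\
Project [a]
 HiveTableScan [a], <table_x>
 HiveTableScan [b], <table_y>
"""

counts = count_scan_operators(sample_plan)
print(counts)
```

If HiveTableScan shows a non-zero count for tables stored as Parquet, the query is probably not going through the native Parquet path described above.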

> [Spark SQL] show poor performance when multiple table do join operation
> -----------------------------------------------------------------------
>
>                 Key: SPARK-5791
>                 URL: https://issues.apache.org/jira/browse/SPARK-5791
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Yi Zhou
>         Attachments: Physcial_Plan_Hive.txt, 
> Physcial_Plan_SparkSQL_Updated.txt, Physical_Plan.txt
>
>
> Spark SQL shows poor performance when multiple tables are joined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

