[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-09-08 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734721#comment-14734721
 ] 

Yi Zhou commented on SPARK-5791:


[~yhuai], yes. Thank you!

> [Spark SQL] show poor performance when multiple table do join operation
> ---
>
> Key: SPARK-5791
> URL: https://issues.apache.org/jira/browse/SPARK-5791
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Yi Zhou
> Attachments: Physcial_Plan_Hive.txt, 
> Physcial_Plan_SparkSQL_Updated.txt, Physical_Plan.txt
>
>
> Spark SQL shows poor performance when multiple tables do join operation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-07-28 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644689#comment-14644689
 ] 

Yin Huai commented on SPARK-5791:
-

[~jameszhouyi] So the performance issue with the join operation in your test 
has been resolved?



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-04-13 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493350#comment-14493350
 ] 

Yi Zhou commented on SPARK-5791:


[~yhuai], yes, both used Parquet.



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-04-13 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492618#comment-14492618
 ] 

Yin Huai commented on SPARK-5791:
-

[~jameszhouyi] Thank you for the update :) Hive also used Parquet in your last 
run, right?



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-04-12 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491864#comment-14491864
 ] 

Yi Zhou commented on SPARK-5791:


We changed the file format from ORC to Parquet and got the following result:
Spark SQL (2m28s) vs. Hive (3m12s)
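For a quick comparison across the thread, the sketch below turns the reported runtimes into Spark/Hive ratios. The labels are descriptive only (the Spark version used for the Parquet run is not stated in the thread), and the "~" figures are approximate.

```python
# Runtimes reported in this thread, in seconds: (label, Spark SQL, Hive on M/R).
# "~" values in the comments are approximate; the labels are descriptive only.
RUNS = [
    ("initial report (ORC)", 60 * 60, 2 * 60),
    ("broadcast joins + Kryo (ORC)", 6 * 60, 2 * 60),
    ("after switching to Parquet", 2 * 60 + 28, 3 * 60 + 12),
]


def slowdown(spark_seconds: float, hive_seconds: float) -> float:
    """Spark SQL time divided by Hive time; a value below 1 means Spark was faster."""
    return spark_seconds / hive_seconds


for label, spark_s, hive_s in RUNS:
    print(f"{label:30s} Spark/Hive = {slowdown(spark_s, hive_s):.2f}")
```

The ratio goes from about 30 to about 3, and then to roughly 0.77 once both engines read Parquet.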



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-06 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350396#comment-14350396
 ] 

Yi Zhou commented on SPARK-5791:


The result of the 'name' subquery is about 3.7 MB in size.



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349139#comment-14349139
 ] 

Yin Huai commented on SPARK-5791:
-

[~jameszhouyi] Thank you for the updated physical plan. What file format is 
used for those tables, ORC or Parquet? Also, which version of Spark are you 
using? If Parquet is used, HiveTableScan is not as efficient as our native 
Parquet support (ParquetRelation2 in Spark SQL). Actually, if you are using 
Spark 1.3 and the data is stored as Parquet, you should not see HiveTableScan 
when reading Parquet data.
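One way to check the point above is to search the explain output for HiveTableScan. The Python sketch below assumes the Spark 1.x plan text layout shown in this thread (`HiveTableScan [...], (MetastoreRelation <db>, <table>, <alias>), None`). The helper name `hive_scanned_tables` is hypothetical, and the regular expression may need adjusting for other Spark versions.

```python
import re

# Matches the relation printed after a HiveTableScan in Spark 1.x plans, e.g.
# "HiveTableScan [...], (MetastoreRelation bigbenchorc, item, None), None"
_HIVE_SCAN = re.compile(r"HiveTableScan\b.*?\(MetastoreRelation (\w+), (\w+),")


def hive_scanned_tables(plan_text: str) -> list[tuple[str, str]]:
    """Return (database, table) pairs that the plan reads through HiveTableScan."""
    # Collapse all whitespace first, because mail clients often wrap long plan lines.
    flat = " ".join(plan_text.split())
    return _HIVE_SCAN.findall(flat)


plan = """HiveTableScan [d_date_sk#686L,d_date#688], (MetastoreRelation
bigbenchorc, date_dim, Some(d)), None"""
print(hive_scanned_tables(plan))  # [('bigbenchorc', 'date_dim')]
```

An empty list for a Parquet-backed query would be consistent with the native Parquet path being used.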



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349183#comment-14349183
 ] 

Yin Huai commented on SPARK-5791:
-

Also, how large is the result of the name subquery?



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-05 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349929#comment-14349929
 ] 

Yi Zhou commented on SPARK-5791:


[~yhuai] Currently, all of the input tables are in ORC format. We used Spark 1.2 
from CDH 5.3.0 when testing this query.



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-04 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348293#comment-14348293
 ] 

Cheng Hao commented on SPARK-5791:
--

I think this is a typical case where we need to optimize joins with the 
dimension tables, since much of their data is filtered out by the join 
condition.

In this case, most of the data is likely filtered out by the join condition of 
{panel}
JOIN date_dim d ON inv.inv_date_sk = d.d_date_sk
WHERE datediff(d_date, '2001-05-08') >= -30
AND datediff(d_date, '2001-05-08') <= 30
{/panel}
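To see how selective this condition is: Hive's datediff(enddate, startdate) returns the number of days from startdate to enddate, so the two predicates keep only d_date values within 30 days on either side of 2001-05-08. The Python sketch below mirrors that arithmetic over a hypothetical date_dim that covers only 2001; it illustrates the filter and is not Spark or Hive code.

```python
from datetime import date, timedelta

ANCHOR = date(2001, 5, 8)


def datediff(end: date, start: date) -> int:
    """Days from start to end, mirroring Hive's datediff(enddate, startdate)."""
    return (end - start).days


def in_window(d: date, lo: int = -30, hi: int = 30) -> bool:
    """True when lo <= datediff(d, ANCHOR) <= hi, like the WHERE clause above."""
    return lo <= datediff(d, ANCHOR) <= hi


# Hypothetical date_dim rows: every day of 2001.
year_2001 = [date(2001, 1, 1) + timedelta(days=i) for i in range(365)]
kept = [d for d in year_2001 if in_window(d)]
print(len(kept), kept[0], kept[-1])  # 61 2001-04-08 2001-06-07
```

Only 61 days survive, so in a date_dim spanning many years the filtered build side of this join is a very small fraction of the table.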



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-04 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348225#comment-14348225
 ] 

Yin Huai commented on SPARK-5791:
-

I see. In Hive's plan, item, warehouse, and date_dim are all broadcast tables. 
However, in Spark SQL's plan, the join between item and inventory was a shuffle 
join. Can you set spark.sql.autoBroadcastJoinThreshold to a value larger than 
the size of item? Also, what is the value of spark.serializer? 
Using org.apache.spark.serializer.KryoSerializer for spark.serializer will also 
help the performance (we will use Kryo to serialize broadcast tables). 
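The two settings suggested above can be supplied when launching the spark-sql CLI. The Python sketch below builds such a command line, assuming the standard `--conf key=value` and `-f file` options. `q22.sql` is a hypothetical file name, and 209715200 bytes (200 MB) is the threshold value Yi Zhou reports using in this thread, chosen to exceed the ~73.5 MB item table. Because spark.serializer is read when the SparkContext starts, passing it at launch (rather than with SET inside a running session) is the safer choice.

```python
import shlex

MB = 1024 * 1024

# Settings discussed in this thread. The threshold only has to exceed the
# size Spark estimates for the item table.
SETTINGS = {
    "spark.sql.autoBroadcastJoinThreshold": str(200 * MB),
    "spark.sql.shuffle.partitions": "200",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def spark_sql_command(settings: dict[str, str], query_file: str) -> list[str]:
    """Build an argv list for the spark-sql CLI with one --conf per setting."""
    argv = ["spark-sql"]
    for key, value in settings.items():
        argv += ["--conf", f"{key}={value}"]
    argv += ["-f", query_file]
    return argv


# "q22.sql" is a hypothetical file holding the query.
print(shlex.join(spark_sql_command(SETTINGS, "q22.sql")))
```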



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-04 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348324#comment-14348324
 ] 

Yi Zhou commented on SPARK-5791:


Thank you [~yhuai]. With the parameters below, performance improved greatly; 
the updated Spark SQL physical plan follows. However, in the latest test 
results the query is still slower than Hive on M/R (~6 min vs. ~2 min).
spark.sql.shuffle.partitions=200;
spark.sql.autoBroadcastJoinThreshold=209715200;
spark.serializer=org.apache.spark.serializer.KryoSerializer

== Physical Plan ==
InsertIntoHiveTable (MetastoreRelation bigbenchorc, q22_spark_run_query_0_result, None), Map(), false
 Sort [w_warehouse_name#674 ASC,i_item_id#651 ASC], false
  Exchange (HashPartitioning [w_warehouse_name#674,i_item_id#651], 200)
   Filter (((inv_before#635L > 0) && ((CAST(inv_after#636L, DoubleType) / CAST(inv_before#635L, DoubleType)) >= 0.)) && ((CAST(inv_after#636L, DoubleType) / CAST(inv_before#635L, DoubleType)) <= 1.5))
    Aggregate false, [w_warehouse_name#674,i_item_id#651], [w_warehouse_name#674,i_item_id#651,SUM(PartialSum#716L) AS inv_before#635L,SUM(PartialSum#717L) AS inv_after#636L]
     Exchange (HashPartitioning [w_warehouse_name#674,i_item_id#651], 200)
      Aggregate true, [w_warehouse_name#674,i_item_id#651], [w_warehouse_name#674,i_item_id#651,SUM(CAST(CASE WHEN (HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateDiff(d_date#688,2001-05-08) < 0) THEN inv_quantity_on_hand#649 ELSE 0, LongType)) AS PartialSum#716L,SUM(CAST(CASE WHEN (HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateDiff(d_date#688,2001-05-08) >= 0) THEN inv_quantity_on_hand#649 ELSE 0, LongType)) AS PartialSum#717L]
       Project [w_warehouse_name#674,i_item_id#651,d_date#688,inv_quantity_on_hand#649]
        BroadcastHashJoin [inv_date_sk#646L], [d_date_sk#686L], BuildRight
         Project [i_item_id#651,w_warehouse_name#674,inv_date_sk#646L,inv_quantity_on_hand#649]
          BroadcastHashJoin [inv_warehouse_sk#648L], [w_warehouse_sk#672L], BuildRight
           Project [inv_warehouse_sk#648L,i_item_id#651,inv_date_sk#646L,inv_quantity_on_hand#649]
            BroadcastHashJoin [inv_item_sk#647L], [i_item_sk#650L], BuildRight
             HiveTableScan [inv_date_sk#646L,inv_item_sk#647L,inv_warehouse_sk#648L,inv_quantity_on_hand#649], (MetastoreRelation bigbenchorc, inventory, Some(inv)), None
             Project [i_item_id#651,i_item_sk#650L]
              Filter ((i_current_price#655 > 0.98) && (i_current_price#655 < 1.5))
               HiveTableScan [i_item_id#651,i_item_sk#650L,i_current_price#655], (MetastoreRelation bigbenchorc, item, None), None
           HiveTableScan [w_warehouse_name#674,w_warehouse_sk#672L], (MetastoreRelation bigbenchorc, warehouse, Some(w)), None
         Filter ((HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateDiff(d_date#688,2001-05-08) >= -30) && (HiveGenericUdf#org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateDiff(d_date#688,2001-05-08) <= 30))
          HiveTableScan [d_date_sk#686L,d_date#688], (MetastoreRelation bigbenchorc, date_dim, Some(d)), None
Time taken: 2.579 seconds
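The comparison in this thread hinges on which join strategy each plan chose. The hypothetical helper below tallies join operators and their build side from a Spark 1.x plan dump. It assumes, as in the plan above, that each join operator sits on one unwrapped line and that its name ends in "Join"; operators named differently are not counted.

```python
import re
from collections import Counter

# A join operator line in Spark 1.x explain output, e.g.
# "BroadcastHashJoin [inv_date_sk#646L], [d_date_sk#686L], BuildRight"
_JOIN_LINE = re.compile(r"^\s*(\w*Join)\b(.*)$")


def join_operators(plan_text: str) -> Counter:
    """Count (operator, build side) pairs for every line that starts with a *Join operator."""
    counts: Counter = Counter()
    for line in plan_text.splitlines():
        match = _JOIN_LINE.match(line)
        if match is None:
            continue
        name, rest = match.groups()
        # The build side is printed at the end of hash-join lines; None if absent.
        side = next((s for s in ("BuildLeft", "BuildRight") if s in rest), None)
        counts[(name, side)] += 1
    return counts


plan = """
        BroadcastHashJoin [inv_date_sk#646L], [d_date_sk#686L], BuildRight
         Project [i_item_id#651,w_warehouse_name#674]
          BroadcastHashJoin [inv_warehouse_sk#648L], [w_warehouse_sk#672L], BuildRight
"""
print(join_operators(plan))  # Counter({('BroadcastHashJoin', 'BuildRight'): 2})
```

Running the same helper over two plans makes it easy to spot a join that stayed a shuffle join in one engine but was broadcast in the other.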




[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-04 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348216#comment-14348216
 ] 

Yi Zhou commented on SPARK-5791:


Hi [~yhuai], I attached the physical plan for Hive. Please take a look.



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-03-02 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344486#comment-14344486
 ] 

Yin Huai commented on SPARK-5791:
-

[~jameszhouyi] Can you also add the plan generated by Hive?



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-02-27 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341095#comment-14341095
 ] 

Yi Zhou commented on SPARK-5791:


Table sizes:
~4.9 GB 'inventory' table
~73.5 MB 'item' table
~3.1 KB 'warehouse' table
~1.7MB 'date_dim' table
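Comparing these sizes with spark.sql.autoBroadcastJoinThreshold helps explain why item was joined with a shuffle rather than broadcast: the Spark 1.x default threshold is 10 MB (10485760 bytes), below item's ~73.5 MB. The rough Python sketch below assumes binary units and that the planner's size estimates match these on-disk figures; in practice Spark uses its own estimate (for Hive tables, taken from metastore statistics).

```python
MB = 1024 * 1024

# Approximate sizes reported in this thread, converted to bytes.
TABLE_SIZES = {
    "inventory": 4.9 * 1024 * MB,
    "item": 73.5 * MB,
    "warehouse": 3.1 * 1024,
    "date_dim": 1.7 * MB,
}

# Spark 1.x default for spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_THRESHOLD = 10 * MB


def broadcastable(sizes: dict[str, float], threshold: int) -> list[str]:
    """Sorted names of tables whose size is at or below the broadcast threshold."""
    return sorted(name for name, size in sizes.items() if size <= threshold)


print(broadcastable(TABLE_SIZES, DEFAULT_THRESHOLD))  # ['date_dim', 'warehouse']
print(broadcastable(TABLE_SIZES, 200 * MB))  # ['date_dim', 'item', 'warehouse']
```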



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-02-15 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321777#comment-14321777
 ] 

Yi Zhou commented on SPARK-5791:


With the same input data size, the query takes about 2 min on Hive on M/R (with 
optimization parameters) but about 1 hour on Spark SQL.



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-02-12 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319400#comment-14319400
 ] 

Cheng Hao commented on SPARK-5791:
--

Can you also attach the performance comparison result for this query?



[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-02-12 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319364#comment-14319364
 ] 

Yi Zhou commented on SPARK-5791:


For example:
SELECT *
FROM inventory inv
JOIN (
  SELECT
    i_item_id,
    i_item_sk
  FROM item
  WHERE i_current_price > 0.98
  AND i_current_price < 1.5
) items
ON inv.inv_item_sk = items.i_item_sk
JOIN warehouse w ON inv.inv_warehouse_sk = w.w_warehouse_sk
JOIN date_dim d ON inv.inv_date_sk = d.d_date_sk
WHERE datediff(d_date, '2001-05-08') >= -30
AND datediff(d_date, '2001-05-08') <= 30;
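The plan Spark SQL eventually chose for this query (BroadcastHashJoin ... BuildRight for every join) streams inventory and probes a hash table built from each small, pre-filtered dimension table. The in-memory Python sketch below illustrates that idea on hypothetical toy rows. It projects only a few columns instead of SELECT *, and it does not model how Spark ships the build tables to executors.

```python
from datetime import date

# Hypothetical toy rows; column names follow the query above.
item = [  # (i_item_sk, i_item_id, i_current_price)
    (1, "AAA", 0.50),
    (2, "BBB", 1.20),
    (3, "CCC", 1.49),
]
warehouse = [(10, "Main"), (20, "North")]  # (w_warehouse_sk, w_warehouse_name)
date_dim = [  # (d_date_sk, d_date)
    (100, date(2001, 4, 1)),
    (101, date(2001, 5, 1)),
    (102, date(2001, 6, 1)),
]
inventory = [  # (inv_date_sk, inv_item_sk, inv_warehouse_sk, inv_quantity_on_hand)
    (100, 2, 10, 5),
    (101, 2, 10, 7),
    (102, 3, 20, 4),
    (101, 1, 10, 9),
]
ANCHOR = date(2001, 5, 8)

# Build side: filter each small table, then index it by its join key.
items_by_sk = {sk: item_id for sk, item_id, price in item if 0.98 < price < 1.5}
warehouses_by_sk = dict(warehouse)
dates_by_sk = {sk: d for sk, d in date_dim if -30 <= (d - ANCHOR).days <= 30}


def probe(rows):
    """Stream the large table and probe each hash table, in the plan's join order."""
    for date_sk, item_sk, warehouse_sk, qty in rows:
        if item_sk in items_by_sk and warehouse_sk in warehouses_by_sk and date_sk in dates_by_sk:
            yield (warehouses_by_sk[warehouse_sk], items_by_sk[item_sk], dates_by_sk[date_sk], qty)


result = list(probe(inventory))
for row in result:
    print(row)
```

Filtering the build sides before indexing them is what keeps the hash tables small enough to broadcast; the large inventory table is only ever scanned once.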

 [Spark SQL] show poor performance when multiple table do join operation
 ---

 Key: SPARK-5791
 URL: https://issues.apache.org/jira/browse/SPARK-5791
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: Yi Zhou

 Spark SQL shows poor performance when multiple tables do join operation 
 compared with  Hive on MapReduce.


