Thanks Davies for the explanation.
When I turn off the following options, I still see that Spark 1.5 is much slower
than 1.4.1. I am wondering how I can configure Spark 1.5 so that it has
performance similar to Spark 1.4 for this particular query.
--conf spark.sql.planner.sortMergeJoin=false
I ran a similar benchmark for 1.5: a self join on a fact table whose
join key has many duplicated rows (say there are N rows for the same
join key). After the join, there will be N*N rows for each join
key. Generating the joined rows is slower in 1.5 than in 1.4 (it needs to
copy left and right
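
To make the N*N blow-up concrete, here is a minimal sketch in plain Python. It is not Spark code and says nothing about how either Spark version builds rows internally; the function names and sample data are made up for illustration. It only shows how the number of joined rows grows when a join key is repeated N times.

```python
from collections import Counter


def self_join_row_count(join_keys):
    """Number of rows an inner self join produces: sum of n*n per key."""
    counts = Counter(join_keys)
    return sum(n * n for n in counts.values())


def self_join_rows(rows, key_index=0):
    """Naive hash-based self join that materialises every joined row."""
    by_key = {}
    for row in rows:
        by_key.setdefault(row[key_index], []).append(row)
    joined = []
    for group in by_key.values():
        for left in group:
            for right in group:
                # In this sketch each output row is a new tuple holding
                # the values of both the left and the right row.
                joined.append(left + right)
    return joined


# Hypothetical data: key "k1" appears 3 times, key "k2" once.
rows = [("k1", i) for i in range(3)] + [("k2", 0)]
joined = self_join_rows(rows)
assert len(joined) == self_join_row_count(r[0] for r in rows)
```

With a key repeated N times the inner loops run N*N times for that key, so the cost of building each output row is multiplied by the square of the duplication factor.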
Can you confirm whether the query really runs in cluster mode, not local
mode? Can you print the call stack of the executor while the query is running?
BTW: spark.shuffle.reduceLocality.enabled is a Spark configuration, not a
Spark SQL one.
From: Todd [mailto:bit1...@163.com]
Sent: Friday,
.@intel.com>, Todd <bit1...@163.com>, Michael
> Armbrust <mich...@databricks.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Date: 09/11/2015 10:41 AM
> Subject: Re: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+
> compared w
>
> From: "Cheng, Hao" <hao.ch...@intel.com>
> To: Todd <bit1...@163.com>
> Cc: Jesse F Chen/San Francisco/IBM@IBMUS, Michael Armbrust
> <mich...@databricks.com>, "user@spark.apache.org" <user@spark.apache.org>
> Date: 09/11/2015 01:00 AM
> Subject: RE: Re:Re:RE: Re:RE: spark 1.5