Re: spark 1.5 sort slow

2015-09-02 Thread Michael Armbrust
Can you include the output of `explain()` for each of the runs?
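
For anyone following along, here is a minimal sketch of one way to capture that output. It assumes a spark-shell session where `sqlContext` is a HiveContext; the object name `ExplainSort` and the helper `showPlan` are made up for illustration and are not part of Spark. The query is passed in as a string, so the statement from the original post can be used unchanged.

```scala
import org.apache.spark.sql.SQLContext

// Hypothetical helper, not part of Spark: prints the query plans so that
// the Spark 1.4 and Spark 1.5 runs can be compared side by side.
object ExplainSort {
  def showPlan(sqlContext: SQLContext, query: String): Unit = {
    // Build the DataFrame; this analyzes the query but does not execute it.
    val df = sqlContext.sql(query)

    // extended = true prints the logical plans as well as the physical plan.
    df.explain(true)

    // Also report the configured number of shuffle partitions
    // (falls back to "200", the Spark 1.x default, if it was never set).
    println("spark.sql.shuffle.partitions = " +
      sqlContext.getConf("spark.sql.shuffle.partitions", "200"))
  }
}
```

Calling `ExplainSort.showPlan(sqlContext, "<the query from the original post>")` under each Spark version and comparing the sort and exchange operators in the two physical plans should show whether the sort is planned differently.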

On Tue, Sep 1, 2015 at 1:06 AM, patcharee wrote:

> Hi,
>
> I found that sorting in Spark 1.5 is much slower than in Spark 1.4. Below
> is my code snippet:
>
> val sqlRDD = sql("select date, u, v, z from fino3_hr3 where zone == 2
> and z >= 2 and z <= order by date, z")
> println("sqlRDD " + sqlRDD.count())
>
> The fino3_hr3 table (used in the SQL command) is a Hive table in ORC
> format, partitioned by zone and z.
>
> Spark 1.5 takes 4.5 minutes to execute this query, while Spark 1.4 takes
> 1.5 minutes. I noticed that, unlike in Spark 1.4, when Spark 1.5 sorted,
> the data was shuffled into only a few tasks rather than being spread
> across all tasks. Do I need to set any configuration explicitly? Any
> suggestions?
>
> BR,
> Patcharee
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


spark 1.5 sort slow

2015-09-01 Thread patcharee

Hi,

I found that sorting in Spark 1.5 is much slower than in Spark 1.4. Below
is my code snippet:


val sqlRDD = sql("select date, u, v, z from fino3_hr3 where zone == 
2 and z >= 2 and z <= order by date, z")

println("sqlRDD " + sqlRDD.count())

The fino3_hr3 table (used in the SQL command) is a Hive table in ORC
format, partitioned by zone and z.

Spark 1.5 takes 4.5 minutes to execute this query, while Spark 1.4 takes
1.5 minutes. I noticed that, unlike in Spark 1.4, when Spark 1.5 sorted,
the data was shuffled into only a few tasks rather than being spread
across all tasks. Do I need to set any configuration explicitly? Any
suggestions?


BR,
Patcharee
