Can you include the output of `explain()` for each of the runs?
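
In case it helps, here is a minimal sketch of one way to capture the plans. It assumes a `SQLContext` (or `HiveContext`) is available, as in spark-shell. The helper name `explainAndCount` is hypothetical and not part of Spark; pass it the full query text from your message.

```scala
import org.apache.spark.sql.SQLContext

// Hypothetical helper (not a Spark API): prints the query plans, then runs the query.
def explainAndCount(sqlContext: SQLContext, query: String): Long = {
  val df = sqlContext.sql(query)
  // extended = true prints the logical plans as well as the physical plan
  df.explain(true)
  // count() triggers the actual execution, as in the original snippet
  df.count()
}
```

Running this once on 1.4 and once on 1.5 and pasting both printed plans should make the difference in the sort/shuffle step easier to spot.
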
On Tue, Sep 1, 2015 at 1:06 AM, patcharee wrote:
> Hi,
>
> I found that sorting in Spark 1.5 is much slower than in Spark 1.4. Below
> is my code snippet:
>
> val sqlRDD = sql("select date, u, v, z from fino3_hr3 where zone == 2
> and z >= 2 and z <= order by date, z")
> println("sqlRDD " + sqlRDD.count())
>
> The fino3_hr3 table (in the SQL command) is a Hive table in ORC format,
> partitioned by zone and z.
>
> Spark 1.5 takes 4.5 minutes to execute this query, while Spark 1.4 takes
> 1.5 minutes. I noticed that, unlike Spark 1.4, Spark 1.5 shuffles the data
> into only a few tasks during the sort instead of spreading it across all
> tasks. Do I need to set any configuration explicitly? Any suggestions?
>
> BR,
> Patcharee