Re: Spark-SQL 1.2.0 sort by results are not consistent with Hive

2015-02-26 Thread Cheng Lian
Could you check the Spark web UI for the number of tasks issued when the query is executed? I dug out mapred.map.tasks because I saw 2 tasks were issued. On 2/26/15 3:01 AM, Kannan Rajah wrote: Cheng, We tried this setting and it still did not help. This was on Spark 1.2.0. -- Kannan
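A quick way to see the same number outside the web UI is to count the partitions of the query result directly; in Spark 1.2, sql() returns a SchemaRDD, which is an RDD of rows. The sketch below is a minimal, hedged example and assumes a hypothetical Hive table `sample_table` with a `key` column.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal sketch, assuming a hypothetical Hive table `sample_table`.
val sc = new SparkContext(new SparkConf().setAppName("SortByTaskCheck"))
val hiveContext = new HiveContext(sc)

// In Spark 1.2, sql() returns a SchemaRDD (an RDD[Row]), so its partition
// count corresponds to the number of tasks shown in the Spark web UI.
val sorted = hiveContext.sql("SELECT key FROM sample_table SORT BY key")
println(s"Partitions (tasks): ${sorted.partitions.length}")
```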

Re: Spark-SQL 1.2.0 sort by results are not consistent with Hive

2015-02-25 Thread Kannan Rajah
Cheng, We tried this setting and it still did not help. This was on Spark 1.2.0. -- Kannan On Mon, Feb 23, 2015 at 6:38 PM, Cheng Lian lian.cs@gmail.com wrote: (Move to user list.) Hi Kannan, You need to set mapred.map.tasks to 1 in hive-site.xml. The reason is this line of code

RE: Spark-SQL 1.2.0 sort by results are not consistent with Hive

2015-02-25 Thread Cheng, Hao
How many reducers did you set for Hive? With a small data set, Hive will run in local mode, which always sets the reducer count to 1. From: Kannan Rajah [mailto:kra...@maprtech.com] Sent: Thursday, February 26, 2015 3:02 AM To: Cheng Lian Cc: user@spark.apache.org Subject: Re: Spark-SQL 1.2.0 sort
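For context, SORT BY only guarantees ordering within each reducer (Hive) or partition (Spark SQL), so a single reducer, as in Hive local mode, happens to produce a totally ordered result. Below is a hedged sketch of inspecting the per-partition order on the Spark side, reusing the hiveContext and hypothetical `sample_table` from the earlier sketch and assuming `key` is an integer column.

```scala
// Each partition is sorted independently; the concatenation is only globally
// ordered when there is a single partition, which is what Hive's lone
// local-mode reducer gives you.
val perPartition = hiveContext
  .sql("SELECT key FROM sample_table SORT BY key")
  .mapPartitionsWithIndex { (idx, rows) =>
    Iterator(s"partition $idx: " + rows.map(_.getInt(0)).mkString(", "))
  }
perPartition.collect().foreach(println)
```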

Re: Spark-SQL 1.2.0 sort by results are not consistent with Hive

2015-02-23 Thread Cheng Lian
(Move to user list.) Hi Kannan, You need to set mapred.map.tasks to 1 in hive-site.xml. The reason is this line of code https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L68, which overrides spark.default.parallelism. Also,
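The thread's recommendation is to put the property in hive-site.xml; a hedged alternative for quick experimentation is to set it on the HiveContext before running the query, though whether that reaches the table-scan path in every case is an assumption, so hive-site.xml remains the authoritative route. This reuses the hiveContext and hypothetical `sample_table` from the earlier sketches.

```scala
// Hedged sketch: programmatic stand-in for the hive-site.xml suggestion.
// Whether this propagates to the Hive table scan is an assumption; setting
// mapred.map.tasks in hive-site.xml, as recommended above, is the safer route.
hiveContext.setConf("mapred.map.tasks", "1")

// With a single input split, the SORT BY output comes from one partition,
// which should match Hive's single-reducer ordering.
val check = hiveContext.sql("SELECT key FROM sample_table SORT BY key")
println(s"Partitions after the setting: ${check.partitions.length}")
```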