Re: Hive Query on Spark fails with OOM

2016-03-15 Thread Sabarish Sasidharan
Yes, I suggested increasing shuffle partitions to address this problem. The other suggestion, to increase the shuffle memory fraction, was not for this, but it makes sense given that you are reserving all that memory and doing nothing with it. By diverting more of it to shuffles you can help improve your shuffle
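
A minimal sketch of that rebalancing for Spark 1.2.x, where storage and shuffle memory were separate static pools; the fractions below are illustrative starting points, not tuned values:

from pyspark import SparkConf

conf = (SparkConf()
        .setAppName("Hive_Join")
        # Shrink the RDD cache pool (default 0.6): this job caches nothing.
        .set("spark.storage.memoryFraction", "0.3")
        # Grow the shuffle pool (default 0.2) so aggregations spill less.
        .set("spark.shuffle.memoryFraction", "0.5"))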

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Michael Armbrust
On Mon, Mar 14, 2016 at 1:30 PM, Prabhu Joseph wrote: > > Thanks for the recommendation. But can you share what improvements were > made after Spark 1.2.1 and which of them specifically handle the issue > observed here. > Memory used for query execution is

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Prabhu Joseph
Michael, Thanks for the recommendation. But can you share what improvements were made after Spark 1.2.1 and which of them specifically handle the issue observed here. On Tue, Mar 15, 2016 at 12:03 AM, Jörn Franke wrote: > I am not sure about this. At least

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Jörn Franke
I am not sure about this. At least Hortonworks provides its distribution with Hive and Spark 1.6 > On 14 Mar 2016, at 09:25, Mich Talebzadeh wrote: > > I think the only version of Spark that works OK with Hive (Hive on Spark > engine) is version 1.3.1. I also get

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Michael Armbrust
+1 to upgrading Spark. 1.2.1 has none of the memory management improvements that were added in 1.4-1.6. On Mon, Mar 14, 2016 at 2:03 AM, Prabhu Joseph wrote: > The issue is the query hits OOM on a stage when reading shuffle output > from the previous stage. How come
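
The improvements referenced here include Project Tungsten's binary processing (1.4-1.5) and, in 1.6, a unified memory manager in which execution and storage share one pool instead of the old static split. A sketch of the 1.6-era knobs (property names per the Spark 1.6 configuration docs; the values shown are the defaults, for orientation only):

from pyspark import SparkConf

conf = (SparkConf()
        .setAppName("Hive_Join")
        # Spark 1.6+: single pool shared by execution and storage,
        # replacing spark.storage/shuffle.memoryFraction.
        .set("spark.memory.fraction", "0.75")
        # Share of the pool shielded for cached blocks; execution may
        # borrow the rest, so shuffle reads are less likely to OOM.
        .set("spark.memory.storageFraction", "0.5"))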

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Prabhu Joseph
The issue is the query hits OOM on a stage when reading shuffle output from the previous stage. How come increasing shuffle memory helps avoid OOM? On Mon, Mar 14, 2016 at 2:28 PM, Sabarish Sasidharan wrote: > That's a pretty old version of Spark SQL. It is devoid of all

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Sabarish Sasidharan
That's a pretty old version of Spark SQL. It is devoid of all the improvements introduced in the last few releases. You should try bumping your spark.sql.shuffle.partitions to a value higher than the default (5x or 10x). Also increase your shuffle memory fraction, as you really are not explicitly
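
A minimal sketch of the first suggestion: spark.sql.shuffle.partitions defaults to 200, so 5x-10x means roughly 1000-2000 (illustrative; tune against the actual data volume):

from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

sc = SparkContext(conf=SparkConf().setAppName("Hive_Join"))
hc = HiveContext(sc)
# More, smaller reduce tasks: each one buffers less shuffle data in memory.
hc.sql("SET spark.sql.shuffle.partitions=2000")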

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Prabhu Joseph
It is Spark SQL, and the version used is Spark 1.2.1. On Mon, Mar 14, 2016 at 2:16 PM, Sabarish Sasidharan < sabarish.sasidha...@manthan.com> wrote: > I believe the OP is using Spark SQL and not Hive on Spark. > > Regards > Sab > > On Mon, Mar 14, 2016 at 1:55 PM, Mich Talebzadeh < >

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Sabarish Sasidharan
I believe the OP is using Spark SQL and not Hive on Spark. Regards Sab On Mon, Mar 14, 2016 at 1:55 PM, Mich Talebzadeh wrote: > I think the only version of Spark that works OK with Hive (Hive on Spark > engine) is version 1.3.1. I also get OOM from time to time and

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Mich Talebzadeh
I think the only version of Spark that works OK with Hive (Hive on Spark engine) is version 1.3.1. I also get OOM from time to time and have to revert to using MR.

Re: Hive Query on Spark fails with OOM

2016-03-14 Thread Sabarish Sasidharan
Which version of Spark are you using? The configuration varies by version. Regards Sab On Mon, Mar 14, 2016 at 10:53 AM, Prabhu Joseph wrote: > Hi All, > > A Hive Join query which runs fine and faster in MapReduce takes a lot of > time with Spark and finally fails

Hive Query on Spark fails with OOM

2016-03-13 Thread Prabhu Joseph
Hi All, A Hive Join query which runs fine and faster in MapReduce takes a lot of time with Spark and finally fails with OOM.

*Query: hivejoin.py*

from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf().setAppName("Hive_Join")
sc =
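
The archive cuts the script off mid-line; a minimal sketch of how such a script typically continues, with hypothetical table and column names since the OP's actual query text is not preserved:

from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf().setAppName("Hive_Join")
sc = SparkContext(conf=conf)
hc = HiveContext(sc)

# Hypothetical join; the real tables, columns, and predicates are not shown.
result = hc.sql("""
    SELECT a.id, b.value
    FROM table_a a
    JOIN table_b b ON a.id = b.id
""")
print(result.count())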