Yes, I suggested increasing shuffle partitions to address this problem. The
other suggestion, to increase the shuffle fraction, was not for this, but it
makes sense given that you are reserving all that memory and doing nothing
with it. By diverting more of it to shuffles you can help improve your
shuffle performance.
On Mon, Mar 14, 2016 at 1:30 PM, Prabhu Joseph wrote:
>
> Thanks for the recommendation. But can you share what improvements were
> made after Spark-1.2.1 and how they specifically handle the issue that is
> observed here?
>
Memory used for query execution is ...
Michael,

Thanks for the recommendation. But can you share what improvements were
made after Spark-1.2.1 and how they specifically handle the issue that is
observed here?
On Tue, Mar 15, 2016 at 12:03 AM, Jörn Franke wrote:
> I am not sure about this. At least Hortonworks provides its distribution
> with Hive and Spark 1.6.
I am not sure about this. At least Hortonworks provides its distribution with
Hive and Spark 1.6.
> On 14 Mar 2016, at 09:25, Mich Talebzadeh wrote:
>
> I think the only version of Spark that works OK with Hive (Hive on Spark
> engine) is version 1.3.1. I also get OOM from time to time and have to
> revert to using MR.
+1 to upgrading Spark. 1.2.1 has none of the memory management improvements
that were added in 1.4-1.6.
On Mon, Mar 14, 2016 at 2:03 AM, Prabhu Joseph wrote:
> The issue is that the query hits OOM on a stage when reading shuffle
> output from the previous stage. How does increasing shuffle memory help to
> avoid OOM?
The issue is that the query hits OOM on a stage when reading shuffle output
from the previous stage. How does increasing shuffle memory help to avoid
OOM?
On Mon, Mar 14, 2016 at 2:28 PM, Sabarish Sasidharan wrote:
> That's a pretty old version of Spark SQL. It is devoid of all the
> improvements introduced in the last few releases.
That's a pretty old version of Spark SQL. It is devoid of all the
improvements introduced in the last few releases.

You should try bumping your spark.sql.shuffle.partitions to a value higher
than the default (5x or 10x). Also increase your shuffle memory fraction, as
you really are not explicitly caching anything.
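For a Spark 1.x job like the OP's, those two knobs can be set roughly as
below. This is a minimal sketch, not from the thread: the property names are
the Spark 1.x ones (spark.shuffle.memoryFraction is the legacy pre-1.6
setting, default 0.2; spark.sql.shuffle.partitions defaults to 200), and the
values 0.4 and 1000 are illustrative assumptions, not tested recommendations.

```python
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = (SparkConf()
        .setAppName("Hive_Join")
        # Legacy Spark 1.x setting: fraction of the heap reserved for
        # shuffle aggregation buffers (default 0.2). Illustrative value.
        .set("spark.shuffle.memoryFraction", "0.4"))
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# More shuffle partitions (default 200) means each reduce task reads a
# smaller shuffle block, which makes the read side less OOM-prone.
sqlContext.setConf("spark.sql.shuffle.partitions", "1000")
```

Since the cluster reserves cache memory that an uncached SQL join never uses,
shifting that fraction toward shuffle is what Sabarish's suggestion amounts
to; on 1.6+ the unified memory manager does this rebalancing automatically.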
It is a Spark-SQL and the version used is Spark-1.2.1.
On Mon, Mar 14, 2016 at 2:16 PM, Sabarish Sasidharan
<sabarish.sasidha...@manthan.com> wrote:
> I believe the OP is using Spark SQL and not Hive on Spark.
>
> Regards
> Sab
>
> On Mon, Mar 14, 2016 at 1:55 PM, Mich Talebzadeh wrote:
>
I believe the OP is using Spark SQL and not Hive on Spark.
Regards
Sab
On Mon, Mar 14, 2016 at 1:55 PM, Mich Talebzadeh wrote:
> I think the only version of Spark that works OK with Hive (Hive on Spark
> engine) is version 1.3.1. I also get OOM from time to time and have to
> revert to using MR.
I think the only version of Spark that works OK with Hive (Hive on Spark
engine) is version 1.3.1. I also get OOM from time to time and have to
revert to using MR.
Dr Mich Talebzadeh
LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Which version of Spark are you using? The configuration varies by version.
Regards
Sab
On Mon, Mar 14, 2016 at 10:53 AM, Prabhu Joseph wrote:
> Hi All,
>
> A Hive join query which runs fine and faster in MapReduce takes a lot of
> time with Spark and finally fails with OOM.
Hi All,

A Hive join query which runs fine and faster in MapReduce takes a lot of
time with Spark and finally fails with OOM.
*Query: hivejoin.py*
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf().setAppName("Hive_Join")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)