Re: OOM Error

2019-09-07 Thread Ankit Khettry
Sure folks, will try later today!

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:56 PM Sunil Kalra,  wrote:

> Ankit
>
> Can you try reducing the number of cores or increasing memory? With the
> configuration below, each core gets only ~3.5 GB. Otherwise your data may be
> skewed, so that one of the cores receives too much data for a single key.
>
> spark.executor.cores 6
> spark.executor.memory 36g
>
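A rough sketch of where the ~3.5 GB-per-core figure above likely comes from,
assuming Spark 2.x defaults (spark.memory.fraction = 0.6, roughly 300 MB of
reserved system memory); the defaults are an assumption, not something stated
in the thread:

  // Unified (execution + storage) memory available to each concurrent task
  val executorMemoryGb = 36.0
  val reservedGb       = 0.3   // assumed reserved system memory (~300 MB)
  val memoryFraction   = 0.6   // assumed spark.memory.fraction default
  val cores            = 6
  val perTaskGb = (executorMemoryGb - reservedGb) * memoryFraction / cores
  // perTaskGb is roughly 3.57 GB; fewer cores or more executor memory raises it
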
> On Sat, Sep 7, 2019 at 6:35 AM Chris Teoh  wrote:
>
>> It says you have 3811 tasks in earlier stages and you're going down to
>> 2001 partitions, which would make it more memory intensive. I'm guessing the
>> default of 200 shuffle partitions is what failed earlier. Go for a higher
>> number, maybe even higher than 3811. What were your shuffle write from stage
>> 7 and shuffle read from stage 8?
>>
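A minimal sketch of raising the shuffle partition count in that direction; the
4096 value is an assumption, picked only because it sits above the 3811
upstream tasks:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("oom-job")  // hypothetical app name
    .config("spark.sql.shuffle.partitions", "4096")  // above the 3811 upstream tasks
    .getOrCreate()

  // or adjusted at runtime, before the wide transformations run:
  spark.conf.set("spark.sql.shuffle.partitions", "4096")
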
>> On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry, 
>> wrote:
>>
>>> Still unable to overcome the error. Attaching some screenshots for
>>> reference.
>>> Following are the configs used:
>>> spark.yarn.max.executor.failures 1000
>>> spark.yarn.driver.memoryOverhead 6g
>>> spark.executor.cores 6
>>> spark.executor.memory 36g
>>> spark.sql.shuffle.partitions 2001
>>> spark.memory.offHeap.size 8g
>>> spark.memory.offHeap.enabled true
>>> spark.executor.instances 10
>>> spark.driver.memory 14g
>>> spark.yarn.executor.memoryOverhead 10g
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh  wrote:
>>>
>>>> You can try that. Also consider processing each partition separately if
>>>> your data is heavily skewed when you partition it.
>>>>
>>>> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, 
>>>> wrote:
>>>>
>>>>> Thanks Chris
>>>>>
>>>>> Going to try it soon, maybe by setting spark.sql.shuffle.partitions to
>>>>> 2001. Also, I was wondering whether it would help to repartition the data
>>>>> by the fields I am using in the group by and window operations?
>>>>>
>>>>> Best Regards
>>>>> Ankit Khettry
>>>>>
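A sketch of what the repartition idea above could look like, assuming
hypothetical column names (user_id, event_time) and an assumed partition count
of 4096; note that a window still needs all rows for one key in a single
partition, so a heavily skewed key stays heavy:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.{col, row_number}

  val spark = SparkSession.builder().appName("repartition-sketch").getOrCreate()
  val df = spark.read.parquet("/path/to/input")  // hypothetical input

  // Spread the shuffle for the window over more, smaller partitions,
  // keyed by the same column the window partitions by
  val repartitioned = df.repartition(4096, col("user_id"))

  val w = Window.partitionBy("user_id").orderBy(col("event_time"))
  val ranked = repartitioned.withColumn("rn", row_number().over(w))
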
>>>>> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:
>>>>>
>>>>>> Hi Ankit,
>>>>>>
>>>>>> Without looking at the Spark UI and the stages/DAG, I'm guessing
>>>>>> you're running with the default number of Spark shuffle partitions.
>>>>>>
>>>>>> If you're seeing a lot of shuffle spill, you likely have to increase
>>>>>> the number of shuffle partitions to accommodate the huge shuffle size.
>>>>>>
>>>>>> I hope that helps
>>>>>> Chris
>>>>>>
>>>>>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
>>>>>> wrote:
>>>>>>
>>>>>>> Nope, it's a batch job.
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Ankit Khettry
>>>>>>>
>>>>>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <
>>>>>>> 028upasana...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Is it a streaming job?
>>>>>>>>
>>>>>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have a Spark job that consists of a large number of Window
>>>>>>>>> operations and hence involves large shuffles. I have roughly 900 GiBs 
>>>>>>>>> of
>>>>>>>>> data, although I am using a large enough cluster (10 * m5.4xlarge
>>>>>>>>> instances). I am using the following configurations for the job, 
>>>>>>>>> although I
>>>>>>>>> have tried various other combinations without any success.
>>>>>>>>>
>>>>>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>>>>>> spark.storage.memoryFraction 0.1
>>>>>>>>> spark.executor.cores 6
>>>>>>>>> spark.executor.memory 36g
>>>>>>>>> spark.memory.offHeap.size 8g
>>>>>>>>> spark.memory.offHeap.enabled true
>>>>>>>>> spark.executor.instances 10
>>>>>>>>> spark.driver.memory 14g
>>>>>>>>> spark.yarn.executor.memoryOverhead 10g
>>>>>>>>>
>>>>>>>>> I keep running into the following OOM error:
>>>>>>>>>
>>>>>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
>>>>>>>>> 16384 bytes of memory, got 0
>>>>>>>>> at
>>>>>>>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>>>>>> at
>>>>>>>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>>>>>> at
>>>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>>>>>>>> at
>>>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>>>>>>>
>>>>>>>>> I see there are a large number of JIRAs in place for similar
>>>>>>>>> issues and a great many of them are even marked resolved.
>>>>>>>>> Can someone guide me as to how to approach this problem? I am
>>>>>>>>> using Databricks Spark 2.4.1.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Ankit Khettry
>>>>>>>>>
>>>>>>>>


Re: OOM Error

2019-09-07 Thread Ankit Khettry
Thanks Chris

Going to try it soon, maybe by setting spark.sql.shuffle.partitions to 2001.
Also, I was wondering whether it would help to repartition the data by the
fields I am using in the group by and window operations?

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:

> Hi Ankit,
>
> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
> running with the default number of Spark shuffle partitions.
>
> If you're seeing a lot of shuffle spill, you likely have to increase the
> number of shuffle partitions to accommodate the huge shuffle size.
>
> I hope that helps
> Chris
>
> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
> wrote:
>
>> Nope, it's a batch job.
>>
>> Best Regards
>> Ankit Khettry
>>
>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>> wrote:
>>
>>> Is it a streaming job?
>>>
>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>> wrote:
>>>
>>>> I have a Spark job that consists of a large number of Window operations
>>>> and hence involves large shuffles. I have roughly 900 GiBs of data,
>>>> although I am using a large enough cluster (10 * m5.4xlarge instances). I
>>>> am using the following configurations for the job, although I have tried
>>>> various other combinations without any success.
>>>>
>>>> spark.yarn.driver.memoryOverhead 6g
>>>> spark.storage.memoryFraction 0.1
>>>> spark.executor.cores 6
>>>> spark.executor.memory 36g
>>>> spark.memory.offHeap.size 8g
>>>> spark.memory.offHeap.enabled true
>>>> spark.executor.instances 10
>>>> spark.driver.memory 14g
>>>> spark.yarn.executor.memoryOverhead 10g
>>>>
>>>> I keep running into the following OOM error:
>>>>
>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>>>> bytes of memory, got 0
>>>> at
>>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>> at
>>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>> at
>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>>> at
>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>>
>>>> I see there are a large number of JIRAs in place for similar issues and
>>>> a great many of them are even marked resolved.
>>>> Can someone guide me as to how to approach this problem? I am using
>>>> Databricks Spark 2.4.1.
>>>>
>>>> Best Regards
>>>> Ankit Khettry
>>>>
>>>


Re: OOM Error

2019-09-07 Thread Ankit Khettry
Nope, it's a batch job.

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
wrote:

> Is it a streaming job?
>
> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
> wrote:
>
>> I have a Spark job that consists of a large number of Window operations
>> and hence involves large shuffles. I have roughly 900 GiBs of data,
>> although I am using a large enough cluster (10 * m5.4xlarge instances). I
>> am using the following configurations for the job, although I have tried
>> various other combinations without any success.
>>
>> spark.yarn.driver.memoryOverhead 6g
>> spark.storage.memoryFraction 0.1
>> spark.executor.cores 6
>> spark.executor.memory 36g
>> spark.memory.offHeap.size 8g
>> spark.memory.offHeap.enabled true
>> spark.executor.instances 10
>> spark.driver.memory 14g
>> spark.yarn.executor.memoryOverhead 10g
>>
>> I keep running into the following OOM error:
>>
>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>> bytes of memory, got 0
>> at
>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>> at
>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>> at
>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>> at
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>
>> I see there are a large number of JIRAs in place for similar issues and a
>> great many of them are even marked resolved.
>> Can someone guide me as to how to approach this problem? I am using
>> Databricks Spark 2.4.1.
>>
>> Best Regards
>> Ankit Khettry
>>
>


OOM Error

2019-09-06 Thread Ankit Khettry
I have a Spark job that consists of a large number of window operations and
hence involves large shuffles. I have roughly 900 GiB of data, although I am
using a fairly large cluster (10 * m5.4xlarge instances). I am using the
following configurations for the job, and I have tried various other
combinations without any success.

spark.yarn.driver.memoryOverhead 6g
spark.storage.memoryFraction 0.1
spark.executor.cores 6
spark.executor.memory 36g
spark.memory.offHeap.size 8g
spark.memory.offHeap.enabled true
spark.executor.instances 10
spark.driver.memory 14g
spark.yarn.executor.memoryOverhead 10g
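
A rough back-of-envelope for how these settings add up per node, assuming one
executor per node and the 64 GiB of RAM an m5.4xlarge provides; the per-node
layout is an assumption, not something stated in the thread:

  // Per-node memory implied by the configuration above
  val executorHeapGb  = 36  // spark.executor.memory
  val yarnOverheadGb  = 10  // spark.yarn.executor.memoryOverhead
  val offHeapGb       = 8   // spark.memory.offHeap.size
  val perNodeDemandGb = executorHeapGb + yarnOverheadGb + offHeapGb  // 54 GiB
  // versus roughly 64 GiB per node minus the OS and YARN daemons, so there is
  // little headroom; off-heap memory is allocated outside the JVM heap and
  // also has to fit in that remainder.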

I keep running into the following OOM error:

org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
bytes of memory, got 0
at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
at
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
at
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
at
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)

I see there are a large number of JIRAs in place for similar issues and a
great many of them are even marked resolved.
Can someone guide me as to how to approach this problem? I am using
Databricks Spark 2.4.1.

Best Regards
Ankit Khettry


Re: An alternative logic to collaborative filtering works fine but we are facing run time issues in executing the job

2019-04-16 Thread Ankit Khettry
Hi Balakumar

Two things.

One - It seems like your cluster is running out of memory and then
eventually out of disk, likely while materializing the dataframe to write
(what's the volume of data created by the join?).

Two - Your job is running in local mode, so it can only use the master
node's resources.

Try running the job in YARN mode, and if the issue persists, try increasing
the disk volumes.

Best Regards
Ankit Khettry
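
A minimal sketch of the YARN-mode suggestion above, assuming the job is
launched with spark-submit --master yarn; the hard-coded .master("local[*]")
in the code below would override that, so it is dropped here:

  import org.apache.spark.sql.SparkSession

  // Let the launcher (e.g. spark-submit --master yarn) supply the master
  // instead of hard-coding local mode
  val ss = SparkSession
    .builder
    .appName("join_association")
    .getOrCreate()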

On Wed, 17 Apr, 2019, 9:44 AM Balakumar iyer S, 
wrote:

> Hi,
>
>
> While running the following Spark code on the cluster with the following
> configuration, it is spread across 3 job IDs.
>
> CLUSTER CONFIGURATION
>
> 3 NODE CLUSTER
>
> NODE 1 - 64GB 16CORES
>
> NODE 2 - 64GB 16CORES
>
> NODE 3 - 64GB 16CORES
>
>
> In job ID 2, the job is stuck at stage 51 of 254 and then starts utilising
> disk space. I am not sure why this is happening, and my work is completely
> blocked. Could someone help me with this?
>
> I have attached a screenshot of the Spark stages that are stuck, for
> reference.
>
> Please let me know if you have any more questions about the setup and code.
> Thanks
>
>
>
> code:
>
>   import org.apache.log4j.{Level, Logger}
>   import org.apache.spark.sql.SparkSession
>   import org.apache.spark.sql.functions.col
>
>   def main(args: Array[String]): Unit = {
>
>     Logger.getLogger("org").setLevel(Level.ERROR)
>
>     val ss = SparkSession
>       .builder
>       .appName("join_association")
>       .master("local[*]")
>       .getOrCreate()
>
>     import ss.implicits._
>
>     // Read the input as CSV, inferring the schema
>     val dframe = ss.read
>       .option("inferSchema", value = true)
>       .option("delimiter", ",")
>       .csv("in/matrimony.txt")
>
>     dframe.show()
>     dframe.printSchema()
>
>     // Left and right sides of the self-join, renaming _c1 on each side
>     val dfLeft  = dframe.withColumnRenamed("_c1", "left_data")
>     val dfRight = dframe.withColumnRenamed("_c1", "right_data")
>
>     // Self-join on _c0 and drop the reflexive pairs
>     // (=!= is the non-deprecated form of !==)
>     val joined = dfLeft.join(dfRight, dfLeft.col("_c0") === dfRight.col("_c0"))
>       .filter(col("left_data") =!= col("right_data"))
>
>     joined.show()
>
>     val result = joined.select(col("left_data"), col("right_data") as "similar_ids")
>
>     result.write.csv("/output")
>
>     ss.stop()
>   }
>
>
>
> --
> REGARDS
> BALAKUMAR SEETHARAMAN
>
>