[ https://issues.apache.org/jira/browse/SPARK-31281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070478#comment-17070478 ]

Alfred Davidson commented on SPARK-31281:
-----------------------------------------

The allocated driver memory gets split between execution, storage, overhead etc.
Since your action brings the data back to the driver, the driver does not have
enough memory for it (the GC-limit error means the JVM was spending nearly all
of its time trying to GC to free up space before giving up). You can either
allocate more driver memory or change the fraction it allocates for execution
and storage. I believe the default value of spark.memory.fraction is 0.6, i.e.
roughly 60% of the heap is reserved for execution and storage.
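
A minimal sketch of what that could look like (the property names are the
standard Spark memory settings; the 8g value is just an illustration to tune):

 import org.apache.spark.SparkConf
 import org.apache.spark.sql.SparkSession

 // With setMaster("local[2]") the driver JVM is already running by the time
 // this code executes, so setting "spark.driver.memory" here comes too late;
 // pass it at launch instead, e.g. spark-submit --driver-memory 8g ...
 val conf = new SparkConf()
   .setAppName("test")
   .setMaster("local[2]")
   // Fraction of (heap minus the ~300MB reserve) shared by execution and
   // storage; the default is 0.6.
   .set("spark.memory.fraction", "0.6")
   // Portion of that unified pool protected for cached blocks; default 0.5.
   .set("spark.memory.storageFraction", "0.5")

 val sparkSession = SparkSession.builder.config(conf).getOrCreate()

Also note that take(maxRecords) materialises up to maxRecords rows on the
driver, so if maxRecords is very large, lowering it will help more than any
memory setting.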

> Hit OOM Error - GC Limit
> ------------------------
>
>                 Key: SPARK-31281
>                 URL: https://issues.apache.org/jira/browse/SPARK-31281
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>    Affects Versions: 2.4.4
>            Reporter: HongJin
>            Priority: Critical
>
> MemoryStore is 2.6GB
> conf = new SparkConf().setAppName("test")
>  //.set("spark.sql.codegen.wholeStage", "false")
>  .set("spark.driver.host", "localhost")
>  .set("spark.driver.memory", "4g")
>  .set("spark.executor.cores","1")
>  .set("spark.num.executors","1")
>  .set("spark.executor.memory", "4g")
>  .set("spark.executor.memoryOverhead", "400m")
>  .set("spark.dynamicAllocation.enabled", "true")
>  .set("spark.dynamicAllocation.minExecutors","1")
>  .set("spark.dynamicAllocation.maxExecutors","2")
>  .set("spark.ui.enabled","true") //enable spark UI
>  .set("spark.sql.shuffle.partitions",defaultPartitions)
>  .setMaster("local[2]")
>  sparkSession = SparkSession.builder.config(conf).getOrCreate()
>  
> val df = SparkFactory.sparkSession.sqlContext
>  .read
>  .option("header", "true")
>  .option("delimiter", delimiter)
>  .csv(textFileLocation)
>  
> joinedDf = upperCaseLeft.as("l")
>  .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer")
>  .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col, toleranceValue, caseSensitive)): _*)
>  
> data = joinedDf.take(maxRecords)