[ https://issues.apache.org/jira/browse/SPARK-31281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070478#comment-17070478 ]
Alfred Davidson edited comment on SPARK-31281 at 3/29/20, 6:54 PM:
-------------------------------------------------------------------
The allocated driver memory will be split between storage, memoryOverhead, etc. Your transformation is doing a join (which is likely to be a broadcast join) and you have an action that is bringing the data to the driver - the driver doesn't have enough memory (and is initially trying to GC to free up space). You can either allocate more driver memory or change the fraction that it allocates for storage. I believe the default value is 0.6, i.e. it reserves 60% of driver memory for storage.

was (Author: alfiewdavidson):
The allocated driver memory will be split between storage, memoryOverhead, etc. As your action is bringing the data to the driver - the driver doesn't have enough memory (and is initially trying to GC to free up space). You can either allocate more driver memory or change the fraction that it allocates for storage. I believe the default value is 0.6, i.e. it reserves 60% of driver memory for storage.

> Hit OOM Error - GC Limit
> ------------------------
>
>                 Key: SPARK-31281
>                 URL: https://issues.apache.org/jira/browse/SPARK-31281
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>    Affects Versions: 2.4.4
>            Reporter: HongJin
>            Priority: Critical
>
> MemoryStore is 2.6GB
>
> conf = new SparkConf().setAppName("test")
>   //.set("spark.sql.codegen.wholeStage", "false")
>   .set("spark.driver.host", "localhost")
>   .set("spark.driver.memory", "4g")
>   .set("spark.executor.cores", "1")
>   .set("spark.num.executors", "1")
>   .set("spark.executor.memory", "4g")
>   .set("spark.executor.memoryOverhead", "400m")
>   .set("spark.dynamicAllocation.enabled", "true")
>   .set("spark.dynamicAllocation.minExecutors", "1")
>   .set("spark.dynamicAllocation.maxExecutors", "2")
>   .set("spark.ui.enabled", "true") // enable spark UI
>   .set("spark.sql.shuffle.partitions", defaultPartitions)
>   .setMaster("local[2]")
> sparkSession = SparkSession.builder.config(conf).getOrCreate()
>
> val df = SparkFactory.sparkSession.sqlContext
>   .read
>   .option("header", "true")
>   .option("delimiter", delimiter)
>   .csv(textFileLocation)
>
> joinedDf = upperCaseLeft.as("l")
>   .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer")
>   .select(compositeKeysCol ::: nonKeyCols.map(col => mapHelper(col, toleranceValue, caseSensitive)): _*)
>
> data = joinedDf.take(maxRecords)
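For illustration only, a minimal sketch of the two remedies suggested in the comment above, applied to the reporter's SparkConf. The 8g heap and the 0.5 fraction are assumed, illustrative values, not settings verified against this job. Note that spark.driver.memory generally has to be set before the driver JVM starts (e.g. via spark-submit --driver-memory), so setting it programmatically in an already-running local-mode application may have no effect:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("test")
  .setMaster("local[2]")
  // Remedy 1: more driver heap - take(maxRecords) materialises the joined rows on the driver.
  // Illustrative value; in local mode this needs to be applied before the JVM starts.
  .set("spark.driver.memory", "8g")
  // Remedy 2: the unified memory fraction shared by execution and storage.
  // Default in Spark 2.x is 0.6 of (heap - 300MB); lowering it (illustratively to 0.5)
  // leaves more heap for user objects such as the rows returned by take().
  .set("spark.memory.fraction", "0.5")

val spark = SparkSession.builder.config(conf).getOrCreate()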