Hi Timothy, I think the driver memory in all of your examples is more than is necessary for typical cases, while the executor memory is quite low.
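As a rough illustration of what I mean, this is the shape of submit command I would start from (a sketch only -- the class name and jar are your placeholders, and the sizes are illustrative numbers to adapt to your job and your 20G/20-vcore nodes, not a recommendation):

`/bin/spark-submit --class <class name> --master yarn-cluster --driver-memory 2g --executor-memory 6g --num-executors 3 --executor-cores 4 --jars <jar file>`

The filter/mapToPair work happens in the executors, so that is where most of the memory should go; the driver mainly needs enough headroom for what you collect back to it.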
I found this devops talk [1] from Spark Summit very useful for understanding these configuration details.

[1] https://www.youtube.com/watch?v=l4ZYUfZuRbU

Cheers,
Sangeeth

On Aug 30, 2015 7:28 AM, "timothy22000" <timothy22...@gmail.com> wrote:

> I am doing some memory tuning on my Spark job on YARN, and I notice that
> different settings give different results and affect the outcome of the
> Spark job run. However, I am confused and do not completely understand why
> this happens, and I would appreciate it if someone could provide me with
> some guidance and explanation.
>
> I will provide some background information, describe the cases that I have
> experienced, and then post my questions below.
>
> My environment settings were as follows:
>
> - Memory 20G, 20 VCores per node (3 nodes in total)
> - Hadoop 2.6.0
> - Spark 1.4.0
>
> My code recursively filters an RDD to make it smaller (removing examples
> as part of an algorithm), then does mapToPair and collect to gather the
> results and save them in a list.
>
> First Case
>
> `/bin/spark-submit --class <class name> --master yarn-cluster
> --driver-memory 7g --executor-memory 1g --num-executors 3
> --executor-cores 1 --jars <jar file>`
>
> If I run my program with any driver memory less than 11g, I get the error
> below, which is the SparkContext being stopped, or a similar error, which
> is a method being called on an already-stopped SparkContext. From what I
> have gathered, this is related to there not being enough memory.
>
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n24507/EKxQD.png>
>
> Second Case
>
> `/bin/spark-submit --class <class name> --master yarn-cluster
> --driver-memory 7g --executor-memory 3g --num-executors 3
> --executor-cores 1 --jars <jar file>`
>
> If I run the program with the same driver memory but higher executor
> memory, the job runs longer (about 3-4 minutes) than in the first case,
> and then it encounters a different error: a container requesting/using
> more memory than allowed and being killed for it. I find this strange,
> since the executor memory was increased, yet this error occurs instead of
> the error from the first case.
>
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n24507/tr24f.png>
>
> Third Case
>
> `/bin/spark-submit --class <class name> --master yarn-cluster
> --driver-memory 11g --executor-memory 1g --num-executors 3
> --executor-cores 1 --jars <jar file>`
>
> Any setting with driver memory greater than 10g leads to the job running
> successfully.
>
> Fourth Case
>
> `/bin/spark-submit --class <class name> --master yarn-cluster
> --driver-memory 2g --executor-memory 1g --conf
> spark.yarn.executor.memoryOverhead=1024 --conf
> spark.yarn.driver.memoryOverhead=1024 --num-executors 3 --executor-cores 1
> --jars <jar file>`
>
> The job runs successfully with this setting (driver memory 2g and executor
> memory 1g, but with the driver memory overhead and the executor memory
> overhead both set to 1g).
>
> Questions
>
> 1. Why is a different error thrown, and why does the job run longer, in
> the second case compared with the first, when only the executor memory was
> increased? Are the two errors linked in some way?
>
> 2. Both the third and the fourth case succeed, and I understand that it is
> because I am giving more memory, which solves the memory problems.
> However, in the third case,
>
> spark.driver.memory + spark.yarn.driver.memoryOverhead
>   = the size of the container in which YARN will create the driver JVM
>   = 11g + max(driverMemory * 0.07, 384m)
>   = 11g + 0.77g
>   = 11.77g
>
> So, from the formula, I can see that my job requires a MEMORY_TOTAL of
> around 11.77g to run successfully, which explains why I need more than 10g
> for the driver memory setting.
>
> But for the fourth case, with the overhead set explicitly to 1024m,
>
> spark.driver.memory + spark.yarn.driver.memoryOverhead
>   = 2g + 1g
>   = 3g
>
> It seems that just by setting the memory overhead to 1024m (1g), the job
> runs successfully with a driver memory of only 2g, and the MEMORY_TOTAL is
> only 3g! Whereas without the overhead configuration, any driver memory
> less than 11g fails, which does not make sense given the formula, and that
> is why I am confused.
>
> Why does increasing the memory overhead (for both the driver and the
> executor) allow my job to complete successfully with a lower MEMORY_TOTAL
> (11.77g vs 3g)? Is there something else at work internally here that I am
> missing?
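> To double-check the arithmetic, a small shell sketch (the helper name is
> just for illustration; the 0.07 factor and 384m floor are taken from the
> default formula above, and the explicit memoryOverhead setting simply
> replaces the computed value):
>
> container_mb() {
>   heap_mb=$1
>   # default overhead: max(7% of the heap, 384 MB)
>   overhead_mb=$(( heap_mb * 7 / 100 ))
>   if [ "$overhead_mb" -lt 384 ]; then overhead_mb=384; fi
>   echo $(( heap_mb + overhead_mb ))
> }
>
> container_mb 11264   # third case: 11g heap -> 12052 MB (~11.77g)
> container_mb 7168    # first case:  7g heap ->  7669 MB (~7.49g)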
> I would really appreciate any help offered, as it would greatly improve my
> understanding of Spark. Thanks in advance.