I launch roughly 30-60 of these jobs (defined as in start-job.sh below) in the background from a wrapper script. The wrapper waits about 30 seconds between launches, then monitors YARN to decide when to launch more. The job limit is set at about 60, but even if I lower it to 30, I run out of memory on the host submitting the jobs. Why does my approach to using spark-submit cause the submitting host to run out of memory? It has about 6 GB free, and I don't see why merely submitting jobs should exhaust that.
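For concreteness, the wrapper's launch loop is roughly like the sketch below (names such as MAX_JOBS, count_running_jobs, and launch_loop are illustrative paraphrases, not the real script):

```shell
#!/usr/bin/env bash
# Illustrative sketch of the wrapper described above, not the actual script.
MAX_JOBS=60       # limit on concurrently running jobs
LAUNCH_DELAY=30   # seconds to wait between launches

# Count RUNNING/ACCEPTED applications from `yarn application -list`
# output supplied on stdin.
count_running_jobs() {
  grep -c -E 'RUNNING|ACCEPTED'
}

launch_loop() {
  for args in "$@"; do
    # Block until YARN reports fewer than MAX_JOBS applications in flight.
    while [ "$(yarn application -list 2>/dev/null | count_running_jobs)" -ge "$MAX_JOBS" ]; do
      sleep "$LAUNCH_DELAY"
    done
    # Each backgrounded start-job.sh forks its own spark-submit JVM.
    ./start-job.sh $args &
    sleep "$LAUNCH_DELAY"
  done
}
```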
start-job.sh:

    export HADOOP_CONF_DIR=/etc/hadoop/conf
    spark-submit \
        --class sap.whcounter.WarehouseCounter \
        --master yarn-cluster \
        --num-executors 1 \
        --driver-memory 1024m \
        --executor-memory 1024m \
        --executor-cores 4 \
        --queue hlab \
        --conf spark.yarn.submit.waitAppCompletion=false \
        --conf spark.app.name=wh_reader_sp \
        --conf spark.streaming.receiver.maxRate=1000 \
        --conf spark.streaming.concurrentJobs=2 \
        --conf spark.eventLog.dir="hdfs:///user/spark/applicationHistory" \
        --conf spark.eventLog.enabled=true \
        --conf spark.eventLog.overwrite=true \
        --conf spark.yarn.historyServer.address="http://spark-history.local:18080/" \
        --conf spark.yarn.jar="hdfs:///user/spark/share/lib/spark-assembly.jar" \
        --conf spark.yarn.dist.files="hdfs:///user/colin.williams/warehouse-counter-0.0.1-SNAPSHOT-uber.jar" \
        hdfs:///user/colin.williams/warehouse-counter-0.0.1-SNAPSHOT-uber.jar \
        $1 $2

ps aux | grep java shows each submitter process as:

    /usr/java/latest/bin/java -cp ::/usr/lib/spark/conf:/usr/lib/spark/lib/spark-assembly.jar:/etc/hadoop/conf:/usr/lib/hadoop/client/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop/../hadoop-hdfs/./:/usr/lib/hadoop/../hadoop-hdfs/lib/*:/usr/lib/hadoop/../hadoop-hdfs/.//*:/usr/lib/hadoop/../hadoop-yarn/lib/*:/usr/lib/hadoop/../hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/spark/lib/scala-library.jar:/usr/lib/spark/lib/scala-compiler.jar:/usr/lib/spark/lib/jline.jar -XX:MaxPermSize=128m -Xms1024m -Xmx1024m org.apache.spark.deploy.SparkSubmit --class sap.whcounter.WarehouseCounter --master yarn-cluster --num-executors 1 --driver-memory 1024m --executor-memory 1024m --executor-cores 4 --queue hlab --conf spark.yarn.submit.waitAppCompletion=false --conf spark.app.name=wh_reader_sp --conf spark.streaming.receiver.maxRate=1000 --conf spark.streaming.concurrentJobs=2 --conf spark.eventLog.dir=hdfs:///user/spark/applicationHistory --conf spark.eventLog.enabled=true --conf spark.eventLog.overwrite=true --conf spark.yarn.historyServer.address=http://spark-history.local:18080/ --conf spark.yarn.jar=hdfs:///user/spark/share/lib/spark-assembly.jar --conf spark.yarn.dist.files=hdfs:///user/colin.williams/warehouse-counter-0.0.1-SNAPSHOT-uber.jar hdfs:///user/colin.williams/warehouse-counter-0.0.1-SNAPSHOT-uber.jar hdfs:///wh/2015/04/19/*

free -m:

                 total       used       free     shared    buffers
    Mem:          7873        992       6881          0         62
    -/+ buffers/cache:        500       7373
    Swap:        14947        574      14373

hs_err_pid7433.log:

    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (malloc) failed to allocate 716177408 bytes for committing reserved memory.
    # Possible reasons:
    #   The system is out of physical RAM or swap space
    #   In 32 bit mode, the process size limit was hit
    # Possible solutions:
    #   Reduce memory load on the system
    #   Increase physical memory or swap space
    #   Check if swap backing store is full
    #   Use 64 bit Java on a 64 bit OS
    #   Decrease Java heap size (-Xmx/-Xms)
    #   Decrease number of Java threads
    #   Decrease Java thread stack sizes (-Xss)
    #   Set larger code cache with -XX:ReservedCodeCacheSize=
    # This output file may be truncated or incomplete.
    #
    #  Out of Memory Error (os_linux.cpp:2747), pid=7357, tid=140414250673920
    #
    # JRE version:  (7.0_60-b19) (build )
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.60-b09 mixed mode linux-amd64 compressed oops)
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

    VM Arguments:
    jvm_args: -XX:MaxPermSize=128m -Xms1024m -Xmx1024m
    java_command: org.apache.spark.deploy.SparkSubmit --class sap.whcounter.WarehouseCounter --master yarn-cluster --num-executors 1 --driver-memory 1024m --executor-memory 1024m --executor-cores 4 --queue hlab --conf spark.yarn.submit.waitAppCompletion=false --conf spark.app.name=wh_reader_sp --conf spark.streaming.receiver.maxRate=1000 --conf spark.streaming.concurrentJobs=2 --conf spark.eventLog.dir=hdfs:///user/spark/applicationHistory --conf spark.eventLog.enabled=true --conf spark.eventLog.overwrite=true --conf spark.yarn.historyServer.address=http://spark-history.local:18080/ --conf spark.yarn.dist.files=hdfs:///user/colin.williams/warehouse-counter-0.0.1-SNAPSHOT-uber.jar hdfs:///user/colin.williams/warehouse-counter-0.0.1-SNAPSHOT-uber.jar hdfs://wh/2015/04/10/* 2015-04-10T00:00:00+00:00
    Launcher Type: SUN_STANDARD

    Environment Variables:
    JAVA_HOME=/usr/java/latest
    CLASSPATH=::/usr/lib/spark/conf:/usr/lib/spark/lib/spark-assembly.jar:/etc/hadoop/conf:/usr/lib/hadoop/client/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop/../hadoop-hdfs/./:/usr/lib/hadoop/../hadoop-hdfs/lib/*:/usr/lib/hadoop/../hadoop-hdfs/.//*:/usr/lib/hadoop/../hadoop-yarn/lib/*:/usr/lib/hadoop/../hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/spark/lib/scala-library.jar:/usr/lib/spark/lib/scala-compiler.jar:/usr/lib/spark/lib/jline.jar
    PATH=/home/colin.williams/bin:/home/colin.williams/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/rvm/bin
    SHELL=/bin/bash
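To put the numbers above side by side: each launcher JVM starts with -Xms1024m -Xmx1024m -XX:MaxPermSize=128m, and the failed 716177408-byte (~683 MB) malloc looks like such a JVM trying to commit part of that heap. A rough, back-of-envelope sketch of the arithmetic (assuming several launcher JVMs are alive at once; the figures are taken from the output above, before native/thread-stack overhead):

```shell
# Per-JVM footprint of one spark-submit launcher, from the VM arguments above.
heap_mb=1024       # -Xms1024m / -Xmx1024m
permgen_mb=128     # -XX:MaxPermSize=128m
per_jvm_mb=$((heap_mb + permgen_mb))

free_mb=6881       # "free" column of free -m above
max_jvms=$((free_mb / per_jvm_mb))

echo "per-JVM footprint: ${per_jvm_mb} MB"
echo "launcher JVMs that fit in ${free_mb} MB free: ~${max_jvms}"
```

By this estimate only a handful of launcher JVMs fit in free memory at once, well below the 30-60 being launched.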