OOM on yarn-cluster mode

Julio Antonio Soto Tue, 19 Jan 2016 13:16:48 -0800

Hi,

I'm having trouble submitting Spark jobs in yarn-cluster mode. While the
job runs to completion in yarn-client mode, I hit the following error when
using spark-submit in yarn-cluster mode (simplified):


16/01/19 21:43:31 INFO hive.metastore: Connected to metastore.
16/01/19 21:43:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/19 21:43:32 INFO session.SessionState: Created local directory: /yarn/nm/usercache/julio/appcache/application_1453120455858_0040/container_1453120455858_0040_01_000001/tmp/77350a02-d900-4c84-9456-134305044d21_resources
16/01/19 21:43:32 INFO session.SessionState: Created HDFS directory: /tmp/hive/nobody/77350a02-d900-4c84-9456-134305044d21
16/01/19 21:43:32 INFO session.SessionState: Created local directory: /yarn/nm/usercache/julio/appcache/application_1453120455858_0040/container_1453120455858_0040_01_000001/tmp/nobody/77350a02-d900-4c84-9456-134305044d21
16/01/19 21:43:32 INFO session.SessionState: Created HDFS directory: /tmp/hive/nobody/77350a02-d900-4c84-9456-134305044d21/_tmp_space.db
16/01/19 21:43:32 INFO parquet.ParquetRelation: Listing hdfs://namenode01:8020/user/julio/PFM/CDRs_parquet_np on driver
16/01/19 21:43:33 INFO spark.SparkContext: Starting job: table at code.scala:13
16/01/19 21:43:33 INFO scheduler.DAGScheduler: Got job 0 (table at code.scala:13) with 8 output partitions
16/01/19 21:43:33 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(table at code.scala:13)
16/01/19 21:43:33 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/01/19 21:43:33 INFO scheduler.DAGScheduler: Missing parents: List()
16/01/19 21:43:33 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at table at code.scala:13), which has no missing parents
Exception in thread "dag-scheduler-event-loop"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "dag-scheduler-event-loop"
Exception in thread "SparkListenerBus"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "SparkListenerBus"

It happens with whatever program I build, for example:

object MainClass {
    def main(args:Array[String]):Unit = {
        val conf = (new org.apache.spark.SparkConf()
                         .setAppName("test")
                         )

        val sc = new org.apache.spark.SparkContext(conf)
        val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

        val rdd = (sqlContext.read.table("cdrs_np")
                        .na.drop(how="any")
                        .map(_.toSeq.map(y=>y.toString))
                        .map(x=>(x.head,x.tail))
                        )

        rdd.saveAsTextFile(args(0))
    }
}

The command I'm using in spark-submit is the following:

spark-submit --master yarn \
             --deploy-mode cluster \
             --driver-memory 1G \
             --executor-memory 3000m \
             --executor-cores 1 \
             --num-executors 8 \
             --class MainClass \
             spark-yarn-cluster-test_2.10-0.1.jar \
             hdfs://namenode01/etl/test

My cluster has more than enough resources to run the job (in fact, the
exact same command works with --deploy-mode client).

I tried increasing yarn.app.mapreduce.am.resource.mb to 2 GB, but that
didn't help. I guess there is another parameter I should tweak, but I have
not found any information about it on the Internet.
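
For context on which settings apply here: in cluster mode the Spark driver
runs inside the YARN application master container, so that container is
sized from --driver-memory (spark.driver.memory) plus
spark.yarn.driver.memoryOverhead, whereas yarn.app.mapreduce.am.resource.mb
only sizes the application master of MapReduce jobs. The sketch below is a
variant of the command above with those Spark-side settings made explicit.
The values (2G, 512, 256m) are illustrative assumptions, not tested
recommendations, and the -XX:MaxPermSize option assumes a Java 7 driver JVM
and that the OutOfMemoryError is a PermGen one, which the simplified log
does not confirm.

```shell
# Sketch only: every value below is an assumption for illustration.
#   --driver-memory                   heap of the driver, which in cluster
#                                     mode runs inside the YARN application
#                                     master container
#   spark.yarn.driver.memoryOverhead  extra off-heap memory (in MB) added to
#                                     that container's YARN request
#   spark.driver.extraJavaOptions     JVM flags for the driver; MaxPermSize
#                                     only matters on Java 7 and earlier
spark-submit --master yarn \
             --deploy-mode cluster \
             --driver-memory 2G \
             --conf spark.yarn.driver.memoryOverhead=512 \
             --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" \
             --executor-memory 3000m \
             --executor-cores 1 \
             --num-executors 8 \
             --class MainClass \
             spark-yarn-cluster-test_2.10-0.1.jar \
             hdfs://namenode01/etl/test
```

Which of these settings matters, if any, depends on the full error: a
message naming PermGen space would point at the extra Java option, while a
container killed by YARN for exceeding its memory limit would point at the
memory overhead.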

I'm running Spark 1.5.2 and YARN from Hadoop 2.6.0-cdh5.5.1.


Any help would be greatly appreciated!

Thank you.

-- 
Julio Antonio Soto de Vicente
