Hi,

I tried with --driver-memory 16G (more than enough to read a simple parquet 
table), but the problem still persists.
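
For reference, that is just the command from my original mail below with the driver 
memory bumped (a sketch; same jar, class and paths as before):

spark-submit --master yarn \
             --deploy-mode cluster \
             --driver-memory 16G \
             --executor-memory 3000m \
             --executor-cores 1 \
             --num-executors 8 \
             --class MainClass \
             spark-yarn-cluster-test_2.10-0.1.jar \
             hdfs://namenode01/etl/test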

Everything works fine in yarn-client.

--
Julio Antonio Soto de Vicente

> On Jan 19, 2016, at 22:18, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> 
> You could try increasing the driver memory with "--driver-memory"; it looks like 
> the OOM came from the driver side, so the simple solution is to give the driver 
> more memory.
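> 
> For example, besides the spark-submit flag, the same setting can also be passed 
> as --conf spark.driver.memory=... or put in spark-defaults.conf (value below is 
> just illustrative):
> 
>     spark.driver.memory    4g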
> 
>> On Tue, Jan 19, 2016 at 1:15 PM, Julio Antonio Soto <ju...@esbet.es> wrote:
>> Hi,
>> 
>> I'm having trouble when submitting Spark jobs in yarn-cluster mode. While the 
>> job works and completes in yarn-client mode, I hit the following error when 
>> using spark-submit in yarn-cluster (simplified):
>> 16/01/19 21:43:31 INFO hive.metastore: Connected to metastore.
>> 16/01/19 21:43:32 WARN util.NativeCodeLoader: Unable to load native-hadoop 
>> library for your platform... using builtin-java classes where applicable
>> 16/01/19 21:43:32 INFO session.SessionState: Created local directory: 
>> /yarn/nm/usercache/julio/appcache/application_1453120455858_0040/container_1453120455858_0040_01_000001/tmp/77350a02-d900-4c84-9456-134305044d21_resources
>> 16/01/19 21:43:32 INFO session.SessionState: Created HDFS directory: 
>> /tmp/hive/nobody/77350a02-d900-4c84-9456-134305044d21
>> 16/01/19 21:43:32 INFO session.SessionState: Created local directory: 
>> /yarn/nm/usercache/julio/appcache/application_1453120455858_0040/container_1453120455858_0040_01_000001/tmp/nobody/77350a02-d900-4c84-9456-134305044d21
>> 16/01/19 21:43:32 INFO session.SessionState: Created HDFS directory: 
>> /tmp/hive/nobody/77350a02-d900-4c84-9456-134305044d21/_tmp_space.db
>> 16/01/19 21:43:32 INFO parquet.ParquetRelation: Listing 
>> hdfs://namenode01:8020/user/julio/PFM/CDRs_parquet_np on driver
>> 16/01/19 21:43:33 INFO spark.SparkContext: Starting job: table at 
>> code.scala:13
>> 16/01/19 21:43:33 INFO scheduler.DAGScheduler: Got job 0 (table at 
>> code.scala:13) with 8 output partitions
>> 16/01/19 21:43:33 INFO scheduler.DAGScheduler: Final stage: ResultStage 
>> 0(table at code.scala:13)
>> 16/01/19 21:43:33 INFO scheduler.DAGScheduler: Parents of final stage: List()
>> 16/01/19 21:43:33 INFO scheduler.DAGScheduler: Missing parents: List()
>> 16/01/19 21:43:33 INFO scheduler.DAGScheduler: Submitting ResultStage 0 
>> (MapPartitionsRDD[1] at table at code.scala:13), which has no missing parents
>> Exception in thread "dag-scheduler-event-loop" 
>> Exception: java.lang.OutOfMemoryError thrown from the 
>> UncaughtExceptionHandler in thread "dag-scheduler-event-loop"
>> Exception in thread "SparkListenerBus" 
>> Exception: java.lang.OutOfMemoryError thrown from the 
>> UncaughtExceptionHandler in thread "SparkListenerBus"
>> It happens with whatever program I build, for example:
>> 
>> object MainClass {
>>     def main(args:Array[String]):Unit = {
>>         val conf = (new org.apache.spark.SparkConf()
>>                          .setAppName("test")
>>                          )
>> 
>>         val sc = new org.apache.spark.SparkContext(conf)
>>         val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> 
>>         // read the Hive table, drop rows with nulls, and emit
>>         // (first column, remaining columns) as strings
>>         val rdd = (sqlContext.read.table("cdrs_np")
>>                         .na.drop(how = "any")
>>                         .map(_.toSeq.map(y => y.toString))
>>                         .map(x => (x.head, x.tail))
>>                         )
>> 
>>         rdd.saveAsTextFile(args(0))
>>     }
>> }
>> 
>> The command I'm using in spark-submit is the following:
>> 
>> spark-submit --master yarn \
>>              --deploy-mode cluster \
>>              --driver-memory 1G \
>>              --executor-memory 3000m \
>>              --executor-cores 1 \
>>              --num-executors 8 \
>>              --class MainClass \
>>              spark-yarn-cluster-test_2.10-0.1.jar \
>>              hdfs://namenode01/etl/test
>> 
>> I've got more than enough resources in my cluster to run the job (in fact, 
>> the exact same command works with --deploy-mode client).
>> 
>> I tried increasing yarn.app.mapreduce.am.resource.mb to 2GB, but that 
>> didn't work. I guess there is another parameter I should tweak, but I have 
>> not found any info whatsoever on the Internet.
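>> 
>> For completeness, these are the Spark-side YARN memory settings I've found so 
>> far (not sure which of them, if any, is the right one here; values are just 
>> illustrative):
>> 
>>     --conf spark.driver.memory=4g                 # driver (= AM) heap in yarn-cluster mode
>>     --conf spark.yarn.driver.memoryOverhead=1024  # extra off-heap MB for the driver container
>>     --conf spark.yarn.am.memory=2g                # AM memory, only used in yarn-client mode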
>> 
>> I'm running Spark 1.5.2 and YARN from Hadoop 2.6.0-cdh5.5.1.
>> 
>> 
>> Any help would be greatly appreciated!
>> 
>> Thank you.
>> 
>> -- 
>> Julio Antonio Soto de Vicente
> 
