Hi Yash,

What is your total cluster memory and number of cores?
The problem might be the number of executors you are allocating. The logs show the driver requesting 168510 executors, which is very high. Try reducing the number of executors you request.
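
With dynamic allocation enabled you can also cap the request explicitly rather than letting it grow with the input. Here is a minimal sketch of what that could look like; the values are placeholders picked for illustration, not tuned for your cluster:

{code}
// Sketch only: bounding dynamic allocation with explicit limits (Scala).
// The numbers below are illustrative placeholders, not recommendations.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("bulk-job")                                 // placeholder app name
  .set("spark.shuffle.service.enabled", "true")           // required for dynamic allocation on YARN
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")
  .set("spark.dynamicAllocation.maxExecutors", "200")     // hard cap on what the driver may request
  .set("spark.executor.cores", "2")
  .set("spark.executor.memory", "6g")

val sc = new SparkContext(conf)
{code}

The same properties can also be passed as --conf options to spark-submit instead of being set in code.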

On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:
Hi All,
I have a Spark job that runs over a huge bulk of data with dynamic allocation enabled.
The job takes some 15 minutes to start up and fails as soon as it starts*.

Is there anything I can check to debug this problem? There is not a lot of information in the logs about the exact cause, but here is a snapshot below.

Thanks All.

* - by starts, I mean when it shows something on the Spark web UI; before that it's just a blank page.

Logs here -

{code}
16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number of 168510 executor(s).
16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor containers, each with 2 cores and 6758 MB memory including 614 MB overhead
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 22
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 19
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 18
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 12
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 11
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 20
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 15
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 7
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 8
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 16
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 21
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 6
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 13
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 14
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 9
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 3
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 17
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 1
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 10
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 4
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 2
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 5
16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1 time(s) in a row.
java.lang.StackOverflowError
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
{code}

... <trimmed logs>

{code}
16/09/23 06:33:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 7 at RPC address , but got no response. Marking as slave lost.
org.apache.spark.SparkException: Fail to find loss reason for non-existent executor 7
        at org.apache.spark.deploy.yarn.YarnAllocator.enqueueGetLossReasonRequest(YarnAllocator.scala:554)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint$$anonfun$receiveAndReply$1.applyOrElse(ApplicationMaster.scala:632)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}




