Hi Yash,
What is your total cluster memory and number of cores?
The problem might be the number of executors you are requesting. The
logs show 168510, which is far too high. Try reducing the number of
executors.
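As a rough sanity check on that number, here is a small Python sketch of
the arithmetic. The executor count, container size and cores per executor
come from your YarnAllocator log lines; the cluster size (100 nodes, 64 GiB
and 16 cores each) is purely hypothetical since I don't know your setup,
and the helper names are mine. At 6758 MB per container the request adds up
to roughly a petabyte of memory, which is why I suspect it.

```python
# Figures taken from the YarnAllocator log lines in the original message.
REQUESTED_EXECUTORS = 168510
CONTAINER_MEMORY_MB = 6758   # per executor, including 614 MB overhead
CORES_PER_EXECUTOR = 2


def total_demand(executors, container_memory_mb, cores_per_executor):
    """Return (total memory in MB, total cores) for the requested executors."""
    return executors * container_memory_mb, executors * cores_per_executor


def max_executors_for_cluster(cluster_memory_mb, cluster_cores,
                              container_memory_mb, cores_per_executor):
    """Rough upper bound on executors that fit, limited by memory or cores.

    Ignores the ApplicationMaster container, YARN's allocation rounding and
    per-node fragmentation, so the real limit will be somewhat lower.
    """
    by_memory = cluster_memory_mb // container_memory_mb
    by_cores = cluster_cores // cores_per_executor
    return min(by_memory, by_cores)


total_memory_mb, total_cores = total_demand(
    REQUESTED_EXECUTORS, CONTAINER_MEMORY_MB, CORES_PER_EXECUTOR)
print(f"Requested: {total_memory_mb / 1024 ** 2:.0f} TiB, {total_cores} cores")

# HYPOTHETICAL cluster, for illustration only: 100 nodes x 64 GiB x 16 cores.
cap = max_executors_for_cluster(
    cluster_memory_mb=100 * 64 * 1024,
    cluster_cores=100 * 16,
    container_memory_mb=CONTAINER_MEMORY_MB,
    cores_per_executor=CORES_PER_EXECUTOR,
)
print(f"--conf spark.dynamicAllocation.maxExecutors={cap}")
```

With your real cluster numbers plugged in, the result is only a starting
point for spark.dynamicAllocation.maxExecutors, the setting that caps what
dynamic allocation may request. You will probably want to go lower than
the computed bound.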
On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:
Hi All,
I have a Spark job that runs over a very large amount of data with
dynamic allocation enabled.
The job takes about 15 minutes to start up and fails as soon as it starts*.
Is there anything I can check to debug this problem? There is not much
information in the logs about the exact cause, but here is a snapshot
below.
Thanks All.
* By "starts" I mean when something shows up on the Spark web UI;
before that it is just a blank page.
Logs here -
{code}
16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number of 168510 executor(s).
16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor containers, each with 2 cores and 6758 MB memory including 614 MB overhead
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 22
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 19
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 18
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 12
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 11
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 20
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 15
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 7
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 8
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 16
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 21
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 6
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 13
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 14
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 9
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 3
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 17
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 1
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 10
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 4
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 2
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 5
16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1 time(s) in a row.
java.lang.StackOverflowError
    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
    at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
{code}
... <trimmed logs>
{code}
16/09/23 06:33:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 7 at RPC address , but got no response. Marking as slave lost.
org.apache.spark.SparkException: Fail to find loss reason for non-existent executor 7
    at org.apache.spark.deploy.yarn.YarnAllocator.enqueueGetLossReasonRequest(YarnAllocator.scala:554)
    at org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint$$anonfun$receiveAndReply$1.applyOrElse(ApplicationMaster.scala:632)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{code}