For testing purposes, can you run with a fixed number of executors and try? 
Maybe 12 executors for testing, and let us know the status.
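
A minimal sketch of what such a test run could look like, assuming the job is 
launched with spark-submit on YARN. The application name "my_job.jar" is a 
placeholder, and the 2 cores / 6g per executor are only inferred from the log 
lines further down (6758 MB including 614 MB overhead); adjust them to your 
setup.

```python
import shlex


def fixed_executor_submit_args(app, num_executors=12, cores=2, memory="6g"):
    """Build a spark-submit argv that pins the executor count.

    Dynamic allocation is switched off explicitly so that the fixed
    --num-executors value is the one that takes effect.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-cores", str(cores),
        "--executor-memory", memory,
        "--conf", "spark.dynamicAllocation.enabled=false",
        app,  # hypothetical application jar or script
    ]


# Print the command line so it can be copied into a shell.
print(" ".join(shlex.quote(a) for a in fixed_executor_submit_args("my_job.jar")))
```

If the job starts cleanly with 12 executors, that would point at the executor 
request size rather than at the job logic.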


On Fri, Sep 23, 2016 at 3:13 PM +0530, "Yash Sharma" <yash...@gmail.com> wrote:

Thanks Aditya, appreciate the help.
I had the same thought about the huge number of executors requested. I am 
using dynamic allocation and not specifying the number of executors. Are you 
suggesting that I should cap the number of executors when the dynamic 
allocator requests more?
It's a 12-node EMR cluster and has more than a TB of memory.
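
For a rough sense of scale, here is a back-of-the-envelope sketch using the 
figures in this thread: 6758 MB per executor container and 168510 requested 
executors, both from the logs. The cluster size is an assumption: "more than 
a TB" is taken as exactly 1 TiB available to YARN, which is probably not the 
real usable figure.

```python
CONTAINER_MB = 6758                # per-executor container size, from the logs
REQUESTED_EXECUTORS = 168510       # executors requested, from the logs
ASSUMED_CLUSTER_MB = 1024 * 1024   # assumption: 1 TiB of memory usable by YARN


def executors_that_fit(cluster_mb, container_mb):
    """How many containers of container_mb fit into cluster_mb of memory."""
    return cluster_mb // container_mb


fit = executors_that_fit(ASSUMED_CLUSTER_MB, CONTAINER_MB)
print("executors that fit by memory:", fit)
print("requested / fit ratio:", round(REQUESTED_EXECUTORS / fit))
```

Under that assumption only about 155 such executors fit in memory at once, so 
the request is roughly a thousand times larger than what the cluster could 
ever grant.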


On Fri, Sep 23, 2016 at 5:12 PM, Aditya <aditya.calangut...@augmentiq.co.in> 
wrote:
Hi Yash,



What is your total cluster memory and number of cores?

The problem might be the number of executors you are allocating. The logs 
show 168510, which is very high. Try reducing your executors.
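
One way to reduce them while keeping dynamic allocation is to put an upper 
bound on it. A minimal sketch, assuming the standard 
spark.dynamicAllocation.* properties; the cap of 100 is only an illustrative 
value, not a recommendation for this cluster, and both helper functions are 
made up for this example.

```python
def capped_dynamic_allocation_conf(max_executors, min_executors=1):
    """Spark properties that keep dynamic allocation on but bound it."""
    if not 1 <= min_executors <= max_executors:
        raise ValueError("need 1 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Dynamic allocation on YARN relies on the external shuffle service.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def as_submit_flags(conf):
    """Turn a property dict into repeated --conf key=value arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


# Illustrative cap of 100 executors (an assumed value).
print(" ".join(as_submit_flags(capped_dynamic_allocation_conf(100))))
```

With a cap in place the allocator should never ask YARN for more than the 
configured maximum, however many tasks are pending.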



On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:


Hi All,

I have a Spark job which runs over a large volume of data with dynamic 
allocation enabled.

The job takes some 15 minutes to start up and fails as soon as it starts*.



Is there anything I can check to debug this problem? There is not a lot of 
information in the logs about the exact cause, but here is a snapshot below.



Thanks All.



* - By "starts" I mean when it shows something on the Spark web UI; before 
that it's just a blank page.



Logs here -



{code}
16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number of 168510 executor(s).
16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor containers, each with 2 cores and 6758 MB memory including 614 MB overhead
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 22
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 19
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 18
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 12
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 11
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 20
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 15
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 7
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 8
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 16
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 21
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 6
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 13
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 14
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 9
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 3
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 17
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 1
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 10
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 4
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 2
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 5
16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1 time(s) in a row.
java.lang.StackOverflowError
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
        at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
{code}



... <trimmed logs>



{code}
16/09/23 06:33:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 7 at RPC address , but got no response. Marking as slave lost.
org.apache.spark.SparkException: Fail to find loss reason for non-existent executor 7
        at org.apache.spark.deploy.yarn.YarnAllocator.enqueueGetLossReasonRequest(YarnAllocator.scala:554)
        at org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint$$anonfun$receiveAndReply$1.applyOrElse(ApplicationMaster.scala:632)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}
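
As a debugging aid, the executor request size can be pulled straight out of 
the ApplicationMaster log. The sketch below assumes the log text has the same 
"Will request N executor containers" wording as the lines above; the helper 
names and the threshold of 1000 are made up for illustration.

```python
import re

# Matches the YarnAllocator message shown in the logs above.
REQUEST_RE = re.compile(r"Will request (\d+) executor containers")


def requested_executor_counts(log_text):
    """Return every executor count requested in the given log text."""
    # Collapse whitespace first so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [int(n) for n in REQUEST_RE.findall(flat)]


def suspicious_requests(log_text, threshold=1000):
    """Requests larger than threshold (an arbitrary illustrative limit)."""
    return [n for n in requested_executor_counts(log_text) if n > threshold]


sample = """16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor containers,
each with 2 cores and 6758 MB memory including 614 MB overhead"""

print(requested_executor_counts(sample))  # [168510]
print(suspicious_requests(sample))        # [168510]
```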


















