The Spark application UI shows "CANNOT FIND ADDRESS" for one of the executors:
Aggregated Metrics by Executor

Executor ID | Address                                | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input | Shuffle Read | Shuffle Write | Shuffle Spill (Memory) | Shuffle Spill (Disk)
0           | mddworker1.c.fi-mdd-poc.internal:42197 | 0 ms      | 0           | 0            | 0               | 0.0 B | 136.1 MB     | 184.9 MB      | 146.8 GB               | 135.4 MB
1           | CANNOT FIND ADDRESS                    | 0 ms      | 0           | 0            | 0               | 0.0 B | 87.4 MB      | 142.0 MB      | 61.4 GB                | 81.4 MB

I also see the following in the log of the executor with which the driver
may have lost communication:

14/10/29 13:18:33 WARN : Master_Client Heartbeat last execution took 90859 ms. Longer than the FIXED_EXECUTION_INTERVAL_MS 5000
14/10/29 13:18:33 WARN : WorkerClientToWorkerHeartbeat last execution took 90859 ms. Longer than the FIXED_EXECUTION_INTERVAL_MS 1000
14/10/29 13:18:33 WARN AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:362)
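
Since both heartbeat threads on that executor were delayed by roughly 90
seconds, I suspect a long GC pause. To check, I plan to enable GC logging
on the executors. A minimal sketch of what I have in mind, assuming
spark.executor.extraJavaOptions is the right knob (the JVM flags are
standard HotSpot options; the app name is just a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: turn on verbose GC logging for the executors so that long
    // pauses show up in each executor's stdout alongside the warnings above.
    val conf = new SparkConf()
      .setAppName("gc-pause-investigation")  // hypothetical app name
      .set("spark.executor.extraJavaOptions",
           "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
    val sc = new SparkContext(conf)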

I have also seen other variations of these timeouts:

14/10/29 06:21:05 WARN SendingConnection: Error finishing connection to mddworker1.c.fi-mdd-poc.internal/10.240.179.241:40442
java.net.ConnectException: Connection refused
14/10/29 06:21:05 ERROR BlockManager: Failed to report broadcast_6_piece0 to master; giving up.

or

14/10/29 07:23:40 WARN AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
        at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:218)
        at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:58)
        at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:310)
        at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:190)
        at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:188)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at org.apache.spark.util.TimeStampedHashMap.foreach(TimeStampedHashMap.scala:107)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.storage.BlockManager.reportAllBlocks(BlockManager.scala:188)
        at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:207)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:366)

How do I track down what is causing this problem? Any suggestions on
solutions, debugging, or workarounds would be helpful!
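
As a possible stopgap while I investigate, I'm considering raising the
ask/ack timeouts so that transient pauses don't kill block reporting. A
rough sketch of what I mean (the values are guesses, and I'd appreciate
confirmation that these are the right properties for this code path):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: raise the 30s Akka ask timeout seen in the first stack trace
    // and the connection ack timeout; the values are illustrative guesses.
    val conf = new SparkConf()
      .set("spark.akka.askTimeout", "120")                  // seconds; default 30
      .set("spark.core.connection.ack.wait.timeout", "120") // seconds; default 60
    val sc = new SparkContext(conf)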