"CANNOT FIND ADDRESS" occurs when your executor has crashed. I'll look
further down where it shows each task and see if you see any tasks failed.
Then you can examine the error log of that executor and see why it died.
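
If the executor log shows long GC pauses or the same Akka timeouts you pasted,
one workaround people try is raising the relevant timeouts and turning on GC
logging so the pause shows up in the executor stderr. Something roughly like
this, as a sketch only; the config keys are what I remember from the 1.x line,
so please verify the names and defaults against the docs for your version:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: raise the ask timeout behind "Futures timed out after [30 seconds]"
// and give slow executors more slack before the driver gives up on them.
// All values below are illustrative, not recommendations.
val conf = new SparkConf()
  .setAppName("timeout-tuning-sketch")
  .set("spark.akka.askTimeout", "120")                   // seconds; default ~30 in 1.x
  .set("spark.core.connection.ack.wait.timeout", "120")  // seconds; default ~60
  // Surface GC activity: a 90 s heartbeat gap usually means a long GC pause
  // or an overloaded node, and this makes it visible in the executor log.
  .set("spark.executor.extraJavaOptions",
    "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
val sc = new SparkContext(conf)

If the GC log confirms long pauses, tuning executor memory or the job's
partitioning is usually a better fix than just raising timeouts.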

On Wed, Oct 29, 2014 at 9:35 AM, akhandeshi <ami.khande...@gmail.com> wrote:

> SparkApplication UI shows that one of the executors reports "CANNOT FIND ADDRESS":
>
> Aggregated Metrics by Executor
>
> Executor ID | Address                                | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input | Shuffle Read | Shuffle Write | Shuffle Spill (Memory) | Shuffle Spill (Disk)
> 0           | mddworker1.c.fi-mdd-poc.internal:42197 | 0 ms      | 0           | 0            | 0               | 0.0 B | 136.1 MB     | 184.9 MB      | 146.8 GB               | 135.4 MB
> 1           | CANNOT FIND ADDRESS                    | 0 ms      | 0           | 0            | 0               | 0.0 B | 87.4 MB      | 142.0 MB      | 61.4 GB                | 81.4 MB
>
> I also see the following in the log of the executor with which the driver may
> have lost communication.
>
> 14/10/29 13:18:33 WARN : Master_Client Heartbeat last execution took 90859 ms. Longer than the FIXED_EXECUTION_INTERVAL_MS 5000
> 14/10/29 13:18:33 WARN : WorkerClientToWorkerHeartbeat last execution took 90859 ms. Longer than the FIXED_EXECUTION_INTERVAL_MS 1000
> 14/10/29 13:18:33 WARN AkkaUtils: Error sending message in 1 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:362)
>
> I have also seen other variations of timeouts:
>
> 14/10/29 06:21:05 WARN SendingConnection: Error finishing connection to mddworker1.c.fi-mdd-poc.internal/10.240.179.241:40442
> java.net.ConnectException: Connection refused
> 14/10/29 06:21:05 ERROR BlockManager: Failed to report broadcast_6_piece0 to master; giving up.
>
> or
>
> 14/10/29 07:23:40 WARN AkkaUtils: Error sending message in 1 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:218)
>         at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:58)
>         at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:310)
>         at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:190)
>         at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:188)
>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>         at org.apache.spark.util.TimeStampedHashMap.foreach(TimeStampedHashMap.scala:107)
>         at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>         at org.apache.spark.storage.BlockManager.reportAllBlocks(BlockManager.scala:188)
>         at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:207)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:366)
>
> How do I track down what is causing this problem? Any suggestions on
> solutions, debugging, or workarounds would be helpful!
>
