Re: CANNOT FIND ADDRESS
No luck :( -- still observing the same behavior!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CANNOT-FIND-ADDRESS-tp17637p17988.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: CANNOT FIND ADDRESS
Try this: spark.storage.memoryFraction 0.9

On 31 Oct 2014 20:27, akhandeshi ami.khande...@gmail.com wrote:
> Thanks for the pointers! I did try them, but they didn't seem to help...

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CANNOT-FIND-ADDRESS-tp17637p17824.html
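If the job is launched with spark-submit, the override suggested above can also be passed on the command line. A sketch only, for Spark 1.x: `--conf` is the generic property flag, and the class name `SimpleFileMerge` is a placeholder (the real main class is not named in this thread); the jar path and driver memory are taken from the properties posted below.

```shell
# Sketch: override the storage memory fraction at submit time (Spark 1.x).
# SimpleFileMerge is a hypothetical class name; adjust for your application.
spark-submit \
  --master "local[16]" \
  --driver-memory 35g \
  --conf spark.storage.memoryFraction=0.9 \
  --class SimpleFileMerge \
  /home/ami_khandeshi_gmail_com/SparkExample-1.0.jar
```

Note that the ExternalAppendOnlyMap spills quoted in this message are governed by a different knob, spark.shuffle.memoryFraction (default 0.2 in Spark 1.x), so that setting may be the more relevant one to raise.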
Re: CANNOT FIND ADDRESS
Thanks for the pointers! I did try them, but they didn't seem to help... In my latest attempt I am doing spark-submit local, but I see the same message in the Spark App UI (port 4040): localhost CANNOT FIND ADDRESS.

In the logs, I see a lot of spilling of in-memory maps to disk, and I don't understand why that is the case: there should be over 35 GB of RAM available, and the input is not significantly large. It may be linked to the performance issues that I am seeing (I have another post seeking advice on that). It seems I am not able to tune Spark sufficiently to execute my process successfully.

14/10/31 13:45:12 INFO ExternalAppendOnlyMap: Thread 206 spilling in-memory map of 1777 MB to disk (2 times so far)
14/10/31 13:45:12 INFO ExternalAppendOnlyMap: Thread 228 spilling in-memory map of 393 MB to disk (1 time so far)
14/10/31 13:45:12 INFO ExternalAppendOnlyMap: Thread 259 spilling in-memory map of 392 MB to disk (2 times so far)
14/10/31 13:45:14 INFO ExternalAppendOnlyMap: Thread 230 spilling in-memory map of 554 MB to disk (2 times so far)
14/10/31 13:45:15 INFO ExternalAppendOnlyMap: Thread 235 spilling in-memory map of 3990 MB to disk (1 time so far)
14/10/31 13:45:15 INFO ExternalAppendOnlyMap: Thread 236 spilling in-memory map of 2667 MB to disk (1 time so far)
14/10/31 13:45:17 INFO ExternalAppendOnlyMap: Thread 259 spilling in-memory map of 825 MB to disk (3 times so far)
14/10/31 13:45:24 INFO ExternalAppendOnlyMap: Thread 228 spilling in-memory map of 4618 MB to disk (2 times so far)
14/10/31 13:45:26 INFO ExternalAppendOnlyMap: Thread 233 spilling in-memory map of 15869 MB to disk (1 time so far)
14/10/31 13:45:37 INFO ExternalAppendOnlyMap: Thread 206 spilling in-memory map of 3026 MB to disk (3 times so far)
14/10/31 13:45:48 INFO ExternalAppendOnlyMap: Thread 228 spilling in-memory map of 401 MB to disk (3 times so far)
14/10/31 13:45:48 INFO ExternalAppendOnlyMap: Thread 259 spilling in-memory map of 392 MB to disk (4 times so far)

My spark properties are:

Name                                     Value
spark.akka.frameSize                     50
spark.akka.timeout                       900
spark.app.name                           Simple File Merge Application
spark.core.connection.ack.wait.timeout   900
spark.default.parallelism                10
spark.driver.host                        spark-single.c.fi-mdd-poc.internal
spark.driver.memory                      35g
spark.driver.port                        40255
spark.eventLog.enabled                   true
spark.fileserver.uri                     http://10.240.106.135:59255
spark.jars                               file:/home/ami_khandeshi_gmail_com/SparkExample-1.0.jar
spark.master                             local[16]
spark.scheduler.mode                     FIFO
spark.shuffle.consolidateFiles           true
spark.storage.memoryFraction             0.3
spark.tachyonStore.baseDir               /tmp
spark.tachyonStore.folderName            spark-21ad0fd2-2177-48ce-9242-8dbb33f2a1f1
spark.tachyonStore.url                   tachyon://mdd:19998

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CANNOT-FIND-ADDRESS-tp17637p17824.html
Re: CANNOT FIND ADDRESS
CANNOT FIND ADDRESS occurs when your executor has crashed. I'd look further down, where it shows each task, and see whether any tasks failed. Then you can examine the error log of that executor and see why it died.

On Wed, Oct 29, 2014 at 9:35 AM, akhandeshi ami.khande...@gmail.com wrote:

The Spark Application UI shows that one of the executors Cannot Find Address:

Aggregated Metrics by Executor

Executor ID  Address                                 Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
0            mddworker1.c.fi-mdd-poc.internal:42197  0 ms       0            0             0                0.0 B  136.1 MB      184.9 MB       146.8 GB                135.4 MB
1            CANNOT FIND ADDRESS                     0 ms       0            0             0                0.0 B  87.4 MB       142.0 MB       61.4 GB                 81.4 MB

I also see the following in the log of the executor with which the driver may have lost communication:

14/10/29 13:18:33 WARN : Master_Client Heartbeat last execution took 90859 ms. Longer than the FIXED_EXECUTION_INTERVAL_MS 5000
14/10/29 13:18:33 WARN : WorkerClientToWorkerHeartbeat last execution took 90859 ms. Longer than the FIXED_EXECUTION_INTERVAL_MS 1000
14/10/29 13:18:33 WARN AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:362)

I have also seen other variations of timeouts:

14/10/29 06:21:05 WARN SendingConnection: Error finishing connection to mddworker1.c.fi-mdd-poc.internal/10.240.179.241:40442
java.net.ConnectException: Connection refused
14/10/29 06:21:05 ERROR BlockManager: Failed to report broadcast_6_piece0 to master; giving up.

or

14/10/29 07:23:40 WARN AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
        at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:218)
        at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:58)
        at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:310)
        at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:190)
        at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:188)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at org.apache.spark.util.TimeStampedHashMap.foreach(TimeStampedHashMap.scala:107)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.storage.BlockManager.reportAllBlocks(BlockManager.scala:188)
        at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:207)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:366)

How do I track down what is causing this problem? Any suggestions on a solution, debugging, or a workaround would be helpful!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CANNOT-FIND-ADDRESS-tp17637.html
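Following the advice above about examining the crashed executor's error log: a sketch of how one might scan an executor's stderr for fatal errors. The real log location is an assumption (in standalone mode it is typically under the worker's work directory, e.g. $SPARK_HOME/work/<app-id>/<executor-id>/stderr); the example below runs against an inline sample built from log lines quoted in this thread.

```shell
# Sketch: scan an executor log for fatal errors. The sample file stands in
# for $SPARK_HOME/work/<app-id>/<executor-id>/stderr on a real worker.
cat > /tmp/executor_stderr.sample <<'EOF'
14/10/29 13:18:33 WARN : Master_Client Heartbeat last execution took 90859 ms.
14/10/29 06:21:05 ERROR BlockManager: Failed to report broadcast_6_piece0 to master; giving up.
EOF

# Print only the lines that usually explain why an executor died.
grep -E "ERROR|OutOfMemoryError|SIGTERM" /tmp/executor_stderr.sample
```

On a real deployment, OutOfMemoryError or a SIGTERM from the OS (e.g. the kernel OOM killer) in this output would corroborate the crashed-executor explanation.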
Re: CANNOT FIND ADDRESS
Thanks... hmm, it seems to be a timeout issue, perhaps? Not sure what is causing it, or how to debug it. I see the following error message:

14/10/29 13:26:04 ERROR ContextCleaner: Error cleaning broadcast 9
akka.pattern.AskTimeoutException: Timed out
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
        at akka.actor.Scheduler$$anon$11.run(Scheduler.scala:118)
        at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:455)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.executeBucket$1(Scheduler.scala:407)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:411)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
        at java.lang.Thread.run(Thread.java:745)
14/10/29 13:26:04 WARN BlockManagerMaster: Failed to remove broadcast 9 with removeFromMaster = true - Timed out

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CANNOT-FIND-ADDRESS-tp17637p17646.html
Re: CANNOT FIND ADDRESS
Can you try setting the following while creating the SparkContext and see if the issue still exists?

    .set("spark.core.connection.ack.wait.timeout", "900")
    .set("spark.akka.frameSize", "50")
    .set("spark.akka.timeout", "900")

It looks like your executor is stuck on a GC pause.

Thanks
Best Regards

On Wed, Oct 29, 2014 at 9:20 PM, akhandeshi ami.khande...@gmail.com wrote:
> Thanks... hmm, it seems to be a timeout issue, perhaps? Not sure what is causing it, or how to debug it...

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/CANNOT-FIND-ADDRESS-tp17637p17646.html
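One way to check the GC-pause theory above (a sketch, not from the thread; the flags are standard HotSpot options) is to enable GC logging and look for multi-second pauses. Since this job runs with spark.master local[16], the executor shares the driver JVM, so the flags go on the driver rather than spark.executor.extraJavaOptions; the class name `SimpleFileMerge` is a placeholder, and the jar path comes from the spark.jars property posted earlier in the thread.

```shell
# Sketch: surface GC pauses in the JVM that hosts the local-mode executor.
# -verbose:gc / -XX:+PrintGCDetails / -XX:+PrintGCTimeStamps are standard
# HotSpot flags; multi-second full GCs in the output would explain the
# Akka ask timeouts and missed heartbeats quoted in this thread.
spark-submit \
  --master "local[16]" \
  --driver-memory 35g \
  --driver-java-options "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --class SimpleFileMerge \
  /home/ami_khandeshi_gmail_com/SparkExample-1.0.jar
```

If long pauses do show up, the usual levers in this Spark version are a lower spark.storage.memoryFraction, more partitions (spark.default.parallelism), or simply less data per JVM.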