[ 
https://issues.apache.org/jira/browse/SPARK-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Saputra updated SPARK-2586:
---------------------------------

    Labels: tachyon  (was: )

> Lack of information to figure out that the connection to the Tachyon master 
> is inactive/down
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-2586
>                 URL: https://issues.apache.org/jira/browse/SPARK-2586
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Henry Saputra
>              Labels: tachyon
>
> When running Spark with Tachyon, if the connection to the Tachyon master is 
> down (due to a network problem or the master node being down), there is no 
> clear log or error message to indicate it.
> Here is a sample log from running the SparkTachyonPi example with Tachyon 
> connected:
> 14/07/15 16:43:10 INFO Utils: Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 14/07/15 16:43:10 WARN Utils: Your hostname, henry-pivotal.local resolves to 
> a loopback address: 127.0.0.1; using 10.64.5.148 instead (on interface en5)
> 14/07/15 16:43:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 14/07/15 16:43:11 INFO SecurityManager: Changing view acls to: hsaputra
> 14/07/15 16:43:11 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(hsaputra)
> 14/07/15 16:43:11 INFO Slf4jLogger: Slf4jLogger started
> 14/07/15 16:43:11 INFO Remoting: Starting remoting
> 14/07/15 16:43:11 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sp...@office-5-148.pa.gopivotal.com:53203]
> 14/07/15 16:43:11 INFO Remoting: Remoting now listens on addresses: 
> [akka.tcp://sp...@office-5-148.pa.gopivotal.com:53203]
> 14/07/15 16:43:11 INFO SparkEnv: Registering MapOutputTracker
> 14/07/15 16:43:11 INFO SparkEnv: Registering BlockManagerMaster
> 14/07/15 16:43:11 INFO DiskBlockManager: Created local directory at 
> /var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3w0000gp/T/spark-local-20140715164311-e63c
> 14/07/15 16:43:11 INFO ConnectionManager: Bound socket to port 53204 with id 
> = ConnectionManagerId(office-5-148.pa.gopivotal.com,53204)
> 14/07/15 16:43:11 INFO MemoryStore: MemoryStore started with capacity 2.1 GB
> 14/07/15 16:43:11 INFO BlockManagerMaster: Trying to register BlockManager
> 14/07/15 16:43:11 INFO BlockManagerMasterActor: Registering block manager 
> office-5-148.pa.gopivotal.com:53204 with 2.1 GB RAM
> 14/07/15 16:43:11 INFO BlockManagerMaster: Registered BlockManager
> 14/07/15 16:43:11 INFO HttpServer: Starting HTTP Server
> 14/07/15 16:43:11 INFO HttpBroadcast: Broadcast server started at 
> http://10.64.5.148:53205
> 14/07/15 16:43:11 INFO HttpFileServer: HTTP File server directory is 
> /var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3w0000gp/T/spark-b2fb12ae-4608-4833-87b6-b335da00738e
> 14/07/15 16:43:11 INFO HttpServer: Starting HTTP Server
> 14/07/15 16:43:12 INFO SparkUI: Started SparkUI at 
> http://office-5-148.pa.gopivotal.com:4040
> 2014-07-15 16:43:12.210 java[39068:1903] Unable to load realm info from 
> SCDynamicStore
> 14/07/15 16:43:12 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 14/07/15 16:43:12 INFO SparkContext: Added JAR 
> examples/target/scala-2.10/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar at 
> http://10.64.5.148:53206/jars/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar 
> with timestamp 1405467792813
> 14/07/15 16:43:12 INFO AppClient$ClientActor: Connecting to master 
> spark://henry-pivotal.local:7077...
> 14/07/15 16:43:12 INFO SparkContext: Starting job: reduce at 
> SparkTachyonPi.scala:43
> 14/07/15 16:43:12 INFO DAGScheduler: Got job 0 (reduce at 
> SparkTachyonPi.scala:43) with 2 output partitions (allowLocal=false)
> 14/07/15 16:43:12 INFO DAGScheduler: Final stage: Stage 0(reduce at 
> SparkTachyonPi.scala:43)
> 14/07/15 16:43:12 INFO DAGScheduler: Parents of final stage: List()
> 14/07/15 16:43:12 INFO DAGScheduler: Missing parents: List()
> 14/07/15 16:43:12 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map 
> at SparkTachyonPi.scala:39), which has no missing parents
> 14/07/15 16:43:13 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 
> (MappedRDD[1] at map at SparkTachyonPi.scala:39)
> 14/07/15 16:43:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
> 14/07/15 16:43:13 INFO SparkDeploySchedulerBackend: Connected to Spark 
> cluster with app ID app-20140715164313-0000
> 14/07/15 16:43:13 INFO AppClient$ClientActor: Executor added: 
> app-20140715164313-0000/0 on 
> worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
> (office-5-148.pa.gopivotal.com:52519) with 8 cores
> 14/07/15 16:43:13 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20140715164313-0000/0 on hostPort office-5-148.pa.gopivotal.com:52519 
> with 8 cores, 512.0 MB RAM
> 14/07/15 16:43:13 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164313-0000/0 is now RUNNING
> 14/07/15 16:43:15 INFO SparkDeploySchedulerBackend: Registered executor: 
> Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:53213/user/Executor#-423405256]
>  with ID 0
> 14/07/15 16:43:15 INFO TaskSetManager: Re-computing pending task lists.
> 14/07/15 16:43:15 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on 
> executor 0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:43:15 INFO TaskSetManager: Serialized task 0.0:0 as 1428 bytes in 
> 3 ms
> 14/07/15 16:43:15 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on 
> executor 0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:43:15 INFO TaskSetManager: Serialized task 0.0:1 as 1428 bytes in 
> 1 ms
> 14/07/15 16:43:15 INFO BlockManagerMasterActor: Registering block manager 
> office-5-148.pa.gopivotal.com:53218 with 294.9 MB RAM
> 14/07/15 16:43:16 INFO BlockManagerInfo: Added rdd_0_1 on tachyon on 
> office-5-148.pa.gopivotal.com:53218 (size: 977.2 KB)
> 14/07/15 16:43:16 INFO BlockManagerInfo: Added rdd_0_0 on tachyon on 
> office-5-148.pa.gopivotal.com:53218 (size: 977.2 KB)
> 14/07/15 16:43:16 INFO TaskSetManager: Finished TID 0 in 1307 ms on 
> office-5-148.pa.gopivotal.com (progress: 1/2)
> 14/07/15 16:43:16 INFO TaskSetManager: Finished TID 1 in 1300 ms on 
> office-5-148.pa.gopivotal.com (progress: 2/2)
> 14/07/15 16:43:16 INFO DAGScheduler: Completed ResultTask(0, 0)
> 14/07/15 16:43:16 INFO DAGScheduler: Completed ResultTask(0, 1)
> 14/07/15 16:43:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks 
> have all completed, from pool 
> 14/07/15 16:43:16 INFO DAGScheduler: Stage 0 (reduce at 
> SparkTachyonPi.scala:43) finished in 3.336 s
> 14/07/15 16:43:16 INFO SparkContext: Job finished: reduce at 
> SparkTachyonPi.scala:43, took 3.413498 s
> Pi is roughly 3.14254
> 14/07/15 16:43:16 INFO SparkUI: Stopped Spark web UI at 
> http://office-5-148.pa.gopivotal.com:4040
> 14/07/15 16:43:16 INFO DAGScheduler: Stopping DAGScheduler
> 14/07/15 16:43:16 INFO SparkDeploySchedulerBackend: Shutting down all 
> executors
> 14/07/15 16:43:16 INFO SparkDeploySchedulerBackend: Asking each executor to 
> shut down
> 14/07/15 16:43:17 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor 
> stopped!
> 14/07/15 16:43:17 INFO ConnectionManager: Selector thread was interrupted!
> 14/07/15 16:43:17 INFO ConnectionManager: ConnectionManager stopped
> 14/07/15 16:43:17 INFO MemoryStore: MemoryStore cleared
> 14/07/15 16:43:17 INFO BlockManager: BlockManager stopped
> 14/07/15 16:43:17 INFO BlockManagerMasterActor: Stopping BlockManagerMaster
> 14/07/15 16:43:17 INFO BlockManagerMaster: BlockManagerMaster stopped
> 14/07/15 16:43:17 INFO SparkContext: Successfully stopped SparkContext
> 14/07/15 16:43:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting 
> down remote daemon.
> 14/07/15 16:43:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote 
> daemon shut down; proceeding with flushing remote transports.
> Process finished with exit code 0
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> And here is the log when the Tachyon master cannot be reached:
> 14/07/15 16:49:17 INFO Utils: Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 14/07/15 16:49:17 WARN Utils: Your hostname, henry-pivotal.local resolves to 
> a loopback address: 127.0.0.1; using 10.64.5.148 instead (on interface en5)
> 14/07/15 16:49:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 14/07/15 16:49:17 INFO SecurityManager: Changing view acls to: hsaputra
> 14/07/15 16:49:17 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(hsaputra)
> 14/07/15 16:49:17 INFO Slf4jLogger: Slf4jLogger started
> 14/07/15 16:49:17 INFO Remoting: Starting remoting
> 14/07/15 16:49:17 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sp...@office-5-148.pa.gopivotal.com:54541]
> 14/07/15 16:49:17 INFO Remoting: Remoting now listens on addresses: 
> [akka.tcp://sp...@office-5-148.pa.gopivotal.com:54541]
> 14/07/15 16:49:17 INFO SparkEnv: Registering MapOutputTracker
> 14/07/15 16:49:17 INFO SparkEnv: Registering BlockManagerMaster
> 14/07/15 16:49:17 INFO DiskBlockManager: Created local directory at 
> /var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3w0000gp/T/spark-local-20140715164917-bf9e
> 14/07/15 16:49:17 INFO ConnectionManager: Bound socket to port 54542 with id 
> = ConnectionManagerId(office-5-148.pa.gopivotal.com,54542)
> 14/07/15 16:49:17 INFO MemoryStore: MemoryStore started with capacity 2.1 GB
> 14/07/15 16:49:17 INFO BlockManagerMaster: Trying to register BlockManager
> 14/07/15 16:49:17 INFO BlockManagerMasterActor: Registering block manager 
> office-5-148.pa.gopivotal.com:54542 with 2.1 GB RAM
> 14/07/15 16:49:17 INFO BlockManagerMaster: Registered BlockManager
> 14/07/15 16:49:17 INFO HttpServer: Starting HTTP Server
> 14/07/15 16:49:17 INFO HttpBroadcast: Broadcast server started at 
> http://10.64.5.148:54543
> 14/07/15 16:49:17 INFO HttpFileServer: HTTP File server directory is 
> /var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3w0000gp/T/spark-400178c7-8c6e-4e44-9610-926bd1f84877
> 14/07/15 16:49:17 INFO HttpServer: Starting HTTP Server
> 14/07/15 16:49:18 INFO SparkUI: Started SparkUI at 
> http://office-5-148.pa.gopivotal.com:4040
> 2014-07-15 16:49:18.144 java[39346:1903] Unable to load realm info from 
> SCDynamicStore
> 14/07/15 16:49:18 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 14/07/15 16:49:18 INFO SparkContext: Added JAR 
> examples/target/scala-2.10/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar at 
> http://10.64.5.148:54544/jars/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar 
> with timestamp 1405468158551
> 14/07/15 16:49:18 INFO AppClient$ClientActor: Connecting to master 
> spark://henry-pivotal.local:7077...
> 14/07/15 16:49:18 INFO SparkContext: Starting job: reduce at 
> SparkTachyonPi.scala:43
> 14/07/15 16:49:18 INFO DAGScheduler: Got job 0 (reduce at 
> SparkTachyonPi.scala:43) with 2 output partitions (allowLocal=false)
> 14/07/15 16:49:18 INFO DAGScheduler: Final stage: Stage 0(reduce at 
> SparkTachyonPi.scala:43)
> 14/07/15 16:49:18 INFO DAGScheduler: Parents of final stage: List()
> 14/07/15 16:49:18 INFO DAGScheduler: Missing parents: List()
> 14/07/15 16:49:18 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map 
> at SparkTachyonPi.scala:39), which has no missing parents
> 14/07/15 16:49:18 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 
> (MappedRDD[1] at map at SparkTachyonPi.scala:39)
> 14/07/15 16:49:18 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
> 14/07/15 16:49:18 INFO SparkDeploySchedulerBackend: Connected to Spark 
> cluster with app ID app-20140715164918-0001
> 14/07/15 16:49:18 INFO AppClient$ClientActor: Executor added: 
> app-20140715164918-0001/0 on 
> worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
> (office-5-148.pa.gopivotal.com:52519) with 8 cores
> 14/07/15 16:49:18 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20140715164918-0001/0 on hostPort office-5-148.pa.gopivotal.com:52519 
> with 8 cores, 512.0 MB RAM
> 14/07/15 16:49:18 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/0 is now RUNNING
> 14/07/15 16:49:20 INFO SparkDeploySchedulerBackend: Registered executor: 
> Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54548/user/Executor#-221675010]
>  with ID 0
> 14/07/15 16:49:20 INFO TaskSetManager: Re-computing pending task lists.
> 14/07/15 16:49:20 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on 
> executor 0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:49:20 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 
> 1 ms
> 14/07/15 16:49:20 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on 
> executor 0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:49:20 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 
> 0 ms
> 14/07/15 16:49:20 INFO BlockManagerMasterActor: Registering block manager 
> office-5-148.pa.gopivotal.com:54553 with 294.9 MB RAM
> 14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Executor 0 disconnected, 
> so removing it
> 14/07/15 16:49:26 ERROR TaskSchedulerImpl: Lost executor 0 on 
> office-5-148.pa.gopivotal.com: remote Akka client disassociated
> 14/07/15 16:49:26 INFO TaskSetManager: Re-queueing tasks for 0 from TaskSet 
> 0.0
> 14/07/15 16:49:26 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/0 is now EXITED (Command exited with code 55)
> 14/07/15 16:49:26 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
> 14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Executor 
> app-20140715164918-0001/0 removed: Command exited with code 55
> 14/07/15 16:49:26 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
> 14/07/15 16:49:26 INFO AppClient$ClientActor: Executor added: 
> app-20140715164918-0001/1 on 
> worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
> (office-5-148.pa.gopivotal.com:52519) with 8 cores
> 14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20140715164918-0001/1 on hostPort office-5-148.pa.gopivotal.com:52519 
> with 8 cores, 512.0 MB RAM
> 14/07/15 16:49:26 INFO DAGScheduler: Executor lost: 0 (epoch 0)
> 14/07/15 16:49:26 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/1 is now RUNNING
> 14/07/15 16:49:26 INFO BlockManagerMasterActor: Trying to remove executor 0 
> from BlockManagerMaster.
> 14/07/15 16:49:26 INFO BlockManagerMaster: Removed 0 successfully in 
> removeExecutor
> 14/07/15 16:49:28 INFO SparkDeploySchedulerBackend: Registered executor: 
> Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54573/user/Executor#1564333236]
>  with ID 1
> 14/07/15 16:49:28 INFO TaskSetManager: Re-computing pending task lists.
> 14/07/15 16:49:28 INFO TaskSetManager: Starting task 0.0:0 as TID 2 on 
> executor 1: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:49:28 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 
> 0 ms
> 14/07/15 16:49:28 INFO TaskSetManager: Starting task 0.0:1 as TID 3 on 
> executor 1: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:49:28 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 
> 0 ms
> 14/07/15 16:49:28 INFO BlockManagerMasterActor: Registering block manager 
> office-5-148.pa.gopivotal.com:54578 with 294.9 MB RAM
> 14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Executor 1 disconnected, 
> so removing it
> 14/07/15 16:49:34 ERROR TaskSchedulerImpl: Lost executor 1 on 
> office-5-148.pa.gopivotal.com: remote Akka client disassociated
> 14/07/15 16:49:34 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
> 0.0
> 14/07/15 16:49:34 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
> 14/07/15 16:49:34 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
> 14/07/15 16:49:34 INFO DAGScheduler: Executor lost: 1 (epoch 1)
> 14/07/15 16:49:34 INFO BlockManagerMasterActor: Trying to remove executor 1 
> from BlockManagerMaster.
> 14/07/15 16:49:34 INFO BlockManagerMaster: Removed 1 successfully in 
> removeExecutor
> 14/07/15 16:49:34 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/1 is now EXITED (Command exited with code 55)
> 14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Executor 
> app-20140715164918-0001/1 removed: Command exited with code 55
> 14/07/15 16:49:34 INFO AppClient$ClientActor: Executor added: 
> app-20140715164918-0001/2 on 
> worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
> (office-5-148.pa.gopivotal.com:52519) with 8 cores
> 14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20140715164918-0001/2 on hostPort office-5-148.pa.gopivotal.com:52519 
> with 8 cores, 512.0 MB RAM
> 14/07/15 16:49:34 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/2 is now RUNNING
> 14/07/15 16:49:37 INFO SparkDeploySchedulerBackend: Registered executor: 
> Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54599/user/Executor#-557403228]
>  with ID 2
> 14/07/15 16:49:37 INFO TaskSetManager: Re-computing pending task lists.
> 14/07/15 16:49:37 INFO TaskSetManager: Starting task 0.0:1 as TID 4 on 
> executor 2: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:49:37 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 
> 1 ms
> 14/07/15 16:49:37 INFO TaskSetManager: Starting task 0.0:0 as TID 5 on 
> executor 2: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:49:37 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 
> 0 ms
> 14/07/15 16:49:37 INFO BlockManagerMasterActor: Registering block manager 
> office-5-148.pa.gopivotal.com:54604 with 294.9 MB RAM
> 14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Executor 2 disconnected, 
> so removing it
> 14/07/15 16:49:43 ERROR TaskSchedulerImpl: Lost executor 2 on 
> office-5-148.pa.gopivotal.com: remote Akka client disassociated
> 14/07/15 16:49:43 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 
> 0.0
> 14/07/15 16:49:43 WARN TaskSetManager: Lost TID 5 (task 0.0:0)
> 14/07/15 16:49:43 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
> 14/07/15 16:49:43 INFO DAGScheduler: Executor lost: 2 (epoch 2)
> 14/07/15 16:49:43 INFO BlockManagerMasterActor: Trying to remove executor 2 
> from BlockManagerMaster.
> 14/07/15 16:49:43 INFO BlockManagerMaster: Removed 2 successfully in 
> removeExecutor
> 14/07/15 16:49:43 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/2 is now EXITED (Command exited with code 55)
> 14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Executor 
> app-20140715164918-0001/2 removed: Command exited with code 55
> 14/07/15 16:49:43 INFO AppClient$ClientActor: Executor added: 
> app-20140715164918-0001/3 on 
> worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
> (office-5-148.pa.gopivotal.com:52519) with 8 cores
> 14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20140715164918-0001/3 on hostPort office-5-148.pa.gopivotal.com:52519 
> with 8 cores, 512.0 MB RAM
> 14/07/15 16:49:43 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/3 is now RUNNING
> 14/07/15 16:49:45 INFO SparkDeploySchedulerBackend: Registered executor: 
> Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54627/user/Executor#-1697612197]
>  with ID 3
> 14/07/15 16:49:45 INFO TaskSetManager: Re-computing pending task lists.
> 14/07/15 16:49:45 INFO TaskSetManager: Starting task 0.0:1 as TID 6 on 
> executor 3: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:49:45 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 
> 0 ms
> 14/07/15 16:49:45 INFO TaskSetManager: Starting task 0.0:0 as TID 7 on 
> executor 3: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
> 14/07/15 16:49:45 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 
> 0 ms
> 14/07/15 16:49:45 INFO BlockManagerMasterActor: Registering block manager 
> office-5-148.pa.gopivotal.com:54634 with 294.9 MB RAM
> 14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: Executor 3 disconnected, 
> so removing it
> 14/07/15 16:49:51 ERROR TaskSchedulerImpl: Lost executor 3 on 
> office-5-148.pa.gopivotal.com: remote Akka client disassociated
> 14/07/15 16:49:51 INFO TaskSetManager: Re-queueing tasks for 3 from TaskSet 
> 0.0
> 14/07/15 16:49:51 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
> 14/07/15 16:49:51 ERROR TaskSetManager: Task 0.0:0 failed 4 times; aborting 
> job
> 14/07/15 16:49:51 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks 
> have all completed, from pool 
> 14/07/15 16:49:51 INFO TaskSchedulerImpl: Cancelling stage 0
> 14/07/15 16:49:51 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/3 is now EXITED (Command exited with code 55)
> 14/07/15 16:49:51 INFO DAGScheduler: Failed to run reduce at 
> SparkTachyonPi.scala:43
> Exception in thread "main" 14/07/15 16:49:51 INFO 
> SparkDeploySchedulerBackend: Executor app-20140715164918-0001/3 removed: 
> Command exited with code 55
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 
> failed 4 times, most recent failure: TID 7 on host 
> office-5-148.pa.gopivotal.com failed for unknown reason
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1046)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1030)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1028)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:632)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:632)
> at scala.Option.foreach(Option.scala:236)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:632)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/07/15 16:49:51 INFO AppClient$ClientActor: Executor added: 
> app-20140715164918-0001/4 on 
> worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
> (office-5-148.pa.gopivotal.com:52519) with 8 cores
> 14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20140715164918-0001/4 on hostPort office-5-148.pa.gopivotal.com:52519 
> with 8 cores, 512.0 MB RAM
> 14/07/15 16:49:51 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/4 is now RUNNING
> 14/07/15 16:49:51 INFO DAGScheduler: Executor lost: 3 (epoch 3)
> 14/07/15 16:49:51 INFO BlockManagerMasterActor: Trying to remove executor 3 
> from BlockManagerMaster.
> 14/07/15 16:49:51 INFO BlockManagerMaster: Removed 3 successfully in 
> removeExecutor
> Process finished with exit code 1
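Until Spark surfaces a clearer error for this case, one workaround is to probe the Tachyon master's RPC port from the driver before submitting the job. A minimal sketch in Scala; the helper name and timeout are illustrative, and the host/port must be replaced with your actual Tachyon master address (Tachyon's default master port is 19998):

```scala
import java.net.{InetSocketAddress, Socket}

object TachyonProbe {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  // This only checks reachability of the port, not that a healthy Tachyon
  // master is actually serving on it.
  def isMasterReachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers ConnectException (port closed) and SocketTimeoutException.
      case _: java.io.IOException => false
    } finally {
      socket.close()
    }
  }
}
```

Calling this before constructing the SparkContext lets the driver fail fast with an explicit message (e.g. "Tachyon master unreachable at host:19998") instead of the repeated executor exits with code 55 seen in the log above.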



--
This message was sent by Atlassian JIRA
(v6.2#6252)