It turns out this problem was related to
http://apache-spark-developers-list.1001551.n3.nabble.com/Lost-executor-on-YARN-ALS-iterations-td7916.html
and increasing the executor memory fixed it for me.
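
For anyone hitting the same error, here is a rough sketch of what "increasing
the executor memory" can look like when launching the job. It only assembles a
spark-submit command line; the values "4g" and 1024 and the output path are
made-up placeholders, not the settings used in this thread. The
spark.yarn.executor.memoryOverhead key is the Spark 1.x name for the extra
off-heap allowance on YARN, and "yarn-client" is the Spark 1.x master string
(newer releases use "--master yarn --deploy-mode client").

```python
import shlex


def build_submit_command(script, output_path, executor_memory="4g",
                         overhead_mb=None, master="yarn-client"):
    """Assemble a spark-submit argument list with a larger executor heap.

    executor_memory and overhead_mb are illustrative values only; tune them
    to what the YARN containers on your cluster can actually provide.
    """
    cmd = ["spark-submit", "--master", master,
           "--executor-memory", executor_memory]
    if overhead_mb is not None:
        # Extra memory (in MB) that YARN grants each executor container on
        # top of the JVM heap; Spark 1.x configuration key.
        cmd += ["--conf", "spark.yarn.executor.memoryOverhead=%d" % overhead_mb]
    cmd += [script, output_path]
    return cmd


# Hypothetical invocation: the script name comes from the traceback below,
# the output path is a placeholder.
command = build_submit_command("spark_lda.py", "hdfs:///path/to/output",
                               executor_memory="4g", overhead_mb=1024)
print(" ".join(shlex.quote(part) for part in command))
```

The same heap setting can also be given as spark.executor.memory in the Spark
configuration instead of on the command line.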
______________________________________________________________


Hi,

I am getting ExecutorLostFailure when I run Spark on YARN and perform very
long-running tasks (a couple of hours each) inside map. The error log is below.

Is there a setting that allows Spark to run such long tasks in map?

Thank you very much for any advice.

Best regards,
Jan 
 
Spark log:
4533,931: [GC 394578K->20882K(1472000K), 0,0226470 secs]
Traceback (most recent call last):
  File "/home/hadoop/spark_stuff/spark_lda.py", line 112, in <module>
    models.saveAsTextFile(sys.argv[1])
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 0.0 failed 4 times, most recent failure: Lost task 28.3 in stage 0.0 (TID 41, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor lost)
Driver stacktrace:
        at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 
 
 
Yarn log:
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:41091 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:39160 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:45058 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-241.us-west-2.compute.internal:54111 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-238.us-west-2.compute.internal:45772 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-241.us-west-2.compute.internal:59509 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-238.us-west-2.compute.internal:35720 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509)
14/11/08 08:21:11 ERROR network.ConnectionManager: Corresponding 
SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509) not found
14/11/08 08:21:11 INFO cluster.YarnClientSchedulerBackend: Executor 10 
disconnected, so removing it
14/11/08 08:21:11 ERROR cluster.YarnClientClusterScheduler: Lost executor 10 on 
ip-172-16-1-241.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:11 INFO scheduler.TaskSetManager: Re-queueing tasks for 10 from 
TaskSet 0.0
14/11/08 08:21:11 WARN scheduler.TaskSetManager: Lost task 28.0 in stage 0.0 
(TID 28, ip-172-16-1-241.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:11 INFO scheduler.DAGScheduler: Executor lost: 10 (epoch 0)
14/11/08 08:21:11 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 10 from BlockManagerMaster.
14/11/08 08:21:11 INFO storage.BlockManagerMaster: Removed 10 successfully in 
removeExecutor
14/11/08 08:21:20 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823)
14/11/08 08:21:20 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823)
14/11/08 08:21:20 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823)
14/11/08 08:21:20 INFO cluster.YarnClientSchedulerBackend: Executor 5 
disconnected, so removing it
14/11/08 08:21:20 ERROR cluster.YarnClientClusterScheduler: Lost executor 5 on 
ip-172-16-1-194.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:20 INFO scheduler.TaskSetManager: Re-queueing tasks for 5 from 
TaskSet 0.0
14/11/08 08:21:20 WARN scheduler.TaskSetManager: Lost task 21.0 in stage 0.0 
(TID 21, ip-172-16-1-194.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:20 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 1)
14/11/08 08:21:20 INFO network.ConnectionManager: key already cancelled ? 
sun.nio.ch.SelectionKeyImpl@3bb633cd
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at 
org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:289)
        at 
org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
14/11/08 08:21:20 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 5 from BlockManagerMaster.
14/11/08 08:21:20 INFO storage.BlockManagerMaster: Removed 5 successfully in 
removeExecutor
14/11/08 08:21:21 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928)
14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928)
14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928)
14/11/08 08:21:21 INFO cluster.YarnClientSchedulerBackend: Executor 27 
disconnected, so removing it
14/11/08 08:21:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 27 on 
ip-172-16-1-92.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 27 from 
TaskSet 0.0
14/11/08 08:21:21 WARN scheduler.TaskSetManager: Lost task 27.0 in stage 0.0 
(TID 27, ip-172-16-1-92.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:21 INFO scheduler.DAGScheduler: Executor lost: 27 (epoch 2)
14/11/08 08:21:21 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 27 from BlockManagerMaster.
14/11/08 08:21:21 INFO storage.BlockManagerMaster: Removed 27 successfully in 
removeExecutor
14/11/08 08:21:21 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091)
14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091)
14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091)
14/11/08 08:21:21 INFO cluster.YarnClientSchedulerBackend: Executor 20 
disconnected, so removing it
14/11/08 08:21:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 20 on 
ip-172-16-1-152.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 20 from 
TaskSet 0.0
14/11/08 08:21:21 WARN scheduler.TaskSetManager: Lost task 29.0 in stage 0.0 
(TID 29, ip-172-16-1-152.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:21 INFO scheduler.DAGScheduler: Executor lost: 20 (epoch 3)
14/11/08 08:21:21 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 20 from BlockManagerMaster.
14/11/08 08:21:21 INFO storage.BlockManagerMaster: Removed 20 successfully in 
removeExecutor
14/11/08 08:21:26 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269)
14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269)
14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269)
14/11/08 08:21:26 INFO cluster.YarnClientSchedulerBackend: Executor 6 
disconnected, so removing it
14/11/08 08:21:26 ERROR cluster.YarnClientClusterScheduler: Lost executor 6 on 
ip-172-16-1-23.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:26 INFO scheduler.TaskSetManager: Re-queueing tasks for 6 from 
TaskSet 0.0
14/11/08 08:21:26 WARN scheduler.TaskSetManager: Lost task 24.0 in stage 0.0 
(TID 24, ip-172-16-1-23.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:26 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 4)
14/11/08 08:21:26 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 6 from BlockManagerMaster.
14/11/08 08:21:26 INFO storage.BlockManagerMaster: Removed 6 successfully in 
removeExecutor
14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792)
14/11/08 08:21:26 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792)
14/11/08 08:21:26 ERROR network.ConnectionManager: Corresponding 
SendingConnection to 
ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792) not found
14/11/08 08:21:26 INFO cluster.YarnClientSchedulerBackend: Executor 21 
disconnected, so removing it
14/11/08 08:21:26 ERROR cluster.YarnClientClusterScheduler: Lost executor 21 on 
ip-172-16-1-90.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:26 INFO scheduler.TaskSetManager: Re-queueing tasks for 21 from 
TaskSet 0.0
14/11/08 08:21:26 WARN scheduler.TaskSetManager: Lost task 25.0 in stage 0.0 
(TID 25, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:26 INFO scheduler.DAGScheduler: Executor lost: 21 (epoch 5)
14/11/08 08:21:26 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 21 from BlockManagerMaster.
14/11/08 08:21:26 INFO storage.BlockManagerMaster: Removed 21 successfully in 
removeExecutor
14/11/08 08:21:29 INFO cluster.YarnClientSchedulerBackend: Executor 18 
disconnected, so removing it
14/11/08 08:21:29 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883)
14/11/08 08:21:29 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883)
14/11/08 08:21:29 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883)
14/11/08 08:21:29 ERROR cluster.YarnClientClusterScheduler: Lost executor 18 on 
ip-172-16-1-222.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:29 INFO scheduler.TaskSetManager: Re-queueing tasks for 18 from 
TaskSet 0.0
14/11/08 08:21:29 WARN scheduler.TaskSetManager: Lost task 26.0 in stage 0.0 
(TID 26, ip-172-16-1-222.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:29 INFO scheduler.DAGScheduler: Executor lost: 18 (epoch 6)
14/11/08 08:21:29 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 18 from BlockManagerMaster.
14/11/08 08:21:29 INFO storage.BlockManagerMaster: Removed 18 successfully in 
removeExecutor
14/11/08 08:21:30 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-194.us-west-2.compute.internal:50858/user/Executor#935992941]
 with ID 31
14/11/08 08:21:30 INFO scheduler.TaskSetManager: Starting task 26.1 in stage 
0.0 (TID 30, ip-172-16-1-194.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:30 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-194.us-west-2.compute.internal:44263 with 776.3 MB RAM
14/11/08 08:21:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-194.us-west-2.compute.internal:44263 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:33 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102)
14/11/08 08:21:33 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102)
14/11/08 08:21:33 ERROR network.ConnectionManager: Corresponding 
SendingConnection to 
ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102) not found
14/11/08 08:21:33 INFO cluster.YarnClientSchedulerBackend: Executor 26 
disconnected, so removing it
14/11/08 08:21:33 ERROR cluster.YarnClientClusterScheduler: Lost executor 26 on 
ip-172-16-1-222.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:33 INFO scheduler.TaskSetManager: Re-queueing tasks for 26 from 
TaskSet 0.0
14/11/08 08:21:33 WARN scheduler.TaskSetManager: Lost task 23.0 in stage 0.0 
(TID 23, ip-172-16-1-222.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:33 INFO scheduler.DAGScheduler: Executor lost: 26 (epoch 7)
14/11/08 08:21:33 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 26 from BlockManagerMaster.
14/11/08 08:21:33 INFO storage.BlockManagerMaster: Removed 26 successfully in 
removeExecutor
14/11/08 08:21:36 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:36 INFO cluster.YarnClientSchedulerBackend: Executor 1 
disconnected, so removing it
14/11/08 08:21:36 ERROR cluster.YarnClientClusterScheduler: Lost executor 1 on 
ip-172-16-1-241.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:36 INFO scheduler.TaskSetManager: Re-queueing tasks for 1 from 
TaskSet 0.0
14/11/08 08:21:36 WARN scheduler.TaskSetManager: Lost task 22.0 in stage 0.0 
(TID 22, ip-172-16-1-241.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:21:36 ERROR network.SendingConnection: Exception while reading 
SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
java.nio.channels.ClosedChannelException
        at 
sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
        at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
        at 
org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/11/08 08:21:36 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 8)
14/11/08 08:21:36 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 1 from BlockManagerMaster.
14/11/08 08:21:36 INFO storage.BlockManagerMaster: Removed 1 successfully in 
removeExecutor
14/11/08 08:21:36 INFO network.ConnectionManager: Handling connection error on 
connection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310)
14/11/08 08:21:40 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-194.us-west-2.compute.internal:58099/user/Executor#-112835629]
 with ID 34
14/11/08 08:21:40 INFO scheduler.TaskSetManager: Starting task 22.1 in stage 
0.0 (TID 31, ip-172-16-1-194.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:41 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-194.us-west-2.compute.internal:41093 with 776.3 MB RAM
14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-228.us-west-2.compute.internal:36136/user/Executor#318736262]
 with ID 32
14/11/08 08:21:41 INFO scheduler.TaskSetManager: Starting task 23.1 in stage 
0.0 (TID 32, ip-172-16-1-228.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:33130/user/Executor#1744030597]
 with ID 33
14/11/08 08:21:41 INFO scheduler.TaskSetManager: Starting task 25.1 in stage 
0.0 (TID 33, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-92.us-west-2.compute.internal:55503/user/Executor#574084779]
 with ID 35
14/11/08 08:21:41 INFO scheduler.TaskSetManager: Starting task 24.1 in stage 
0.0 (TID 34, ip-172-16-1-92.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-228.us-west-2.compute.internal:40128 with 776.3 MB RAM
14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-90.us-west-2.compute.internal:32839 with 776.3 MB RAM
14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-92.us-west-2.compute.internal:58081 with 776.3 MB RAM
14/11/08 08:21:42 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-194.us-west-2.compute.internal:41093 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-228.us-west-2.compute.internal:40128 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-92.us-west-2.compute.internal:58081 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-90.us-west-2.compute.internal:32839 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:43 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-152.us-west-2.compute.internal:34268/user/Executor#-937582169]
 with ID 36
14/11/08 08:21:43 INFO scheduler.TaskSetManager: Starting task 29.1 in stage 
0.0 (TID 35, ip-172-16-1-152.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:44 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-152.us-west-2.compute.internal:52550 with 776.3 MB RAM
14/11/08 08:21:45 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-152.us-west-2.compute.internal:52550 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:46 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:34555/user/Executor#-94727554]
 with ID 37
14/11/08 08:21:46 INFO scheduler.TaskSetManager: Starting task 27.1 in stage 
0.0 (TID 36, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:46 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-228.us-west-2.compute.internal:34471/user/Executor#1412546630]
 with ID 38
14/11/08 08:21:46 INFO scheduler.TaskSetManager: Starting task 21.1 in stage 
0.0 (TID 37, ip-172-16-1-228.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:47 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-90.us-west-2.compute.internal:46194 with 776.3 MB RAM
14/11/08 08:21:47 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-228.us-west-2.compute.internal:42275 with 776.3 MB RAM
14/11/08 08:21:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-90.us-west-2.compute.internal:46194 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-228.us-west-2.compute.internal:42275 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:21:50 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-23.us-west-2.compute.internal:37122/user/Executor#1404320204]
 with ID 39
14/11/08 08:21:51 INFO scheduler.TaskSetManager: Starting task 28.1 in stage 
0.0 (TID 38, ip-172-16-1-23.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:21:51 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-23.us-west-2.compute.internal:33106 with 776.3 MB RAM
14/11/08 08:21:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-23.us-west-2.compute.internal:33106 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:22:36 INFO cluster.YarnClientSchedulerBackend: Executor 39 
disconnected, so removing it
14/11/08 08:22:36 ERROR cluster.YarnClientClusterScheduler: Lost executor 39 on 
ip-172-16-1-23.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:22:36 INFO scheduler.TaskSetManager: Re-queueing tasks for 39 from 
TaskSet 0.0
14/11/08 08:22:36 WARN scheduler.TaskSetManager: Lost task 28.1 in stage 0.0 
(TID 38, ip-172-16-1-23.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:22:36 INFO scheduler.DAGScheduler: Executor lost: 39 (epoch 9)
14/11/08 08:22:36 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 39 from BlockManagerMaster.
14/11/08 08:22:36 INFO storage.BlockManagerMaster: Removed 39 successfully in 
removeExecutor
14/11/08 08:22:57 INFO cluster.YarnClientSchedulerBackend: Executor 36 
disconnected, so removing it
14/11/08 08:22:57 ERROR cluster.YarnClientClusterScheduler: Lost executor 36 on 
ip-172-16-1-152.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:22:57 INFO scheduler.TaskSetManager: Re-queueing tasks for 36 from 
TaskSet 0.0
14/11/08 08:22:57 WARN scheduler.TaskSetManager: Lost task 29.1 in stage 0.0 
(TID 35, ip-172-16-1-152.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:22:57 INFO scheduler.DAGScheduler: Executor lost: 36 (epoch 10)
14/11/08 08:22:57 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 36 from BlockManagerMaster.
14/11/08 08:22:57 INFO storage.BlockManagerMaster: Removed 36 successfully in 
removeExecutor
14/11/08 08:23:00 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:48033/user/Executor#-1088273404]
 with ID 40
14/11/08 08:23:00 INFO scheduler.TaskSetManager: Starting task 29.2 in stage 
0.0 (TID 39, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:23:01 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-90.us-west-2.compute.internal:39067 with 776.3 MB RAM
14/11/08 08:23:03 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-90.us-west-2.compute.internal:39067 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:23:15 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-23.us-west-2.compute.internal:48860/user/Executor#-369895446]
 with ID 41
14/11/08 08:23:15 INFO scheduler.TaskSetManager: Starting task 28.2 in stage 
0.0 (TID 40, ip-172-16-1-23.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:23:16 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-23.us-west-2.compute.internal:38093 with 776.3 MB RAM
14/11/08 08:23:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-23.us-west-2.compute.internal:38093 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:23:32 INFO cluster.YarnClientSchedulerBackend: Executor 34 
disconnected, so removing it
14/11/08 08:23:32 ERROR cluster.YarnClientClusterScheduler: Lost executor 34 on 
ip-172-16-1-194.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:23:32 INFO scheduler.TaskSetManager: Re-queueing tasks for 34 from 
TaskSet 0.0
14/11/08 08:23:32 WARN scheduler.TaskSetManager: Lost task 22.1 in stage 0.0 
(TID 31, ip-172-16-1-194.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:23:32 INFO scheduler.DAGScheduler: Executor lost: 34 (epoch 11)
14/11/08 08:23:32 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 34 from BlockManagerMaster.
14/11/08 08:23:32 INFO storage.BlockManagerMaster: Removed 34 successfully in 
removeExecutor
14/11/08 08:23:53 INFO cluster.YarnClientSchedulerBackend: Executor 41 
disconnected, so removing it
14/11/08 08:23:53 ERROR cluster.YarnClientClusterScheduler: Lost executor 41 on 
ip-172-16-1-23.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:23:53 INFO scheduler.TaskSetManager: Re-queueing tasks for 41 from 
TaskSet 0.0
14/11/08 08:23:53 WARN scheduler.TaskSetManager: Lost task 28.2 in stage 0.0 
(TID 40, ip-172-16-1-23.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:23:53 INFO scheduler.DAGScheduler: Executor lost: 41 (epoch 12)
14/11/08 08:23:53 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 41 from BlockManagerMaster.
14/11/08 08:23:53 INFO storage.BlockManagerMaster: Removed 41 successfully in 
removeExecutor
14/11/08 08:23:57 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:58017/user/Executor#2094507560]
 with ID 42
14/11/08 08:23:57 INFO scheduler.TaskSetManager: Starting task 28.3 in stage 
0.0 (TID 41, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:23:58 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-90.us-west-2.compute.internal:41182 with 776.3 MB RAM
14/11/08 08:24:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-90.us-west-2.compute.internal:41182 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:24:04 INFO cluster.YarnClientSchedulerBackend: Executor 35 
disconnected, so removing it
14/11/08 08:24:04 ERROR cluster.YarnClientClusterScheduler: Lost executor 35 on 
ip-172-16-1-92.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:24:04 INFO scheduler.TaskSetManager: Re-queueing tasks for 35 from 
TaskSet 0.0
14/11/08 08:24:04 WARN scheduler.TaskSetManager: Lost task 24.1 in stage 0.0 
(TID 34, ip-172-16-1-92.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:24:04 INFO scheduler.DAGScheduler: Executor lost: 35 (epoch 13)
14/11/08 08:24:04 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 35 from BlockManagerMaster.
14/11/08 08:24:04 INFO storage.BlockManagerMaster: Removed 35 successfully in 
removeExecutor
14/11/08 08:24:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:36395/user/Executor#-1907878650]
 with ID 43
14/11/08 08:24:17 INFO scheduler.TaskSetManager: Starting task 24.2 in stage 
0.0 (TID 42, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:24:18 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-90.us-west-2.compute.internal:46948 with 776.3 MB RAM
14/11/08 08:24:20 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-90.us-west-2.compute.internal:46948 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:24:21 INFO cluster.YarnClientSchedulerBackend: Executor 40 
disconnected, so removing it
14/11/08 08:24:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 40 on 
ip-172-16-1-90.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:24:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 40 from 
TaskSet 0.0
14/11/08 08:24:21 WARN scheduler.TaskSetManager: Lost task 29.2 in stage 0.0 
(TID 39, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:24:21 INFO scheduler.DAGScheduler: Executor lost: 40 (epoch 14)
14/11/08 08:24:21 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 40 from BlockManagerMaster.
14/11/08 08:24:21 INFO storage.BlockManagerMaster: Removed 40 successfully in 
removeExecutor
14/11/08 08:24:31 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:34467/user/Executor#-1100688472]
 with ID 44
14/11/08 08:24:31 INFO scheduler.TaskSetManager: Starting task 29.3 in stage 
0.0 (TID 43, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:24:32 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-90.us-west-2.compute.internal:40126 with 776.3 MB RAM
14/11/08 08:24:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-90.us-west-2.compute.internal:40126 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:24:48 INFO cluster.YarnClientSchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:53257/user/Executor#-745380917]
 with ID 45
14/11/08 08:24:48 INFO scheduler.TaskSetManager: Starting task 22.2 in stage 
0.0 (TID 44, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 
bytes)
14/11/08 08:24:49 INFO storage.BlockManagerMasterActor: Registering block 
manager ip-172-16-1-90.us-west-2.compute.internal:46252 with 776.3 MB RAM
14/11/08 08:24:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on ip-172-16-1-90.us-west-2.compute.internal:46252 (size: 596.9 KB, 
free: 775.7 MB)
14/11/08 08:25:16 INFO cluster.YarnClientSchedulerBackend: Executor 38 
disconnected, so removing it
14/11/08 08:25:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 38 on 
ip-172-16-1-228.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:25:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 38 from 
TaskSet 0.0
14/11/08 08:25:16 WARN scheduler.TaskSetManager: Lost task 21.1 in stage 0.0 
(TID 37, ip-172-16-1-228.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:25:16 INFO scheduler.DAGScheduler: Executor lost: 38 (epoch 15)
14/11/08 08:25:16 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 38 from BlockManagerMaster.
14/11/08 08:25:16 INFO storage.BlockManagerMaster: Removed 38 successfully in 
removeExecutor
14/11/08 08:25:37 INFO cluster.YarnClientSchedulerBackend: Executor 42 
disconnected, so removing it
14/11/08 08:25:37 ERROR cluster.YarnClientClusterScheduler: Lost executor 42 on 
ip-172-16-1-90.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:25:37 INFO scheduler.TaskSetManager: Re-queueing tasks for 42 from 
TaskSet 0.0
14/11/08 08:25:37 WARN scheduler.TaskSetManager: Lost task 28.3 in stage 0.0 
(TID 41, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure 
(executor lost)
14/11/08 08:25:37 ERROR scheduler.TaskSetManager: Task 28 in stage 0.0 failed 4 
times; aborting job
14/11/08 08:25:37 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0
14/11/08 08:25:37 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled
14/11/08 08:25:37 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at 
NativeMethodAccessorImpl.java:-2
14/11/08 08:25:37 INFO scheduler.DAGScheduler: Executor lost: 42 (epoch 16)
14/11/08 08:25:37 INFO storage.BlockManagerMasterActor: Trying to remove 
executor 42 from BlockManagerMaster.
14/11/08 08:25:37 INFO storage.BlockManagerMaster: Removed 42 successfully in 
removeExecutor
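
The log above repeats the same "Lost executor ... / Lost task ..." pattern many
times. A minimal sketch for tallying it, assuming the driver output has been
saved as text; the function name and the short sample are illustrative, and the
regular expressions only target the two message shapes visible in this log.

```python
import re
from collections import Counter

# Message shapes taken from the log in this thread.
LOST_EXECUTOR = re.compile(r"Lost executor (\d+) on (\S+):")
LOST_TASK = re.compile(r"WARN scheduler\.TaskSetManager: Lost task (\d+)\.(\d+) in stage")


def summarize(log_text):
    """Count lost executors per host and failed attempts per task index."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    hosts = Counter(host for _, host in LOST_EXECUTOR.findall(flat))
    attempts = Counter(task for task, _ in LOST_TASK.findall(flat))
    return hosts, attempts


sample = """
14/11/08 08:21:11 ERROR cluster.YarnClientClusterScheduler: Lost executor 10 on
ip-172-16-1-241.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:21:11 WARN scheduler.TaskSetManager: Lost task 28.0 in stage 0.0
(TID 28, ip-172-16-1-241.us-west-2.compute.internal): ExecutorLostFailure
(executor lost)
14/11/08 08:22:36 ERROR cluster.YarnClientClusterScheduler: Lost executor 39 on
ip-172-16-1-23.us-west-2.compute.internal: remote Akka client disassociated
14/11/08 08:22:36 WARN scheduler.TaskSetManager: Lost task 28.1 in stage 0.0
(TID 38, ip-172-16-1-23.us-west-2.compute.internal): ExecutorLostFailure
(executor lost)
"""

hosts, attempts = summarize(sample)
for host, count in sorted(hosts.items()):
    print(host, count)
for task, count in sorted(attempts.items()):
    print("task", task, "lost", count, "times")
```

A task index that reaches four lost attempts is the one that triggers the
"failed 4 times; aborting job" line.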

