When running a job in standalone cluster mode, the process hangs at random 
during a DataFrame flatMap or explode operation in HiveContext:

-->> df.flatMap(r => for (n <- 1 to r.getInt(ind)) yield r)
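
To make the intent of that call explicit: it emits each row as many times as 
the integer stored in column ind. Below is a minimal sketch of the same logic 
on plain Scala collections rather than a DataFrame. The names ReplicateRows 
and replicate and the sample rows are hypothetical, and a Seq[Any] stands in 
for a Spark Row, so this only illustrates the semantics and does not reproduce 
the hang.

```scala
// Plain Scala collections stand in for the DataFrame; no Spark is required.
// Each "row" is a Seq[Any]; `ind` is the index of the Int column holding the
// repeat count (hypothetical layout, chosen for illustration only).
object ReplicateRows {
  def replicate(rows: Seq[Seq[Any]], ind: Int): Seq[Seq[Any]] =
    // Same shape as the original lambda: yield the row once per n in 1..count.
    // A count of 0 produces an empty range, so that row is dropped.
    rows.flatMap(r => for (n <- 1 to r(ind).asInstanceOf[Int]) yield r)

  def main(args: Array[String]): Unit = {
    val rows = Seq(Seq("a", 2), Seq("b", 0), Seq("c", 3))
    val out = replicate(rows, 1)
    println(out.size) // prints 5 (2 + 0 + 3 copies)
  }
}
```

Since the output size is the sum of the counts, a few rows with very large 
values in that column can inflate a single partition considerably.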

This does not happen with SQLContext in cluster mode, nor with either 
HiveContext or SQLContext in local mode, where it works fine.

A couple of minutes after the hang, executors start dropping. I am attaching 
the logs below.

Saif




15/10/07 12:15:19 INFO TaskSetManager: Finished task 50.0 in stage 17.0 (TID 
166) in 2511 ms on 162.101.194.47 (180/200)
15/10/07 12:15:19 INFO TaskSetManager: Finished task 66.0 in stage 17.0 (TID 
182) in 2510 ms on 162.101.194.47 (181/200)
15/10/07 12:15:19 INFO TaskSetManager: Finished task 110.0 in stage 17.0 (TID 
226) in 2505 ms on 162.101.194.47 (182/200)
15/10/07 12:15:19 INFO TaskSetManager: Finished task 74.0 in stage 17.0 (TID 
190) in 2530 ms on 162.101.194.47 (183/200)
15/10/07 12:15:19 INFO TaskSetManager: Finished task 106.0 in stage 17.0 (TID 
222) in 2530 ms on 162.101.194.47 (184/200)
15/10/07 12:20:01 WARN HeartbeatReceiver: Removing executor 2 with no recent 
heartbeats: 141447 ms exceeds timeout 120000 ms
15/10/07 12:20:01 ERROR TaskSchedulerImpl: Lost executor 2 on 162.101.194.44: 
Executor heartbeat timed out after 141447 ms
15/10/07 12:20:01 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 17.0
15/10/07 12:20:01 WARN TaskSetManager: Lost task 113.0 in stage 17.0 (TID 229, 
162.101.194.44): ExecutorLostFailure (executor 2 lost)
15/10/07 12:20:01 WARN TaskSetManager: Lost task 73.0 in stage 17.0 (TID 189, 
162.101.194.44): ExecutorLostFailure (executor 2 lost)
15/10/07 12:20:01 WARN TaskSetManager: Lost task 81.0 in stage 17.0 (TID 197, 
162.101.194.44): ExecutorLostFailure (executor 2 lost)
15/10/07 12:20:01 INFO TaskSetManager: Starting task 81.1 in stage 17.0 (TID 
316, 162.101.194.45, PROCESS_LOCAL, 2045 bytes)
15/10/07 12:20:01 INFO TaskSetManager: Starting task 73.1 in stage 17.0 (TID 
317, 162.101.194.44, PROCESS_LOCAL, 2045 bytes)
15/10/07 12:20:01 INFO TaskSetManager: Starting task 113.1 in stage 17.0 (TID 
318, 162.101.194.48, PROCESS_LOCAL, 2045 bytes)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Requesting to kill 
executor(s) 2
15/10/07 12:20:01 INFO DAGScheduler: Executor lost: 2 (epoch 4)
15/10/07 12:20:01 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 
from BlockManagerMaster.
15/10/07 12:20:01 INFO BlockManagerMasterEndpoint: Removing block manager 
BlockManagerId(2, 162.101.194.44, 42537)
15/10/07 12:20:01 INFO BlockManagerMaster: Removed 2 successfully in 
removeExecutor
15/10/07 12:20:01 INFO ShuffleMapStage: ShuffleMapStage 15 is now unavailable 
on executor 2 (1/2, false)
15/10/07 12:20:01 INFO ShuffleMapStage: ShuffleMapStage 16 is now unavailable 
on executor 2 (8/16, false)
15/10/07 12:20:01 INFO DAGScheduler: Host added was in lost list earlier: 
162.101.194.44
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/69 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/69 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/69 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/69 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/69 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/69 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 69
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/70 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/70 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/70 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/70 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/70 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/70 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 70
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/71 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/71 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/71 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/71 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/71 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/71 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 71
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/72 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/72 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/72 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/72 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/72 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/72 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 72
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/73 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/73 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/73 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/73 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/73 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/73 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 73
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/74 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/74 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/74 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/74 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/74 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/74 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 74
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/75 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/75 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/75 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/75 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/75 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/75 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 75
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/76 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/76 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/76 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/76 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/76 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/76 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 76
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/77 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/77 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/77 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/77 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/77 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/77 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 77
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/78 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/78 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/78 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/78 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/78 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/78 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 78
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/79 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/79 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/79 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/79 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/79 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 79
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/80 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/80 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/80 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/80 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/80 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 80
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/81 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/81 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/81 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/81 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/81 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 81
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/82 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/82 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/82 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/82 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/82 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 82
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/83 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/83 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/83 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/83 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/83 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 83
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/84 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/84 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/84 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/84 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/84 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 84
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/85 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/85 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/85 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/85 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/85 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 85
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/86 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/86 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/86 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/86 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/86 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 86
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: 
app-20151007121501-0022/87 on worker-20151007063932-162.101.194.44-57091 
(162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20151007121501-0022/87 on hostPort 162.101.194.44:57091 with 32 cores, 
100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/87 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: 
app-20151007121501-0022/87 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor 
app-20151007121501-0022/87 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove 
non-existent executor 87
15/10/07 12:20:17 INFO BlockManagerMasterEndpoint: Registering block manager 
162.101.194.44:42537 with 51.8 GB RAM, BlockManagerId(2, 162.101.194.44, 42537)
15/10/07 12:20:17 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 
162.101.194.44:42537 (size: 10.0 KB, free: 51.8 GB)
15/10/07 12:20:18 ERROR TaskSchedulerImpl: Lost executor 2 on 162.101.194.44: 
remote Rpc client disassociated
15/10/07 12:20:18 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 17.0
15/10/07 12:20:18 WARN TaskSetManager: Lost task 73.1 in stage 17.0 (TID 317, 
162.101.194.44): ExecutorLostFailure (executor 2 lost)
15/10/07 12:20:18 INFO DAGScheduler: Executor lost: 2 (epoch 6)
15/10/07 12:20:18 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 
from BlockManagerMaster.
15/10/07 12:20:18 INFO BlockManagerMasterEndpoint: Removing block manager 
BlockManagerId(2, 162.101.194.44, 42537)
15/10/07 12:20:18 INFO BlockManagerMaster: Removed 2 successfully in 
removeExecutor
15/10/07 12:20:18 INFO TaskSetManager: Starting task 73.2 in stage 17.0 (TID 
319, 162.101.194.48, PROCESS_LOCAL, 2045 bytes)
15/10/07 12:20:18 WARN ReliableDeliverySupervisor: Association with remote 
system [akka.tcp://sparkExecutor@162.101.194.44:43010] has failed, address is 
now gated for [5000] ms. Reason: [Disassociated]
15/10/07 12:20:34 INFO CoarseGrainedExecutorBackend: Got assigned task 316
15/10/07 12:20:34 INFO Executor: Running task 81.1 in stage 17.0 (TID 316)
15/10/07 12:20:34 INFO ShuffleBlockFetcherIterator: Getting 16 non-empty blocks 
out of 16 blocks
15/10/07 12:20:34 INFO TransportClientFactory: Found inactive connection to 
/162.101.194.44:42537, creating a new one.
15/10/07 12:20:34 ERROR RetryingBlockFetcher: Exception while beginning fetch 
of 8 outstanding blocks
java.io.IOException: Failed to connect to /162.101.194.44:42537
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
        at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
        at 
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
        at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
        at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
        at 
org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:97)
        at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:152)
        at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:265)
        at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:112)
        at 
org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:43)
        at 
org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:71)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /162.101.194.44:42537
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        ... 1 more
15/10/07 12:20:34 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 8 
outstanding blocks after 5000 ms
15/10/07 12:20:34 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 
15 ms