When running a job in stand-alone cluster mode, the process randomly hangs during a DataFrame flatMap or explode operation under HiveContext:

    df.flatMap(r => for (n <- 1 to r.getInt(ind)) yield r)

This does not happen with SQLContext in cluster mode, nor with HiveContext or SQLContext in local mode, where it works fine. A couple of minutes after the hang, executors start dropping. I am attaching the logs below.

Saif
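For reference, a minimal self-contained sketch of the operation in question might look like the following. The schema, column names, and data here are hypothetical stand-ins (the original report does not show them); the only part taken from the report is the flatMap expression itself, which duplicates each Row according to an integer column:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical reproduction sketch (Spark 1.5.x era API, as in the report).
// Assumed names: "key" and "cnt" are made-up columns for illustration.
val sc = new SparkContext(new SparkConf().setAppName("flatmap-repro"))
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(Seq(("a", 2), ("b", 3))).toDF("key", "cnt")
val ind = df.schema.fieldIndex("cnt")

// The operation reported to hang in stand-alone cluster mode:
// each row is yielded cnt times.
val expanded = df.flatMap(r => for (n <- 1 to r.getInt(ind)) yield r)
expanded.count() // 5 rows for the sample data above
```

Per the report, an equivalent formulation via explode exhibits the same behavior, so the issue does not appear specific to the flatMap form.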
15/10/07 12:15:19 INFO TaskSetManager: Finished task 50.0 in stage 17.0 (TID 166) in 2511 ms on 162.101.194.47 (180/200)
15/10/07 12:15:19 INFO TaskSetManager: Finished task 66.0 in stage 17.0 (TID 182) in 2510 ms on 162.101.194.47 (181/200)
15/10/07 12:15:19 INFO TaskSetManager: Finished task 110.0 in stage 17.0 (TID 226) in 2505 ms on 162.101.194.47 (182/200)
15/10/07 12:15:19 INFO TaskSetManager: Finished task 74.0 in stage 17.0 (TID 190) in 2530 ms on 162.101.194.47 (183/200)
15/10/07 12:15:19 INFO TaskSetManager: Finished task 106.0 in stage 17.0 (TID 222) in 2530 ms on 162.101.194.47 (184/200)
15/10/07 12:20:01 WARN HeartbeatReceiver: Removing executor 2 with no recent heartbeats: 141447 ms exceeds timeout 120000 ms
15/10/07 12:20:01 ERROR TaskSchedulerImpl: Lost executor 2 on 162.101.194.44: Executor heartbeat timed out after 141447 ms
15/10/07 12:20:01 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 17.0
15/10/07 12:20:01 WARN TaskSetManager: Lost task 113.0 in stage 17.0 (TID 229, 162.101.194.44): ExecutorLostFailure (executor 2 lost)
15/10/07 12:20:01 WARN TaskSetManager: Lost task 73.0 in stage 17.0 (TID 189, 162.101.194.44): ExecutorLostFailure (executor 2 lost)
15/10/07 12:20:01 WARN TaskSetManager: Lost task 81.0 in stage 17.0 (TID 197, 162.101.194.44): ExecutorLostFailure (executor 2 lost)
15/10/07 12:20:01 INFO TaskSetManager: Starting task 81.1 in stage 17.0 (TID 316, 162.101.194.45, PROCESS_LOCAL, 2045 bytes)
15/10/07 12:20:01 INFO TaskSetManager: Starting task 73.1 in stage 17.0 (TID 317, 162.101.194.44, PROCESS_LOCAL, 2045 bytes)
15/10/07 12:20:01 INFO TaskSetManager: Starting task 113.1 in stage 17.0 (TID 318, 162.101.194.48, PROCESS_LOCAL, 2045 bytes)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Requesting to kill executor(s) 2
15/10/07 12:20:01 INFO DAGScheduler: Executor lost: 2 (epoch 4)
15/10/07 12:20:01 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
15/10/07 12:20:01 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, 162.101.194.44, 42537)
15/10/07 12:20:01 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
15/10/07 12:20:01 INFO ShuffleMapStage: ShuffleMapStage 15 is now unavailable on executor 2 (1/2, false)
15/10/07 12:20:01 INFO ShuffleMapStage: ShuffleMapStage 16 is now unavailable on executor 2 (8/16, false)
15/10/07 12:20:01 INFO DAGScheduler: Host added was in lost list earlier: 162.101.194.44
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/69 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/69 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/69 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/69 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/69 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/69 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 69
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/70 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/70 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/70 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/70 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/70 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/70 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 70
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/71 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/71 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/71 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/71 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/71 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/71 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 71
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/72 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/72 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/72 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/72 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/72 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/72 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 72
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/73 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/73 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/73 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/73 is now RUNNING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/73 is now EXITED (Command exited with code 1)
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/73 removed: Command exited with code 1
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 73
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/74 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:01 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/74 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/74 is now LOADING
15/10/07 12:20:01 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/74 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/74 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/74 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 74
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/75 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/75 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/75 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/75 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/75 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/75 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 75
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/76 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/76 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/76 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/76 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/76 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/76 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 76
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/77 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/77 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/77 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/77 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/77 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/77 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 77
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/78 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/78 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/78 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/78 is now RUNNING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/78 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/78 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 78
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/79 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/79 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/79 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/79 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/79 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 79
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/80 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/80 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/80 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/80 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/80 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 80
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/81 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/81 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/81 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/81 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/81 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 81
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/82 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/82 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/82 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/82 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/82 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 82
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/83 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/83 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/83 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/83 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/83 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 83
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/84 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/84 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/84 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/84 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/84 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 84
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/85 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/85 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/85 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/85 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/85 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 85
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/86 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/86 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/86 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/86 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/86 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 86
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor added: app-20151007121501-0022/87 on worker-20151007063932-162.101.194.44-57091 (162.101.194.44:57091) with 32 cores
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20151007121501-0022/87 on hostPort 162.101.194.44:57091 with 32 cores, 100.0 GB RAM
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/87 is now LOADING
15/10/07 12:20:02 INFO AppClient$ClientEndpoint: Executor updated: app-20151007121501-0022/87 is now EXITED (Command exited with code 1)
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Executor app-20151007121501-0022/87 removed: Command exited with code 1
15/10/07 12:20:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 87
15/10/07 12:20:17 INFO BlockManagerMasterEndpoint: Registering block manager 162.101.194.44:42537 with 51.8 GB RAM, BlockManagerId(2, 162.101.194.44, 42537)
15/10/07 12:20:17 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on 162.101.194.44:42537 (size: 10.0 KB, free: 51.8 GB)
15/10/07 12:20:18 ERROR TaskSchedulerImpl: Lost executor 2 on 162.101.194.44: remote Rpc client disassociated
15/10/07 12:20:18 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 17.0
15/10/07 12:20:18 WARN TaskSetManager: Lost task 73.1 in stage 17.0 (TID 317, 162.101.194.44): ExecutorLostFailure (executor 2 lost)
15/10/07 12:20:18 INFO DAGScheduler: Executor lost: 2 (epoch 6)
15/10/07 12:20:18 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
15/10/07 12:20:18 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, 162.101.194.44, 42537)
15/10/07 12:20:18 INFO BlockManagerMaster: Removed 2 successfully in removeExecutor
15/10/07 12:20:18 INFO TaskSetManager: Starting task 73.2 in stage 17.0 (TID 319, 162.101.194.48, PROCESS_LOCAL, 2045 bytes)
15/10/07 12:20:18 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@162.101.194.44:43010] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
15/10/07 12:20:34 INFO CoarseGrainedExecutorBackend: Got assigned task 316
15/10/07 12:20:34 INFO Executor: Running task 81.1 in stage 17.0 (TID 316)
15/10/07 12:20:34 INFO ShuffleBlockFetcherIterator: Getting 16 non-empty blocks out of 16 blocks
15/10/07 12:20:34 INFO TransportClientFactory: Found inactive connection to /162.101.194.44:42537, creating a new one.
15/10/07 12:20:34 ERROR RetryingBlockFetcher: Exception while beginning fetch of 8 outstanding blocks
java.io.IOException: Failed to connect to /162.101.194.44:42537
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:88)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
        at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:97)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:152)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:265)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:112)
        at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:43)
        at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:71)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /162.101.194.44:42537
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        ... 1 more
15/10/07 12:20:34 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 8 outstanding blocks after 5000 ms
15/10/07 12:20:34 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 15 ms