Dear Spark developers,

I have created a simple Spark application that I run with spark-submit. It calls a machine learning algorithm from Spark MLlib that executes a number of iterations, corresponding to the same number of tasks in Spark. It appears that Spark creates a new executor for each task and then removes it. The following messages in my log indicate this:
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24463 is now RUNNING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24463 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-0000/24463 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24463
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-0000/24464 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-0000/24464 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24464 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24464 is now RUNNING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24464 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-0000/24464 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24464
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-0000/24465 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-0000/24465 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24465 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24465 is now EXITED (Command exited with code 1)
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor app-20150929120924-0000/24465 removed: Command exited with code 1
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 24465
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added: app-20150929120924-0000/24466 on worker-20150929120330-16.111.35.101-46374 (16.111.35.101:46374) with 12 cores
15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150929120924-0000/24466 on hostPort 16.111.35.101:46374 with 12 cores, 30.0 GB RAM
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24466 is now LOADING
15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-0000/24466 is now RUNNING

The application ends up creating and removing thousands of executors. Is this normal behavior? If I run the same code within spark-shell, this does not happen. Could you suggest what might be wrong in my setup?

Best regards,
Alexander
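For context, a submission against a standalone cluster of the kind described above might look roughly like the following sketch. The class name, jar path, and master URL are placeholders, not the actual values from the application; the core and memory figures are taken from the "12 cores, 30.0 GB RAM" grants in the log:

```shell
# Placeholder spark-submit invocation (standalone mode).
# com.example.MLlibIterations, the master URL, and the jar path
# are hypothetical; only the resource sizes mirror the log above.
spark-submit \
  --class com.example.MLlibIterations \
  --master spark://master-host:7077 \
  --executor-memory 30G \
  --total-executor-cores 12 \
  /path/to/my-app.jar
```

In standalone mode, a "Command exited with code 1" loop like the one in the log usually means each newly launched executor process fails on startup, so checking the executor stderr in the worker's work directory would show the underlying error.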