[ https://issues.apache.org/jira/browse/SPARK-37910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477095#comment-17477095 ]
Petri commented on SPARK-37910:
-------------------------------

We are executing spark-submit in a Kubernetes pod to start the Spark driver. We specify the deploy-mode as 'client' in spark-submit. The driver seems to start OK without any errors. The driver pod then tries to start the executor, but the executor fails soon after starting with the above-mentioned "driver disassociated" error. The driver pod then seems to try to start the executor again, but that also fails with the same error.

We have been using Spark 2.4.x successfully. The difference is that with Spark 2.4.x we used a different deploy-mode, 'cluster', in spark-submit.

> Spark executor self-exiting due to driver disassociated in Kubernetes with client deploy-mode
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37910
>                 URL: https://issues.apache.org/jira/browse/SPARK-37910
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Petri
>            Priority: Major
>
> I have Spark driver running in a Kubernetes pod with client deploy-mode and
> it tries to start an executor.
> Executor will fail with error:
> {"type":"log", "level":"ERROR", "name":"STREAMING_OTHERS", "time":"2022-01-14T12:29:38.318Z", "timezone":"UTC", "class":"dispatcher-Executor", "method":"spark.executor.CoarseGrainedExecutorBackend.logError(73)", "log":"Executor self-exiting due to : Driver 192-168-39-71.mni-system.pod.cluster.local:40752 disassociated! Shutting down.\n"}
> Then driver will attempt to start another executor which fails with same
> error and this goes on and on.
> In the driver pod, I see only following errors:
> 22/01/14 12:26:32 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.43.250:
> 22/01/14 12:27:16 ERROR TaskSchedulerImpl: Lost executor 2 on 192.168.43.233:
> 22/01/14 12:27:59 ERROR TaskSchedulerImpl: Lost executor 3 on 192.168.43.221:
> 22/01/14 12:28:43 ERROR TaskSchedulerImpl: Lost executor 4 on 192.168.43.217:
> 22/01/14 12:29:27 ERROR TaskSchedulerImpl: Lost executor 5 on 192.168.43.197:
> 22/01/14 12:30:10 ERROR TaskSchedulerImpl: Lost executor 6 on 192.168.43.237:
> 22/01/14 12:30:53 ERROR TaskSchedulerImpl: Lost executor 7 on 192.168.43.196:
> 22/01/14 12:31:42 ERROR TaskSchedulerImpl: Lost executor 8 on 192.168.43.228:
> 22/01/14 12:32:31 ERROR TaskSchedulerImpl: Lost executor 9 on 192.168.43.254:
> 22/01/14 12:33:14 ERROR TaskSchedulerImpl: Lost executor 10 on 192.168.43.204:
> 22/01/14 12:33:57 ERROR TaskSchedulerImpl: Lost executor 11 on 192.168.43.231:
> What is wrong? And how can I get executors running correctly?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
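
One way to read the driver log lines quoted above is to measure how much time passes between consecutive "Lost executor" errors. The following is only a minimal sketch in Python, not part of the original report: the helper names (lost_executor_events, seconds_between_losses) are hypothetical, the regular expression assumes the exact log layout shown in the quote, and DRIVER_LOG is copied from the quoted lines.

```python
import re
from datetime import datetime

# Driver log lines copied from the quoted issue description.
DRIVER_LOG = """\
22/01/14 12:26:32 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.43.250:
22/01/14 12:27:16 ERROR TaskSchedulerImpl: Lost executor 2 on 192.168.43.233:
22/01/14 12:27:59 ERROR TaskSchedulerImpl: Lost executor 3 on 192.168.43.221:
22/01/14 12:28:43 ERROR TaskSchedulerImpl: Lost executor 4 on 192.168.43.217:
22/01/14 12:29:27 ERROR TaskSchedulerImpl: Lost executor 5 on 192.168.43.197:
22/01/14 12:30:10 ERROR TaskSchedulerImpl: Lost executor 6 on 192.168.43.237:
22/01/14 12:30:53 ERROR TaskSchedulerImpl: Lost executor 7 on 192.168.43.196:
22/01/14 12:31:42 ERROR TaskSchedulerImpl: Lost executor 8 on 192.168.43.228:
22/01/14 12:32:31 ERROR TaskSchedulerImpl: Lost executor 9 on 192.168.43.254:
22/01/14 12:33:14 ERROR TaskSchedulerImpl: Lost executor 10 on 192.168.43.204:
22/01/14 12:33:57 ERROR TaskSchedulerImpl: Lost executor 11 on 192.168.43.231:
"""

# Assumed layout: "yy/MM/dd HH:mm:ss ERROR TaskSchedulerImpl: Lost executor <id> on <ip>:"
LOST_EXECUTOR = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ERROR TaskSchedulerImpl: "
    r"Lost executor (\d+) on ([\d.]+):",
    re.MULTILINE,
)


def lost_executor_events(log_text):
    """Return (timestamp, executor_id, host) for each 'Lost executor' line."""
    events = []
    for stamp, executor_id, host in LOST_EXECUTOR.findall(log_text):
        when = datetime.strptime(stamp, "%y/%m/%d %H:%M:%S")
        events.append((when, int(executor_id), host))
    return events


def seconds_between_losses(events):
    """Return the gap in whole seconds between consecutive loss events."""
    times = [when for when, _, _ in events]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


events = lost_executor_events(DRIVER_LOG)
gaps = seconds_between_losses(events)
print(f"{len(events)} executors lost, gaps in seconds: {gaps}")
```

On the quoted lines this reports 11 lost executors with gaps between 43 and 49 seconds, so each replacement executor is lost at a steady cadence rather than at irregular times. The sketch says nothing about the cause by itself; it only makes the pattern in the log easier to see.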