[ 
https://issues.apache.org/jira/browse/SPARK-37910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477560#comment-17477560
 ] 

Dongjoon Hyun commented on SPARK-37910:
---------------------------------------

Did you check the official Apache Spark K8s documentation? Please check your 
pod network status.
- https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode-networking
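
One rough way to check the pod network status is to open a TCP connection to the driver's advertised host and port from a pod on the same network as the executors. The sketch below is only an illustration, not part of Spark: the helper name `driver_reachable` is hypothetical, and it only shows whether the address is routable at the TCP level, not whether Spark RPC itself works.

```python
import socket


def driver_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; DNS failures, refused
    connections, and timeouts all count as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are subclasses of OSError.
        return False
```

For example, from a pod in the executors' namespace, calling `driver_reachable("192-168-39-71.mni-system.pod.cluster.local", 40752)` with the host and port from the executor log would indicate whether that driver address can be reached from there.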

BTW, according to your comment, it seems that you have not succeeded with that 
configuration in other versions, right? Have you ever succeeded with the 
same configuration in some other environment, [~Silen]?
> We have been using Spark 2.4.x successfully, but the difference is that when 
> we used Spark 2.4.x, we had different deploy-mode of 'cluster' in the 
> spark-submit. 
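
The difference quoted above matters because, per the client-mode networking documentation, in client mode the executors must be able to reach the driver over a hostname and port that are routable from the executor pods, for example through a headless service. A minimal sketch of the relevant settings follows, assuming such a service already exists. The function name and the example values (service hostname, port 7078, pod name) are hypothetical; only the configuration keys come from the Spark documentation.

```python
from typing import List


def client_mode_conf(driver_host: str, driver_port: int, pod_name: str) -> List[str]:
    """Build spark-submit --conf arguments for client mode on Kubernetes.

    Hypothetical helper for illustration; the values must match your
    own headless service and driver pod.
    """
    settings = {
        # Hostname the executors use to connect back to the driver.
        "spark.driver.host": driver_host,
        # Fixed driver port, so it can be exposed by the service.
        "spark.driver.port": str(driver_port),
        # Lets executor pods be tied to the driver pod for cleanup.
        "spark.kubernetes.driver.pod.name": pod_name,
    }
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Assumed example values, not taken from the issue:
example_args = client_mode_conf(
    "spark-driver-svc.mni-system.svc.cluster.local", 7078, "spark-driver-pod"
)
```

The resulting list can be appended to a `spark-submit --deploy-mode client ...` command line.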

> Spark executor self-exiting due to driver disassociated in Kubernetes with 
> client deploy-mode
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37910
>                 URL: https://issues.apache.org/jira/browse/SPARK-37910
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Petri
>            Priority: Major
>
> I have a Spark driver running in a Kubernetes pod with client deploy-mode, and 
> it tries to start an executor.
> The executor fails with this error:
>     {"type":"log", "level":"ERROR", "name":"STREAMING_OTHERS", 
> "time":"2022-01-14T12:29:38.318Z", "timezone":"UTC", 
> "class":"dispatcher-Executor", 
> "method":"spark.executor.CoarseGrainedExecutorBackend.logError(73)", 
> "log":"Executor self-exiting due to : Driver 
> 192-168-39-71.mni-system.pod.cluster.local:40752 disassociated! Shutting 
> down.\n"}
> The driver then attempts to start another executor, which fails with the same 
> error, and this repeats indefinitely.
> In the driver pod, I see only the following errors:
>     22/01/14 12:26:32 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.43.250:
>     22/01/14 12:27:16 ERROR TaskSchedulerImpl: Lost executor 2 on 192.168.43.233:
>     22/01/14 12:27:59 ERROR TaskSchedulerImpl: Lost executor 3 on 192.168.43.221:
>     22/01/14 12:28:43 ERROR TaskSchedulerImpl: Lost executor 4 on 192.168.43.217:
>     22/01/14 12:29:27 ERROR TaskSchedulerImpl: Lost executor 5 on 192.168.43.197:
>     22/01/14 12:30:10 ERROR TaskSchedulerImpl: Lost executor 6 on 192.168.43.237:
>     22/01/14 12:30:53 ERROR TaskSchedulerImpl: Lost executor 7 on 192.168.43.196:
>     22/01/14 12:31:42 ERROR TaskSchedulerImpl: Lost executor 8 on 192.168.43.228:
>     22/01/14 12:32:31 ERROR TaskSchedulerImpl: Lost executor 9 on 192.168.43.254:
>     22/01/14 12:33:14 ERROR TaskSchedulerImpl: Lost executor 10 on 192.168.43.204:
>     22/01/14 12:33:57 ERROR TaskSchedulerImpl: Lost executor 11 on 192.168.43.231:
> What is wrong? And how can I get executors running correctly?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
