[ https://issues.apache.org/jira/browse/SPARK-35334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-35334:
------------------------------------

    Assignee: Attila Zsolt Piros  (was: Apache Spark)

> Spark should be more resilient to intermittent K8s flakiness
> ------------------------------------------------------------
>
>                 Key: SPARK-35334
>                 URL: https://issues.apache.org/jira/browse/SPARK-35334
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>            Priority: Major
>
> Internal K8s errors, such as an etcdserver leader election, are propagated to
> the API client and can cause serious issues in Spark, for example:
> {noformat}
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
> executing: GET at:
> https://kubernetes.default.svc/api/v1/namespaces/dex-app-bl24w4z9/pods/sparkpi-10-fcd3f6781a874212-driver.
> Message: etcdserver: leader changed. Received status: Status(apiVersion=v1, code=500,
> details=null, kind=Status, message=etcdserver: leader changed,
> metadata=ListMeta(_continue=null, remainingItemCount=null,
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null,
> status=Failure, additionalProperties={}).
> {noformat}
> First, I will try to fix this in kubernetes-client by adding retries with
> exponential backoff:
> https://github.com/fabric8io/kubernetes-client/issues/3087
> If that works, the change on the Spark side could be just a version update
> plus some new configs.
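
For readers unfamiliar with the technique named in the description, here is a minimal sketch of retries with exponential backoff in plain Java. It is not the kubernetes-client implementation and does not use its API: `RetryWithBackoff`, `TransientException`, `backoffMillis`, and `retry` are hypothetical names invented for illustration, and the delay values are arbitrary.

```java
import java.util.concurrent.Callable;

// Illustrative sketch only; all names here are hypothetical, not fabric8 or Spark APIs.
class RetryWithBackoff {

    /** Hypothetical stand-in for a transient server-side failure such as an HTTP 500. */
    static class TransientException extends RuntimeException {
        TransientException(String message) {
            super(message);
        }
    }

    /** Delay before the retry that follows failed attempt `attempt` (0-based):
     *  the initial delay doubled `attempt` times, capped at maxMillis. */
    static long backoffMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Runs `call`, retrying on TransientException up to maxAttempts total attempts. */
    static <T> T retry(Callable<T> call, int maxAttempts,
                       long initialMillis, long maxMillis) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (TransientException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(backoffMillis(attempt, initialMillis, maxMillis));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request that fails twice before succeeding.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new TransientException("etcdserver: leader changed");
            }
            return "pod fetched";
        }, 5, 10, 1000);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Only the failure type treated as transient is retried; any other exception propagates immediately. In practice a random jitter is often added to each delay so that many clients do not retry in lockstep.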