Thanks for your kind words, Sri.
Well, it is true that Spark on Kubernetes is not yet on par with Spark on
YARN in maturity; essentially, Spark on Kubernetes is still a work in
progress. So, in the first place, IMO one needs to think about why the
executors are failing. What causes this behaviour? Is it the
Spark has supported the window-based executor failure-tracking mechanism on
YARN for a long time; SPARK-41210 [1][2] (included in 3.5.0) extended this
feature to K8s.
[1] https://issues.apache.org/jira/browse/SPARK-41210
[2] https://github.com/apache/spark/pull/38732
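As a rough sketch of how this would be used, the submission below sets the failure-tracking settings on a K8s master. The config keys `spark.executor.maxNumFailures` and `spark.executor.failuresValidityInterval` are my reading of what SPARK-41210 introduced; the master URL, image, and threshold values are placeholders, so please verify the key names against the Spark 3.5+ configuration docs before relying on them.

```shell
# Sketch only: fail the application after 10 executor failures within any
# 10-minute window, instead of recreating executors indefinitely.
# Config names assumed from SPARK-41210; values are illustrative.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.executor.maxNumFailures=10 \
  --conf spark.executor.failuresValidityInterval=10m \
  local:///opt/spark/app/your_job.py
```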
Thanks,
Cheng Pan
Dear Mich,
Thank you for your detailed response and the suggested approach to handling
retry logic. I appreciate you taking the time to outline the method of
embedding custom retry mechanisms directly into the application code.
While the solution of wrapping the main logic of the Spark job in a
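The retry approach discussed above could be sketched as a small driver-side wrapper. This is a minimal illustration, not any Spark API: `run_with_retries` and its parameters are hypothetical names, and the zero-argument `job` callable stands in for whatever main logic the Spark application runs.

```python
import time


def run_with_retries(job, max_attempts=3, backoff_seconds=30):
    """Run `job` (a zero-arg callable holding the Spark driver logic),
    retrying on failure. Hypothetical helper for illustration only."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_attempts:
                # Out of attempts: surface the last failure to the caller.
                raise
            print(f"Attempt {attempt} failed ({exc}); "
                  f"retrying in {backoff_seconds}s")
            time.sleep(backoff_seconds)
```

One drawback of this pattern, as noted in the thread, is that the retry policy lives inside the application code rather than in the cluster manager, so every job has to carry its own copy.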
I went through your issue with the code running on k8s.
When an executor of a Spark application fails, the system attempts to
maintain the desired level of parallelism by automatically creating a new
executor to replace the failed one. While this behavior is beneficial for
transient errors, it can lead to executors being recreated indefinitely
when the underlying failure is persistent.
I am not aware of any configuration parameter in Spark classic to limit
executor creation; because of fault tolerance, Spark will try to recreate
failed executors. I am not really that familiar with the Spark Operator
for k8s; there may be something there.
Have you considered custom monitoring and alerting?
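One way to sketch such custom monitoring is to poll Spark's REST monitoring API, which exposes executor summaries per application. The endpoint path below comes from the Spark monitoring docs; the helper names, the threshold, and the alert action are hypothetical, and the `requests` package is assumed to be available.

```python
def count_lost_executors(executor_summaries):
    """Count non-driver executors reported as no longer active.
    `executor_summaries` is the parsed JSON list returned by
    /api/v1/applications/<app-id>/allexecutors."""
    return sum(
        1 for e in executor_summaries
        if e.get("id") != "driver" and not e.get("isActive", True)
    )


def check_executor_health(app_ui_url, app_id, threshold=5):
    # Illustrative only: fetch executor state from the Spark UI's REST API
    # and alert when too many executors have been lost.
    import requests  # assumption: installed in the monitoring environment
    resp = requests.get(
        f"{app_ui_url}/api/v1/applications/{app_id}/allexecutors")
    resp.raise_for_status()
    lost = count_lost_executors(resp.json())
    if lost >= threshold:
        print(f"ALERT: {lost} executors lost for application {app_id}")
    return lost
```

A check like this could run on a schedule next to the driver and feed whatever alerting or kill-and-resubmit logic fits your setup.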