Re: [Spark on Kubernetes]: Seeking Guidance on Handling Persistent Executor Failures

2024-02-19 Thread Mich Talebzadeh
Thanks for your kind words, Sri. Well, it is true that as yet Spark on Kubernetes is not on par with Spark on YARN in maturity; essentially, Spark on Kubernetes is still a work in progress. So in the first place, IMO one needs to think about why the executors are failing. What causes this behaviour? Is it the

Re: [Spark on Kubernetes]: Seeking Guidance on Handling Persistent Executor Failures

2024-02-19 Thread Cheng Pan
Spark has supported a window-based executor failure-tracking mechanism on YARN for a long time; SPARK-41210 [1][2] (included in 3.5.0) extended this feature to K8s. [1] https://issues.apache.org/jira/browse/SPARK-41210 [2] https://github.com/apache/spark/pull/38732 Thanks, Cheng Pan > On
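To make the "window-based" idea concrete, here is a minimal Python sketch of the tracking logic: failures only count against the limit while they fall inside a validity window, so sporadic failures age out instead of eventually killing the app. This is an illustration of the mechanism, not Spark's actual implementation; the related configuration keys introduced by SPARK-41210 are, as I understand it, `spark.executor.maxNumFailures` and `spark.executor.failuresValidityInterval`, but please verify against the docs for your Spark version.

```python
import time
from collections import deque
from typing import Optional


class ExecutorFailureTracker:
    """Sketch of window-based executor failure tracking (illustrative only)."""

    def __init__(self, max_failures: int, validity_interval_s: float):
        self.max_failures = max_failures
        self.validity_interval_s = validity_interval_s
        self._failures = deque()  # timestamps of recent failures

    def record_failure(self, now: Optional[float] = None) -> None:
        self._failures.append(time.time() if now is None else now)

    def should_abort(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Failures older than the validity window no longer count.
        while self._failures and now - self._failures[0] > self.validity_interval_s:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures
```

With a 60-second window and a limit of 3, three failures in quick succession trigger an abort, while the same three failures spread over several minutes do not.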

Re: [Spark on Kubernetes]: Seeking Guidance on Handling Persistent Executor Failures

2024-02-19 Thread Sri Potluri
Dear Mich, Thank you for your detailed response and the suggested approach to handling retry logic. I appreciate you taking the time to outline the method of embedding custom retry mechanisms directly into the application code. While the solution of wrapping the main logic of the Spark job in a
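The retry-wrapping approach discussed here can be sketched in a few lines of Python. This is a generic, hypothetical wrapper (the `job` callable and backoff values are placeholders), not code from the thread:

```python
import time


def run_with_retries(job, max_attempts=3, backoff_s=30):
    """Run `job` (e.g. the main logic of a Spark application),
    retrying on failure with a simple linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(backoff_s * attempt)  # wait longer after each failure
```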

Re: [Spark on Kubernetes]: Seeking Guidance on Handling Persistent Executor Failures

2024-02-19 Thread Mich Talebzadeh
Went through your issue with the code running on k8s. When an executor of a Spark application fails, the system attempts to maintain the desired level of parallelism by automatically creating a new executor to replace the failed one. While this behavior is beneficial for transient errors,

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Mich Talebzadeh
OK, thanks for your clarifications. Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information

Re: [Spark on Kubernetes]: Seeking Guidance on Handling Persistent Executor Failures

2024-02-19 Thread Mich Talebzadeh
I am not aware of any configuration parameter in classic Spark to limit executor re-creation. Because of fault tolerance, Spark will try to recreate failed executors. I am not really that familiar with the Spark operator for k8s; there may be something there. Have you considered custom monitoring and
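One way to do the custom monitoring suggested here is to poll the Kubernetes API for executor pods (Spark on K8s labels them with `spark-role=executor`) and count failures. The pod-dict shape below mirrors what the Kubernetes API returns, but this counting helper is a sketch of my own, not part of Spark or the operator:

```python
def count_failed_executors(pods):
    """Count executor pods in the Failed phase.

    `pods` is a list of pod objects as plain dicts, e.g. from
    `kubectl get pods -l spark-role=executor -o json` (items only).
    """
    return sum(
        1
        for p in pods
        if p["metadata"]["labels"].get("spark-role") == "executor"
        and p["status"]["phase"] == "Failed"
    )
```

A monitoring loop could call this periodically and delete the SparkApplication (or alert) once the count crosses a threshold.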

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Chao Sun
Hi Mich, > Also have you got some benchmark results from your tests that you can possibly share? We only have some partial benchmark results internally so far. Once shuffle and better memory management have been introduced, we plan to publish the benchmark results (at least TPC-H) in the repo.

[Spark on Kubernetes]: Seeking Guidance on Handling Persistent Executor Failures

2024-02-19 Thread Sri Potluri
Hello Spark Community, I am currently leveraging Spark on Kubernetes, managed by the Spark Operator, for running various Spark applications. While the system generally works well, I've encountered a challenge related to how Spark applications handle executor failures, specifically in scenarios

Re: Regarding Spark on Kubernetes(EKS)

2024-02-19 Thread Jagannath Majhi
Yes, I have gone through it. So please give me the setup. More context: my jar file is written in Java. On Mon, Feb 19, 2024, 8:53 PM Mich Talebzadeh wrote: > Sure, but first it would be beneficial to understand the way Spark works on > Kubernetes and the concepts. > > Have a look at this article

Re: Regarding Spark on Kubernetes(EKS)

2024-02-19 Thread Jagannath Majhi
I am not using any private Docker image. I am only running the jar file on EMR using the spark-submit command, and now I want to run this jar file on EKS, so can you please tell me how I can set this up? On Mon, Feb 19, 2024, 8:06 PM Jagannath Majhi < jagannath.ma...@cloud.cbnits.com> wrote: >

Re: Regarding Spark on Kubernetes(EKS)

2024-02-19 Thread Mich Talebzadeh
Sure, but first it would be beneficial to understand the way Spark works on Kubernetes and the concepts. Have a look at this article of mine: Spark on Kubernetes, A Practitioner’s Guide

Re: Regarding Spark on Kubernetes(EKS)

2024-02-19 Thread Mich Talebzadeh
OK, so you have a jar file that you want to run using Spark on k8s (EKS) as the execution engine, as opposed to YARN on EMR?

Re: Regarding Spark on Kubernetes(EKS)

2024-02-19 Thread Mich Talebzadeh
Where is your Docker image? In the ECR container registry? If you are going to use EKS, then it needs to be accessible to all nodes of the cluster. When you build your Docker image, put your jar under the $SPARK_HOME directory, then add a line to your Docker build file as below. Here I am accessing Google
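The "add a line to your docker build file" step might look like the fragment below. This is a hypothetical sketch: the base image tag, jar name, and paths are placeholders (in the `apache/spark` images, $SPARK_HOME is /opt/spark), so adjust them to your build.

```dockerfile
# Hypothetical example: start from a Spark base image of your choice
FROM apache/spark:3.5.0

# Bake the application jar into the image under $SPARK_HOME/jars
# so driver and executor pods can all see it.
COPY target/my-app.jar /opt/spark/jars/my-app.jar
```

After building, push the image to ECR so every EKS node can pull it.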

Re: Regarding Spark on Kubernetes(EKS)

2024-02-19 Thread Richard Smith
I run my Spark jobs in GCP with Google Dataproc using GCS buckets. I've not used AWS, but its EMR product offers similar functionality to Dataproc. The title of your post implies your Spark cluster runs on EKS. You might be better off using EMR, see links below: EMR

Regarding Spark on Kubernetes(EKS)

2024-02-19 Thread Jagannath Majhi
Dear Spark Community, I hope this email finds you well. I am reaching out to seek assistance and guidance regarding a task I'm currently working on involving Apache Spark. I have developed a JAR file that contains some Spark applications and functionality, and I need to run this JAR file within
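For reference, a spark-submit invocation against EKS typically looks like the sketch below. The endpoint, namespace, image URI, service account, class name, and jar path are all placeholders you would substitute for your own; the `spark.kubernetes.*` keys are the standard Spark-on-K8s configuration properties.

```shell
# Hypothetical example: substitute your EKS API server, image and jar
spark-submit \
  --master k8s://https://<EKS_API_SERVER_ENDPOINT>:443 \
  --deploy-mode cluster \
  --name my-spark-app \
  --class com.example.Main \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/my-spark:latest \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/jars/my-app.jar
```

Note the `local://` scheme: it tells Spark the jar is already inside the container image rather than on the submitting machine.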

Re: Re-create SparkContext of SparkSession inside long-lived Spark app

2024-02-19 Thread Mich Talebzadeh
OK, got it. Someone asked a similar, though not shuffle-related, question in the Spark Slack channel. This is a simple piece of Python code that creates shuffle files in shuffle_directory = "/tmp/spark_shuffles", simulates working examples using a loop, and periodically cleans up shuffle files older than 1
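The cleanup part of such a script could be sketched as follows. This is my own minimal version, assuming the /tmp/spark_shuffles layout mentioned above; the directory path and age threshold are placeholders:

```python
import os
import time


def clean_old_shuffle_files(shuffle_directory, max_age_s):
    """Delete files under shuffle_directory whose mtime is older
    than max_age_s seconds; return the paths that were removed."""
    now = time.time()
    removed = []
    for root, _dirs, files in os.walk(shuffle_directory):
        for name in files:
            path = os.path.join(root, name)
            if now - os.path.getmtime(path) > max_age_s:
                os.remove(path)
                removed.append(path)
    return removed
```

A driver-side loop would call this every N iterations, e.g. `clean_old_shuffle_files("/tmp/spark_shuffles", max_age_s=3600)`.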

Re: Re-create SparkContext of SparkSession inside long-lived Spark app

2024-02-19 Thread Saha, Daniel
Thanks for the suggestions, Mich, Jörn, and Adam. The rationale for a long-lived app with a loop, versus submitting multiple YARN applications, is mainly simplicity. I plan to run the app on a multi-tenant EMR cluster alongside other YARN apps. Implementing the loop outside the Spark app will work but
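The in-app loop being discussed can be structured so each iteration gets a fresh session that is stopped afterwards, which is what lets context-scoped state be released. Below is a small harness sketch of my own; `make_session` is a hypothetical factory you would supply, e.g. `lambda: SparkSession.builder.appName("batch").getOrCreate()` (note that with PySpark, `getOrCreate()` only yields a new context if the previous session was stopped):

```python
def run_batches(make_session, jobs):
    """Run each job with a freshly created session, stopping the
    session afterwards so resources tied to it can be released."""
    results = []
    for job in jobs:
        session = make_session()
        try:
            results.append(job(session))
        finally:
            session.stop()  # stop even if the job raised
    return results
```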