Spark on tin boxes such as Google Dataproc or AWS EC2 often uses the YARN resource manager. YARN is the most widely used resource manager, not just for Spark but for other artefacts as well. It is used extensively on-premise, and in the Cloud it is also widely used in Infrastructure as a Service offerings such as Google Dataproc, which I mentioned.
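
To make the resource manager choice concrete, here is a minimal sketch in Python of how the same job could be submitted to YARN and to Kubernetes. The `--master` values, `--deploy-mode` and the `spark.*` configuration keys are standard spark-submit settings. The helper `spark_submit_args`, the API server host, the container image name, the script paths and the executor sizing are all hypothetical placeholders chosen for illustration.

```python
from typing import List


def spark_submit_args(master: str, app: str, executors: int = 2,
                      k8s_image: str = "") -> List[str]:
    """Build a spark-submit argument list for the given master URL.

    Hypothetical helper: the executor sizing below is an assumption,
    kept identical for both resource managers so that only the
    resource manager differs between runs.
    """
    args = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "cluster",
        "--conf", f"spark.executor.instances={executors}",
        "--conf", "spark.executor.memory=4g",
        "--conf", "spark.executor.cores=2",
    ]
    if master.startswith("k8s://"):
        # On Kubernetes the driver and executors run in pods,
        # so a container image has to be supplied.
        args += ["--conf", f"spark.kubernetes.container.image={k8s_image}"]
    args.append(app)
    return args


# YARN: the master is simply "yarn"; the cluster location comes from
# the Hadoop configuration on the submitting host.
yarn_cmd = spark_submit_args("yarn", "join_job.py")

# Kubernetes: the master is the API server URL prefixed with k8s://.
# Host, image and script path are placeholders.
k8s_cmd = spark_submit_args(
    "k8s://https://k8s.example.com:6443",
    "local:///opt/spark/work-dir/join_job.py",
    k8s_image="example/spark-py:latest",
)

print(" ".join(yarn_cmd))
print(" ".join(k8s_cmd))
```

Keeping the executor count, memory and cores the same in both commands means any difference seen in the Spark GUI is more likely to come from the resource manager and its environment than from the job sizing.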

With regard to your questions:

Q1: What are the causes and reasons for Spark on K8s to be slower than Serverful?

--> It should be noted that Spark on Kubernetes is a work in progress and, as of now, there is future work outstanding. It is not yet at parity with Spark on YARN.

Q2: How or is there a scenario to show the most apparent difference in performance and cost of these two environments (Serverless (K8S) and Serverful (Traditional server))?

--> Simple. One experiment is worth 10 hypotheses. Install Spark on serverful and install Spark on K8s, run the same workload on both, and observe the performance of each run through the Spark GUI.

See this article of mine to help you with some features. It is a bit dated but still covers the concepts:

Spark on Kubernetes, A Practitioner’s Guide
<https://www.linkedin.com/pulse/spark-kubernetes-practitioners-guide-mich-talebzadeh-ph-d-/?trackingId=01Fj2t28THWpLEldU0Q9ow%3D%3D>

HTH

Mich Talebzadeh,
Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Thu, 27 Jul 2023 at 18:20, Trường Trần Phan An <truong...@vlute.edu.vn> wrote:

> Hi all,
>
> I am learning about the performance difference of Spark when performing a
> JOIN problem on Serverless (K8S) and Serverful (Traditional server)
> environments.
>
> Through experiment, Spark on K8s tends to run slower than Serverful.
> Through understanding the architecture, I know that Spark runs on K8s as
> Containers (Pods) so it takes a certain time to initialize, but when I look
> at each job, stage, and task, Spark on K8s still tends to be slower than
> Serverful.
>
> *I have some questions:*
> Q1: What are the causes and reasons for Spark on K8s to be slower than
> Serverful?
> Q2: How or is there a scenario to show the most apparent difference in
> performance and cost of these two environments (Serverless (K8S) and
> Serverful (Traditional server))?
>
> Thank you so much!
>
> Best regards,
> Truong