Spark on server-based clusters (tin boxes on-premise, or cloud VMs such as
Google Dataproc or AWS EC2) often uses the YARN resource manager. YARN is
the most widely used resource manager, not just for Spark but for other
artefacts as well. It is used extensively on-premise, and in the Cloud it is
also widely used in Infrastructure as a Service offerings such as Google
Dataproc, mentioned above.

With regard to your questions:

Q1: What are the causes and reasons for Spark on K8s to be slower than
Serverful?
--> It should be noted that Spark on Kubernetes is still work in progress
and, as of now, there is work outstanding. It is not yet at parity with
Spark on YARN.

Q2: How or is there a scenario to show the most apparent difference in
performance and cost of these two environments (Serverless (K8S) and
Serverful (Traditional server)?
--> Simple. One experiment is worth ten hypotheses. Install Spark on
serverful and install Spark on K8s, run the same workload on both and
observe the performance through the Spark GUI.
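
As a rough illustration of the experiment above, here is a minimal PySpark
sketch. It is not taken from any existing benchmark. The function names
(time_workload, summarise, join_workload), the row count and the synthetic
join on spark.range are all assumptions made for illustration. The script
deliberately does not set a master, so the same file can be submitted
unchanged to YARN and to K8s via spark-submit.

```python
import statistics
import time


def time_workload(run, repeats=3):
    """Call run() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        durations.append(time.perf_counter() - start)
    return durations


def summarise(label, durations):
    """Condense a list of durations into one comparable record per environment."""
    return {
        "env": label,
        "runs": len(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


def join_workload(rows=1_000_000):
    """Hypothetical JOIN workload: an equi-join of two synthetic tables on key k."""
    # Imported here so the timing helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    # No .master() is set: the cluster manager comes from spark-submit.
    spark = SparkSession.builder.appName("join-benchmark").getOrCreate()
    left = spark.range(rows).withColumnRenamed("id", "k")
    right = spark.range(rows).withColumnRenamed("id", "k")
    # count() is an action, so it forces the join to actually execute.
    left.join(right, "k").count()
```

On each cluster one would call something like
summarise("k8s", time_workload(join_workload)) and then compare the median
figures, alongside the per-stage and per-task timings in the Spark GUI. The
first run normally includes executor start-up, so it is worth looking at the
individual runs as well as the median.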

See this article of mine to help you with some features. It is a bit dated
but still covers the concepts.

Spark on Kubernetes, A Practitioner’s Guide
<https://www.linkedin.com/pulse/spark-kubernetes-practitioners-guide-mich-talebzadeh-ph-d-/?trackingId=01Fj2t28THWpLEldU0Q9ow%3D%3D>

HTH

Mich Talebzadeh,
Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom


   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 27 Jul 2023 at 18:20, Trường Trần Phan An <truong...@vlute.edu.vn>
wrote:

> Hi all,
>
> I am learning about the performance difference of Spark when performing a
> JOIN problem on Serverless (K8S) and Serverful (Traditional server)
> environments.
>
> Through experiment, Spark on K8s tends to run slower than Serverful.
> Through understanding the architecture, I know that Spark runs on K8s as
> Containers (Pods) so it takes a certain time to initialize, but when I look
> at each job, stage, and task, Spark on K8s tends to be slower than
> Serverful.
>
> *I have some questions:*
> Q1: What are the causes and reasons for Spark on K8s to be slower than
> Serverful?
> Q2: How or is there a scenario to show the most apparent difference in
> performance and cost of these two environments (Serverless (K8S) and
> Serverful (Traditional server)?
>
> Thank you so much!
>
> Best regards,
> Truong
>
>
>
