The problem you describe is the motivation for developing Spark on MR3.
From the blog article (https://www.datamonad.com/post/2021-08-18-spark-mr3/):

*The main motivation for developing Spark on MR3 is to allow multiple Spark
applications to share compute resources such as Yarn containers or
Kubernetes Pods.*

The problem stems from an architectural limitation of Spark, and I suspect
fixing it would require a heavy rewrite of Spark Core. When we developed
Spark on MR3, we were not aware of any other attempt, in academia or
industry, to address this limitation.

A potential workaround might be to implement a custom Spark application
that manages the submission of two groups of Spark jobs and controls their
execution (similarly to the Spark Thrift Server). I am not sure whether
this approach would fix your problem, though.
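As a rough illustration of that application-level approach, Spark's
documented FAIR scheduler pools let a single application share its
executors between two groups of jobs. The pool names below are
hypothetical; whether this suffices depends on your workload:

```xml
<!-- fairscheduler.xml: two pools sharing one application's executors.
     Pool names "groupA"/"groupB" are illustrative, not prescribed. -->
<allocations>
  <pool name="groupA">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="groupB">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```

You would enable it with spark.scheduler.mode=FAIR and
spark.scheduler.allocation.file, and select a pool per submitting thread
via sc.setLocalProperty("spark.scheduler.pool", "groupA"). Note this only
shares resources *within* one Spark application; sharing across separate
applications is exactly what Spark on MR3 addresses.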

If you are interested, see the webpage of Spark on MR3:
https://mr3docs.datamonad.com/docs/spark/

We have released Spark 3.0.1 on MR3, and Spark 3.2.1 on MR3 is under
development. For Spark 3.0.1 on MR3, no changes are made to Spark itself;
MR3 is used as an add-on. The main application of MR3 is Hive on MR3, but
Spark on MR3 is equally ready for production.

Thank you,

--- Sungwoo
