Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Holden Karau
I like the idea of improving the flexibility of Spark's physical plans, and really anything that might reduce code duplication among the ~4 different accelerators. Twitter: https://twitter.com/holdenkarau Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9

Re: Versioning of Spark Operator

2024-04-09 Thread L. C. Hsieh
For the Spark Operator, I think the answer is yes. My impression is that the Spark Operator should be Spark version-agnostic. Zhou, please correct me if I'm wrong. I am not sure about the Spark Connect Go client, but if it is going to talk to a Spark cluster, I guess it should still be related to

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Do we have a compatibility matrix for the Apache Spark Connect Go client already, Bo? Specifically, I'm wondering which Spark versions the existing Apache Spark Connect Go repository is able to support as of now. We know that it is supposed to always be compatible, but do we have a way to verify that actually

Re: Versioning of Spark Operator

2024-04-09 Thread bo yang
Thanks Liang-Chi for the Spark Operator work, and also for the discussion here! For the Spark Operator and the Connect Go client, I am guessing they need to support multiple versions of Spark? e.g. the same Spark Operator may support running multiple versions of Spark, and the Connect Go client might support
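The "one operator release, many Spark versions" question above boils down to maintaining and checking a compatibility matrix. A minimal sketch of that idea, assuming hypothetical version numbers and names (none of these are actual Spark Operator or Connect Go data):

```python
# Hypothetical sketch: a compatibility matrix mapping an operator release
# to the Spark major.minor lines it has been verified against.
# All version numbers here are illustrative, not real project data.

COMPAT_MATRIX = {
    "operator-0.1.0": {"3.4", "3.5"},
    "operator-0.2.0": {"3.5", "4.0"},
}

def supports(operator_version: str, spark_version: str) -> bool:
    """Return True if the given operator release is verified against
    the major.minor line of the given Spark version."""
    major_minor = ".".join(spark_version.split(".")[:2])
    return major_minor in COMPAT_MATRIX.get(operator_version, set())

print(supports("operator-0.2.0", "3.5.1"))  # True
print(supports("operator-0.1.0", "4.0.0"))  # False
```

A matrix like this, published per release and backed by CI runs against each listed Spark line, is one way to "verify that actually" works, as Dongjoon asks above.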

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Ya, that's simple and possible. However, it may cause a lot of confusion because it implies that the new `Spark K8s Operator 4.0.0` and `Spark Connect Go 4.0.0` follow the same `Semantic Versioning` policy as Apache Spark 4.0.0. In addition, `Versioning` is directly related to the Release Cadence.

Re: Versioning of Spark Operator

2024-04-09 Thread DB Tsai
Aligning with Spark releases is sensible, as it allows us to guarantee that the Spark operator functions correctly with the new version while also maintaining support for previous versions. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1

Re: Versioning of Spark Operator

2024-04-09 Thread Mridul Muralidharan
I am trying to understand if we can simply align with Spark's version for this? It makes release and JIRA management much simpler for developers and more intuitive for users. Regards, Mridul

Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Hi, Liang-Chi. Thank you for leading the Apache Spark K8s operator as a shepherd. I took a look at the `Apache Spark Connect Go` repo mentioned in the thread. Sadly, there is no release at all and no activity in the last 6 months. It seems to be the first time for the Apache Spark community to consider

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Jia. I have the same questions as in Weiting's previous thread. Do you think you can share the future milestones of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities. > This project is still under

Re: Introducing Apache Gluten (incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Weiting. Do you think you can share the future milestones of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities. > This project is still under active development now, and doesn't have a stable release. >

Introducing Apache Gluten (incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread WeitingChen
Hi all, We are excited to introduce a new Apache incubating project called Gluten. Gluten serves as a middleware layer designed to offload Spark to native engines like Velox or ClickHouse. For more detailed information, please visit the project repository at

SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Ke Jia
Apache Spark currently lacks an official mechanism to support cross-platform execution of physical plans. The Gluten project offers a mechanism that utilizes the Substrait standard to convert and optimize Spark's physical plans. By introducing Gluten's plan conversion, validation, and fallback
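The convert/validate/fallback mechanism the SPIP describes can be sketched in a few lines: attempt to translate each physical operator into a native (e.g. Substrait-backed) form, and fall back to vanilla Spark execution for any operator that fails validation. This is an illustrative model only; the operator names and functions below are hypothetical, not Gluten's actual API.

```python
# Illustrative sketch of plan conversion with validation and fallback.
# NATIVE_SUPPORTED stands in for "operators the native engine validates";
# anything else stays on the vanilla Spark (JVM) execution path.

NATIVE_SUPPORTED = {"Project", "Filter", "HashAggregate"}

def convert(plan: list) -> list:
    """Map each physical operator to ('native', op) if it validates
    against the native engine, else ('fallback', op)."""
    converted = []
    for op in plan:
        if op in NATIVE_SUPPORTED:
            converted.append(("native", op))
        else:
            converted.append(("fallback", op))
    return converted

print(convert(["Project", "Sort", "Filter"]))
# [('native', 'Project'), ('fallback', 'Sort'), ('native', 'Filter')]
```

The per-operator fallback is the key design point: a plan with one unsupported operator still runs end to end, with only that operator executed by vanilla Spark, rather than failing the whole query.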