Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Binwei Yang
Gluten's Java part is pretty stable now. Development is happening more in the C++ code, the Velox code, as well as the ClickHouse backend. The SPIP doesn't plan to introduce the whole Gluten stack into Spark, but only the way to serialize Spark's physical plan so it can be sent to a native backend, through JNI or gRPC.
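For illustration only, a minimal Scala sketch of the flow described above: serialize the physical plan into an engine-neutral form and hand it to a native backend over JNI. The SubstraitPlanConverter and NativeEngineJni names are hypothetical placeholders, not Gluten's actual interfaces; a gRPC transport would follow the same shape.

import org.apache.spark.sql.execution.SparkPlan

// Hypothetical converter from a Spark physical plan to serialized Substrait bytes.
// Gluten's real conversion layer is far more involved; this only shows the data flow.
trait SubstraitPlanConverter {
  def toSubstraitBytes(plan: SparkPlan): Array[Byte]
}

// Hypothetical JNI bridge into a native engine (e.g. a Velox or ClickHouse backend).
class NativeEngineJni {
  // Returns an opaque handle to a native result iterator.
  @native def executeSubstraitPlan(serializedPlan: Array[Byte]): Long
}

object NativeOffload {
  def offload(plan: SparkPlan,
              converter: SubstraitPlanConverter,
              jni: NativeEngineJni): Long = {
    // 1. Serialize the Spark physical plan into the engine-neutral Substrait format.
    val serialized = converter.toSubstraitBytes(plan)
    // 2. Cross the JNI boundary; a gRPC call to a remote native service would be analogous.
    jni.executeSubstraitPlan(serialized)
  }
}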

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Binwei Yang
We (the Gluten and Arrow folks) actually did plan to put the plan conversion in the substrait-java repo, but to me it makes more sense to put it in the Spark repo. Native library and accelerator support will become more and more important in the future.

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread L. C. Hsieh
+1 for Wenchen's point. I don't see a strong reason to pull these transformations into Spark instead of keeping them in third-party packages/projects.

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Mich Talebzadeh
I read the SPIP. I have a number of points, if I may: - Maturity of Gluten: as the excerpt mentions, Gluten is a project whose feature set and stability IMO are still under development. Integrating a non-core component could introduce risks if it is not fully mature. - Complexity: integrating

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Wenchen Fan
It's good to reduce duplication between different native accelerators of Spark, and AFAIK there is already a project trying to solve it: https://substrait.io/. I'm not sure why we need to do this inside Spark, instead of doing the unification at a wider scope (for all engines, not only Spark).

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Holden Karau
I like the idea of improving the flexibility of Spark's physical plans, and really anything that might reduce code duplication among the ~4 or so different accelerators.

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Jia. I have the same questions as in Weiting's previous thread. Do you think you can share the future milestones of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities.

SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Ke Jia
Apache Spark currently lacks an official mechanism to support cross-platform execution of physical plans. The Gluten project offers a mechanism that utilizes the Substrait standard to convert and optimize Spark's physical plans. By introducing Gluten's plan conversion, validation, and fallback
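To make the plan conversion, validation, and fallback flow concrete, here is a minimal Scala sketch of how such a physical-plan rule could be structured. The NativeBackend trait and the OffloadToNative rule are hypothetical illustrations under assumed names, not Gluten's actual classes; the real mechanism also has to handle columnar transitions and partial fallbacks.

import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical hooks into a native backend; Gluten's real validation/offload API differs.
trait NativeBackend {
  def validate(plan: SparkPlan): Boolean        // can this operator run natively?
  def toNativePlan(plan: SparkPlan): SparkPlan  // wrap it in a native (columnar) operator
}

// Sketch of a physical-plan rule: offload supported operators, fall back otherwise.
case class OffloadToNative(backend: NativeBackend) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
    case p if backend.validate(p) => backend.toNativePlan(p)  // convert to native execution
    case p => p                                               // fallback: keep the vanilla Spark operator
  }
}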