It's good to reduce duplication between the different native accelerators of Spark, and AFAIK there is already a project trying to solve this: https://substrait.io/
I'm not sure why we need to do this inside Spark, instead of doing the unification at a wider scope (for all engines, not only Spark).

On Wed, Apr 10, 2024 at 10:11 AM Holden Karau <holden.ka...@gmail.com> wrote:

> I like the idea of improving the flexibility of Spark's physical plans and
> really anything that might reduce code duplication among the ~4 or so
> different accelerators.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> On Tue, Apr 9, 2024 at 3:14 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> Thank you for sharing, Jia.
>>
>> I have the same questions as in Weiting's previous thread.
>>
>> Could you share the future milestones of Apache Gluten?
>> I'm wondering when the first stable release will come and how we can
>> coordinate across the ASF communities.
>>
>> > This project is still under active development now, and doesn't have a
>> > stable release.
>> > https://github.com/apache/incubator-gluten/releases/tag/v1.1.1
>>
>> In the Apache Spark community, Apache Spark 3.2 and 3.3 are at the end of
>> support. 3.4 will have 3.4.3 next week, and 3.4.4 (another EOL release) is
>> scheduled for October.
>>
>> For the SPIP, I guess it's applicable to Apache Spark 4.0.0 only, if
>> there is something we need to do from the Spark side.
>
> +1 I think any changes need to target 4.0
>
>> Thanks,
>> Dongjoon.
>>
>> On Tue, Apr 9, 2024 at 12:22 AM Ke Jia <kejia1...@gmail.com> wrote:
>>
>>> Apache Spark currently lacks an official mechanism to support
>>> cross-platform execution of physical plans. The Gluten project offers a
>>> mechanism that uses the Substrait standard to convert and optimize
>>> Spark's physical plans. By introducing Gluten's plan conversion,
>>> validation, and fallback mechanisms into Spark, we can significantly
>>> enhance the portability and interoperability of Spark's physical plans,
>>> enabling them to run across a broader range of execution environments
>>> without requiring users to migrate, while also improving Spark's
>>> execution efficiency through Gluten's advanced optimization techniques.
>>> The integration of Gluten into Spark has already shown significant
>>> performance improvements with the ClickHouse and Velox backends and has
>>> been deployed in production by several customers.
>>>
>>> References:
>>> JIRA ticket: https://issues.apache.org/jira/browse/SPARK-47773
>>> SPIP doc: https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing
>>>
>>> Your feedback and comments are welcome and appreciated. Thanks.
>>>
>>> Thanks,
>>> Jia Ke
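[Editor's note] The convert/validate/fallback flow described in the SPIP can be sketched roughly as follows. This is an illustrative model only, under stated assumptions: `PlanNode`, `SparkPlan`, `NativePlan`, and `Offloader` are hypothetical names invented for this sketch, not Gluten's or Spark's actual APIs, and the "supported operator" check is a stand-in for real backend validation.

```java
// Hypothetical sketch: validate each operator against the native backend,
// convert it if supported, otherwise fall back to vanilla Spark execution.
// None of these types come from Gluten or Spark; they model the idea only.

interface PlanNode {
    String describe();
}

record SparkPlan(String op) implements PlanNode {
    public String describe() { return "spark:" + op; }
}

record NativePlan(String op) implements PlanNode {
    public String describe() { return "native:" + op; }
}

class Offloader {
    // Validation step: pretend the native backend only supports these two ops.
    static boolean validate(SparkPlan plan) {
        return plan.op().equals("filter") || plan.op().equals("project");
    }

    // Conversion with fallback: unsupported operators keep their
    // original Spark implementation instead of failing the query.
    static PlanNode convertOrFallback(SparkPlan plan) {
        return validate(plan) ? new NativePlan(plan.op()) : plan;
    }
}

public class Demo {
    public static void main(String[] args) {
        // A supported operator is offloaded to the native backend...
        System.out.println(Offloader.convertOrFallback(new SparkPlan("filter")).describe());
        // ...while an unsupported one falls back to Spark row-based execution.
        System.out.println(Offloader.convertOrFallback(new SparkPlan("sort")).describe());
    }
}
```

The key design point the SPIP leans on is that fallback is per-operator, so a single unsupported node does not force the whole plan back to vanilla Spark.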