+1 for Wenchen's point. I don't see a strong reason to pull these transformations into Spark instead of keeping them in third-party packages/projects.
On Wed, Apr 10, 2024 at 5:32 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
> It's good to reduce duplication between different native accelerators of
> Spark, and AFAIK there is already a project trying to solve it:
> https://substrait.io/
>
> I'm not sure why we need to do this inside Spark, instead of doing the
> unification for a wider scope (for all engines, not only Spark).
>
>
> On Wed, Apr 10, 2024 at 10:11 AM Holden Karau <holden.ka...@gmail.com> wrote:
>>
>> I like the idea of improving flexibility of Spark's physical plans and really
>> anything that might reduce code duplication among the ~4 or so different
>> accelerators.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Tue, Apr 9, 2024 at 3:14 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>
>>> Thank you for sharing, Jia.
>>>
>>> I have the same questions as in Weiting's previous thread.
>>>
>>> Do you think you can share the future milestone of Apache Gluten?
>>> I'm wondering when the first stable release will come and how we can
>>> coordinate across the ASF communities.
>>>
>>> > This project is still under active development now, and doesn't have a
>>> > stable release.
>>> > https://github.com/apache/incubator-gluten/releases/tag/v1.1.1
>>>
>>> In the Apache Spark community, Apache Spark 3.2 and 3.3 have reached end of
>>> support.
>>> And, 3.4 will have 3.4.3 next week and 3.4.4 (another EOL release) is
>>> scheduled in October.
>>>
>>> For the SPIP, I guess it's applicable for Apache Spark 4.0.0 only if there
>>> is something we need to do from the Spark side.
>>
>> +1 I think any changes need to target 4.0
>>>
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>>
>>> On Tue, Apr 9, 2024 at 12:22 AM Ke Jia <kejia1...@gmail.com> wrote:
>>>>
>>>> Apache Spark currently lacks an official mechanism to support
>>>> cross-platform execution of physical plans. The Gluten project offers a
>>>> mechanism that utilizes the Substrait standard to convert and optimize
>>>> Spark's physical plans. By introducing Gluten's plan conversion,
>>>> validation, and fallback mechanisms into Spark, we can significantly
>>>> enhance the portability and interoperability of Spark's physical plans,
>>>> enabling them to operate across a broader spectrum of execution
>>>> environments without requiring users to migrate, while also improving
>>>> Spark's execution efficiency through the utilization of Gluten's advanced
>>>> optimization techniques. The integration of Gluten into Spark has
>>>> already shown significant performance improvements with ClickHouse and
>>>> Velox backends and has been successfully deployed in production by several
>>>> customers.
>>>>
>>>> References:
>>>> JIRA Ticket
>>>> SPIP Doc
>>>>
>>>> Your feedback and comments are welcome and appreciated. Thanks.
>>>>
>>>> Thanks,
>>>> Jia Ke