The Gluten Java part is pretty stable now. Development happens mostly in the C++ code: the Velox code as well as the ClickHouse backend.
The SPIP doesn't plan to introduce the whole Gluten stack into Spark, but rather a way to serialize Spark physical plans so they can be sent to a native backend, through JNI or gRPC.
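To illustrate the idea (this is a hedged sketch, not Gluten's actual API): the plan is serialized to a byte buffer (in practice a Substrait protobuf message) and handed across a boundary to a native backend. The `serializePlan` and `nativeExecute` names below are hypothetical, and the native call is stubbed in plain Java so the example is self-contained; in Gluten it would be a JNI entry point into C++.

```java
import java.nio.charset.StandardCharsets;

public class PlanHandoff {
    // Stand-in for serializing a physical plan; in practice this would
    // produce Substrait protobuf bytes rather than a UTF-8 string.
    static byte[] serializePlan(String planDescription) {
        return planDescription.getBytes(StandardCharsets.UTF_8);
    }

    // In Gluten this would be a JNI (or gRPC) call into the native
    // backend; stubbed here to echo the plan it received.
    static String nativeExecute(byte[] serializedPlan) {
        return "executed: " + new String(serializedPlan, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] plan = serializePlan("Scan -> Filter -> Project");
        System.out.println(nativeExecute(plan));
    }
}
```

The point of the SPIP, as I read it, is that only this serialization boundary would live in Spark; the native execution engines themselves stay outside.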
We (the Gluten and Arrow folks) actually did plan to put the plan conversion in the substrait-java repo. But to me it makes more sense to put it in the Spark repo. Native library and accelerator support will become more and more important in the future.
On 2024/04/10 08:29:08 Wenchen Fan wrote:
>
+1 for Wenchen's point.
I don't see a strong reason to pull these transformations into Spark
instead of keeping them in third-party packages/projects.
On Wed, Apr 10, 2024 at 5:32 AM Wenchen Fan wrote:
>
> It's good to reduce duplication between different native accelerators of
> Spark, and
I read the SPIP. I have a number of points, if I may:
- Maturity of Gluten: as the excerpt mentions, Gluten's feature set and
stability are IMO still under development. Integrating a non-core
component could introduce risks if it is not fully mature.
- Complexity: integrating
It's good to reduce duplication between different native accelerators of
Spark, and AFAIK there is already a project trying to solve it:
https://substrait.io/
I'm not sure why we need to do this inside Spark, instead of doing
the unification for a wider scope (for all engines, not only Spark).
I like the idea of improving the flexibility of Spark's physical plans and
really anything that might reduce code duplication among the ~4 or so
different accelerators.
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
Thank you for sharing, Jia.
I have the same questions as in Weiting's previous thread.
Do you think you can share the future milestone of Apache Gluten?
I'm wondering when the first stable release will come and how we can
coordinate across the ASF communities.
> This project is still under
Apache Spark currently lacks an official mechanism to support
cross-platform execution of physical plans. The Gluten project offers a
mechanism that utilizes the Substrait standard to convert and optimize
Spark's physical plans. By introducing Gluten's plan conversion,
validation, and fallback