It's good to reduce duplication between different native accelerators of
Spark, and AFAIK there is already a project trying to solve it:
https://substrait.io/

I'm not sure why we need to do this inside Spark, rather than doing
the unification at a wider scope (across all engines, not only Spark).


On Wed, Apr 10, 2024 at 10:11 AM Holden Karau <holden.ka...@gmail.com>
wrote:

> I like the idea of improving flexibility of Sparks physical plans and
> really anything that might reduce code duplication among the ~4 or so
> different accelerators.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Tue, Apr 9, 2024 at 3:14 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
> wrote:
>
>> Thank you for sharing, Jia.
>>
>> I have the same questions as in Weiting's previous thread.
>>
>> Do you think you can share the future milestone of Apache Gluten?
>> I'm wondering when the first stable release will come and how we can
>> coordinate across the ASF communities.
>>
>> > This project is still under active development now, and doesn't have a
>> stable release.
>> > https://github.com/apache/incubator-gluten/releases/tag/v1.1.1
>>
>> In the Apache Spark community, Apache Spark 3.2 and 3.3 have reached the
>> end of support.
>> And 3.4 will get 3.4.3 next week, with 3.4.4 (another EOL release)
>> scheduled for October.
>>
>> For the SPIP, I guess it's applicable for Apache Spark 4.0.0 only if
>> there is something we need to do from Spark side.
>>
> +1 I think any changes need to target 4.0
>
>>
>> Thanks,
>> Dongjoon.
>>
>>
>> On Tue, Apr 9, 2024 at 12:22 AM Ke Jia <kejia1...@gmail.com> wrote:
>>
>>> Apache Spark currently lacks an official mechanism for cross-platform
>>> execution of physical plans. The Gluten project offers a mechanism that
>>> uses the Substrait standard to convert and optimize Spark's physical
>>> plans. By introducing Gluten's plan conversion, validation, and fallback
>>> mechanisms into Spark, we can significantly enhance the portability and
>>> interoperability of Spark's physical plans, enabling them to run across
>>> a broader range of execution environments without requiring users to
>>> migrate. It would also improve Spark's execution efficiency through
>>> Gluten's advanced optimization techniques. The integration of Gluten
>>> into Spark has already shown significant performance improvements with
>>> the ClickHouse and Velox backends and has been successfully deployed in
>>> production by several customers.
>>>
>>> References:
>>> JIRA Ticket <https://issues.apache.org/jira/browse/SPARK-47773>
>>> SPIP Doc
>>> <https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing>
>>>
>>> Your feedback and comments are welcome and appreciated.
>>>
>>> Thanks,
>>> Jia Ke
>>>
>>

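For readers unfamiliar with the convert/validate/fallback mechanism the SPIP describes, it can be sketched as a toy model: each physical operator is validated against what the native backend supports, lowered when validation passes, and otherwise left as a vanilla Spark operator. This is an illustrative sketch only; the class and function names below are hypothetical and are not Gluten's or Spark's actual API.

```python
# Toy model of the convert/validate/fallback pattern described in the SPIP.
# All names here (SparkOp, NativeOp, SUPPORTED, validate, convert_plan)
# are hypothetical, chosen for illustration only.

from dataclasses import dataclass


@dataclass
class SparkOp:
    name: str  # a vanilla Spark physical operator, e.g. "Scan", "Filter"


@dataclass
class NativeOp:
    name: str  # the same operator lowered to a native backend


# Operators the hypothetical native backend claims to support.
SUPPORTED = {"Scan", "Filter", "Project"}


def validate(op: SparkOp) -> bool:
    """Check whether the native backend can execute this operator."""
    return op.name in SUPPORTED


def convert_plan(plan: list) -> list:
    """Lower each operator to the native backend when validation passes;
    otherwise fall back to the original Spark operator."""
    converted = []
    for op in plan:
        if validate(op):
            converted.append(NativeOp(op.name))
        else:
            # Fallback: this operator keeps vanilla Spark execution.
            converted.append(op)
    return converted


plan = [SparkOp("Scan"), SparkOp("Filter"), SparkOp("HashAggregate")]
result = convert_plan(plan)
# "HashAggregate" is not in SUPPORTED, so it stays a SparkOp (fallback),
# while "Scan" and "Filter" are lowered to NativeOp.
```

The point of the pattern is that fallback is per-operator, so a plan with one unsupported operator still benefits from native execution of the rest.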