+1 for Wenchen's point.

I don't see a strong reason to pull these transformations into Spark
instead of keeping them in third-party packages/projects.

On Wed, Apr 10, 2024 at 5:32 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
> It's good to reduce duplication between different native accelerators of 
> Spark, and AFAIK there is already a project trying to solve it: 
> https://substrait.io/
>
> I'm not sure why we need to do this inside Spark, instead of doing the 
> unification for a wider scope (for all engines, not only Spark).
>
>
> On Wed, Apr 10, 2024 at 10:11 AM Holden Karau <holden.ka...@gmail.com> wrote:
>>
>> I like the idea of improving the flexibility of Spark's physical plans, and 
>> really anything that might reduce code duplication among the ~4 or so 
>> different accelerators.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Tue, Apr 9, 2024 at 3:14 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>
>>> Thank you for sharing, Jia.
>>>
>>> I have the same questions as in Weiting's previous thread.
>>>
>>> Could you share the future milestones of Apache Gluten?
>>> I'm wondering when the first stable release will come and how we can 
>>> coordinate across the ASF communities.
>>>
>>> > This project is still under active development now, and doesn't have a 
>>> > stable release.
>>> > https://github.com/apache/incubator-gluten/releases/tag/v1.1.1
>>>
>>> In the Apache Spark community, Apache Spark 3.2 and 3.3 have reached the 
>>> end of support.
>>> And 3.4 will have 3.4.3 next week, with 3.4.4 (another EOL release) 
>>> scheduled for October.
>>>
>>> For the SPIP, I guess it's applicable to Apache Spark 4.0.0 only, if there 
>>> is something we need to do on the Spark side.
>>
>> +1, I think any changes need to target 4.0.
>>>
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>>
>>> On Tue, Apr 9, 2024 at 12:22 AM Ke Jia <kejia1...@gmail.com> wrote:
>>>>
>>>> Apache Spark currently lacks an official mechanism to support 
>>>> cross-platform execution of physical plans. The Gluten project offers a 
>>>> mechanism that uses the Substrait standard to convert and optimize 
>>>> Spark's physical plans. By introducing Gluten's plan conversion, 
>>>> validation, and fallback mechanisms into Spark, we can significantly 
>>>> improve the portability and interoperability of Spark's physical plans, 
>>>> allowing them to run across a broader range of execution environments 
>>>> without requiring users to migrate their workloads, while also improving 
>>>> Spark's execution efficiency through Gluten's advanced optimization 
>>>> techniques. The integration of Gluten into Spark has already shown 
>>>> significant performance improvements with the ClickHouse and Velox 
>>>> backends and has been successfully deployed in production by several 
>>>> customers.
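>>>>
>>>> As a rough sketch of the convert/validate/fallback idea (the names 
>>>> below, such as NativeBackend and OffloadToNative, are hypothetical and 
>>>> illustrative only, not Gluten's or Spark's real API; please see the SPIP 
>>>> doc for the actual design), the flow could be expressed as a 
>>>> physical-plan rule:
>>>>
>>>> import org.apache.spark.sql.catalyst.rules.Rule
>>>> import org.apache.spark.sql.execution.SparkPlan
>>>>
>>>> // Hypothetical backend interface: a native engine that can validate and
>>>> // convert individual Spark operators into Substrait-backed operators.
>>>> trait NativeBackend {
>>>>   def validate(plan: SparkPlan): Boolean   // can the backend execute this operator?
>>>>   def convert(plan: SparkPlan): SparkPlan  // wrap it in a native (Substrait) operator
>>>> }
>>>>
>>>> // Hypothetical rule: offload supported operators, and keep unsupported
>>>> // ones on the vanilla Spark path (the fallback mechanism).
>>>> case class OffloadToNative(backend: NativeBackend) extends Rule[SparkPlan] {
>>>>   override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
>>>>     case p if backend.validate(p) => backend.convert(p)
>>>>     case p => p
>>>>   }
>>>> }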
>>>>
>>>> References:
>>>> JIRA Ticket
>>>> SPIP Doc
>>>>
>>>> Your feedback and comments are welcome and appreciated.
>>>>
>>>> Thanks,
>>>> Jia Ke

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
