[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-22 Thread Ke Jia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839497#comment-17839497
 ] 

Ke Jia commented on SPARK-47773:


We have refined the above 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
in accordance with the specifications from the Spark community. The latest 
version of the SPIP is now available 
[here|https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit?usp=sharing].
We welcome and value your suggestions and comments.

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Ke Jia
> Priority: Major
>
> SPIP doc: 
> https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical plans and to leverage 
> Gluten's performance optimizations. Currently, Spark lacks official 
> cross-platform execution support for physical plans. Gluten's mechanism, 
> which employs the Substrait standard, can convert and optimize Spark's 
> physical plans, thus improving portability, interoperability, and execution 
> efficiency.
> The design proposal advocates for the incorporation of the TransformSupport 
> interface and its specialized variants—LeafTransformSupport, 
> UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
> streamlining the conversion of different operator types into a 
> Substrait-based common format. The validation phase entails a thorough 
> assessment of the Substrait plan against native backends to ensure 
> compatibility. In instances where validation does not succeed, Spark's native 
> operators will be deployed, with requisite transformations to adapt data 
> formats accordingly. The proposal emphasizes the centrality of the plan 
> transformation phase, positing it as the foundational step. The subsequent 
> validation and fallback procedures are slated for consideration upon the 
> successful establishment of the initial phase.
> The integration of Gluten into Spark has already shown significant 
> performance improvements with ClickHouse and Velox backends and has been 
> successfully deployed in production by several customers. 
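
The validate-then-fall-back flow described in the issue text can be illustrated with a small sketch. This is not Spark or Gluten code: every name in it (Operator, PlanNode, validate, plan, the Adapt[...] node) is hypothetical, Python is used only for brevity, and the Substrait conversion step is replaced by a simple lookup in a set of operators the backend is assumed to support.

```python
from dataclasses import dataclass, field
from typing import List, Set

# All names here are hypothetical stand-ins, not Spark or Gluten APIs.


@dataclass
class Operator:
    """A toy physical-plan operator: a name plus child operators."""
    name: str
    children: List["Operator"] = field(default_factory=list)


@dataclass
class PlanNode:
    """A planned operator, tagged with the engine that will run it."""
    name: str
    engine: str  # "native" or "spark"
    children: List["PlanNode"] = field(default_factory=list)


def validate(op: Operator, supported: Set[str]) -> bool:
    """Stand-in for backend validation: accept only supported operators."""
    return op.name in supported


def plan(op: Operator, supported: Set[str]) -> PlanNode:
    """Offload validated operators; fall back to Spark for the rest.

    Wherever a child runs on a different engine than its parent, a
    format-adapter node is inserted between them.
    """
    engine = "native" if validate(op, supported) else "spark"
    children = []
    for child in op.children:
        planned = plan(child, supported)
        if planned.engine != engine:
            adapter = f"Adapt[{planned.engine}->{engine}]"
            planned = PlanNode(adapter, engine, [planned])
        children.append(planned)
    return PlanNode(op.name, engine, children)


def render(node: PlanNode, depth: int = 0) -> List[str]:
    """Return the planned tree as indented lines, one per node."""
    lines = ["  " * depth + f"{node.name} ({node.engine})"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Toy query: Project over Filter over Scan; the backend lacks Filter.
query = Operator("Project", [Operator("Filter", [Operator("Scan")])])
planned = plan(query, supported={"Scan", "Project"})
print("\n".join(render(planned)))
```

In this toy run Filter fails validation and stays on Spark, so one adapter node appears below it and another above it, while Scan and Project are marked native. A real implementation would validate a converted plan against the backend rather than check operator names.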



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835181#comment-17835181
 ] 

Dongjoon Hyun commented on SPARK-47773:
---

Since this is a new feature which cannot affect old releases, I updated the 
Affected Version to 4.0.0.

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Ke Jia
> Priority: Major






[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Ke Jia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835171#comment-17835171
 ] 

Ke Jia commented on SPARK-47773:


[~viirya] Great, I have enabled commenting on the SPIP doc. I'm very 
much looking forward to your feedback. Thanks.

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Ke Jia
> Priority: Major






[jira] [Commented] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835168#comment-17835168
 ] 

L. C. Hsieh commented on SPARK-47773:
-

Can you enable commenting on the SPIP doc?

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Ke Jia
> Priority: Major


