[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-22 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
SPIP doc: 
[https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]

 

Refined SPIP doc:

[https://docs.google.com/document/d/1oY26KtqXoJJNHbAhtmVgaXSVt6NlO6t1iWYEuGvCc1s/edit#heading=h.ud7930xhlsm6]

 

This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks official support for 
cross-platform execution of physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.
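
As a rough illustration of the flow described above (convert operators to a common format, validate against a backend, fall back to Spark where validation fails), here is a minimal Python sketch. Only the four interface names come from the description. The method names, the dict-based plan format, the FormatTransition marker, and the per-operator validation rule are assumptions made for illustration; they are not Spark, Gluten, or Substrait APIs.

```python
class TransformSupport:
    """Toy stand-in for the proposed interface (methods are hypothetical)."""

    arity = None  # number of child operators; fixed by each variant below

    def __init__(self, name, *children):
        if len(children) != self.arity:
            raise ValueError(f"{name} expects {self.arity} child operator(s)")
        self.name = name
        self.children = list(children)

    def to_common_format(self):
        # A nested dict stands in for a Substrait relation tree.
        return {
            "rel": self.name,
            "inputs": [c.to_common_format() for c in self.children],
        }


class LeafTransformSupport(TransformSupport):
    arity = 0


class UnaryTransformSupport(TransformSupport):
    arity = 1


class BinaryTransformSupport(TransformSupport):
    arity = 2


def place_operators(op, supported):
    """Tag each operator with an engine and mark data-format boundaries.

    Assumption: an operator validates when the backend lists its name in
    `supported`; anything else falls back to Spark's own operator.
    """
    engine = "native" if op.name in supported else "spark"
    inputs = []
    for child in op.children:
        placed = place_operators(child, supported)
        if placed["engine"] != engine:
            # Parent and child run on different engines, so the data
            # format must be adapted at this boundary.
            placed = {"rel": "FormatTransition", "engine": engine, "inputs": [placed]}
        inputs.append(placed)
    return {"rel": op.name, "engine": engine, "inputs": inputs}


scan = LeafTransformSupport("Scan")
plan = BinaryTransformSupport(
    "Join", UnaryTransformSupport("Filter", scan), LeafTransformSupport("Scan")
)
# Hypothetical backend that supports Scan and Filter but not Join.
placed = place_operators(plan, supported={"Scan", "Filter"})
print(placed["engine"])                      # spark
print([i["rel"] for i in placed["inputs"]])  # ['FormatTransition', 'FormatTransition']
```

Validating per operator keeps the fallback granular. Validating the whole converted plan at once would be an equally plausible reading of the description.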

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:
SPIP doc: 
https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing

This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Ke Jia
> Priority: Major

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47773:
--
Affects Version/s: 4.0.0
   (was: 3.5.1)

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Ke Jia
> Priority: Major
>
> SPIP doc: 
> https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing
> This 
> [SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
>  outlines the integration of Gluten's physical plan conversion, validation, 
> and fallback framework into Apache Spark. The goal is to enhance Spark's 
> flexibility and robustness in executing physical plans and to leverage 
> Gluten's performance optimizations. Currently, Spark lacks an official 
> cross-platform execution support for physical plans. Gluten's mechanism, 
> which employs the Substrait standard, can convert and optimize Spark's 
> physical plans, thus improving portability, interoperability, and execution 
> efficiency.
> The design proposal advocates for the incorporation of the TransformSupport 
> interface and its specialized variants—LeafTransformSupport, 
> UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
> streamlining the conversion of different operator types into a 
> Substrait-based common format. The validation phase entails a thorough 
> assessment of the Substrait plan against native backends to ensure 
> compatibility. In instances where validation does not succeed, Spark's native 
> operators will be deployed, with requisite transformations to adapt data 
> formats accordingly. The proposal emphasizes the centrality of the plan 
> transformation phase, positing it as the foundational step. The subsequent 
> validation and fallback procedures are slated for consideration upon the 
> successful establishment of the initial phase.
> The integration of Gluten into Spark has already shown significant 
> performance improvements with ClickHouse and Velox backends and has been 
> successfully deployed in production by several customers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
SPIP doc: 
https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing

This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Ke Jia
> Priority: Major

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Ke Jia
> Priority: Major

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase. 

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Ke Jia
> Priority: Major

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: 
This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency.

The design proposal advocates for the incorporation of the TransformSupport 
interface and its specialized variants—LeafTransformSupport, 
UnaryTransformSupport, and BinaryTransformSupport. These are instrumental in 
streamlining the conversion of different operator types into a Substrait-based 
common format. The validation phase entails a thorough assessment of the 
Substrait plan against native backends to ensure compatibility. In instances 
where validation does not succeed, Spark's native operators will be deployed, 
with requisite transformations to adapt data formats accordingly. The proposal 
emphasizes the centrality of the plan transformation phase, positing it as the 
foundational step. The subsequent validation and fallback procedures are slated 
for consideration upon the successful establishment of the initial phase. 

The integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 

  was:This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency. The design 
proposal advocates for the incorporation of the TransformSupport interface and 
its specialized variants—LeafTransformSupport, UnaryTransformSupport, and 
BinaryTransformSupport. These are instrumental in streamlining the conversion 
of different operator types into a Substrait-based common format. The 
validation phase entails a thorough assessment of the Substrait plan against 
native backends to ensure compatibility. In instances where validation does not 
succeed, Spark's native operators will be deployed, with requisite 
transformations to adapt data formats accordingly. The proposal emphasizes the 
centrality of the plan transformation phase, positing it as the foundational 
step. The subsequent validation and fallback procedures are slated for 
consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 


> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Ke Jia
> Priority: Major

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: This 
[SPIP|https://docs.google.com/document/d/1v7sndtIHIBdzc4YvLPI8InXxhI7SnnAQ5HvmM2DGjVE/edit?usp=sharing]
 outlines the integration of Gluten's physical plan conversion, validation, and 
fallback framework into Apache Spark. The goal is to enhance Spark's 
flexibility and robustness in executing physical plans and to leverage Gluten's 
performance optimizations. Currently, Spark lacks an official cross-platform 
execution support for physical plans. Gluten's mechanism, which employs the 
Substrait standard, can convert and optimize Spark's physical plans, thus 
improving portability, interoperability, and execution efficiency. The design 
proposal advocates for the incorporation of the TransformSupport interface and 
its specialized variants—LeafTransformSupport, UnaryTransformSupport, and 
BinaryTransformSupport. These are instrumental in streamlining the conversion 
of different operator types into a Substrait-based common format. The 
validation phase entails a thorough assessment of the Substrait plan against 
native backends to ensure compatibility. In instances where validation does not 
succeed, Spark's native operators will be deployed, with requisite 
transformations to adapt data formats accordingly. The proposal emphasizes the 
centrality of the plan transformation phase, positing it as the foundational 
step. The subsequent validation and fallback procedures are slated for 
consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers.   (was: This SPIP outlines the 
integration of Gluten's physical plan conversion, validation, and fallback 
framework into Apache Spark. The goal is to enhance Spark's flexibility and 
robustness in executing physical plans and to leverage Gluten's performance 
optimizations. Currently, Spark lacks an official cross-platform execution 
support for physical plans. Gluten's mechanism, which employs the Substrait 
standard, can convert and optimize Spark's physical plans, thus improving 
portability, interoperability, and execution efficiency. The design proposal 
advocates for the incorporation of the TransformSupport interface and its 
specialized variants—LeafTransformSupport, UnaryTransformSupport, and 
BinaryTransformSupport. These are instrumental in streamlining the conversion 
of different operator types into a Substrait-based common format. The 
validation phase entails a thorough assessment of the Substrait plan against 
native backends to ensure compatibility. In instances where validation does not 
succeed, Spark's native operators will be deployed, with requisite 
transformations to adapt data formats accordingly. The proposal emphasizes the 
centrality of the plan transformation phase, positing it as the foundational 
step. The subsequent validation and fallback procedures are slated for 
consideration upon the successful establishment of the initial phase.  The 
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. )

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Ke Jia
> Priority: Major

[jira] [Updated] (SPARK-47773) Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-08 Thread Ke Jia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-47773:
---
Description: This SPIP outlines the integration of Gluten's physical plan 
conversion, validation, and fallback framework into Apache Spark. The goal is 
to enhance Spark's flexibility and robustness in executing physical plans and 
to leverage Gluten's performance optimizations. Currently, Spark lacks 
official cross-platform execution support for physical plans. Gluten's 
mechanism, which employs the Substrait standard, can convert and optimize 
Spark's physical plans, thus improving portability, interoperability, and 
execution efficiency. The design proposal advocates for the incorporation of 
the TransformSupport interface and its specialized 
variants—LeafTransformSupport, UnaryTransformSupport, and 
BinaryTransformSupport. These are instrumental in streamlining the conversion 
of different operator types into a Substrait-based common format. The 
validation phase entails a thorough assessment of the Substrait plan against 
native backends to ensure compatibility. In instances where validation does not 
succeed, Spark's native operators will be deployed, with requisite 
transformations to adapt data formats accordingly. The proposal emphasizes the 
centrality of the plan transformation phase, positing it as the foundational 
step. The subsequent validation and fallback procedures are slated for 
consideration upon the successful establishment of the initial phase. The
integration of Gluten into Spark has already shown significant performance 
improvements with ClickHouse and Velox backends and has been successfully 
deployed in production by several customers. 
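
The validation and fallback step described above can be sketched as a toy tree rewrite. This is an illustration under loud assumptions, not the mechanism from the SPIP: validation is reduced to a membership test against a hypothetical set of operator names the backend supports, and `Op`, `validate_and_fallback`, `render`, `to_row`, and `to_columnar` are invented names. It shows the two ideas in the paragraph: an operator that fails validation stays on Spark's own implementation, and a format adapter is inserted on every edge where the engine changes.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    children: list = field(default_factory=list)
    engine: str = "spark"  # becomes "native" when validation succeeds


def validate_and_fallback(op, supported):
    """Tag operators bottom-up as native or spark and insert format adapters."""
    children = [validate_and_fallback(c, supported) for c in op.children]
    # Assumption: "validation" is a simple lookup in the supported set.
    engine = "native" if op.name in supported else "spark"
    adapted = []
    for child in children:
        if child.engine != engine:
            # The engine changes across this edge, so adapt the data format.
            adapter = "to_columnar" if engine == "native" else "to_row"
            child = Op(adapter, [child], engine)
        adapted.append(child)
    return Op(op.name, adapted, engine)


def render(op):
    """Render the plan as nested text with the chosen engine per operator."""
    label = f"{op.name}[{op.engine}]"
    if not op.children:
        return label
    return f"{label}({', '.join(render(c) for c in op.children)})"


# Hypothetical plan: the middle operator is not supported by the backend.
plan = Op("project", [Op("udf_filter", [Op("scan")])])
result = validate_and_fallback(plan, supported={"project", "scan"})
print(render(result))
```

In this example `udf_filter` falls back to Spark, so the rendered plan wraps `scan` in `to_row` below it and wraps `udf_filter` itself in `to_columnar` above it.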

> Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on 
> Various Native Engines
> 
>
> Key: SPARK-47773
> URL: https://issues.apache.org/jira/browse/SPARK-47773
> Project: Spark
> Issue Type: Epic
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Ke Jia
> Priority: Major
>


