Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Holden Karau
I like the idea of improving the flexibility of Spark's physical plans, and
really anything that might reduce code duplication among the ~4 or so
different accelerators.

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Tue, Apr 9, 2024 at 3:14 AM Dongjoon Hyun wrote:

> Thank you for sharing, Jia.
>
> I have the same questions as in Weiting's previous thread.
>
> Do you think you can share the future milestones of Apache Gluten?
> I'm wondering when the first stable release will come and how we can
> coordinate across the ASF communities.
>
> > This project is still under active development now, and doesn't have a
> stable release.
> > https://github.com/apache/incubator-gluten/releases/tag/v1.1.1
>
> In the Apache Spark community, Apache Spark 3.2 and 3.3 have reached end of
> support.
> And, 3.4 will have 3.4.3 next week, and 3.4.4 (another EOL release) is
> scheduled for October.
>
> For the SPIP, I guess it's applicable to Apache Spark 4.0.0 only if there
> is something we need to do from the Spark side.
>
+1. I think any changes need to target 4.0.

>
> Thanks,
> Dongjoon.


Re: Versioning of Spark Operator

2024-04-09 Thread L. C. Hsieh
For the Spark Operator, I think the answer is yes. My impression is that
the Spark Operator should be Spark version-agnostic. Zhou,
please correct me if I'm wrong.
I am not sure about the Spark Connect Go client, but if it is going
to talk to a Spark cluster, I guess it is still tied to the
Spark version (there are compatibility issues).


> On 2024/04/09 21:35:45 bo yang wrote:
> > Thanks Liang-Chi for the Spark Operator work, and also for the discussion here!
> >
> > For the Spark Operator and Connect Go client, I am guessing they need to
> > support multiple versions of Spark? E.g. the same Spark Operator may support
> > running multiple versions of Spark, and the Connect Go client might support
> > multiple versions of the Spark driver as well.
> >
> > What do people think of using the minimum supported Spark version as the
> > version name for the Spark Operator and Connect Go client? For example,
> > Spark Operator 3.5.x supports Spark 3.5 and above.
> >
> > Best,
> > Bo



Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Do we have a compatibility matrix for the Apache Spark Connect Go client already, Bo?

Specifically, I'm wondering which Spark versions the existing Apache Spark Connect Go
repository is able to support as of now.

We know that it is supposed to always be compatible, but do we have a way to
actually verify that via CI inside the Go repository?

Dongjoon.

On 2024/04/09 21:35:45 bo yang wrote:
> Thanks Liang-Chi for the Spark Operator work, and also for the discussion here!
> 
> For the Spark Operator and Connect Go client, I am guessing they need to
> support multiple versions of Spark? E.g. the same Spark Operator may support
> running multiple versions of Spark, and the Connect Go client might support
> multiple versions of the Spark driver as well.
> 
> What do people think of using the minimum supported Spark version as the
> version name for the Spark Operator and Connect Go client? For example,
> Spark Operator 3.5.x supports Spark 3.5 and above.
> 
> Best,
> Bo



Re: Versioning of Spark Operator

2024-04-09 Thread bo yang
Thanks Liang-Chi for the Spark Operator work, and also for the discussion here!

For the Spark Operator and Connect Go client, I am guessing they need to
support multiple versions of Spark? E.g. the same Spark Operator may support
running multiple versions of Spark, and the Connect Go client might support
multiple versions of the Spark driver as well.

What do people think of using the minimum supported Spark version as the
version name for the Spark Operator and Connect Go client? For example,
Spark Operator 3.5.x supports Spark 3.5 and above.
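As a rough illustration of this naming scheme, here is a minimal Scala sketch;
the names below are illustrative only, not an actual Operator or client API:

  case class SparkVersion(major: Int, minor: Int)

  // A component named by its minimum supported Spark version supports any
  // Spark release at or above that minimum.
  def supports(minSupported: SparkVersion, target: SparkVersion): Boolean =
    target.major > minSupported.major ||
      (target.major == minSupported.major && target.minor >= minSupported.minor)

  // Example: "Spark Operator 3.5.x" supports Spark 3.5 and above.
  assert(supports(SparkVersion(3, 5), SparkVersion(3, 5)))   // 3.5: supported
  assert(supports(SparkVersion(3, 5), SparkVersion(4, 0)))   // 4.0: supported
  assert(!supports(SparkVersion(3, 5), SparkVersion(3, 4)))  // 3.4: not supported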

Best,
Bo


On Tue, Apr 9, 2024 at 10:14 AM Dongjoon Hyun  wrote:

> Ya, that's simple and possible.
>
> However, it may cause a lot of confusion because it implies that the new `Spark
> K8s Operator 4.0.0` and `Spark Connect Go 4.0.0` follow the same `Semantic
> Versioning` policy as Apache Spark 4.0.0.
>
> In addition, `Versioning` is directly related to the release cadence. It's
> unlikely for us to have `Spark K8s Operator` and `Spark Connect Go`
> releases at every Apache Spark maintenance release. For example, there has
> been no recent commit in the Spark Connect Go repository.
>
> I believe the versioning and release cadence are more closely tied to those
> subprojects' maturity.
>
> Dongjoon.


Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Ya, that's simple and possible.

However, it may cause a lot of confusion because it implies that the new `Spark K8s
Operator 4.0.0` and `Spark Connect Go 4.0.0` follow the same `Semantic
Versioning` policy as Apache Spark 4.0.0.

In addition, `Versioning` is directly related to the release cadence. It's
unlikely for us to have `Spark K8s Operator` and `Spark Connect Go` releases at
every Apache Spark maintenance release. For example, there has been no recent
commit in the Spark Connect Go repository.

I believe the versioning and release cadence are more closely tied to those
subprojects' maturity.

Dongjoon.

On 2024/04/09 16:59:40 DB Tsai wrote:
>  Aligning with Spark releases is sensible, as it allows us to guarantee that 
> the Spark operator functions correctly with the new version while also 
> maintaining support for previous versions.
>  
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1



Re: Versioning of Spark Operator

2024-04-09 Thread DB Tsai
 Aligning with Spark releases is sensible, as it allows us to guarantee that 
the Spark operator functions correctly with the new version while also 
maintaining support for previous versions.
 
DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1

> On Apr 9, 2024, at 9:45 AM, Mridul Muralidharan  wrote:
> 
> 
>   I am trying to understand if we can simply align with Spark's version for
> this?
> It makes the release and JIRA management much simpler for developers and
> intuitive for users.
> 
> Regards,
> Mridul



Re: Versioning of Spark Operator

2024-04-09 Thread Mridul Muralidharan
  I am trying to understand if we can simply align with Spark's version for
this?
It makes the release and JIRA management much simpler for developers and
intuitive for users.

Regards,
Mridul


On Tue, Apr 9, 2024 at 10:09 AM Dongjoon Hyun  wrote:

> Hi, Liang-Chi.
>
> Thank you for leading Apache Spark K8s operator as a shepherd.
>
> I took a look at the `Apache Spark Connect Go` repo mentioned in the thread.
> Sadly, there is no release at all and no activity in the last 6 months. It
> seems to be the first time for the Apache Spark community to consider these
> sister repositories (Go and K8s Operator).
>
> https://github.com/apache/spark-connect-go/commits/master/
>
> Dongjoon.
>


Re: Versioning of Spark Operator

2024-04-09 Thread Dongjoon Hyun
Hi, Liang-Chi.

Thank you for leading Apache Spark K8s operator as a shepherd. 

I took a look at the `Apache Spark Connect Go` repo mentioned in the thread. Sadly,
there is no release at all and no activity in the last 6 months. It seems to be
the first time for the Apache Spark community to consider these sister repositories
(Go and K8s Operator).

https://github.com/apache/spark-connect-go/commits/master/

Dongjoon.

On 2024/04/08 17:48:18 "L. C. Hsieh" wrote:
> Hi all,
> 
> We've opened the dedicated repository of Spark Kubernetes Operator,
> and the first PR is created.
> Thank you for the review from the community so far.
> 
> There are some questions about the versioning of the Spark Operator.
> 
> As we are using the Spark JIRA, when we are going to merge PRs, we need to
> choose a Spark version. However, the Spark Operator is versioned
> differently from Spark. I'm wondering how we should deal with this.
> 
> I'm not sure whether Connect also versions differently from Spark. If so,
> maybe we can follow how Connect does it.
> 
> Can someone who is familiar with Connect versioning give some suggestions?
> 
> Thank you.
> 
> Liang-Chi
> 



Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Jia.

I have the same questions as in Weiting's previous thread.

Do you think you can share the future milestones of Apache Gluten?
I'm wondering when the first stable release will come and how we can
coordinate across the ASF communities.

> This project is still under active development now, and doesn't have a
stable release.
> https://github.com/apache/incubator-gluten/releases/tag/v1.1.1

In the Apache Spark community, Apache Spark 3.2 and 3.3 have reached end of
support.
And, 3.4 will have 3.4.3 next week, and 3.4.4 (another EOL release) is
scheduled for October.

For the SPIP, I guess it's applicable to Apache Spark 4.0.0 only if there
is something we need to do from the Spark side.

Thanks,
Dongjoon.


On Tue, Apr 9, 2024 at 12:22 AM Ke Jia  wrote:

> Apache Spark currently lacks an official mechanism to support
> cross-platform execution of physical plans. The Gluten project offers a
> mechanism that utilizes the Substrait standard to convert and optimize
> Spark's physical plans. By introducing Gluten's plan conversion,
> validation, and fallback mechanisms into Spark, we can significantly
> enhance the portability and interoperability of Spark's physical plans,
> enabling them to run across a broader spectrum of execution
> environments without requiring users to migrate their workloads, while also
> improving Spark's execution efficiency through Gluten's advanced
> optimization techniques. The integration of Gluten into Spark has
> already shown significant performance improvements with the ClickHouse and
> Velox backends and has been successfully deployed in production by several
> customers.
>
> References:
> JIRA Ticket 
> SPIP Doc
> 
>
> Your feedback and comments are welcome and appreciated.  Thanks.
>
> Thanks,
> Jia Ke
>


Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Weiting.

Do you think you can share the future milestones of Apache Gluten?
I'm wondering when the first stable release will come and how we can
coordinate across the ASF communities.

> This project is still under active development now, and doesn't have a
stable release.
> https://github.com/apache/incubator-gluten/releases/tag/v1.1.1

In the Apache Spark community, Apache Spark 3.2 and 3.3 have reached end of
support.
And, 3.4 will have 3.4.3 next week, and 3.4.4 (another EOL release) is
scheduled for October.

For the SPIP, I guess it's applicable to Apache Spark 4.0.0 only if there
is something we need to do from the Spark side.

Thanks,
Dongjoon.


On Mon, Apr 8, 2024 at 11:19 PM WeitingChen  wrote:

> Hi all,
>
> We are excited to introduce a new Apache incubating project called Gluten.
> Gluten serves as a middleware layer designed to offload Spark to native
> engines like Velox or ClickHouse.
> For more detailed information, please visit the project repository at
> https://github.com/apache/incubator-gluten
>
> Additionally, a new Spark SPIP related to Spark + Gluten collaboration has
> been proposed at https://issues.apache.org/jira/browse/SPARK-47773.
> We eagerly await feedback from the Spark community.
>
> Thanks,
> Weiting.
>
>


Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread WeitingChen
Hi all,

We are excited to introduce a new Apache incubating project called Gluten.
Gluten serves as a middleware layer designed to offload Spark to native
engines like Velox or ClickHouse.
For more detailed information, please visit the project repository at
https://github.com/apache/incubator-gluten

Additionally, a new Spark SPIP related to Spark + Gluten collaboration has
been proposed at https://issues.apache.org/jira/browse/SPARK-47773.
We eagerly await feedback from the Spark community.

Thanks,
Weiting.


SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-09 Thread Ke Jia
Apache Spark currently lacks an official mechanism to support
cross-platform execution of physical plans. The Gluten project offers a
mechanism that utilizes the Substrait standard to convert and optimize
Spark's physical plans. By introducing Gluten's plan conversion,
validation, and fallback mechanisms into Spark, we can significantly
enhance the portability and interoperability of Spark's physical plans,
enabling them to run across a broader spectrum of execution
environments without requiring users to migrate their workloads, while also
improving Spark's execution efficiency through Gluten's advanced
optimization techniques. The integration of Gluten into Spark has
already shown significant performance improvements with the ClickHouse and
Velox backends and has been successfully deployed in production by several
customers.
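
For illustration only, here is a minimal Scala sketch of the
conversion-validation-fallback idea; `NativeValidator` and `tryOffload` are
hypothetical names, not Gluten's actual API:

  import org.apache.spark.sql.catalyst.rules.Rule
  import org.apache.spark.sql.execution.SparkPlan

  // Hypothetical validator: returns the natively-offloaded operator when the
  // backend supports it, or None when it does not.
  trait NativeValidator {
    def tryOffload(op: SparkPlan): Option[SparkPlan]
  }

  // A physical-plan rule that offloads each supported operator and otherwise
  // falls back to the vanilla Spark operator.
  case class OffloadWithFallback(validator: NativeValidator) extends Rule[SparkPlan] {
    override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
      case op => validator.tryOffload(op).getOrElse(op)
    }
  }

In Spark today, a rule of this shape could be plugged in through
SparkSessionExtensions.injectColumnarRule, which is how existing accelerators
already hook into the physical plan.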

References:
JIRA Ticket 
SPIP Doc


Your feedback and comments are welcome and appreciated.  Thanks.

Thanks,
Jia Ke