Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread bo yang
grate with Spark Connect to provide the
>> > routing capabilities to provide a stable DNS endpoint for all different
>> > Spark deployments.
>> >
>> > > [...] the current ecosystem is, subjectively speaking, a bit brittle.
>> >
>> > Can you help me understand that a bit better? Do you mean the Spark
>> > ecosystem or the Spark Connect ecosystem?
>>
>> I mean Spark in general. While most of the core and some closely related
>> projects are well maintained, tools built on top of Spark, even ones
>> supported by major stakeholders, are often short-lived and left
>> unmaintained, if not officially abandoned.
>>
>> New languages aside, without a single extension point (which, for core
>> Spark, is the JVM interface), maintaining public projects on top of Spark
>> becomes even less attractive. That, assuming we don't completely reject the
>> idea of extending Spark functionality while using Spark Connect,
>> effectively limits the target audience for any 3rd party library.

Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread Martin Grund
etely reject the
> idea of extending Spark functionality while using Spark Connect,
> effectively limiting the target audience for any 3rd party library.

Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread Maciej
 Best regards, Maciej 

Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread Martin Grund
aintained in the future?
>>>>- Do we follow the same release schedule as for the core project,
>>>>or rather release each client separately, after the main release is
>>>>completed?
>>>>
>>>> Also, an elephant in the room is the future of the current API in Spark
>>>> 4 and onwards. As useful as connect is, it is not exactly a replacement for
>>>> many existing deployments. Furthermore, it doesn't make extending Spark
>>>> much easier and the current ecosystem is, subjectively speaking, a bit
>>>> brittle.
>>>>
>>>> --
>>>> Best regards,
>>>> Maciej
>>>>
>>>>
>>>> On 5/26/23 07:26, Martin Grund wrote:
>>>>
>>>> Thanks everyone for your feedback! I will work on figuring out what it
>>>> takes to get started with a repo for the go client.
>>>>
>>>> On Thu 25. May 2023 at 21:51 Chao Sun  wrote:
>>>>
>>>>> +1 on separate repo too
>>>>>
>>>>> On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun <
>>>>> dongjoon.h...@gmail.com> wrote:
>>>>> >
>>>>> > +1 for starting on a separate repo.
>>>>> >
>>>>> > Dongjoon.
>>>>> >
>>>>> > On Thu, May 25, 2023 at 9:53 AM yangjie01 
>>>>> wrote:
>>>>> >>
>>>>> >> +1 on start this with a separate repo.
>>>>> >>
>>>>> >> Which new clients can be placed in the main repo should be
>>>>> discussed after they are mature enough,
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Yang Jie
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> From: Denny Lee
>>>>> >> Date: Wednesday, May 24, 2023 21:31
>>>>> >> To: Hyukjin Kwon
>>>>> >> Cc: Maciej, "dev@spark.apache.org" <dev@spark.apache.org>
>>>>> >> Subject: Re: [CONNECT] New Clients for Go and Rust
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> +1 on separate repo allowing different APIs to run at different
>>>>> speeds and ensuring they get community support.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon 
>>>>> wrote:
>>>>> >>
>>>>> >> I think we can just start this with a separate repo.
>>>>> >> I am fine with the second option too but in this case we would have
>>>>> to triage which language to add into the main repo.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Fri, 19 May 2023 at 22:28, Maciej 
>>>>> wrote:
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Personally, I'm strongly against the second option and have some
>>>>> preference towards the third one (or maybe a mix of the first one and the
>>>>> third one).
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> The project is already pretty large as-is and, with an extremely
>>>>> conservative approach towards removal of APIs, it only tends to grow over
>>>>> time. Making it even larger is not going to make things more maintainable
>>>>> and is likely to create an entry barrier for new contributors (that's
>>>>> similar to Jia's arguments).
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> Moreover, we've seen quite a few different language clients over
>>>>> the years and all but one or two survived while none is particularly
>>>>> active, as far as I'm aware.  Taking responsibility for more clients,
>>>>> without being sure that we have resources to maintain them and there is
>>>>> enough community around them to make such effort worthwhile, doesn't seem
>>>>> like a good idea.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >>
>>

Re: [CONNECT] New Clients for Go and Rust

2023-05-31 Thread bo yang
 Martin Grund wrote:
>>>
>>> Thanks everyone for your feedback! I will work on figuring out what it
>>> takes to get started with a repo for the go client.
>>>
>>> On Thu 25. May 2023 at 21:51 Chao Sun  wrote:
>>>
>>>> +1 on separate repo too
>>>>
>>>> On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun 
>>>> wrote:
>>>> >
>>>> > +1 for starting on a separate repo.
>>>> >
>>>> > Dongjoon.
>>>> >
>>>> > On Thu, May 25, 2023 at 9:53 AM yangjie01 
>>>> wrote:
>>>> >>
>>>> >> +1 on start this with a separate repo.
>>>> >>
>>>> >> Which new clients can be placed in the main repo should be discussed
>>>> after they are mature enough,
>>>> >>
>>>> >>
>>>> >>
>>>> >> Yang Jie
>>>> >>
>>>> >>
>>>> >>
>>>> >> From: Denny Lee
>>>> >> Date: Wednesday, May 24, 2023 21:31
>>>> >> To: Hyukjin Kwon
>>>> >> Cc: Maciej, "dev@spark.apache.org" <dev@spark.apache.org>
>>>> >> Subject: Re: [CONNECT] New Clients for Go and Rust
>>>> >>
>>>> >>
>>>> >>
>>>> >> +1 on separate repo allowing different APIs to run at different
>>>> speeds and ensuring they get community support.
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon 
>>>> wrote:
>>>> >>
>>>> >> I think we can just start this with a separate repo.
>>>> >> I am fine with the second option too but in this case we would have
>>>> to triage which language to add into the main repo.
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Fri, 19 May 2023 at 22:28, Maciej  wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >>
>>>> >>
>>>> >> Personally, I'm strongly against the second option and have some
>>>> preference towards the third one (or maybe a mix of the first one and the
>>>> third one).
>>>> >>
>>>> >>
>>>> >>
>>>> >> The project is already pretty large as-is and, with an extremely
>>>> conservative approach towards removal of APIs, it only tends to grow over
>>>> time. Making it even larger is not going to make things more maintainable
>>>> and is likely to create an entry barrier for new contributors (that's
>>>> similar to Jia's arguments).
>>>> >>
>>>> >>
>>>> >>
>>>> >> Moreover, we've seen quite a few different language clients over the
>>>> years and all but one or two survived while none is particularly active, as
>>>> far as I'm aware.  Taking responsibility for more clients, without being
>>>> sure that we have resources to maintain them and there is enough community
>>>> around them to make such effort worthwhile, doesn't seem like a good idea.
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >>
>>>> >> Best regards,
>>>> >>
>>>> >> Maciej Szymkiewicz
>>>> >>
>>>> >>
>>>> >>
>>>> >> Web: https://zero323.net
>>>> >>
>>>> >> PGP: A30CEF0C31A501EC
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On 5/19/23 14:57, Jia Fan wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks for contribution!
>>>> >>
>>>> >> I prefer (1). There are some reason:
>>>> >>
>>>> >>
>>>> >>
>>>> >> 1. Different repository can maintain independent versions, different
>>>> release times, and faster bug fix releases.
>>>> >>
>>>> >>
>>>> >>
>>>> >> 2. Different languages have 

Re: [CONNECT] New Clients for Go and Rust

2023-05-30 Thread Martin Grund
Hi folks,

Thanks a lot for the help from Hyukjin! We've created
https://github.com/apache/spark-connect-go as the first contrib repository
for Spark Connect under the Apache Spark project. We will move the
development of the Golang client to this repository and make it very clear
in the README file that this is an experimental client.

Looking forward to all your contributions!

On Tue, May 30, 2023 at 11:50 AM Martin Grund  wrote:

> I think it makes sense to split this discussion into two pieces. On the
> contribution side, my personal perspective is that these new clients are
> explicitly marked as experimental and unsupported until we deem them mature
> enough to be supported using the standard release process. However, the
> goal should be that the main contributors of these clients aim to
> follow the same release and maintenance schedule. I think we should
> encourage the community to contribute to the Spark Connect clients, and as
> such we should explicitly avoid making it hard to get started (and for that
> reason we reserve the right to abandon a client).
>
> How exactly the release schedule is going to look will probably require
> some experimentation, because this is a new area for Spark and its
> ecosystem. I don't think it requires us to have all the answers upfront.
>
> > Also, an elephant in the room is the future of the current API in Spark
> > 4 and onwards. As useful as connect is, it is not exactly a replacement for
> > many existing deployments. Furthermore, it doesn't make extending Spark
> > much easier and the current ecosystem is, subjectively speaking, a bit
> > brittle.
>
> The goal of Spark Connect is not to replace the way users are currently
> deploying Spark; it's not meant to be that. Users should continue deploying
> Spark in exactly the way they prefer. Spark Connect allows bringing more
> interactivity and connectivity to Spark. While Spark Connect extends Spark,
> most new language consumers will not try to extend Spark, but simply
> provide the existing surface to their native language. So the goal is not
> so much extensibility as availability. For example, I believe it
> would be awesome if the Livy community would find a way to integrate with
> Spark Connect and provide the routing capabilities needed for a stable DNS
> endpoint across all the different Spark deployments.
>
> > [...] the current ecosystem is, subjectively speaking, a bit brittle.
>
> Can you help me understand that a bit better? Do you mean the Spark
> ecosystem or the Spark Connect ecosystem?
>
>
>
> Martin
>
>
> On Fri, May 26, 2023 at 5:39 PM Maciej  wrote:
>
>> It might be a good idea to have a discussion about how new connect
>> clients fit into the overall process we have. In particular:
>>
>>
>> - Under what conditions do we consider adding a new language to the
>>   official channels?  What process do we follow?
>> - What guarantees do we offer with respect to these clients? Is adding
>>   a new client the same type of commitment as for the core API? In other
>>   words, do we commit to maintaining such clients "forever" or do we
>>   separate the "official" and "contrib" clients, with the latter being
>>   governed by the ASF, but not guaranteed to be maintained in the future?
>> - Do we follow the same release schedule as for the core project, or
>>   rather release each client separately, after the main release is
>>   completed?
>>
>> Also, an elephant in the room is the future of the current API in Spark 4
>> and onwards. As useful as connect is, it is not exactly a replacement for
>> many existing deployments. Furthermore, it doesn't make extending Spark
>> much easier and the current ecosystem is, subjectively speaking, a bit
>> brittle.
>>
>> --
>> Best regards,
>> Maciej
>>
>>
>> On 5/26/23 07:26, Martin Grund wrote:
>>
>> Thanks everyone for your feedback! I will work on figuring out what it
>> takes to get started with a repo for the go client.
>>
>> On Thu 25. May 2023 at 21:51 Chao Sun  wrote:
>>
>>> +1 on separate repo too
>>>
>>> On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun 
>>> wrote:
>>> >
>>> > +1 for starting on a separate repo.
>>> >
>>> > Dongjoon.
>>> >
>>> > On Thu, May 25, 2023 at 9:53 AM yangjie01  wrote:
>>> >>
>>> >> +1 on start this with a separate repo.
>>> >>
>>> >> Which new clients can be placed in the main repo should be discussed
>>> after they are mature enough,

Re: [CONNECT] New Clients for Go and Rust

2023-05-30 Thread Martin Grund
I think it makes sense to split this discussion into two pieces. On the
contribution side, my personal perspective is that these new clients are
explicitly marked as experimental and unsupported until we deem them mature
enough to be supported using the standard release process. However, the
goal should be that the main contributors of these clients aim to
follow the same release and maintenance schedule. I think we should
encourage the community to contribute to the Spark Connect clients, and as
such we should explicitly avoid making it hard to get started (and for that
reason we reserve the right to abandon a client).

How exactly the release schedule is going to look will probably require
some experimentation, because this is a new area for Spark and its
ecosystem. I don't think it requires us to have all the answers upfront.

> Also, an elephant in the room is the future of the current API in Spark 4
> and onwards. As useful as connect is, it is not exactly a replacement for
> many existing deployments. Furthermore, it doesn't make extending Spark
> much easier and the current ecosystem is, subjectively speaking, a bit
> brittle.

The goal of Spark Connect is not to replace the way users are currently
deploying Spark; it's not meant to be that. Users should continue deploying
Spark in exactly the way they prefer. Spark Connect allows bringing more
interactivity and connectivity to Spark. While Spark Connect extends Spark,
most new language consumers will not try to extend Spark, but simply
provide the existing surface to their native language. So the goal is not
so much extensibility as availability. For example, I believe it
would be awesome if the Livy community would find a way to integrate with
Spark Connect and provide the routing capabilities needed for a stable DNS
endpoint across all the different Spark deployments.

> [...] the current ecosystem is, subjectively speaking, a bit brittle.

Can you help me understand that a bit better? Do you mean the Spark
ecosystem or the Spark Connect ecosystem?



Martin


On Fri, May 26, 2023 at 5:39 PM Maciej  wrote:

> It might be a good idea to have a discussion about how new connect clients
> fit into the overall process we have. In particular:
>
>
> - Under what conditions do we consider adding a new language to the
>   official channels?  What process do we follow?
> - What guarantees do we offer with respect to these clients? Is adding a
>   new client the same type of commitment as for the core API? In other words,
>   do we commit to maintaining such clients "forever" or do we separate the
>   "official" and "contrib" clients, with the latter being governed by the ASF,
>   but not guaranteed to be maintained in the future?
> - Do we follow the same release schedule as for the core project, or
>   rather release each client separately, after the main release is completed?
>
> Also, an elephant in the room is the future of the current API in Spark 4
> and onwards. As useful as connect is, it is not exactly a replacement for
> many existing deployments. Furthermore, it doesn't make extending Spark
> much easier and the current ecosystem is, subjectively speaking, a bit
> brittle.
>
> --
> Best regards,
> Maciej
>
>
> On 5/26/23 07:26, Martin Grund wrote:
>
> Thanks everyone for your feedback! I will work on figuring out what it
> takes to get started with a repo for the go client.
>
> On Thu 25. May 2023 at 21:51 Chao Sun  wrote:
>
>> +1 on separate repo too
>>
>> On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun 
>> wrote:
>> >
>> > +1 for starting on a separate repo.
>> >
>> > Dongjoon.
>> >
>> > On Thu, May 25, 2023 at 9:53 AM yangjie01  wrote:
>> >>
>> >> +1 on start this with a separate repo.
>> >>
>> >> Which new clients can be placed in the main repo should be discussed
>> after they are mature enough,
>> >>
>> >>
>> >>
>> >> Yang Jie
>> >>
>> >>
>> >>
>> >> From: Denny Lee
>> >> Date: Wednesday, May 24, 2023 21:31
>> >> To: Hyukjin Kwon
>> >> Cc: Maciej, "dev@spark.apache.org" <dev@spark.apache.org>
>> >> Subject: Re: [CONNECT] New Clients for Go and Rust
>> >>
>> >>
>> >>
>> >> +1 on separate repo allowing different APIs to run at different speeds
>> and ensuring they get community support.
>> >>
>> >>
>> >>
>> >> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon 
>> wrote:
>> >>
>> >> I think we can just start this with a separate repo.
>> >> I am fine with

Re: [CONNECT] New Clients for Go and Rust

2023-05-26 Thread Maciej
It might be a good idea to have a discussion about how new connect 
clients fit into the overall process we have. In particular:


 * Under what conditions do we consider adding a new language to the
   official channels?  What process do we follow?
 * What guarantees do we offer with respect to these clients? Is adding a
   new client the same type of commitment as for the core API? In other
   words, do we commit to maintaining such clients "forever" or do we
   separate the "official" and "contrib" clients, with the latter being
   governed by the ASF, but not guaranteed to be maintained in the future?
 * Do we follow the same release schedule as for the core project, or
   rather release each client separately, after the main release is
   completed?

Also, an elephant in the room is the future of the current API in Spark 
4 and onwards. As useful as connect is, it is not exactly a replacement 
for many existing deployments. Furthermore, it doesn't make extending 
Spark much easier and the current ecosystem is, subjectively speaking, a 
bit brittle.


--
Best regards,
Maciej


On 5/26/23 07:26, Martin Grund wrote:
Thanks everyone for your feedback! I will work on figuring out what it 
takes to get started with a repo for the go client.


On Thu 25. May 2023 at 21:51 Chao Sun  wrote:

+1 on separate repo too

On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun
 wrote:
>
> +1 for starting on a separate repo.
>
> Dongjoon.
>
> On Thu, May 25, 2023 at 9:53 AM yangjie01 
wrote:
>>
>> +1 on start this with a separate repo.
>>
>> Which new clients can be placed in the main repo should be
discussed after they are mature enough,
>>
>>
>>
>> Yang Jie
>>
>>
>>
>> From: Denny Lee
>> Date: Wednesday, May 24, 2023 21:31
>> To: Hyukjin Kwon
>> Cc: Maciej, "dev@spark.apache.org"
>> Subject: Re: [CONNECT] New Clients for Go and Rust
>>
>>
>>
>> +1 on separate repo allowing different APIs to run at different
speeds and ensuring they get community support.
>>
>>
>>
>> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon
 wrote:
>>
>> I think we can just start this with a separate repo.
>> I am fine with the second option too but in this case we would
have to triage which language to add into the main repo.
>>
>>
>>
>> On Fri, 19 May 2023 at 22:28, Maciej 
wrote:
>>
>> Hi,
>>
>>
>>
>> Personally, I'm strongly against the second option and have
some preference towards the third one (or maybe a mix of the first
one and the third one).
>>
>>
>>
>> The project is already pretty large as-is and, with an
extremely conservative approach towards removal of APIs, it only
tends to grow over time. Making it even larger is not going to
make things more maintainable and is likely to create an entry
barrier for new contributors (that's similar to Jia's arguments).
>>
>>
>>
>> Moreover, we've seen quite a few different language clients
over the years and all but one or two survived while none is
particularly active, as far as I'm aware.  Taking responsibility
for more clients, without being sure that we have resources to
maintain them and there is enough community around them to make
such effort worthwhile, doesn't seem like a good idea.
>>
>>
>>
>> --
>>
>> Best regards,
>>
>> Maciej Szymkiewicz
>>
>>
>>
>> Web: https://zero323.net
>>
>> PGP: A30CEF0C31A501EC
>>
>>
>>
>>
>>
>> On 5/19/23 14:57, Jia Fan wrote:
>>
>> Hi,
>>
>>
>>
>> Thanks for contribution!
>>
>> I prefer (1), for several reasons:
>>
>> 1. Separate repositories can maintain independent versions,
>> different release times, and faster bug-fix releases.
>>
>> 2. Different languages have different build tools. Putting them
>> in one repository will make the main repository more and more
>> complicated, and it will become extremely difficult to perform a
>> complete build in the main repository.
>>
>> 3. Separate repositories will make CI configuration and execution
>> easier, and the PR and commit lists will be clearer.
>>
>>
 

Re: [CONNECT] New Clients for Go and Rust

2023-05-25 Thread Martin Grund
Thanks everyone for your feedback! I will work on figuring out what it
takes to get started with a repo for the go client.

On Thu 25. May 2023 at 21:51 Chao Sun  wrote:

> +1 on separate repo too
>
> On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun 
> wrote:
> >
> > +1 for starting on a separate repo.
> >
> > Dongjoon.
> >
> > On Thu, May 25, 2023 at 9:53 AM yangjie01  wrote:
> >>
> >> +1 on start this with a separate repo.
> >>
> >> Which new clients can be placed in the main repo should be discussed
> after they are mature enough,
> >>
> >>
> >>
> >> Yang Jie
> >>
> >>
> >>
> >> From: Denny Lee
> >> Date: Wednesday, May 24, 2023 21:31
> >> To: Hyukjin Kwon
> >> Cc: Maciej, "dev@spark.apache.org" <dev@spark.apache.org>
> >> Subject: Re: [CONNECT] New Clients for Go and Rust
> >>
> >>
> >>
> >> +1 on separate repo allowing different APIs to run at different speeds
> and ensuring they get community support.
> >>
> >>
> >>
> >> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon 
> wrote:
> >>
> >> I think we can just start this with a separate repo.
> >> I am fine with the second option too but in this case we would have to
> triage which language to add into the main repo.
> >>
> >>
> >>
> >> On Fri, 19 May 2023 at 22:28, Maciej  wrote:
> >>
> >> Hi,
> >>
> >>
> >>
> >> Personally, I'm strongly against the second option and have some
> preference towards the third one (or maybe a mix of the first one and the
> third one).
> >>
> >>
> >>
> >> The project is already pretty large as-is and, with an extremely
> conservative approach towards removal of APIs, it only tends to grow over
> time. Making it even larger is not going to make things more maintainable
> and is likely to create an entry barrier for new contributors (that's
> similar to Jia's arguments).
> >>
> >>
> >>
> >> Moreover, we've seen quite a few different language clients over the
> years and all but one or two survived while none is particularly active, as
> far as I'm aware.  Taking responsibility for more clients, without being
> sure that we have resources to maintain them and there is enough community
> around them to make such effort worthwhile, doesn't seem like a good idea.
> >>
> >>
> >>
> >> --
> >>
> >> Best regards,
> >>
> >> Maciej Szymkiewicz
> >>
> >>
> >>
> >> Web: https://zero323.net
> >>
> >> PGP: A30CEF0C31A501EC
> >>
> >>
> >>
> >>
> >>
> >> On 5/19/23 14:57, Jia Fan wrote:
> >>
> >> Hi,
> >>
> >>
> >>
> >> Thanks for contribution!
> >>
> >> I prefer (1). There are some reason:
> >>
> >>
> >>
> >> 1. Different repository can maintain independent versions, different
> release times, and faster bug fix releases.
> >>
> >>
> >>
> >> 2. Different languages have different build tools. Putting them in one
> repository will make the main repository more and more complicated, and it
> will become extremely difficult to perform a complete build in the main
> repository.
> >>
> >>
> >>
> >> 3. Different repository will make CI configuration and execute easier,
> and the PR and commit lists will be clearer.
> >>
> >>
> >>
> >> 4. Other repository also have different client to governed, like
> clickhouse. It use different repository for jdbc, odbc, c++. Please refer:
> >>
> >> https://github.com/ClickHouse/clickhouse-java
> >>
> >> https://github.com/ClickHouse/clickhouse-odbc
> >>
> >> https://github.com/ClickHouse/clickhouse-cpp
> >>
> >>
> >>
> >> PS: I'm looking forward to the javascript connect client!
> >>
> >>
> >>
> >> Thanks Regards
> >>
> >> Jia Fan
> >>
> >>
> >>
> >> On Fri, May 19, 2023 at 20:03, Martin Grund wrote:
> >>
> >> Hi folks,
> >>
> >>
> >>
> >> When Bo (thanks for the time and contribution) started the work on
> https://github.com/apache/spark/pull/41036 he started the Go client
> directly in the Spark r

Re: [CONNECT] New Clients for Go and Rust

2023-05-25 Thread Chao Sun
+1 on separate repo too

On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun  wrote:
>
> +1 for starting on a separate repo.
>
> Dongjoon.
>
> On Thu, May 25, 2023 at 9:53 AM yangjie01  wrote:
>>
>> +1 on start this with a separate repo.
>>
>> Which new clients can be placed in the main repo should be discussed after 
>> they are mature enough,
>>
>>
>>
>> Yang Jie
>>
>>
>>
>> From: Denny Lee
>> Date: Wednesday, May 24, 2023 21:31
>> To: Hyukjin Kwon
>> Cc: Maciej, "dev@spark.apache.org"
>> Subject: Re: [CONNECT] New Clients for Go and Rust
>>
>>
>>
>> +1 on separate repo allowing different APIs to run at different speeds and 
>> ensuring they get community support.
>>
>>
>>
>> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon  wrote:
>>
>> I think we can just start this with a separate repo.
>> I am fine with the second option too but in this case we would have to 
>> triage which language to add into the main repo.
>>
>>
>>
>> On Fri, 19 May 2023 at 22:28, Maciej  wrote:
>>
>> Hi,
>>
>>
>>
>> Personally, I'm strongly against the second option and have some preference 
>> towards the third one (or maybe a mix of the first one and the third one).
>>
>>
>>
>> The project is already pretty large as-is and, with an extremely 
>> conservative approach towards removal of APIs, it only tends to grow over 
>> time. Making it even larger is not going to make things more maintainable 
>> and is likely to create an entry barrier for new contributors (that's 
>> similar to Jia's arguments).
>>
>>
>>
>> Moreover, we've seen quite a few different language clients over the years 
>> and all but one or two survived while none is particularly active, as far as 
>> I'm aware.  Taking responsibility for more clients, without being sure that 
>> we have resources to maintain them and there is enough community around them 
>> to make such effort worthwhile, doesn't seem like a good idea.
>>
>>
>>
>> --
>>
>> Best regards,
>>
>> Maciej Szymkiewicz
>>
>>
>>
>> Web: https://zero323.net
>>
>> PGP: A30CEF0C31A501EC
>>
>>
>>
>>
>>
>> On 5/19/23 14:57, Jia Fan wrote:
>>
>> Hi,
>>
>>
>>
>> Thanks for contribution!
>>
>> I prefer (1). There are some reason:
>>
>>
>>
>> 1. Different repository can maintain independent versions, different release 
>> times, and faster bug fix releases.
>>
>>
>>
>> 2. Different languages have different build tools. Putting them in one 
>> repository will make the main repository more and more complicated, and it 
>> will become extremely difficult to perform a complete build in the main 
>> repository.
>>
>>
>>
>> 3. Different repository will make CI configuration and execute easier, and 
>> the PR and commit lists will be clearer.
>>
>>
>>
>> 4. Other repository also have different client to governed, like clickhouse. 
>> It use different repository for jdbc, odbc, c++. Please refer:
>>
>> https://github.com/ClickHouse/clickhouse-java
>>
>> https://github.com/ClickHouse/clickhouse-odbc
>>
>> https://github.com/ClickHouse/clickhouse-cpp
>>
>>
>>
>> PS: I'm looking forward to the javascript connect client!
>>
>>
>>
>> Thanks Regards
>>
>> Jia Fan
>>
>>
>>
>> On Fri, May 19, 2023 at 20:03, Martin Grund wrote:
>>
>> Hi folks,
>>
>>
>>
>> When Bo (thanks for the time and contribution) started the work on 
>> https://github.com/apache/spark/pull/41036 he started the Go client directly 
>> in the Spark repository. In the meantime, I was approached by other 
>> engineers who are willing to contribute to working on a Rust client for 
>> Spark Connect.
>>
>>
>>
>> Now one of the key questions is where should these connectors live and how 
>> we manage expectations most effectively.
>>
>>
>>
>> At the high level, there are two approaches:
>>
>>
>>
>> (1) "3rd party" (non-JVM / Python) clients should live in separate 
>> repositories owned and governed by the Apache Spark community.
>>
>>
>>
>> (2) All clients should live in the main Apache Spark repository in the 
>> `connector/connect/client` directory.
>>
>>
>>
>> (3) Non-native (Python, JVM) Spark Connect clients should not be part of the 
>> Apache Spark repository and governance rules.
>>
>>
>>
>> Before we iron out how exactly, we mark these clients as experimental and 
>> how we align their release process etc with Spark, my suggestion would be to 
>> get a consensus on this first question.
>>
>>
>>
>> Personally, I'm fine with (1) and (2) with a preference for (2).
>>
>>
>>
>> Would love to get feedback from other members of the community!
>>
>>
>>
>> Thanks
>>
>> Martin
>>
>>
>>
>>
>>
>>
>>
>>




Re: [CONNECT] New Clients for Go and Rust

2023-05-25 Thread Dongjoon Hyun
+1 for starting on a separate repo.

Dongjoon.

On Thu, May 25, 2023 at 9:53 AM yangjie01  wrote:

> +1 on start this with a separate repo.
>
> Which new clients can be placed in the main repo should be discussed after
> they are mature enough,
>
>
>
> Yang Jie
>
>
>
> *From:* Denny Lee
> *Date:* Wednesday, May 24, 2023 21:31
> *To:* Hyukjin Kwon
> *Cc:* Maciej, "dev@spark.apache.org" <dev@spark.apache.org>
> *Subject:* Re: [CONNECT] New Clients for Go and Rust
>
>
>
> +1 on separate repo allowing different APIs to run at different speeds and
> ensuring they get community support.
>
>
>
> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon  wrote:
>
> I think we can just start this with a separate repo.
> I am fine with the second option too but in this case we would have to
> triage which language to add into the main repo.
>
>
>
> On Fri, 19 May 2023 at 22:28, Maciej  wrote:
>
> Hi,
>
>
>
> Personally, I'm strongly against the second option and have some
> preference towards the third one (or maybe a mix of the first one and the
> third one).
>
>
>
> The project is already pretty large as-is and, with an extremely
> conservative approach towards removal of APIs, it only tends to grow over
> time. Making it even larger is not going to make things more maintainable
> and is likely to create an entry barrier for new contributors (that's
> similar to Jia's arguments).
>
>
>
> Moreover, we've seen quite a few different language clients over the years
> and all but one or two survived while none is particularly active, as far
> as I'm aware.  Taking responsibility for more clients, without being sure
> that we have resources to maintain them and there is enough community
> around them to make such effort worthwhile, doesn't seem like a good idea.
>
>
>
> --
>
> Best regards,
>
> Maciej Szymkiewicz
>
>
>
> Web: https://zero323.net 
>
> PGP: A30CEF0C31A501EC
>
>
>
>
>
> On 5/19/23 14:57, Jia Fan wrote:
>
> Hi,
>
>
>
> Thanks for contribution!
>
> I prefer (1). There are some reason:
>
>
>
> 1. Different repository can maintain independent versions, different
> release times, and faster bug fix releases.
>
>
>
> 2. Different languages have different build tools. Putting them in one
> repository will make the main repository more and more complicated, and it
> will become extremely difficult to perform a complete build in the main
> repository.
>
>
>
> 3. Different repository will make CI configuration and execute easier, and
> the PR and commit lists will be clearer.
>
>
>
> 4. Other repository also have different client to governed, like
> clickhouse. It use different repository for jdbc, odbc, c++. Please refer:
>
> https://github.com/ClickHouse/clickhouse-java
>
> https://github.com/ClickHouse/clickhouse-odbc
>
> https://github.com/ClickHouse/clickhouse-cpp
>
>
>
> PS: I'm looking forward to the javascript connect client!
>
>
>
> Thanks Regards
>
> Jia Fan
>
>
>
> On Fri, May 19, 2023 at 20:03, Martin Grund wrote:
>
> Hi folks,
>
>
>
> When Bo (thanks for the time and contribution) started the work on
> https://github.com/apache/spark/pull/41036
> he started the Go client directly in the Spark repository. In the meantime,
> I was approached by other engineers who are willing to contribute to
> working on a Rust client for Spark Connect.
>
>
>
> Now one of the key questions is where should these connectors live and how
> we manage expectations most effectively.
>
>
>
> At the high level, there are two approaches:
>
>
>
> (1) "3rd party" (non-JVM / Python) clients should live in separate
> repositories owned and governed by the Apache Spark community.
>
>
>
> (2) All clients should live in the main Apache Spark repository in the
> `connector/connect/client` directory.
>
>
>
> (3) Non-native (Python, JVM) Spark Connect clients should not be part of
> the Apache Spark repository and governance rules.
>
>
>
> Before we iron out how exactly, we mark these clients as experimental and
> how we align their release process etc with Spark, my suggestion would be
> to get a consensus on this first question.
>
>
>
> Personally, I'm fine with (1) and (2) with a preference for (2).
>
>
>
> Would love to get feedback from other members of the community!
>
>
>
> Thanks
>
> Martin
>
>
>
>
>
>
>
>
>
>


Re: [CONNECT] New Clients for Go and Rust

2023-05-25 Thread yangjie01
+1 on starting this with a separate repo.
Which new clients can be placed in the main repo should be discussed after they
are mature enough.

Yang Jie

From: Denny Lee
Date: Wednesday, May 24, 2023 21:31
To: Hyukjin Kwon
Cc: Maciej, "dev@spark.apache.org"
Subject: Re: [CONNECT] New Clients for Go and Rust

+1 on separate repo allowing different APIs to run at different speeds and 
ensuring they get community support.

On Wed, May 24, 2023 at 00:37 Hyukjin Kwon <gurwls...@apache.org> wrote:
I think we can just start this with a separate repo.
I am fine with the second option too but in this case we would have to triage 
which language to add into the main repo.

On Fri, 19 May 2023 at 22:28, Maciej <mszymkiew...@gmail.com> wrote:
Hi,

Personally, I'm strongly against the second option and have some preference 
towards the third one (or maybe a mix of the first one and the third one).

The project is already pretty large as-is and, with an extremely conservative 
approach towards removal of APIs, it only tends to grow over time. Making it 
even larger is not going to make things more maintainable and is likely to 
create an entry barrier for new contributors (that's similar to Jia's 
arguments).

Moreover, we've seen quite a few different language clients over the years and 
all but one or two survived while none is particularly active, as far as I'm 
aware.  Taking responsibility for more clients, without being sure that we have 
resources to maintain them and there is enough community around them to make 
such effort worthwhile, doesn't seem like a good idea.


--

Best regards,

Maciej Szymkiewicz



Web: https://zero323.net

PGP: A30CEF0C31A501EC


On 5/19/23 14:57, Jia Fan wrote:
Hi,

Thanks for contribution!
I prefer (1). There are some reason:

1. Different repository can maintain independent versions, different release 
times, and faster bug fix releases.

2. Different languages have different build tools. Putting them in one 
repository will make the main repository more and more complicated, and it will 
become extremely difficult to perform a complete build in the main repository.

3. Different repository will make CI configuration and execute easier, and the 
PR and commit lists will be clearer.

4. Other repository also have different client to governed, like clickhouse. It 
use different repository for jdbc, odbc, c++. Please refer:
https://github.com/ClickHouse/clickhouse-java
https://github.com/ClickHouse/clickhouse-odbc
https://github.com/ClickHouse/clickhouse-cpp

PS: I'm looking forward to the javascript connect client!

Thanks Regards
Jia Fan

On Fri, May 19, 2023 at 20:03, Martin Grund <mgr...@apache.org> wrote:
Hi folks,

When Bo (thanks for the time and contribution) started the work on 
https://github.com/apache/spark/pull/41036
 he started the Go client directly in the Spark repository. In the meantime, I 
was approached by other engineers who are willing to contribute to working on a 
Rust client for Spark Connect.

Now one of the key questions is where should these connectors live and how we 
manage expectations most effectively.

At the high level, there are two approaches:

(1) "3rd party" (non-JVM / Python) clients should live in separate repositories 
owned and governed by the Apache Spark community.

(2) All clients should live in the main Apache Spark repository in the 
`connector/connect/client` directory.

(3) Non-native (Python, JVM) Spark Connect clients should not be part of the 
Apache Spark repository and governance rules.

Before we iron out how exactly, we mark these clients as experimental and how 
we align their release process etc with Spark, my suggestion would be to get a 
consensus on this first question.

Personally, I'm fine with (1) and (2) with a preference for (2).

Would love to get feedback from other members of the community!

Thanks
Martin






Re: [CONNECT] New Clients for Go and Rust

2023-05-24 Thread Denny Lee
+1 on separate repo allowing different APIs to run at different speeds and
ensuring they get community support.

On Wed, May 24, 2023 at 00:37 Hyukjin Kwon  wrote:

> I think we can just start this with a separate repo.
> I am fine with the second option too but in this case we would have to
> triage which language to add into the main repo.
>
> On Fri, 19 May 2023 at 22:28, Maciej  wrote:
>
>> Hi,
>>
>> Personally, I'm strongly against the second option and have some
>> preference towards the third one (or maybe a mix of the first one and the
>> third one).
>>
>> The project is already pretty large as-is and, with an extremely
>> conservative approach towards removal of APIs, it only tends to grow over
>> time. Making it even larger is not going to make things more maintainable
>> and is likely to create an entry barrier for new contributors (that's
>> similar to Jia's arguments).
>>
>> Moreover, we've seen quite a few different language clients over the
>> years and all but one or two survived while none is particularly active, as
>> far as I'm aware.  Taking responsibility for more clients, without being
>> sure that we have resources to maintain them and there is enough community
>> around them to make such effort worthwhile, doesn't seem like a good idea.
>>
>> --
>> Best regards,
>> Maciej Szymkiewicz
>>
>> Web: https://zero323.net
>> PGP: A30CEF0C31A501EC
>>
>>
>>
>> On 5/19/23 14:57, Jia Fan wrote:
>>
>> Hi,
>>
>> Thanks for contribution!
>> I prefer (1). There are some reason:
>>
>> 1. Different repository can maintain independent versions, different
>> release times, and faster bug fix releases.
>>
>> 2. Different languages have different build tools. Putting them in one
>> repository will make the main repository more and more complicated, and it
>> will become extremely difficult to perform a complete build in the main
>> repository.
>>
>> 3. Different repository will make CI configuration and execute easier,
>> and the PR and commit lists will be clearer.
>>
>> 4. Other repository also have different client to governed, like
>> clickhouse. It use different repository for jdbc, odbc, c++. Please refer:
>> https://github.com/ClickHouse/clickhouse-java
>> https://github.com/ClickHouse/clickhouse-odbc
>> https://github.com/ClickHouse/clickhouse-cpp
>>
>> PS: I'm looking forward to the javascript connect client!
>>
>> Thanks Regards
>> Jia Fan
>>
>> On Fri, May 19, 2023 at 20:03, Martin Grund wrote:
>>
>>> Hi folks,
>>>
>>> When Bo (thanks for the time and contribution) started the work on
>>> https://github.com/apache/spark/pull/41036 he started the Go client
>>> directly in the Spark repository. In the meantime, I was approached by
>>> other engineers who are willing to contribute to working on a Rust client
>>> for Spark Connect.
>>>
>>> Now one of the key questions is where should these connectors live and
>>> how we manage expectations most effectively.
>>>
>>> At the high level, there are two approaches:
>>>
>>> (1) "3rd party" (non-JVM / Python) clients should live in separate
>>> repositories owned and governed by the Apache Spark community.
>>>
>>> (2) All clients should live in the main Apache Spark repository in the
>>> `connector/connect/client` directory.
>>>
>>> (3) Non-native (Python, JVM) Spark Connect clients should not be part of
>>> the Apache Spark repository and governance rules.
>>>
>>> Before we iron out how exactly, we mark these clients as experimental
>>> and how we align their release process etc with Spark, my suggestion would
>>> be to get a consensus on this first question.
>>>
>>> Personally, I'm fine with (1) and (2) with a preference for (2).
>>>
>>> Would love to get feedback from other members of the community!
>>>
>>> Thanks
>>> Martin
>>>
>>>
>>>
>>>
>>


Re: [CONNECT] New Clients for Go and Rust

2023-05-24 Thread Hyukjin Kwon
I think we can just start this with a separate repo.
I am fine with the second option too but in this case we would have to
triage which language to add into the main repo.

On Fri, 19 May 2023 at 22:28, Maciej  wrote:

> Hi,
>
> Personally, I'm strongly against the second option and have some
> preference towards the third one (or maybe a mix of the first one and the
> third one).
>
> The project is already pretty large as-is and, with an extremely
> conservative approach towards removal of APIs, it only tends to grow over
> time. Making it even larger is not going to make things more maintainable
> and is likely to create an entry barrier for new contributors (that's
> similar to Jia's arguments).
>
> Moreover, we've seen quite a few different language clients over the years
> and all but one or two survived while none is particularly active, as far
> as I'm aware.  Taking responsibility for more clients, without being sure
> that we have resources to maintain them and there is enough community
> around them to make such effort worthwhile, doesn't seem like a good idea.
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>
>
>
> On 5/19/23 14:57, Jia Fan wrote:
>
> Hi,
>
> Thanks for contribution!
> I prefer (1). There are some reason:
>
> 1. Different repository can maintain independent versions, different
> release times, and faster bug fix releases.
>
> 2. Different languages have different build tools. Putting them in one
> repository will make the main repository more and more complicated, and it
> will become extremely difficult to perform a complete build in the main
> repository.
>
> 3. Different repository will make CI configuration and execute easier, and
> the PR and commit lists will be clearer.
>
> 4. Other repository also have different client to governed, like
> clickhouse. It use different repository for jdbc, odbc, c++. Please refer:
> https://github.com/ClickHouse/clickhouse-java
> https://github.com/ClickHouse/clickhouse-odbc
> https://github.com/ClickHouse/clickhouse-cpp
>
> PS: I'm looking forward to the javascript connect client!
>
> Thanks Regards
> Jia Fan
>
> On Fri, May 19, 2023 at 20:03, Martin Grund wrote:
>
>> Hi folks,
>>
>> When Bo (thanks for the time and contribution) started the work on
>> https://github.com/apache/spark/pull/41036 he started the Go client
>> directly in the Spark repository. In the meantime, I was approached by
>> other engineers who are willing to contribute to working on a Rust client
>> for Spark Connect.
>>
>> Now one of the key questions is where should these connectors live and
>> how we manage expectations most effectively.
>>
>> At the high level, there are two approaches:
>>
>> (1) "3rd party" (non-JVM / Python) clients should live in separate
>> repositories owned and governed by the Apache Spark community.
>>
>> (2) All clients should live in the main Apache Spark repository in the
>> `connector/connect/client` directory.
>>
>> (3) Non-native (Python, JVM) Spark Connect clients should not be part of
>> the Apache Spark repository and governance rules.
>>
>> Before we iron out how exactly, we mark these clients as experimental and
>> how we align their release process etc with Spark, my suggestion would be
>> to get a consensus on this first question.
>>
>> Personally, I'm fine with (1) and (2) with a preference for (2).
>>
>> Would love to get feedback from other members of the community!
>>
>> Thanks
>> Martin
>>
>>
>>
>>
>


Re: [CONNECT] New Clients for Go and Rust

2023-05-19 Thread Maciej

Hi,

Personally, I'm strongly against the second option and have some 
preference towards the third one (or maybe a mix of the first one and 
the third one).


The project is already pretty large as-is and, with an extremely 
conservative approach towards removal of APIs, it only tends to grow 
over time. Making it even larger is not going to make things more 
maintainable and is likely to create an entry barrier for new 
contributors (that's similar to Jia's arguments).


Moreover, we've seen quite a few different language clients over the 
years and all but one or two survived while none is particularly active, 
as far as I'm aware.  Taking responsibility for more clients, without 
being sure that we have resources to maintain them and there is enough 
community around them to make such effort worthwhile, doesn't seem like 
a good idea.


--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC



On 5/19/23 14:57, Jia Fan wrote:

Hi,

Thanks for contribution!
I prefer (1). There are some reason:

1. Different repository can maintain independent versions, different 
release times, and faster bug fix releases.


2. Different languages have different build tools. Putting them in one 
repository will make the main repository more and more complicated, 
and it will become extremely difficult to perform a complete build in 
the main repository.


3. Different repository will make CI configuration and execute easier, 
and the PR and commit lists will be clearer.


4. Other repository also have different client to governed, like 
clickhouse. It use different repository for jdbc, odbc, c++. Please refer:

https://github.com/ClickHouse/clickhouse-java
https://github.com/ClickHouse/clickhouse-odbc
https://github.com/ClickHouse/clickhouse-cpp

PS: I'm looking forward to the javascript connect client!

Thanks Regards
Jia Fan

On Fri, May 19, 2023 at 20:03, Martin Grund wrote:

Hi folks,

When Bo (thanks for the time and contribution) started the work on
https://github.com/apache/spark/pull/41036 he started the Go
client directly in the Spark repository. In the meantime, I was
approached by other engineers who are willing to contribute to
working on a Rust client for Spark Connect.

Now one of the key questions is where should these connectors live
and how we manage expectations most effectively.

At the high level, there are two approaches:

(1) "3rd party" (non-JVM / Python) clients should live in separate
repositories owned and governed by the Apache Spark community.

(2) All clients should live in the main Apache Spark repository in
the `connector/connect/client` directory.

(3) Non-native (Python, JVM) Spark Connect clients should not be
part of the Apache Spark repository and governance rules.

Before we iron out how exactly, we mark these clients as
experimental and how we align their release process etc with
Spark, my suggestion would be to get a consensus on this first
question.

Personally, I'm fine with (1) and (2) with a preference for (2).

Would love to get feedback from other members of the community!

Thanks
Martin









Re: [CONNECT] New Clients for Go and Rust

2023-05-19 Thread Jia Fan
Hi,

Thanks for the contribution!
I prefer (1), for several reasons:

1. Separate repositories can maintain independent versions, different
release times, and faster bug-fix releases.

2. Different languages have different build tools. Putting them in one
repository will make the main repository more and more complicated, and it
will become extremely difficult to perform a complete build in the main
repository.

3. Separate repositories will make CI configuration and execution easier, and
the PR and commit lists will be clearer.

4. Other projects also govern their clients in separate repositories, for
example ClickHouse, which uses separate repositories for JDBC, ODBC, and C++.
Please refer to:
https://github.com/ClickHouse/clickhouse-java
https://github.com/ClickHouse/clickhouse-odbc
https://github.com/ClickHouse/clickhouse-cpp

PS: I'm looking forward to the JavaScript connect client!

Thanks and regards
Jia Fan

On Fri, May 19, 2023 at 20:03, Martin Grund wrote:

> Hi folks,
>
> When Bo (thanks for the time and contribution) started the work on
> https://github.com/apache/spark/pull/41036 he started the Go client
> directly in the Spark repository. In the meantime, I was approached by
> other engineers who are willing to contribute to working on a Rust client
> for Spark Connect.
>
> Now one of the key questions is where should these connectors live and how
> we manage expectations most effectively.
>
> At the high level, there are two approaches:
>
> (1) "3rd party" (non-JVM / Python) clients should live in separate
> repositories owned and governed by the Apache Spark community.
>
> (2) All clients should live in the main Apache Spark repository in the
> `connector/connect/client` directory.
>
> (3) Non-native (Python, JVM) Spark Connect clients should not be part of
> the Apache Spark repository and governance rules.
>
> Before we iron out how exactly, we mark these clients as experimental and
> how we align their release process etc with Spark, my suggestion would be
> to get a consensus on this first question.
>
> Personally, I'm fine with (1) and (2) with a preference for (2).
>
> Would love to get feedback from other members of the community!
>
> Thanks
> Martin
>
>
>
>


[CONNECT] New Clients for Go and Rust

2023-05-19 Thread Martin Grund
Hi folks,

When Bo (thanks for the time and contribution) started the work on
https://github.com/apache/spark/pull/41036 he started the Go client
directly in the Spark repository. In the meantime, I was approached by
other engineers who are willing to contribute to working on a Rust client
for Spark Connect.

Now one of the key questions is where these connectors should live and how
we manage expectations most effectively.

At a high level, there are three approaches:

(1) "3rd party" (non-JVM / Python) clients should live in separate
repositories owned and governed by the Apache Spark community.

(2) All clients should live in the main Apache Spark repository in the
`connector/connect/client` directory.

(3) Non-native (Python, JVM) Spark Connect clients should not be part of
the Apache Spark repository and governance rules.

Before we iron out how exactly we mark these clients as experimental and
how we align their release process etc. with Spark, my suggestion would be
to get a consensus on this first question.

Personally, I'm fine with (1) and (2) with a preference for (2).

Would love to get feedback from other members of the community!

Thanks
Martin