Re: [CONNECT] New Clients for Go and Rust

bo yang Wed, 31 May 2023 17:50:19 -0700

Just saw the discussion here! I really appreciate Martin and the other folks
helping with my previous Golang Spark Connect PR (
https://github.com/apache/spark/pull/41036)!


Great to see we have a new repo for the Spark Connect Golang client.
Thanks Hyukjin!
I am thinking of migrating my PR to this new repo. I would like to hear any
feedback or suggestions before I make the new PR :)

Thanks,
Bo



On Tue, May 30, 2023 at 3:38 AM Martin Grund <mar...@databricks.com.invalid>
wrote:

> Hi folks,
>
> Thanks a lot for the help from Hyukjin! We've created
> https://github.com/apache/spark-connect-go as the first contrib
> repository for Spark Connect under the Apache Spark project. We will move
> the development of the Golang client to this repository and make it very
> clear from the README file that this is an experimental client.
>
> Looking forward to all your contributions!
>
> On Tue, May 30, 2023 at 11:50 AM Martin Grund <mar...@databricks.com>
> wrote:
>
>> I think it makes sense to split this discussion into two pieces. On the
>> contribution side, my personal perspective is that these new clients are
>> explicitly marked as experimental and unsupported until we deem them mature
>> enough to be supported using the standard release process etc. However, the
>> goal should be that the main contributors of these clients are aiming to
>> follow the same release and maintenance schedule. I think we should
>> encourage the community to contribute to the Spark Connect clients and as
>> such we should explicitly make it as easy as possible to get started
>> (and for that reason reserve the right to abandon them).
>>
>> How exactly the release schedule is going to look will probably require
>> some experimentation because it's a new area for Spark and its
>> ecosystem. I don't think it requires us to have all the answers up front.
>>
>> > Also, an elephant in the room is the future of the current API in Spark
>> 4 and onwards. As useful as connect is, it is not exactly a replacement for
>> many existing deployments. Furthermore, it doesn't make extending Spark
>> much easier and the current ecosystem is, subjectively speaking, a bit
>> brittle.
>>
>> The goal of Spark Connect is not to replace the way users are currently
>> deploying Spark; it's not meant to be that. Users should continue deploying
>> Spark in exactly the way they prefer. Spark Connect allows bringing more
>> interactivity and connectivity to Spark. While Spark Connect extends Spark,
>> most new language consumers will not try to extend Spark, but simply
>> provide the existing surface to their native language. So the goal is not
>> so much extensibility but more availability. For example, I believe it
>> would be awesome if the Livy community would find a way to integrate with
>> Spark Connect to provide the routing capabilities needed for a stable DNS
>> endpoint for all the different Spark deployments.
>>
>> > [...] the current ecosystem is, subjectively speaking, a bit brittle.
>>
>> Can you help me understand that a bit better? Do you mean the Spark
>> ecosystem or the Spark Connect ecosystem?
>>
>>
>>
>> Martin
>>
>>
>> On Fri, May 26, 2023 at 5:39 PM Maciej <mszymkiew...@gmail.com> wrote:
>>
>>> It might be a good idea to have a discussion about how new connect
>>> clients fit into the overall process we have. In particular:
>>>
>>>
>>>    - Under what conditions do we consider adding a new language to the
>>>    official channels?  What process do we follow?
>>>    - What guarantees do we offer with respect to these clients? Is adding
>>>    a new client the same type of commitment as for the core API? In other
>>>    words, do we commit to maintaining such clients "forever" or do we
>>>    separate the "official" and "contrib" clients, with the latter being
>>>    governed by the ASF, but not guaranteed to be maintained in the future?
>>>    - Do we follow the same release schedule as for the core project, or
>>>    rather release each client separately, after the main release is
>>>    completed?
>>>
>>> Also, an elephant in the room is the future of the current API in Spark
>>> 4 and onwards. As useful as connect is, it is not exactly a replacement for
>>> many existing deployments. Furthermore, it doesn't make extending Spark
>>> much easier and the current ecosystem is, subjectively speaking, a bit
>>> brittle.
>>>
>>> --
>>> Best regards,
>>> Maciej
>>>
>>>
>>> On 5/26/23 07:26, Martin Grund wrote:
>>>
>>> Thanks everyone for your feedback! I will work on figuring out what it
>>> takes to get started with a repo for the go client.
>>>
>>> On Thu 25. May 2023 at 21:51 Chao Sun <sunc...@apache.org> wrote:
>>>
>>>> +1 on separate repo too
>>>>
>>>> On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>> >
>>>> > +1 for starting on a separate repo.
>>>> >
>>>> > Dongjoon.
>>>> >
>>>> > On Thu, May 25, 2023 at 9:53 AM yangjie01 <yangji...@baidu.com>
>>>> wrote:
>>>> >>
>>>> >> +1 on starting this with a separate repo.
>>>> >>
>>>> >> Which new clients can be placed in the main repo should be discussed
>>>> after they are mature enough.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Yang Jie
>>>> >>
>>>> >>
>>>> >>
>>>> >> From: Denny Lee <denny.g....@gmail.com>
>>>> >> Date: Wednesday, May 24, 2023 21:31
>>>> >> To: Hyukjin Kwon <gurwls...@apache.org>
>>>> >> Cc: Maciej <mszymkiew...@gmail.com>, "dev@spark.apache.org" <
>>>> dev@spark.apache.org>
>>>> >> Subject: Re: [CONNECT] New Clients for Go and Rust
>>>> >>
>>>> >>
>>>> >>
>>>> >> +1 on separate repo allowing different APIs to run at different
>>>> speeds and ensuring they get community support.
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon <gurwls...@apache.org>
>>>> wrote:
>>>> >>
>>>> >> I think we can just start this with a separate repo.
>>>> >> I am fine with the second option too but in this case we would have
>>>> to triage which language to add into the main repo.
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Fri, 19 May 2023 at 22:28, Maciej <mszymkiew...@gmail.com> wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >>
>>>> >>
>>>> >> Personally, I'm strongly against the second option and have some
>>>> preference towards the third one (or maybe a mix of the first one and the
>>>> third one).
>>>> >>
>>>> >>
>>>> >>
>>>> >> The project is already pretty large as-is and, with an extremely
>>>> conservative approach towards removal of APIs, it only tends to grow over
>>>> time. Making it even larger is not going to make things more maintainable
>>>> and is likely to create an entry barrier for new contributors (that's
>>>> similar to Jia's arguments).
>>>> >>
>>>> >>
>>>> >>
>>>> >> Moreover, we've seen quite a few different language clients over the
>>>> years and all but one or two survived while none is particularly active, as
>>>> far as I'm aware.  Taking responsibility for more clients, without being
>>>> sure that we have resources to maintain them and there is enough community
>>>> around them to make such effort worthwhile, doesn't seem like a good idea.
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >>
>>>> >> Best regards,
>>>> >>
>>>> >> Maciej Szymkiewicz
>>>> >>
>>>> >>
>>>> >>
>>>> >> Web: https://zero323.net
>>>> >>
>>>> >> PGP: A30CEF0C31A501EC
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On 5/19/23 14:57, Jia Fan wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks for the contribution!
>>>> >>
>>>> >> I prefer (1), for several reasons:
>>>> >>
>>>> >>
>>>> >>
>>>> >> 1. Separate repositories can maintain independent versions, different
>>>> release times, and faster bug-fix releases.
>>>> >>
>>>> >>
>>>> >>
>>>> >> 2. Different languages have different build tools. Putting them in
>>>> one repository will make the main repository more and more complicated, and
>>>> it will become extremely difficult to perform a complete build in the main
>>>> repository.
>>>> >>
>>>> >>
>>>> >>
>>>> >> 3. Separate repositories will make CI configuration and execution
>>>> easier, and the PR and commit lists will be clearer.
>>>> >>
>>>> >>
>>>> >>
>>>> >> 4. Other projects also govern their clients in separate repositories,
>>>> like ClickHouse. It uses different repositories for JDBC, ODBC, and C++. Please refer to:
>>>> >>
>>>> >> https://github.com/ClickHouse/clickhouse-java
>>>> >>
>>>> >> https://github.com/ClickHouse/clickhouse-odbc
>>>> >>
>>>> >> https://github.com/ClickHouse/clickhouse-cpp
>>>> >>
>>>> >>
>>>> >>
>>>> >> PS: I'm looking forward to the JavaScript connect client!
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks and regards,
>>>> >>
>>>> >> Jia Fan
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Fri, May 19, 2023 at 20:03, Martin Grund <mgr...@apache.org> wrote:
>>>> >>
>>>> >> Hi folks,
>>>> >>
>>>> >>
>>>> >>
>>>> >> When Bo (thanks for the time and contribution) started the work on
>>>> https://github.com/apache/spark/pull/41036 he started the Go client
>>>> directly in the Spark repository. In the meantime, I was approached by
>>>> other engineers who are willing to contribute to working on a Rust client
>>>> for Spark Connect.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Now one of the key questions is where these connectors should live
>>>> and how we manage expectations most effectively.
>>>> >>
>>>> >>
>>>> >>
>>>> >> At a high level, there are three approaches:
>>>> >>
>>>> >>
>>>> >>
>>>> >> (1) "3rd party" (non-JVM / Python) clients should live in separate
>>>> repositories owned and governed by the Apache Spark community.
>>>> >>
>>>> >>
>>>> >>
>>>> >> (2) All clients should live in the main Apache Spark repository in
>>>> the `connector/connect/client` directory.
>>>> >>
>>>> >>
>>>> >>
>>>> >> (3) Non-native (i.e., other than Python and JVM) Spark Connect clients
>>>> should not be part of the Apache Spark repository and governance rules.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Before we iron out how exactly we mark these clients as
>>>> experimental and how we align their release process etc. with Spark, my
>>>> suggestion would be to get a consensus on this first question.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Personally, I'm fine with (1) and (2) with a preference for (2).
>>>> >>
>>>> >>
>>>> >>
>>>> >> Would love to get feedback from other members of the community!
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks
>>>> >>
>>>> >> Martin
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>>
>>>>
>>>>
>>>
