Just saw the discussion here! I really appreciate Martin and the other folks helping with my previous Golang Spark Connect PR (https://github.com/apache/spark/pull/41036)!
Great to see we have a new repo for the Spark Connect Golang client. Thanks, Hyukjin! I am thinking of migrating my PR to this new repo. I would like to hear any feedback or suggestions before I make the new PR :)

Thanks,
Bo

On Tue, May 30, 2023 at 3:38 AM Martin Grund <mar...@databricks.com.invalid> wrote:

> Hi folks,
>
> Thanks a lot for the help from Hyukjin! We've created
> https://github.com/apache/spark-connect-go as the first contrib
> repository for Spark Connect under the Apache Spark project. We will move
> the development of the Golang client to this repository and make it very
> clear from the README file that this is an experimental client.
>
> Looking forward to all your contributions!
>
> On Tue, May 30, 2023 at 11:50 AM Martin Grund <mar...@databricks.com>
> wrote:
>
>> I think it makes sense to split this discussion into two pieces. On the
>> contribution side, my personal perspective is that these new clients are
>> explicitly marked as experimental and unsupported until we deem them mature
>> enough to be supported using the standard release process etc. However, the
>> goal should be that the main contributors of these clients are aiming to
>> follow the same release and maintenance schedule. I think we should
>> encourage the community to contribute to the Spark Connect clients and as
>> such we should explicitly avoid making it hard to get started
>> (and for that reason reserve the right to abandon).
>>
>> How exactly the release schedule is going to look will probably require
>> some experimentation because it's a new area for Spark and its
>> ecosystem. I don't think it requires us to have all answers upfront.
>>
>> > Also, an elephant in the room is the future of the current API in Spark
>> > 4 and onwards. As useful as connect is, it is not exactly a replacement for
>> > many existing deployments. Furthermore, it doesn't make extending Spark
>> > much easier and the current ecosystem is, subjectively speaking, a bit
>> > brittle.
>>
>> The goal of Spark Connect is not to replace the way users are currently
>> deploying Spark; it's not meant to be that. Users should continue deploying
>> Spark in exactly the way they prefer. Spark Connect allows bringing more
>> interactivity and connectivity to Spark. While Spark Connect extends Spark,
>> most new language consumers will not try to extend Spark, but simply
>> provide the existing surface to their native language. So the goal is not
>> so much extensibility but more availability. For example, I believe it
>> would be awesome if the Livy community would find a way to integrate with
>> Spark Connect to provide the routing capabilities to provide a stable DNS
>> endpoint for all different Spark deployments.
>>
>> > [...] the current ecosystem is, subjectively speaking, a bit brittle.
>>
>> Can you help me understand that a bit better? Do you mean the Spark
>> ecosystem or the Spark Connect ecosystem?
>>
>> Martin
>>
>> On Fri, May 26, 2023 at 5:39 PM Maciej <mszymkiew...@gmail.com> wrote:
>>
>>> It might be a good idea to have a discussion about how new connect
>>> clients fit into the overall process we have. In particular:
>>>
>>> - Under what conditions do we consider adding a new language to the
>>>   official channels? What process do we follow?
>>> - What guarantees do we offer with respect to these clients? Is adding
>>>   a new client the same type of commitment as for the core API? In other
>>>   words, do we commit to maintaining such clients "forever" or do we
>>>   separate the "official" and "contrib" clients, with the latter being
>>>   governed by the ASF, but not guaranteed to be maintained in the future?
>>> - Do we follow the same release schedule as for the core project, or
>>>   rather release each client separately, after the main release is
>>>   completed?
>>>
>>> Also, an elephant in the room is the future of the current API in Spark
>>> 4 and onwards.
>>> As useful as connect is, it is not exactly a replacement for
>>> many existing deployments. Furthermore, it doesn't make extending Spark
>>> much easier and the current ecosystem is, subjectively speaking, a bit
>>> brittle.
>>>
>>> --
>>> Best regards,
>>> Maciej
>>>
>>> On 5/26/23 07:26, Martin Grund wrote:
>>>
>>> Thanks everyone for your feedback! I will work on figuring out what it
>>> takes to get started with a repo for the Go client.
>>>
>>> On Thu 25. May 2023 at 21:51 Chao Sun <sunc...@apache.org> wrote:
>>>
>>>> +1 on separate repo too
>>>>
>>>> On Thu, May 25, 2023 at 12:43 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>> >
>>>> > +1 for starting on a separate repo.
>>>> >
>>>> > Dongjoon.
>>>> >
>>>> > On Thu, May 25, 2023 at 9:53 AM yangjie01 <yangji...@baidu.com> wrote:
>>>> >>
>>>> >> +1 on starting this with a separate repo.
>>>> >>
>>>> >> Which new clients can be placed in the main repo should be discussed after they are mature enough.
>>>> >>
>>>> >> Yang Jie
>>>> >>
>>>> >> From: Denny Lee <denny.g....@gmail.com>
>>>> >> Date: Wednesday, May 24, 2023 21:31
>>>> >> To: Hyukjin Kwon <gurwls...@apache.org>
>>>> >> Cc: Maciej <mszymkiew...@gmail.com>, "dev@spark.apache.org" <dev@spark.apache.org>
>>>> >> Subject: Re: [CONNECT] New Clients for Go and Rust
>>>> >>
>>>> >> +1 on separate repo allowing different APIs to run at different speeds and ensuring they get community support.
>>>> >>
>>>> >> On Wed, May 24, 2023 at 00:37 Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>> >>
>>>> >> I think we can just start this with a separate repo.
>>>> >> I am fine with the second option too but in this case we would have to triage which language to add into the main repo.
>>>> >>
>>>> >> On Fri, 19 May 2023 at 22:28, Maciej <mszymkiew...@gmail.com> wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> Personally, I'm strongly against the second option and have some preference towards the third one (or maybe a mix of the first one and the third one).
>>>> >>
>>>> >> The project is already pretty large as-is and, with an extremely conservative approach towards removal of APIs, it only tends to grow over time. Making it even larger is not going to make things more maintainable and is likely to create an entry barrier for new contributors (that's similar to Jia's arguments).
>>>> >>
>>>> >> Moreover, we've seen quite a few different language clients over the years and all but one or two survived while none is particularly active, as far as I'm aware. Taking responsibility for more clients, without being sure that we have resources to maintain them and there is enough community around them to make such effort worthwhile, doesn't seem like a good idea.
>>>> >>
>>>> >> --
>>>> >> Best regards,
>>>> >> Maciej Szymkiewicz
>>>> >>
>>>> >> Web: https://zero323.net
>>>> >> PGP: A30CEF0C31A501EC
>>>> >>
>>>> >> On 5/19/23 14:57, Jia Fan wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> Thanks for the contribution!
>>>> >>
>>>> >> I prefer (1). There are several reasons:
>>>> >>
>>>> >> 1. Separate repositories can maintain independent versions, different release times, and faster bug fix releases.
>>>> >>
>>>> >> 2. Different languages have different build tools. Putting them in one repository will make the main repository more and more complicated, and it will become extremely difficult to perform a complete build in the main repository.
>>>> >>
>>>> >> 3. Separate repositories will make CI configuration and execution easier, and the PR and commit lists will be clearer.
>>>> >>
>>>> >> 4. Other projects also govern their different clients in separate repositories, for example ClickHouse. It uses different repositories for JDBC, ODBC, and C++. Please refer to:
>>>> >>
>>>> >> https://github.com/ClickHouse/clickhouse-java
>>>> >> https://github.com/ClickHouse/clickhouse-odbc
>>>> >> https://github.com/ClickHouse/clickhouse-cpp
>>>> >>
>>>> >> PS: I'm looking forward to the JavaScript connect client!
>>>> >>
>>>> >> Thanks and regards,
>>>> >> Jia Fan
>>>> >>
>>>> >> On Fri, May 19, 2023 at 20:03, Martin Grund <mgr...@apache.org> wrote:
>>>> >>
>>>> >> Hi folks,
>>>> >>
>>>> >> When Bo (thanks for the time and contribution) started the work on https://github.com/apache/spark/pull/41036 he started the Go client directly in the Spark repository. In the meantime, I was approached by other engineers who are willing to contribute to working on a Rust client for Spark Connect.
>>>> >>
>>>> >> Now one of the key questions is where these connectors should live and how we manage expectations most effectively.
>>>> >>
>>>> >> At the high level, there are three approaches:
>>>> >>
>>>> >> (1) "3rd party" (neither JVM nor Python) clients should live in separate repositories owned and governed by the Apache Spark community.
>>>> >>
>>>> >> (2) All clients should live in the main Apache Spark repository in the `connector/connect/client` directory.
>>>> >>
>>>> >> (3) Non-native Spark Connect clients (anything other than Python and JVM) should not be part of the Apache Spark repository and governance rules.
>>>> >>
>>>> >> Before we iron out how exactly we mark these clients as experimental and how we align their release process etc. with Spark, my suggestion would be to get a consensus on this first question.
>>>> >>
>>>> >> Personally, I'm fine with (1) and (2) with a preference for (2).
>>>> >>
>>>> >> Would love to get feedback from other members of the community!
>>>> >>
>>>> >> Thanks
>>>> >> Martin