Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread bo yang
Hi Martin,

Thanks a lot for preparing the new repo and making it super easy for me to
just copy my code over! I will create a new PR there.

> I think the PR is fine from a code perspective as a starting point. I've
prepared the go repository with all the things necessary so that it reduces
friction for you. The protos are automatically generated, pre-commit checks
etc. All you need to do is drop your code :)

> Once we have the first version working we can iterate and identify the
next steps.

Best,
Bo


On Thu, Jun 1, 2023 at 11:58 AM Martin Grund 
wrote:

> These are all valid points and it makes total sense to continue to
> consider them. However, reading the mail I'm wondering if we're discussing
> the same problems.
>
> Deprecation of APIs aside, the main benefit of Spark Connect is that the
> contract is explicitly not a Jar file full of transitive dependencies (and
> discoverable internal APIs) but rather the contract established via the
> proto messages and RPCs. If you compare this, for example, to the R
> integration, there is no need to embed some Go pieces with the JVM to make
> it work. There is no custom RMI protocol specific to the client language,
> just the same contract as, for example, PySpark uses. The physical contract
> is the protobuf and the logical contract is the DataFrame API.
>
> This means that Spark Connect clients don't suffer from a large part of the
> challenges that other tools built on top of Spark have, as there is no tight
> coupling between the driver JVM and the client.
>
> I'm happy to help establish clear guidance for contrib-style modules that
> operate with a different set of expectations but are developed by the Spark
> community according to its guidelines.
>
> Martin
>
>
> On Thu 1. Jun 2023 at 12:41 Maciej  wrote:
>
>> Hi Martin,
>>
>>
>> On 5/30/23 11:50, Martin Grund wrote:
>> > I think it makes sense to split this discussion into two pieces. On
>> > the contribution side, my personal perspective is that these new clients
>> > are explicitly marked as experimental and unsupported until we deem them
>> > mature enough to be supported using the standard release process etc.
>> > However, the goal should be that the main contributors of these clients
>> > are aiming to follow the same release and maintenance schedule. I think
>> > we should encourage the community to contribute to the Spark Connect
>> > clients and as such we should explicitly not make it as hard as possible
>> > to get started (and for that reason reserve the right to abandon).
>>
>> I know it sounds like nitpicking, but we still have components
>> deprecated in 1.2 or 1.3, not to mention subprojects that haven't been
>> developed for years.  So, there is a huge gap between reserving a right and
>> actually exercising it when needed. If such a right is to be used
>> differently for Spark Connect bindings, it's something that should be
>> communicated upfront.
>> > How exactly the release schedule is going to look is going to require
>> > probably some experimentation because it's a new area for Spark and
>> > its ecosystem. I don't think it requires us to have all answers upfront.
>>
>> Nonetheless, we should work towards establishing consensus around these
>> issues and documenting the answers. They affect not only the maintainers
>> (see for example a recent discussion about switching to a more predictable
>> release schedule) but also the users, for whom multiple APIs (including
>> their development status) have been a common source of confusion in the
>> past.
>> >> Also, an elephant in the room is the future of the current API in
>> >> Spark 4 and onwards. As useful as connect is, it is not exactly a
>> >> replacement for many existing deployments. Furthermore, it doesn't
>> >> make extending Spark much easier and the current ecosystem is,
>> >> subjectively speaking, a bit brittle.
>> >
>> > The goal of Spark Connect is not to replace the way users are
>> > currently deploying Spark, it's not meant to be that. Users should
>> > continue deploying Spark in exactly the way they prefer. Spark
>> > Connect allows bringing more interactivity and connectivity to Spark.
>> > While Spark Connect extends Spark, most new language consumers
>> > will not try to extend Spark, but simply provide the existing surface to
>> > their native language. So the goal is not so much extensibility but
>> > more availability. For example, I believe it would be awesome if the Livy
>> > community would find a way to integrate with Spark Connect to provide the
>> > routing capabilities to provide a stable DNS endpoint for all different
>> > Spark deployments.
>> >
>> >> [...] the current ecosystem is, subjectively speaking, a bit brittle.
>> >
>> > Can you help me understand that a bit better? Do you mean the Spark
>> > ecosystem or the Spark Connect ecosystem?
>>
>> I mean Spark in general. While most of the core and some closely related
>> projects are well maintained, tools built on top of Spark, even ones
>> supported by 

Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread Martin Grund
These are all valid points and it makes total sense to continue to consider
them. However, reading the mail I'm wondering if we're discussing the same
problems.

Deprecation of APIs aside, the main benefit of Spark Connect is that the
contract is explicitly not a Jar file full of transitive dependencies (and
discoverable internal APIs) but rather the contract established via the
proto messages and RPCs. If you compare this, for example, to the R
integration, there is no need to embed some Go pieces with the JVM to make
it work. There is no custom RMI protocol specific to the client language,
just the same contract as, for example, PySpark uses. The physical contract
is the protobuf and the logical contract is the DataFrame API.

This means that Spark Connect clients don't suffer from a large part of the
challenges that other tools built on top of Spark have, as there is no tight
coupling between the driver JVM and the client.
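
To make that concrete, here is a minimal sketch of what the client side looks
like from PySpark (assuming a Spark Connect server is already running on the
default port 15002 and that pyspark was installed with the connect extras); a
Go or Rust client would speak the very same proto/gRPC contract from its own
native surface:

# Minimal client sketch: the only coupling to the server is the sc:// endpoint
# speaking the Spark Connect protobuf/gRPC protocol -- no JVM and no Spark jars
# are needed on the client machine. Assumes a Spark Connect server was started,
# e.g. via ./sbin/start-connect-server.sh, and `pip install "pyspark[connect]"`
# on the client.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .remote("sc://localhost:15002")  # physical contract: protobuf messages over gRPC
    .getOrCreate()
)

# Logical contract: the familiar DataFrame API, translated into proto plans.
df = spark.range(10).filter("id % 2 == 0")
df.show()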

I'm happy to help establish clear guidance for contrib-style modules that
operate with a different set of expectations but are developed by the Spark
community according to its guidelines.

Martin


On Thu 1. Jun 2023 at 12:41 Maciej  wrote:

> Hi Martin,
>
>
> On 5/30/23 11:50, Martin Grund wrote:
> > I think it makes sense to split this discussion into two pieces. On
> > the contribution side, my personal perspective is that these new clients
> > are explicitly marked as experimental and unsupported until we deem them
> > mature enough to be supported using the standard release process etc.
> > However, the goal should be that the main contributors of these clients
> > are aiming to follow the same release and maintenance schedule. I think
> > we should encourage the community to contribute to the Spark Connect
> > clients and as such we should explicitly not make it as hard as possible
> > to get started (and for that reason reserve the right to abandon).
>
> I know it sounds like nitpicking, but we still have components
> deprecated in 1.2 or 1.3, not to mention subprojects that haven't been
> developed for years.  So, there is a huge gap between reserving a right and
> actually exercising it when needed. If such a right is to be used
> differently for Spark Connect bindings, it's something that should be
> communicated upfront.
> > How exactly the release schedule is going to look is going to require
> > probably some experimentation because it's a new area for Spark and
> > its ecosystem. I don't think it requires us to have all answers upfront.
>
> Nonetheless, we should work towards establishing consensus around these
> issues and documenting the answers. They affect not only the maintainers
> (see for example a recent discussion about switching to a more predictable
> release schedule) but also the users, for whom multiple APIs (including
> their development status) have been a common source of confusion in the
> past.
> >> Also, an elephant in the room is the future of the current API in
> >> Spark 4 and onwards. As useful as connect is, it is not exactly a
> >> replacement for many existing deployments. Furthermore, it doesn't
> >> make extending Spark much easier and the current ecosystem is,
> >> subjectively speaking, a bit brittle.
> >
> > The goal of Spark Connect is not to replace the way users are
> > currently deploying Spark, it's not meant to be that. Users should
> > continue deploying Spark in exactly the way they prefer. Spark
> > Connect allows bringing more interactivity and connectivity to Spark.
> > While Spark Connect extends Spark, most new language consumers
> > will not try to extend Spark, but simply provide the existing surface to
> > their native language. So the goal is not so much extensibility but
> > more availability. For example, I believe it would be awesome if the Livy
> > community would find a way to integrate with Spark Connect to provide the
> > routing capabilities to provide a stable DNS endpoint for all different
> > Spark deployments.
> >
> >> [...] the current ecosystem is, subjectively speaking, a bit brittle.
> >
> > Can you help me understand that a bit better? Do you mean the Spark
> > ecosystem or the Spark Connect ecosystem?
>
> I mean Spark in general. While most of the core and some closely related
> projects are well maintained, tools built on top of Spark, even ones
> supported by major stakeholders, are often short-lived and left
> unmaintained, if not officially abandoned.
>
> New languages aside, without a single extension point (which, for core
> Spark, is the JVM interface), maintaining public projects on top of Spark
> becomes even less attractive. And that is assuming we don't completely
> reject the idea of extending Spark functionality while using Spark Connect,
> which would effectively limit the target audience for any 3rd party library.
>
> > Martin
> >
> > On Fri, May 26, 2023 at 5:39 PM Maciej wrote:
> >
> >> It might be a good idea to have a discussion about how new connect
> >> clients fit into the overall process we have. In particular:
> >>
> >> * 

Re: Late materialization?

2023-06-01 Thread Mich Talebzadeh
Hi Alex,


   - Your first assertion is correct. Regardless of Spark, and going back to
   Jurassic-era data processing, partition pruning and column pruning are
   all-or-nothing. This means that for a given query, each partition and each
   column is either read in full or not at all. There is no selective reading
   of values based on the specific filtering and aggregation requirements of
   the query.
   - With "late materialization", on the other hand, only the values required
   for filtering and aggregation are considered at query processing time.
   Whatever else the query requests that is not needed for filtering and
   aggregation would be read or computed later (see the sketch below).
   - I suppose your mileage varies, because if the underlying data changes you
   are going to have the kind of problems that traditional databases have
   handled through isolation levels. Spark Structured Streaming may introduce
   such challenges as well, and any traditional DML-type change, as opposed to
   pure queries (DQL), may impact this. I am not sure whether Spark can provide
   this itself or has to rely on the default isolation level of the underlying
   data source. The brute-force approach is to allow concurrent readers but
   block writers during query processing.
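
To illustrate the difference, here is a small, purely hypothetical PySpark
sketch (the table path and column names are made up; no special configuration
is assumed):

# With today's all-or-nothing column pruning, the scan decodes both ts and
# payload for every row group it has to read (Parquet filter pushdown may skip
# whole row groups, but not individual values), and the filter on ts is applied
# afterwards. With late materialization, the scan+filter stage would read only
# ts, and payload would be fetched/decoded later, only for the rows that pass.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

events = spark.read.parquet("/data/events")      # hypothetical wide table
result = (
    events
    .where(F.col("ts") >= "2023-05-01")          # only ts is needed to filter
    .select("payload")                           # wide column, needed only for output
)
result.explain()  # the scan lists both ts and payload; nothing is deferred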

HTH

I assume that " late materialisation"  refers to the idea of deferring the
reading or computation of values until they are actually needed for
filtering and aggregation, rather than reading all values upfront.
Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 31 May 2023 at 19:14, Alex Cruise  wrote:

> Just to clarify briefly, in hopes that future searchers will find this
> thread... ;)
>
> IIUC at the moment, partition pruning and column pruning are
> all-or-nothing: every partition and every column either is, or is not, used
> for a query.
>
> Late materialization would mean that only the values needed for filtering
> & aggregation would be read in the scan+filter stages, and any expressions
> requested by the user but not needed for filtering and aggregation would
> only be read/computed afterward.
>
> I can see how this will invite sequential consistency problems, in data
> sources where mutations like DML or compactions are happening behind the
> query's back, but presumably Spark users already have this class of
> problem, it's just less serious when the end-to-end execution time of a
> query is shorter.
>
> WDYT?
>
> -0xe1a
>
> On Wed, May 31, 2023 at 11:03 AM Alex Cruise  wrote:
>
>> Hey folks, I'm building a Spark connector for my company's proprietary
>> data lake... That project is going fine despite the near total lack of
>> documentation. ;)
>>
>> In parallel, I'm also trying to figure out a better story for when humans
>> inevitably `select * from 100_trillion_rows`, glance at the first page,
>> then walk away forever. The traditional RDBMS approach seems to be to keep
>> a lot of state in server-side cursors, so they can eagerly fetch only the
>> first few pages of results and go to sleep until the user advances the
>> cursor, at which point we wake up and fetch a few more pages.
>>
>> After some cursory googling about how Trino handles this nightmare
>> scenario, I found https://github.com/trinodb/trino/issues/49 and its
>> child https://github.com/trinodb/trino/pull/602, which appear to be
>> based on the paper http://www.vldb.org/pvldb/vol4/p539-neumann.pdf,
>> which is what HyPerDB (never open source, acquired by Tableau) was based on.
>>
>> IIUC this kind of optimization isn't really feasible in Spark at present,
>> due to the sharp distinction between transforms, which are always lazy, and
>> actions, which are always eager. However, given the very desirable
>> performance/efficiency benefits, I think it's worth starting this
>> conversation: if we wanted to do something like this, where would we start?
>>
>> Thanks!
>>
>> -0xe1a
>>
>


Re: Remove protobuf 2.5.0 from Spark dependencies

2023-06-01 Thread Steve Loughran
we would cut it from the hadoop dependencies, but still allow IPC messages
using it to be marshalled if the protoc-compiled classes were using the
protobuf-2.5 JAR *and* that JAR was on the classpath.

it'd become the homework of those apps which need protobuf-2.5, here
hbase, to set things up.



On Sat, 27 May 2023 at 14:39, 张铎(Duo Zhang)  wrote:

> For hbase, the problem is we rely on protobuf 2.5 for our coprocessors.
>
> See HBASE-27436.
>
> Cheng Pan wrote on Wed, May 24, 2023 at 10:00:
>
>> +CC dev@hbase
>>
>> Thanks,
>> Cheng Pan
>>
>> On Fri, May 19, 2023 at 4:08 AM Steve Loughran
>>  wrote:
>> >
>> >
>> >
>> > On Thu, 18 May 2023 at 03:45, Cheng Pan  wrote:
>> >>
>> >> Steve, thanks for the information, I think HADOOP-17046 should be fine
>> for the Spark case.
>> >>
>> >> Hadoop put the protobuf 3 into the pre-shaded hadoop-thirdparty, and
>> the hadoop-client-runtime shades protobuf 2 during the package, which
>> results in protobuf 2 and 3 co-exist in hadoop-client-runtime in different
>> packages:
>> >>
>> >> - protobuf 2: org.apache.hadoop.shaded.com.google.protobuf
>> >> - protobuf 3: org.apache.hadoop.thirdparty.protobuf
>> >
>> > oh, so in fact that "put it back in unshaded" change doesn't do
>> > anything useful through the hadoop-client lib. so it is very much useless.
>> >>
>> >>
>> >> As HADOOP-18487 plans to mark protobuf 2 as optional, will this mean
>> >> hadoop-client-runtime no longer ships protobuf 2? If yes, things become
>> >> worse for downstream projects that consume the Hadoop shaded client, like
>> >> Spark, because it requires the user to add the vanilla protobuf 2 jar to
>> >> the classpath if they want to access those APIs.
>> >
>> >
>> > Well, what applications are using
>> > org.apache.hadoop.shaded.com.google.protobuf? hadoop itself doesn't; it's
>> > only referenced in unshaded form because hbase wanted the IPC library to
>> > still work with the unshaded version they were still using. But if the
>> > protobuf 2 lib is now only available shaded, their protobuf-compiled
>> > .class files aren't going to link to it, are they?
>> >
>> > does anyone know how spark + hbase + hadoop-client-runtime work so that
>> spark can talk to an hbase server? especially: what is needed on the
>> classpath, and what gets loaded for a call
>> >>
>> >>
>> >> In summary, I think the current state is fine. But for security
>> purposes, the Hadoop community may want to remove the EOL protobuf 2
>> classes from hadoop-client-runtime.
>> >
>> >
>> >  +1. the shaded one which is in use also needs upgrading.
>> >
>> >>
>> >> Thanks,
>> >> Cheng Pan
>> >>
>> >>
>> >> On May 17, 2023 at 04:10:43, Dongjoon Hyun 
>> wrote:
>> >>>
>> >>> Thank you for sharing, Steve.
>> >>>
>> >>> Dongjoon
>> >>>
>> >>> On Tue, May 16, 2023 at 11:44 AM Steve Loughran
>>  wrote:
>> 
>>  I have some bad news here which is even though hadoop cut protobuf
>> 2.5 support, hbase team put it back in (HADOOP-17046). I don't know if the
>> shaded hadoop client has removed that dependency on protobuf 2.5.
>> 
>>  In HADOOP-18487 i want to allow hadoop to cut that dependency, with
>> hbase having to add it to the classpath if they still want it:
>>  https://github.com/apache/hadoop/pull/4996
>> 
>>  It's been neglected -if you can help with review/test etc that'd be
>> great. I'd love to get this into the 3.3.6 release.
>> 
>>  On Sat, 13 May 2023 at 08:36, Cheng Pan  wrote:
>> >
>> > Hi all,
>> >
>> > In SPARK-42452 (apache/spark#41153 [1]), I’m trying to remove
>> protobuf 2.5.0 from the Spark dependencies.
>> >
>> > Spark does not use protobuf 2.5.0 directly; instead, it comes from
>> > other dependencies. With the following changes, Spark no longer requires
>> > protobuf 2.5.0.
>> >
>> > - SPARK-40323 upgraded ORC 1.8.0, which moved from protobuf 2.5.0
>> to a shaded protobuf 3
>> >
>> > - SPARK-33212 switched from Hadoop vanilla client to Hadoop shaded
>> client, also removed the protobuf 2 dependency. SPARK-42452 removed the
>> support for Hadoop 2.
>> >
>> > - SPARK-14421 shaded and relocated protobuf 2.6.1, which is
>> required by the kinesis client, into the kinesis assembly jar
>> >
>> > - Spark's own core/connect/protobuf modules use protobuf 3, and also
>> > shade and relocate all protobuf 3 deps.
>> >
>> > Feel free to comment if you still have any concerns.
>> >
>> > [1] https://github.com/apache/spark/pull/41153
>> >
>> > Thanks,
>> > Cheng Pan
>>
>


Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread Maciej

Hi Martin,


On 5/30/23 11:50, Martin Grund wrote:
> I think it makes sense to split this discussion into two pieces. On
> the contribution side, my personal perspective is that these new
> clients are explicitly marked as experimental and unsupported until we
> deem them mature enough to be supported using the standard release
> process etc. However, the goal should be that the main contributors of
> these clients are aiming to follow the same release and maintenance
> schedule. I think we should encourage the community to contribute to
> the Spark Connect clients and as such we should explicitly not make it
> as hard as possible to get started (and for that reason reserve the
> right to abandon).


I know it sounds like nitpicking, but we still have components
deprecated in 1.2 or 1.3, not to mention subprojects that haven't been 
developed for years.  So, there is a huge gap between reserving a right 
and actually exercising it when needed. If such a right is to be used 
differently for Spark Connect bindings, it's something that should be 
communicated upfront.


> How exactly the release schedule is going to look is going to require
> probably some experimentation because it's a new area for Spark and
> its ecosystem. I don't think it requires us to have all answers upfront.


Nonetheless, we should work towards establishing consensus around these 
issues and documenting the answers. They affect not only the maintainers 
(see for example a recent discussion about switching to a more 
predictable release schedule) but also the users, for whom multiple APIs 
(including their development status) have been a common source of 
confusion in the past.


>> Also, an elephant in the room is the future of the current API in
>> Spark 4 and onwards. As useful as connect is, it is not exactly a
>> replacement for many existing deployments. Furthermore, it doesn't
>> make extending Spark much easier and the current ecosystem is,
>> subjectively speaking, a bit brittle.
>
> The goal of Spark Connect is not to replace the way users are
> currently deploying Spark, it's not meant to be that. Users should
> continue deploying Spark in exactly the way they prefer. Spark
> Connect allows bringing more interactivity and connectivity to Spark.
> While Spark Connect extends Spark, most new language consumers
> will not try to extend Spark, but simply provide the existing surface to
> their native language. So the goal is not so much extensibility but
> more availability. For example, I believe it would be awesome if the Livy
> community would find a way to integrate with Spark Connect to provide the
> routing capabilities to provide a stable DNS endpoint for all different
> Spark deployments.
>
>> [...] the current ecosystem is, subjectively speaking, a bit brittle.
>
> Can you help me understand that a bit better? Do you mean the Spark
> ecosystem or the Spark Connect ecosystem?


I mean Spark in general. While most of the core and some closely related 
projects are well maintained, tools built on top of Spark, even ones 
supported by major stakeholders, are often short-lived and left 
unmaintained, if not officially abandoned.


New languages aside, without a single extension point (which, for core
Spark, is the JVM interface), maintaining public projects on top of Spark
becomes even less attractive. And that is assuming we don't completely
reject the idea of extending Spark functionality while using Spark Connect,
which would effectively limit the target audience for any 3rd party library.



> Martin
>
> On Fri, May 26, 2023 at 5:39 PM Maciej wrote:
>
>> It might be a good idea to have a discussion about how new connect
>> clients fit into the overall process we have. In particular:
>>
>> * Under what conditions do we consider adding a new language to the
>>   official channels? What process do we follow?
>> * What guarantees do we offer in respect to these clients? Is adding a
>>   new client the same type of commitment as for the core API? In other
>>   words, do we commit to maintaining such clients "forever" or do we
>>   separate the "official" and "contrib" clients, with the latter being
>>   governed by the ASF, but not guaranteed to be maintained in the future?
>> * Do we follow the same release schedule as for the core project, or
>>   rather release each client separately, after the main release is
>>   completed?
>>
>> Also, an elephant in the room is the future of the current API in
>> Spark 4 and onwards. As useful as connect is, it is not exactly a
>> replacement for many existing deployments. Furthermore, it doesn't
>> make extending Spark much easier and the current ecosystem is,
>> subjectively speaking, a bit brittle.
>>
>> --
>> Best regards,
>> Maciej
>>
>> On 5/26/23 07:26, Martin Grund wrote:
>>> Thanks everyone for your feedback! I will work on figuring out what
>>> it takes to get started with a repo for the go client.
>>>
>>> On Thu 25. May 2023 at 21:51 Chao Sun wrote:
>>>
>>>> +1 on separate repo 

Re: [CONNECT] New Clients for Go and Rust

2023-06-01 Thread Martin Grund
Hi Bo,

I think the PR is fine from a code perspective as a starting point. I've
prepared the go repository with all the things necessary so that it reduces
friction for you. The protos are automatically generated, pre-commit checks
etc. All you need to do is drop your code :)

Once we have the first version working we can iterate and identify the next
steps.

Thanks
Martin


On Thu, Jun 1, 2023 at 2:50 AM bo yang  wrote:

> Just see the discussions here! Really appreciate Martin and other folks
> helping on my previous Golang Spark Connect PR (
> https://github.com/apache/spark/pull/41036)!
>
> Great to see we have a new repo for Spark Golang Connect client. Thanks 
> Hyukjin!
> I am thinking to migrate my PR to this new repo. Would like to hear any
> feedback or suggestion before I make the new PR :)
>
> Thanks,
> Bo
>
>
>
> On Tue, May 30, 2023 at 3:38 AM Martin Grund 
> wrote:
>
>> Hi folks,
>>
>> Thanks a lot for the help from Hyukjin! We've created the
>> https://github.com/apache/spark-connect-go as the first contrib
>> repository for Spark Connect under the Apache Spark project. We will move
>> the development of the Golang client to this repository and make it very
>> clear from the README file that this is an experimental client.
>>
>> Looking forward to all your contributions!
>>
>> On Tue, May 30, 2023 at 11:50 AM Martin Grund 
>> wrote:
>>
>>> I think it makes sense to split this discussion into two pieces. On the
>>> contribution side, my personal perspective is that these new clients are
>>> explicitly marked as experimental and unsupported until we deem them mature
>>> enough to be supported using the standard release process etc. However, the
>>> goal should be that the main contributors of these clients are aiming to
>>> follow the same release and maintenance schedule. I think we should
>>> encourage the community to contribute to the Spark Connect clients and as
>>> such we should explicitly not make it as hard as possible to get started
>>> (and for that reason reserve the right to abandon).
>>>
>>> How exactly the release schedule is going to look is going to require
>>> probably some experimentation because it's a new area for Spark and its
>>> ecosystem. I don't think it requires us to have all answers upfront.
>>>
>>> > Also, an elephant in the room is the future of the current API in
>>> Spark 4 and onwards. As useful as connect is, it is not exactly a
>>> replacement for many existing deployments. Furthermore, it doesn't make
>>> extending Spark much easier and the current ecosystem is, subjectively
>>> speaking, a bit brittle.
>>>
>>> The goal of Spark Connect is not to replace the way users are currently
>>> deploying Spark, it's not meant to be that. Users should continue deploying
>>> Spark in exactly the way they prefer. Spark Connect allows bringing more
>>> interactivity and connectivity to Spark. While Spark Connect extends Spark,
>>> most new language consumers will not try to extend Spark, but simply
>>> provide the existing surface to their native language. So the goal is not
>>> so much extensibility but more availability. For example, I believe it
>>> would be awesome if the Livy community would find a way to integrate with
>>> Spark Connect to provide the routing capabilities to provide a stable DNS
>>> endpoint for all different Spark deployments.
>>>
>>> > [...] the current ecosystem is, subjectively speaking, a bit brittle.
>>>
>>> Can you help me understand that a bit better? Do you mean the Spark
>>> ecosystem or the Spark Connect ecosystem?
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>> On Fri, May 26, 2023 at 5:39 PM Maciej  wrote:
>>>
 It might be a good idea to have a discussion about how new connect
 clients fit into the overall process we have. In particular:


- Under what conditions do we consider adding a new language to the
official channels?  What process do we follow?
- What guarantees do we offer in respect to these clients? Is
adding a new client the same type of commitment as for the core API? In
other words, do we commit to maintaining such clients "forever" or do we
separate the "official" and "contrib" clients, with the later being
governed by the ASF, but not guaranteed to be maintained in the future?
- Do we follow the same release schedule as for the core project,
or rather release each client separately, after the main release is
completed?

 Also, an elephant in the room is the future of the current API in Spark
 4 and onwards. As useful as connect is, it is not exactly a replacement for
 many existing deployments. Furthermore, it doesn't make extending Spark
 much easier and the current ecosystem is, subjectively speaking, a bit
 brittle.

 --
 Best regards,
 Maciej


 On 5/26/23 07:26, Martin Grund wrote:

 Thanks everyone for your feedback! I will work on figuring out what it
 takes to get