On 21/10/2022 at 17:35, David Li wrote:
Maybe to take a step back - why do we want this in the Arrow repositories/under 
Arrow governance?

I'm excited to see more integrations and use cases for Flight and Flight SQL in 
the wild, but I think it would be good to see a true ecosystem around this, and 
so I don't think -every- integration needs to end up in the Arrow repos. And 
there is a cost to set up CI, releases, etc. (ADBC is still getting set up 
there, and my hope at least is that most integrations will eventually be 
provided by the database systems, not by Arrow.)

That said I'm not necessarily opposed. We've discussed similar 'contrib' things 
in the past [1][2]. It may be worth reviewing the discussions there and 
discussing how this project would address the criteria proposed.

The problem is that Arrow is so broad nowadays that a "contrib" repo would end up hosting a hodgepodge of entirely disparate subprojects with no common maintenance/release policies, and disjoint development and user communities.

A separate Apache repo for each subproject is probably better, even though there might be a small setup overhead.

Regards

Antoine.

[1]: https://lists.apache.org/thread/nfr3tq1tb5tvr34zg5z7on8xglfsj79t
[2]: https://lists.apache.org/thread/yshp4b3g34kxovzvf6x48pzj0894qbw5 (though 
you may have to dig to find the responses - the UI didn't link them up)

On Fri, Oct 21, 2022, at 11:08, Kyle Brooks wrote:
Hi David and Antoine,

Long-term I completely agree that this should belong in Apache Spark.
I also agree that Flight SQL or ADBC would be a good enhancement for
users.  We are planning on implementing Flight SQL support soon.  ADBC
doesn't look mature enough right now for this use case.  We will keep
an eye on it.

Short-term, I'd like to propose either creating an Arrow contrib repo
or making a separate Apache repo just for the Flight Spark Connector.

We would need help facilitating this within Apache / Arrow.

Thank you,
Kyle

On 2022/10/18 23:44:49 David Li wrote:
Given the probable need for IP clearance, getting it into Arrow would also be a 
Process(TM) unfortunately. We also don't really have a great place for "not quite in 
tree" projects; there have been discussions of a 'contrib' repo in the past, but 
nothing has materialized.

That said - have you shown this to Spark users? I'd guess there'd be more 
enthusiasm there, especially if there are particular data source(s) you 
anticipate this would make available to them. (Though again, Flight SQL or ADBC 
over plain Flight RPC might be a more attractive target for such a Spark 
plugin.)

-David

On Tue, Oct 18, 2022, at 16:50, Matt Phelps wrote:
Hi David and Antoine,

Thanks for your input. Based on past experience talking to some other Arrow /
Spark developers, we anticipate that it would take a long time to get
into Spark. Our plan was to build up a user base in the Arrow community
before submitting for inclusion to Spark. Is there a place the code can
live in the meantime?

Matt Phelps


From: Antoine Pitrou <an...@python.org>
Date: Monday, October 17, 2022 at 2:48 PM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISCUSS] Integrate existing Spark connector for Flight

On 17/10/2022 at 21:27, David Li wrote:
Hey Matt,

This is cool to see. To be clear, this is an implementation of Spark 
DataSourceV2 using Arrow Flight?

I think the questions I have are:

- Does this belong under Arrow, or under Spark - I lean towards it being closer 
to Spark than Arrow;

FWIW, that is my feeling as well.

Regards

Antoine.
