On 21/10/2022 at 17:35, David Li wrote:
Maybe to take a step back - why do we want this in the Arrow repositories/under
Arrow governance?
I'm excited to see more integrations and use cases for Flight and Flight SQL in
the wild, but I think it would be good to see a true ecosystem around this, and
so I don't think -every- integration needs to end up in the Arrow repos. And
there is a cost to set up CI, releases, etc. (ADBC is still getting set up
there, and my hope at least is that most integrations will eventually be
provided by the database systems, not by Arrow.)
That said I'm not necessarily opposed. We've discussed similar 'contrib' things
in the past [1][2]. It may be worth reviewing the discussions there and
discussing how this project would address the criteria proposed.
The problem is that Arrow is so broad nowadays that a "contrib" repo
would end up hosting a hodgepodge of entirely disparate subprojects with
no common maintenance/release policies, and disjoint development and
user communities.
A separate Apache repo for each subproject is probably better, even
though there might be a small setup overhead.
Regards
Antoine.
[1]: https://lists.apache.org/thread/nfr3tq1tb5tvr34zg5z7on8xglfsj79t
[2]: https://lists.apache.org/thread/yshp4b3g34kxovzvf6x48pzj0894qbw5 (though
you may have to dig to find the responses - the UI didn't link them up)
On Fri, Oct 21, 2022, at 11:08, Kyle Brooks wrote:
Hi David and Antoine,
Long-term I completely agree that this should belong in Apache Spark.
I also agree that Flight SQL or ADBC would be a good enhancement for
users. We are planning on implementing Flight SQL support soon. ADBC
doesn't look mature enough right now for this use case. We will keep
an eye on it.
Short-term, I'd like to propose either creating an Arrow contrib repo
or making a separate Apache repo just for the Flight Spark Connector.
We would need help facilitating this within Apache / Arrow.
Thank you,
Kyle
On 2022/10/18 23:44:49 David Li wrote:
Given the probable need for IP clearance, getting it into Arrow would also be a
Process(TM) unfortunately. We also don't really have a great place for "not quite in
tree" projects; there have been discussions of a 'contrib' repo in the past, but
nothing has materialized.
That said - have you shown this to Spark users? I'd guess there'd be more
enthusiasm there, especially if there are particular data source(s) you
anticipate this would make available to them. (Though again, Flight SQL or ADBC
over plain Flight RPC might be a more attractive target for such a Spark
plugin.)
-David
On Tue, Oct 18, 2022, at 16:50, Matt Phelps wrote:
Hi David and Antoine,
Thanks for your input. Based on past experience talking to some other Arrow /
Spark developers, we anticipate that it would take a long time to get
into Spark. Our plan was to build up a user base in the Arrow community
before submitting for inclusion to Spark. Is there a place the code can
live in the meantime?
Matt Phelps
From: Antoine Pitrou <an...@python.org>
Date: Monday, October 17, 2022 at 2:48 PM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: [DISCUSS] Integrate existing Spark connector for Flight
On 17/10/2022 at 21:27, David Li wrote:
Hey Matt,
This is cool to see. To be clear, this is an implementation of Spark
DataSourceV2 using Arrow Flight?
I think the questions I have are:
- Does this belong under Arrow, or under Spark? I lean towards it being closer
to Spark than Arrow;
FWIW, that is my feeling as well.
Regards
Antoine.