[ 
https://issues.apache.org/jira/browse/ARROW-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029344#comment-17029344
 ] 

Andy Grove commented on ARROW-7744:
-----------------------------------

Hi Jacques,

I was wary of adding more dependencies unless/until they are really needed.
I've implemented production JDBC drivers before, and there is definitely a
bit of tedious work involved in implementing the result set type conversion
code and some of the associated metadata functionality, but my gut feeling
so far is that the long-term burden would be less than designing around
Avatica. Avatica seems to provide much more than we need, with a server
process, wire protocol, etc. It also has its own type system, so we would
have to convert between Avatica and Arrow types. It seems preferable to
design this from the ground up based on Arrow types. I also was not able to
find comprehensive documentation for building a JDBC driver like this with
Avatica, which concerned me.
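
To give a rough idea of the type conversion work mentioned above, here is a
minimal sketch of mapping Arrow types directly to JDBC type codes. The
ArrowKind enum is a hypothetical stand-in for Arrow's real type classes and
the mapping itself is an assumption for illustration, not a settled design;
only the java.sql.Types constants come from the JDK.

```java
import java.sql.Types;

public class ArrowToJdbcTypes {

    // Hypothetical stand-in for Arrow logical types (not the real Arrow API).
    enum ArrowKind { BOOL, INT32, INT64, FLOAT32, FLOAT64, UTF8, BINARY, DATE32, TIMESTAMP }

    // Map an Arrow-style type to the java.sql.Types code that a
    // ResultSetMetaData implementation could report for that column.
    static int toJdbcType(ArrowKind kind) {
        switch (kind) {
            case BOOL:      return Types.BOOLEAN;
            case INT32:     return Types.INTEGER;
            case INT64:     return Types.BIGINT;
            case FLOAT32:   return Types.REAL;      // JDBC REAL is single precision
            case FLOAT64:   return Types.DOUBLE;
            case UTF8:      return Types.VARCHAR;
            case BINARY:    return Types.VARBINARY;
            case DATE32:    return Types.DATE;
            case TIMESTAMP: return Types.TIMESTAMP;
            default:
                throw new IllegalArgumentException("Unsupported type: " + kind);
        }
    }

    public static void main(String[] args) {
        // Print the JDBC type code chosen for each Arrow-style type.
        for (ArrowKind kind : ArrowKind.values()) {
            System.out.println(kind + " -> " + toJdbcType(kind));
        }
    }
}
```

With Avatica in the middle, a second mapping layer (Arrow to Avatica, then
Avatica to JDBC) would presumably be needed instead of this single step.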

I guess I could try to do a mini bake-off here and create a PR based on
Avatica as well so we can compare the approaches. I can also ask some
questions on the appropriate mailing list about Avatica's suitability for
this use case.

Thanks,

Andy.

On Mon, Feb 3, 2020 at 2:58 PM Jacques Nadeau (Jira) <j...@apache.org>

> [Java] Implement Flight JDBC Driver
> -----------------------------------
>
>                 Key: ARROW-7744
>                 URL: https://issues.apache.org/jira/browse/ARROW-7744
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Java
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> As a Java developer, I would like the ability to use JDBC to interact with 
> Flight servers. For example, there is now an example in the Arrow repo to run 
> a Flight server wrapping DataFusion and it supports executing SQL against CSV 
> and Parquet files. I would like to be able to call this from Java.
> An Arrow Flight JDBC driver would also then simplify developing integrations 
> with other Apache projects, such as building a Spark V2 Data Source or a 
> Drill storage plugin. It would also be directly usable from many BI tools.
> I propose that the class name of the driver should be 
> "org.apache.arrow.jdbc.Driver" and the connection string should be 
> "jdbc:arrow://host:port?[properties]". I'm purposely leaving "flight" out of 
> these because I don't think it makes sense to support multiple protocols now 
> that we have Flight and it is easier for users to remember "arrow" rather 
> than needing to know about the protocol. This is easy to change if there are 
> objections.
> JDBC is designed around sending queries as strings and then receiving 
> results. These strings could be SQL queries, JSON-encoded query plans, or 
> something else. The JDBC driver will not make any assumptions about the 
> format or dialect of these strings. Queries would be executed using the 
> "DoGet" method.
> The JDBC metadata functionality for reading schema information could possibly 
> use ListFlights, but I haven't looked into this part yet.
> I do expect that this JDBC driver will serve as a base that could be extended 
> to add specific functionality for different Flight servers rather than 
> attempt to support them all.
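
The proposed connection string "jdbc:arrow://host:port?[properties]" could be
parsed along the following lines. This is only a sketch: the ArrowJdbcUrl
class, the example port, and the assumption that properties are "key=value"
pairs separated by "&" are all hypothetical, since the description above does
not define the property syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ArrowJdbcUrl {

    // URL prefix taken from the proposal; everything after it is host:port?[properties].
    static final String PREFIX = "jdbc:arrow://";

    final String host;
    final int port;
    final Map<String, String> properties;

    ArrowJdbcUrl(String host, int port, Map<String, String> properties) {
        this.host = host;
        this.port = port;
        this.properties = properties;
    }

    // A java.sql.Driver#acceptsURL implementation could delegate to a check like this.
    static boolean accepts(String url) {
        return url != null && url.startsWith(PREFIX);
    }

    static ArrowJdbcUrl parse(String url) {
        if (!accepts(url)) {
            throw new IllegalArgumentException("Not an Arrow JDBC URL: " + url);
        }
        String rest = url.substring(PREFIX.length());

        // Split "host:port" from the optional "?properties" part.
        String authority = rest;
        String query = "";
        int q = rest.indexOf('?');
        if (q >= 0) {
            authority = rest.substring(0, q);
            query = rest.substring(q + 1);
        }

        int colon = authority.lastIndexOf(':');
        if (colon <= 0 || colon == authority.length() - 1) {
            throw new IllegalArgumentException("Expected host:port in: " + url);
        }
        String host = authority.substring(0, colon);
        int port = Integer.parseInt(authority.substring(colon + 1));

        // Assumed property syntax: key=value pairs separated by '&'.
        Map<String, String> props = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                props.put(pair, "");
            } else {
                props.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return new ArrowJdbcUrl(host, port, props);
    }

    public static void main(String[] args) {
        // Host, port, and property names here are made up for the example.
        ArrowJdbcUrl u = parse("jdbc:arrow://localhost:50051?user=andy&tls=false");
        System.out.println(u.host + " " + u.port + " " + u.properties);
    }
}
```

A real driver would hand the parsed host and port to a Flight client and pass
the query string through unchanged, as described above.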



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
