Aren't we getting a few things mixed up here? 

1) As Micah says, the original proposal is about adapting Java types to Arrow. 
This can be used independently of Flight SQL. I don't think this was being 
pitched as a standard itself unless I'm mistaken?

2) Flight SQL the protocol, which _is_ a language-agnostic standard, though 
maybe not the one applications will generally choose to consume directly.

3) Idiomatic/standard per-language APIs that build on Flight SQL, which will 
include JDBC/ODBC (there is a reference JDBC driver in the works [1]), but I 
agree there's room for something that uses Arrow types, supports partitioning, 
etc. as well. (And I agree there's room for something that supports these 
features but is _not_ Flight SQL underneath.)
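
To make point 1 concrete: the adaptation is essentially a pivot from rows to 
columns. A rough sketch in plain Java, with ordinary lists standing in for 
Arrow vectors (RowPivot and pivot are made-up names for illustration, and no 
Arrow classes are involved):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Arrow or Flight SQL.
class RowPivot {
    // Turn a list of row maps into one list per column, in the given column
    // order. A key missing from a row becomes null in that column.
    static Map<String, List<Object>> pivot(List<String> columns,
                                           List<Map<String, Object>> rows) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (String column : columns) {
            out.put(column, new ArrayList<>(rows.size()));
        }
        for (Map<String, Object> row : rows) {
            for (String column : columns) {
                out.get(column).add(row.get(column));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> first = Map.of("id", 1, "name", "Person 1");
        Map<String, Object> second = Map.of("id", 2, "name", "Person 2");
        System.out.println(pivot(List.of("id", "name"), List.of(first, second)));
    }
}
```

A real adapter would additionally have to map each Java type to an Arrow type 
and write into the corresponding vectors, which this sketch leaves out.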

---

I'm not super experienced with JDBC/ODBC - would extending them basically mean 
something like (in JDBC) providing interfaces that Connections, ResultSets, 
etc. could be cast to in order to access the "Arrow-native" bits? And in ODBC, using 
something like the SQL_C_BINARY type to 'tunnel' Arrow data through ODBC 
buffers, and/or providing a set of C API functions that could convert between 
(say) an ODBC statement handle and an Arrow C Data Interface ArrowArrayStream?
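
For the JDBC half, a rough sketch of what I have in mind. JDBC objects such as 
Connection and ResultSet already extend java.sql.Wrapper, so a driver could 
expose extras through unwrap() rather than a raw cast. ArrowNativeAccess, 
SketchResultSet and UnwrapDemo are hypothetical names, and the column-name 
method is only a placeholder for whatever Arrow-typed accessors an extension 
would really offer:

```java
import java.sql.SQLException;
import java.sql.Wrapper;
import java.util.List;

// Hypothetical extension interface; not part of JDBC, Arrow, or Flight SQL.
interface ArrowNativeAccess {
    List<String> columnNames();
}

// Stand-in for a driver's ResultSet. Only Wrapper is implemented here to keep
// the sketch short; a real driver class would implement java.sql.ResultSet.
class SketchResultSet implements Wrapper, ArrowNativeAccess {
    @Override
    public List<String> columnNames() {
        return List.of("id", "name");
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        throw new SQLException("Not a wrapper for " + iface.getName());
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this);
    }
}

class UnwrapDemo {
    // Use the extension when the driver offers it, otherwise fall back.
    static List<String> columnsOf(Wrapper resultSet) throws SQLException {
        if (resultSet.isWrapperFor(ArrowNativeAccess.class)) {
            return resultSet.unwrap(ArrowNativeAccess.class).columnNames();
        }
        return List.of(); // plain JDBC path would go here
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(columnsOf(new SketchResultSet()));
    }
}
```

Code that does not know about the extension keeps working against plain JDBC, 
which is the appeal of extending rather than replacing the API.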

[1]: https://github.com/apache/arrow/pull/12254

-David

On Tue, Mar 15, 2022, at 01:06, Micah Kornfield wrote:
> Hi Julian,
>
>
>> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it
>> successful, build it on top of the leading API in each language (which in
>> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to expose
>> through your API the fact that FlightSQL is underneath.
>
>
> My understanding is that this thread is all about implementing a Flight
> server and making those ergonomics easier.  On the client side, I think the
> power of Flight/FlightSQL is twofold:
> 1.  Reference ODBC/JDBC drivers that can consume the wire format (and I
> think many clients will go this route).  I think these are in the process
> of being contributed already.  As you noted, there is power in
> standards, so I expect this avenue to see heavy use.
> 2.  For clients that can handle it and want to go through the trouble,
> consuming the data directly as Arrow for efficiency purposes.   I don't
> think we've discussed canonical APIs by extending ODBC/JDBC but I like that
> idea.  That seems like a discussion for after we have working JDBC/ODBC
> reference implementations though?
>
> I might have missed it but I don't think either approach on the client side
> has been discussed on this thread.  I also think this is why Dataframe
> might not be the best name for the adapter because it comes with all sorts
> of assumptions about usage both on a client and a server.
>
> Cheers,
> Micah
>
> On Mon, Mar 14, 2022 at 9:38 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
>> When I read “language-agnostic standard for data access” I cringed a
>> little. (See [1].)
>>
>> Sure, it’s fun to create a new standard. But if your standard is
>> successful, there will need to be a huge amount of work changing existing
>> code to use your standard. That effort might even be the difference between
>> success and failure for a small project, and therefore you have helped
>> protect the incumbents.
>>
>> My solution?
>>
>> I would like the FlightSQL authors to make clear that it is a wire
>> protocol, and only a protocol.
>>
>> Rather than creating new APIs, I would like people to spend their effort
>> implementing existing APIs (such as ODBC and JDBC) on top of FlightSQL.
>>
>> If those APIs are inadequate (e.g. they don’t provide access to the raw
>> Arrow data, or don’t support INSERT or SELECT that are partitioned across
>> several clients/servers), then add extensions to those APIs. But still
>> implement the core APIs. When I describe a table from Java, I want a
>> result set that exactly matches JDBC’s getTables [2].
>>
>> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it
>> successful, build it on top of the leading API in each language (which in
>> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to expose
>> through your API the fact that FlightSQL is underneath.
>>
>> Julian
>>
>> [1] https://xkcd.com/927/
>>
>> [2]
>> https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
>>
>>
>>
>> > On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com> wrote:
>> >
>> > While trying to implement and introduce the idea of adopting FlightSQL, the
>> > largest challenge was the API itself.
>> >
>> > I know it's meant to be low-level. But I found that most of the development
>> > time went into code converting between row-based data with Java types
>> > (i.e. Map<String, Object>) and columnar data with Arrow types.
>> >
>> > I'm likely in the minority position here -- I know that Arrow and FlightSQL
>> > users are largely looking at transferring large volumes of data and
>> > servicing OLAP-type workloads. But the thing that excites me most about
>> > FlightSQL isn't its performance (always nice to have), but that it's a
>> > language-agnostic standard for data access.
>> >
>> > That has broad implications -- for all kinds of data-access workloads and
>> > business use cases.
>> >
>> > The challenge is that in trying to advocate for it, when presenting a
>> > proof-of-concept, rather than what a developer might expect to see,
>> > something like:
>> >
>> > // FlightSQL handler code
>> > List<Map<String, Object>> results = ....;
>> > results.add(Map.of("id", 1, "name", "Person 1"));
>> > return results;
>> >
>> > A significant portion of the code is in Arrow-specific implementation
>> > details: creating a VectorSchemaRoot, FieldVector, de-serializing the
>> > results on the client, etc.
>> >
>> > Just curious whether there is any interest/intention of possibly making a
>> > higher level API around the basic FlightSQL one?
>> > Maybe something closer to the traditional notion of a row-based "DataFrame"
>> > or "Table", like:
>> >
>> > DataFrame df = new DataFrame();
>> > df.addColumn("id", ArrowTypes.Int);
>> > df.addColumn("name", ArrowTypes.VarChar);
>> > df.addRow(Map.of("id", 1, "name", "Person 1"));
>> > VectorSchemaRoot root = df.toVectorSchemaRoot();
>> > listener.setVectorSchemaRoot(root);
>> > listener.sendVectorSchemaRootContents();
>>
>>
