Aren't we getting a few things mixed up here?

1) As Micah says, the original proposal is about adapting Java types to Arrow. This can be used independently of Flight SQL. I don't think this was being pitched as a standard itself, unless I'm mistaken?
2) Flight SQL the protocol, which _is_ a language-agnostic standard, though maybe not the one applications will generally choose to consume.
3) Idiomatic/standard per-language APIs that build on Flight SQL, which will include JDBC/ODBC (there is a reference JDBC driver in the works [1]), but I agree there's room for something that uses Arrow types, supports partitioning, etc. as well. (And I agree there's room for something that supports these features but is _not_ Flight SQL underneath.)

---

I'm not super experienced with JDBC/ODBC. Would extending them basically mean something like (in JDBC) providing interfaces that Connections, ResultSets, etc. could be cast to in order to access the "Arrow-native" bits? And in ODBC, using something like the SQL_C_BINARY type to 'tunnel' Arrow data through ODBC buffers, and/or providing a set of C API functions that could convert between (say) an ODBC statement handle and an Arrow C Data Interface ArrowArrayStream?

[1]: https://github.com/apache/arrow/pull/12254

-David

On Tue, Mar 15, 2022, at 01:06, Micah Kornfield wrote:
> Hi Julian,
>
>> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it
>> successful, build it on top of the leading API in each language (which in
>> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to expose
>> through your API the fact that FlightSQL is underneath.
>
> My understanding is that this thread is all about implementing a Flight
> server and making those ergonomics easier. On the client side, I think the
> power of Flight/FlightSQL is twofold:
> 1. Reference ODBC/JDBC drivers that can consume the wire format (and I
> think many clients will go this route). I think these are in the process
> of being contributed already. As you noted, there is power in
> standards, so I expect this avenue to see heavy use.
> 2. For clients that can handle it and want to go through the trouble,
> consuming the data directly as Arrow for efficiency purposes.
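David's question about interfaces that Connections and ResultSets "could be cast to" maps onto a mechanism JDBC already has: every `Connection`, `Statement` and `ResultSet` extends `java.sql.Wrapper`, whose `isWrapperFor` and `unwrap` methods are the standard way to reach driver-specific interfaces. That is more robust than a plain cast, because connection pools and proxies often wrap the driver's objects. The sketch below assumes a hypothetical extension interface, `ArrowNative`, that a Flight SQL JDBC driver might implement; neither it nor the `ArrowExtensions` helper exists in JDBC or Arrow today. The Arrow return type is left as a type parameter so the sketch compiles with only the JDK.

```java
import java.sql.SQLException;
import java.sql.Wrapper;
import java.util.Optional;

/** Hypothetical helpers (not part of JDBC or Arrow) for reaching Arrow-native extensions. */
public final class ArrowExtensions {
    private ArrowExtensions() {}

    /**
     * Hypothetical extension interface a driver's ResultSet might implement.
     * A real driver would return an Arrow type (e.g. an ArrowReader); the type
     * parameter R keeps this sketch free of Arrow dependencies.
     */
    public interface ArrowNative<R> {
        R arrowData() throws SQLException;
    }

    /**
     * Returns the requested extension if the JDBC object, or a delegate it
     * wraps, implements it; otherwise returns Optional.empty() so the caller
     * can fall back to the ordinary row-based JDBC API.
     */
    public static <T> Optional<T> tryUnwrap(Wrapper jdbcObject, Class<T> iface)
            throws SQLException {
        if (jdbcObject.isWrapperFor(iface)) {
            return Optional.of(jdbcObject.unwrap(iface));
        }
        return Optional.empty();
    }
}
```

A client would call `ArrowExtensions.tryUnwrap(resultSet, ArrowExtensions.ArrowNative.class)` and use the Arrow path only when it is present, so plain JDBC code keeps working against drivers that do not offer the extension.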
>
> I don't think we've discussed canonical APIs by extending ODBC/JDBC, but I like that
> idea. That seems like a discussion for after we have working JDBC/ODBC
> reference implementations, though?
>
> I might have missed it, but I don't think either approach on the client side
> has been discussed on this thread. I also think this is why "Dataframe"
> might not be the best name for the adapter: it comes with all sorts
> of assumptions about usage on both the client and the server.
>
> Cheers,
> Micah
>
> On Mon, Mar 14, 2022 at 9:38 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
>> When I read “language-agnostic standard for data access” I cringed a
>> little. (See [1].)
>>
>> Sure, it’s fun to create a new standard. But if your standard is
>> successful, there will need to be a huge amount of work changing existing
>> code to use your standard. That effort might even be the difference between
>> success and failure for a small project, and therefore you have helped
>> protect the incumbents.
>>
>> My solution?
>>
>> I would like the FlightSQL authors to make clear that it is a wire
>> protocol, and only a protocol.
>>
>> Rather than creating new APIs, I would like people to spend their effort
>> implementing existing APIs (such as ODBC and JDBC) on top of FlightSQL.
>>
>> If those APIs are inadequate (e.g. they don’t provide access to the raw
>> Arrow data, or don’t support INSERT or SELECT that are partitioned across
>> several clients/servers), then add extensions to those APIs. But still
>> implement the core APIs. When I describe a table from Java, I want to get a
>> result set that exactly matches JDBC’s getTables [2].
>>
>> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it
>> successful, build it on top of the leading API in each language (which in
>> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to expose
>> through your API the fact that FlightSQL is underneath.
>>
>> Julian
>>
>> [1] https://xkcd.com/927/
>>
>> [2] https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
>>
>> > On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com> wrote:
>> >
>> > While trying to implement and introduce the idea of adopting FlightSQL, the
>> > largest challenge was the API itself.
>> >
>> > I know it's meant to be low-level. But I found that most of the development
>> > time was in code to convert to/from row-based data (i.e. Map<String, Object>)
>> > and Java types, and columnar data + Arrow types.
>> >
>> > I'm likely in the minority position here -- I know that Arrow and FlightSQL
>> > users are largely looking at transferring large volumes of data and
>> > servicing OLAP-type workloads. But the thing that excites me most about
>> > FlightSQL isn't its performance (always nice to have), but that it's a
>> > language-agnostic standard for data access.
>> >
>> > That has broad implications -- for all kinds of data-access workloads and
>> > business use cases.
>> >
>> > The challenge is that, when presenting a proof-of-concept to advocate for it,
>> > rather than what a developer might expect to see, something like:
>> >
>> > // FlightSQL handler code
>> > List<Map<String, Object>> results = ....;
>> > results.add(Map.of("id", 1, "name", "Person 1"));
>> > return results;
>> >
>> > a significant portion of the code is in Arrow-specific implementation
>> > details: creating a VectorSchemaRoot and FieldVectors, de-serializing the
>> > results on the client, etc.
>> >
>> > Just curious whether there is any interest/intention of possibly making a
>> > higher-level API around the basic FlightSQL one?
>> >
>> > Maybe something closer to the traditional notion of a row-based "DataFrame"
>> > or "Table", like:
>> >
>> > DataFrame df = new DataFrame();
>> > df.addColumn("id", ArrowTypes.Int);
>> > df.addColumn("name", ArrowTypes.VarChar);
>> > df.addRow(Map.of("id", 1, "name", "Person 1"));
>> > VectorSchemaRoot root = df.toVectorSchemaRoot();
>> > listener.setVectorSchemaRoot(root);
>> > listener.sendVectorSchemaRootContents();
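To make the boilerplate Gavin describes concrete, here is a minimal sketch of the kind of row-to-column adapter being discussed, written against the existing Arrow Java vector classes. `RowsToArrow` and `toVectorSchemaRoot` are hypothetical names, not an Arrow API. The sketch assumes `arrow-vector` plus an Arrow memory implementation (such as `arrow-memory-netty`) on the classpath, and it handles only signed 32-bit int and UTF-8 string columns.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.Schema;

/** Hypothetical row-to-column adapter (not an Arrow API). */
public final class RowsToArrow {
    private RowsToArrow() {}

    /**
     * Copies rows keyed by column name into a new VectorSchemaRoot.
     * Only Int(32, signed) and Utf8 columns are supported in this sketch;
     * a missing key or null value becomes an Arrow null.
     * The caller owns the returned root and must close it.
     */
    public static VectorSchemaRoot toVectorSchemaRoot(
            List<Map<String, Object>> rows, Schema schema, BufferAllocator allocator) {
        VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
        try {
            root.allocateNew();
            for (FieldVector vector : root.getFieldVectors()) {
                if (vector instanceof IntVector) {
                    fillInts((IntVector) vector, rows);
                } else if (vector instanceof VarCharVector) {
                    fillStrings((VarCharVector) vector, rows);
                } else {
                    throw new UnsupportedOperationException(
                            "Column type not handled in this sketch: " + vector.getField().getType());
                }
            }
            // Sets the value count on every vector so readers see all rows.
            root.setRowCount(rows.size());
            return root;
        } catch (RuntimeException e) {
            // Release the Arrow buffers if a value could not be converted.
            root.close();
            throw e;
        }
    }

    private static void fillInts(IntVector vector, List<Map<String, Object>> rows) {
        for (int i = 0; i < rows.size(); i++) {
            Object value = rows.get(i).get(vector.getName());
            if (value == null) {
                vector.setNull(i);
            } else {
                vector.setSafe(i, ((Number) value).intValue());
            }
        }
    }

    private static void fillStrings(VarCharVector vector, List<Map<String, Object>> rows) {
        for (int i = 0; i < rows.size(); i++) {
            Object value = rows.get(i).get(vector.getName());
            if (value == null) {
                vector.setNull(i);
            } else {
                vector.setSafe(i, value.toString().getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```

In a Flight producer, the returned root would then typically be streamed with the existing `ServerStreamListener` calls `listener.start(root)`, `listener.putNext()` and `listener.completed()`. The schema is passed in explicitly, much like the `addColumn` calls in the proposal, rather than inferred from the first row, because a Java `null` or boxed `Integer` alone cannot tell you the intended Arrow type, bit width or nullability.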