Hi Julian,
> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it
> successful, build it on top of the leading API in each language (which in
> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to expose
> through your API the fact that FlightSQL is underneath.

My understanding is that this thread is all about implementing a Flight
server and making those ergonomics easier. On the client side, I think the
power of Flight/FlightSQL is twofold:

1. Reference ODBC/JDBC drivers that can consume the wire format (and I
   think many clients will go this route). I think these are in the process
   of being contributed already. As you noted, there is power in standards,
   so I expect this avenue to see heavy use.
2. For clients that can handle it and want to go through the trouble,
   consuming the data directly as Arrow for efficiency purposes.

I don't think we've discussed canonical APIs by extending ODBC/JDBC, but I
like that idea. That seems like a discussion for after we have a working
JDBC/ODBC reference implementation, though?

I might have missed it, but I don't think either approach on the client side
has been discussed on this thread. I also think this is why Dataframe might
not be the best name for the adapter, because it comes with all sorts of
assumptions about usage on both a client and a server.

Cheers,
Micah

On Mon, Mar 14, 2022 at 9:38 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> When I read “language-agnostic standard for data access” I cringed a
> little. (See [1].)
>
> Sure, it’s fun to create a new standard. But if your standard is
> successful, there will need to be a huge amount of work changing existing
> code to use your standard. That effort might even be the difference between
> success and failure for a small project, and therefore you have helped
> protect the incumbents.
>
> My solution?
>
> I would like the FlightSQL authors to make clear that it is a wire
> protocol, and only a protocol.
>
> Rather than creating new APIs, I would like people to spend their effort
> implementing existing APIs (such as ODBC and JDBC) on top of FlightSQL.
>
> If those APIs are inadequate (e.g. they don’t provide access to the raw
> Arrow data, or don’t support INSERT or SELECT that are partitioned across
> several clients/servers), then add extensions to those APIs. But still
> implement the core APIs. When I describe a table from Java, I want a
> result set that exactly matches JDBC’s getTables [2].
>
> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it
> successful, build it on top of the leading API in each language (which in
> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to expose
> through your API the fact that FlightSQL is underneath.
>
> Julian
>
> [1] https://xkcd.com/927/
>
> [2] https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
>
> On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com> wrote:
> >
> > While trying to implement and introduce the idea of adopting FlightSQL,
> > the largest challenge was the API itself.
> >
> > I know it's meant to be low-level. But I found that most of the
> > development time was in code to convert to/from row-based data
> > (i.e. Map<String, Object>) and Java types, and columnar data + Arrow
> > types.
> >
> > I'm likely in the minority position here -- I know that Arrow and
> > FlightSQL users are largely looking at transferring large volumes of
> > data and servicing OLAP-type workloads.
> > But the thing that excites me most about FlightSQL isn't its performance
> > (always nice to have), but that it's a language-agnostic standard for
> > data access.
> >
> > That has broad implications -- for all kinds of data-access workloads and
> > business use cases.
> >
> > The challenge is that in trying to advocate for it, when presenting a
> > proof-of-concept, rather than what a developer might expect to see,
> > something like:
> >
> >     // FlightSQL handler code
> >     List<Map<String, Object>> results = ....;
> >     results.add(Map.of("id", 1, "name", "Person 1"));
> >     return results;
> >
> > A significant portion of the code is in Arrow-specific implementation
> > details: creating a VectorSchemaRoot, FieldVector, de-serializing the
> > results on the client, etc.
> >
> > Just curious whether there is any interest/intention of possibly making a
> > higher level API around the basic FlightSQL one?
> > Maybe something closer to the traditional notion of a row-based
> > "DataFrame" or "Table", like:
> >
> >     DataFrame df = new DataFrame();
> >     df.addColumn("id", ArrowTypes.Int);
> >     df.addColumn("name", ArrowTypes.VarChar);
> >     df.addRow(Map.of("id", 1, "name", "Person 1"));
> >     VectorSchemaRoot root = df.toVectorSchemaRoot();
> >     listener.setVectorSchemaRoot(root);
> >     listener.sendVectorSchemaRootContents();
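
To make the row-to-columnar conversion discussed in the quoted message concrete, here is a minimal, dependency-free sketch of the pivot step that a server-side adapter would likely need before filling Arrow vectors. It is only an illustration under stated assumptions: `RowBuffer`, `addRow`, and `toColumns` are hypothetical names, not part of Arrow or FlightSQL, and the sketch uses plain Java collections instead of a `VectorSchemaRoot`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not an Arrow or FlightSQL class): collects
 * row-based data and pivots it into one value list per column.
 */
final class RowBuffer {
    private final List<String> columnNames;
    private final List<Map<String, Object>> rows = new ArrayList<>();

    RowBuffer(List<String> columnNames) {
        // Defensive copy so later changes by the caller cannot alter the schema.
        this.columnNames = List.copyOf(columnNames);
    }

    /** Adds one row; keys that are not declared columns are rejected. */
    void addRow(Map<String, Object> row) {
        for (String key : row.keySet()) {
            if (!columnNames.contains(key)) {
                throw new IllegalArgumentException("Unknown column: " + key);
            }
        }
        rows.add(row);
    }

    /** Pivots the buffered rows into columns, in declared column order. */
    Map<String, List<Object>> toColumns() {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (String name : columnNames) {
            List<Object> values = new ArrayList<>(rows.size());
            for (Map<String, Object> row : rows) {
                // A column missing from a row becomes null in that position.
                values.add(row.get(name));
            }
            columns.put(name, values);
        }
        return columns;
    }

    public static void main(String[] args) {
        RowBuffer buffer = new RowBuffer(List.of("id", "name"));
        buffer.addRow(Map.of("id", 1, "name", "Person 1"));
        buffer.addRow(Map.of("id", 2, "name", "Person 2"));
        System.out.println(buffer.toColumns());
    }
}
```

In a real adapter, each per-column list would then be copied into the matching Arrow vector. That step is left out here because it depends on the Arrow Java API and on the type mapping chosen for each column.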