When I read “language-agnostic standard for data access” I cringed a little. 
(See [1].)

Sure, it’s fun to create a new standard. But if your standard is successful, 
a huge amount of existing code will have to be changed to use it. That effort 
might even be the difference between success and failure for a small project, 
so you will have helped protect the incumbents.

My solution?

I would like the FlightSQL authors to make clear that it is a wire protocol, 
and only a protocol.

Rather than creating new APIs, I would like people to spend their effort 
implementing existing APIs (such as ODBC and JDBC) on top of FlightSQL.

If those APIs are inadequate (e.g. they don’t provide access to the raw Arrow 
data, or don’t support INSERT or SELECT that are partitioned across several 
clients/servers), then add extensions to those APIs. But still implement the 
core APIs. When I describe a table from Java, I want a result set that 
exactly matches JDBC’s getTables [2].
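
To make "exactly matches" concrete, here is a rough sketch of the kind of check I have in mind. The ten column names are the ones the java.sql Javadoc lists for getTables; the class and method names (GetTablesShape, columnLabels, problems) are made up for this email and are not part of any FlightSQL or JDBC API.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class GetTablesShape {
    // The ten columns documented for DatabaseMetaData.getTables, in order.
    static final List<String> EXPECTED = List.of(
        "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "TABLE_TYPE", "REMARKS",
        "TYPE_CAT", "TYPE_SCHEM", "TYPE_NAME",
        "SELF_REFERENCING_COL_NAME", "REF_GENERATION");

    // Collect the column labels of any JDBC result set, in order.
    static List<String> columnLabels(ResultSetMetaData md) throws SQLException {
        List<String> labels = new ArrayList<>();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            labels.add(md.getColumnLabel(i));
        }
        return labels;
    }

    // Compare the leading columns against the documented ones and describe
    // each mismatch. Columns after the tenth are not checked.
    static List<String> problems(List<String> actual) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < EXPECTED.size(); i++) {
            String want = EXPECTED.get(i);
            String got = i < actual.size() ? actual.get(i) : "(missing)";
            if (!want.equals(got)) {
                out.add("column " + (i + 1) + ": expected " + want + ", got " + got);
            }
        }
        return out;
    }
}
```

Given an open java.sql.Connection conn from whichever driver is under test, something like problems(columnLabels(conn.getMetaData().getTables(null, null, "%", null).getMetaData())) should come back empty. The point is that the check is written purely against java.sql and does not know or care that FlightSQL is underneath.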

I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it 
successful, build it on top of the leading API in each language (which in Java 
would be FlightSQL’s JDBC driver). I don’t see a good reason to expose through 
your API the fact that FlightSQL is underneath. 

Julian

[1] https://xkcd.com/927/

[2] https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-


> On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com> wrote:
> 
> While trying to implement and introduce the idea of adopting FlightSQL, the
> largest challenge was the API itself
> 
> I know it's meant to be low-level. But I found that most of the development
> time was in code to convert to/from
> row-based data (IE Map<String, Object>) and Java types, and columnar data +
> Arrow types.
> 
> I'm likely in the minority position here -- I know that Arrow and FlightSQL
> users are largely looking at transferring large volumes of data and
> servicing OLAP-type workloads
> But the thing that excites me most about FlightSQL, isn't its performance
> (always nice to have), but that it's a language-agnostic standard for data
> access.
> 
> That has broad implications -- for all kinds of data-access workloads and
> business usecases.
> 
> The challenge is that in trying to advocate for it, when presenting a
> proof-of-concept,
> rather than what a developer might expect to see, something like:
> 
> // FlightSQL handler code
> List<Map<String, Object>> results = ....;
> results.add(Map.of("id", 1, "name", "Person 1"));
> return results;
> 
> A significant portion of the code is in Arrow-specific implementation
> details:
> creating a VectorSchemaRoot, FieldVector, de-serializing the results on the
> client, etc.
> 
> Just curious whether there is any interest/intention of possibly making a
> higher level API around the basic FlightSQL one?
> Maybe something closer to the traditional notion of a row-based "DataFrame"
> or "Table", like:
> 
> DataFrame df = new DataFrame();
> df.addColumn("id", ArrowTypes.Int);
> df.addColumn("name", ArrowTypes.VarChar);
> df.addRow(Map.of("id", 1, "name", "Person 1"));
> VectorSchemaRoot root = df.toVectorSchemaRoot();
> listener.setVectorSchemaRoot(root);
> listener.sendVectorSchemaRootContents();
