Hi Julian,

> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it
> successful, build it on top of the leading API in each language (which in
> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to expose
> through your API the fact that FlightSQL is underneath.


My understanding is that this thread is about implementing a Flight server
and making the ergonomics of doing so easier.  On the client side, I think
the power of Flight/FlightSQL is twofold:
1.  Reference ODBC/JDBC drivers that can consume the wire format (I think
many clients will go this route).  I believe these are already in the
process of being contributed.  As you noted, there is power in standards,
so I expect this avenue to see heavy use.
2.  For clients that can handle it and want to go through the trouble,
consuming the data directly as Arrow for efficiency.  I don't think we've
discussed defining canonical APIs by extending ODBC/JDBC, but I like that
idea.  That seems like a discussion for after we have working JDBC/ODBC
reference implementations, though?
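
To make the contrast between the two client options concrete, here is a
minimal sketch in plain Java.  ColumnarBatch and TwoClientPaths are
hypothetical stand-ins invented for illustration; they are not Arrow,
FlightSQL, or JDBC classes.  toRows() mimics the row-at-a-time view a
JDBC/ODBC driver would present, while sumIds() mimics a client working on a
column directly:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a columnar record batch (NOT an Arrow class).
final class ColumnarBatch {
    final int[] ids;
    final String[] names;

    ColumnarBatch(int[] ids, String[] names) {
        if (ids.length != names.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        this.ids = ids;
        this.names = names;
    }

    int rowCount() {
        return ids.length;
    }

    // Option 1 (JDBC/ODBC-style): materialize one object per row.
    List<Map<String, Object>> toRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < rowCount(); i++) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("id", ids[i]);
            row.put("name", names[i]);
            rows.add(row);
        }
        return rows;
    }

    // Option 2 (Arrow-style): scan a column directly, no per-row objects.
    long sumIds() {
        long sum = 0;
        for (int id : ids) {
            sum += id;
        }
        return sum;
    }
}

final class TwoClientPaths {
    public static void main(String[] args) {
        ColumnarBatch batch = new ColumnarBatch(
                new int[] {1, 2, 3},
                new String[] {"Person 1", "Person 2", "Person 3"});

        // Row-oriented consumption: boxing and a map lookup per row.
        long viaRows = 0;
        for (Map<String, Object> row : batch.toRows()) {
            viaRows += (Integer) row.get("id");
        }

        // Columnar consumption: a tight loop over a primitive array.
        long viaColumns = batch.sumIds();

        System.out.println(viaRows + " " + viaColumns); // prints "6 6"
    }
}
```

Both paths compute the same answer; the difference is only in how much
per-row work the client does, which is the efficiency argument for option 2.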

I might have missed it, but I don't think either client-side approach has
been discussed on this thread.  I also think this is why DataFrame might
not be the best name for the adapter: it comes with all sorts of
assumptions about usage on both the client and the server.

Cheers,
Micah

On Mon, Mar 14, 2022 at 9:38 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> When I read “language-agnostic standard for data access” I cringed a
> little. (See [1].)
>
> Sure, it’s fun to create a new standard. But if your standard is
> successful, there will need to be a huge amount of work changing existing
> code to use your standard. That effort might even be the difference between
> success and failure for a small project, and therefore you have helped
> protect the incumbents.
>
> My solution?
>
> I would like the FlightSQL authors to make clear that it is a wire
> protocol, and only a protocol.
>
> Rather than creating new APIs, I would like people to spend their effort
> implementing existing APIs (such as ODBC and JDBC) on top of FlightSQL.
>
> If those APIs are inadequate (e.g. they don’t provide access to the raw
> Arrow data, or don’t support INSERT or SELECT that are partitioned across
> several clients/servers), then add extensions to those APIs. But still
> implement the core APIs. When I describe a table from Java, I want to get a
> result set that exactly matches JDBC’s getTables [2].
>
> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make it
> successful, build it on top of the leading API in each language (which in
> Java would be FlightSQL’s JDBC driver). I don’t see a good reason to expose
> through your API the fact that FlightSQL is underneath.
>
> Julian
>
> [1] https://xkcd.com/927/
>
> [2]
> https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
>
>
>
> > On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com> wrote:
> >
> > While trying to implement and introduce the idea of adopting FlightSQL,
> > the largest challenge was the API itself.
> >
> > I know it's meant to be low-level. But I found that most of the
> > development time went into code converting between row-based data (i.e.
> > Map<String, Object>) with Java types and columnar data with Arrow types.
> >
> > I'm likely in the minority position here -- I know that Arrow and
> > FlightSQL users are largely looking at transferring large volumes of
> > data and servicing OLAP-type workloads.
> > But the thing that excites me most about FlightSQL isn't its performance
> > (always nice to have), but that it's a language-agnostic standard for
> > data access.
> >
> > That has broad implications -- for all kinds of data-access workloads and
> > business use cases.
> >
> > The challenge is that in trying to advocate for it, when presenting a
> > proof-of-concept,
> > rather than what a developer might expect to see, something like:
> >
> > // FlightSQL handler code
> > List<Map<String, Object>> results = ....;
> > results.add(Map.of("id", 1, "name", "Person 1"));
> > return results;
> >
> > A significant portion of the code is in Arrow-specific implementation
> > details: creating a VectorSchemaRoot, FieldVector, de-serializing the
> > results on the client, etc.
> >
> > Just curious whether there is any interest/intention of possibly making a
> > higher-level API around the basic FlightSQL one?
> > Maybe something closer to the traditional notion of a row-based
> > "DataFrame" or "Table", like:
> >
> > DataFrame df = new DataFrame();
> > df.addColumn("id", ArrowTypes.Int);
> > df.addColumn("name", ArrowTypes.VarChar);
> > df.addRow(Map.of("id", 1, "name", "Person 1"));
> > VectorSchemaRoot root = df.toVectorSchemaRoot();
> > listener.setVectorSchemaRoot(root);
> > listener.sendVectorSchemaRootContents();
>
>
