I may indeed be confused. I haven’t spent a lot of time looking at FlightSQL’s 
capabilities. Apologies for that.

My main point was to keep protocols, language-specific APIs (e.g. ODBC, JDBC),
and frameworks (e.g. Gavin’s proposed data-frame system) decoupled from one
another, and to build on existing standards. I think those points were heard.

Julian


> On Mar 15, 2022, at 9:14 AM, James Duong <jam...@bitquilltech.com.INVALID> wrote:
> 
> I could also see extensions to ODBC/JDBC being a point of confusion for app
> developers.
> 
> For example, if we were to add hooks in the JDBC driver to report endpoints
> so that applications can call getStream() directly, what would happen if the
> user started reading a stream and then went back and tried to use the regular
> ResultSet interface? The stream would be consumed, but the driver wouldn't
> know it.
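A minimal sketch of the bookkeeping James describes, assuming a hypothetical `DualAccessResult` class (not part of any Flight SQL driver) in which `next()` stands in for the `ResultSet` cursor and `stream()` stands in for the proposed `getStream()` hook. Once both paths read from one underlying stream, the driver has to record which path claimed the data and reject the other, because rows consumed through one path cannot be replayed for the other.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical result that can be read row by row (like ResultSet.next()) or
 * handed out whole (like the proposed getStream() hook), but not both.
 */
final class DualAccessResult<T> {
    private enum Mode { UNUSED, CURSOR, STREAM }

    private final Iterator<T> source; // stands in for the underlying Flight data stream
    private Mode mode = Mode.UNUSED;
    private boolean onRow;
    private T current;

    DualAccessResult(List<T> rows) {
        this.source = rows.iterator();
    }

    /** Cursor-style access: advances to the next row, like ResultSet.next(). */
    boolean next() {
        claim(Mode.CURSOR);
        onRow = source.hasNext();
        current = onRow ? source.next() : null;
        return onRow;
    }

    /** Returns the row the cursor is currently on. */
    T current() {
        if (!onRow) {
            throw new IllegalStateException("cursor is not on a row");
        }
        return current;
    }

    /** Extension-style access: gives the caller the raw stream, like getStream(). */
    Iterator<T> stream() {
        claim(Mode.STREAM);
        return source;
    }

    // Records which access path owns the data. Switching paths is rejected because
    // rows consumed through one path cannot be replayed for the other.
    private void claim(Mode requested) {
        if (mode == Mode.UNUSED) {
            mode = requested;
        } else if (mode != requested) {
            throw new IllegalStateException("result already consumed via " + mode);
        }
    }
}
```

Rejecting the switch is only one option. A driver could instead buffer rows so that both views keep working, but that costs memory and undercuts the reason for going to the raw stream in the first place.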
> 
> On Tue, Mar 15, 2022 at 9:07 AM Kyle Porter <ky...@bitquilltech.com.invalid> wrote:
> 
>> In general, I have problems with attempting to expose other extensions
>> through existing standards such as ODBC/JDBC. What it feels like we're
>> saying is: use the standard so you don't have to change any code, except
>> for this part where you must write custom code to take advantage of the
>> non-standard portions.
>> 
>> At that point, why not just write something fully custom and take advantage
>> of the underlying interface?
>> 
>> The higher level clients are meant to ease adoption and may be all that
>> existing applications use, but new applications can have a choice to use
>> the higher level clients or the lower level interface.
>> 
>> *Kyle Porter*
>> CEO
>> Bit Quill Technologies Inc.
>> Office: +1.778.331.3355 | Direct: +1.604.441.7318 | ky...@bitquilltech.com
>> https://www.bitquill.com
>> 
>> This email message is for the sole use of the intended recipient(s) and may
>> contain confidential and privileged information.  Any unauthorized review,
>> use, disclosure, or distribution is prohibited.  If you are not the
>> intended recipient, please contact the sender by reply email and destroy
>> all copies of the original message.  Thank you.
>> 
>> 
>> On Tue, Mar 15, 2022 at 7:55 AM David Li <lidav...@apache.org> wrote:
>> 
>>> Aren't we getting a few things mixed up here?
>>> 
>>> 1) As Micah says, the original proposal is about adapting Java types to
>>> Arrow. This can be used independently of Flight SQL. I don't think this was
>>> being pitched as a standard itself, unless I'm mistaken?
>>> 
>>> 2) Flight SQL the protocol, which _is_ a language-agnostic standard,
>>> though maybe not the one applications will generally choose to consume.
>>> 
>>> 3) Idiomatic/standard per-language APIs that build on Flight SQL, which
>>> will include JDBC/ODBC (there is a reference JDBC driver in the works
>>> [1]), but I agree there's room for something that uses Arrow types,
>>> supports partitioning, etc. as well. (And I agree there's room for
>>> something that supports these features but is _not_ Flight SQL
>>> underneath.)
>>> 
>>> ---
>>> 
>>> I'm not super experienced with JDBC/ODBC. Would extending them basically
>>> mean something like (in JDBC) providing interfaces that Connections,
>>> ResultSets, etc. could be cast to in order to access the "Arrow-native"
>>> bits? And in ODBC, using something like the SQL_C_BINARY type to 'tunnel'
>>> Arrow data through ODBC buffers, and/or providing a set of C API functions
>>> that could convert between (say) an ODBC statement handle and an Arrow C
>>> Data Interface ArrowArrayStream?
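On the JDBC half of the question, JDBC already has a sanctioned way to do this: `Connection`, `Statement`, `ResultSet`, and `DatabaseMetaData` all extend `java.sql.Wrapper`, whose `isWrapperFor` and `unwrap` methods let a driver expose extra interfaces without callers casting to concrete driver classes. A minimal sketch, assuming a hypothetical `ArrowBatchSource` extension interface (not an existing Arrow or Flight SQL type); the hypothetical `DriverResult` implements only `Wrapper` so the example runs without a real driver.

```java
import java.sql.SQLException;
import java.sql.Wrapper;
import java.util.List;

/** Hypothetical "Arrow-native" extension a driver could expose; not an existing Arrow API. */
interface ArrowBatchSource {
    List<String> endpoints(); // e.g. locations a client could fetch partitions from directly
}

/** Stand-in for a driver's ResultSet class; a real one would also implement java.sql.ResultSet. */
final class DriverResult implements Wrapper, ArrowBatchSource {
    private final List<String> endpoints;

    DriverResult(List<String> endpoints) {
        this.endpoints = endpoints;
    }

    @Override
    public List<String> endpoints() {
        return endpoints;
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this);
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        throw new SQLException("not a wrapper for " + iface.getName());
    }
}

final class ExtensionClient {
    /** Takes the extension path when the driver offers it; otherwise reports no endpoints. */
    static List<String> endpointsOrEmpty(Wrapper result) throws SQLException {
        if (result.isWrapperFor(ArrowBatchSource.class)) {
            return result.unwrap(ArrowBatchSource.class).endpoints();
        }
        return List.of();
    }
}
```

Client code written this way still works against any JDBC driver and only takes the extension path when it is offered, which is the "standard except for this part" trade-off Kyle objects to above.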
>>> 
>>> [1]: https://github.com/apache/arrow/pull/12254
>>> 
>>> -David
>>> 
>>> On Tue, Mar 15, 2022, at 01:06, Micah Kornfield wrote:
>>>> Hi Julian,
>>>> 
>>>> 
>>>>> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make
>>>>> it successful, build it on top of the leading API in each language (which
>>>>> in Java would be FlightSQL’s JDBC driver). I don’t see a good reason to
>>>>> expose through your API the fact that FlightSQL is underneath.
>>>> 
>>>> 
>>>> My understanding is that this thread is all about implementing a Flight
>>>> server and making it more ergonomic. On the client side, I think the power
>>>> of Flight/FlightSQL is twofold:
>>>> 1.  Reference ODBC/JDBC drivers that can consume the wire format (and I
>>>> think many clients will go this route). I think these are in the process
>>>> of being contributed already. As you noted, there is power in standards,
>>>> so I expect this avenue to see heavy use.
>>>> 2.  For clients that can handle it and want to go through the trouble,
>>>> consuming the data directly as Arrow for efficiency. I don't think we've
>>>> discussed canonical APIs by extending ODBC/JDBC, but I like that idea.
>>>> That seems like a discussion for after we have working JDBC/ODBC
>>>> reference implementations, though?
>>>> 
>>>> I might have missed it, but I don't think either approach on the client
>>>> side has been discussed on this thread. I also think this is why
>>>> "DataFrame" might not be the best name for the adapter: it comes with all
>>>> sorts of assumptions about usage on both the client and the server.
>>>> 
>>>> Cheers,
>>>> Micah
>>>> 
>>>> 
>>>> On Mon, Mar 14, 2022 at 9:38 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>>>> 
>>>>> When I read “language-agnostic standard for data access” I cringed a
>>>>> little. (See [1].)
>>>>> 
>>>>> Sure, it’s fun to create a new standard. But if your standard is
>>>>> successful, there will need to be a huge amount of work changing existing
>>>>> code to use your standard. That effort might even be the difference
>>>>> between success and failure for a small project, and therefore you have
>>>>> helped protect the incumbents.
>>>>> 
>>>>> My solution?
>>>>> 
>>>>> I would like the FlightSQL authors to make clear that it is a wire
>>>>> protocol, and only a protocol.
>>>>> 
>>>>> Rather than creating new APIs, I would like people to spend their effort
>>>>> implementing existing APIs (such as ODBC and JDBC) on top of FlightSQL.
>>>>> 
>>>>> If those APIs are inadequate (e.g. they don’t provide access to the raw
>>>>> Arrow data, or don’t support INSERT or SELECT that are partitioned across
>>>>> several clients/servers), then add extensions to those APIs. But still
>>>>> implement the core APIs. When I describe a table from Java, I want to get
>>>>> a result set that exactly matches JDBC’s getTables [2].
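For reference, the shape Julian is asking for is fixed by JDBC: `DatabaseMetaData.getTables` returns ten columns (`TABLE_CAT`, `TABLE_SCHEM`, `TABLE_NAME`, `TABLE_TYPE`, `REMARKS`, `TYPE_CAT`, `TYPE_SCHEM`, `TYPE_NAME`, `SELF_REFERENCING_COL_NAME`, `REF_GENERATION`), while a Flight SQL `CommandGetTables` reply carries `catalog_name`, `db_schema_name`, `table_name`, `table_type`, and optionally a serialized `table_schema`. A minimal sketch of the mapping a JDBC driver over Flight SQL has to perform, assuming a hypothetical `FlightSqlTableRow` holder for one reply row; JDBC columns that Flight SQL does not carry are left `null`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for one row of a Flight SQL CommandGetTables reply (table_schema omitted). */
final class FlightSqlTableRow {
    final String catalogName;  // catalog_name, nullable
    final String dbSchemaName; // db_schema_name, nullable
    final String tableName;    // table_name
    final String tableType;    // table_type

    FlightSqlTableRow(String catalogName, String dbSchemaName, String tableName, String tableType) {
        this.catalogName = catalogName;
        this.dbSchemaName = dbSchemaName;
        this.tableName = tableName;
        this.tableType = tableType;
    }
}

final class JdbcTablesAdapter {
    /** Column labels of DatabaseMetaData.getTables, in the order JDBC specifies. */
    static final List<String> JDBC_GET_TABLES_COLUMNS = List.of(
        "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "TABLE_TYPE", "REMARKS",
        "TYPE_CAT", "TYPE_SCHEM", "TYPE_NAME", "SELF_REFERENCING_COL_NAME", "REF_GENERATION");

    /** Maps one Flight SQL row onto the JDBC layout; columns Flight SQL lacks stay null. */
    static Map<String, String> toJdbcRow(FlightSqlTableRow row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String column : JDBC_GET_TABLES_COLUMNS) {
            out.put(column, null); // pre-seed every column so the key order matches JDBC
        }
        out.put("TABLE_CAT", row.catalogName);
        out.put("TABLE_SCHEM", row.dbSchemaName);
        out.put("TABLE_NAME", row.tableName);
        out.put("TABLE_TYPE", row.tableType);
        return out;
    }
}
```

JDBC also requires the rows to be ordered by `TABLE_TYPE`, `TABLE_CAT`, `TABLE_SCHEM`, and `TABLE_NAME`, so a driver may need to sort the mapped rows as well.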
>>>>> 
>>>>> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make
>>>>> it successful, build it on top of the leading API in each language (which
>>>>> in Java would be FlightSQL’s JDBC driver). I don’t see a good reason to
>>>>> expose through your API the fact that FlightSQL is underneath.
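A data-frame layer built the way Julian suggests only needs the standard JDBC interfaces. A minimal sketch, assuming a hypothetical `JdbcRows.toRows` helper that drains any `java.sql.ResultSet` into row maps keyed by column label; nothing in it knows whether Flight SQL or some other driver is underneath.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: driver-agnostic conversion from a JDBC ResultSet to row maps. */
final class JdbcRows {
    private JdbcRows() {}

    /** Reads every remaining row, keying values by column label and preserving column order. */
    static List<Map<String, Object>> toRows(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int columnCount = md.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= columnCount; i++) { // JDBC column indexes are 1-based
                row.put(md.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Swapping the Flight SQL JDBC driver for any other driver leaves this code unchanged, which is the decoupling Julian argues for. The cost is that values pass through per-value `getObject` calls instead of staying in Arrow buffers.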
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1] https://xkcd.com/927/
>>>>> 
>>>>> [2] https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
>>>>> 
>>>>>> On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com> wrote:
>>>>>> 
>>>>>> While I was trying to implement FlightSQL and introduce the idea of
>>>>>> adopting it, the largest challenge was the API itself.
>>>>>> 
>>>>>> I know it's meant to be low-level, but I found that most of the
>>>>>> development time went into code converting between row-based data
>>>>>> (i.e. Map<String, Object>) with Java types and columnar data with Arrow
>>>>>> types.
>>>>>> 
>>>>>> I'm likely in the minority here. I know that Arrow and FlightSQL users
>>>>>> are largely looking at transferring large volumes of data and servicing
>>>>>> OLAP-type workloads. But the thing that excites me most about FlightSQL
>>>>>> isn't its performance (always nice to have), but that it's a
>>>>>> language-agnostic standard for data access.
>>>>>> 
>>>>>> That has broad implications for all kinds of data-access workloads and
>>>>>> business use cases.
>>>>>> 
>>>>>> The challenge in advocating for it is that, when presenting a proof of
>>>>>> concept, rather than what a developer might expect to see, something
>>>>>> like:
>>>>>> 
>>>>>> // FlightSQL handler code
>>>>>> List<Map<String, Object>> results = ....;
>>>>>> results.add(Map.of("id", 1, "name", "Person 1"));
>>>>>> return results;
>>>>>> 
>>>>>> a significant portion of the code is Arrow-specific implementation
>>>>>> detail: creating a VectorSchemaRoot and FieldVectors, deserializing the
>>>>>> results on the client, etc.
>>>>>> 
>>>>>> I'm just curious whether there is any interest in, or intention of,
>>>>>> building a higher-level API around the basic FlightSQL one. Maybe
>>>>>> something closer to the traditional notion of a row-based "DataFrame"
>>>>>> or "Table", like:
>>>>>> 
>>>>>> DataFrame df = new DataFrame();
>>>>>> df.addColumn("id", ArrowTypes.Int);
>>>>>> df.addColumn("name", ArrowTypes.VarChar);
>>>>>> df.addRow(Map.of("id", 1, "name", "Person 1"));
>>>>>> VectorSchemaRoot root = df.toVectorSchemaRoot();
>>>>>> listener.setVectorSchemaRoot(root);
>>>>>> listener.sendVectorSchemaRootContents();
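Much of the conversion work Gavin describes is a row-to-column pivot plus a type check. A minimal, standard-library-only sketch of the proposed builder, assuming a hypothetical `RowTable` class with Java `Class` objects standing in for Arrow types; a real version would then copy each column into the matching Arrow vector (for example `IntVector` or `VarCharVector`) and assemble a `VectorSchemaRoot` from them, which is omitted here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical row-oriented builder that stores data column-wise, ready to copy into Arrow vectors. */
final class RowTable {
    private final Map<String, Class<?>> schema = new LinkedHashMap<>();
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount;

    RowTable addColumn(String name, Class<?> type) {
        if (rowCount > 0) {
            throw new IllegalStateException("declare columns before adding rows");
        }
        schema.put(name, type);
        columns.put(name, new ArrayList<>());
        return this;
    }

    /** Appends one row; missing keys become nulls, unknown keys or wrong types are rejected. */
    RowTable addRow(Map<String, ?> row) {
        for (String key : row.keySet()) {
            if (!schema.containsKey(key)) {
                throw new IllegalArgumentException("unknown column: " + key);
            }
        }
        // Validate every value before touching any column, so a rejected row leaves the table unchanged.
        for (Map.Entry<String, Class<?>> col : schema.entrySet()) {
            Object value = row.get(col.getKey());
            if (value != null && !col.getValue().isInstance(value)) {
                throw new IllegalArgumentException(col.getKey() + " expects " + col.getValue().getSimpleName());
            }
        }
        for (String name : schema.keySet()) {
            columns.get(name).add(row.get(name)); // pivot: each row value lands in its column list
        }
        rowCount++;
        return this;
    }

    int rowCount() {
        return rowCount;
    }

    /** Read-only view of one column's values, in row order. */
    List<Object> column(String name) {
        List<Object> values = columns.get(name);
        if (values == null) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return Collections.unmodifiableList(values);
    }
}
```

Usage mirrors the snippet above: `new RowTable().addColumn("id", Integer.class).addColumn("name", String.class).addRow(Map.of("id", 1, "name", "Person 1"))`. Checking the whole row before appending keeps a rejected row from leaving the columns with different lengths.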
>>>>> 
>>>>> 
>>> 
>> 
> 
> 
> -- 
> 
> *James Duong*
> Lead Software Developer
> Bit Quill Technologies Inc.
> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> https://www.bitquilltech.com