I may indeed be confused. I haven’t spent a lot of time looking at FlightSQL’s capabilities. Apologies for that.
My main point was to keep protocols, language-specific APIs (e.g. ODBC, JDBC), and frameworks (e.g. Gavin’s proposed data frames system) decoupled from one another, and to build on existing standards. I think those points were heard.

Julian

> On Mar 15, 2022, at 9:14 AM, James Duong <jam...@bitquilltech.com.INVALID> wrote:
>
> I could also see extensions to ODBC/JDBC being a point of confusion for app
> developers too.
>
> For example, if we were to add hooks in the JDBC driver to report endpoints
> so that applications can call getStream() directly, what would happen if the
> user started getting a stream then went back and tried to use the regular
> ResultSet interface? A stream would be consumed, but the driver wouldn't
> know it.
>
> On Tue, Mar 15, 2022 at 9:07 AM Kyle Porter <ky...@bitquilltech.com.invalid> wrote:
>
>> In general, I have problems with attempting to expose other extensions
>> through existing standards such as ODBC/JDBC. What it feels like we're
>> saying is: use the standard so you don't have to change any code, except
>> for this part where you must write custom code to take advantage of the
>> non-standard portions.
>>
>> At that point, why not just write something fully custom and take advantage
>> of the underlying interface?
>>
>> The higher level clients are meant to ease adoption and may be all that
>> existing applications use, but new applications can have a choice to use
>> the higher level clients or the lower level interface.
>>
>> *Kyle Porter*
>> CEO
>> Bit Quill Technologies Inc.
>> Office: +1.778.331.3355 | Direct: +1.604.441.7318 | ky...@bitquilltech.com
>> https://www.bitquill.com
>>
>> This email message is for the sole use of the intended recipient(s) and may
>> contain confidential and privileged information. Any unauthorized review,
>> use, disclosure, or distribution is prohibited. If you are not the
>> intended recipient, please contact the sender by reply email and destroy
>> all copies of the original message.
>> Thank you.
>>
>> On Tue, Mar 15, 2022 at 7:55 AM David Li <lidav...@apache.org> wrote:
>>
>>> Aren't we getting a few things mixed up here?
>>>
>>> 1) As Micah says, the original proposal is about adapting Java types to
>>> Arrow. This can be used independently of Flight SQL. I don't think this
>>> was being pitched as a standard itself unless I'm mistaken?
>>>
>>> 2) Flight SQL the protocol, which _is_ a language agnostic standard,
>>> though maybe not the one applications will generally choose to consume.
>>>
>>> 3) Idiomatic/standard per-language APIs that build on Flight SQL, which
>>> will include JDBC/ODBC (there is a reference JDBC driver in the works
>>> [1]), but I agree there's room for something that uses Arrow types,
>>> supports partitioning, etc. as well. (And I agree there's room for
>>> something that supports these features but is _not_ Flight SQL
>>> underneath.)
>>>
>>> ---
>>>
>>> I'm not super experienced with JDBC/ODBC - would extending them basically
>>> mean something like (in JDBC) providing interfaces that Connections,
>>> ResultSets, etc. could be cast to to access the "Arrow-native" bits? And
>>> in ODBC, using something like the SQL_C_BINARY type to 'tunnel' Arrow data
>>> through ODBC buffers, and/or providing a set of C API functions that could
>>> convert between (say) an ODBC statement handle and an Arrow C Data
>>> Interface ArrowArrayStream?
>>>
>>> [1]: https://github.com/apache/arrow/pull/12254
>>>
>>> -David
>>>
>>> On Tue, Mar 15, 2022, at 01:06, Micah Kornfield wrote:
>>>> Hi Julian,
>>>>
>>>>> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make
>>>>> it successful, build it on top of the leading API in each language
>>>>> (which in Java would be FlightSQL’s JDBC driver). I don’t see a good
>>>>> reason to expose through your API the fact that FlightSQL is underneath.
>>>>
>>>> My understanding is that this thread is all about implementing a Flight
>>>> server and making those ergonomics easier. On the client side, I think
>>>> the power of Flight/FlightSQL is twofold:
>>>> 1. Reference ODBC/JDBC drivers that can consume the wire format (and I
>>>> think many clients will go this route). I think these are in the process
>>>> of being contributed already. As you noted, there is power in standards,
>>>> so I expect this avenue to see heavy use.
>>>> 2. For clients that can handle it and want to go through the trouble,
>>>> consuming the data directly as Arrow for efficiency purposes. I don't
>>>> think we've discussed canonical APIs by extending ODBC/JDBC but I like
>>>> that idea. That seems like a discussion for after we have a working
>>>> JDBC/ODBC reference implementation though?
>>>>
>>>> I might have missed it but I don't think either approach on the client
>>>> side has been discussed on this thread. I also think this is why
>>>> Dataframe might not be the best name for the adapter because it comes
>>>> with all sorts of assumptions about usage both on a client and a server.
>>>>
>>>> Cheers,
>>>> Micah
>>>>
>>>> On Mon, Mar 14, 2022 at 9:38 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>>>>
>>>>> When I read “language-agnostic standard for data access” I cringed a
>>>>> little. (See [1].)
>>>>>
>>>>> Sure, it’s fun to create a new standard. But if your standard is
>>>>> successful, there will need to be a huge amount of work changing
>>>>> existing code to use your standard. That effort might even be the
>>>>> difference between success and failure for a small project, and
>>>>> therefore you have helped protect the incumbents.
>>>>>
>>>>> My solution?
>>>>>
>>>>> I would like the FlightSQL authors to make clear that it is a wire
>>>>> protocol, and only a protocol.
>>>>>
>>>>> Rather than creating new APIs, I would like people to spend their
>>>>> effort implementing existing APIs (such as ODBC and JDBC) on top of
>>>>> FlightSQL.
>>>>>
>>>>> If those APIs are inadequate (e.g. they don’t provide access to the raw
>>>>> Arrow data, or don’t support INSERT or SELECT that are partitioned
>>>>> across several clients/servers), then add extensions to those APIs. But
>>>>> still implement the core APIs. When I describe a table from Java, I want
>>>>> a result set that exactly matches JDBC’s getTables [2].
>>>>>
>>>>> I like Gavin’s idea of a data-frame API. But Gavin, if you want to make
>>>>> it successful, build it on top of the leading API in each language
>>>>> (which in Java would be FlightSQL’s JDBC driver). I don’t see a good
>>>>> reason to expose through your API the fact that FlightSQL is underneath.
>>>>>
>>>>> Julian
>>>>>
>>>>> [1] https://xkcd.com/927/
>>>>>
>>>>> [2] https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html#getTables-java.lang.String-java.lang.String-java.lang.String-java.lang.String:A-
>>>>>
>>>>>> On Mar 12, 2022, at 12:14 PM, Gavin Ray <ray.gavi...@gmail.com> wrote:
>>>>>>
>>>>>> While trying to implement and introduce the idea of adopting FlightSQL,
>>>>>> the largest challenge was the API itself.
>>>>>>
>>>>>> I know it's meant to be low-level. But I found that most of the
>>>>>> development time was in code to convert to/from row-based data (i.e.
>>>>>> Map<String, Object>) and Java types, and columnar data + Arrow types.
>>>>>>
>>>>>> I'm likely in the minority position here -- I know that Arrow and
>>>>>> FlightSQL users are largely looking at transferring large volumes of
>>>>>> data and servicing OLAP-type workloads.
>>>>>> But the thing that excites me most about FlightSQL isn't its
>>>>>> performance (always nice to have), but that it's a language-agnostic
>>>>>> standard for data access.
>>>>>>
>>>>>> That has broad implications -- for all kinds of data-access workloads
>>>>>> and business use cases.
>>>>>>
>>>>>> The challenge is that in trying to advocate for it, when presenting a
>>>>>> proof-of-concept, rather than what a developer might expect to see,
>>>>>> something like:
>>>>>>
>>>>>>     // FlightSQL handler code
>>>>>>     List<Map<String, Object>> results = ....;
>>>>>>     results.add(Map.of("id", 1, "name", "Person 1"));
>>>>>>     return results;
>>>>>>
>>>>>> A significant portion of the code is in Arrow-specific implementation
>>>>>> details: creating a VectorSchemaRoot, FieldVector, de-serializing the
>>>>>> results on the client, etc.
>>>>>>
>>>>>> Just curious whether there is any interest/intention of possibly making
>>>>>> a higher level API around the basic FlightSQL one?
>>>>>> Maybe something closer to the traditional notion of a row-based
>>>>>> "DataFrame" or "Table", like:
>>>>>>
>>>>>>     DataFrame df = new DataFrame();
>>>>>>     df.addColumn("id", ArrowTypes.Int);
>>>>>>     df.addColumn("name", ArrowTypes.VarChar);
>>>>>>     df.addRow(Map.of("id", 1, "name", "Person 1"));
>>>>>>     VectorSchemaRoot root = df.toVectorSchemaRoot();
>>>>>>     listener.setVectorSchemaRoot(root);
>>>>>>     listener.sendVectorSchemaRootContents();
>
> --
>
> *James Duong*
> Lead Software Developer
> Bit Quill Technologies Inc.
> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> https://www.bitquilltech.com
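
For reference, the row-to-columnar conversion that Gavin's original message describes can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain Java collections as a stand-in for Arrow's VectorSchemaRoot and FieldVector, and the names RowsToColumns and toColumns are hypothetical, not part of Arrow or Flight SQL. A real adapter would write each value into a typed Arrow vector instead of a List<Object>.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an Arrow API: pivots row-based records
// (Map<String, Object>) into one list of values per column.
public class RowsToColumns {

    // Returns the columns in the order given by columnNames.
    // A key that is missing from a row becomes null in that column.
    static Map<String, List<Object>> toColumns(List<String> columnNames,
                                               List<Map<String, Object>> rows) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (String name : columnNames) {
            columns.put(name, new ArrayList<>(rows.size()));
        }
        for (Map<String, Object> row : rows) {
            for (String name : columnNames) {
                columns.get(name).add(row.get(name));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, Object> row1 = Map.of("id", 1, "name", "Person 1");
        Map<String, Object> row2 = Map.of("id", 2, "name", "Person 2");
        Map<String, List<Object>> columns =
                toColumns(List.of("id", "name"), List.of(row1, row2));
        // prints {id=[1, 2], name=[Person 1, Person 2]}
        System.out.println(columns);
    }
}
```

The column names are passed in explicitly, rather than taken from the first row, so that the column order and the set of columns do not depend on which keys a particular row happens to contain.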